\input texinfo
@setfilename ldint.info
@c Copyright (C) 1992-2018 Free Software Foundation, Inc.

@ifnottex
@dircategory Software development
@direntry
* Ld-Internals: (ldint).	The GNU linker internals.
@end direntry
@end ifnottex

@copying
This file documents the internals of the GNU linker ld.

Copyright @copyright{} 1992-2018 Free Software Foundation, Inc.
Contributed by Cygnus Support.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``Funding
Free Software'', the Front-Cover texts being (a) (see below), and with
the Back-Cover Texts being (b) (see below).  A copy of the license is
included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@iftex
@finalout
@setchapternewpage off
@settitle GNU Linker Internals
@titlepage
@title{A guide to the internals of the GNU linker}
@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
@author Cygnus Support
@page

@tex
\def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
\xdef\manvers{2.10.91}  % For use in headers, footers too
{\parskip=0pt
\hfill Cygnus Support\par
\hfill \manvers\par
\hfill \TeX{}info \texinfoversion\par
}
@end tex

@vskip 0pt plus 1filll
Copyright @copyright{} 1992-2018 Free Software Foundation, Inc.

      Permission is granted to copy, distribute and/or modify this document
      under the terms of the GNU Free Documentation License, Version 1.3
      or any later version published by the Free Software Foundation;
      with no Invariant Sections, with no Front-Cover Texts, and with no
      Back-Cover Texts.  A copy of the license is included in the
      section entitled "GNU Free Documentation License".

@end titlepage
@end iftex

@node Top
@top

This file documents the internals of the GNU linker @code{ld}.  It is a
collection of miscellaneous information with little form at this point.
Mostly, it is a repository into which you can put information about
GNU @code{ld} as you discover it (or as you design changes to @code{ld}).

This document is distributed under the terms of the GNU Free
Documentation License.  A copy of the license is included in the
section entitled "GNU Free Documentation License".

@menu
* README::			The README File
* Emulations::			How linker emulations are generated
* Emulation Walkthrough::	A Walkthrough of a Typical Emulation
* Architecture Specific::	Some Architecture Specific Notes
* GNU Free Documentation License::  GNU Free Documentation License
@end menu

@node README
@chapter The @file{README} File

Check the @file{README} file; it often has useful information that does not
appear anywhere else in the directory.

@node Emulations
@chapter How linker emulations are generated

Each linker target has an @dfn{emulation}.  The emulation includes the
default linker script, and some emulations also modify certain types
of linker behaviour.

Emulations are created during the build process by the shell script
@file{genscripts.sh}.

The @file{genscripts.sh} script starts by reading a file in the
@file{emulparams} directory.  This is a shell script which sets various
shell variables used by @file{genscripts.sh} and the other shell scripts
it invokes.

The @file{genscripts.sh} script will invoke a shell script in the
@file{scripttempl} directory in order to create default linker scripts
written in the linker command language.  The @file{scripttempl} script
will be invoked at least 5 times (and up to 9, depending on which
optional scripts the emulation requests), with different assignments to
shell variables, to create different default scripts.  When the linker
is run, the choice of script is made based on the command line options.

After creating the scripts, @file{genscripts.sh} will invoke yet another
shell script, this time in the @file{emultempl} directory.  That shell
script will create the emulation source file, which contains C code.
This C code permits the linker emulation to override various linker
behaviours.  Most targets use the generic emulation code, which is in
@file{emultempl/generic.em}.

To summarize, @file{genscripts.sh} reads three shell scripts: an
emulation parameters script in the @file{emulparams} directory, a linker
script generation script in the @file{scripttempl} directory, and an
emulation source file generation script in the @file{emultempl}
directory.

For example, the Sun 4 linker sets up variables in
@file{emulparams/sun4.sh}, creates linker scripts using
@file{scripttempl/aout.sc}, and creates the emulation code using
@file{emultempl/sunos.em}.

Note that the linker can support several emulations simultaneously,
depending upon how it is configured.  An emulation can be selected with
the @code{-m} option.  The @code{-V} option will list all supported
emulations.

@menu
* emulation parameters::        @file{emulparams} scripts
* linker scripts::              @file{scripttempl} scripts
* linker emulations::           @file{emultempl} scripts
@end menu

@node emulation parameters
@section @file{emulparams} scripts

Each target selects a particular file in the @file{emulparams} directory
by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
This shell variable is used by the @file{configure} script to control
building an emulation source file.

Certain conventions are enforced.  Suppose the @code{targ_emul} variable
is set to @var{emul} in @file{configure.tgt}.  The name of the emulation
shell script will be @file{emulparams/@var{emul}.sh}.  The
@file{Makefile} must have a target named @file{e@var{emul}.c}; this
target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
appropriate scripts in the @file{scripttempl} and @file{emultempl}
directories.  The @file{Makefile} target must invoke @code{GENSCRIPTS}
with two arguments: @var{emul}, and the value of the make variable
@code{tdir_@var{emul}}.  The value of the latter variable will be set by
the @file{configure} script, and is used to set the default target
directory to search.

By convention, the @file{emulparams/@var{emul}.sh} shell script should
only set shell variables.  It may set shell variables which are to be
interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
Certain shell variables are interpreted directly by the
@file{genscripts.sh} script.

Here is a list of shell variables interpreted by @file{genscripts.sh},
as well as some conventional shell variables interpreted by the
@file{scripttempl} and @file{emultempl} scripts.

@table @code
@item SCRIPT_NAME
This is the name of the @file{scripttempl} script to use.  If
@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
the script @file{scripttempl/@var{script}.sc}.

@item TEMPLATE_NAME
This is the name of the @file{emultempl} script to use.  If
@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
use the script @file{emultempl/@var{template}.em}.  If this variable is
not set, the default value is @samp{generic}.

@item GENERATE_SHLIB_SCRIPT
If this is set to a nonempty string, @file{genscripts.sh} will invoke
the @file{scripttempl} script an extra time to create a shared library
script.  @xref{linker scripts}.

@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}).  The @file{scripttempl} script will normally
use it in an @code{OUTPUT_FORMAT} expression in the linker script.

@item ARCH
This is normally set to indicate the architecture to use (e.g.,
@samp{sparc}).  The @file{scripttempl} script will normally use it in an
@code{OUTPUT_ARCH} expression in the linker script.

@item ENTRY
Some @file{scripttempl} scripts use this to set the entry address, in an
@code{ENTRY} expression in the linker script.

@item TEXT_START_ADDR
Some @file{scripttempl} scripts use this to set the start address of the
@samp{.text} section.

@item SEGMENT_SIZE
The @file{genscripts.sh} script uses this to set the default value of
@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.

@item TARGET_PAGE_SIZE
If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
uses this to define it.

@item ALIGNMENT
Some @file{scripttempl} scripts set this to a number to pass to
@code{ALIGN} to set the required alignment for the @code{end} symbol.
@end table
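
As an illustration of how these variables fit together, here is a
minimal sketch of a parameters file for a hypothetical emulation named
@samp{myemul}, followed by two lines that imitate the defaulting rules
described in the table above.  The emulation name, the addresses, and
the defaulting code are assumptions made for this example; they are not
copied from @file{genscripts.sh}.
@smallexample
# emulparams/myemul.sh: a hypothetical parameters file.
SCRIPT_NAME=aout
OUTPUT_FORMAT="a.out-sunos-big"
ARCH=sparc
TEXT_START_ADDR=0x2020
TARGET_PAGE_SIZE=0x2000

# Left empty on purpose, so that the defaults below take effect.
TEMPLATE_NAME=
SEGMENT_SIZE=

# Imitation of the defaulting rules (not the real genscripts.sh code).
test -n "$TEMPLATE_NAME" || TEMPLATE_NAME=generic
test -n "$SEGMENT_SIZE" || SEGMENT_SIZE=$TARGET_PAGE_SIZE

echo "scripttempl/$SCRIPT_NAME.sc emultempl/$TEMPLATE_NAME.em $SEGMENT_SIZE"
@end smallexample
With these values, the fragment reports that
@file{scripttempl/aout.sc} and @file{emultempl/generic.em} would be
used, with a segment size equal to the target page size.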

@node linker scripts
@section @file{scripttempl} scripts

Each linker target uses a @file{scripttempl} script to generate the
default linker scripts.  The name of the @file{scripttempl} script is
set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
invoke @file{scripttempl/@var{script}.sc}.

The @file{genscripts.sh} script will invoke the @file{scripttempl}
script 5 to 9 times.  Each time it will set the shell variable
@code{LD_FLAG} to a different value.  When the linker is run, the
options used will direct it to select a particular script.  (Script
selection is controlled by the @code{get_script} emulation entry point;
this describes the conventional behaviour).

The @file{scripttempl} script should just write a linker script, written
in the linker command language, to standard output.  If the emulation
name--the name of the @file{emulparams} file without the @file{.sh}
extension--is @var{emul}, then the output will be directed to
@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
where @var{extension} changes each time the @file{scripttempl} script is
invoked.

Here is the list of values assigned to @code{LD_FLAG}.

@table @code
@item (empty)
The script generated is used by default (when none of the following
cases apply).  The output has an extension of @file{.x}.
@item n
The script generated is used when the linker is invoked with the
@code{-n} option.  The output has an extension of @file{.xn}.
@item N
The script generated is used when the linker is invoked with the
@code{-N} option.  The output has an extension of @file{.xbn}.
@item r
The script generated is used when the linker is invoked with the
@code{-r} option.  The output has an extension of @file{.xr}.
@item u
The script generated is used when the linker is invoked with the
@code{-Ur} option.  The output has an extension of @file{.xu}.
@item shared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
@file{emulparams} file.  The @file{emultempl} script must arrange to use
this script at the appropriate time, normally when the linker is invoked
with the @code{-shared} option.  The output has an extension of
@file{.xs}.
@item c
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}.  The
@file{emultempl} script must arrange to use this script at the appropriate
time, normally when the linker is invoked with the @code{-z combreloc}
option.  The output has an extension of
@file{.xc}.
@item cshared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
@code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
The @file{emultempl} script must arrange to use this script at the
appropriate time, normally when the linker is invoked with the
@code{-shared -z combreloc} options.  The output has an extension of
@file{.xsc}.
@item auto_import
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
@file{emulparams} file.  The @file{emultempl} script must arrange to
use this script at the appropriate time, normally when the linker is
invoked with the @code{--enable-auto-import} option.  The output has
an extension of @file{.xa}.
@end table
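
The mapping from @code{LD_FLAG} values to file extensions given in the
table above can be summarized in a small shell function.  This is only
a sketch for illustration: the function name @code{script_extension}
and the emulation name @samp{myemul} are hypothetical, and
@file{genscripts.sh} is not claimed to contain such a function.
@smallexample
# Print the file extension used for a given LD_FLAG value.
script_extension ()
(
  case "$1" in
    "")          echo x   ;;
    n)           echo xn  ;;
    N)           echo xbn ;;
    r)           echo xr  ;;
    u)           echo xu  ;;
    shared)      echo xs  ;;
    c)           echo xc  ;;
    cshared)     echo xsc ;;
    auto_import) echo xa  ;;
    *)           echo "unknown LD_FLAG: $1" >&2; exit 1 ;;
  esac
)

# List every script that could be generated for a hypothetical emulation.
for flag in "" n N r u shared c cshared auto_import; do
  echo "ldscripts/myemul.$(script_extension "$flag")"
done
@end smallexample
The loop prints one file name per @code{LD_FLAG} value, from
@file{ldscripts/myemul.x} through @file{ldscripts/myemul.xa}.  In a real
build only the first five are always generated; the others depend on the
@file{emulparams} settings described above.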

Besides the shell variables set by the @file{emulparams} script, and the
@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
certain variables for each run of the @file{scripttempl} script.

@table @code
@item RELOCATING
This will be set to a non-empty string when the linker is doing a final
relocation (i.e., for all scripts other than @code{-r} and @code{-Ur}).

@item CONSTRUCTING
This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., for all scripts other
than @code{-r}).

@item DATA_ALIGNMENT
This will be set to an @code{ALIGN} expression when the output should be
page aligned, or to @samp{.} when generating the @code{-N} script.

@item CREATE_SHLIB
This will be set to a non-empty string when generating a @code{-shared}
script.

@item COMBRELOC
When generating @code{-z combreloc} scripts, this will be set to a
non-empty string: the name of a temporary file which can be used during
script generation.
@end table
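
The rules for @code{RELOCATING} and @code{CONSTRUCTING} in the table
above can be written out as a small truth table.  The following sketch
is an illustration only; the function name @code{script_settings} is
hypothetical, and the words @samp{yes} and @samp{no} stand in for
``set to a non-empty string'' and ``not set''.
@smallexample
# Report whether RELOCATING and CONSTRUCTING would be set for an LD_FLAG.
script_settings ()
(
  relocating=yes
  constructing=yes
  case "$1" in
    r) relocating=no; constructing=no ;;  # -r: plain relocatable link
    u) relocating=no ;;                   # -Ur: still builds the tables
  esac
  echo "RELOCATING=$relocating CONSTRUCTING=$constructing"
)

for flag in "" n N r u; do
  echo "LD_FLAG='$flag': $(script_settings "$flag")"
done
@end smallexample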

The conventional way to write a @file{scripttempl} script is to first
set a few shell variables, and then write out a linker script using
@code{cat} with a here document.  The linker script will use variable
substitutions, based on the above variables and those set in the
@file{emulparams} script, to control its behaviour.
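
As a minimal sketch of that convention, the fragment below writes only
the opening commands of a linker script.  The variable values are
hypothetical stand-ins for what an @file{emulparams} script would
provide, and the function name @code{write_header} is an assumption
made for this example; a real @file{scripttempl} script goes on to emit
a @code{SECTIONS} command as well.
@smallexample
# Values normally supplied by the emulparams script (hypothetical here).
OUTPUT_FORMAT="a.out-sunos-big"
ARCH=sparc
ENTRY=start

# Emit the opening commands of a linker script on standard output.
write_header ()
(
cat <<EOF
OUTPUT_FORMAT("$OUTPUT_FORMAT")
OUTPUT_ARCH($ARCH)
ENTRY($ENTRY)
EOF
)

write_header
@end smallexample
Because the here document is not quoted, the shell substitutes each
variable before @code{cat} writes the text, so the output is ordinary
linker command language with the values filled in.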

When there are parts of the @file{scripttempl} script which should only
be run when doing a final relocation, they should be enclosed within a
variable substitution based on @code{RELOCATING}.  For example, on many
targets special symbols such as @code{_end} should be defined when doing
a final link.  Naturally, those symbols should not be defined when doing
a relocatable link using @code{-r}.  The @file{scripttempl} script
could use a construct like this to define those symbols:
@smallexample
  $@{RELOCATING+ _end = .;@}
@end smallexample
This will do the symbol assignment only if the @code{RELOCATING}
variable is defined.

The basic job of the linker script is to put the sections in the correct
order, and at the correct memory addresses.  For some targets, the
linker script may have to do some other operations.

For example, on most MIPS platforms, the linker is responsible for
defining the special symbol @code{_gp}, used to initialize the
@code{$gp} register.  It must be set to the start of the small data
section plus @code{0x8000}.  Naturally, it should only be defined when
doing a final relocation.  This will typically be done like this:
@smallexample
  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
@end smallexample
This line would appear just before the sections which compose the small
data section (@samp{.sdata}, @samp{.sbss}).  All those sections would be
contiguous in memory.

Many COFF systems build constructor tables in the linker script.  The
compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
@code{gcc} configuration files).  The @code{gcc} runtime support
routines expect the constructor table to be named @code{__CTOR_LIST__}.
They expect it to be a list of words, with the first word being the
count of the number of entries.  There should be a trailing zero word.
(Actually, the count may be -1 if the trailing word is present, and the
trailing word may be omitted if the count is correct, but, as the
@code{gcc} behaviour has changed slightly over the years, it is safest
to provide both).  Here is a typical way that might be handled in a
@file{scripttempl} file.
@smallexample
    $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
    $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
    $@{CONSTRUCTING+ *(.ctors)@}
    $@{CONSTRUCTING+ LONG(0)@}
    $@{CONSTRUCTING+ __CTOR_END__ = .;@}
    $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
    $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
    $@{CONSTRUCTING+ *(.dtors)@}
    $@{CONSTRUCTING+ LONG(0)@}
    $@{CONSTRUCTING+ __DTOR_END__ = .;@}
@end smallexample
The use of @code{CONSTRUCTING} ensures that these linker script commands
will only appear when the linker is supposed to be building the
constructor and destructor tables.  This example is written for a target
which uses 4 byte pointers.

Embedded systems often need to set a stack address.  This is normally
best done by using the @code{PROVIDE} construct with a default stack
address.  This permits the user to easily override the stack address
using the @code{--defsym} option.  Here is an example:
@smallexample
  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
@end smallexample
The value of the symbol @code{__stack} would then be used in the startup
code to initialize the stack pointer.

@node linker emulations
@section @file{emultempl} scripts

Each linker target uses an @file{emultempl} script to generate the
emulation code.  The name of the @file{emultempl} script is set by the
@code{TEMPLATE_NAME} variable in the @file{emulparams} script.  If the
@code{TEMPLATE_NAME} variable is not set, the default is
@samp{generic}.  If the value of @code{TEMPLATE_NAME} is @var{template},
@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.

Most targets use the generic @file{emultempl} script,
@file{emultempl/generic.em}.  A different @file{emultempl} script is
only needed if the linker must support unusual actions, such as linking
against shared libraries.

The @file{emultempl} script is normally written as a simple invocation
of @code{cat} with a here document.  The document will use a few
variable substitutions.  Typically each function name uses a
substitution involving @code{EMULATION_NAME}, for ease of debugging when
the linker supports multiple emulations.

Every function and variable in the emitted file should be static.  The
only globally visible object must be named
@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
the name of the emulation set in @file{configure.tgt} (this is also the
name of the @file{emulparams} file without the @file{.sh} extension).
The @file{genscripts.sh} script will set the shell variable
@code{EMULATION_NAME} before invoking the @file{emultempl} script.

The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name (normally
set from the shell variable @code{OUTPUT_FORMAT}, which is normally set
by the @file{emulparams} file).

The @file{genscripts.sh} script will set the shell variable
@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
default emulation.  In this case, the @file{emultempl} script should
include the linker scripts directly, and return them from the
@code{get_script} entry point.  When the emulation is not the default,
the @code{get_script} entry point should just return a file name.  See
@file{emultempl/generic.em} for an example of how this is done.

At some point, the linker emulation entry points should be documented.

@node Emulation Walkthrough
@chapter A Walkthrough of a Typical Emulation

This chapter is to help people who are new to the way emulations
interact with the linker, or who are suddenly thrust into the position
of having to work with existing emulations.  It will discuss the files
you need to be aware of.  It will tell you when the given ``hooks'' in
the emulation will be called.  It will, hopefully, give you enough
information about when and how things happen that you'll be able to
get by.  As always, the source is the definitive reference to this.

The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined.  The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{emulation}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}.  Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.

Your emulation file exports a symbol
@code{ld_@var{EMULATION_NAME}_emulation}.  If your emulation is
selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse} which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function.  For
the rest of this section, the functions will be mentioned, but you
should assume the indirect reference to your emulation also.

We will also skip or gloss over parts of the link process that don't
relate to emulations, like setting up internationalization.

After initialization, @code{main} selects an emulation by pre-scanning
the command line arguments.  It calls @code{ldemul_choose_target} to
choose a target.  If you set @code{choose_target} to
@code{ldemul_default_target}, it picks your @code{target_name} by
default.

@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
@code{parse_args} calls @code{ldemul_parse_args} for each arg, which
must update the @code{getopt} globals if it recognizes the argument.
If the emulation doesn't recognize it, then @code{parse_args} checks to
see if it recognizes it.

Now that the emulation has had access to all its command-line options,
@code{main} calls @code{ldemul_set_symbols}.  This can be used for any
initialization that may be affected by options.  It is also supposed
to set up any variables needed by the emulation script.

@code{main} now calls @code{ldemul_get_script} to get the emulation
script to use (based on arguments, no doubt, @pxref{Emulations}) and
runs it.  While parsing, @file{ldgram.y} may call @code{ldemul_hll} or
@code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
commands.  It may call @code{ldemul_unrecognized_file} if you asked
the linker to link a file it doesn't recognize.  It will call
@code{ldemul_recognized_file} for each file it does recognize, in case
the emulation wants to handle some files specially.  All the while,
it's loading the files (possibly calling
@code{ldemul_open_dynamic_archive}) and symbols and stuff.  After it's
done reading the script, @code{main} calls @code{ldemul_after_parse}.
Use the after-parse hook to set up anything that depends on stuff the
script might have set up, like the entry point.

@code{main} next calls @code{lang_process} in @file{ldlang.c}.  This
appears to be the core of the linking itself, as far as emulation
hooks are concerned(*).  It first opens the output file's BFD, calling
@code{ldemul_set_output_arch}, and calls
@code{ldemul_create_output_section_statements} in case you need to use
other means to find or create object files (e.g., shared libraries
found on a path, or fake stub objects).  Despite the name, nobody
creates output sections here.

(*) In most cases, the BFD library does the bulk of the actual
linking, handling symbol tables, symbol resolution, relocations, and
building the final output file.  See the BFD reference for all the
details.  Your emulation is usually concerned more with managing
things at the file and section level, like ``put this here, add this
section'', etc.

Next, the objects to be linked are opened and BFDs created for them,
and @code{ldemul_after_open} is called.  At this point, you have all
the objects and symbols loaded, but none of the data has been placed
yet.

Next comes the Big Linking Thingy (except for the parts BFD does).
All input sections are mapped to output sections according to the
script.  If a section doesn't get mapped by default,
@code{ldemul_place_orphan} will get called to figure out where it goes.
Next it figures out the offsets for each section, calling
@code{ldemul_before_allocation} before and
@code{ldemul_after_allocation} after deciding where each input section
ends up in the output sections.

The last part of @code{lang_process} is to figure out all the symbols'
values.  After assigning final values to the symbols,
@code{ldemul_finish} is called, and after that, any undefined symbols
are turned into fatal errors.

OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}.  @code{ldwrite} calls BFD's @code{final_link}, which
does all the relocation fixups and writes the output BFD to disk, and
we're done.
Packit ba3681
Packit ba3681
In summary,
Packit ba3681
Packit ba3681
@itemize @bullet

@item @code{main()} in @file{ldmain.c}
@item @file{emultempl/@var{EMULATION}.em} has your code
@item @code{ldemul_choose_target} (defaults to your @code{target_name})
@item @code{ldemul_before_parse}
@item parse @code{argv}, calling @code{ldemul_parse_args} for each argument
@item @code{ldemul_set_symbols}
@item @code{ldemul_get_script}
@item parse script

@itemize @bullet
@item may call @code{ldemul_hll} or @code{ldemul_syslib}
@item may call @code{ldemul_open_dynamic_archive}
@end itemize

@item @code{ldemul_after_parse}
@item @code{lang_process()} in @file{ldlang.c}

@itemize @bullet
@item create @code{output_bfd}
@item @code{ldemul_set_output_arch}
@item @code{ldemul_create_output_section_statements}
@item read objects, create input BFDs - all symbols exist, but have no values
@item may call @code{ldemul_unrecognized_file}
@item will call @code{ldemul_recognized_file}
@item @code{ldemul_after_open}
@item map input sections to output sections
@item may call @code{ldemul_place_orphan} for remaining sections
@item @code{ldemul_before_allocation}
@item give input sections offsets into output sections, place output sections
@item @code{ldemul_after_allocation} - section addresses valid
@item assign values to symbols
@item @code{ldemul_finish} - symbol values valid
@end itemize

@item output BFD is written to disk

@end itemize
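The ordering of the @code{lang_process} hooks in the summary above can
be illustrated with a small toy model.  The following Python sketch is
not @code{ld} source code: the class @code{RecordingEmulation} and the
function @code{toy_lang_process} are hypothetical names invented for
this illustration, and only the hook names are taken from the list.
The file recognition hooks are left out for brevity.  It records which
hook would run at which point, which may help when deciding where to
put emulation-specific work.

@verbatim
```python
# Toy model of the lang_process hook order summarized above.
# Nothing here is ld source code; only the hook names come from the list.


class RecordingEmulation:
    """Stands in for an emulation and records each hook call by name."""

    def __init__(self):
        self.calls = []

    def hook(self, name):
        self.calls.append(name)


def toy_lang_process(emul, orphan_sections=()):
    """Invoke the hooks in the order given by the summary list."""
    emul.hook("ldemul_set_output_arch")
    emul.hook("ldemul_create_output_section_statements")
    # Input objects are read here: symbols exist but have no values yet.
    emul.hook("ldemul_after_open")
    # Sections the script did not map are handed to the emulation.
    for _section in orphan_sections:
        emul.hook("ldemul_place_orphan")
    emul.hook("ldemul_before_allocation")
    # Offsets and addresses are assigned between these two hooks.
    emul.hook("ldemul_after_allocation")
    # Symbol values are assigned, then the emulation gets a final look.
    emul.hook("ldemul_finish")


emul = RecordingEmulation()
toy_lang_process(emul, orphan_sections=[".mystery"])
print("\n".join(emul.calls))
```
@end verbatim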
@node Architecture Specific
@chapter Some Architecture Specific Notes

This is the place for notes on the behavior of @code{ld} on
specific platforms.  Currently, only Intel x86 is documented (and
of that, only the auto-import behavior for DLLs).

@menu
* ix86::                        Intel x86
@end menu

@node ix86
@section Intel x86

@code{ld} can create DLLs that operate with various runtimes available
on a common x86 operating system.  These runtimes include native (using
the mingw ``platform''), cygwin, and pw.

@table @emph
@item auto-import from DLLs
@enumerate
@item
With this feature on, DLL clients can import variables from a DLL
without any effort on their side (for example, without any source
code modifications).  Auto-import can be enabled using the
@code{--enable-auto-import} flag, or disabled via the
@code{--disable-auto-import} flag.  Auto-import is disabled by default.

@item
This is done completely within the bounds of the PE specification (to
be fair, there's a minor violation of the spec at one point, but in
practice auto-import works on all known variants of that common x86
operating system).  So, the resulting DLL can be used with any other PE
compiler/linker.

@item
Auto-import is fully compatible with the standard import method, in
which variables are decorated using attribute modifiers.  Libraries of
either type may be mixed together.

@item
Overhead (space): 8 bytes per imported symbol, plus 20 for each
reference to it; Overhead (load time): negligible; Overhead
(virtual/physical memory): should be less than the effect of DLL
relocation.
@end enumerate
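As a worked example of the space overhead quoted in the list above,
the following Python sketch simply applies the two stated figures (8
bytes per imported symbol, 20 bytes per reference).  The function name
@code{auto_import_overhead} and the symbol names are hypothetical and
exist only for this illustration.

@verbatim
```python
# Worked arithmetic for the space overhead figures quoted above.
# The helper and the symbol names are hypothetical illustrations.

BYTES_PER_SYMBOL = 8      # per imported symbol
BYTES_PER_REFERENCE = 20  # per reference to an imported symbol


def auto_import_overhead(references_per_symbol):
    """Return the space overhead in bytes.

    references_per_symbol maps a symbol name to its reference count.
    """
    symbols = len(references_per_symbol)
    references = sum(references_per_symbol.values())
    return symbols * BYTES_PER_SYMBOL + references * BYTES_PER_REFERENCE


# Two imported variables, referenced three times and once respectively:
# 2 * 8 + 4 * 20 = 96 bytes.
overhead = auto_import_overhead({"dll_var": 3, "dll_table": 1})
print(overhead)
```
@end verbatim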
Motivation

The obvious and only way to get rid of dllimport insanity is
to make the client access the variable directly in the DLL, bypassing
the extra dereference imposed by ordinary DLL runtime linking.
I.e., whenever the client contains something like

@code{mov dll_var,%eax},

the address of @code{dll_var} in the instruction should be relocated
to point into the loaded DLL.  The aim is to make the OS loader do so,
and then make @code{ld} help with that.  The import section of a PE
file is laid out the following way: there's a vector of structures,
each describing the imports from a particular DLL.  Each such
structure points to two other parallel vectors: one holding the
imported names, and one which will hold the address of the
corresponding imported name.  So, the solution is to de-vectorize
these structures, making the import locations sparse and pointing
directly into code.

Implementation

For each reference to a data symbol that is to be imported from a DLL
(a symbol named @code{<sym>} belongs to this set if
@code{__imp_<sym>} is found in the implib), an import fixup entry is
generated.  That entry is of type @code{IMAGE_IMPORT_DESCRIPTOR} and
is stored in the @code{.idata$3} subsection.  Each fixup entry contains
a pointer to the symbol's address within the @code{.text} section
(marked with a @code{__fuN_<sym>} symbol, where N is an integer), a
pointer to the DLL name (so the DLL name is referenced by multiple
entries), and a pointer to the symbol name thunk.  The symbol name
thunk is a singleton vector (@code{__nm_th_<symbol>}) pointing to an
@code{IMAGE_IMPORT_BY_NAME} structure (@code{__nm_<symbol>}) that
directly contains the imported name.  Here comes the ``on the edge''
problem (the minor violation of the spec mentioned above): the PE
specification says that the name vector (@code{OriginalFirstThunk})
should run in parallel with the address vector (@code{FirstThunk}),
i.e., that they should have the same number of elements and be
terminated with zero.  We violate this, since @code{FirstThunk} points
directly into machine code.  But in practice, the OS loader is
implemented the sane way: it goes through @code{OriginalFirstThunk}
and puts addresses into @code{FirstThunk}, not something else.  It
should be noted once again that the DLL and symbol name structures are
reused across fixup entries and have to be there anyway to support the
standard import mechanism, so the sustained overhead is 20 bytes per
reference.  Another question is whether having several
@code{IMAGE_IMPORT_DESCRIPTOR}s for the same DLL is possible.  The
answer is yes; it is done even by the native compiler/linker (libth32's
functions are in fact resident in the Windows 9x kernel32.dll, so if
you use it, you have two @code{IMAGE_IMPORT_DESCRIPTOR}s for
kernel32.dll).  Yet another question is whether referencing the same PE
structures several times is valid.  The answer is: why not?
Prohibiting that (detecting the violation) would require more work on
the loader's part than allowing it.

@end table
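The auto-import mechanism described above can be made concrete with a
toy model.  The following Python sketch is an illustration under
stated assumptions, not @code{ld} or OS loader code.  It lays out one
import descriptor (five 32-bit fields, i.e., the 20 bytes per
reference mentioned above), a singleton name thunk, and an
@code{IMAGE_IMPORT_BY_NAME} record (a 2-byte hint followed by the
NUL-terminated name) in a byte buffer.  A hypothetical
@code{toy_load_imports} function then does what the text says the
loader does: it walks @code{OriginalFirstThunk} and stores each
resolved address at @code{FirstThunk}, which here points at the
operand of an instruction.  The buffer offsets, the DLL name
@code{toy.dll}, and the export address are invented; offsets into the
buffer stand in for RVAs, and ordinal imports and image base handling
are ignored.

@verbatim
```python
import struct

# OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
DESCRIPTOR = struct.Struct("<5I")


def read_cstring(image, offset):
    """Read a NUL-terminated ASCII string starting at offset."""
    end = image.index(0, offset)
    return bytes(image[offset:end]).decode("ascii")


def toy_load_imports(image, descriptor_offsets, exports):
    """Hypothetical stand-in for the loader's import resolution."""
    for off in descriptor_offsets:
        names, _stamp, _fwd, dll_name, first_thunk = DESCRIPTOR.unpack_from(image, off)
        dll = read_cstring(image, dll_name)
        index = 0
        while True:
            (entry,) = struct.unpack_from("<I", image, names + 4 * index)
            if entry == 0:  # the name vector is zero-terminated
                break
            symbol = read_cstring(image, entry + 2)  # skip the 2-byte hint
            # Store the resolved address in the parallel FirstThunk slot.
            struct.pack_into("<I", image, first_thunk + 4 * index, exports[dll][symbol])
            index += 1


# Toy image: every offset below is invented for this illustration.
image = bytearray(0x60)
image[0] = 0xA1                                # opcode byte; operand at offset 1
struct.pack_into("<II", image, 0x10, 0x20, 0)  # name thunk: one entry, then zero
struct.pack_into("<H", image, 0x20, 0)         # IMAGE_IMPORT_BY_NAME hint
image[0x22:0x2A] = b"dll_var\0"                # IMAGE_IMPORT_BY_NAME name
image[0x30:0x38] = b"toy.dll\0"                # DLL name
# The fixup entry: FirstThunk (last field) points into the "code" at offset 1.
DESCRIPTOR.pack_into(image, 0x40, 0x10, 0, 0, 0x30, 0x01)

toy_load_imports(image, [0x40], {"toy.dll": {"dll_var": 0x10002000}})
print(hex(struct.unpack_from("<I", image, 1)[0]))
```
@end verbatim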
@node GNU Free Documentation License
@chapter GNU Free Documentation License

@include fdl.texi

@contents
@bye