public inbox for binutils@sourceware.org
* "Memoryless" archive processing of ld
@ 2020-09-04  6:34 Fangrui Song
From: Fangrui Song @ 2020-09-04  6:34 UTC
  To: binutils

Many people are aware that archive member (element) fetching does not allow
backward references, i.e.

   ld def.a ref.o  will fail with "undefined reference to".

However, it is said that VMS (now OpenVMS), Mach-O ld64 and Windows link.exe
chose a different strategy, in which every archive symbol is remembered and thus
such a backward reference is allowed. Do folks know the pros and cons of GNU
ld's strategy? (Is it an emulation of ancient Unix ELF linkers' behavior?)
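
To make the backward-reference rule concrete, here is a minimal Python sketch
of memoryless processing. It is a toy model, not GNU ld's actual code: the
function name link_memoryless and the dict layout with "defs" and "refs" are
assumptions made for illustration. Objects are always loaded; an archive
member is pulled only if it defines a symbol that is undefined at the moment
the archive is visited.

```python
def link_memoryless(inputs):
    """Toy single-pass link; returns the set of still-undefined symbols."""
    defined, undefined = set(), set()

    def add_object(obj):
        defined.update(obj["defs"])
        undefined.update(obj["refs"])
        undefined.difference_update(defined)

    for kind, payload in inputs:
        if kind == "obj":
            add_object(payload)
            continue
        # Archive: rescan its own members until nothing new is pulled,
        # then forget about it entirely.
        pending = list(payload)
        progress = True
        while progress:
            progress = False
            for member in pending[:]:
                if member["defs"] & undefined:
                    add_object(member)
                    pending.remove(member)
                    progress = True
    return undefined


ref_o = {"defs": {"main"}, "refs": {"foo"}}
def_a = [{"defs": {"foo"}, "refs": set()}]

# ld def.a ref.o: the archive is visited before anything references foo.
print(link_memoryless([("archive", def_a), ("obj", ref_o)]))  # {'foo'}
# ld ref.o def.a: foo is already undefined when the archive is visited.
print(link_memoryless([("obj", ref_o), ("archive", def_a)]))  # set()
```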

People can quickly give one advantage: the "memoryless" archive processing saves
memory. This was probably important in the old days, and probably more so before
the archive symbol table was invented, but probably less relevant nowadays.

   https://www.gnu.org/software/coreutils/manual/html_node/tsort-background.html
   briefly describes 'lorder' (which still exists on a modern FreeBSD) and says
   
   > This whole procedure has been obsolete since about 1980, because Unix archives
   > now contain a symbol table (traditionally built by ranlib, now generally built
   > by ar itself), and the Unix linker uses the symbol table to effectively make
   > multiple passes over an archive file.
   
   > Anyhow, that’s where tsort came from. To solve an old problem with the way the
   > linker handled archive files, which has since been solved in different ways.

Some disadvantages:

* --start-group is needed to resolve circular dependencies among archives.
   People are probably used to the ugly -lgcc -lgcc_eh or -lgcc_s on both sides
   of -lc.
* Poor diagnostics: "undefined reference to" tells you the symbol name and the
   source file, but not the destination file. It usually takes some effort
   to figure out the problem (the ordering problem is usually not obvious).
* An external program 'lorder' or a build system's integrated topological
   sorting feature is needed to order archives. The ordering sacrifices
   commutativity. The loss of commutativity can make the build brittle, i.e. a
   minor ordering tweak can cause subtle behavior changes (symbol resolution).
* When providing an interceptor library (a library providing overriding
   definitions), you usually want to make it optional, i.e. the intercepted
   library does not have a dependency on the interceptor. However, due to the
   memoryless archive processing, you have to make sure the interceptor comes
   after the intercepted library, which usually requires some special plumbing in
   the build system. An alternative is --whole-archive, which unfortunately
   drops the nice lazy property.
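
The first point can be sketched with a toy model in Python. The names
scan_archive and link, the member names, and the symbol dependencies between
the two libraries are all made up for illustration; real libc/libgcc contents
differ. The only behavior assumed of --start-group is the documented one: the
archives in the group are searched repeatedly until no new undefined
references are created.

```python
def scan_archive(archive, state):
    """Pull members defining a currently undefined symbol; True if any was pulled."""
    pulled = False
    progress = True
    while progress:
        progress = False
        for name, defs, refs in archive:
            if name not in state["loaded"] and defs & state["undefined"]:
                state["loaded"].add(name)
                state["defined"] |= defs
                state["undefined"] |= refs
                state["undefined"] -= state["defined"]
                pulled = progress = True
    return pulled


def link(roots, archives, group=False):
    state = {"loaded": set(), "defined": set(), "undefined": set(roots)}
    if group:
        # --start-group ... --end-group: rescan until a full round pulls nothing.
        while any([scan_archive(a, state) for a in archives]):
            pass
    else:
        for a in archives:
            scan_archive(a, state)
    return state["undefined"]


# Hypothetical circular dependency: libc needs libgcc and vice versa.
libc = [("printf.o", {"printf"}, {"__udivdi3"}), ("abort.o", {"abort"}, set())]
libgcc = [("udivdi3.o", {"__udivdi3"}, set()),
          ("unwind.o", {"_Unwind_Resume"}, {"abort"})]
roots = {"printf", "_Unwind_Resume"}

print(link(roots, [libc, libgcc]))              # {'abort'}
print(link(roots, [libgcc, libc]))              # {'__udivdi3'}
print(link(roots, [libgcc, libc, libgcc]))      # set()
print(link(roots, [libc, libgcc], group=True))  # set()
```

Neither single-pass order works on its own; repeating -lgcc on both sides of
-lc or wrapping both archives in a group does.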

I have mentioned enough disadvantages :) As one additional advantage, the
memoryless nature enforces a (weak) layering of archives. The layering is one
particular topological sort of the dependency graph. It is weak because some
dependency edges can still be missing. For example, suppose the declared
dependencies are a->b, a->c, b->d, c->d. If there is also an unspecified
dependency b->c, ld .. -la -lb -lc -ld will succeed but ld .. -la -lc -lb -ld
may fail.
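
A small Python model of the a/b/c/d example, under stated assumptions: the
function unresolved, the member names, and the symbol names are hypothetical,
and the hidden b->c edge is modeled as b needing a member of c (c2.o) that
nothing else pulls. That last assumption is what turns "may fail" into an
actual failure here; if b only needed a member that a had already pulled,
both orders would link.

```python
def unresolved(order, libs, roots):
    """Single-pass, memoryless scan of archives in the given order."""
    defined, undefined = set(), set(roots)
    for lib in order:
        members = dict(libs[lib])
        progress = True
        while progress:
            progress = False
            for name, (defs, refs) in list(members.items()):
                if defs & undefined:
                    defined |= defs
                    undefined |= refs
                    undefined -= defined
                    del members[name]
                    progress = True
    return undefined


libs = {
    "a": {"a.o": ({"a"}, {"b", "c1"})},
    "b": {"b.o": ({"b"}, {"d", "c2"})},  # c2 is the unspecified b->c edge
    "c": {"c1.o": ({"c1"}, {"d"}), "c2.o": ({"c2"}, {"d"})},
    "d": {"d.o": ({"d"}, set())},
}

# Both orders are valid topological sorts of the declared edges.
print(unresolved(["a", "b", "c", "d"], libs, {"a"}))  # set()
print(unresolved(["a", "c", "b", "d"], libs, {"a"}))  # {'c2'}
```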


I hope the few points above prompt more thoughts.


* Re: "Memoryless" archive processing of ld
From: Ian Lance Taylor @ 2020-09-04 23:58 UTC
  To: Fangrui Song; +Cc: Binutils

The way that the linker handles archives is much older than ELF, of
course.  It dates back to the a.out format used on the original Unix
systems.

The lorder and tsort programs were used to permit the linker to read
and choose elements from a single .a file in a single pass.  As you
note these days they are not necessary, as archives have a symbol
table by default, and current linkers only work if the archive has a
symbol table.

As you say, the main advantages of this approach to archives are
simplicity and low memory use.  It does also permit interpolation by
adding an archive at the right point in the link, but of course that
can also be handled by using the archive ordering to determine which
archive to use to satisfy an undefined reference.  There is more
discussion at https://www.airs.com/blog/archives/48, including the
comments, but I don't think it really adds anything.
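
A rough Python sketch of that alternative, for illustration only: it is not
the algorithm of ld64, link.exe or any other real linker, and the function
name link_remembering, the dict layout and the library names are assumptions.
Every archive symbol is remembered up front, and when several archives define
the same symbol, the one that appears earlier on the command line wins.

```python
def link_remembering(inputs):
    """Toy link that indexes all archive symbols before resolving anything."""
    index, defined, undefined, loaded = {}, set(), set(), []

    def load(obj):
        loaded.append(obj["name"])
        defined.update(obj["defs"])
        undefined.update(obj["refs"])
        undefined.difference_update(defined)

    for kind, payload in inputs:
        if kind == "obj":
            load(payload)
        else:
            for member in payload:
                for sym in member["defs"]:
                    index.setdefault(sym, member)  # earlier archive wins

    while True:
        sym = next((s for s in sorted(undefined) if s in index), None)
        if sym is None:
            return loaded, undefined
        load(index[sym])


main_o = {"name": "main.o", "defs": {"main"}, "refs": {"malloc"}}
libc = [{"name": "libc:malloc.o", "defs": {"malloc"}, "refs": set()}]
fast = [{"name": "fast:malloc.o", "defs": {"malloc"}, "refs": set()}]

# Backward references work, and archive order picks the definition.
print(link_remembering([("archive", fast), ("obj", main_o), ("archive", libc)]))
# (['main.o', 'fast:malloc.o'], set())
print(link_remembering([("archive", libc), ("obj", main_o), ("archive", fast)]))
# (['main.o', 'libc:malloc.o'], set())
```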

Ian
