From: Fangrui Song <i@maskray.me>
To: binutils@sourceware.org
Subject: "Memoryless" archive processing of ld
Date: Thu, 3 Sep 2020 23:34:24 -0700
Message-ID: <20200904063424.umcmfcwhgdgkvdxz@gmail.com>

Many people are aware that archive member (element) fetching does not allow backward references, i.e.

   ld def.a ref.o   fails with "undefined reference to" when ref.o needs a symbol that only def.a defines.
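
To make the failure concrete, here is a toy Python model of left-to-right, "memoryless" archive scanning. It is a sketch under simplifying assumptions, not GNU ld's code: symbols are plain strings, an object or archive member is reduced to the sets of symbols it defines and references, and the names `Obj`, `Archive` and `link_memoryless` are hypothetical. An archive is rescanned until it stops supplying members, but once the linker moves past it, its unfetched members are forgotten.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy model (assumption): members are just sets of defined/referenced symbols.


@dataclass(eq=False)  # eq=False: hash by identity so members can go in a set
class Obj:
    name: str
    defs: set[str]
    refs: set[str] = field(default_factory=set)


@dataclass(eq=False)
class Archive:
    name: str
    members: list[Obj]


def link_memoryless(inputs: list[Obj | Archive]) -> set[str]:
    """Process inputs left to right and return the still-undefined symbols."""
    defined: set[str] = set()
    undefined: set[str] = set()
    fetched: set[Obj] = set()

    def load(obj: Obj) -> None:
        defined.update(obj.defs)
        undefined.difference_update(obj.defs)
        undefined.update(obj.refs - defined)

    for item in inputs:
        if isinstance(item, Obj):
            load(item)  # object files are always loaded
            continue
        # Rescan this archive until it stops supplying members. Only symbols
        # that are undefined *now* can fetch a member; references that appear
        # later on the command line never come back to this archive.
        progress = True
        while progress:
            progress = False
            for m in item.members:
                if m not in fetched and m.defs & undefined:
                    fetched.add(m)
                    load(m)
                    progress = True
    return undefined


ref_o = Obj("ref.o", defs={"main"}, refs={"foo"})
def_a = Archive("def.a", [Obj("def.o", defs={"foo"})])

print(link_memoryless([def_a, ref_o]))  # {'foo'}: "undefined reference to foo"
print(link_memoryless([ref_o, def_a]))  # set()
```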

However, VMS (now OpenVMS), Mach-O ld64 and Windows link.exe reportedly chose a
different strategy in which every archive symbol is remembered, so such a
backward reference is allowed. Do folks know the pros and cons of GNU ld's
strategy? (Is it an emulation of ancient Unix ELF linkers' behavior?)
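
For contrast, the following sketch models the "remember every archive symbol" strategy in the same toy terms. It only approximates the behavior described above and does not claim to reproduce ld64's or link.exe's actual algorithms or precedence rules; `Obj`, `Archive` and `link_remembering` are hypothetical names. Object files are loaded in order, every archive member is recorded in a lazy symbol map, and members are fetched at the end until no remembered symbol can satisfy a reference.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy model (assumption): an approximation, not ld64's or link.exe's code.


@dataclass(eq=False)  # hash by identity so members can go in a set
class Obj:
    name: str
    defs: set[str]
    refs: set[str] = field(default_factory=set)


@dataclass(eq=False)
class Archive:
    name: str
    members: list[Obj]


def link_remembering(inputs: list[Obj | Archive]) -> set[str]:
    """Return the undefined symbols when all archive symbols are remembered."""
    defined: set[str] = set()
    undefined: set[str] = set()
    lazy: dict[str, Obj] = {}  # symbol -> first archive member defining it
    fetched: set[Obj] = set()

    def load(obj: Obj) -> None:
        defined.update(obj.defs)
        undefined.difference_update(obj.defs)
        undefined.update(obj.refs - defined)

    for item in inputs:
        if isinstance(item, Obj):
            load(item)
        else:
            # Remember the archive's symbols instead of forgetting them.
            for m in item.members:
                for sym in m.defs:
                    lazy.setdefault(sym, m)

    # Fetch remembered members until none can satisfy an undefined symbol.
    while True:
        pending = {lazy[s] for s in undefined if s in lazy} - fetched
        if not pending:
            return undefined
        for m in pending:
            fetched.add(m)
            load(m)


ref_o = Obj("ref.o", defs={"main"}, refs={"foo"})
def_a = Archive("def.a", [Obj("def.o", defs={"foo"})])

print(link_remembering([def_a, ref_o]))  # set(): the backward reference resolves
```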

One advantage comes to mind quickly: "memoryless" archive processing saves
memory. This was probably important in the old days, especially before the
archive symbol table was invented, but it is less relevant nowadays.

   https://www.gnu.org/software/coreutils/manual/html_node/tsort-background.html
   briefly describes 'lorder' (which still exists on modern FreeBSD) and says:
   
   > This whole procedure has been obsolete since about 1980, because Unix archives
   > now contain a symbol table (traditionally built by ranlib, now generally built
   > by ar itself), and the Unix linker uses the symbol table to effectively make
   > multiple passes over an archive file.
   
   > Anyhow, that’s where tsort came from. To solve an old problem with the way the
   > linker handled archive files, which has since been solved in different ways.
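
The lorder | tsort pipeline the quote refers to can be approximated as follows. This is a Python sketch, not the real scripts: `lorder_tsort` is a hypothetical name, members are given as symbol sets rather than being read with nm, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) stands in for tsort. An edge A -> B means A references a symbol that B defines, so a linker making a single pass over the archive needs A to come before B.

```python
from graphlib import TopologicalSorter

# Sketch (assumption): members are described by symbol sets instead of nm output.


def lorder_tsort(members: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """Order archive members so that each one precedes the members it uses.

    members maps a member name to (defined symbols, referenced symbols).
    """
    ts = TopologicalSorter()
    for name in members:
        ts.add(name)  # keep members that have no edges at all
    for a, (_, refs) in members.items():
        for b, (defs, _) in members.items():
            if a != b and refs & defs:
                ts.add(b, a)  # a references b, so a must come before b
    # Circular references raise graphlib.CycleError here, whereas tsort
    # reports the loop instead.
    return list(ts.static_order())


order = lorder_tsort({
    "z.o": ({"z"}, set()),
    "y.o": ({"y"}, {"z"}),
    "x.o": ({"x"}, {"y"}),
})
print(order)  # ['x.o', 'y.o', 'z.o']
```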

Some disadvantages:

* --start-group is needed to resolve circular dependencies among archives.
   People are probably used to the ugly -lgcc -lgcc_eh or -lgcc_s on both sides of -lc.
* Poor diagnostics: "undefined reference to" tells you the symbol name and the
   referencing source file, but not the file that defines the symbol. It usually
   takes some effort to figure out the problem (the ordering problem is usually
   not obvious).
* An external program 'lorder' or a build system's integrated topological
   sorting feature is needed to order archives. The ordering sacrifices
   commutativity, and the loss of commutativity can make the build brittle: a
   minor ordering tweak can cause subtle behavior changes (symbol resolution).
* When providing an interceptor library (a library providing overriding
   definitions), you usually want to make it optional, i.e. the intercepted
   library does not have a dependency on the interceptor. However, due to the
   memoryless archive processing, you have to make sure the interceptor comes
   after the intercepted library, which usually requires some special plumbing in
   the build system. An alternative is --whole-archive, which unfortunately
   drops the nice lazy property.
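
The --start-group workaround from the first item can be modeled in the same toy style: the archives between --start-group and --end-group are rescanned as a unit until a full pass fetches nothing new. This is a sketch under simplifying assumptions (symbols are strings, members are reduced to definition and reference sets), and `Obj`, `Archive`, `Group` and `link` are hypothetical names, not ld internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy model (assumption): not GNU ld's implementation of --start-group.


@dataclass(eq=False)  # hash by identity so members can go in a set
class Obj:
    name: str
    defs: set[str]
    refs: set[str] = field(default_factory=set)


@dataclass(eq=False)
class Archive:
    name: str
    members: list[Obj]


@dataclass(eq=False)
class Group:  # stands for --start-group ... --end-group
    archives: list[Archive]


def link(inputs: list[Obj | Archive | Group]) -> set[str]:
    """Memoryless left-to-right link; return the still-undefined symbols."""
    defined: set[str] = set()
    undefined: set[str] = set()
    fetched: set[Obj] = set()

    def load(obj: Obj) -> None:
        defined.update(obj.defs)
        undefined.difference_update(obj.defs)
        undefined.update(obj.refs - defined)

    def scan(archive: Archive) -> bool:
        """Fetch members until this archive stops helping; report progress."""
        any_fetched = False
        progress = True
        while progress:
            progress = False
            for m in archive.members:
                if m not in fetched and m.defs & undefined:
                    fetched.add(m)
                    load(m)
                    progress = any_fetched = True
        return any_fetched

    for item in inputs:
        if isinstance(item, Obj):
            load(item)
        elif isinstance(item, Archive):
            scan(item)
        else:
            # A list (not a generator) so every archive is scanned each round;
            # stop once a whole pass over the group fetches nothing.
            while any([scan(a) for a in item.archives]):
                pass
    return undefined


# liba.a and libb.a depend on each other: a -> b -> c, with c back in liba.a.
main_o = Obj("main.o", defs={"main"}, refs={"a"})
liba = Archive("liba.a", [Obj("a1.o", {"a"}, {"b"}), Obj("a2.o", {"c"})])
libb = Archive("libb.a", [Obj("b1.o", {"b"}, {"c"})])

print(link([main_o, liba, libb]))           # {'c'}
print(link([main_o, Group([liba, libb])]))  # set()
```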

I have mentioned enough disadvantages :) One additional advantage: the
memoryless nature enforces a (weak) layering of archives. The layering is one
particular topological sort of the dependency graph. It is weak because some
dependency edges can still be missing. E.g. given a->b, a->c, b->d, c->d, if
there is an unspecified dependency b->c, then
ld .. -la -lb -lc -ld will succeed but ld .. -la -lc -lb -ld may fail.
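
The a/b/c/d example can be checked with a toy memoryless linker. In this sketch (hypothetical names, strings for symbols, and not GNU ld itself), libc.a here is just the author's placeholder archive for -lc and has two members, so that a and b need different parts of it: a.o uses c1, while the undeclared b->c edge is b.o's use of c2. Placing -lc before -lb then leaves c2 undefined.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy model (assumption): members are just sets of defined/referenced symbols.


@dataclass(eq=False)  # hash by identity so members can go in a set
class Obj:
    name: str
    defs: set[str]
    refs: set[str] = field(default_factory=set)


@dataclass(eq=False)
class Archive:
    name: str
    members: list[Obj]


def link_memoryless(inputs: list[Obj | Archive]) -> set[str]:
    """Left-to-right link that forgets each archive once it is passed."""
    defined: set[str] = set()
    undefined: set[str] = set()
    fetched: set[Obj] = set()

    def load(obj: Obj) -> None:
        defined.update(obj.defs)
        undefined.difference_update(obj.defs)
        undefined.update(obj.refs - defined)

    for item in inputs:
        if isinstance(item, Obj):
            load(item)
            continue
        progress = True
        while progress:  # rescan this archive until it stops supplying members
            progress = False
            for m in item.members:
                if m not in fetched and m.defs & undefined:
                    fetched.add(m)
                    load(m)
                    progress = True
    return undefined


main_o = Obj("main.o", {"main"}, {"a"})
libs = {
    "a": Archive("liba.a", [Obj("a.o", {"a"}, {"b", "c1"})]),  # a->b, a->c
    "b": Archive("libb.a", [Obj("b.o", {"b"}, {"d", "c2"})]),  # b->d, b->c (unspecified)
    "c": Archive("libc.a", [Obj("c1.o", {"c1"}, {"d"}), Obj("c2.o", {"c2"})]),  # c->d
    "d": Archive("libd.a", [Obj("d.o", {"d"})]),
}


def link_in_order(order: str) -> set[str]:
    """Link main.o followed by the archives named in order, e.g. "abcd"."""
    return link_memoryless([main_o] + [libs[x] for x in order])


print(link_in_order("abcd"))  # set():  ld .. -la -lb -lc -ld
print(link_in_order("acbd"))  # {'c2'}: ld .. -la -lc -lb -ld
```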


I hope the points above provoke more thoughts.
