public inbox for libabigail@sourceware.org
From: Giuliano Procida <gprocida@google.com>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: libabigail@sourceware.org, "Matthias Männich" <maennich@google.com>
Subject: Re: idea: abigail abixml archive
Date: Tue, 21 Nov 2023 14:54:05 +0000
Message-ID: <CAGvU0H=fRZ7DicyONn2TRwwpf_Sp6d5t=zBKBFmsY-=1N1G7fQ@mail.gmail.com>
In-Reply-To: <20231115155306.GC15862@redhat.com>

Hi.

On Wed, 15 Nov 2023 at 15:53, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> Hi -
>
> I'd love some feedback about the following idea, related to using
> libabigail to assemble a crowdsourced database of abixml files for
> linux distros.
>
> The germ of the idea is that developers may need to know whether a
> binary they built or found is likely to be abi-compatible with a given
> distro / version.  This is possible today by downloading the target
> distro binaries and running libabigail locally against them, or using
> front-end scripts like fedabipkgdiff that do the downloading first.
> But this is a pain if one wants to compare against a range of versions
> or foreign distros.
>
> So the idea is to let people use a public archive of abixml
> artifacts instead of the binaries.  The abixml files are relatively
> tiny, barely-ever changing, and should be an effective proxy for the
> real binaries.  It's just a small matter of (a) storing, (b)
> using, and (c) collecting it.
>
> --------------------
>
> For storing this data, I envision overloading the libabigail git repo
> (or a new one) with storage of the abixml documents.  To keep it dead
> simple, there could be one branch per /etc/os-release $ID/$VERSION_ID,
> one file per shared library in the distribution.  For example, a
> fedora-39-x86-64 copy of /usr/lib64/libc.so.6, the file abidw produces
> could sit at
>
>    repo  git://sourceware.org/git/libabigail.git
>  branch  gitabixml/fedora/39/x86_64
>    file  /usr/lib64/libc.so.6.xml
>
> (Symlinks in the distro fs could be represented as symlinks in git.)
>
> Updates to the distro package of course happen.  It seems natural to
> update the abixml file for the affected file(s) right there in place.
>
> Since it may sometimes be desirable to track what package version
> (e.g. rpm n-v-r) is associated with the abixml data of a given
> version, we could use stylized records in the git commit text (or a
> git note, or maybe a tag).  That would mean one git commit per updated
> package, with metadata message like:
>
>   Package: glibc-2.38-7.fc39.x86_64
>
> Maybe abidw version tags would be useful to add.
>
> --------------------
>
> For using this data, I envision abidiff / abicompat taking a new form
> for its right operand.  It could be a git url identifying the distro
> branch or tag.  libabigail would fetch the corresponding file.xml
> within that.  Simplify/default the heck out of it for ease of use:
>
>   export BRANCH=fedora/39/x86_64
>   abicompat /bin/myprogram gitabixml:$BRANCH
>
> (Where "gitabixml:" could instruct the tool to look at the sourceware
> libabigail git / gitweb / cgit server.  Let users specify different or
> private git servers via environment variables or something.)
>
> --------------------
>
> For collecting this data, I envision writing some distro-specific
> scripts, kind of like fedabipkgdiff, being run by contributors or
> ourselves.  One flavour could run in operational installed distros,
> doing the equivalent of
>
>     find $PATHS -name '*.so.*' | while read -r lib; do
>        # or filter with elfclassify
>        package=$(rpm -qf "$lib")
>        # $lib is absolute, so mirror its path underneath $gitrepo
>        mkdir -p "$gitrepo$(dirname "$lib")"
>        abidw "$lib" > "$gitrepo$lib.xml"
>        (cd "$gitrepo" && git add ".$lib.xml" &&
>           git commit -m "Package: $package" ".$lib.xml")
>     done
>
> and rerun that occasionally as updates flow down from the distro.
> This could be done on a single beefy box running containers with
> different distros.
>
> Another flavour could be to take a set of RPM/etc. archives on a
> filesystem (or an ISO image), incrementally decompress them, run abidw
> on the individual files, and similarly construct the git repo of
> abixml files.  (This is kind of like how debuginfod produces indexes
> from a bunch of RPMs.)
>
> No matter how the local git repo is populated, each branch describing
> a data contributor's distro could be pushed to the central one,
> bringing that one up to date.  Patches representing updates could be
> emailed too, but no one will want to read/review that stuff.  We'd
> probably need a trusted pool of contributors who can just commit to
> areas of the central git repo.  Secured with gitsigur of course. :-)
>
> The central repo could be built up entirely gradually.  If some
> libraries were omitted from initial commits for a distro, a later
> contribution could fill in the gaps.
>
> --------------------
>
> OK, how reasonable does all this sound?

This sounds like an interesting project, but you can go further.

Start at the level of a single binary with a single .so. The
questions we want answered are:

1. Will all undefined symbols resolve successfully (otherwise
reporting missing symbols)?
2. Are the types of the resolved symbols compatible (otherwise
reporting differences)?
3. Do we have ABI representations that let us answer 1. and 2. without
having the binaries to hand? (Not yet.)

Neither libabigail nor STG emit undefined symbols in their ABI
representations (yet), so answering 1. currently requires having
binaries and debug information to hand.
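
With the binaries to hand, question 1 can be roughed out with GNU nm
alone.  This is only a sketch, not something libabigail or STG
provides: the function names are made up for illustration, symbol
versions are thrown away, and it says nothing about types (question 2).

```shell
# Hypothetical helper: list the dynamic symbols of an ELF file, one per line.
# $1 is --undefined-only or --defined-only (GNU nm options); $2 is the file.
# Assumption: symbol versions (name@VERSION) are stripped, a simplification.
dynamic_symbols() {
    nm -D --format=posix "$1" "$2" |
        awk '{ sub(/@.*/, "", $1); print $1 }' | sort -u
}

# Hypothetical helper: print every line of file $1 (undefined symbols)
# that does not appear as a whole line in file $2 (defined symbols).
# grep exits with 1 when nothing is missing; treat that as success.
missing_symbols() {
    grep -F -x -v -f "$2" "$1" || [ $? -eq 1 ]
}
```

Usage would be along the lines of writing
`dynamic_symbols --undefined-only /bin/myprogram` and
`dynamic_symbols --defined-only /usr/lib64/libc.so.6` to two files and
passing them to `missing_symbols`; any output is an unresolved symbol.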

Now build this up to multiple binaries, SONAME, bundled and unbundled
shared objects, library dependencies (ELF needs), multiple
distributions, packages, versions and architectures, supporting
link-loaded plugins with dlopen etc.

Having a full database of existing libraries would allow compatibility
for a freshly-compiled binary (or package) to be checked (idea due to
Matthias Männich) or dependency hell to be explored without actually
installing any packages.
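
To make that concrete, here is a toy lookup against such a database.
Everything about the layout is an assumption for illustration: a
directory holding one hypothetical `<library>.syms` file per shared
object, each listing its defined symbols one per line.  The real
archive would hold abixml, and the list of undefined symbols would
still have to come from the binary itself.

```shell
# Hypothetical helper: for each undefined symbol listed in file $1,
# print "symbol<TAB>library" for the first library in directory $2
# whose .syms file defines it, or "symbol<TAB>MISSING" if none does.
resolve_against_db() {
    while IFS= read -r sym; do
        # grep -l prints the names of files containing the exact line;
        # finding no provider is not an error, hence the "|| true".
        provider=$(grep -l -F -x -e "$sym" "$2"/*.syms 2>/dev/null |
                       head -n 1) || true
        printf '%s\t%s\n' "$sym" "$(basename "${provider:-MISSING}" .syms)"
    done < "$1"
}
```

A symbol defined by several libraries is attributed to whichever file
sorts first, which ignores SONAME and ELF needs entirely; a real
implementation would have to follow those.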

Similarly, if binaries (and their packages) are in the database,
questions about library upgrades could be answered.

Giuliano.

>
> - FChE
>

