On 11/15/23 07:53, Frank Ch. Eigler wrote:
> Hi -
>
> I'd love some feedback about the following idea, related to using
> libabigail to assemble a crowdsourced database of abixml files for
> linux distros.
>
> The germ of the idea is that developers may need to know whether a
> binary they built or found is likely to be abi-compatible with a given
> distro / version.  This is possible today by downloading the target
> distro binaries and running libabigail locally against them, or using
> front-end scripts like fedabipkgdiff that do the downloading first.
> But this is a pain if one wants to compare against a range of versions
> or foreign distros.

The other big use is "do I need to rebuild my binary for this new distro?"

This can be a new Y release in RHEL X.Y: you have built for RHEL 9.2
and you want to make damn well sure that your program is compatible
with RHEL 9.3. Yes, we make ABI stability guarantees, but they only
cover a portion of the distro. Or you could have your program built for
Fedora 38 and want to know whether you need to rebuild it for Fedora 39
or whether it happens to be ABI compatible.

Another use case is within the distro. You are a packager and you just
applied a patch. You want to make sure that the patch didn't break ABI
before you release it onto the world.

>
> So the idea is to let people use a public archive of abixml
> artifacts instead of the binaries.  The abixml files are relatively
> tiny, barely-ever changing, and should be an effective proxy for the
> real binaries.  It's just a small matter of (a) storing, (b)
> using, and (c) collecting it.

a) generating
b) storing
c) using
d) collecting

I added a) generating because, as I have shown in the past, there are
many cases where libabigail needs some bugs fixed before it can
generate abixml files for particular packages.

> --------------------
>
> For storing this data, I envision overloading the libabigail git repo
> (or a new one) with storage of the abixml documents.  To keep it dead
> simple, there could be one branch per /etc/os-release $ID/$VERSION_ID,
> one file per shared library in the distribution.  For example, a
> fedora-39-x86-64 copy of /usr/lib64/libc.so.6, the file abidw produces
> could sit at
>
>     repo  git://sourceware.org/git/libabigail.git
>   branch  gitabixml/fedora/39/x86_64
>     file  /usr/lib64/libc.so.6.xml
>
> (Symlinks in the distro fs could be represented as symlinks in git.)
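
A minimal sketch of the proposed naming, assuming the branch is derived
from `ID` and `VERSION_ID` in os-release(5) plus the machine
architecture as reported by `uname -m`. The function names are
hypothetical, not part of libabigail.

```shell
# Hypothetical helpers for the proposed layout:
#   branch: gitabixml/$ID/$VERSION_ID/$ARCH, taken from /etc/os-release
#   file:   <absolute path of the library>.xml

# Print the branch for the distro described by an os-release file.
# $1: os-release file (default /etc/os-release)
# $2: architecture    (default: output of `uname -m`)
abixml_branch() {
    osrel=${1:-/etc/os-release}
    arch=${2:-$(uname -m)}
    # Source in a subshell so ID and VERSION_ID don't leak to the caller.
    ( . "$osrel" && printf 'gitabixml/%s/%s/%s\n' "$ID" "$VERSION_ID" "$arch" )
}

# Print the path, inside the branch, of the abixml file for library $1.
abixml_file() {
    printf '%s.xml\n' "$1"
}
```

On a Fedora 39 x86_64 system, `abixml_branch` with no arguments would
print `gitabixml/fedora/39/x86_64`, matching the example above.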
>
> Updates to the distro package of course happen.  It seems natural to
> update the abixml file for the affected file(s) right there in place.
>
> Since it may sometimes be desirable to track what package version
> (e.g. rpm n-v-r) is associated with the abixml data of a given
> version, we could use stylized records in the git commit text (or a
> git note, or maybe a tag).  That would mean one git commit per updated
> package, with metadata message like:
>
>    Package: glibc-2.38-7.fc39.x86_64
>
> Maybe abidw version tags would be useful to add.
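
If the package record lives in the commit message as a single
`Package:` line, recovering it for a given file only needs `git log`
restricted to that path. A sketch, with hypothetical function names,
assuming one such line per commit:

```shell
# Print the package recorded by a "Package: " line in a commit message
# read on stdin; prints nothing if there is no such line.
package_from_message() {
    sed -n 's/^Package:[[:space:]]*//p' | head -n 1
}

# Hypothetical lookup: which package last touched an abixml file.
# $1: local clone, $2: branch,
# $3: path in the branch without the leading "/" (e.g. usr/lib64/libc.so.6.xml)
abixml_package() {
    git -C "$1" log -1 --format=%B "$2" -- "$3" | package_from_message
}
```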

Yeah, one challenge is going to be when, for whatever reason, Dodji
needs to update the abixml file format. In theory it shouldn't happen
that often, but when it does, it will cause HUGE churn in the git repo.
To mitigate that problem, I would suggest putting an abixml version in
the pathname.

I know that one of the things we need in the not too distant future is
to track ABI on inline functions in C++, and I expect that will need an
abixml format change when it happens.
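
One way to put the format version in the pathname is to read it from
the abixml itself. This sketch assumes abidw records the format as a
single-quoted `version` attribute on the `abi-corpus` root element
(worth checking against the abidw release in use); the
`abixml-<format>` branch component and the function names are
hypothetical.

```shell
# Print the format version from the abi-corpus root element of an abixml
# document read on stdin. ASSUMPTION: abidw writes it as version='X.Y'.
abixml_format_version() {
    sed -n "s/.*<abi-corpus[^>]* version='\([^']*\)'.*/\1/p" | head -n 1
}

# Hypothetical versioned layout: gitabixml/abixml-<format>/<distro part>.
# $1: format version, $2: distro part such as fedora/39/x86_64
abixml_versioned_branch() {
    printf 'gitabixml/abixml-%s/%s\n' "$1" "$2"
}
```

With this layout a format bump starts new branches that can be filled in
gradually, instead of rewriting every file on the existing ones.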

>
> --------------------
>
> For using this data, I envision abidiff / abicompat taking a new form
> for their right operand.  It could be a git url identifying the distro
> branch or tag.  libabigail would fetch the corresponding file.xml
> within that.  Simplify/default the heck out of it for ease of use:
>
>    export BRANCH=fedora/39/x86_64
>    abicompat /bin/myprogram gitabixml:$BRANCH
>
> (Where "gitabixml:" could instruct the tool to look at the sourceware
> libabigail git / gitweb / cgit server.  Let users specify different or
> private git servers via environment variables or something.)
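
A sketch of how a client might resolve the `gitabixml:` operand against
a plain git server. `ABIXML_GIT_URL`, `gitabixml_branch` and
`gitabixml_fetch` are hypothetical names, not existing libabigail
options; the default URL is the repo named earlier in the thread.

```shell
# Hypothetical override for private servers; defaults to sourceware.
: "${ABIXML_GIT_URL:=git://sourceware.org/git/libabigail.git}"

# Map "gitabixml:fedora/39/x86_64" to the branch "gitabixml/fedora/39/x86_64";
# fail for anything that is not a gitabixml: spec.
gitabixml_branch() {
    case $1 in
        gitabixml:?*) printf 'gitabixml/%s\n' "${1#gitabixml:}" ;;
        *) return 1 ;;
    esac
}

# Print one abixml file to stdout.
# $1: gitabixml spec, $2: library path, $3: local cache directory
gitabixml_fetch() {
    branch=$(gitabixml_branch "$1") || return 1
    cache=$3
    [ -d "$cache/.git" ] || git init -q "$cache" || return 1
    # Shallow-fetch only the tip of the requested branch.
    git -C "$cache" fetch -q --depth=1 "$ABIXML_GIT_URL" "$branch" || return 1
    # Paths in the branch mirror the distro filesystem, minus the leading "/".
    git -C "$cache" show "FETCH_HEAD:${2#/}.xml"
}
```

For example, `gitabixml_fetch gitabixml:fedora/39/x86_64
/usr/lib64/libc.so.6 "$HOME/.cache/gitabixml" > libc.so.6.xml` would
fetch a single snapshot of one branch, which suits small, rarely
changing files.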

As Dodji pointed out below, abicompat doesn't currently handle
dependencies very well. This has been a long-term disagreement between
us. To simplify (possibly oversimplify) the disagreement: Dodji
believes that abicompat does one thing, and that is to test the forward
ABI dependencies from the program to a particular library. Very Unixy:
do one thing. I disagree with this philosophy, and all of my
discussions with users indicate that they misinterpret what abicompat
actually does, thinking that it does more than it does. I'm a strong
proponent of making abicompat do what the name implies and what users
believe it does, and that is to actually check that the library is
compatible with the program specified on the command line. This
appeals to another design principle: the principle of least surprise.

Thus IMHO abicompat should:

1) Examine both forward and backward dependencies, i.e. if a library
makes use of a function or variable in the program itself, then
abicompat should check that reference too and report any
incompatibility there as an ABI mismatch.
2) Include all the dependencies of the program when considering ABI
compatibility. Yes, this could be made into a script, but it ends up
being a very inefficient and complicated script. I've written it two
different times and people complain about the performance. The problem
is that it has to load the abixml over and over rather than keeping it
in memory during the comparison. On complicated programs with lots of
dependencies, this takes considerable time.
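
The script approach in 2) looks roughly like the following. It is a
hypothetical sketch that covers only direct DT_NEEDED dependencies in
the forward direction, and it assumes abicompat accepts an abixml file
as the library operand in its two-argument (weak mode) form. Every
iteration starts a new abicompat process that re-reads the program and
the library's abixml.

```shell
# Print the DT_NEEDED sonames from `readelf -d` output read on stdin.
needed_sonames() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical wrapper: check program $1 against the abixml of each direct
# dependency found in local checkout $2 of a distro branch.
abicompat_all() {
    prog=$1 repo=$2 status=0
    for soname in $(readelf -d "$prog" | needed_sonames); do
        xml=$(find "$repo" -name "$soname.xml" | head -n 1)
        if [ -z "$xml" ]; then
            echo "no abixml for $soname" >&2
            status=1
            continue
        fi
        # One process per library: the program is re-analyzed every time.
        abicompat "$prog" "$xml" || status=1
    done
    return $status
}
```

Transitive dependencies would need recursion over each library's own
NEEDED entries, multiplying the repeated loading.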


>
> --------------------
>
> For collecting this data, I envision writing some distro-specific
> scripts, kind of like fedabipkgdiff, being run by contributors or
> ourselves.  One flavour could run in operational installed distros,
> doing the equivalent of
>
>      find $PATHS -name '*.so.*' | while read -r lib; do
>         # or filter with elfclassify
>         package=$(rpm -qf "$lib")
>         # mirror the distro path inside the repo, e.g. $gitrepo/usr/lib64/
>         mkdir -p "$gitrepo$(dirname "$lib")"
>         abidw "$lib" > "$gitrepo$lib.xml"
>         (cd "$gitrepo" && git add ".$lib.xml" &&
>          git commit -m "Package: $package" ".$lib.xml")
>      done

A way to gather this easily with the current framework would be to
allow BRANCH to refer to a local clone of the abixml repo and then let
fedabipkgdiff put its temporary files in there so that they could
easily be committed. That way, when I run one of my tests that does a
self-comparison of every RPM, it generates the full data set.

> and rerun that occasionally as updates flow down from the distro.
> This could be done on a single beefy box running containers with
> different distros.
>
> Another flavour could be to take a set of RPM/etc. archives on a
> filesystem (or an ISO image), incrementally decompress them, run abidw
> on the individual files, and similarly construct the git repo of
> abixml files.  (This is kind of like how debuginfod produces indexes
> from a bunch of RPMs.)
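
For the offline flavour, a per-RPM collector could look like this. It
is a hypothetical sketch: it assumes rpm, rpm2cpio, GNU cpio, abidw and
git are installed, it skips symlinks, and it does not wire up separate
debuginfo packages, which abidw generally needs to emit type
information.

```shell
# Map file $1, extracted under scratch dir $2, to its abixml path in repo $3.
abixml_target() {
    printf '%s%s.xml\n' "$3" "${1#"$2"}"
}

# Hypothetical collector: $1 is an RPM file, $2 a checkout of the distro branch.
collect_rpm() {
    rpmfile=$(realpath "$1") repo=$(realpath "$2")
    pkg=$(rpm -qp --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}' "$rpmfile") || return 1
    tmp=$(mktemp -d) || return 1
    # Unpack the payload into a scratch directory without installing it.
    if (cd "$tmp" && rpm2cpio "$rpmfile" | cpio -idm --quiet); then
        find "$tmp" -type f -name '*.so.*' | while read -r f; do
            out=$(abixml_target "$f" "$tmp" "$repo")
            mkdir -p "$(dirname "$out")"
            # Keep the output only if abidw succeeded.
            if abidw "$f" > "$out"; then
                git -C "$repo" add "$out"
            else
                rm -f "$out"
            fi
        done
        git -C "$repo" commit -q -m "Package: $pkg"
    fi
    rm -rf "$tmp"
}
```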
>
> No matter how the local git repo is populated, each branch describing
> a data contributor's distro could be pushed to the central one,
> bringing that one up to date.  Patches representing updates could be
> emailed too, but no one will want to read/review that stuff.  We'd
> probably need a trusted pool of contributors who can just commit to
> areas of the central git repo.  Secured with gitsigur of course. :-)
>
> The central repo could be built up entirely gradually.  If some
> libraries were omitted from initial commits for a distro, a later
> contribution could fill in the gaps.
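
Finding those gaps is a simple existence check against a local
checkout of a distro branch. `missing_abixml` is a hypothetical helper.

```shell
# Print each library path given after $1 that has no abixml file yet
# in local checkout $1.
missing_abixml() {
    repo=$1
    shift
    for lib in "$@"; do
        [ -e "$repo$lib.xml" ] || printf '%s\n' "$lib"
    done
}
```

For example, `missing_abixml ~/abirepo $(find /usr/lib64 -name
'*.so.*')` would list the libraries a contributor could still fill in.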
>
> --------------------
>
> OK, how reasonable does all this sound?
>
>
> - FChE
>