On 11/15/23 07:53, Frank Ch. Eigler wrote:
> Hi -
>
> I'd love some feedback about the following idea, related to using
> libabigail to assemble a crowdsourced database of abixml files for
> linux distros.
>
> The germ of the idea is that developers may need to know whether a
> binary they built or found is likely to be abi-compatible with a given
> distro / version. This is possible today by downloading the target
> distro binaries and running libabigail locally against them, or using
> front-end scripts like fedabipkgdiff that do the downloading first.
> But this is a pain if one wants to compare against a range of versions
> or foreign distros.

The other big use is "do I need to rebuild my binary for this new distro?" This could be a new Y release in RHEL X.Y, where you built for RHEL 9.2 and want to make damn well sure that your program is compatible with RHEL 9.3. Yes, we make ABI stability guarantees, but they only cover a portion of the distro. Or it could be that you have your program built for Fedora 38 and want to know whether you need to rebuild it for Fedora 39 or whether it happens to be ABI compatible.

The other use case is within the distro. You are a packager and you just applied a patch. You want to make sure that you didn't break ABI with the patch before you release it to the world.

>
> So the idea is instead to let people use a public archive of abixml
> artifacts instead of the binaries. The abixml files are relatively
> tiny, barely-ever changing, and should be an effective proxy for the
> real binaries. It's just a small matter of (a) storing, (b)
> using, and (c) collecting it.

a) generating
b) storing
c) using
d) collecting

I add a) generating because, as I have shown in the past, there are many cases where libabigail needs to have some bugs fixed before it can generate abixml files for particular packages.
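For the packager case above (did my patch break ABI?), here is a minimal sketch of interpreting the result of an abidiff run. The exit status is a bit mask, as described in the abidiff manual; the function name describe_abidiff_status and the file names in the commented usage are made up for illustration.

```shell
# Hypothetical helper: turn an abidiff exit status into a short verdict.
# Bit values per the abidiff manual:
#   1 = ABIDIFF_ERROR, 2 = ABIDIFF_USAGE_ERROR,
#   4 = ABIDIFF_ABI_CHANGE, 8 = ABIDIFF_ABI_INCOMPATIBLE_CHANGE
describe_abidiff_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no ABI change"
    elif [ $((status & 1)) -ne 0 ]; then
        # Usage errors (bit 2) also set the error bit, so this covers both.
        echo "tool error"
    elif [ $((status & 8)) -ne 0 ]; then
        echo "incompatible ABI change"
    elif [ $((status & 4)) -ne 0 ]; then
        # Changed, but not known to be incompatible: needs human review.
        echo "ABI change, possibly compatible"
    else
        echo "unknown status $status"
    fi
}

# Usage sketch (requires libabigail; file names are placeholders):
#   abidw --out-file before.xml libfoo.so.1.orig
#   abidw --out-file after.xml  libfoo.so.1
#   abidiff before.xml after.xml
#   describe_abidiff_status $?
```

Since abidiff accepts abixml files as well as ELF binaries, the same check should work against abixml fetched from an archive instead of a locally generated before.xml.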
> --------------------
>
> For storing this data, I envision overloading the libabigail git repo
> (or a new one) with storage of the abixml documents. To keep it dead
> simple, there could be one branch per /etc/os-release $ID/$VERSION_ID,
> one file per shared library in the distribution. For example, a
> fedora-39-x86-64 copy of /usr/lib64/libc.so.6, the file abidw produces
> could sit at
>
>     repo git://sourceware.org/git/libabigail.git
>     branch gitabixml/fedora/39/x86_64
>     file /usr/lib64/libc.so.6.xml
>
> (Symlinks in the distro fs could be represented as symlinks in git.)
>
> Updates to the distro package of course happen. It seems natural to
> update the abixml file for the affected file(s) right there in place.
>
> Since it may sometimes be desirable to track what package version
> (e.g. rpm n-v-r) is associated with the abixml data of a given
> version, we could use stylized records in the git commit text (or a
> git note, or maybe a tag). That would mean one git commit per updated
> package, with metadata message like:
>
>     Package: glibc-2.38-7.fc39.x86_64
>
> Maybe abidw version tags would be useful to add.

Yeah, one challenge is going to be when, for whatever reason, Dodji needs to update the abixml file format. In theory it shouldn't happen that often, but when it does, it will cause HUGE churn in the git repo. To mitigate that problem, I would suggest putting an abixml version in the pathname. I know that one of the things we need in the not too distant future is to track ABI on inline functions in C++, so I expect that will need an ABIXML format change when it happens.

>
> --------------------
>
> For using this data, I envision abidiff / abicompat taking a new form
> for its right operand. It could be a git url identifying the distro
> branch or tag. libabigail would fetch the corresponding file.xml
> within that.
> Simplify/default the heck out of it for ease of use:
>
>     export BRANCH=fedora/39/x86_64
>     abicompat /bin/myprogram gitabixml:$BRANCH
>
> (Where "gitabixml:" could instruct the tool to look at the sourceware
> libabigail git / gitweb / cgit server. Let users specify different or
> private git servers via environment variables or something.)

As Dodji pointed out below, abicompat doesn't currently handle dependencies very well. This has been a long-term disagreement between us. To simplify (possibly oversimplify) the disagreement: Dodji believes that abicompat does one thing, and that is to test the forward ABI dependencies from the program to a particular library. Very Unixy: do one thing. I disagree with this philosophy, and all of my discussions with users indicate that they misinterpret what abicompat actually does, thinking that it does more than it does. I'm a strong proponent of making abicompat do what the name implies and what users believe it does, which is to actually check that the library is compatible with the program specified on the command line. This follows another design principle: the principle of least surprise.

Thus IMHO abicompat should:

1) examine both forward and back dependencies, i.e. if a library makes use of a function or variable in the program itself, then abicompat should consider that an ABI mismatch.
2) include all the dependencies of the program when considering ABI compatibility.

Yes, this could be made into a script, but it ends up being a very inefficient and complicated script. I've written it two different times and people complain about the performance. The problem is that it has to load the abixml over and over rather than keeping it in memory while doing the comparison. On complicated programs with lots of dependencies, this takes considerable time.

>
> --------------------
>
> For collecting this data, I envision writing some distro-specific
> scripts, kind of like fedabipkgdiff, being run by contributors or
> ourselves.
> One flavour could run in operational installed distros,
> doing the equivalent of
>
>     find $PATHS -name '*.so.*' | while read -r lib; do
>         # or filter with elfclassify
>         package=$(rpm -qf "$lib")
>         # mirror the library's absolute path inside the repo
>         mkdir -p "$gitrepo$(dirname "$lib")"
>         abidw "$lib" > "$gitrepo$lib.xml"
>         (cd "$gitrepo"; git add ".$lib.xml"; git commit -m"Package: $package" ".$lib.xml")
>     done

A way to gather this easily with the current framework would be to allow BRANCH to refer to a local clone of the abirepo and then let fedabipkgdiff stick its temporary files in there so that they could easily be committed. That way, when I run one of my tests that does a self-comparison for every RPM, it generates the full data set.

> and rerun that occasionally as updates flow down from the distro.
> This could be done on a single beefy box running containers with
> different distros.
>
> Another flavour could be to take a set of RPM/etc. archives on a
> filesystem (or an ISO image), incrementally decompress them, run abidw
> on the individual files, and similarly construct the git repo of
> abixml files. (This is kind of like how debuginfod produces indexes
> from a bunch of RPMs.)
>
> No matter how the local git repo is populated, each branch describing
> a data contributor's distro could be pushed to the central one,
> bringing that one up to date. Patches representing updates could be
> emailed too, but no one will want to read/review that stuff. We'd
> probably need a trusted pool of contributors who can just commit to
> areas of the central git repo. Secured with gitsigur of course. :-)
>
> The central repo could be built up entirely gradually. If some
> libraries were omitted from initial commits for a distro, a later
> contribution could fill in the gaps.
>
> --------------------
>
> OK, how reasonable does all this sound?
>
>
> - FChE
>
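To make the storage layout discussed above concrete, here is a rough sketch of deriving the branch name and in-repo file path, with the abixml format version folded into the pathname as suggested earlier. The "v<version>" directory component, the function names, and the version value 2.2 are assumptions for illustration only, not an agreed layout.

```shell
# Sketch only: names and the "v<format>" path component are hypothetical.

# Branch name from /etc/os-release ID and VERSION_ID plus the machine arch.
abixml_branch() {
    printf 'gitabixml/%s/%s/%s\n' "$1" "$2" "$3"
}

# In-repo file path: abixml format version first, then the library's
# absolute path with ".xml" appended.
abixml_file() {
    printf 'v%s%s.xml\n' "$1" "$2"
}

abixml_branch fedora 39 x86_64        # gitabixml/fedora/39/x86_64
abixml_file 2.2 /usr/lib64/libc.so.6  # v2.2/usr/lib64/libc.so.6.xml
```

With the version as a top-level directory, a format change would add a new tree next to the old one instead of rewriting every file in place, which is the churn the pathname suggestion is meant to avoid.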