* GNU Toolchain Infrastructure at sourceware
From: Mark Wielaard @ 2022-05-30 22:01 UTC
To: overseers

Hi,

Here is a quick overview of some of the GNU Toolchain Infrastructure
work around CI/CD, and of how to integrate it better with the
email-based workflow used on sourceware. None of this is new; it has
all been discussed before. But it might be helpful to have it all in
one place, and to see whether we are moving in the right direction.

These services seem fairly light on resources, at least compared to
the other web, git, email and bugzilla services. But please take a
look in case there are any concerns about the sustainability or
scalability of the current hosting infrastructure.

builder.sourceware.org has been running for a couple of months now.
The buildbot process puts a low load on the machine: low single-digit
%CPU. The state database has grown to ~800MB, and the gitpoller-work
dir is ~4GB (mainly gcc ~2G, libabigail ~750M and binutils-gdb ~600M;
the others are < 200M).

We have native/VM workers for ppc64le, s390x, ppc64, i386, arm64 and
armhf for debian, fedora and centos (although not all combinations)
and x86_64 container builders for fedora, debian and opensuse. See
https://builder.sourceware.org/ for the current sponsors. A couple of
other people have offered runners, so at the moment there is enough
room to expand the builders. If we need more (or more powerful)
runners, I intend to ask the GCC Compile Farm maintainers or the
OSUOSL.

There are 80 builders on 14 workers, doing ~120 builds a day (more on
weekdays, fewer on weekends). With the newly added opensuse container
builders this will likely jump to ~150 a day.
There are a couple of full testsuite builders (for gcc and
binutils-gdb), but most builders are "quick" CI builders, which send
email whenever a regression is detected. They seem to catch and report
a couple of issues a week across all projects. There have been various
flaky tests, but those all seem to have been disabled now.

Various builders upload their test results to the bunsendb so they can
be analysed. I am slightly worried about how long these uploads take.
For example, uploading the results from the elfutils builders takes a
significant part (almost half) of the build time: 4 to 5 out of 10 to
12 minutes. Especially on the slower workers, this can cause them to
fall behind on the build queue. The bunsendb.git is currently ~400MB
and contains ~2000 results. It seems to compress fairly well and could
maybe use a periodic repack (making a local clone reduces the size to
~260MB).

There is still a small TODO list for the buildbot:
https://sourceware.org/git/?p=builder.git;a=blob;f=TODO
But maintenance is now minimal, with the builders just running without
needing supervision. Discussions have moved to the
buildbot@sourceware.org mailinglist.

It is now up to the projects to provide testsuites (or subsets,
possibly split by arch) that are stable, and to act when the buildbot
reports a regression. The bunsendb should be able to help with
selecting non-flaky tests.

One of the workers runs on sourceware itself. It is currently used to
reconfigure the buildbot itself (meta!). But projects can also use it
to run tasks on specific commits or periodically, tasks that are now
often done by a cron job or git hook: for example updating
documentation or websites, generating release tarballs or updating
bugzilla. The advantage over cron jobs is that the tasks can run
immediately and/or only when really needed, based on which files a
commit touches. The advantage over git hooks is that they run in the
builder context, not in the context of the specific user that pushed a
commit.
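As an illustration of running a task "only when really needed, based on which files a commit touches", here is a minimal Python sketch of a change filter. It assumes the callable is handed to a buildbot scheduler through its fileIsImportant argument, which, as we understand the buildbot API, receives a Change object whose .files attribute lists the touched paths. The DOC_PATTERNS values and the FakeChange stand-in are hypothetical, and no buildbot import is needed to try it.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List

# Hypothetical patterns for files whose changes should trigger a
# documentation/website rebuild task.
DOC_PATTERNS = ("doc/*", "*.texi", "htdocs/*")


@dataclass
class FakeChange:
    """Stand-in for a buildbot Change; only the .files attribute is used."""
    files: List[str] = field(default_factory=list)


def docs_changed(change) -> bool:
    """Return True if any path touched by the change matches DOC_PATTERNS.

    Shaped like the callable a buildbot scheduler accepts as
    fileIsImportant: it takes one Change and returns a bool.
    """
    return any(
        fnmatch(path, pattern)
        for path in change.files
        for pattern in DOC_PATTERNS
    )
```

In a master.cfg this would presumably be combined with the scheduler's onlyImportant option, so that commits touching none of the patterns never start the task; check the scheduler documentation for the buildbot version in use before relying on that.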

The current builder CI checks what has been committed on the main
branch of each project. This makes sure that what is checked out is in
a good state and that any pushed regressions are found early and
often. Now that most of these builders are green, we can start watching
user/try branches: when a user pushes to their try branch, the same
builder CI checks are run, so a project developer knows their proposed
patch(es) won't break the build or introduce regressions.

The above only helps developers who have commit access on sourceware,
not others who send in patches. For those we have
https://patchwork.sourceware.org/ plus the CICD trybot that DJ wrote:
https://sourceware.org/glibc/wiki/CICDDesign
To make this work better and connect it to the buildbot, we need to
upgrade to the latest patchwork (and update django).

The current trybot doesn't do authentication, which might not be OK
for all builders. So we want to either check the patch emails for
known GPG keys or let a trusted developer set a flag in patchwork.
Once we have public-inbox set up, we could also use b4 for DKIM
attestation for known/trusted hackers.

Some projects have already experimented with public-inbox, but we
don't have an instance running on sourceware itself yet. This would
resolve complaints about the not very usable mailman archives.

And for people wanting a more forge-like experience, we are already
running a mirror of the sourceware projects at sourcehut:
https://sr.ht/~sourceware/
This allows a user on sourcehut to fork any project, prepare their
patches and submit a merge request through email (without having to
set up git send-email or smtp locally; the patch emails are generated
server side). The sourcehut mirror is currently read-only.

Sourcehut is designed around email-based workflows, is fully Free
Software, doesn't use javascript, and is much faster and lighter on
resources than the (proprietary) alternatives. The sourcehut beta will
have groups support.
We can test a self-hosted instance then. The various sr.ht components
are very modular, so we can use only the parts we need.

Cheers,

Mark
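The pre-commit plan above has two halves: committers push to try branches, and patches from non-committers should only reach the trybot once they are authenticated. A minimal Python sketch of both checks follows. The users/<name>/try-<topic> branch naming and the Patch fields are assumptions for illustration, not the actual sourceware, patchwork or trybot data model, and verifying the GPG signature itself (or a b4 DKIM attestation) is left to the real tools.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set

# Assumed try-branch naming scheme: users/<name>/try-<topic>.
TRY_BRANCH = re.compile(r"users/[^/]+/try-[\w.-]+")


def is_try_branch(branch: str) -> bool:
    """True if a pushed branch name looks like a committer's try branch."""
    return TRY_BRANCH.fullmatch(branch) is not None


@dataclass
class Patch:
    """Hypothetical view of a patch email tracked in patchwork."""
    signer_key: Optional[str]   # fingerprint of the GPG key that signed it, if any
    trusted_flag: bool = False  # set by a trusted developer in patchwork


def may_trigger_trybot(patch: Patch, known_keys: Set[str]) -> bool:
    """Only authenticated patches from non-committers reach the trybot."""
    if patch.trusted_flag:
        return True
    return patch.signer_key is not None and patch.signer_key in known_keys
```

Keeping the policy in one small predicate makes it easy to add a third condition later, such as a passing DKIM attestation, without touching the trigger logic.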
* Re: GNU Toolchain Infrastructure at sourceware
From: Frank Ch. Eigler @ 2022-05-31 16:39 UTC
To: Overseers mailing list
Cc: Mark Wielaard

Hi -

> builder.sourceware.org has been running for a couple of months
> now. The buildbot process seems to have a low load on the machine. Low
> single digit %CPU. The state database has grown to ~800MB, the
> gitpoller-work dir is ~4GB (that is mainly gcc ~2G, libabigail ~750M
> and binutils-gdb ~600M, others are < 200M).

I suspect those git work trees could be made into shallow clones, or
use git-alternates or somesuch to minimize unnecessary .git/objects
duplication right there on the same machine.

> I am slightly worried about the long time these uploads take. For
> example the upload of results from the elfutils builders takes a
> significant amount of time (almost half) of the build (4 to 5 out of
> 10 to 12 minutes). Especially on the slower workers this can cause
> them to get behind on the build queue.

Yeah, this appears to be a buildbot infrastructure limitation.
Something is very inefficient about the way it evaluates file upload
directives. If it turns into a bigger problem, we can try to revisit
it.

> The bunsendb.git is currently ~400MB and contains ~2000 results. It
> seems it compresses fairly well and could maybe use a periodic
> repacking (making a local clone reduces the size to ~260MB).

No big deal, auto repack will do most of the work fine.

Added to that, there is the bunsenql.sqlite database, which is at the
moment around 2GB. It will ebb and flow with the bunsendb.git
contents.

I think all of the above storage levels are sustainable for at least
several months at this rate, after which point we could start aging
out old data.
(Plus we have an extra TB of space for /sourceware[12]
that I plan to bring online shortly.)

- FChE
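Frank's git-alternates suggestion comes down to a plain text file listing one object directory per line. The Python sketch below writes objects/info/alternates for a poller repository so that it borrows objects from another repository on the same disk, which is the same mechanism git clone --reference or --shared sets up. The directory names used in the example (a gcc poller clone pointing at the hosted gcc.git objects) are hypothetical.

```python
import os
from pathlib import Path


def add_alternate(repo_git_dir: str, reference_objects_dir: str) -> Path:
    """Let repo_git_dir borrow objects from another repository's object store.

    Git reads additional object directories, one path per line, from
    objects/info/alternates, so objects already stored in the reference
    repository are not duplicated on disk.
    """
    info_dir = Path(repo_git_dir) / "objects" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    alternates = info_dir / "alternates"
    ref = os.path.abspath(reference_objects_dir)
    lines = alternates.read_text().splitlines() if alternates.exists() else []
    if ref not in lines:  # keep the operation idempotent
        lines.append(ref)
    alternates.write_text("\n".join(lines) + "\n")
    return alternates
```

One caveat from the git-clone documentation applies: if the referenced repository later drops objects (for example after branches are deleted and garbage collection runs), the borrowing repository can break, so the reference should be a repository whose history is never rewritten.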
* Re: GNU Toolchain Infrastructure at sourceware
From: Mark Wielaard @ 2022-05-31 21:50 UTC
To: Frank Ch. Eigler
Cc: Overseers mailing list

Hi Frank,

On Tue, May 31, 2022 at 12:39:32PM -0400, Frank Ch. Eigler wrote:
> > builder.sourceware.org has been running for a couple of months
> > now. The buildbot process seems to have a low load on the machine. Low
> > single digit %CPU. The state database has grown to ~800MB, the
> > gitpoller-work dir is ~4GB (that is mainly gcc ~2G, libabigail ~750M
> > and binutils-gdb ~600M, others are < 200M).
>
> I suspect those git work trees could be made into shallow clones, or
> use git-alternates or somesuch to minimize unnecessary .git/objects
> duplication right there on the same machine.

Good point: the buildbot and the git repos share the same
machine/storage. We use https URLs for the changesource pollers
because we reuse the same URLs for the build factories, which run on
the workers. But they don't have to be the same. I'll experiment with
that, but I will have to be careful not to reset the whole history.

A more general point is that gcc.git is really a couple of orders of
magnitude bigger than anything else, and that affects more things than
just the buildbot. I wonder if we should cut off a bit more history.
It would mean that people who really have to search back to before,
say, gcc-5 need to stitch in gcc-old.git. But if it makes the default
clone 1GB smaller, that would be really good.

> > I am slightly worried about the long time these uploads take. For
> > example the upload of results from the elfutils builders takes a
> > significant amount of time (almost half) of the build (4 to 5 out of
> > 10 to 12 minutes). Especially on the slower workers this can cause
> > them to get behind on the build queue.
>
> Yeah, this appears to be a buildbot infrastructure limitation.
> Something is very inefficient about the way it evaluates file upload
> directives. If it turns into a bigger problem, can try to revisit.

Do we really need all the individual .log and .trs files for each
test? That is hundreds of files in some cases (like elfutils). Given
that each file seems to take about half a second, that easily adds up
to several minutes.

> > The bunsendb.git is currently ~400MB and contains ~2000 results. It
> > seems it compresses fairly well and could maybe use a periodic
> > repacking (making a local clone reduces the size to ~260MB).
>
> No big deal, auto repack will do most of the work fine.
>
> Added to that, there is the bunsenql.sqlite database, which is at the
> moment around 2GB. It will ebb and flow with the bunsendb.git
> contents.
>
> I think all of the above storage levels are sustainable for at least
> several months at this rate, after which point we could start aging
> out old data.

There is a janitor which deletes builder logs older than ~2.5 months,
which is about as long as the buildbot has been running. So I expect
the state database not to grow as much from now on.

> (Plus we have an extra TB of space for /sourceware[12]
> that I plan to bring online shortly.)

OK! Now we are talking. And here I was concerned about a couple of
GBs :)

Thanks,

Mark
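The arithmetic behind "easily multiple minutes" can be made explicit. Assuming, purely for illustration, 500 result files (the message only says "hundreds"), the observed ~0.5 s per file and a 10-minute build, the per-file overhead alone accounts for about 4.2 minutes, or roughly 42% of the build, in line with the 4 to 5 out of 10 to 12 minutes reported for elfutils.

```python
def upload_minutes(n_files: int, seconds_per_file: float = 0.5) -> float:
    """Time spent on per-file upload overhead alone, in minutes."""
    return n_files * seconds_per_file / 60


def upload_share(n_files: int, build_minutes: float,
                 seconds_per_file: float = 0.5) -> float:
    """Fraction of a build's wall-clock time taken by that overhead."""
    return upload_minutes(n_files, seconds_per_file) / build_minutes


# Assumed example: 500 files in a 10-minute build.
print(f"{upload_minutes(500):.1f} min, {upload_share(500, 10):.0%} of the build")
```

By the same estimate, sending everything as a single archive would cut the per-file overhead to one upload, about half a second, leaving only the actual transfer time.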
* Re: GNU Toolchain Infrastructure at sourceware
From: Joseph Myers @ 2022-05-31 22:12 UTC
To: Mark Wielaard via Overseers
Cc: Frank Ch. Eigler, Mark Wielaard

On Tue, 31 May 2022, Mark Wielaard via Overseers wrote:

> A more general point is that gcc.git is really a couple of orders
> bigger than anything else. And that affects more things than just the
> buildbot. I wonder if we should cut off a bit more history. It would
> mean that people who really have to search back to before say gcc-5
> need to stich in the gcc-old.git. But if it makes the default clone
> 1GB smaller that would be really good.

gcc-old.git is the old git-svn mirror, now read only. It has no relation
to the main version of the history in gcc.git (the gcc-old.git history is
also available in gcc.git under refs/git-old/ and refs/git-svn-old/, not
fetched by default).

The vast bulk of the approximately 6000 refs in gcc.git are *not* fetched
by default, and the repository is set up with delta islands to make that
efficient.

People not wanting full history of the branches they clone can create a
shallow clone with git clone --depth (and people wanting full history but
only for one branch can use --single-branch).

The GCC repository size is similar to that for the Linux kernel.

-- 
Joseph S. Myers
joseph@codesourcery.com
* Re: GNU Toolchain Infrastructure at sourceware
From: Mark Wielaard @ 2022-06-01 10:09 UTC
To: Overseers mailing list
Cc: Joseph Myers

Hi Joseph,

On Tue, 2022-05-31 at 22:12 +0000, Joseph Myers via Overseers wrote:
> On Tue, 31 May 2022, Mark Wielaard via Overseers wrote:
>
> > A more general point is that gcc.git is really a couple of orders
> > bigger than anything else. And that affects more things than just the
> > buildbot. I wonder if we should cut off a bit more history. It would
> > mean that people who really have to search back to before say gcc-5
> > need to stich in the gcc-old.git. But if it makes the default clone
> > 1GB smaller that would be really good.
>
> gcc-old.git is the old git-svn mirror, now read only. It has no relation
> to the main version of the history in gcc.git (the gcc-old.git history is
> also available in gcc.git under refs/git-old/ and refs/git-svn-old/, not
> fetched by default).

How embarrassing. I only assumed and didn't actually check (nor did I
actually try to create such a "small" gcc.git). You are of course
right:
a) The objects fetched by default go all the way back, including the
   pre-egcs history.
b) By default you indeed only fetch about 1GB.

> The vast bulk of the approximately 6000 refs in gcc.git are *not* fetched
> by default, and the repository is set up with delta islands to make that
> efficient.
>
> People not wanting full history of the branches they clone can create a
> shallow clone with git clone --depth (and people wanting full history but
> only for one branch can use --single-branch).
>
> The GCC repository size is similar to that for the Linux kernel.

Right, so it isn't completely unusual to have such large repos. And
indeed the buildbot workers do use --depth 1, so for automation this
isn't really such a big deal.

But I would say that, like the linux kernel tree, it is somewhat
unusual for developers to have such a giant tree. That 1GB is still
somewhat big on slower networks, and resolving the deltas does take
significant resources. --single-branch doesn't really reduce that
significantly, but --depth 1 of course does.

IMHO it would be good to have something in between. Maybe a standard
tree of --depth ~20000 (around when we cut over to git). But maybe
that is still too little history as a default. And it could of course
be clearly documented. But who reads docs...

Another hurdle for the first-time gcc hacker is the default configure
flags. If you forget --disable-bootstrap when you first start hacking,
it really takes forever to build. But maybe this is a better
conversation for the gcc@ mailinglist. I may also be too eager to be
popular with hackers who don't like reading setup documentation. And
in the end gcc contains so many frontends and libraries these days
that a single "easy" default is impossible to give.

Cheers,

Mark
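For reference, the clone options discussed in this exchange can be combined as in the Python sketch below, which only builds the git command line (run it with subprocess.run(cmd, check=True)). The function name is hypothetical, and the depth of 20000 in the usage note is just the "something in between" suggested above. Note that git clone --depth already implies --single-branch unless --no-single-branch is given.

```python
from typing import List, Optional


def clone_command(url: str, dest: str, depth: Optional[int] = None,
                  single_branch: bool = False) -> List[str]:
    """Build a git clone command line; run it with subprocess.run(cmd, check=True).

    depth=N creates a shallow clone truncated to the last N commits (git then
    also behaves as if --single-branch were given); single_branch=True keeps
    full history, but only for the remote's default branch or --branch.
    """
    cmd = ["git", "clone"]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        cmd += ["--depth", str(depth)]
    if single_branch:
        cmd.append("--single-branch")
    cmd += [url, dest]
    return cmd
```

For example, clone_command("https://gcc.gnu.org/git/gcc.git", "gcc", depth=20000) yields the in-between clone Mark describes (the URL is an assumption here), while depth=1 matches what the buildbot workers already use.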
* Re: GNU Toolchain Infrastructure at sourceware
From: Mark Wielaard @ 2022-07-22 16:48 UTC
To: overseers

Hi,

Just a quick size/resources update after two more months.

On Tue, May 31, 2022 at 12:39:32PM -0400, Frank Ch. Eigler via Overseers wrote:
> > builder.sourceware.org has been running for a couple of months
> > now. The buildbot process seems to have a low load on the machine. Low
> > single digit %CPU. The state database has grown to ~800MB, the
> > gitpoller-work dir is ~4GB (that is mainly gcc ~2G, libabigail ~750M
> > and binutils-gdb ~600M, others are < 200M).
>
> I suspect those git work trees could be made into shallow clones, or
> use git-alternates or somesuch to minimize unnecessary .git/objects
> duplication right there on the same machine.

While we haven't done that yet, the gitpoller-work git repos have
shrunk to 2.5G. That is even though we added a new poller, and through
the try-schedulers we now also pull the users' branches for
binutils-gdb and libabigail. It looks like this is due to more
efficient packing, but I don't fully understand how it became that
much more efficiently stored.

We added an extra debian-i386 stable VM worker, and container builders
for debian testing and fedora rawhide. IBM provided power8, power9 and
power10 workers for the gdb and valgrind builders, and Arm did the same
for gdb with workers for armhf and arm64. OSUOSL provided an 8 core
arm64 worker on which we run fedora latest builders for all projects
(mostly replacing the little arm64 odroid board).

We added a glibc poller, scheduler and builders on fedora-x86_64,
fedora-arm64, debian-i386, rawhide-x86_64, debian-testing-x86_64,
fedora-s390x, debian-ppc64, fedora-ppc64le, and opensuse-tumbleweed
and leap on x86_64. We also added try builders for binutils, gdb and
libabigail, and a gccrs bootstrap builder.
The new gcc builders are build-only --disable-bootstrap builders for
just C and C++. We are still waiting on a larger x86_64 worker to run
the full gcc testsuite.

We are seeing ~500 builds a day. CPU usage is slightly higher now
because builds are running all day, but it is mostly still a
single-digit percentage. While new bunsen results are being uploaded,
however, CPU usage spikes to ~40%-70%. The state database is now 4GB.

> > I am slightly worried about the long time these uploads take. For
> > example the upload of results from the elfutils builders takes a
> > significant amount of time (almost half) of the build (4 to 5 out of
> > 10 to 12 minutes). Especially on the slower workers this can cause
> > them to get behind on the build queue.
>
> Yeah, this appears to be a buildbot infrastructure limitation.
> Something is very inefficient about the way it evaluates file upload
> directives. If it turns into a bigger problem, can try to revisit.

The new cpio helped: the uploads take less time. But we are seeing
many more uploads, with some CPU spikes. I don't think it is
concerning, but it is something to watch.

> > The bunsendb.git is currently ~400MB and contains ~2000 results. It
> > seems it compresses fairly well and could maybe use a periodic
> > repacking (making a local clone reduces the size to ~260MB).
>
> No big deal, auto repack will do most of the work fine.

It now contains almost 10 times as many results (19000+), and the
bunsendb.git is only 3 times as big (1.2G).

> Added to that, there is the bunsenql.sqlite database, which is at the
> moment around 2GB. It will ebb and flow with the bunsendb.git
> contents.

The sqlite database did indeed grow as much as the bunsendb.git; it is
now 3GB.

> I think all of the above storage levels are sustainable for at least
> several months at this rate, after which point we could start aging
> out old data. (Plus we have an extra TB of space for /sourceware[12]
> that I plan to bring online shortly.)

Yes, it looks like even with the 10x growth in 2 months we didn't
really grow resource usage that much. We won't need the extra TB of
storage just yet: there is more than 128GB free at the moment, which
will certainly get us to next year. But having that extra TB of
storage online would help with expanding other services.

Cheers,

Mark
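The "new cpio" mentioned in this update presumably means bundling the result files into one archive before uploading, so the per-file upload cost is paid once rather than hundreds of times. Python's standard library has no cpio writer, so the sketch below swaps in tarfile to show the same idea. The file patterns (*.log, *.trs, *.sum) and the function name are assumptions, not the builder's actual configuration.

```python
import tarfile
from pathlib import Path
from typing import Iterable

# Assumed names of the per-test result files to collect.
RESULT_PATTERNS = ("*.log", "*.trs", "*.sum")


def bundle_results(build_dir: str, archive_path: str,
                   patterns: Iterable[str] = RESULT_PATTERNS) -> int:
    """Pack all matching result files under build_dir into one .tar.gz.

    Returns the number of files added, so one upload replaces many.
    """
    root = Path(build_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for pattern in patterns:
            for path in sorted(root.rglob(pattern)):
                if path.resolve() == archive or not path.is_file():
                    continue  # never pack the archive into itself, skip dirs
                tar.add(path, arcname=str(path.relative_to(root)))
                count += 1
    return count
```

The upload step then transfers a single file, and the receiving side unpacks it before importing the results.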
* Roadmap update
From: Mark Wielaard @ 2022-06-22 9:14 UTC
To: Overseers mailing list
Cc: buildbot

Given the recent addition of a bunsenwidget, the patchwork upgrade,
the git users try branches and more worker compute resources, I
updated the roadmap, adding a bit more context. More ideas and
comments are welcome!

The same, in HTML, with a few more URLs added, can be found here:
https://gnu.wildebeest.org/blog/mjw/2022/06/22/sourceware-gnu-toolchain-infrastructure-roadmap/

Sourceware – GNU Toolchain Infrastructure roadmap
=================================================

Making the email/git based workflow more fun, secure and productive by
automating contribution tracking and testing across different distros
and architectures.

What is Sourceware?
-------------------

Sourceware, https://sourceware.org/, is community-run infrastructure
(mailinglists, git, bug trackers, wikis, etc.) hosted in the Red Hat
Open Source Community Infrastructure Community Cage, together with
servers from e.g. Ceph, CentOS, Fedora and Gnome.

Sourceware is mainly known for hosting the GNU Toolchain projects,
like gcc at https://gcc.gnu.org/, glibc, binutils and gdb. But it also
hosts projects like annobin, bunsen, bzip2, cgen, cygwin at
https://cygwin.org/, debugedit, dwz, elfutils at http://elfutils.org,
gccrs, gnu-abi, insight, kawa, libffi, libabigail, mauve, newlib,
systemtap and valgrind at https://valgrind.org/.

A longer list of Sourceware projects, including those without their
own domain name and several dormant projects, can be found here:
https://sourceware.org/mailman/listinfo.

Most of these projects use an email/git based workflow, using
mailinglists for discussing patches in preference to web based
"forges".

Zero maintenance automation
---------------------------

Although email-based git workflows are great for real patch
discussions, they do not always make tracking the state of patches
easy. Just like our other services, such as bugzilla, mailinglists and
git repos, we like to provide zero maintenance infrastructure for
tracking and automating patches and testing. So we are trying to
consolidate around a shared buildbot for (test) automation and
patchwork for tracking the state of contributions. We do this by
sharing experiences and coordinating between the Sourceware projects,
and by fully automating the infrastructure services.

A shared buildbot
-----------------

We have a shared buildbot for Sourceware projects at
https://builder.sourceware.org/. This includes compute resources
(buildbot-workers) for various architectures, thanks to some generous
sponsors. We have native/VM workers for x86_64, ppc64le, s390x, ppc64,
i386, arm64 and armhf for debian, fedora and centos (although not all
combinations yet) and x86_64 container builders for fedora, debian and
opensuse. There are currently 95 builders on 15 workers, doing ~300
builds a day (more on weekdays, fewer on weekends). There are a couple
of full testsuite builders (for gcc and binutils-gdb), but most
builders are "quick" CI builders, which send email whenever a
regression is detected. They seem to catch and report a couple of
issues a week across all projects.

Builder is its own project on Sourceware, with its own git repo,
mailinglist and amazing community, which can help you integrate new
builders, add workers and containers, and get you access to systems to
replicate any failures where the buildbot logs don't give enough
information.

And the buildbot itself is automated: whenever a change is made to add
a new builder or define a new container, the buildbot automatically
reconfigures itself and the workers start using the new container
images with the next build.

The same mechanism can also be used to run tasks on specific commits
or periodically, tasks that are now often done by a cron job or git
hook: for example updating documentation or websites, generating
release tarballs or updating bugzilla. The advantage over cron jobs is
that the tasks can run immediately and/or only when really needed,
based on which files a commit touches. The advantage over git hooks is
that they run in the builder context, not in the context of the
specific user that pushed a commit.

Picking your (CI) tests
-----------------------

Although the buildbot itself is zero maintenance, getting and acting
on the results of course is not. We already divide the tests into
quick CI tests and full test runs, and most tests upload all results
to bunsendb. bunsen can help pick good CI tests by indicating which
tests are flaky or by comparing results across different setups.

A prototype testsuite log comparison bunsenweb widget is running at
https://builder.sourceware.org/testruns/. Lots of things will be
coming here, including taking advantage of the testrun cluster
analysis that's already being done, a per-testrun testcase
search/browse engine, other search operators, testsuite summary (vs
detail) grids, who knows; ideas welcome!

What about pre-commit checks?
-----------------------------

The builder CI checks what has been committed on the main branch of
each project. This makes sure that what is checked out is in a good
state and that any pushed regressions are found early and often.

There is also support for git user try branches. When a user pushes to
their try branch, the same builder CI checks are run, so a project
developer knows their proposed patch(es) won't break the build or
introduce regressions. The binutils and gdb communities are currently
trying this out. Once new builder resources from OSUOSL are installed,
we'll roll this out to other Sourceware projects.

What about non-committers?
--------------------------

The above only helps developers who have commit access on sourceware,
not others who send in patches. For those we have
https://patchwork.sourceware.org/ plus the CICD trybot that DJ wrote:
https://sourceware.org/glibc/wiki/CICDDesign. The glibc community is
already using this. We would like to connect patchwork, buildbot and
the trybot for other Sourceware projects.

The current trybot doesn't do authentication, which might not be OK
for all builders. So we want to either check the patch emails for
known GPG keys or let a trusted developer set a flag in patchwork
before the trybot triggers. Once we have public-inbox set up, we could
also use b4 for DKIM attestation for known/trusted hackers.

Some projects have already experimented with public-inbox, but we
don't have an instance running on sourceware itself yet. This would
resolve complaints about the not very usable mailman archives.

But I would really like to have a webforge!
-------------------------------------------

You are in luck. We already have a sourcehut mirror at
https://sr.ht/~sourceware/. This allows anybody to fork any sourceware
project on sourcehut, prepare their patches and submit a merge request
through email (without having to set up git send-email or smtp
locally; the patch emails are generated server side).

Sourcehut is designed around email-based workflows, is fully Free
Software, doesn't use javascript, and is much faster and lighter on
resources than the (proprietary) alternatives.

The sourcehut mirror is currently read-only (but syncs automatically
with any git updates made on sourceware). When sourcehut supports
project groups (one of the beta goals), we will test a self-hosted
instance to see whether this is a good way to attract more
contributors without losing the advantages of the email-based
workflow. The various sr.ht components are very modular, so we can
use only the parts we need.
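The "Picking your (CI) tests" section says bunsen can help by indicating which tests are flaky. One simple criterion, sketched here in Python and not bunsen's actual analysis, is that a test is flaky on a configuration if it both passed and failed for the same commit; a test whose outcome only changes between commits is a regression candidate instead. The record layout and the test names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record: (commit, config, test_name, outcome), outcome e.g. "PASS"/"FAIL".
Record = Tuple[str, str, str, str]


def flaky_tests(records: Iterable[Record]) -> Set[Tuple[str, str]]:
    """Return (config, test) pairs that both passed and failed on one commit.

    Outcomes are grouped per (commit, config, test), so a test that merely
    changes between commits (a possible regression) is not counted as flaky.
    """
    outcomes: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for commit, config, test, outcome in records:
        outcomes[(commit, config, test)].add(outcome)
    return {
        (config, test)
        for (commit, config, test), seen in outcomes.items()
        if len(seen) > 1
    }
```

A quick CI subset for a configuration could then be the tests it runs minus the ones returned here, leaving the flaky ones to the full testsuite builders.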