From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-f182.google.com (mail-yw1-f182.google.com [209.85.128.182]) by sourceware.org (Postfix) with ESMTPS id DA2DA3858D20 for ; Tue, 2 Apr 2024 00:54:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DA2DA3858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=debian.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DA2DA3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=209.85.128.182 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712019265; cv=none; b=WpwlLwkfI1N4zE9XoZ/c3IfUhsWqnTBuSHHqraTGjKAvIZYfrxYKyg0w20fsixoyKI7qNlDQ28GWYXYbzPG0CRzzLWneCtB4LWbbzZih9De1xxBk/EqLtsua4rHHvQ35WcVLIcYu08+nbAsYPysnjxUpYoBQu/MejwyaPyfdQZg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712019265; c=relaxed/simple; bh=YcwP7zkwjLUJPayQ/I5i9VM9LVK70xcqbxGaW4WWdDs=; h=MIME-Version:From:Date:Message-ID:Subject:To; b=lbxNlOH/y2otG9561QILE8Dn//W1A+OyTP/0t6a70AUQelUExcpF4KqEckrX/sdsX0XxxNsEmcwAEEbHoEN6Kw6rjuG5BKLkkY1VOHzLLjure72E8Ka6oUkN98dq3V8QAKmWi5UXpr9hLTUoSoR/bJeZik9cUAVW1X1gN+32KeE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yw1-f182.google.com with SMTP id 00721157ae682-6151d79db7bso2891817b3.1 for ; Mon, 01 Apr 2024 17:54:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712019262; x=1712624062; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YcwP7zkwjLUJPayQ/I5i9VM9LVK70xcqbxGaW4WWdDs=; b=hk8h72Tr8mkRqBl/jf0AsLGAOE6CaNi1mG7faaWaJNkqQ540/0jBR10YHUxc35WRu9 WkczfdGRSCeFPKFk2HQ2Mc3eOlfRZH8tkcRmD3vLpjQTDbxEloCy6IcITE48nGZP+BOa DE2c0hso90ql81xtOHAS7ZeZpdQFO/rq/liMe9Dk2tXggT/kkP5adKTzQu85tKiqOXjQ d1+HIn88BuJTH6AvQ5/+k6kjCSOqhOo7Z3UJi1RulVxBmm203b0UH2B7AmXSa7udwT7a o5MvB6mgeCt+ernl+i0gcT6r439DrDs7xsWsreF6UGRiraOVYSbLHPI/20Zt46aCU1Oy qHmg== X-Forwarded-Encrypted: i=1; AJvYcCWpqGuTFCqj3Zoo3/kZyglTZmp6PDLaYF6y/C5HWpG1vziXCpcntjZrv3yRElY6ixbJhgp/Kezbj1dt61bNZ5gBmLZBrZTvxA== X-Gm-Message-State: AOJu0YzHT86W3bVo5+Er8YqjP5+TSyeH0LbTwDWP28OPVdyN9xtCUH2K gVgnvZvqMwG6Sa5JS54CE2b/lWWRD/RbjRNqL0ZmHk3CLYbQLCHqL7390m3HEGw= X-Google-Smtp-Source: AGHT+IE7RcJ1pESDw+3F41WH9K+7HgUPJndAQWmcmFFke9cqxpUmulB4myVsqgZ292EWPPTFKEGUuw== X-Received: by 2002:a81:49d7:0:b0:615:e0e:d234 with SMTP id w206-20020a8149d7000000b006150e0ed234mr1972734ywa.16.1712019261907; Mon, 01 Apr 2024 17:54:21 -0700 (PDT) Received: from mail-yw1-f174.google.com (mail-yw1-f174.google.com. [209.85.128.174]) by smtp.gmail.com with ESMTPSA id v13-20020a81b80d000000b0060a16fb9209sm2506243ywe.115.2024.04.01.17.54.21 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 01 Apr 2024 17:54:21 -0700 (PDT) Received: by mail-yw1-f174.google.com with SMTP id 00721157ae682-615019cd427so7817287b3.3 for ; Mon, 01 Apr 2024 17:54:21 -0700 (PDT) X-Forwarded-Encrypted: i=1; AJvYcCVGuHPCb1e2OVuGRIt5yQzbGJi1hi+8hI6lDU6K5SKOM0Qx+OWQb5aOlXUQCm74tEfSP4hUT+ulNMpY5pN8vpb5Uyj2XSev8g== X-Received: by 2002:a0d:eac5:0:b0:615:1d51:6188 with SMTP id t188-20020a0deac5000000b006151d516188mr699854ywe.26.1712019261154; Mon, 01 Apr 2024 17:54:21 -0700 (PDT) MIME-Version: 1.0 References: <877chhe80o.fsf@gentoo.org> In-Reply-To: From: Luca Boccassi Date: Tue, 2 Apr 2024 00:54:09 +0000 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Remove dependency on libjansson To: Rui Ueyama Cc: Sam James , Fangrui Song , Binutils Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 1 Apr 2024 at 14:23, Rui Ueyama wrote: > > On Mon, Apr 1, 2024 at 9:16=E2=80=AFPM Luca Boccassi w= rote: > > > > On Mon, 1 Apr 2024 at 12:44, Rui Ueyama wrote: > > > > > > On Mon, Apr 1, 2024 at 7:38=E2=80=AFPM Luca Boccassi wrote: > > > > > > > > On Mon, 1 Apr 2024 at 10:50, Sam James wrote: > > > > > > > > > > Fangrui Song writes: > > > > > > > > > > > On Sun, Mar 31, 2024 at 8:31=E2=80=AFPM Rui Ueyama wrote: > > > > > >> > > > > > >> Hi, > > > > > >> > > > > > >> The recent xz incident demonstrated that supply chain attacks = are a > > > > > >> real threat, and dependence on third-party libraries can have > > > > > >> significant consequences. > > > > > >> > > > > > >> In the wake of the incident, I propose we remove the dependenc= y on > > > > > >> libjansson from GNU ld. > > > > > >> > > > > > >> First of all, why does GNU ld depend on libjansson which is a = JSON > > > > > >> parsing library? GNU ld gained the `--package-metadata` option= in May > > > > > >> 2022 to embed a JSON string into a .note section for package > > > > > >> management for Fedora and other Linux distributions. At the sa= me time, > > > > > >> the dependency on libjansson, a library for parsing JSON-forma= t > > > > > >> strings, was introduced to validate an argument for that optio= n. If an > > > > > >> argument is not a valid JSON string, ld reports an error. If t= he > > > > > >> library is unavailable, or if `--disable-jansson` was passed t= o the > > > > > >> configure script, the library will not be linked and the error= check > > > > > >> will be disabled. By default, the library will be linked if it= exists. > > > > > >> > > > > > >> I opposed adding an extra dependency to GNU ld just for string > > > > > >> verification purposes because it didn't seem worth adding extr= a > > > > > >> dependency to the linker. LLVM lld and the mold linker also su= pport > > > > > >> the option, but they do not verify if the argument is a valid = JSON > > > > > >> string -- they simply treat it as an opaque string. If libjans= son is > > > > > >> unavailable, even GNU ld doesn't verify arguments. Therefore, = the > > > > > >> verification is not trustworthy, and the reader must be prepar= ed for a > > > > > >> malformed JSON string when reading a .note section. Moreover, > > > > > >> verifying a string is straightforward without the feature; you= can > > > > > >> simply `echo` the string to pipe it to `jq` for verification b= efore > > > > > >> passing it to GNU ld. > > > > > >> > > > > > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be > > > > > >> released this month, and the dependency on libjansson was inde= ed > > > > > >> present. > > > > > >> > > > > > >> How much risk does it pose? Probably not much, as long as the = library > > > > > >> is maintained properly. However, the stakes are high; if someo= ne takes > > > > > >> control of the library and introduces malicious code, they cou= ld > > > > > >> execute a Ken Thompson-style supply chain attack. Since GNU ld= is used > > > > > >> to build essentially everything, the attacker could in theory = gain the > > > > > >> power to not just contaminate a specific program such as opens= sh, but > > > > > >> every executable in an official Linux distribution image. I th= ink the > > > > > >> risk is not worth taking. I believe we just should remove the = string > > > > > >> verification code and the dependency on the library from GNU l= d. > > > > > >> > > > > > >> Rui Ueyama > > > > > > > > > > > > Thanks for bringing this up again. I support removing the json = dependency. > > > > > > > > > > > > I lightly expressed my concern > > > > > > https://sourceware.org/pipermail/binutils/2022-May/120846.html = and > > > > > > there might be others unsure about the dependency as well. > > > > > > > > > > I'd like to hear bluca's take before making up my mind. Note that= it's > > > > > also automagic right now IIRC (enabled if installed, not opt-in). > > > > > > > > > > But my take on it so far is that it doesn't sound worth it. > > > > > > > > $ apt-cache rdepends libjansson4 | wc -l > > > > 310 > > > > > > > > Sorry, but this looks to me like a knee-jerk reaction that misses t= he > > > > wood for the trees. The xz issue is that an extremely complex and > > > > sophisticated social engineering attack was planned and executed ov= er > > > > multiple years. If it hadn't been on xz, it would have been on anot= her > > > > project. Dropping dependencies left and right without thinking will > > > > not stop the next attack, as there is no shortage of overworked > > > > maintainers to try and abuse. In fact, avoiding common libraries an= d > > > > everyone reimplementing the wheel _increases_ the attack > > > > opportunities. > > > > > > > > The reason for this integration is that finding issues at build tim= e > > > > is cheaper than finding them at runtime or release time. We rely on > > > > this in Fedora, Debian and Ubuntu to ensure valid data is written (= in > > > > all packages in the former, in an increasing set of opt-ins in the > > > > latters) - in fact, validating input _before_ shipping binaries > > > > _improves_ security, rather than decreasing it, given the payload i= s > > > > parsed by a privileged component. > > > > > > > > So if it was removed upstream, we'd just have to patch it back in > > > > downstream in each interested distro - which if you recall, was > > > > exactly how sshd was exploited, due to the lack of an upstream comm= on > > > > implementation. This is opt-in, so distributions that are not > > > > interested can just avoid enabling it, with no extra effort require= d. > > > > So please, leave it as-is. Thank you. > > > > > > I'd like to say that that's not an instinctive reaction to the xz > > > incident, as I expressed a concern when the dependency was being > > > introduced. I also want to say that I'm not trying to drop > > > dependencies left and right without thinking. > > > > Well, the email did start with: > > > > > The recent xz incident demonstrated that supply chain attacks are a r= eal threat, and dependence on third-party libraries can have significant co= nsequences. > > > > so it should be understandable why that's the impression it gave me. > > This feature has been present in binutils for 2 years now, and AFAIK > > caused zero issues so far (probably going to jinx it with this > > comment, but still). > > > > > I understand that the error check would be sometimes useful, but I'd > > > like to propose reevaluating its cost and benefit. IIUC, the > > > --package-metadata option is, as its name suggests, used by packaging > > > tools such as dpkg-dev or rpmbuild. My understanding is that this > > > option is not intended to be used directly by developers at large, bu= t > > > rather, the JSON string is generated by a very limited set of > > > packaging tools from package description files and passed to the > > > linker. If that's the case, the string format check itself is an extr= a > > > measure to catch rare string construction bugs, given that JSON > > > strings are machine-generated and not manually crafted. > > > > The metadata comes from each package, so it is very much possible for > > errors to appear in one build but not another. And they would be hard > > to spot and expensive to fix, as they'd likely only be found when > > things go really wrong and the data is needed at runtime, or worse. > > Correct me if I'm wrong, but I believe the main concern in creating a > malformed JSON string is forgetting to escape certain characters when > embedding it in a JSON string. Correctly handling it shouldn't be too > difficult. No, it's about anything that results in an invalid payload. > I understand that it is sometimes hard to notice an error in the .note > section as it's just embedded data sitting there, and errors in the > section would often be noticed exactly in the situation when you > really need it. But if you are very cautious about the sanity of the > .note section, why would you stop here and only focus on catching JSON > format errors? If you do not trust your own tool and aim to catch bugs > before its generated data goes into production, you should read the > .note section back and compare the JSON contents with its package > metadata file. There are tons of different ways for the tool to > semantically go wrong such as missing field or something like that, > and just catching JSON format errors (or, maybe just string escaping > bugs) seems insufficient to me. And if you think that that kind of > error would be unlikely, then I wonder why you are worried about the > JSON format error. No, that is not a problem in reality, I'm not sure where you got that impression. What really matters is ensuring the payload is valid json, and the best way to do that is the current setup, because it costs nothing, it's already implemented, it works out of the box automatically for any caller regardless of the distribution and the packaging format, it's inoffensive as it is not enabled by default and requires opt-in, and has zero drawbacks or issues as evidenced by the 2 years it's been available and used. I don't see why it's necessary to create non-existing problems out of thin air and break existing workflows, as a knee-jerk reaction driven by a fundamental misunderstanding about what happened to a completely separate and independent project. There is literally nothing to gain and all to lose. > > > While we may still do extra checks for catching possible tooling bugs= , > > > I wonder how difficult it would be to assign this additional > > > verification to jq or a similar tool instead of the linker. IIRC, dpk= g > > > comes with a tool to scan all executables in a directory to find > > > packaging errors. Isn't it a better place to run the extra format > > > check? Would it be too complex to output the .note section contents > > > after linking and pipe them to jq to confirm the embedded string is a > > > valid JSON string? > > > > There's no such thing in dpkg. You might be thinking of Lintian, but > > that's an entirely separate tool, that is optional, runs long after > > the package is published and distributed, and is only seldomly checked > > (most maintainers just ignore it as it's informational only), and > > doesn't stop bad builds from being generated and released. It's also > > package-format-specific, so again every distribution would have to > > reimplement exactly the same check in different ways, which is just > > duplication of work for no gain. The intention here was to have bad > > input fail the build immediately, with no extra effort required from > > distributions or packagers, and it works very well for that. > > > > This is akin to suggesting that compilers shoudn't output warnings, > > but the job should just be left to static analyzers. The earlier > > problems are detected, the cheaper they are to fix, and the less > > impact they have. This is one of the few truisms of engineering. > > If a --package-metadata argument is machine-generated, a better > analogy would be the relationship between the compiler and the linker, > rather than the human programmer and the compiler. The linker > sometimes verifies object file contents, but it doesn't disassemble > the .text section to catch rare assembler bugs. No, it's actually a good analogy. Removing warnings and telling people to just do post-processing would be equally nonsensical, as warnings are already implemented, they provide value if one wants to use them and are harmless if one wants to ignore them, and do not cause any real issue. Moreover, as already explained, this is not how distributions and packaging actually work in the real world, there is no such postprocessing, so it's even more nonsensical as a suggestion.