From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io1-xd2b.google.com (mail-io1-xd2b.google.com [IPv6:2607:f8b0:4864:20::d2b]) by sourceware.org (Postfix) with ESMTPS id 62AEF3858C36 for ; Tue, 2 Apr 2024 09:40:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 62AEF3858C36 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 62AEF3858C36 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::d2b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712050832; cv=none; b=gM2FOhhx2irjOlfDun49Vud/KAUbFNj6Z7OwAE8caOuuk0EWW3XVw9PwpiTfwSqmCyBXmZk/70R7CTaDMURMbNtwwFvDYHntQsDMiD4NENttghnVVF++dswqq+mrSVQalhD0OCPH3LwcDg4Y33UA4TVoi+ORISBOxvu1rKrMZM0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712050832; c=relaxed/simple; bh=Sk6YtvVyHSTmpXAphSF/leuC9kOJj9zy4esebdVdRs0=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=wWDReYqRGDTdpFXOyUqZjFIc/0hNgjvhqVRef46BhTj5rtVf85YydGlNLU96NwcG9FmIrkr8LAy4UqPTRQcsK2KLVA1GFhYfQxjT0x/0CEUjmzbKwD6rzoEsqQbkbOsUZwudTRK39p52Y2V5TLTN+gBf/Bf67rxXGMk7DrMK4e0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-io1-xd2b.google.com with SMTP id ca18e2360f4ac-7d0377850aaso237329839f.3 for ; Tue, 02 Apr 2024 02:40:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712050828; x=1712655628; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Sk6YtvVyHSTmpXAphSF/leuC9kOJj9zy4esebdVdRs0=; b=KjLzIFQkTdhOH1sYHup7FhKD8PMyrwGEZ66dHG7wR746u7BlrlhkDnlh8T1ikqC2ST mQ2TTda7CvpPdGU9hrExztkFl1NrxeaawbpVK4QoPJl0eSqWFZ472m4EqRQSz8iVmFPl rSjQAOwmnCEgjzmtBaB4QbOKJexJDWuFcaBrcpglXqCDOHD2nzEK7SZaQefNpx3SDIji oBx1PFwYp8nE0B0L0W3BmaF+3bNM0NE3J6PGB9eJEy9tUZ8e217HC0e/d4bNvAvS8yuM BaGptATQdf2Z8JHGJNNTf3YgDYha3ul7LkxkqC3+1S4aAr3yy+f4Hvn+OMQf15axW4Qj MDdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712050828; x=1712655628; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Sk6YtvVyHSTmpXAphSF/leuC9kOJj9zy4esebdVdRs0=; b=n9+PmM3JorRoojq60msR5jTwfwvqhFCxbMTwp1xDZ0rN0ou/4odkWAqRyETUiRJLoV 1r2DwbOubYyRp48998S24nNRFMx2UnM/GSSw9XKd9feOkUtBi7Yoi2UjBm94TYejwo2v BvBkWoWmPuGD68SbUI3FBKsmr251Df3YaTu7u2u1/8yB6cGuZdXDVpJlz+Ou2Q+ESVAK VJspP3ao0Bm6FlnKrobLQoBdNrMQltKkowuUF0QBgQ6fKMOhZEa3e2U9XSRqPkXaHViC yBIZUzMTM3qtor91rD4BkMh/wpjmc7OMsOGXzZUTfogShFaOe3cdzEyMMfd2+DlVyyEP PTDw== X-Forwarded-Encrypted: i=1; AJvYcCUF0lAmOZNnQPrp5tCW/jE7W6Yuortem/cagjw+fM4lEf/ev0j+mmMvDKeCigD3GSnvgagr0p8eKoATAGmD66PXIuMF1tZMwQ== X-Gm-Message-State: AOJu0YwTVCPFiUbAz2OJQXkRrBii+92yarMpdXLVbQCSA9YbkImlHW6i qmprl/IpdeU2pZbMS7FS6y+iYVNDBvP49qNS7OhPYS1bbMhVEfJqw3f/qN+XvSvZmPlOtYywN49 2c4xp4M528wh6X2/x1++vJTYPfxM= X-Google-Smtp-Source: AGHT+IF/wcBDjxrexsG281zmAYXSJ0SiSs7U+npWvONnErbt5KeU1p0GTdUNT3BqeG67htXZ++cQdp0GaCjWErQLtu8= X-Received: by 2002:a6b:d90a:0:b0:7d0:da67:9909 with SMTP id r10-20020a6bd90a000000b007d0da679909mr3127392ioc.19.1712050828373; Tue, 02 Apr 2024 02:40:28 -0700 (PDT) MIME-Version: 1.0 References: <877chhe80o.fsf@gentoo.org> In-Reply-To: From: Rui Ueyama Date: Tue, 2 Apr 2024 18:40:17 +0900 Message-ID: Subject: Re: Remove dependency on libjansson To: Luca Boccassi Cc: Sam James , Fangrui Song , Binutils Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: We have discussed various topics already, and I don't think there's a single answer because this is all about engineering tradeoffs. I'd like to hear from other devs who are following this thread if there are= any. On Tue, Apr 2, 2024 at 9:54=E2=80=AFAM Luca Boccassi wro= te: > > On Mon, 1 Apr 2024 at 14:23, Rui Ueyama wrote: > > > > On Mon, Apr 1, 2024 at 9:16=E2=80=AFPM Luca Boccassi = wrote: > > > > > > On Mon, 1 Apr 2024 at 12:44, Rui Ueyama wrote: > > > > > > > > On Mon, Apr 1, 2024 at 7:38=E2=80=AFPM Luca Boccassi wrote: > > > > > > > > > > On Mon, 1 Apr 2024 at 10:50, Sam James wrote: > > > > > > > > > > > > Fangrui Song writes: > > > > > > > > > > > > > On Sun, Mar 31, 2024 at 8:31=E2=80=AFPM Rui Ueyama wrote: > > > > > > >> > > > > > > >> Hi, > > > > > > >> > > > > > > >> The recent xz incident demonstrated that supply chain attack= s are a > > > > > > >> real threat, and dependence on third-party libraries can hav= e > > > > > > >> significant consequences. > > > > > > >> > > > > > > >> In the wake of the incident, I propose we remove the depende= ncy on > > > > > > >> libjansson from GNU ld. > > > > > > >> > > > > > > >> First of all, why does GNU ld depend on libjansson which is = a JSON > > > > > > >> parsing library? GNU ld gained the `--package-metadata` opti= on in May > > > > > > >> 2022 to embed a JSON string into a .note section for package > > > > > > >> management for Fedora and other Linux distributions. At the = same time, > > > > > > >> the dependency on libjansson, a library for parsing JSON-for= mat > > > > > > >> strings, was introduced to validate an argument for that opt= ion. If an > > > > > > >> argument is not a valid JSON string, ld reports an error. If= the > > > > > > >> library is unavailable, or if `--disable-jansson` was passed= to the > > > > > > >> configure script, the library will not be linked and the err= or check > > > > > > >> will be disabled. By default, the library will be linked if = it exists. > > > > > > >> > > > > > > >> I opposed adding an extra dependency to GNU ld just for stri= ng > > > > > > >> verification purposes because it didn't seem worth adding ex= tra > > > > > > >> dependency to the linker. LLVM lld and the mold linker also = support > > > > > > >> the option, but they do not verify if the argument is a vali= d JSON > > > > > > >> string -- they simply treat it as an opaque string. If libja= nsson is > > > > > > >> unavailable, even GNU ld doesn't verify arguments. Therefore= , the > > > > > > >> verification is not trustworthy, and the reader must be prep= ared for a > > > > > > >> malformed JSON string when reading a .note section. Moreover= , > > > > > > >> verifying a string is straightforward without the feature; y= ou can > > > > > > >> simply `echo` the string to pipe it to `jq` for verification= before > > > > > > >> passing it to GNU ld. > > > > > > >> > > > > > > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to = be > > > > > > >> released this month, and the dependency on libjansson was in= deed > > > > > > >> present. > > > > > > >> > > > > > > >> How much risk does it pose? Probably not much, as long as th= e library > > > > > > >> is maintained properly. However, the stakes are high; if som= eone takes > > > > > > >> control of the library and introduces malicious code, they c= ould > > > > > > >> execute a Ken Thompson-style supply chain attack. Since GNU = ld is used > > > > > > >> to build essentially everything, the attacker could in theor= y gain the > > > > > > >> power to not just contaminate a specific program such as ope= nssh, but > > > > > > >> every executable in an official Linux distribution image. I = think the > > > > > > >> risk is not worth taking. I believe we just should remove th= e string > > > > > > >> verification code and the dependency on the library from GNU= ld. > > > > > > >> > > > > > > >> Rui Ueyama > > > > > > > > > > > > > > Thanks for bringing this up again. I support removing the jso= n dependency. > > > > > > > > > > > > > > I lightly expressed my concern > > > > > > > https://sourceware.org/pipermail/binutils/2022-May/120846.htm= l and > > > > > > > there might be others unsure about the dependency as well. > > > > > > > > > > > > I'd like to hear bluca's take before making up my mind. Note th= at it's > > > > > > also automagic right now IIRC (enabled if installed, not opt-in= ). > > > > > > > > > > > > But my take on it so far is that it doesn't sound worth it. > > > > > > > > > > $ apt-cache rdepends libjansson4 | wc -l > > > > > 310 > > > > > > > > > > Sorry, but this looks to me like a knee-jerk reaction that misses= the > > > > > wood for the trees. The xz issue is that an extremely complex and > > > > > sophisticated social engineering attack was planned and executed = over > > > > > multiple years. If it hadn't been on xz, it would have been on an= other > > > > > project. Dropping dependencies left and right without thinking wi= ll > > > > > not stop the next attack, as there is no shortage of overworked > > > > > maintainers to try and abuse. In fact, avoiding common libraries = and > > > > > everyone reimplementing the wheel _increases_ the attack > > > > > opportunities. > > > > > > > > > > The reason for this integration is that finding issues at build t= ime > > > > > is cheaper than finding them at runtime or release time. We rely = on > > > > > this in Fedora, Debian and Ubuntu to ensure valid data is written= (in > > > > > all packages in the former, in an increasing set of opt-ins in th= e > > > > > latters) - in fact, validating input _before_ shipping binaries > > > > > _improves_ security, rather than decreasing it, given the payload= is > > > > > parsed by a privileged component. > > > > > > > > > > So if it was removed upstream, we'd just have to patch it back in > > > > > downstream in each interested distro - which if you recall, was > > > > > exactly how sshd was exploited, due to the lack of an upstream co= mmon > > > > > implementation. This is opt-in, so distributions that are not > > > > > interested can just avoid enabling it, with no extra effort requi= red. > > > > > So please, leave it as-is. Thank you. > > > > > > > > I'd like to say that that's not an instinctive reaction to the xz > > > > incident, as I expressed a concern when the dependency was being > > > > introduced. I also want to say that I'm not trying to drop > > > > dependencies left and right without thinking. > > > > > > Well, the email did start with: > > > > > > > The recent xz incident demonstrated that supply chain attacks are a= real threat, and dependence on third-party libraries can have significant = consequences. > > > > > > so it should be understandable why that's the impression it gave me. > > > This feature has been present in binutils for 2 years now, and AFAIK > > > caused zero issues so far (probably going to jinx it with this > > > comment, but still). > > > > > > > I understand that the error check would be sometimes useful, but I'= d > > > > like to propose reevaluating its cost and benefit. IIUC, the > > > > --package-metadata option is, as its name suggests, used by packagi= ng > > > > tools such as dpkg-dev or rpmbuild. My understanding is that this > > > > option is not intended to be used directly by developers at large, = but > > > > rather, the JSON string is generated by a very limited set of > > > > packaging tools from package description files and passed to the > > > > linker. If that's the case, the string format check itself is an ex= tra > > > > measure to catch rare string construction bugs, given that JSON > > > > strings are machine-generated and not manually crafted. > > > > > > The metadata comes from each package, so it is very much possible for > > > errors to appear in one build but not another. And they would be hard > > > to spot and expensive to fix, as they'd likely only be found when > > > things go really wrong and the data is needed at runtime, or worse. > > > > Correct me if I'm wrong, but I believe the main concern in creating a > > malformed JSON string is forgetting to escape certain characters when > > embedding it in a JSON string. Correctly handling it shouldn't be too > > difficult. > > No, it's about anything that results in an invalid payload. > > > I understand that it is sometimes hard to notice an error in the .note > > section as it's just embedded data sitting there, and errors in the > > section would often be noticed exactly in the situation when you > > really need it. But if you are very cautious about the sanity of the > > .note section, why would you stop here and only focus on catching JSON > > format errors? If you do not trust your own tool and aim to catch bugs > > before its generated data goes into production, you should read the > > .note section back and compare the JSON contents with its package > > metadata file. There are tons of different ways for the tool to > > semantically go wrong such as missing field or something like that, > > and just catching JSON format errors (or, maybe just string escaping > > bugs) seems insufficient to me. And if you think that that kind of > > error would be unlikely, then I wonder why you are worried about the > > JSON format error. > > No, that is not a problem in reality, I'm not sure where you got that > impression. What really matters is ensuring the payload is valid json, > and the best way to do that is the current setup, because it costs > nothing, it's already implemented, it works out of the box > automatically for any caller regardless of the distribution and the > packaging format, it's inoffensive as it is not enabled by default and > requires opt-in, and has zero drawbacks or issues as evidenced by the > 2 years it's been available and used. I don't see why it's necessary > to create non-existing problems out of thin air and break existing > workflows, as a knee-jerk reaction driven by a fundamental > misunderstanding about what happened to a completely separate and > independent project. There is literally nothing to gain and all to > lose. > > > > > While we may still do extra checks for catching possible tooling bu= gs, > > > > I wonder how difficult it would be to assign this additional > > > > verification to jq or a similar tool instead of the linker. IIRC, d= pkg > > > > comes with a tool to scan all executables in a directory to find > > > > packaging errors. Isn't it a better place to run the extra format > > > > check? Would it be too complex to output the .note section contents > > > > after linking and pipe them to jq to confirm the embedded string is= a > > > > valid JSON string? > > > > > > There's no such thing in dpkg. You might be thinking of Lintian, but > > > that's an entirely separate tool, that is optional, runs long after > > > the package is published and distributed, and is only seldomly checke= d > > > (most maintainers just ignore it as it's informational only), and > > > doesn't stop bad builds from being generated and released. It's also > > > package-format-specific, so again every distribution would have to > > > reimplement exactly the same check in different ways, which is just > > > duplication of work for no gain. The intention here was to have bad > > > input fail the build immediately, with no extra effort required from > > > distributions or packagers, and it works very well for that. > > > > > > This is akin to suggesting that compilers shoudn't output warnings, > > > but the job should just be left to static analyzers. The earlier > > > problems are detected, the cheaper they are to fix, and the less > > > impact they have. This is one of the few truisms of engineering. > > > > If a --package-metadata argument is machine-generated, a better > > analogy would be the relationship between the compiler and the linker, > > rather than the human programmer and the compiler. The linker > > sometimes verifies object file contents, but it doesn't disassemble > > the .text section to catch rare assembler bugs. > > No, it's actually a good analogy. Removing warnings and telling people > to just do post-processing would be equally nonsensical, as warnings > are already implemented, they provide value if one wants to use them > and are harmless if one wants to ignore them, and do not cause any > real issue. Moreover, as already explained, this is not how > distributions and packaging actually work in the real world, there is > no such postprocessing, so it's even more nonsensical as a suggestion.