public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* Remove dependency on libjansson
@ 2024-04-01  3:31 Rui Ueyama
  2024-04-01  7:28 ` Fangrui Song
  0 siblings, 1 reply; 13+ messages in thread
From: Rui Ueyama @ 2024-04-01  3:31 UTC (permalink / raw)
  To: Binutils

Hi,

The recent xz incident demonstrated that supply chain attacks are a
real threat, and dependence on third-party libraries can have
significant consequences.

In the wake of the incident, I propose we remove the dependency on
libjansson from GNU ld.

First of all, why does GNU ld depend on libjansson which is a JSON
parsing library? GNU ld gained the `--package-metadata` option in May
2022 to embed a JSON string into a .note section for package
management for Fedora and other Linux distributions. At the same time,
the dependency on libjansson, a library for parsing JSON-format
strings, was introduced to validate an argument for that option. If an
argument is not a valid JSON string, ld reports an error. If the
library is unavailable, or if `--disable-jansson` was passed to the
configure script, the library will not be linked and the error check
will be disabled. By default, the library will be linked if it exists.

I opposed adding an extra dependency to GNU ld just for string
verification purposes because it didn't seem worth adding extra
dependency to the linker. LLVM lld and the mold linker also support
the option, but they do not verify if the argument is a valid JSON
string -- they simply treat it as an opaque string. If libjansson is
unavailable, even GNU ld doesn't verify arguments. Therefore, the
verification is not trustworthy, and the reader must be prepared for a
malformed JSON string when reading a .note section. Moreover,
verifying a string is straightforward without the feature; you can
simply `echo` the string to pipe it to `jq` for verification before
passing it to GNU ld.

I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
released this month, and the dependency on libjansson was indeed
present.

How much risk does it pose? Probably not much, as long as the library
is maintained properly. However, the stakes are high; if someone takes
control of the library and introduces malicious code, they could
execute a Ken Thompson-style supply chain attack. Since GNU ld is used
to build essentially everything, the attacker could in theory gain the
power to not just contaminate a specific program such as openssh, but
every executable in an official Linux distribution image. I think the
risk is not worth taking. I believe we just should remove the string
verification code and the dependency on the library from GNU ld.

Rui Ueyama

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01  3:31 Remove dependency on libjansson Rui Ueyama
@ 2024-04-01  7:28 ` Fangrui Song
  2024-04-01  9:39   ` Sam James
  0 siblings, 1 reply; 13+ messages in thread
From: Fangrui Song @ 2024-04-01  7:28 UTC (permalink / raw)
  To: Rui Ueyama; +Cc: Binutils

On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
>
> Hi,
>
> The recent xz incident demonstrated that supply chain attacks are a
> real threat, and dependence on third-party libraries can have
> significant consequences.
>
> In the wake of the incident, I propose we remove the dependency on
> libjansson from GNU ld.
>
> First of all, why does GNU ld depend on libjansson which is a JSON
> parsing library? GNU ld gained the `--package-metadata` option in May
> 2022 to embed a JSON string into a .note section for package
> management for Fedora and other Linux distributions. At the same time,
> the dependency on libjansson, a library for parsing JSON-format
> strings, was introduced to validate an argument for that option. If an
> argument is not a valid JSON string, ld reports an error. If the
> library is unavailable, or if `--disable-jansson` was passed to the
> configure script, the library will not be linked and the error check
> will be disabled. By default, the library will be linked if it exists.
>
> I opposed adding an extra dependency to GNU ld just for string
> verification purposes because it didn't seem worth adding extra
> dependency to the linker. LLVM lld and the mold linker also support
> the option, but they do not verify if the argument is a valid JSON
> string -- they simply treat it as an opaque string. If libjansson is
> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> verification is not trustworthy, and the reader must be prepared for a
> malformed JSON string when reading a .note section. Moreover,
> verifying a string is straightforward without the feature; you can
> simply `echo` the string to pipe it to `jq` for verification before
> passing it to GNU ld.
>
> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> released this month, and the dependency on libjansson was indeed
> present.
>
> How much risk does it pose? Probably not much, as long as the library
> is maintained properly. However, the stakes are high; if someone takes
> control of the library and introduces malicious code, they could
> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> to build essentially everything, the attacker could in theory gain the
> power to not just contaminate a specific program such as openssh, but
> every executable in an official Linux distribution image. I think the
> risk is not worth taking. I believe we just should remove the string
> verification code and the dependency on the library from GNU ld.
>
> Rui Ueyama

Thanks for bringing this up again. I support removing the json dependency.

I lightly expressed my concern
https://sourceware.org/pipermail/binutils/2022-May/120846.html and
there might be others unsure about the dependency as well.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01  7:28 ` Fangrui Song
@ 2024-04-01  9:39   ` Sam James
  2024-04-01 10:38     ` Luca Boccassi
  0 siblings, 1 reply; 13+ messages in thread
From: Sam James @ 2024-04-01  9:39 UTC (permalink / raw)
  To: Fangrui Song; +Cc: Rui Ueyama, Binutils, Luca Boccassi

Fangrui Song <i@maskray.me> writes:

> On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
>>
>> Hi,
>>
>> The recent xz incident demonstrated that supply chain attacks are a
>> real threat, and dependence on third-party libraries can have
>> significant consequences.
>>
>> In the wake of the incident, I propose we remove the dependency on
>> libjansson from GNU ld.
>>
>> First of all, why does GNU ld depend on libjansson which is a JSON
>> parsing library? GNU ld gained the `--package-metadata` option in May
>> 2022 to embed a JSON string into a .note section for package
>> management for Fedora and other Linux distributions. At the same time,
>> the dependency on libjansson, a library for parsing JSON-format
>> strings, was introduced to validate an argument for that option. If an
>> argument is not a valid JSON string, ld reports an error. If the
>> library is unavailable, or if `--disable-jansson` was passed to the
>> configure script, the library will not be linked and the error check
>> will be disabled. By default, the library will be linked if it exists.
>>
>> I opposed adding an extra dependency to GNU ld just for string
>> verification purposes because it didn't seem worth adding extra
>> dependency to the linker. LLVM lld and the mold linker also support
>> the option, but they do not verify if the argument is a valid JSON
>> string -- they simply treat it as an opaque string. If libjansson is
>> unavailable, even GNU ld doesn't verify arguments. Therefore, the
>> verification is not trustworthy, and the reader must be prepared for a
>> malformed JSON string when reading a .note section. Moreover,
>> verifying a string is straightforward without the feature; you can
>> simply `echo` the string to pipe it to `jq` for verification before
>> passing it to GNU ld.
>>
>> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
>> released this month, and the dependency on libjansson was indeed
>> present.
>>
>> How much risk does it pose? Probably not much, as long as the library
>> is maintained properly. However, the stakes are high; if someone takes
>> control of the library and introduces malicious code, they could
>> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
>> to build essentially everything, the attacker could in theory gain the
>> power to not just contaminate a specific program such as openssh, but
>> every executable in an official Linux distribution image. I think the
>> risk is not worth taking. I believe we just should remove the string
>> verification code and the dependency on the library from GNU ld.
>>
>> Rui Ueyama
>
> Thanks for bringing this up again. I support removing the json dependency.
>
> I lightly expressed my concern
> https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> there might be others unsure about the dependency as well.

I'd like to hear bluca's take before making up my mind. Note that it's
also automagic right now IIRC (enabled if installed, not opt-in).

But my take on it so far is that it doesn't sound worth it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01  9:39   ` Sam James
@ 2024-04-01 10:38     ` Luca Boccassi
  2024-04-01 10:44       ` Sam James
  2024-04-01 11:44       ` Rui Ueyama
  0 siblings, 2 replies; 13+ messages in thread
From: Luca Boccassi @ 2024-04-01 10:38 UTC (permalink / raw)
  To: Sam James; +Cc: Fangrui Song, Rui Ueyama, Binutils

On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
>
> Fangrui Song <i@maskray.me> writes:
>
> > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> The recent xz incident demonstrated that supply chain attacks are a
> >> real threat, and dependence on third-party libraries can have
> >> significant consequences.
> >>
> >> In the wake of the incident, I propose we remove the dependency on
> >> libjansson from GNU ld.
> >>
> >> First of all, why does GNU ld depend on libjansson which is a JSON
> >> parsing library? GNU ld gained the `--package-metadata` option in May
> >> 2022 to embed a JSON string into a .note section for package
> >> management for Fedora and other Linux distributions. At the same time,
> >> the dependency on libjansson, a library for parsing JSON-format
> >> strings, was introduced to validate an argument for that option. If an
> >> argument is not a valid JSON string, ld reports an error. If the
> >> library is unavailable, or if `--disable-jansson` was passed to the
> >> configure script, the library will not be linked and the error check
> >> will be disabled. By default, the library will be linked if it exists.
> >>
> >> I opposed adding an extra dependency to GNU ld just for string
> >> verification purposes because it didn't seem worth adding extra
> >> dependency to the linker. LLVM lld and the mold linker also support
> >> the option, but they do not verify if the argument is a valid JSON
> >> string -- they simply treat it as an opaque string. If libjansson is
> >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> >> verification is not trustworthy, and the reader must be prepared for a
> >> malformed JSON string when reading a .note section. Moreover,
> >> verifying a string is straightforward without the feature; you can
> >> simply `echo` the string to pipe it to `jq` for verification before
> >> passing it to GNU ld.
> >>
> >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> >> released this month, and the dependency on libjansson was indeed
> >> present.
> >>
> >> How much risk does it pose? Probably not much, as long as the library
> >> is maintained properly. However, the stakes are high; if someone takes
> >> control of the library and introduces malicious code, they could
> >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> >> to build essentially everything, the attacker could in theory gain the
> >> power to not just contaminate a specific program such as openssh, but
> >> every executable in an official Linux distribution image. I think the
> >> risk is not worth taking. I believe we just should remove the string
> >> verification code and the dependency on the library from GNU ld.
> >>
> >> Rui Ueyama
> >
> > Thanks for bringing this up again. I support removing the json dependency.
> >
> > I lightly expressed my concern
> > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> > there might be others unsure about the dependency as well.
>
> I'd like to hear bluca's take before making up my mind. Note that it's
> also automagic right now IIRC (enabled if installed, not opt-in).
>
> But my take on it so far is that it doesn't sound worth it.

$ apt-cache rdepends libjansson4 | wc -l
310

Sorry, but this looks to me like a knee-jerk reaction that misses the
wood for the trees. The xz issue is that an extremely complex and
sophisticated social engineering attack was planned and executed over
multiple years. If it hadn't been on xz, it would have been on another
project. Dropping dependencies left and right without thinking will
not stop the next attack, as there is no shortage of overworked
maintainers to try and abuse. In fact, avoiding common libraries and
everyone reimplementing the wheel _increases_ the attack
opportunities.

The reason for this integration is that finding issues at build time
is cheaper than finding them at runtime or release time. We rely on
this in Fedora, Debian and Ubuntu to ensure valid data is written (in
all packages in the former, in an increasing set of opt-ins in the
latters) - in fact, validating input _before_ shipping binaries
_improves_ security, rather than decreasing it, given the payload is
parsed by a privileged component.

So if it was removed upstream, we'd just have to patch it back in
downstream in each interested distro - which if you recall, was
exactly how sshd was exploited, due to the lack of an upstream common
implementation. This is opt-in, so distributions that are not
interested can just avoid enabling it, with no extra effort required.
So please, leave it as-is. Thank you.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01 10:38     ` Luca Boccassi
@ 2024-04-01 10:44       ` Sam James
  2024-04-01 11:44       ` Rui Ueyama
  1 sibling, 0 replies; 13+ messages in thread
From: Sam James @ 2024-04-01 10:44 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Fangrui Song, Rui Ueyama, Binutils

Luca Boccassi <bluca@debian.org> writes:

> On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
>>
>> Fangrui Song <i@maskray.me> writes:
>>
>> > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> The recent xz incident demonstrated that supply chain attacks are a
>> >> real threat, and dependence on third-party libraries can have
>> >> significant consequences.
>> >>
>> >> In the wake of the incident, I propose we remove the dependency on
>> >> libjansson from GNU ld.
>> >>
>> >> First of all, why does GNU ld depend on libjansson which is a JSON
>> >> parsing library? GNU ld gained the `--package-metadata` option in May
>> >> 2022 to embed a JSON string into a .note section for package
>> >> management for Fedora and other Linux distributions. At the same time,
>> >> the dependency on libjansson, a library for parsing JSON-format
>> >> strings, was introduced to validate an argument for that option. If an
>> >> argument is not a valid JSON string, ld reports an error. If the
>> >> library is unavailable, or if `--disable-jansson` was passed to the
>> >> configure script, the library will not be linked and the error check
>> >> will be disabled. By default, the library will be linked if it exists.
>> >>
>> >> I opposed adding an extra dependency to GNU ld just for string
>> >> verification purposes because it didn't seem worth adding extra
>> >> dependency to the linker. LLVM lld and the mold linker also support
>> >> the option, but they do not verify if the argument is a valid JSON
>> >> string -- they simply treat it as an opaque string. If libjansson is
>> >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
>> >> verification is not trustworthy, and the reader must be prepared for a
>> >> malformed JSON string when reading a .note section. Moreover,
>> >> verifying a string is straightforward without the feature; you can
>> >> simply `echo` the string to pipe it to `jq` for verification before
>> >> passing it to GNU ld.
>> >>
>> >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
>> >> released this month, and the dependency on libjansson was indeed
>> >> present.
>> >>
>> >> How much risk does it pose? Probably not much, as long as the library
>> >> is maintained properly. However, the stakes are high; if someone takes
>> >> control of the library and introduces malicious code, they could
>> >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
>> >> to build essentially everything, the attacker could in theory gain the
>> >> power to not just contaminate a specific program such as openssh, but
>> >> every executable in an official Linux distribution image. I think the
>> >> risk is not worth taking. I believe we just should remove the string
>> >> verification code and the dependency on the library from GNU ld.
>> >>
>> >> Rui Ueyama
>> >
>> > Thanks for bringing this up again. I support removing the json dependency.
>> >
>> > I lightly expressed my concern
>> > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
>> > there might be others unsure about the dependency as well.
>>
>> I'd like to hear bluca's take before making up my mind. Note that it's
>> also automagic right now IIRC (enabled if installed, not opt-in).
>>
>> But my take on it so far is that it doesn't sound worth it.
> [...]
> So if it was removed upstream, we'd just have to patch it back in
> downstream in each interested distro - which if you recall, was
> exactly how sshd was exploited, due to the lack of an upstream common
> implementation. This is opt-in, so distributions that are not
> interested can just avoid enabling it, with no extra effort required.
> So please, leave it as-is. Thank you.

(Yeah, I just checked, it's indeed off-by-default and I'm sorry I
misrecalled. Doesn't really bother me then.)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01 10:38     ` Luca Boccassi
  2024-04-01 10:44       ` Sam James
@ 2024-04-01 11:44       ` Rui Ueyama
  2024-04-01 12:16         ` Luca Boccassi
  1 sibling, 1 reply; 13+ messages in thread
From: Rui Ueyama @ 2024-04-01 11:44 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Sam James, Fangrui Song, Binutils

On Mon, Apr 1, 2024 at 7:38 PM Luca Boccassi <bluca@debian.org> wrote:
>
> On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
> >
> > Fangrui Song <i@maskray.me> writes:
> >
> > > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> The recent xz incident demonstrated that supply chain attacks are a
> > >> real threat, and dependence on third-party libraries can have
> > >> significant consequences.
> > >>
> > >> In the wake of the incident, I propose we remove the dependency on
> > >> libjansson from GNU ld.
> > >>
> > >> First of all, why does GNU ld depend on libjansson which is a JSON
> > >> parsing library? GNU ld gained the `--package-metadata` option in May
> > >> 2022 to embed a JSON string into a .note section for package
> > >> management for Fedora and other Linux distributions. At the same time,
> > >> the dependency on libjansson, a library for parsing JSON-format
> > >> strings, was introduced to validate an argument for that option. If an
> > >> argument is not a valid JSON string, ld reports an error. If the
> > >> library is unavailable, or if `--disable-jansson` was passed to the
> > >> configure script, the library will not be linked and the error check
> > >> will be disabled. By default, the library will be linked if it exists.
> > >>
> > >> I opposed adding an extra dependency to GNU ld just for string
> > >> verification purposes because it didn't seem worth adding extra
> > >> dependency to the linker. LLVM lld and the mold linker also support
> > >> the option, but they do not verify if the argument is a valid JSON
> > >> string -- they simply treat it as an opaque string. If libjansson is
> > >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> > >> verification is not trustworthy, and the reader must be prepared for a
> > >> malformed JSON string when reading a .note section. Moreover,
> > >> verifying a string is straightforward without the feature; you can
> > >> simply `echo` the string to pipe it to `jq` for verification before
> > >> passing it to GNU ld.
> > >>
> > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> > >> released this month, and the dependency on libjansson was indeed
> > >> present.
> > >>
> > >> How much risk does it pose? Probably not much, as long as the library
> > >> is maintained properly. However, the stakes are high; if someone takes
> > >> control of the library and introduces malicious code, they could
> > >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> > >> to build essentially everything, the attacker could in theory gain the
> > >> power to not just contaminate a specific program such as openssh, but
> > >> every executable in an official Linux distribution image. I think the
> > >> risk is not worth taking. I believe we just should remove the string
> > >> verification code and the dependency on the library from GNU ld.
> > >>
> > >> Rui Ueyama
> > >
> > > Thanks for bringing this up again. I support removing the json dependency.
> > >
> > > I lightly expressed my concern
> > > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> > > there might be others unsure about the dependency as well.
> >
> > I'd like to hear bluca's take before making up my mind. Note that it's
> > also automagic right now IIRC (enabled if installed, not opt-in).
> >
> > But my take on it so far is that it doesn't sound worth it.
>
> $ apt-cache rdepends libjansson4 | wc -l
> 310
>
> Sorry, but this looks to me like a knee-jerk reaction that misses the
> wood for the trees. The xz issue is that an extremely complex and
> sophisticated social engineering attack was planned and executed over
> multiple years. If it hadn't been on xz, it would have been on another
> project. Dropping dependencies left and right without thinking will
> not stop the next attack, as there is no shortage of overworked
> maintainers to try and abuse. In fact, avoiding common libraries and
> everyone reimplementing the wheel _increases_ the attack
> opportunities.
>
> The reason for this integration is that finding issues at build time
> is cheaper than finding them at runtime or release time. We rely on
> this in Fedora, Debian and Ubuntu to ensure valid data is written (in
> all packages in the former, in an increasing set of opt-ins in the
> latters) - in fact, validating input _before_ shipping binaries
> _improves_ security, rather than decreasing it, given the payload is
> parsed by a privileged component.
>
> So if it was removed upstream, we'd just have to patch it back in
> downstream in each interested distro - which if you recall, was
> exactly how sshd was exploited, due to the lack of an upstream common
> implementation. This is opt-in, so distributions that are not
> interested can just avoid enabling it, with no extra effort required.
> So please, leave it as-is. Thank you.

I'd like to say that that's not an instinctive reaction to the xz
incident, as I expressed a concern when the dependency was being
introduced. I also want to say that I'm not trying to drop
dependencies left and right without thinking.

I understand that the error check would be sometimes useful, but I'd
like to propose reevaluating its cost and benefit. IIUC, the
--package-metadata option is, as its name suggests, used by packaging
tools such as dpkg-dev or rpmbuild. My understanding is that this
option is not intended to be used directly by developers at large, but
rather, the JSON string is generated by a very limited set of
packaging tools from package description files and passed to the
linker. If that's the case, the string format check itself is an extra
measure to catch rare string construction bugs, given that JSON
strings are machine-generated and not manually crafted.

While we may still do extra checks for catching possible tooling bugs,
I wonder how difficult it would be to assign this additional
verification to jq or a similar tool instead of the linker. IIRC, dpkg
comes with a tool to scan all executables in a directory to find
packaging errors. Isn't it a better place to run the extra format
check? Would it be too complex to output the .note section contents
after linking and pipe them to jq to confirm the embedded string is a
valid JSON string?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01 11:44       ` Rui Ueyama
@ 2024-04-01 12:16         ` Luca Boccassi
  2024-04-01 13:23           ` Rui Ueyama
  0 siblings, 1 reply; 13+ messages in thread
From: Luca Boccassi @ 2024-04-01 12:16 UTC (permalink / raw)
  To: Rui Ueyama; +Cc: Sam James, Fangrui Song, Binutils

On Mon, 1 Apr 2024 at 12:44, Rui Ueyama <rui314@gmail.com> wrote:
>
> On Mon, Apr 1, 2024 at 7:38 PM Luca Boccassi <bluca@debian.org> wrote:
> >
> > On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
> > >
> > > Fangrui Song <i@maskray.me> writes:
> > >
> > > > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> The recent xz incident demonstrated that supply chain attacks are a
> > > >> real threat, and dependence on third-party libraries can have
> > > >> significant consequences.
> > > >>
> > > >> In the wake of the incident, I propose we remove the dependency on
> > > >> libjansson from GNU ld.
> > > >>
> > > >> First of all, why does GNU ld depend on libjansson which is a JSON
> > > >> parsing library? GNU ld gained the `--package-metadata` option in May
> > > >> 2022 to embed a JSON string into a .note section for package
> > > >> management for Fedora and other Linux distributions. At the same time,
> > > >> the dependency on libjansson, a library for parsing JSON-format
> > > >> strings, was introduced to validate an argument for that option. If an
> > > >> argument is not a valid JSON string, ld reports an error. If the
> > > >> library is unavailable, or if `--disable-jansson` was passed to the
> > > >> configure script, the library will not be linked and the error check
> > > >> will be disabled. By default, the library will be linked if it exists.
> > > >>
> > > >> I opposed adding an extra dependency to GNU ld just for string
> > > >> verification purposes because it didn't seem worth adding extra
> > > >> dependency to the linker. LLVM lld and the mold linker also support
> > > >> the option, but they do not verify if the argument is a valid JSON
> > > >> string -- they simply treat it as an opaque string. If libjansson is
> > > >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> > > >> verification is not trustworthy, and the reader must be prepared for a
> > > >> malformed JSON string when reading a .note section. Moreover,
> > > >> verifying a string is straightforward without the feature; you can
> > > >> simply `echo` the string to pipe it to `jq` for verification before
> > > >> passing it to GNU ld.
> > > >>
> > > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> > > >> released this month, and the dependency on libjansson was indeed
> > > >> present.
> > > >>
> > > >> How much risk does it pose? Probably not much, as long as the library
> > > >> is maintained properly. However, the stakes are high; if someone takes
> > > >> control of the library and introduces malicious code, they could
> > > >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> > > >> to build essentially everything, the attacker could in theory gain the
> > > >> power to not just contaminate a specific program such as openssh, but
> > > >> every executable in an official Linux distribution image. I think the
> > > >> risk is not worth taking. I believe we just should remove the string
> > > >> verification code and the dependency on the library from GNU ld.
> > > >>
> > > >> Rui Ueyama
> > > >
> > > > Thanks for bringing this up again. I support removing the json dependency.
> > > >
> > > > I lightly expressed my concern
> > > > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> > > > there might be others unsure about the dependency as well.
> > >
> > > I'd like to hear bluca's take before making up my mind. Note that it's
> > > also automagic right now IIRC (enabled if installed, not opt-in).
> > >
> > > But my take on it so far is that it doesn't sound worth it.
> >
> > $ apt-cache rdepends libjansson4 | wc -l
> > 310
> >
> > Sorry, but this looks to me like a knee-jerk reaction that misses the
> > wood for the trees. The xz issue is that an extremely complex and
> > sophisticated social engineering attack was planned and executed over
> > multiple years. If it hadn't been on xz, it would have been on another
> > project. Dropping dependencies left and right without thinking will
> > not stop the next attack, as there is no shortage of overworked
> > maintainers to try and abuse. In fact, avoiding common libraries and
> > everyone reimplementing the wheel _increases_ the attack
> > opportunities.
> >
> > The reason for this integration is that finding issues at build time
> > is cheaper than finding them at runtime or release time. We rely on
> > this in Fedora, Debian and Ubuntu to ensure valid data is written (in
> > all packages in the former, in an increasing set of opt-ins in the
> > latters) - in fact, validating input _before_ shipping binaries
> > _improves_ security, rather than decreasing it, given the payload is
> > parsed by a privileged component.
> >
> > So if it was removed upstream, we'd just have to patch it back in
> > downstream in each interested distro - which if you recall, was
> > exactly how sshd was exploited, due to the lack of an upstream common
> > implementation. This is opt-in, so distributions that are not
> > interested can just avoid enabling it, with no extra effort required.
> > So please, leave it as-is. Thank you.
>
> I'd like to say that that's not an instinctive reaction to the xz
> incident, as I expressed a concern when the dependency was being
> introduced. I also want to say that I'm not trying to drop
> dependencies left and right without thinking.

Well, the email did start with:

> The recent xz incident demonstrated that supply chain attacks are a real threat, and dependence on third-party libraries can have significant consequences.

so it should be understandable why that's the impression it gave me.
This feature has been present in binutils for 2 years now, and AFAIK
caused zero issues so far (probably going to jinx it with this
comment, but still).

> I understand that the error check would be sometimes useful, but I'd
> like to propose reevaluating its cost and benefit. IIUC, the
> --package-metadata option is, as its name suggests, used by packaging
> tools such as dpkg-dev or rpmbuild. My understanding is that this
> option is not intended to be used directly by developers at large, but
> rather, the JSON string is generated by a very limited set of
> packaging tools from package description files and passed to the
> linker. If that's the case, the string format check itself is an extra
> measure to catch rare string construction bugs, given that JSON
> strings are machine-generated and not manually crafted.

The metadata comes from each package, so it is very much possible for
errors to appear in one build but not another. And they would be hard
to spot and expensive to fix, as they'd likely only be found when
things go really wrong and the data is needed at runtime, or worse.

> While we may still do extra checks for catching possible tooling bugs,
> I wonder how difficult it would be to assign this additional
> verification to jq or a similar tool instead of the linker. IIRC, dpkg
> comes with a tool to scan all executables in a directory to find
> packaging errors. Isn't it a better place to run the extra format
> check? Would it be too complex to output the .note section contents
> after linking and pipe them to jq to confirm the embedded string is a
> valid JSON string?

There's no such thing in dpkg. You might be thinking of Lintian, but
that's an entirely separate tool, that is optional, runs long after
the package is published and distributed, and is only seldomly checked
(most maintainers just ignore it as it's informational only), and
doesn't stop bad builds from being generated and released. It's also
package-format-specific, so again every distribution would have to
reimplement exactly the same check in different ways, which is just
duplication of work for no gain. The intention here was to have bad
input fail the build immediately, with no extra effort required from
distributions or packagers, and it works very well for that.

This is akin to suggesting that compilers shoudn't output warnings,
but the job should just be left to static analyzers. The earlier
problems are detected, the cheaper they are to fix, and the less
impact they have. This is one of the few truisms of engineering.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01 12:16         ` Luca Boccassi
@ 2024-04-01 13:23           ` Rui Ueyama
  2024-04-02  0:54             ` Luca Boccassi
  0 siblings, 1 reply; 13+ messages in thread
From: Rui Ueyama @ 2024-04-01 13:23 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Sam James, Fangrui Song, Binutils

On Mon, Apr 1, 2024 at 9:16 PM Luca Boccassi <bluca@debian.org> wrote:
>
> On Mon, 1 Apr 2024 at 12:44, Rui Ueyama <rui314@gmail.com> wrote:
> >
> > On Mon, Apr 1, 2024 at 7:38 PM Luca Boccassi <bluca@debian.org> wrote:
> > >
> > > On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
> > > >
> > > > Fangrui Song <i@maskray.me> writes:
> > > >
> > > > > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >> The recent xz incident demonstrated that supply chain attacks are a
> > > > >> real threat, and dependence on third-party libraries can have
> > > > >> significant consequences.
> > > > >>
> > > > >> In the wake of the incident, I propose we remove the dependency on
> > > > >> libjansson from GNU ld.
> > > > >>
> > > > >> First of all, why does GNU ld depend on libjansson which is a JSON
> > > > >> parsing library? GNU ld gained the `--package-metadata` option in May
> > > > >> 2022 to embed a JSON string into a .note section for package
> > > > >> management for Fedora and other Linux distributions. At the same time,
> > > > >> the dependency on libjansson, a library for parsing JSON-format
> > > > >> strings, was introduced to validate an argument for that option. If an
> > > > >> argument is not a valid JSON string, ld reports an error. If the
> > > > >> library is unavailable, or if `--disable-jansson` was passed to the
> > > > >> configure script, the library will not be linked and the error check
> > > > >> will be disabled. By default, the library will be linked if it exists.
> > > > >>
> > > > >> I opposed adding an extra dependency to GNU ld just for string
> > > > >> verification purposes because it didn't seem worth adding extra
> > > > >> dependency to the linker. LLVM lld and the mold linker also support
> > > > >> the option, but they do not verify if the argument is a valid JSON
> > > > >> string -- they simply treat it as an opaque string. If libjansson is
> > > > >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> > > > >> verification is not trustworthy, and the reader must be prepared for a
> > > > >> malformed JSON string when reading a .note section. Moreover,
> > > > >> verifying a string is straightforward without the feature; you can
> > > > >> simply `echo` the string to pipe it to `jq` for verification before
> > > > >> passing it to GNU ld.
> > > > >>
> > > > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> > > > >> released this month, and the dependency on libjansson was indeed
> > > > >> present.
> > > > >>
> > > > >> How much risk does it pose? Probably not much, as long as the library
> > > > >> is maintained properly. However, the stakes are high; if someone takes
> > > > >> control of the library and introduces malicious code, they could
> > > > >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> > > > >> to build essentially everything, the attacker could in theory gain the
> > > > >> power to not just contaminate a specific program such as openssh, but
> > > > >> every executable in an official Linux distribution image. I think the
> > > > >> risk is not worth taking. I believe we just should remove the string
> > > > >> verification code and the dependency on the library from GNU ld.
> > > > >>
> > > > >> Rui Ueyama
> > > > >
> > > > > Thanks for bringing this up again. I support removing the json dependency.
> > > > >
> > > > > I lightly expressed my concern
> > > > > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> > > > > there might be others unsure about the dependency as well.
> > > >
> > > > I'd like to hear bluca's take before making up my mind. Note that it's
> > > > also automagic right now IIRC (enabled if installed, not opt-in).
> > > >
> > > > But my take on it so far is that it doesn't sound worth it.
> > >
> > > $ apt-cache rdepends libjansson4 | wc -l
> > > 310
> > >
> > > Sorry, but this looks to me like a knee-jerk reaction that misses the
> > > wood for the trees. The xz issue is that an extremely complex and
> > > sophisticated social engineering attack was planned and executed over
> > > multiple years. If it hadn't been on xz, it would have been on another
> > > project. Dropping dependencies left and right without thinking will
> > > not stop the next attack, as there is no shortage of overworked
> > > maintainers to try and abuse. In fact, avoiding common libraries and
> > > everyone reimplementing the wheel _increases_ the attack
> > > opportunities.
> > >
> > > The reason for this integration is that finding issues at build time
> > > is cheaper than finding them at runtime or release time. We rely on
> > > this in Fedora, Debian and Ubuntu to ensure valid data is written (in
> > > all packages in the former, in an increasing set of opt-ins in the
> > > latters) - in fact, validating input _before_ shipping binaries
> > > _improves_ security, rather than decreasing it, given the payload is
> > > parsed by a privileged component.
> > >
> > > So if it was removed upstream, we'd just have to patch it back in
> > > downstream in each interested distro - which if you recall, was
> > > exactly how sshd was exploited, due to the lack of an upstream common
> > > implementation. This is opt-in, so distributions that are not
> > > interested can just avoid enabling it, with no extra effort required.
> > > So please, leave it as-is. Thank you.
> >
> > I'd like to say that that's not an instinctive reaction to the xz
> > incident, as I expressed a concern when the dependency was being
> > introduced. I also want to say that I'm not trying to drop
> > dependencies left and right without thinking.
>
> Well, the email did start with:
>
> > The recent xz incident demonstrated that supply chain attacks are a real threat, and dependence on third-party libraries can have significant consequences.
>
> so it should be understandable why that's the impression it gave me.
> This feature has been present in binutils for 2 years now, and AFAIK
> caused zero issues so far (probably going to jinx it with this
> comment, but still).
>
> > I understand that the error check would be sometimes useful, but I'd
> > like to propose reevaluating its cost and benefit. IIUC, the
> > --package-metadata option is, as its name suggests, used by packaging
> > tools such as dpkg-dev or rpmbuild. My understanding is that this
> > option is not intended to be used directly by developers at large, but
> > rather, the JSON string is generated by a very limited set of
> > packaging tools from package description files and passed to the
> > linker. If that's the case, the string format check itself is an extra
> > measure to catch rare string construction bugs, given that JSON
> > strings are machine-generated and not manually crafted.
>
> The metadata comes from each package, so it is very much possible for
> errors to appear in one build but not another. And they would be hard
> to spot and expensive to fix, as they'd likely only be found when
> things go really wrong and the data is needed at runtime, or worse.

Correct me if I'm wrong, but I believe the main concern in creating a
malformed JSON string is forgetting to escape certain characters when
embedding it in a JSON string. Correctly handling it shouldn't be too
difficult.

I understand that it is sometimes hard to notice an error in the .note
section as it's just embedded data sitting there, and errors in the
section would often be noticed exactly in the situation when you
really need it. But if you are very cautious about the sanity of the
.note section, why would you stop here and only focus on catching JSON
format errors? If you do not trust your own tool and aim to catch bugs
before its generated data goes into production, you should read the
.note section back and compare the JSON contents with its package
metadata file. There are tons of different ways for the tool to
semantically go wrong such as missing field or something like that,
and just catching JSON format errors (or, maybe just string escaping
bugs) seems insufficient to me. And if you think that that kind of
error would be unlikely, then I wonder why you are worried about the
JSON format error.

> > While we may still do extra checks for catching possible tooling bugs,
> > I wonder how difficult it would be to assign this additional
> > verification to jq or a similar tool instead of the linker. IIRC, dpkg
> > comes with a tool to scan all executables in a directory to find
> > packaging errors. Isn't it a better place to run the extra format
> > check? Would it be too complex to output the .note section contents
> > after linking and pipe them to jq to confirm the embedded string is a
> > valid JSON string?
>
> There's no such thing in dpkg. You might be thinking of Lintian, but
> that's an entirely separate tool, that is optional, runs long after
> the package is published and distributed, and is only seldomly checked
> (most maintainers just ignore it as it's informational only), and
> doesn't stop bad builds from being generated and released. It's also
> package-format-specific, so again every distribution would have to
> reimplement exactly the same check in different ways, which is just
> duplication of work for no gain. The intention here was to have bad
> input fail the build immediately, with no extra effort required from
> distributions or packagers, and it works very well for that.
>
> This is akin to suggesting that compilers shoudn't output warnings,
> but the job should just be left to static analyzers. The earlier
> problems are detected, the cheaper they are to fix, and the less
> impact they have. This is one of the few truisms of engineering.

If a --package-metadata argument is machine-generated, a better
analogy would be the relationship between the compiler and the linker,
rather than the human programmer and the compiler. The linker
sometimes verifies object file contents, but it doesn't disassemble
the .text section to catch rare assembler bugs.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-01 13:23           ` Rui Ueyama
@ 2024-04-02  0:54             ` Luca Boccassi
  2024-04-02  9:40               ` Rui Ueyama
  0 siblings, 1 reply; 13+ messages in thread
From: Luca Boccassi @ 2024-04-02  0:54 UTC (permalink / raw)
  To: Rui Ueyama; +Cc: Sam James, Fangrui Song, Binutils

On Mon, 1 Apr 2024 at 14:23, Rui Ueyama <rui314@gmail.com> wrote:
>
> On Mon, Apr 1, 2024 at 9:16 PM Luca Boccassi <bluca@debian.org> wrote:
> >
> > On Mon, 1 Apr 2024 at 12:44, Rui Ueyama <rui314@gmail.com> wrote:
> > >
> > > On Mon, Apr 1, 2024 at 7:38 PM Luca Boccassi <bluca@debian.org> wrote:
> > > >
> > > > On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
> > > > >
> > > > > Fangrui Song <i@maskray.me> writes:
> > > > >
> > > > > > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
> > > > > >>
> > > > > >> Hi,
> > > > > >>
> > > > > >> The recent xz incident demonstrated that supply chain attacks are a
> > > > > >> real threat, and dependence on third-party libraries can have
> > > > > >> significant consequences.
> > > > > >>
> > > > > >> In the wake of the incident, I propose we remove the dependency on
> > > > > >> libjansson from GNU ld.
> > > > > >>
> > > > > >> First of all, why does GNU ld depend on libjansson which is a JSON
> > > > > >> parsing library? GNU ld gained the `--package-metadata` option in May
> > > > > >> 2022 to embed a JSON string into a .note section for package
> > > > > >> management for Fedora and other Linux distributions. At the same time,
> > > > > >> the dependency on libjansson, a library for parsing JSON-format
> > > > > >> strings, was introduced to validate an argument for that option. If an
> > > > > >> argument is not a valid JSON string, ld reports an error. If the
> > > > > >> library is unavailable, or if `--disable-jansson` was passed to the
> > > > > >> configure script, the library will not be linked and the error check
> > > > > >> will be disabled. By default, the library will be linked if it exists.
> > > > > >>
> > > > > >> I opposed adding an extra dependency to GNU ld just for string
> > > > > >> verification purposes because it didn't seem worth adding extra
> > > > > >> dependency to the linker. LLVM lld and the mold linker also support
> > > > > >> the option, but they do not verify if the argument is a valid JSON
> > > > > >> string -- they simply treat it as an opaque string. If libjansson is
> > > > > >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> > > > > >> verification is not trustworthy, and the reader must be prepared for a
> > > > > >> malformed JSON string when reading a .note section. Moreover,
> > > > > >> verifying a string is straightforward without the feature; you can
> > > > > >> simply `echo` the string to pipe it to `jq` for verification before
> > > > > >> passing it to GNU ld.
> > > > > >>
> > > > > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> > > > > >> released this month, and the dependency on libjansson was indeed
> > > > > >> present.
> > > > > >>
> > > > > >> How much risk does it pose? Probably not much, as long as the library
> > > > > >> is maintained properly. However, the stakes are high; if someone takes
> > > > > >> control of the library and introduces malicious code, they could
> > > > > >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> > > > > >> to build essentially everything, the attacker could in theory gain the
> > > > > >> power to not just contaminate a specific program such as openssh, but
> > > > > >> every executable in an official Linux distribution image. I think the
> > > > > >> risk is not worth taking. I believe we just should remove the string
> > > > > >> verification code and the dependency on the library from GNU ld.
> > > > > >>
> > > > > >> Rui Ueyama
> > > > > >
> > > > > > Thanks for bringing this up again. I support removing the json dependency.
> > > > > >
> > > > > > I lightly expressed my concern
> > > > > > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> > > > > > there might be others unsure about the dependency as well.
> > > > >
> > > > > I'd like to hear bluca's take before making up my mind. Note that it's
> > > > > also automagic right now IIRC (enabled if installed, not opt-in).
> > > > >
> > > > > But my take on it so far is that it doesn't sound worth it.
> > > >
> > > > $ apt-cache rdepends libjansson4 | wc -l
> > > > 310
> > > >
> > > > Sorry, but this looks to me like a knee-jerk reaction that misses the
> > > > wood for the trees. The xz issue is that an extremely complex and
> > > > sophisticated social engineering attack was planned and executed over
> > > > multiple years. If it hadn't been on xz, it would have been on another
> > > > project. Dropping dependencies left and right without thinking will
> > > > not stop the next attack, as there is no shortage of overworked
> > > > maintainers to try and abuse. In fact, avoiding common libraries and
> > > > everyone reimplementing the wheel _increases_ the attack
> > > > opportunities.
> > > >
> > > > The reason for this integration is that finding issues at build time
> > > > is cheaper than finding them at runtime or release time. We rely on
> > > > this in Fedora, Debian and Ubuntu to ensure valid data is written (in
> > > > all packages in the former, in an increasing set of opt-ins in the
> > > > latters) - in fact, validating input _before_ shipping binaries
> > > > _improves_ security, rather than decreasing it, given the payload is
> > > > parsed by a privileged component.
> > > >
> > > > So if it was removed upstream, we'd just have to patch it back in
> > > > downstream in each interested distro - which if you recall, was
> > > > exactly how sshd was exploited, due to the lack of an upstream common
> > > > implementation. This is opt-in, so distributions that are not
> > > > interested can just avoid enabling it, with no extra effort required.
> > > > So please, leave it as-is. Thank you.
> > >
> > > I'd like to say that that's not an instinctive reaction to the xz
> > > incident, as I expressed a concern when the dependency was being
> > > introduced. I also want to say that I'm not trying to drop
> > > dependencies left and right without thinking.
> >
> > Well, the email did start with:
> >
> > > The recent xz incident demonstrated that supply chain attacks are a real threat, and dependence on third-party libraries can have significant consequences.
> >
> > so it should be understandable why that's the impression it gave me.
> > This feature has been present in binutils for 2 years now, and AFAIK
> > caused zero issues so far (probably going to jinx it with this
> > comment, but still).
> >
> > > I understand that the error check would be sometimes useful, but I'd
> > > like to propose reevaluating its cost and benefit. IIUC, the
> > > --package-metadata option is, as its name suggests, used by packaging
> > > tools such as dpkg-dev or rpmbuild. My understanding is that this
> > > option is not intended to be used directly by developers at large, but
> > > rather, the JSON string is generated by a very limited set of
> > > packaging tools from package description files and passed to the
> > > linker. If that's the case, the string format check itself is an extra
> > > measure to catch rare string construction bugs, given that JSON
> > > strings are machine-generated and not manually crafted.
> >
> > The metadata comes from each package, so it is very much possible for
> > errors to appear in one build but not another. And they would be hard
> > to spot and expensive to fix, as they'd likely only be found when
> > things go really wrong and the data is needed at runtime, or worse.
>
> Correct me if I'm wrong, but I believe the main concern in creating a
> malformed JSON string is forgetting to escape certain characters when
> embedding it in a JSON string. Correctly handling it shouldn't be too
> difficult.

No, it's about anything that results in an invalid payload.

> I understand that it is sometimes hard to notice an error in the .note
> section as it's just embedded data sitting there, and errors in the
> section would often be noticed exactly in the situation when you
> really need it. But if you are very cautious about the sanity of the
> .note section, why would you stop here and only focus on catching JSON
> format errors? If you do not trust your own tool and aim to catch bugs
> before its generated data goes into production, you should read the
> .note section back and compare the JSON contents with its package
> metadata file. There are tons of different ways for the tool to
> semantically go wrong such as missing field or something like that,
> and just catching JSON format errors (or, maybe just string escaping
> bugs) seems insufficient to me. And if you think that that kind of
> error would be unlikely, then I wonder why you are worried about the
> JSON format error.

No, that is not a problem in reality, I'm not sure where you got that
impression. What really matters is ensuring the payload is valid json,
and the best way to do that is the current setup, because it costs
nothing, it's already implemented, it works out of the box
automatically for any caller regardless of the distribution and the
packaging format, it's inoffensive as it is not enabled by default and
requires opt-in, and has zero drawbacks or issues as evidenced by the
2 years it's been available and used. I don't see why it's necessary
to create non-existing problems out of thin air and break existing
workflows, as a knee-jerk reaction driven by a fundamental
misunderstanding about what happened to a completely separate and
independent project. There is literally nothing to gain and all to
lose.

> > > While we may still do extra checks for catching possible tooling bugs,
> > > I wonder how difficult it would be to assign this additional
> > > verification to jq or a similar tool instead of the linker. IIRC, dpkg
> > > comes with a tool to scan all executables in a directory to find
> > > packaging errors. Isn't it a better place to run the extra format
> > > check? Would it be too complex to output the .note section contents
> > > after linking and pipe them to jq to confirm the embedded string is a
> > > valid JSON string?
> >
> > There's no such thing in dpkg. You might be thinking of Lintian, but
> > that's an entirely separate tool, that is optional, runs long after
> > the package is published and distributed, and is only seldomly checked
> > (most maintainers just ignore it as it's informational only), and
> > doesn't stop bad builds from being generated and released. It's also
> > package-format-specific, so again every distribution would have to
> > reimplement exactly the same check in different ways, which is just
> > duplication of work for no gain. The intention here was to have bad
> > input fail the build immediately, with no extra effort required from
> > distributions or packagers, and it works very well for that.
> >
> > This is akin to suggesting that compilers shoudn't output warnings,
> > but the job should just be left to static analyzers. The earlier
> > problems are detected, the cheaper they are to fix, and the less
> > impact they have. This is one of the few truisms of engineering.
>
> If a --package-metadata argument is machine-generated, a better
> analogy would be the relationship between the compiler and the linker,
> rather than the human programmer and the compiler. The linker
> sometimes verifies object file contents, but it doesn't disassemble
> the .text section to catch rare assembler bugs.

No, it's actually a good analogy. Removing warnings and telling people
to just do post-processing would be equally nonsensical, as warnings
are already implemented, they provide value if one wants to use them
and are harmless if one wants to ignore them, and do not cause any
real issue. Moreover, as already explained, this is not how
distributions and packaging actually work in the real world, there is
no such postprocessing, so it's even more nonsensical as a suggestion.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-02  0:54             ` Luca Boccassi
@ 2024-04-02  9:40               ` Rui Ueyama
  2024-04-02 13:10                 ` Orlando Arias
  0 siblings, 1 reply; 13+ messages in thread
From: Rui Ueyama @ 2024-04-02  9:40 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: Sam James, Fangrui Song, Binutils

We have discussed various topics already, and I don't think there's a
single answer because this is all about engineering tradeoffs.

I'd like to hear from other devs who are following this thread if there are any.

On Tue, Apr 2, 2024 at 9:54 AM Luca Boccassi <bluca@debian.org> wrote:
>
> On Mon, 1 Apr 2024 at 14:23, Rui Ueyama <rui314@gmail.com> wrote:
> >
> > On Mon, Apr 1, 2024 at 9:16 PM Luca Boccassi <bluca@debian.org> wrote:
> > >
> > > On Mon, 1 Apr 2024 at 12:44, Rui Ueyama <rui314@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 1, 2024 at 7:38 PM Luca Boccassi <bluca@debian.org> wrote:
> > > > >
> > > > > On Mon, 1 Apr 2024 at 10:50, Sam James <sam@gentoo.org> wrote:
> > > > > >
> > > > > > Fangrui Song <i@maskray.me> writes:
> > > > > >
> > > > > > > On Sun, Mar 31, 2024 at 8:31 PM Rui Ueyama <rui314@gmail.com> wrote:
> > > > > > >>
> > > > > > >> Hi,
> > > > > > >>
> > > > > > >> The recent xz incident demonstrated that supply chain attacks are a
> > > > > > >> real threat, and dependence on third-party libraries can have
> > > > > > >> significant consequences.
> > > > > > >>
> > > > > > >> In the wake of the incident, I propose we remove the dependency on
> > > > > > >> libjansson from GNU ld.
> > > > > > >>
> > > > > > >> First of all, why does GNU ld depend on libjansson which is a JSON
> > > > > > >> parsing library? GNU ld gained the `--package-metadata` option in May
> > > > > > >> 2022 to embed a JSON string into a .note section for package
> > > > > > >> management for Fedora and other Linux distributions. At the same time,
> > > > > > >> the dependency on libjansson, a library for parsing JSON-format
> > > > > > >> strings, was introduced to validate an argument for that option. If an
> > > > > > >> argument is not a valid JSON string, ld reports an error. If the
> > > > > > >> library is unavailable, or if `--disable-jansson` was passed to the
> > > > > > >> configure script, the library will not be linked and the error check
> > > > > > >> will be disabled. By default, the library will be linked if it exists.
> > > > > > >>
> > > > > > >> I opposed adding an extra dependency to GNU ld just for string
> > > > > > >> verification purposes because it didn't seem worth adding extra
> > > > > > >> dependency to the linker. LLVM lld and the mold linker also support
> > > > > > >> the option, but they do not verify if the argument is a valid JSON
> > > > > > >> string -- they simply treat it as an opaque string. If libjansson is
> > > > > > >> unavailable, even GNU ld doesn't verify arguments. Therefore, the
> > > > > > >> verification is not trustworthy, and the reader must be prepared for a
> > > > > > >> malformed JSON string when reading a .note section. Moreover,
> > > > > > >> verifying a string is straightforward without the feature; you can
> > > > > > >> simply `echo` the string to pipe it to `jq` for verification before
> > > > > > >> passing it to GNU ld.
> > > > > > >>
> > > > > > >> I just checked /usr/bin/ld on Ubuntu 24.04, which is set to be
> > > > > > >> released this month, and the dependency on libjansson was indeed
> > > > > > >> present.
> > > > > > >>
> > > > > > >> How much risk does it pose? Probably not much, as long as the library
> > > > > > >> is maintained properly. However, the stakes are high; if someone takes
> > > > > > >> control of the library and introduces malicious code, they could
> > > > > > >> execute a Ken Thompson-style supply chain attack. Since GNU ld is used
> > > > > > >> to build essentially everything, the attacker could in theory gain the
> > > > > > >> power to not just contaminate a specific program such as openssh, but
> > > > > > >> every executable in an official Linux distribution image. I think the
> > > > > > >> risk is not worth taking. I believe we just should remove the string
> > > > > > >> verification code and the dependency on the library from GNU ld.
> > > > > > >>
> > > > > > >> Rui Ueyama
> > > > > > >
> > > > > > > Thanks for bringing this up again. I support removing the json dependency.
> > > > > > >
> > > > > > > I lightly expressed my concern
> > > > > > > https://sourceware.org/pipermail/binutils/2022-May/120846.html and
> > > > > > > there might be others unsure about the dependency as well.
> > > > > >
> > > > > > I'd like to hear bluca's take before making up my mind. Note that it's
> > > > > > also automagic right now IIRC (enabled if installed, not opt-in).
> > > > > >
> > > > > > But my take on it so far is that it doesn't sound worth it.
> > > > >
> > > > > $ apt-cache rdepends libjansson4 | wc -l
> > > > > 310
> > > > >
> > > > > Sorry, but this looks to me like a knee-jerk reaction that misses the
> > > > > wood for the trees. The xz issue is that an extremely complex and
> > > > > sophisticated social engineering attack was planned and executed over
> > > > > multiple years. If it hadn't been on xz, it would have been on another
> > > > > project. Dropping dependencies left and right without thinking will
> > > > > not stop the next attack, as there is no shortage of overworked
> > > > > maintainers to try and abuse. In fact, avoiding common libraries and
> > > > > everyone reimplementing the wheel _increases_ the attack
> > > > > opportunities.
> > > > >
> > > > > The reason for this integration is that finding issues at build time
> > > > > is cheaper than finding them at runtime or release time. We rely on
> > > > > this in Fedora, Debian and Ubuntu to ensure valid data is written (in
> > > > > all packages in the former, in an increasing set of opt-ins in the
> > > > > latters) - in fact, validating input _before_ shipping binaries
> > > > > _improves_ security, rather than decreasing it, given the payload is
> > > > > parsed by a privileged component.
> > > > >
> > > > > So if it was removed upstream, we'd just have to patch it back in
> > > > > downstream in each interested distro - which if you recall, was
> > > > > exactly how sshd was exploited, due to the lack of an upstream common
> > > > > implementation. This is opt-in, so distributions that are not
> > > > > interested can just avoid enabling it, with no extra effort required.
> > > > > So please, leave it as-is. Thank you.
> > > >
> > > > I'd like to say that that's not an instinctive reaction to the xz
> > > > incident, as I expressed a concern when the dependency was being
> > > > introduced. I also want to say that I'm not trying to drop
> > > > dependencies left and right without thinking.
> > >
> > > Well, the email did start with:
> > >
> > > > The recent xz incident demonstrated that supply chain attacks are a real threat, and dependence on third-party libraries can have significant consequences.
> > >
> > > so it should be understandable why that's the impression it gave me.
> > > This feature has been present in binutils for 2 years now, and AFAIK
> > > caused zero issues so far (probably going to jinx it with this
> > > comment, but still).
> > >
> > > > I understand that the error check would be sometimes useful, but I'd
> > > > like to propose reevaluating its cost and benefit. IIUC, the
> > > > --package-metadata option is, as its name suggests, used by packaging
> > > > tools such as dpkg-dev or rpmbuild. My understanding is that this
> > > > option is not intended to be used directly by developers at large, but
> > > > rather, the JSON string is generated by a very limited set of
> > > > packaging tools from package description files and passed to the
> > > > linker. If that's the case, the string format check itself is an extra
> > > > measure to catch rare string construction bugs, given that JSON
> > > > strings are machine-generated and not manually crafted.
> > >
> > > The metadata comes from each package, so it is very much possible for
> > > errors to appear in one build but not another. And they would be hard
> > > to spot and expensive to fix, as they'd likely only be found when
> > > things go really wrong and the data is needed at runtime, or worse.
> >
> > Correct me if I'm wrong, but I believe the main concern in creating a
> > malformed JSON string is forgetting to escape certain characters when
> > embedding it in a JSON string. Correctly handling it shouldn't be too
> > difficult.
>
> No, it's about anything that results in an invalid payload.
>
> > I understand that it is sometimes hard to notice an error in the .note
> > section as it's just embedded data sitting there, and errors in the
> > section would often be noticed exactly in the situation when you
> > really need it. But if you are very cautious about the sanity of the
> > .note section, why would you stop here and only focus on catching JSON
> > format errors? If you do not trust your own tool and aim to catch bugs
> > before its generated data goes into production, you should read the
> > .note section back and compare the JSON contents with its package
> > metadata file. There are tons of different ways for the tool to
> > semantically go wrong such as missing field or something like that,
> > and just catching JSON format errors (or, maybe just string escaping
> > bugs) seems insufficient to me. And if you think that that kind of
> > error would be unlikely, then I wonder why you are worried about the
> > JSON format error.
>
> No, that is not a problem in reality, I'm not sure where you got that
> impression. What really matters is ensuring the payload is valid json,
> and the best way to do that is the current setup, because it costs
> nothing, it's already implemented, it works out of the box
> automatically for any caller regardless of the distribution and the
> packaging format, it's inoffensive as it is not enabled by default and
> requires opt-in, and has zero drawbacks or issues as evidenced by the
> 2 years it's been available and used. I don't see why it's necessary
> to create non-existing problems out of thin air and break existing
> workflows, as a knee-jerk reaction driven by a fundamental
> misunderstanding about what happened to a completely separate and
> independent project. There is literally nothing to gain and all to
> lose.
>
> > > > While we may still do extra checks for catching possible tooling bugs,
> > > > I wonder how difficult it would be to assign this additional
> > > > verification to jq or a similar tool instead of the linker. IIRC, dpkg
> > > > comes with a tool to scan all executables in a directory to find
> > > > packaging errors. Isn't it a better place to run the extra format
> > > > check? Would it be too complex to output the .note section contents
> > > > after linking and pipe them to jq to confirm the embedded string is a
> > > > valid JSON string?
> > >
> > > There's no such thing in dpkg. You might be thinking of Lintian, but
> > > that's an entirely separate tool, that is optional, runs long after
> > > the package is published and distributed, and is only seldomly checked
> > > (most maintainers just ignore it as it's informational only), and
> > > doesn't stop bad builds from being generated and released. It's also
> > > package-format-specific, so again every distribution would have to
> > > reimplement exactly the same check in different ways, which is just
> > > duplication of work for no gain. The intention here was to have bad
> > > input fail the build immediately, with no extra effort required from
> > > distributions or packagers, and it works very well for that.
> > >
> > > This is akin to suggesting that compilers shoudn't output warnings,
> > > but the job should just be left to static analyzers. The earlier
> > > problems are detected, the cheaper they are to fix, and the less
> > > impact they have. This is one of the few truisms of engineering.
> >
> > If a --package-metadata argument is machine-generated, a better
> > analogy would be the relationship between the compiler and the linker,
> > rather than the human programmer and the compiler. The linker
> > sometimes verifies object file contents, but it doesn't disassemble
> > the .text section to catch rare assembler bugs.
>
> No, it's actually a good analogy. Removing warnings and telling people
> to just do post-processing would be equally nonsensical, as warnings
> are already implemented, they provide value if one wants to use them
> and are harmless if one wants to ignore them, and do not cause any
> real issue. Moreover, as already explained, this is not how
> distributions and packaging actually work in the real world, there is
> no such postprocessing, so it's even more nonsensical as a suggestion.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-02  9:40               ` Rui Ueyama
@ 2024-04-02 13:10                 ` Orlando Arias
  2024-04-03 14:58                   ` Michael Matz
  0 siblings, 1 reply; 13+ messages in thread
From: Orlando Arias @ 2024-04-02 13:10 UTC (permalink / raw)
  To: binutils

Greetings,

On 4/2/24 5:40 AM, Rui Ueyama wrote:
> We have discussed various topics already, and I don't think there's a
> single answer because this is all about engineering tradeoffs.
> 
> I'd like to hear from other devs who are following this thread if there are any.

I am not a developer but I am a security researcher. The dependency as 
it is should be left in for a simple reason: you should always sanitize 
your inputs [1]. If the LLVM stack ignores this precept then they are 
doing it wrong. Why? Because of a little something we call in security a 
weird machine [2]. Parsers are complex beasts and can be made very easy 
to misbehave given errors in the grammar handler.

Your proposal of removing the dependecy to libjansson would require 
binutils to implement a brand new JSON parser to sanitize inputs, which, 
as previously mentioned is something you should be doing. This now 
introduces extra burden on the maintainers as well as the possibility of 
introducing parsing bugs in binutils. Binutils is now worse off and 
possibly vulnerable.

As previously mentioned, you want your tooling to catch errors earlier. 
Binutils is something that is not normally run as a superuser. However, 
the tools that end up processing the JSON metadata, such as the core 
dump handler in systemd can be running at high privileges. Do you want 
to risk it and move the weird machine behavior to that target?

Cheers.

[1] https://xkcd.com/327/
[2] https://en.wikipedia.org/wiki/Weird_machine

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-02 13:10                 ` Orlando Arias
@ 2024-04-03 14:58                   ` Michael Matz
  2024-04-03 15:37                     ` Orlando Arias
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Matz @ 2024-04-03 14:58 UTC (permalink / raw)
  To: Orlando Arias; +Cc: binutils

Hello,

On Tue, 2 Apr 2024, Orlando Arias wrote:

> Greetings,
> 
> On 4/2/24 5:40 AM, Rui Ueyama wrote:
> > We have discussed various topics already, and I don't think there's a
> > single answer because this is all about engineering tradeoffs.
> > 
> > I'd like to hear from other devs who are following this thread if there are
> > any.
> 
> I am not a developer but I am a security researcher. The dependency as 
> it is should be left in for a simple reason: you should always sanitize 
> your inputs [1].

Indeed.  But note that you're saying something else than what you wanted 
to say :)  For ld the input here is "blob of bytes".  That it's actually 
JSON (or claims to be!) is a matter for the processor of these .note 
sections.  _Those_ need to check the contents of them for being proper 
JSON themself.  They cannot rely on ld having produced "correct" .note 
sections anyway.  They could have been produced by bad tools, or 
retroactively be mangled.

So, as such checking in the consumer tools for the .notes cannot be 
avoided the early checking at producer time is a bit wasteful, and from a 
security perspective achieves exactly nothing.

(FWIW, for openSUSE I've disabled this all (or rather, didn't enable it) 
as I don't want libjansson in the bootstrap pkgbuild cycle just for this. 
Supply chain attacks or similar weren't my reason back then, and aren't 
now :) )


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Remove dependency on libjansson
  2024-04-03 14:58                   ` Michael Matz
@ 2024-04-03 15:37                     ` Orlando Arias
  0 siblings, 0 replies; 13+ messages in thread
From: Orlando Arias @ 2024-04-03 15:37 UTC (permalink / raw)
  To: Michael Matz; +Cc: binutils


[-- Attachment #1.1: Type: text/plain, Size: 1171 bytes --]

Greetings,

On 4/3/24 10:58 AM, Michael Matz wrote:
> Indeed.  But note that you're saying something else than what you wanted
> to say :)  For ld the input here is "blob of bytes".  That it's actually
> JSON (or claims to be!) is a matter for the processor of these .note
> sections.  _Those_ need to check the contents of them for being proper
> JSON themself.  They cannot rely on ld having produced "correct" .note
> sections anyway.  They could have been produced by bad tools, or
> retroactively be mangled.
> 
> So, as such checking in the consumer tools for the .notes cannot be
> avoided the early checking at producer time is a bit wasteful, and from a
> security perspective achieves exactly nothing.

Apologies for the confusion. Yes, I wholly agree, checking must be done 
at every stage of the chain. That includes both generation and every 
stage of the way thereafter. The entire chain must be sanitized. As you 
mention, tampering along the way is a possibility. Also, it is very 
possible that different parsers treat the same construct in different 
ways. This goes back to the entire weird machine situation.

Cheers,
Orlando.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 756 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-03 15:37 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-01  3:31 Remove dependency on libjansson Rui Ueyama
2024-04-01  7:28 ` Fangrui Song
2024-04-01  9:39   ` Sam James
2024-04-01 10:38     ` Luca Boccassi
2024-04-01 10:44       ` Sam James
2024-04-01 11:44       ` Rui Ueyama
2024-04-01 12:16         ` Luca Boccassi
2024-04-01 13:23           ` Rui Ueyama
2024-04-02  0:54             ` Luca Boccassi
2024-04-02  9:40               ` Rui Ueyama
2024-04-02 13:10                 ` Orlando Arias
2024-04-03 14:58                   ` Michael Matz
2024-04-03 15:37                     ` Orlando Arias

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).