public inbox for gcc@gcc.gnu.org
* Scaling -fmacro-prefix-map= to thousands entries
@ 2023-10-04 21:19 Sergei Trofimovich
  2023-10-05  7:19 ` Richard Biener
  2023-10-05 11:20 ` Ben Boeckel
  0 siblings, 2 replies; 9+ messages in thread
From: Sergei Trofimovich @ 2023-10-04 21:19 UTC
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 5217 bytes --]

Hi gcc developers!

TL;DR:

I would like to implement a scalable way to pass `-fmacro-prefix-map=`
for the `NixOS` distribution, to avoid leaking build-time paths that
`__FILE__` macros used by various libraries embed into binaries.

I need some guidance on which path would be acceptable to `gcc`
upstream.

I have a few possible solutions and wonder what I should try to upstream
to GCC. The options I see:

1. Hardcode a NixOS-specific way to mangle paths.

   Pros: simplest to implement, can be easily configured away if needed
   Cons: inflexible, `clang` might or might not accept the same hack

2. Extend `-fmacro-prefix-map=` (or add a new `-fmacro-prefix-map-file=`)
   to allow passing a file

   Pros: still not too hard to implement, generic enough to be used in
         other contexts.
   Cons: requires the client to construct the map file.

3. Add a more flexible `-fmacro-prefix-map-regex=` option that allows
   patterns. Something like:

      -fmacro-prefix-map-regex=/nix/store/[a-z0-9]{32}-=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-

   Pros: at least for NixOS, one option is enough to cover all
         packages, as they all share the template above.
   Cons: pulls in some form of regex with its own can of worms,
         including escape delimiters; might not be flexible enough for
         other use cases.

4. Something else?

Which one(s) should I implement?
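
To make option 2 more concrete, here is a minimal sketch of how a map
file could be read. Everything in it is an assumption for illustration:
the one-`OLD=NEW`-per-line file format, the `prefix_map` struct and the
`add_map_line` / `load_map_stream` helpers are hypothetical and are not
GCC's actual data structures or API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One OLD=NEW remapping; a hypothetical stand-in for a compiler-internal
   list node. */
struct prefix_map {
  char *old_prefix;
  char *new_prefix;
  struct prefix_map *next;
};

/* Parse one "OLD=NEW" line (split at the first '=') and push it onto
   *maps.  Returns 0 on success, -1 on a malformed line or allocation
   failure. */
int add_map_line(struct prefix_map **maps, const char *line)
{
  assert(maps != NULL && line != NULL);
  size_t len = strcspn(line, "\r\n");      /* ignore the line terminator */
  const char *eq = memchr(line, '=', len);
  if (eq == NULL || eq == line)
    return -1;                             /* no '=' or empty OLD part */

  size_t old_len = (size_t) (eq - line);
  size_t new_len = len - old_len - 1;
  struct prefix_map *m = malloc(sizeof *m);
  if (m == NULL)
    return -1;
  m->old_prefix = malloc(old_len + 1);
  m->new_prefix = malloc(new_len + 1);
  if (m->old_prefix == NULL || m->new_prefix == NULL) {
    free(m->old_prefix);
    free(m->new_prefix);
    free(m);
    return -1;
  }
  memcpy(m->old_prefix, line, old_len);
  m->old_prefix[old_len] = '\0';
  memcpy(m->new_prefix, eq + 1, new_len);
  m->new_prefix[new_len] = '\0';
  m->next = *maps;
  *maps = m;
  return 0;
}

/* Read all non-empty lines from F.  Returns the number of maps added,
   or -1 on the first malformed line.  Lines longer than the buffer are
   not handled in this sketch. */
long load_map_stream(struct prefix_map **maps, FILE *f)
{
  char buf[8192];
  long added = 0;
  assert(maps != NULL && f != NULL);
  while (fgets(buf, sizeof buf, f) != NULL) {
    if (strcspn(buf, "\r\n") == 0)
      continue;                            /* skip blank lines */
    if (add_map_line(maps, buf) != 0)
      return -1;
    added++;
  }
  return added;
}
```

With such a file the command line would carry a single option naming
the file, regardless of how many prefixes it lists.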

More words:

`NixOS` (and the `nixpkgs` repository) installs every software package
into an individual directory with a unique prefix. Some examples:

    /nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev
    /nix/store/rb3q4kcyfg77cmkiwywx2aqdd3x5ch93-libmpc-1.3.1
    /nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2
    ...

It's a fundamental design decision to allow parallel package installs.

From a dependency-tracking standpoint it is highly undesirable to have
these absolute paths hardcoded into final executable binaries if
they are not used at runtime.

Example of redundant paths we would like to keep out of final binaries:

    $ strings result/bin/nix | grep phjcmy025rd1ankw5y1b21xsdii83cyk
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/json.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/output/serializer.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/conversions/to_chars.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/lexer.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iter_impl.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/json_sax.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iteration_proxy.hpp
    /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/parser.hpp

Those paths are inserted via uses of the `__FILE__` macro, such as
glibc's assert(), and thus hardcode header file paths from various
packages (like lttng-ust or nlohmann/json) into compiled binaries.
Sometimes `__FILE__` usage is more creative than assert().

I would like to get rid of references to header files. I think
`-fmacro-prefix-map=` is ideal for this particular use case.
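
A small illustration of the mechanism, as a sketch rather than code
taken from any of the packages named here: the `#line` directive makes
the compiler treat the function as if it came from an installed header.
The store path and the `demo_where` name are made up.

```c
#include <assert.h>
#include <string.h>

/* Pretend the function below lives in an installed header.  The path is
   a made-up store path used only for illustration. */
#line 1 "/nix/store/01234567890123456789012345678901-demo-1.0/include/demo.h"
static inline const char *demo_where(void)
{
  /* assert()-style macros expand __FILE__ in the same way, so the
     header's path becomes a string literal in every object file that
     includes the header and instantiates this code. */
  return __FILE__;
}
```

Every translation unit that calls `demo_where()` ends up carrying the
full header path as a string, which is the kind of reference that
`strings` finds in the final binary.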

A prototype that generates the equivalent of the following options
works for smaller packages:

    -fmacro-prefix-map=/nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.37-8-dev
    -fmacro-prefix-map=/nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gmp-with-cxx-6.3.0-dev
    -fmacro-prefix-map=/nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-nlohmann_json-3.11.2
    ...

The above works for a small number of options (around 100). But at
around 1000 options we start hitting the linux limit on the size of a
single environment variable on real-world packages like `qemu` that
have a ton of input dependencies.

The command-line limitations come from several places:
- a `gcc` limitation: it lifts all command line options into a single
  environment variable: https://gcc.gnu.org/PR111527
- a `linux` limitation: a single environment variable is constrained
  to a size far below the full available environment space:
  https://lkml.org/lkml/2023/9/24/381

A `linux` fix would buy us 50x more budget (a lot), but it would not
help much on other operating systems like `Darwin`, where the absolute
environment limit is a lot lower than on `linux`.
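
A back-of-the-envelope sketch of the budget. It assumes that linux caps
a single argument or environment string at 32 pages (131072 bytes with
4 KiB pages); that figure is not stated above and is brought in here as
an assumption. The helper names are hypothetical, and any quoting the
driver adds around each option is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes one "-fmacro-prefix-map=OLD=NEW" option occupies, plus one
   separator character between options. */
size_t map_option_size(const char *old_prefix, const char *new_prefix)
{
  assert(old_prefix != NULL && new_prefix != NULL);
  return strlen("-fmacro-prefix-map=")
         + strlen(old_prefix) + 1   /* '=' between OLD and NEW */
         + strlen(new_prefix) + 1;  /* separator after the option */
}

/* How many options of PER_OPTION bytes fit into LIMIT bytes. */
size_t max_map_options(size_t limit, size_t per_option)
{
  assert(per_option > 0);
  return limit / per_option;
}
```

For the `nlohmann_json` entry above each prefix is 64 bytes long, so one
option takes 149 bytes and about 879 such options fit into 131072
bytes, which is in line with trouble starting around 1000 options.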

I already implemented [1.] in https://github.com/NixOS/nixpkgs/pull/255192
(also attached `mangle-NIX_STORE-in-__FILE__.patch` 3.5K patch against
`master` as a proof of concept).
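
The core of that mangling, as a standalone sketch loosely modeled on the
attached patch: it uses `malloc` instead of GCC's GC allocator, takes
the store directory as a parameter instead of reading `$NIX_STORE`, and
adds a check for the `/` separator that the patch does not have. The
`mangle_store_path` name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STORE_HASH_LEN 32

/* Return a malloc()ed copy of NAME in which the 32 characters following
   "STORE/" are replaced by 'e'.  Returns NULL if NAME is not located
   under STORE or if allocation fails.  The caller frees the result. */
char *mangle_store_path(const char *store, const char *name)
{
  assert(store != NULL && name != NULL);
  size_t store_len = strlen(store);
  size_t name_len = strlen(name);

  if (name_len < store_len + 1 + STORE_HASH_LEN
      || memcmp(name, store, store_len) != 0
      || name[store_len] != '/')   /* extra check, not in the patch */
    return NULL;

  char *s = malloc(name_len + 1);
  if (s == NULL)
    return NULL;
  memcpy(s, name, name_len + 1);                   /* copy including NUL */
  memset(s + store_len + 1, 'e', STORE_HASH_LEN);  /* overwrite the hash */
  return s;
}
```

The package name and version after the hash are kept, so the mangled
string still tells a human which header it came from while no longer
being a valid store reference.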

What would be the best way to scale `-fmacro-prefix-map=` up to NixOS
needs for `gcc`? I would like to implement something sensible that I
could upstream.

What do you think?

Thanks!

-- 

  Sergei

[-- Attachment #2: mangle-NIX_STORE-in-__FILE__.patch --]
[-- Type: text/x-patch, Size: 3476 bytes --]

From b10785c1be469319a09b10bc69db21159b0599ee Mon Sep 17 00:00:00 2001
From: Sergei Trofimovich <siarheit@google.com>
Date: Fri, 22 Sep 2023 22:41:49 +0100
Subject: [PATCH] gcc/file-prefix-map.cc: always mangle __FILE__ into invalid
 store path

Without the change, `__FILE__` used in static inline functions in headers
embeds paths to header files into executable images. For local headers
it's not a problem, but for headers in `/nix/store` this causes `-dev`
inputs to be retained in the runtime closure.

Typical examples are `nix` -> `nlohmann_json` and `pipewire` ->
`lttng-ust.dev`.

Ideally we would like to use `-fmacro-prefix-map=` feature of `gcc` as:

  -fmacro-prefix-map=/nix/store/$hash1-nlohmann-json-ver=/nix/store/eeee.eee-nlohmann-json-ver
  -fmacro-prefix-map=/nix/...

In practice it quickly exhausts the argument length limit due to a `gcc`
deficiency: https://gcc.gnu.org/PR111527

Until it's fixed, let's hardcode header mangling if the $NIX_STORE
variable is present in the environment.

Tested as:

    $ printf "# 0 \"/nix/store/01234567890123456789012345678901-pppppp-vvvvvvv\" \nconst char * f(void) { return __FILE__; }" | NIX_STORE=/nix/store ./gcc/xgcc -Bgcc -x c - -S -o -
    ...
    .string "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-pppppp-vvvvvvv"
    ...

Mangled successfully.
---
 gcc/file-prefix-map.cc | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/file-prefix-map.cc b/gcc/file-prefix-map.cc
index 0e6db7c142a..da39404b9cd 100644
--- a/gcc/file-prefix-map.cc
+++ b/gcc/file-prefix-map.cc
@@ -69,6 +69,9 @@ add_prefix_map (file_prefix_map *&maps, const char *arg, const char *opt)
   maps = map;
 }
 
+/* Forward declaration for a $NIX_STORE remap hack below. */
+static file_prefix_map *macro_prefix_maps; /* -fmacro-prefix-map  */
+
 /* Perform user-specified mapping of filename prefixes.  Return the
    GC-allocated new name corresponding to FILENAME or FILENAME if no
    remapping was performed.  */
@@ -102,6 +105,29 @@ remap_filename (file_prefix_map *maps, const char *filename)
       break;
   if (!map)
     {
+      if (maps == macro_prefix_maps)
+	{
+	  /* Remap all fo $NIX_STORE/.{32} paths to
+	   * equivalent $NIX_STORE/e{32}.
+	   *
+	   * That way we avoid argument parameters explosion
+	   * and still avoid embedding headers into runtime closure:
+	   *   https://gcc.gnu.org/PR111527
+	   */
+	   char * nix_store = getenv("NIX_STORE");
+	   size_t nix_store_len = nix_store ? strlen(nix_store) : 0;
+	   const char * name = realname ? realname : filename;
+	   size_t name_len = strlen(name);
+	   if (nix_store && name_len >= nix_store_len + 1 + 32 && memcmp(name, nix_store, nix_store_len) == 0)
+	     {
+		s = (char *) ggc_alloc_atomic (name_len + 1);
+		memcpy(s, name, name_len + 1);
+		memset(s + nix_store_len + 1, 'e', 32);
+		if (realname != filename)
+		  free (const_cast <char *> (realname));
+		return s;
+	     }
+	}
       if (realname != filename)
 	free (const_cast <char *> (realname));
       return filename;
@@ -124,7 +150,6 @@ remap_filename (file_prefix_map *maps, const char *filename)
    ignore it in DW_AT_producer (gen_command_line_string in opts.cc).  */
 
 /* Linked lists of file_prefix_map structures.  */
-static file_prefix_map *macro_prefix_maps; /* -fmacro-prefix-map  */
 static file_prefix_map *debug_prefix_maps; /* -fdebug-prefix-map  */
 static file_prefix_map *profile_prefix_maps; /* -fprofile-prefix-map  */
 
-- 
2.42.0


Thread overview: 9+ messages
2023-10-04 21:19 Scaling -fmacro-prefix-map= to thousands entries Sergei Trofimovich
2023-10-05  7:19 ` Richard Biener
2023-10-05 11:59   ` Sergei Trofimovich
2023-10-05 12:05     ` Richard Biener
2023-10-05 12:14     ` Arsen Arsenović
2023-10-06  6:55       ` Richard Biener
2023-10-05 15:59     ` Ben Boeckel
2023-10-05 11:20 ` Ben Boeckel
2023-10-05 12:05   ` Sergei Trofimovich
