Date: Thu, 5 Oct 2023 12:59:21 +0100
From: Sergei Trofimovich
To: Richard Biener
Cc: gcc@gcc.gnu.org
Subject: Re: Scaling -fmacro-prefix-map= to thousands entries
References: <20231004221932.06980e3f@nz>

On Thu, Oct 05, 2023 at 09:19:15AM +0200, Richard Biener wrote:
> On Wed, Oct 4, 2023 at 11:20 PM Sergei Trofimovich via Gcc wrote:
> >
> > Hi gcc developers!
> >
> > Tl;DR:
> >
> > I would like to implement a scalable way to pass `-fmacro-prefix-map=`
> > for `NixOS` distribution to avoid leaking build-time paths generated by
> > `__FILE__` macros used by various libraries.
> >
> > I need some guidance what path to take to be acceptable for `gcc`
> > upstream.
> >
> > I have a few possible solutions and wonder what I should try to upstream
> > to GCC. The options I see:
> >
> > 1. Hardcode NixOS-specific way to mangle paths.
> >
> >    Pros: simplest to implement, can be easily configured away if needed
> >    Cons: inflexible, `clang` might or might not accept the same hack
> >
> > 2. Extend `-fmacro-prefix-map=` (or add a new `-fmacro-prefix-map-file=`)
> >    to allow passing a file
> >
> >    Pros: still not too hard to implement, generic enough to be used in
> >          other contexts.
> >    Cons: Will require client to construct the map file.
> >
> > 3. Have more flexible `-fmacro-prefix-map-regex=` option that allows
> >    patterns. Something like:
> >
> >      -fmacro-prefix-map-regex=/nix/store/[a-z0-9]{32}-=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-
> >
> >    Pros: at least for NixOS one option will be enough to cover all
> >          packages as they all share above template.
> >    Cons: pulls some form of regex with its can of worms including escape
> >          delimiters, might not be flexible enough for other use cases.
> >
> > 4. Something else?
> >
> > Which one(s) should I take to implement?
> >
> > More words:
> >
> > `NixOS` (and `nixpkgs` repository) install every software package into
> > an individual directory with unique prefix. Some examples:
> >
> >   /nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev
> >   /nix/store/rb3q4kcyfg77cmkiwywx2aqdd3x5ch93-libmpc-1.3.1
> >   /nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2
> >   ...
> >
> > It's a fundamental design decision to allow parallel package installs.
> >
> > From dependency tracking standpoint it's highly undesirable to have
> > these absolute paths to be hardcoded into final executable binaries if
> > they are not used at runtime.
> >
> > Example redundant path we would like not to have in final binaries:
> >
> >   $ strings result/bin/nix | grep phjcmy025rd1ankw5y1b21xsdii83cyk
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/json.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/output/serializer.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/conversions/to_chars.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/lexer.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iter_impl.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/json_sax.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iteration_proxy.hpp
> >   /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/parser.hpp
> >
> > Those paths are inserted via glibc's assert() uses of `__FILE__`
> > directive and thus hardcode header file paths from various packages
> > (like lttng-ust or nlohmann/json) into compiled binaries. Sometimes
> > `__FILE__` usage is more creative than assert().
> >
> > I would like to get rid of references to header files. I think
> > `-fmacro-prefix-map=` is ideal for this particular use case.
> >
> > The prototype that creates equivalent of the following commands does
> > work for smaller packages:
> >
> >   -fmacro-prefix-map=/nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.37-8-dev
> >   -fmacro-prefix-map=/nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gmp-with-cxx-6.3.0-dev
> >   -fmacro-prefix-map=/nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-nlohmann_json-3.11.2
> >   ...
> >
> > The above works for small amount of options (like, 100). But around 1000
> > options we start hitting linux limits on the single environment variable
> > on real-world packages like `qemu` with a ton of input depends.
> >
> > The command-line limitations are in various places:
> > - `gcc` limitation of lifting all command line options into a single
> >   environment variable: https://gcc.gnu.org/PR111527
> > - `linux` limitation of constraining single environ variable to a value
> >   way below the full available environment space:
> >   https://lkml.org/lkml/2023/9/24/381
> >
> > `linux` fix would buy us 50x more budget (A Lot) but it will not help
> > much other operating systems like `Darwin` where absolute environment
> > limit is a lot lower than `linux`.
> >
> > I already implemented [1.] in https://github.com/NixOS/nixpkgs/pull/255192
> > (also attached `mangle-NIX_STORE-in-__FILE__.patch` 3.5K patch against
> > `master` as a proof of concept).
> >
> > What would be the best way to scale up `-fmacro-prefix-map=` up to NixOS
> > needs for `gcc`? I would like to implement something sensible I could
> > upstream.
> >
> > What do you think?
>
> Go for (2) which I think is the only way to truly solve the command-line
> limitation issue (with less regular paths even regex wouldn't cut it).

Sounds good. Do you have any preference over specific syntax? My
suggestions would be:

1. `-fmacro-prefix-map=file-name`: if `file-name` there is not in
   `key=val` format then treat it as file
2. `-fmacro-prefix-map=@file-name`: use @ as a signal to use file
3. `-fmacro-prefix-map-file=file-name`: use a new option

> Btw, I thought we have response files to deal with command-line limits,
> why doesn't that work here? I see the driver expands response files
> but IIRC it also builds those when the command-line gets too large
> and uses it for the environment and the cc1 invocation? If it doesn't
> do the latter why not fix it that way?

Yeah, in theory response files would extend the limit. In practice `gcc`
always expands response files internally into a single
`COLLECT_GCC_OPTIONS` variable and hits the environment variable limit
very early: https://gcc.gnu.org/PR111527

Example reproducer:

    $ for i in `seq 1 1000`; do printf -- "-fmacro-prefix-map=%0*d=%0*d\n" 200 1 200 2; done > a.rsp
    $ touch a.c; gcc @a.rsp -c a.c
    gcc: fatal error: cannot execute 'cc1': execv: Argument list too long
    compilation terminated.

And if you want to look at the gory details:

    $ strace -f -etrace=execve -s 1000000 -v -v -v gcc @a.rsp -c a.c
    ...
    [pid 78] execve("cc1", ["cc1", "-quiet", "a.c", "-quiet", "-dumpbase", "a.c", "-dumpbase-ext", ".c", "-mtune=generic", "-march=x86-64", "-fmacro-prefix-map=...=...", "-fmacro-prefix-map=...=...", ...], [..., "COLLECT_GCC=gcc", "COLLECT_GCC_OPTIONS='-fmacro-prefix-map=...=...' '-fmacro-prefix-map=...=...' ... '-c' '-mtune=generic' '-march=x86-64'"]) = -1 E2BIG (Argument list too long)

Note how `gcc` not only expands response file into an argument list
(that is not too bad) but also duplicates the whole list as a single
`COLLECT_GCC_OPTIONS=...` environment variable with added quoting on
top.

Would be nice if `gcc` just passed response files around as is :)

--
Sergei
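
As a rough sanity check on the reproducer above, the sketch below estimates
how large the single `COLLECT_GCC_OPTIONS` string gets and compares it with
Linux's per-string limit for `execve()` arguments and environment entries
(`MAX_ARG_STRLEN`, 32 pages). The 4 KiB page size and the 3 bytes of
overhead per option (two single quotes plus a separating space) are
assumptions made for illustration, not a measurement of what `gcc` emits:

```shell
#!/bin/sh
# Estimate the size of the one environment string that carries all options.

# One option shaped like the reproducer's: two zero-padded 200-digit fields.
opt=$(printf '%s%0200d=%0200d' '-fmacro-prefix-map=' 1 2)

count=1000                        # number of options in a.rsp

# Assumption: each option is wrapped in single quotes and followed by a space.
per_option=$(( ${#opt} + 3 ))
total=$(( count * per_option ))

# MAX_ARG_STRLEN is 32 pages; 4096-byte pages are assumed here.
limit=$(( 32 * 4096 ))

echo "one option:      ${#opt} bytes"
echo "estimated total: $total bytes"
echo "per-string cap:  $limit bytes"
```

Under those assumptions the estimate is 423000 bytes against a 131072-byte
cap, so that one variable alone is roughly three times over the limit, no
matter how much total argument space is still free.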