From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x130.google.com (mail-lf1-x130.google.com [IPv6:2a00:1450:4864:20::130]) by sourceware.org (Postfix) with ESMTPS id D16883858D28 for ; Thu, 5 Jan 2023 22:34:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D16883858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lf1-x130.google.com with SMTP id m6so46667279lfj.11 for ; Thu, 05 Jan 2023 14:34:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=zYJBD+axBp4/rP7s47d6um1nKJIctkYXKNRP2Dw04CM=; b=KvWFcZTxsLxwU6kYgtlPfI76w7RUmVfpQJ46uOkv2sZdVdkOEuc7/HoOGnqcI+Obr5 wQVWTxf3fKhPL3i97lj/jT7kNde8TFxlB35Ye4wOykejzGKC4+xwvdIbXkd8U+xRkX0r 7qxCmJOs0pJDJ9OSzjRKHuwODGNVsqqwO86QRBXwBrqisWxd4t31p8LB89reast6dIKk 27+mH36dhb2eh9KUN16K7w5bV25gfQiKFQQlOi1vhn0BqJoB+MWo31vdqhT9Zuj+AvlC ApwiaJEAzMKjGw0A6i5ICTH8buwdA/NzsK6Pww1i/dOhdsPIMu2SGYhg0MaSmerUPiyc 3XSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=zYJBD+axBp4/rP7s47d6um1nKJIctkYXKNRP2Dw04CM=; b=RP5T/i9M0RGufpHLgdDelMVeYjSijSq8vffusGkSIMap5aGLnvz3A8iozAPRZG98ss BYliTuortLTujp4UU//uc6zv4klm1Dkh0gbJL7C5tg/8+VyL+vVskIJlgdiwNrhd/9mr eGJjX1vqh8lXytgvs1bBooWsV5Bww/nn7ag9mQfiudPI4tXSJdbg1FOyCz5RNIqt+cdN QUIrbr1nzMUSpnsOf4hY81+7RL/e30huquZgVqHykhr9gU1L2uo89S0sa9LXrplkT0As sf2u1OG8awcw0+NNDEIZV4+Prcikcw6T2/ClW3jWsazItbHDbLzlZo5UxltdIoBJZnK7 WnLw== X-Gm-Message-State: AFqh2kof6YkffnLrEcZf40AbTljGIDH6R+YAOgs59G4gIJWur2okIdEJ Cv3Qgo55HDBfpxiU8GtyzyL5blS+ofHkuoqBd3PfxO9k1V0= X-Google-Smtp-Source: AMrXdXsnpikxTKeN8ik8rbyodicrHh1dEBO6Tenb7txSpJHdi9gP0lF5iIuA1/AXHFNiDNmW0TUW4kFNfvgSzUBLlc0= X-Received: by 2002:a05:6512:6cb:b0:4cc:53bf:d5f4 with SMTP id u11-20020a05651206cb00b004cc53bfd5f4mr429826lff.600.1672958093078; Thu, 05 Jan 2023 14:34:53 -0800 (PST) MIME-Version: 1.0 References: <298deff4873718eb8b5e366007e8f2d4baa83bdb.camel@redhat.com> <20221117212115.GA38914@ldh-imac.local> In-Reply-To: <20221117212115.GA38914@ldh-imac.local> From: Lewis Hyatt Date: Thu, 5 Jan 2023 17:34:41 -0500 Message-ID: Subject: Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers To: gcc-patches@gcc.gnu.org Cc: David Malcolm Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3035.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Nov 17, 2022 at 4:21 PM Lewis Hyatt wrote: > > On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote: > > On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote: > > [...snip...] > > > > > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc > > > index 5890c18bdc3..2935d7fb236 100644 > > > --- a/gcc/c-family/c-common.cc > > > +++ b/gcc/c-family/c-common.cc > > > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc) > > > const line_map_ordinary *ord_map > > > = LINEMAPS_ORDINARY_MAP_AT (line_table, i); > > > > > > + if (ord_map->reason == LC_GEN) > > > + continue; > > > + > > > if (const line_map_ordinary *from > > > = linemap_included_from_linemap (line_table, ord_map)) > > > /* We cannot use pointer equality, because with preprocessed > > > input all filename strings are unique. */ > > > - if (0 == strcmp (from->to_file, file)) > > > + if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file)) > > > { > > > last_include_ord_map = from; > > > last_ord_map_after_include = NULL; > > > > [...snip...] > > > > I'm not a fan of having the "to_file" field change meaning based on > > whether reason is LC_GEN. > > > > How involved would it be to split line_map_ordinary into two > > subclasses, so that we'd have this hierarchy (with indentation showing > > inheritance): > > > > line_map > > line_map_ordinary > > line_map_ordinary_file > > line_map_ordinary_generated > > line_map_macro > > > > Alternatively, how about renaming "to_file" to be "data" (or "m_data"), > > to emphasize that it might not be a filename, and that we have to check > > everywhere we access that field. > > > > Please can all those checks for LC_GEN go into an inline function so we > > can write e.g. > > map->generated_p () > > or somesuch. > > > > If I reading things right, patch 6 adds the sole usage of this in > > destringize_and_run. Would we ever want to discriminate between > > different kinds of generated buffers? > > > > [...snip...] > > > > > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where) > > > N_("of module"), > > > N_("In module imported at"), /* 6 */ > > > N_("imported at"), > > > + N_("In buffer generated from"), /* 8 */ > > > }; > > > > We use the wording "destringized" in: > > > > so maybe this should be "In buffer destringized from" ??? (I'm not > > sure) > > > > [...snip...] > > > > > diff --git a/gcc/input.cc b/gcc/input.cc > > > index 483cb6e940d..3cf5480551d 100644 > > > --- a/gcc/input.cc > > > +++ b/gcc/input.cc > > > > [..snip...] > > > > > @@ -58,7 +64,7 @@ public: > > > ~file_cache_slot (); > > > > My initial thought reading the input.cc part of this patch was that I > > want it to be very clear when a file_cache_slot is for a real file vs > > when we're replaying generated data. I'd hoped that this could have > > been expressed via inheritance, but we preallocate all the cache slots > > once in an array in file_cache's ctor and the slots get reused over > > time. So instead of that, can we please have some kind of: > > > > bool file_slot_p () const; > > bool generated_slot_p () const; > > > > or somesuch, so that we can have clear assertions and conditionals > > about the current state of a slot (I think the discriminating condition > > is that generated_data_len > 0, right?) > > > > If I'm reading things right, it looks like file_cache_slot::m_file_path > > does double duty after this patch, and is either a filename, or a > > pointer to the generated data. If so, please can the patch rename it, > > and have all usage guarded appropriately. Can it be a union? (or does > > the ctor prevent that?) > > > > [...snip...] > > > > > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count) > > > num_file_slots files are cached. */ > > > > > > file_cache_slot* > > > -file_cache::add_file (const char *file_path) > > > +file_cache::add_file (const char *file_path, unsigned int generated_data_len) > > > > Can we split this into two functions: one for files, and one for > > generated data? (add_file vs add_generated_data?) > > > > > { > > > > > > - FILE *fp = fopen (file_path, "r"); > > > - if (fp == NULL) > > > - return NULL; > > > + FILE *fp; > > > + if (generated_data_len) > > > + fp = NULL; > > > + else > > > + { > > > + fp = fopen (file_path, "r"); > > > + if (fp == NULL) > > > + return NULL; > > > + } > > > > > > unsigned highest_use_count = 0; > > > file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count); > > > - if (!r->create (in_context, file_path, fp, highest_use_count)) > > > + if (!r->create (in_context, file_path, fp, highest_use_count, > > > + generated_data_len)) > > > return NULL; > > > return r; > > > } > > > > [...snip...] > > > > > @@ -535,11 +571,12 @@ file_cache::~file_cache () > > > it. */ > > > > > > file_cache_slot* > > > -file_cache::lookup_or_add_file (const char *file_path) > > > +file_cache::lookup_or_add_file (const char *file_path, > > > + unsigned int generated_data_len) > > > > Likewise, could this be split into: > > lookup_or_add_file > > and > > lookup_or_add_generated > > or somesuch? > > > > > { > > > file_cache_slot *r = lookup_file (file_path); > > > > The patch doesn't seem to touch file_cache::lookup_file. Is the > > current implementation of that ideal (it looks like we're going to be > > doing strcmp of generated buffers, when presumably for those we could > > simply be doing pointer comparisons). > > > > Maybe rename it to lookup_slot? > > > > > if (r == NULL) > > > - r = add_file (file_path); > > > + r = add_file (file_path, generated_data_len); > > > return r; > > > } > > > > > > @@ -547,7 +584,8 @@ file_cache::lookup_or_add_file (const char *file_path) > > > diagnostic. */ > > > > > > file_cache_slot::file_cache_slot () > > > -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0), > > > +: m_use_count (0), m_file_path (NULL), m_fp (NULL), > > > + m_data (0), m_data_active (0), > > > m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0), > > > m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true) > > > { > > > > [...snip...] > > > > > > > > > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h > > > index 50207cacc12..eb281809cbd 100644 > > > --- a/libcpp/include/line-map.h > > > +++ b/libcpp/include/line-map.h > > > @@ -75,6 +75,8 @@ enum lc_reason > > > LC_RENAME_VERBATIM, /* Likewise, but "" != stdin. */ > > > LC_ENTER_MACRO, /* Begin macro expansion. */ > > > LC_MODULE, /* A (C++) Module. */ > > > + LC_GEN, /* Internally generated source. */ > > > + > > > /* FIXME: add support for stringize and paste. */ > > > LC_HWM /* High Water Mark. */ > > > }; > > > @@ -437,7 +439,11 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map { > > > > > > /* Pointer alignment boundary on both 32 and 64-bit systems. */ > > > > > > - const char *to_file; > > > + /* This GTY markup is in case this is an LC_GEN map, in which case > > > + to_file actually points to the generated data, which we do not > > > + want to require to be free of null bytes. */ > > > + const char * GTY((string_length ("%h.to_file_len"))) to_file; > > > + unsigned int to_file_len; > > > linenum_type to_line; > > > > What's the intended interaction between this, the garbage-collector, > > and PCH? Is to_file to be allocated in the GC-managed heap, or can it > > be outside of it? Looking at patch 6 I see that this seems to be > > allocated (in destringize_and_run) by _cpp_unaligned_alloc. I don't > > remember off the top of my head if that's valid. > > > > > > > > /* Location from whence this line map was included. For regular > > > @@ -1101,13 +1107,15 @@ extern line_map *line_map_new_raw (line_maps *, bool, unsigned); > > > at least as long as the lifetime of SET. An empty > > > TO_FILE means standard input. If reason is LC_LEAVE, and > > > TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their > > > - natural values considering the file we are returning to. > > > + natural values considering the file we are returning to. If reason > > > + is LC_GEN, then TO_FILE is not a file name, but rather the actual > > > + content, and TO_FILE_LEN>0 is the length of it. > > > > > > A call to this function can relocate the previous set of > > > maps, so any stored line_map pointers should not be used. */ > > > extern const line_map *linemap_add > > > (class line_maps *, enum lc_reason, unsigned int sysp, > > > - const char *to_file, linenum_type to_line); > > > + const char *to_file, linenum_type to_line, unsigned int to_file_len = 0); > > > > > > /* Create a macro map. A macro map encodes source locations of tokens > > > that are part of a macro replacement-list, at a macro expansion > > > @@ -1304,7 +1312,8 @@ linemap_location_before_p (class line_maps *set, > > > > > > typedef struct > > > { > > > - /* The name of the source file involved. */ > > > + /* The name of the source file involved, or NULL if > > > + generated_data is non-NULL. */ > > > const char *file; > > > > > > /* The line-location in the source file. */ > > > @@ -1316,6 +1325,10 @@ typedef struct > > > > > > /* In a system header?. */ > > > bool sysp; > > > + > > > + /* If generated data, the data and its length. */ > > > + unsigned int generated_data_len; > > > + const char *generated_data; > > > } expanded_location; > > > > Probably worth noting that generated_data can contain NUL bytes, and > > isn't necessarily NUL-terminated. > > > > > > Thanks again for the patch; hope this is constructive > > Dave > > > > Hi Dave- > > Thanks again for taking a look at this one, sorry it's so long. I redid this > patch 4/6 taking into account all of your suggestions. It's attached here. > Now, on the linemap side of things, I renamed the member variable from TO_FILE > to DATA, and created inline accessor functions to get at it. The inline > accessors will assert that the linemap is of the correct type. I checked all > the call sites and adjusted as needed. On the input.cc side of things, I > switched it to use inheritance. The logic for finding and caching lines > resides in a base class, while the two derived classes handle retrieving the > data from the necessary source (a file, or an in-memory buffer). I think it's > much nicer now, please let me know what you think? Thanks! > > BTW, the remaining patches downstream of this one do not need to be modified, > except that the new testcase for 5c/6 (the SARIF output patch) needs one line > changed since the output of -fdump-internal-locations now distinguishes LC_GEN > maps as well. I can resend that and/or any of the others once the dust > settles with this one if that's helpful. > > -Lewis Hello- I thought I might check in on this one to see if you'd still be able to take a look at it sometime please, and if you like the new approach better? It is at https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606616.html , but I am going to re-submit the 4 remaining patches to the list shortly so they are all in one place cleanly with more logical numbering. The other 3 are unchanged from before other than the one-line change to SARIF output. Thanks! -Lewis