On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote: > On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote: > [...snip...] > > > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc > > index 5890c18bdc3..2935d7fb236 100644 > > --- a/gcc/c-family/c-common.cc > > +++ b/gcc/c-family/c-common.cc > > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc) > > const line_map_ordinary *ord_map > > = LINEMAPS_ORDINARY_MAP_AT (line_table, i); > > > > + if (ord_map->reason == LC_GEN) > > + continue; > > + > > if (const line_map_ordinary *from > > = linemap_included_from_linemap (line_table, ord_map)) > > /* We cannot use pointer equality, because with preprocessed > > input all filename strings are unique. */ > > - if (0 == strcmp (from->to_file, file)) > > + if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file)) > > { > > last_include_ord_map = from; > > last_ord_map_after_include = NULL; > > [...snip...] > > I'm not a fan of having the "to_file" field change meaning based on > whether reason is LC_GEN. > > How involved would it be to split line_map_ordinary into two > subclasses, so that we'd have this hierarchy (with indentation showing > inheritance): > > line_map > line_map_ordinary > line_map_ordinary_file > line_map_ordinary_generated > line_map_macro > > Alternatively, how about renaming "to_file" to be "data" (or "m_data"), > to emphasize that it might not be a filename, and that we have to check > everywhere we access that field. > > Please can all those checks for LC_GEN go into an inline function so we > can write e.g. > map->generated_p () > or somesuch. > > If I reading things right, patch 6 adds the sole usage of this in > destringize_and_run. Would we ever want to discriminate between > different kinds of generated buffers? > > [...snip...] > > > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where) > > N_("of module"), > > N_("In module imported at"), /* 6 */ > > N_("imported at"), > > + N_("In buffer generated from"), /* 8 */ > > }; > > We use the wording "destringized" in: > > so maybe this should be "In buffer destringized from" ??? (I'm not > sure) > > [...snip...] > > > diff --git a/gcc/input.cc b/gcc/input.cc > > index 483cb6e940d..3cf5480551d 100644 > > --- a/gcc/input.cc > > +++ b/gcc/input.cc > > [..snip...] > > > @@ -58,7 +64,7 @@ public: > > ~file_cache_slot (); > > My initial thought reading the input.cc part of this patch was that I > want it to be very clear when a file_cache_slot is for a real file vs > when we're replaying generated data. I'd hoped that this could have > been expressed via inheritance, but we preallocate all the cache slots > once in an array in file_cache's ctor and the slots get reused over > time. So instead of that, can we please have some kind of: > > bool file_slot_p () const; > bool generated_slot_p () const; > > or somesuch, so that we can have clear assertions and conditionals > about the current state of a slot (I think the discriminating condition > is that generated_data_len > 0, right?) > > If I'm reading things right, it looks like file_cache_slot::m_file_path > does double duty after this patch, and is either a filename, or a > pointer to the generated data. If so, please can the patch rename it, > and have all usage guarded appropriately. Can it be a union? (or does > the ctor prevent that?) > > [...snip...] > > > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count) > > num_file_slots files are cached. */ > > > > file_cache_slot* > > -file_cache::add_file (const char *file_path) > > +file_cache::add_file (const char *file_path, unsigned int generated_data_len) > > Can we split this into two functions: one for files, and one for > generated data? (add_file vs add_generated_data?) > > > { > > > > - FILE *fp = fopen (file_path, "r"); > > - if (fp == NULL) > > - return NULL; > > + FILE *fp; > > + if (generated_data_len) > > + fp = NULL; > > + else > > + { > > + fp = fopen (file_path, "r"); > > + if (fp == NULL) > > + return NULL; > > + } > > > > unsigned highest_use_count = 0; > > file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count); > > - if (!r->create (in_context, file_path, fp, highest_use_count)) > > + if (!r->create (in_context, file_path, fp, highest_use_count, > > + generated_data_len)) > > return NULL; > > return r; > > } > > [...snip...] > > > @@ -535,11 +571,12 @@ file_cache::~file_cache () > > it. */ > > > > file_cache_slot* > > -file_cache::lookup_or_add_file (const char *file_path) > > +file_cache::lookup_or_add_file (const char *file_path, > > + unsigned int generated_data_len) > > Likewise, could this be split into: > lookup_or_add_file > and > lookup_or_add_generated > or somesuch? > > > { > > file_cache_slot *r = lookup_file (file_path); > > The patch doesn't seem to touch file_cache::lookup_file. Is the > current implementation of that ideal (it looks like we're going to be > doing strcmp of generated buffers, when presumably for those we could > simply be doing pointer comparisons). > > Maybe rename it to lookup_slot? > > > if (r == NULL) > > - r = add_file (file_path); > > + r = add_file (file_path, generated_data_len); > > return r; > > } > > > > @@ -547,7 +584,8 @@ file_cache::lookup_or_add_file (const char *file_path) > > diagnostic. */ > > > > file_cache_slot::file_cache_slot () > > -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0), > > +: m_use_count (0), m_file_path (NULL), m_fp (NULL), > > + m_data (0), m_data_active (0), > > m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0), > > m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true) > > { > > [...snip...] > > > > > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h > > index 50207cacc12..eb281809cbd 100644 > > --- a/libcpp/include/line-map.h > > +++ b/libcpp/include/line-map.h > > @@ -75,6 +75,8 @@ enum lc_reason > > LC_RENAME_VERBATIM, /* Likewise, but "" != stdin. */ > > LC_ENTER_MACRO, /* Begin macro expansion. */ > > LC_MODULE, /* A (C++) Module. */ > > + LC_GEN, /* Internally generated source. */ > > + > > /* FIXME: add support for stringize and paste. */ > > LC_HWM /* High Water Mark. */ > > }; > > @@ -437,7 +439,11 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map { > > > > /* Pointer alignment boundary on both 32 and 64-bit systems. */ > > > > - const char *to_file; > > + /* This GTY markup is in case this is an LC_GEN map, in which case > > + to_file actually points to the generated data, which we do not > > + want to require to be free of null bytes. */ > > + const char * GTY((string_length ("%h.to_file_len"))) to_file; > > + unsigned int to_file_len; > > linenum_type to_line; > > What's the intended interaction between this, the garbage-collector, > and PCH? Is to_file to be allocated in the GC-managed heap, or can it > be outside of it? Looking at patch 6 I see that this seems to be > allocated (in destringize_and_run) by _cpp_unaligned_alloc. I don't > remember off the top of my head if that's valid. > > > > > /* Location from whence this line map was included. For regular > > @@ -1101,13 +1107,15 @@ extern line_map *line_map_new_raw (line_maps *, bool, unsigned); > > at least as long as the lifetime of SET. An empty > > TO_FILE means standard input. If reason is LC_LEAVE, and > > TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their > > - natural values considering the file we are returning to. > > + natural values considering the file we are returning to. If reason > > + is LC_GEN, then TO_FILE is not a file name, but rather the actual > > + content, and TO_FILE_LEN>0 is the length of it. > > > > A call to this function can relocate the previous set of > > maps, so any stored line_map pointers should not be used. */ > > extern const line_map *linemap_add > > (class line_maps *, enum lc_reason, unsigned int sysp, > > - const char *to_file, linenum_type to_line); > > + const char *to_file, linenum_type to_line, unsigned int to_file_len = 0); > > > > /* Create a macro map. A macro map encodes source locations of tokens > > that are part of a macro replacement-list, at a macro expansion > > @@ -1304,7 +1312,8 @@ linemap_location_before_p (class line_maps *set, > > > > typedef struct > > { > > - /* The name of the source file involved. */ > > + /* The name of the source file involved, or NULL if > > + generated_data is non-NULL. */ > > const char *file; > > > > /* The line-location in the source file. */ > > @@ -1316,6 +1325,10 @@ typedef struct > > > > /* In a system header?. */ > > bool sysp; > > + > > + /* If generated data, the data and its length. */ > > + unsigned int generated_data_len; > > + const char *generated_data; > > } expanded_location; > > Probably worth noting that generated_data can contain NUL bytes, and > isn't necessarily NUL-terminated. > > > Thanks again for the patch; hope this is constructive > Dave > Hi Dave- Thanks again for taking a look at this one, sorry it's so long. I redid this patch 4/6 taking into account all of your suggestions. It's attached here. Now, on the linemap side of things, I renamed the member variable from TO_FILE to DATA, and created inline accessor functions to get at it. The inline accessors will assert that the linemap is of the correct type. I checked all the call sites and adjusted as needed. On the input.cc side of things, I switched it to use inheritance. The logic for finding and caching lines resides in a base class, while the two derived classes handle retrieving the data from the necessary source (a file, or an in-memory buffer). I think it's much nicer now, please let me know what you think? Thanks! BTW, the remaining patches downstream of this one do not need to be modified, except that the new testcase for 5c/6 (the SARIF output patch) needs one line changed since the output of -fdump-internal-locations now distinguishes LC_GEN maps as well. I can resend that and/or any of the others once the dust settles with this one if that's helpful. -Lewis