From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 054233858D26 for ; Sat, 5 Nov 2022 16:23:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 054233858D26 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1667665412; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dm66VVjDXU0jvpFRB4HeTIg3Bji3tIMjyoInfDjr63s=; b=Dr50yR26HC1Glqvtrkuu49MvVSyuZOjTaghMUljolTN1ya1uJWUmrmTtQq1KRWJlGdeX2b 75N8AmFRVJ/K3Jni5PYp9N5Uvcxtck3GSrFJjdKMLe33S4+bpBPIeqKIo8wh33MGlgf1cT fe33EzEfr3o/VEujFVjhGPvQWC6Zj2w= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-198-HXvMf-4eNsy1JyCPuYJ2wQ-1; Sat, 05 Nov 2022 12:23:30 -0400 X-MC-Unique: HXvMf-4eNsy1JyCPuYJ2wQ-1 Received: by mail-qv1-f71.google.com with SMTP id on28-20020a056214449c00b004bbf12d7976so5128057qvb.18 for ; Sat, 05 Nov 2022 09:23:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=dm66VVjDXU0jvpFRB4HeTIg3Bji3tIMjyoInfDjr63s=; b=M++UQXyL3HGgt/FQ9O5AEvWgid0TiBsGTJZX525+ilIjRS4RMxXSM+AMvR+rcKKjR3 Kgqi4hsjigidRvTkx6N/Oaoa5JBTvTFuX9cwirEf7s0SGPlUXGYJqZxaY2Ks3d85Iu68 IW5nY3D9yhv2wP31GAqTr4AmWL5Yki6E9QBBJdmcvD5vF5BTf3HfB1SVCvbsi0ggHlWo SF1/lFrmONzfBgqVCop1FvVEafXbfvZDKQ4IDdmU93fefsvZY0W6xWq2NmMa6U3uXhVv qlLkLCAr5QLFkjjxs///vYsAMoiXTzd8y9aetZ7zlSyzePKHR5x7CX95FWXamxQJvQoD UUkQ== X-Gm-Message-State: ACrzQf1baIg1mHwXbzSeShVtOurFlab9r7BqmhT2JxgVUbs9XswcYlzL yyIOR4fMuqcBHgeB0XUkJ7BINTnV0JhYpdKEZn6SCc6tPfZAKJEH0rDXs2jxbRGNgeTEWXgDtvl FiJFk3UNv59/w3vaAVA== X-Received: by 2002:a0c:b2c6:0:b0:4bc:22fb:b9c4 with SMTP id d6-20020a0cb2c6000000b004bc22fbb9c4mr18640815qvf.16.1667665410107; Sat, 05 Nov 2022 09:23:30 -0700 (PDT) X-Google-Smtp-Source: AMsMyM485R4h4eJYoBOvzbiAEV/08u2Pzql06XjpEfFGEasZkudVQRaHCHz74MzbFiVNnUqAaMOPlA== X-Received: by 2002:a0c:b2c6:0:b0:4bc:22fb:b9c4 with SMTP id d6-20020a0cb2c6000000b004bc22fbb9c4mr18640791qvf.16.1667665409682; Sat, 05 Nov 2022 09:23:29 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id b14-20020ac844ce000000b003a527d29a41sm2021786qto.75.2022.11.05.09.23.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 05 Nov 2022 09:23:29 -0700 (PDT) Message-ID: <298deff4873718eb8b5e366007e8f2d4baa83bdb.camel@redhat.com> Subject: Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers From: David Malcolm To: Lewis Hyatt , gcc-patches@gcc.gnu.org Date: Sat, 05 Nov 2022 12:23:28 -0400 In-Reply-To: References: User-Agent: Evolution 3.44.4 (3.44.4-1.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote: > Add a new linemap reason LC_GEN which enables encoding the location > of data > that was generated during compilation and does not appear in any > source file. > There could be many use cases, such as, for instance, referring to > the content > of builtin macros (not yet implemented, but an easy lift after this > one.) The > first intended application is to create a place to store the input to > a > _Pragma directive, so that proper locations can be assigned to those > tokens. This will be done in a subsequent commit. >=20 > The actual change needed to the line-maps API in libcpp is very > minimal and > requires no space overhead in the line map data structures (on 64-bit > systems > that is; one newly added data member to class line_map_ordinary sits > inside > former padding bytes.) An LC_GEN map is just an ordinary map like any > other, > but the TO_FILE member that normally points to the file name points > instead to > the actual data.=C2=A0 This works automatically with PCH as well, for the > same > reason that the file name makes its way into a PCH. >=20 > Outside libcpp, there are many small changes but most of them are to > selftests, which are necessarily more sensitive to implementation > details. From the perspective of the user (the "user", here, being a > frontend > using line maps or else the diagnostics infrastructure), the chief > visible > change is that the function location_get_source_line() should be > passed an > expanded_location object instead of a separate filename and line > number.=C2=A0 This > is not a big change because in most cases, this information came > anyway from a > call to expand_location and the needed expanded_location object is > readily > available. The new overload of location_get_source_line() uses the > extra > information in the expanded_location object to obtain the data from > the > in-memory buffer when it originated from an LC_GEN map. >=20 > Until the subsequent patch that starts using LC_GEN maps, none are > yet > generated within GCC, hence nothing is added to the testsuite here; > but all > relevant selftests have been extended to cover generated data maps in > addition to normal files. Thanks for this patch. [...snip...] >=20 > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc > index 5890c18bdc3..2935d7fb236 100644 > --- a/gcc/c-family/c-common.cc > +++ b/gcc/c-family/c-common.cc > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const = char *file, location_t loc) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 const line_map_ordinary *ord_map > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=3D LINEMAPS_ORDINARY_MAP= _AT (line_table, i); > =C2=A0 > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (ord_map->reason =3D=3D LC_GEN) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0continue; > + > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (const line_map_ordinary *from > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D linemap_includ= ed_from_linemap (line_table, ord_map)) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* We cannot use pointer = equality, because with preprocessed > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 input all fi= lename strings are unique.=C2=A0 */ > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (0 =3D=3D strcmp (from->to_= file, file)) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (from->reason !=3D LC_GEN &= & 0 =3D=3D strcmp (from->to_file, file)) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 last_i= nclude_ord_map =3D from; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 last_o= rd_map_after_include =3D NULL; [...snip...] I'm not a fan of having the "to_file" field change meaning based on whether reason is LC_GEN. How involved would it be to split line_map_ordinary into two subclasses, so that we'd have this hierarchy (with indentation showing inheritance): line_map line_map_ordinary line_map_ordinary_file line_map_ordinary_generated line_map_macro Alternatively, how about renaming "to_file" to be "data" (or "m_data"), to emphasize that it might not be a filename, and that we have to check everywhere we access that field. Please can all those checks for LC_GEN go into an inline function so we can write e.g. map->generated_p () or somesuch. If I reading things right, patch 6 adds the sole usage of this in destringize_and_run. Would we ever want to discriminate between different kinds of generated buffers? [...snip...] > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_contex= t *context, location_t where) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 N_("of module"), > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 N_("In module imported at"),=C2=A0=C2=A0=C2=A0/* 6= */ > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 N_("imported at"), > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 N_("In buffer generated from"),=C2=A0=C2=A0 /* 8 */ > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0}; We use the wording "destringized" in: so maybe this should be "In buffer destringized from" ??? (I'm not sure)=20 [...snip...] > diff --git a/gcc/input.cc b/gcc/input.cc > index 483cb6e940d..3cf5480551d 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc [..snip...] > @@ -58,7 +64,7 @@ public: > =C2=A0=C2=A0 ~file_cache_slot (); My initial thought reading the input.cc part of this patch was that I want it to be very clear when a file_cache_slot is for a real file vs when we're replaying generated data. I'd hoped that this could have been expressed via inheritance, but we preallocate all the cache slots once in an array in file_cache's ctor and the slots get reused over time. So instead of that, can we please have some kind of: bool file_slot_p () const; bool generated_slot_p () const; or somesuch, so that we can have clear assertions and conditionals about the current state of a slot (I think the discriminating condition is that generated_data_len > 0, right?) If I'm reading things right, it looks like file_cache_slot::m_file_path does double duty after this patch, and is either a filename, or a pointer to the generated data. If so, please can the patch rename it, and have all usage guarded appropriately. Can it be a union? (or does the ctor prevent that?) [...snip...] =C2=A0 > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *high= est_use_count) > =C2=A0=C2=A0=C2=A0 num_file_slots files are cached.=C2=A0 */ > =C2=A0 > =C2=A0file_cache_slot* > -file_cache::add_file (const char *file_path) > +file_cache::add_file (const char *file_path, unsigned int generated_data= _len) Can we split this into two functions: one for files, and one for generated data? (add_file vs add_generated_data?) > =C2=A0{ > =C2=A0 > -=C2=A0 FILE *fp =3D fopen (file_path, "r"); > -=C2=A0 if (fp =3D=3D NULL) > -=C2=A0=C2=A0=C2=A0 return NULL; > +=C2=A0 FILE *fp; > +=C2=A0 if (generated_data_len) > +=C2=A0=C2=A0=C2=A0 fp =3D NULL; > +=C2=A0 else > +=C2=A0=C2=A0=C2=A0 { > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 fp =3D fopen (file_path, "r"); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (fp =3D=3D NULL) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return NULL; > +=C2=A0=C2=A0=C2=A0 } > =C2=A0 > =C2=A0=C2=A0 unsigned highest_use_count =3D 0; > =C2=A0=C2=A0 file_cache_slot *r =3D evicted_cache_tab_entry (&highest_use= _count); > -=C2=A0 if (!r->create (in_context, file_path, fp, highest_use_count)) > +=C2=A0 if (!r->create (in_context, file_path, fp, highest_use_count, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 generated_data_len)) > =C2=A0=C2=A0=C2=A0=C2=A0 return NULL; > =C2=A0=C2=A0 return r; > =C2=A0} [...snip...] > @@ -535,11 +571,12 @@ file_cache::~file_cache () > =C2=A0=C2=A0=C2=A0 it.=C2=A0 */ > =C2=A0 > =C2=A0file_cache_slot* > -file_cache::lookup_or_add_file (const char *file_path) > +file_cache::lookup_or_add_file (const char *file_path, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0unsigned int generated_data_len) Likewise, could this be split into: lookup_or_add_file and lookup_or_add_generated or somesuch? > =C2=A0{ > =C2=A0=C2=A0 file_cache_slot *r =3D lookup_file (file_path); The patch doesn't seem to touch file_cache::lookup_file. Is the current implementation of that ideal (it looks like we're going to be doing strcmp of generated buffers, when presumably for those we could simply be doing pointer comparisons). Maybe rename it to lookup_slot? > =C2=A0=C2=A0 if (r =3D=3D NULL) > -=C2=A0=C2=A0=C2=A0 r =3D add_file (file_path); > +=C2=A0=C2=A0=C2=A0 r =3D add_file (file_path, generated_data_len); > =C2=A0=C2=A0 return r; > =C2=A0} > =C2=A0 > @@ -547,7 +584,8 @@ file_cache::lookup_or_add_file (const char *file_path= ) > =C2=A0=C2=A0=C2=A0 diagnostic.=C2=A0 */ > =C2=A0 > =C2=A0file_cache_slot::file_cache_slot () > -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0), > +: m_use_count (0), m_file_path (NULL), m_fp (NULL), > +=C2=A0 m_data (0), m_data_active (0), > =C2=A0=C2=A0 m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_= idx (0), > =C2=A0=C2=A0 m_line_num (0), m_total_lines (0), m_missing_trailing_newlin= e (true) > =C2=A0{ [...snip...] > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h > index 50207cacc12..eb281809cbd 100644 > --- a/libcpp/include/line-map.h > +++ b/libcpp/include/line-map.h > @@ -75,6 +75,8 @@ enum lc_reason > =C2=A0=C2=A0 LC_RENAME_VERBATIM,=C2=A0=C2=A0/* Likewise, but "" !=3D stdi= n.=C2=A0 */ > =C2=A0=C2=A0 LC_ENTER_MACRO,=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Begin = macro expansion.=C2=A0 */ > =C2=A0=C2=A0 LC_MODULE,=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0/* A (C++) Module.=C2=A0 */ > +=C2=A0 LC_GEN,=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0/* Internally generated source.=C2=A0 */ > + > =C2=A0=C2=A0 /* FIXME: add support for stringize and paste.=C2=A0 */ > =C2=A0=C2=A0 LC_HWM /* High Water Mark.=C2=A0 */ > =C2=A0}; > @@ -437,7 +439,11 @@ struct GTY((tag ("1"))) line_map_ordinary : public l= ine_map { > =C2=A0 > =C2=A0=C2=A0 /* Pointer alignment boundary on both 32 and 64-bit systems.= =C2=A0 */ > =C2=A0 > -=C2=A0 const char *to_file; > +=C2=A0 /* This GTY markup is in case this is an LC_GEN map, in which cas= e > +=C2=A0=C2=A0=C2=A0=C2=A0 to_file actually points to the generated data, = which we do not > +=C2=A0=C2=A0=C2=A0=C2=A0 want to require to be free of null bytes.=C2=A0= */ > +=C2=A0 const char * GTY((string_length ("%h.to_file_len"))) to_file; > +=C2=A0 unsigned int to_file_len; > =C2=A0=C2=A0 linenum_type to_line; What's the intended interaction between this, the garbage-collector, and PCH? Is to_file to be allocated in the GC-managed heap, or can it be outside of it? Looking at patch 6 I see that this seems to be allocated (in destringize_and_run) by _cpp_unaligned_alloc. I don't remember off the top of my head if that's valid. > =C2=A0 > =C2=A0=C2=A0 /* Location from whence this line map was included.=C2=A0 Fo= r regular > @@ -1101,13 +1107,15 @@ extern line_map *line_map_new_raw (line_maps *, b= ool, unsigned); > =C2=A0=C2=A0=C2=A0 at least as long as the lifetime of SET.=C2=A0 An empt= y > =C2=A0=C2=A0=C2=A0 TO_FILE means standard input.=C2=A0 If reason is LC_LE= AVE, and > =C2=A0=C2=A0=C2=A0 TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are gi= ven their > -=C2=A0=C2=A0 natural values considering the file we are returning to. > +=C2=A0=C2=A0 natural values considering the file we are returning to.=C2= =A0 If reason > +=C2=A0=C2=A0 is LC_GEN, then TO_FILE is not a file name, but rather the = actual > +=C2=A0=C2=A0 content, and TO_FILE_LEN>0 is the length of it. > =C2=A0 > =C2=A0=C2=A0=C2=A0 A call to this function can relocate the previous set = of > =C2=A0=C2=A0=C2=A0 maps, so any stored line_map pointers should not be us= ed.=C2=A0 */ > =C2=A0extern const line_map *linemap_add > =C2=A0=C2=A0 (class line_maps *, enum lc_reason, unsigned int sysp, > -=C2=A0=C2=A0 const char *to_file, linenum_type to_line); > +=C2=A0=C2=A0 const char *to_file, linenum_type to_line, unsigned int to_= file_len =3D 0); > =C2=A0 > =C2=A0/* Create a macro map.=C2=A0 A macro map encodes source locations o= f tokens > =C2=A0=C2=A0=C2=A0 that are part of a macro replacement-list, at a macro = expansion > @@ -1304,7 +1312,8 @@ linemap_location_before_p (class line_maps *set, > =C2=A0 > =C2=A0typedef struct > =C2=A0{ > -=C2=A0 /* The name of the source file involved.=C2=A0 */ > +=C2=A0 /* The name of the source file involved, or NULL if > +=C2=A0=C2=A0=C2=A0=C2=A0 generated_data is non-NULL.=C2=A0 */ > =C2=A0=C2=A0 const char *file; > =C2=A0 > =C2=A0=C2=A0 /* The line-location in the source file.=C2=A0 */ > @@ -1316,6 +1325,10 @@ typedef struct > =C2=A0 > =C2=A0=C2=A0 /* In a system header?. */ > =C2=A0=C2=A0 bool sysp; > + > +=C2=A0 /* If generated data, the data and its length.=C2=A0 */ > +=C2=A0 unsigned int generated_data_len; > +=C2=A0 const char *generated_data; > =C2=A0} expanded_location; Probably worth noting that generated_data can contain NUL bytes, and isn't necessarily NUL-terminated. Thanks again for the patch; hope this is constructive Dave