From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 902C03858C56 for ; Thu, 3 Nov 2022 23:06:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 902C03858C56 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1667516800; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CKZHk8i/38OFZ+E3EHQHmh57Q0iUxy05gZsuJzgLNk0=; b=VrzwV2+5xPDLO2Y+TS27QXGLZo71kzsUIcix4oPC1p/TDN576o/WTrsIYdubGvzz0UnMa8 tmiyHvShE3tjd7+3Qsddxiq0newHLQlUi7nxFiatACO0xvwoloS8kVLMnz9R3O7pU9YRAu gFSD11ksWzkJ73N8czwq+JaFURxz398= Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-513-oSiC5hDYNJq7uXF94KpCJA-1; Thu, 03 Nov 2022 19:06:38 -0400 X-MC-Unique: oSiC5hDYNJq7uXF94KpCJA-1 Received: by mail-qt1-f199.google.com with SMTP id fp9-20020a05622a508900b003a503ff1d4cso2819776qtb.22 for ; Thu, 03 Nov 2022 16:06:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=CKZHk8i/38OFZ+E3EHQHmh57Q0iUxy05gZsuJzgLNk0=; b=cVvc238cPlcNaHxe7/ysDzxbTTr+2uup/lsb1xy7mgYhODj4460Ebxr06LRqBr97p1 aK7R/5XZpjDxmguQ557bQ2kaaRtqhpLHy6cL0MlmenvEXysDMv0F1T0sz7uj6iGaWlMQ MSDGAv2yN6PjnPQUNmYOCobCp3qt2ZwTRQfNgFx3mD5QaxfZscm12wU1iryVW9TKvQ2v TWwmjnOSG6QcTGOzXU2IW1iGuDTRigB21XMQ0lQkOOdvNcPMQ/SrORsbRxi57hHCwI54 d+6P10PezeV/vQjLHAIoDwGD7XxtR7ufI1exGtdJW3nMpNdFDqq618m0OmBQ89mXA9Kq M7hA== X-Gm-Message-State: ACrzQf1T45di9Ph/1K9Y6Sq7a06m5QIQX3B0pvet5KOyd6AVmPyeCHk+ C7GnTQitz+HAhtBZnx+SLc5JtbHLTVS8tSgkoidmcJ6symT1yLOfKBI2CDebDf4Mv+HbFCnMxiM R+VQwzLKF4vN3CSShWg== X-Received: by 2002:a05:6214:29e9:b0:4bc:da6:d604 with SMTP id jv9-20020a05621429e900b004bc0da6d604mr18527765qvb.106.1667516798030; Thu, 03 Nov 2022 16:06:38 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4wpy2UEn667+aejH/qt7ZYxNBfSHhJYjDVALovbrmDC/EJCoOBUSynxWkYrVgGAwYLZYPrQA== X-Received: by 2002:a05:6214:29e9:b0:4bc:da6:d604 with SMTP id jv9-20020a05621429e900b004bc0da6d604mr18527736qvb.106.1667516797773; Thu, 03 Nov 2022 16:06:37 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id u1-20020a05620a0c4100b006ec5238eb97sm1729169qki.83.2022.11.03.16.06.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Nov 2022 16:06:37 -0700 (PDT) Message-ID: <906b1326bd95c094331f7a5ff46723986215e3cf.camel@redhat.com> Subject: Re: [PATCH RFA] input: add get_source_text_between From: David Malcolm To: Jason Merrill , gcc-patches@gcc.gnu.org Date: Thu, 03 Nov 2022 19:06:36 -0400 In-Reply-To: <20221103195902.2114479-1-jason@redhat.com> References: <20221103195902.2114479-1-jason@redhat.com> User-Agent: Evolution 3.44.4 (3.44.4-1.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 2022-11-03 at 15:59 -0400, Jason Merrill via Gcc-patches wrote: > Tested x86_64-pc-linux-gnu, OK for trunk? >=20 > -- >8 -- >=20 > The c++-contracts branch uses this to retrieve the source form of the > contract predicate, to be returned by contract_violation::comment(). >=20 > gcc/ChangeLog: >=20 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* input.cc (get_source_te= xt_between): New fn. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* input.h (get_source_tex= t_between): Declare. > --- > =C2=A0gcc/input.h=C2=A0 |=C2=A0 1 + > =C2=A0gcc/input.cc | 76 > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > =C2=A02 files changed, 77 insertions(+) >=20 > diff --git a/gcc/input.h b/gcc/input.h > index 11c571d076f..f18769950b5 100644 > --- a/gcc/input.h > +++ b/gcc/input.h > @@ -111,6 +111,7 @@ class char_span > =C2=A0}; > =C2=A0 > =C2=A0extern char_span location_get_source_line (const char *file_path, > int line); > +extern char *get_source_text_between (location_t, location_t); > =C2=A0 > =C2=A0extern bool location_missing_trailing_newline (const char > *file_path); > =C2=A0 > diff --git a/gcc/input.cc b/gcc/input.cc > index a28abfac5ac..9b36356338a 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -949,6 +949,82 @@ location_get_source_line (const char *file_path, > int line) > =C2=A0=C2=A0 return char_span (buffer, len); > =C2=A0} > =C2=A0 > +/* Return a copy of the source text between two locations.=C2=A0 The > caller is > +=C2=A0=C2=A0 responsible for freeing the return value.=C2=A0 */ > + > +char * > +get_source_text_between (location_t start, location_t end) > +{ > +=C2=A0 expanded_location expstart =3D > +=C2=A0=C2=A0=C2=A0 expand_location_to_spelling_point (start, > LOCATION_ASPECT_START); > +=C2=A0 expanded_location expend =3D > +=C2=A0=C2=A0=C2=A0 expand_location_to_spelling_point (end, LOCATION_ASPE= CT_FINISH); > + > +=C2=A0 /* If the locations are in different files or the end comes befor= e > the > +=C2=A0=C2=A0=C2=A0=C2=A0 start, abort and return nothing.=C2=A0 */ I don't like the use of the term "abort" here, as it suggests to me the use of abort(3). Maybe "bail out" instead? > +=C2=A0 if (!expstart.file || !expend.file) > +=C2=A0=C2=A0=C2=A0 return NULL; > +=C2=A0 if (strcmp (expstart.file, expend.file) !=3D 0) > +=C2=A0=C2=A0=C2=A0 return NULL; > +=C2=A0 if (expstart.line > expend.line) > +=C2=A0=C2=A0=C2=A0 return NULL; > +=C2=A0 if (expstart.line =3D=3D expend.line > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 && expstart.column > expend.column) > +=C2=A0=C2=A0=C2=A0 return NULL; We occasionally use the convention that (column =3D=3D 0) means "the whole line". Probably should detect that case and bail out early for it. > + > +=C2=A0 /* For a single line we need to trim both edges.=C2=A0 */ > +=C2=A0 if (expstart.line =3D=3D expend.line) > +=C2=A0=C2=A0=C2=A0 { > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 char_span line =3D location_get_source_li= ne (expstart.file, > expstart.line); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (line.length () < 1) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return NULL; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int s =3D expstart.column - 1; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int l =3D expend.column - s; Can we please avoid lower-case L "ell" for variable names, as it looks so similar to the numeral for one. "len" would be more descriptive here. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (line.length () < (size_t)expend.colum= n) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return NULL; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return line.subspan (s, l).xstrdup (); > +=C2=A0=C2=A0=C2=A0 } > + > +=C2=A0 struct obstack buf_obstack; > +=C2=A0 obstack_init (&buf_obstack); > + > +=C2=A0 /* Loop through all lines in the range and append each to buf; ma= y > trim > +=C2=A0=C2=A0=C2=A0=C2=A0 parts of the start and end lines off depending = on column > values.=C2=A0 */ > +=C2=A0 for (int l =3D expstart.line; l <=3D expend.line; ++l) Again, please let's not have a var named "l". Maybe "iter_line" as that's what is being done? > +=C2=A0=C2=A0=C2=A0 { > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 char_span line =3D location_get_source_li= ne (expstart.file, l); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (line.length () < 1 && (l !=3D expstar= t.line && l !=3D > expend.line)) ...especially as I *think* the first comparison is against numeral one, whereas comparisons two and three are against lower-case ell, but I'm having to squint at the font in my email client to be sure :-/ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0continue; > + > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* For the first line in the range, only = start at > expstart.column */ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (l =3D=3D expstart.line) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (expstart.column =3D= =3D 0) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return NULL= ; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (line.length () < (s= ize_t)expstart.column - 1) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return NULL= ; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 line =3D line.subspan (= expstart.column - 1, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 line.length() - expstart.column + 1); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* For the last line, don't go past expen= d.column */ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else if (l =3D=3D expend.line) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (line.length () < (s= ize_t)expend.column) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return NULL= ; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 line =3D line.subspan (= 0, expend.column); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > + > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 obstack_grow (&buf_obstack, line.get_buff= er (), line.length > ()); Is this accumulating the trailing newline characters into the buf_obstack? I *think* it is, but it seems worth a comment for each of the three cases (first line, intermediate line, last line). > +=C2=A0=C2=A0=C2=A0 } > + > +=C2=A0 /* NUL-terminate and finish the buf obstack.=C2=A0 */ > +=C2=A0 obstack_1grow (&buf_obstack, 0); > +=C2=A0 const char *buf =3D (const char *) obstack_finish (&buf_obstack); > + > +=C2=A0 /* TODO should we collapse/trim newlines and runs of spaces?=C2= =A0 */ > +=C2=A0 return xstrdup (buf); > +} > + Do you have test coverage for this from the DejaGnu side? If not, you could add selftest coverage for this; see input.cc's test_reading_source_line for something similar. OK for trunk otherwise. Hope this is helpful Dave