From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mailrelay.tugraz.at (mailrelay.tugraz.at [129.27.2.202]) by sourceware.org (Postfix) with ESMTPS id 1593C3858CDB for ; Wed, 25 Oct 2023 10:39:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1593C3858CDB Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=tugraz.at Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tugraz.at ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1593C3858CDB Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=129.27.2.202 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698230379; cv=none; b=l5gRH8kjJ3xj7pbFttxZ6O6Z5bs7rGs34+aCefHBRC75eFsmP/E3ZKmvWHDoAABY4cQqipB2MRLFU2pcVeVUBgQBAfSMuecXB4xR8nJaxc+ouKtq6CqIuLLUQ4wopO3/5SvSz9fGptRPTTmvyUM07ABmvIM/a+D8PrZP8QaqOlw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698230379; c=relaxed/simple; bh=9mOyLTa0Too2ltqs1QcbnlFLQcB9UdxYBHfNzfNPhpg=; h=DKIM-Signature:Message-ID:Subject:From:To:Date:MIME-Version; b=pUUarqQY6nekxKugmsXrpl4TjrxrAcYuncUYhdsqVxAWLWduHMv9+t9GZ7B3QHBJssWUIO97R+pUTtGz1PL2h+qh+uegtIKzsWDyBOgZ7CKLyrxchnI1aP2F5WD3is6cW0T2PPTIm7Tj1K+ExiSUlU4EEJWIpFbmkzqNILc/rZ4= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from fbmtpc21.tugraz.at (fbmtpc21.tugraz.at [129.27.144.40]) by mailrelay.tugraz.at (Postfix) with ESMTPSA id 4SFlmM4dzQz3wXB; Wed, 25 Oct 2023 12:39:31 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tugraz.at; s=mailrelay; t=1698230372; bh=jig3hUgp2g7qarz6Pnr7qWSmMG31hK8ZMhl+hu9QSjk=; h=Subject:From:To:Cc:Date:In-Reply-To:References; b=LsOifo4p5Z5nss3AGaNElr2uz7Ftbh78JECiU1+o//CEKzIpchi+W9N2GnF0fjI2R rs2me+ogoBYLzeEXPD/XQ89PfNNkmqLM433/1Qu5PWI0n/9d0TCbmU9yecR4BndWUV zG0lyIbPFQzCuJBJnJ6+cdu4b3k9D+7aBbQRagZA= Message-ID: <34fba43f91d050bb9f144ecb870f5ab7f14c75aa.camel@tugraz.at> Subject: Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896) From: Martin Uecker To: Richard Biener Cc: Qing Zhao , Siddhesh Poyarekar , Joseph Myers , Jakub Jelinek , gcc Patches , kees Cook , isanbard@gmail.com Date: Wed, 25 Oct 2023 12:39:31 +0200 In-Reply-To: References: <9cdc6bc40cf1d0e701a69e6e37cdffa89a0db690.camel@tugraz.at> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.4-2 MIME-Version: 1.0 X-TUG-Backscatter-control: G/VXY7/6zeyuAY/PU2/0qw X-Spam-Scanner: SpamAssassin 3.003001 X-Spam-Score-relay: -1.9 X-Scanned-By: MIMEDefang 2.74 on 129.27.10.116 X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Am Mittwoch, dem 25.10.2023 um 12:25 +0200 schrieb Richard Biener: >=20 > > Am 25.10.2023 um 10:16 schrieb Martin Uecker : > >=20 > > =EF=BB=BFAm Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Bie= ner: > > >=20 > > > > > Am 24.10.2023 um 22:38 schrieb Martin Uecker : > > > >=20 > > > > =EF=BB=BFAm Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zh= ao: > > > > > Hi, Sid, > > > > >=20 > > > > > Really appreciate for your example and detailed explanation. Very= helpful. > > > > > I think that this example is an excellent example to show (almost= ) all the issues we need to consider. > > > > >=20 > > > > > I slightly modified this example to make it to be compilable and = run-able, as following:=20 > > > > > (but I still cannot make the incorrect reordering or DSE happenin= g, anyway, the potential reordering possibility is there=E2=80=A6) > > > > >=20 > > > > > 1 #include > > > > > 2 struct A > > > > > 3 { > > > > > 4 size_t size; > > > > > 5 char buf[] __attribute__((counted_by(size))); > > > > > 6 }; > > > > > 7=20 > > > > > 8 static size_t > > > > > 9 get_size_from (void *ptr) > > > > > 10 { > > > > > 11 return __builtin_dynamic_object_size (ptr, 1); > > > > > 12 } > > > > > 13=20 > > > > > 14 void > > > > > 15 foo (size_t sz) > > > > > 16 { > > > > > 17 struct A *obj =3D __builtin_malloc (sizeof(struct A) + sz * s= izeof(char)); > > > > > 18 obj->size =3D sz; > > > > > 19 obj->buf[0] =3D 2; > > > > > 20 __builtin_printf (=E2=80=9C%d\n", get_size_from (obj->buf)); > > > > > 21 return; > > > > > 22 } > > > > > 23=20 > > > > > 24 int main () > > > > > 25 { > > > > > 26 foo (20); > > > > > 27 return 0; > > > > > 28 } > > > > >=20 > > > > > With my GCC, it was compiled and worked: > > > > > [opc@qinzhao-ol8u3-x86 ]$ /home/opc/Install/latest-d/bin/gcc -O1= t5.c > > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out > > > > > 20 > > > > > Situation 1: With O1 and above, the routine =E2=80=9Cget_size_fro= m=E2=80=9D was inlined into =E2=80=9Cfoo=E2=80=9D, therefore, the call to _= _bdos is in the same routine as the instantiation of the object, and the TY= PE information and the attached counted_by attribute information in the TYP= E of the object can be USED by the __bdos call to compute the final object = size.=20 > > > > >=20 > > > > > [opc@qinzhao-ol8u3-x86]$ /home/opc/Install/latest-d/bin/gcc -O0 = t5.c > > > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out > > > > > -1 > > > > > Situation 2: With O0, the routine =E2=80=9Cget_size_from=E2=80=9D= was NOT inlined into =E2=80=9Cfoo=E2=80=9D, therefore, the call to __bdos = is Not in the same routine as the instantiation of the object, As a result,= the TYPE info and the attached counted_by info of the object can NOT be US= ED by the __bdos call.=20 > > > > >=20 > > > > > Keep in mind of the above 2 situations, we will refer them in bel= ow: > > > > >=20 > > > > > 1. First, the problem we are trying to resolve is: > > > > >=20 > > > > > (Your description): > > > > >=20 > > > > > > the reordering of __bdos w.r.t. initialization of the size para= meter but to also account for DSE of the assignment, we can abstract this p= roblem to that of DFA being unable to see implicit use of the size paramete= r in the __bdos call. > > > > >=20 > > > > > basically is correct. However, with the following exception: > > > > >=20 > > > > > The implicit use of the size parameter in the __bdos call is not = always there, it ONLY exists WHEN the __bdos is able to evaluated to an exp= ression of the size parameter in the =E2=80=9Cobjsz=E2=80=9D phase, i.e., t= he =E2=80=9CSituation 1=E2=80=9D of the above example.=20 > > > > > In the =E2=80=9CSituation 2=E2=80=9D, when the __bdos does not se= e the TYPE of the real object, it does not see the counted_by information = from the TYPE, therefore, it is not able to evaluate the size of the objec= t through the counted_by information. As a result, the implicit use of the= size parameter in the __bdos call does NOT exist at all. The optimizer ca= n freely reorder the initialization of the size parameter with the __bdos c= all since there is no data flow dependency between these two.=20 > > > > >=20 > > > > > With this exception in mind, we can see that your proposed =E2=80= =9Coption 2=E2=80=9D (making the type of size =E2=80=9Cvolatile=E2=80=9D) i= s too conservative, it will disable many optimizations unnecessarily, eve= n though it=E2=80=99s safe and simple to implement.=20 > > > > >=20 > > > > > As a compiler optimization person for many many years, I really d= on=E2=80=99t want to take this approach at this moment. -:) > > > > >=20 > > > > > 2. Some facts I=E2=80=99d like to mention: > > > > >=20 > > > > > A. The incorrect reordering (or CSE) potential ONLY exists in th= e TREE optimization stage. During RTL stage, the __bdos call has already b= een replaced by an expression of the size parameter or a constant, the data= dependency is explicitly in the IR already. I believe that the data analy= sis in RTL stage should pick up the data dependency correctly, No special h= andling is needed in RTL. > > > > >=20 > > > > > B. If the __bdos call cannot see the real object , it has no way = to get the =E2=80=9Ccounted_by=E2=80=9D field from the TYPE of the real obj= ect. So, if we try to add the implicit use of the =E2=80=9Ccounted_by=E2=80= =9D field to the __bdos call, the object instantiation should be in the sam= e routine as the __bdos call. Both the FE and the gimplification phase are= too early to do this work.=20 > > > > >=20 > > > > > 2. Then, what=E2=80=99s the best approach to resolve this problem= : > > > > >=20 > > > > > There were several suggestions so far: > > > > >=20 > > > > > A. Add an additional argument, the size parameter, to __bdos,= =20 > > > > > A.1, during FE; > > > > > A.2, during gimplification phase; > > > > > B. Encode the implicit USE in the type of size, to make the siz= e =E2=80=9Cvolatile=E2=80=9D; > > > > > C. Encode the implicit USE in the type of buf, then update the = optimization passes to use this implicit USE encoded in the type of buf. > > > > >=20 > > > > > As I explained in the above,=20 > > > > > ** Approach A (both A.1 and A.2) does not work; > > > > > ** Approach B will have big performance impact, I=E2=80=99d prefe= r not to take this approach at this moment. > > > > > ** Approach C will be a lot of change in GCC, and also not very n= ecessary since the ONLY implicit use of the size parameter is in the __bdos= call when __bdos can see the real object. > > > > >=20 > > > > > So, all the above proposed approaches, A, B, C, are not very good= .=20 > > > > >=20 > > > > > Then, maybe the following might work better? > > > > >=20 > > > > > In the tree optimization stage,=20 > > > > > * After the inlining transformation applied, =20 > > > > > + * Before the data-flow related optimization happens,=20 > > > > > + * when the data flow analysis is constructed,=20 > > > > >=20 > > > > > For each call to __bdos, add the implicit use of size parameter.= =20 > > > > >=20 > > > > > Is this doable?=20 > > > >=20 > > > > Here is another proposal: Add a new builtin function > > > >=20 > > > > __builtin_with_size(x, size) > > > >=20 > > > > that return x but behaves similar to an allocation > > > > function in that BDOS can look at the size argument > > > > to discover the size. > > > >=20 > > > > The FE insers this function when the field is accessed: > > >=20 > > > When it=E2=80=99s set I suppose. Turn > > >=20 > > > X.l =3D n; > > >=20 > > > Into > > >=20 > > > X.l =3D __builtin_with_size (x.buf, n); > >=20 > > It would turn=20 > >=20 > > some_variable =3D (&) x.buf > >=20 > > into=20 > >=20 > > some_variable =3D __builtin_with_size ( (&) x.buf. x.len) >=20 > Unless you use the address of x.Len this will not work when len is initia= lized after buf. And the address will not have a meaningful data dependenc= e. > >=20 It would be a semantic requirement for this feature that x.len needs to be initialized before x.buf is accessed. =C2=A0 Otherwise, I am not sure how to define the time point=C2=A0 at which x.len should be evaluated.=20 > > So the later access to x.buf and not the initialization > > of a member of the struct (which is too early). >=20 > > >=20 > > > And indeed we need sth like a fat pointer to reliably solve all the i= ssues. > >=20 > > What happens for other languages such as FORTRAN=20 > > and ADA do? Are those pointers lowered in the FE? >=20 > Yes >=20 > > To me it seems there are two sound ways to introduce > > such information: > >=20 > > - either by using the type system. This works in > > the FE in C using variably modified types > >=20 > > char buf[n]; > > __auto_type p =3D &buf; > >=20 > > ... =3D sizeof (*p); > >=20 > > But if I understand Jakob's comment to some PR=20 > > correctly the size information in the TREE_TYPE > > is not processed correctly anymore in the > > middle-end.=20 >=20 > The type based info is lowered during gimplification and in particular fo= r pointer types the middle-end quickly loses track of the original type. >=20 Would it work if we make sure that we find a suitable type? Or in other words, are the (non-constant) size=C2=A0 expressions inside=C2=A0it still useful in later passes?=20 Martin > Richard=20 >=20 > >=20 > > - or one injects the information via some > > tree node or builtin at certain points in > > time as suggested here, and the compiler > > derives the information from these points=20 > > as tree-object-size does. =20 > >=20 > >=20 > > The use of attributes seems fragile and - looking > > at the access attribute also overly complex. And=20 > > we somehow support this only for function types > > and not elsewhere and also this then gets lost > > during inlining. So I think for all this stuff > > (nonnull, access, counted_by) I think a better > > approach is needed. > >=20 > >=20 > > Martin > >=20 > >=20 > > >=20 > > > Richard=20 > >=20 > >=20 > >=20 > >=20 > > >=20 > > > > __builtin_with_size(x.buf, x.L); > > > >=20 > > > >=20 > > > > Martin > > > >=20 > > > >=20 > > > >=20 > > > > >=20 > > > > > Otherwise, we might need to take the =E2=80=9Cvolatile=E2=80=9D a= pproach.=20 > > > > >=20 > > > > > Let me know your suggestion and comment. > > > > >=20 > > > > > Thanks a lot. > > > > >=20 > > > > > Qing > > > > >=20 > > > > >=20 > > > > > > __bdos is the one such implicit user of the size parameter and = you're proposing to solve this by encoding the relationship between buffer = and size at the __bdos call site. But what about the case when the instant= iation of the object is not at the same place as the __bdos call site, i.e.= the DFA is unable to make that relationship? > > > > > >=20 > > > > > > The example Martin showed where the subobject gets "hidden" beh= ind a pointer was a trivial one where DFA *may* actually work in practice (= because the object-size pass can thread through these assignments) but thin= k about this one: > > > > > >=20 > > > > > > struct A > > > > > > { > > > > > > size_t size; > > > > > > char buf[] __attribute__((counted_by(size))); > > > > > > } > > > > > >=20 > > > > > > static size_t > > > > > > get_size_of (void *ptr) > > > > > > { > > > > > > return __bdos (ptr, 1); > > > > > > } > > > > > >=20 > > > > > > void > > > > > > foo (size_t sz) > > > > > > { > > > > > > struct A *obj =3D __builtin_malloc (sz); > > > > > > obj->size =3D sz; > > > > > >=20 > > > > > > ... > > > > > > __builtin_printf ("%zu\n", get_size_of (obj->array)); > > > > > > ... > > > > > > } > > > > > >=20 > > > > > > Until get_size_of is inlined, no DFA can see the __bdos call in= the same place as the point where obj is allocated. As a result, the assi= gnment to obj->size could get reordered (or the store eliminated) w.r.t. th= e __bdos call until the inlining happens. > > > > > >=20 > > > > > > As a result, the relationship between buf and size established = by the attribute needs to be encoded into the type somehow. There are two = options: > > > > > >=20 > > > > > > Option 1: Encode the relationship in the type of buf > > > > > >=20 > > > > > > This is kinda what you end up doing with component_ref_has_coun= ted_by and it does show the relationship if one is looking (through that ca= ll), but nothing more that can be used to, e.g. prevent reordering or tell = the optimizer that the reference to the buf member may imply a reference to= the size member as well. This could be remedied by somehow encoding the U= SES relationship for size into the type of buf that the optimization passes= can see. I feel like this may be a bit convoluted to specify in a future = language extension in a way that will actually be well understood by develo= pers, but it will likely generate faster runtime code. This will also like= ly require a bigger change across passes. > > > > > >=20 > > > > > > Option 2: Encode the relationship in the type of size > > > > > >=20 > > > > > > The other option is to enhance the type of size somehow so that= it discourages reordering and store elimination, basically pessimizing cod= e. I think volatile semantics might be the way to do this and may even be = straightforward to specify in the future language extension given that it b= uilds on a known language construct and is thematically related. However i= t does pessimize output for code that implements __counted_by__. > > > > > >=20 > > > > > > Thanks, > > > > > > Sid > > > > >=20 > > > >=20 > >=20 > > --=20 > > Univ.-Prof. Dr. rer. nat. Martin Uecker > > Graz University of Technology > > Institute of Biomedical Imaging > >=20 > >=20 --=20 Univ.-Prof. Dr. rer. nat. Martin Uecker Graz University of Technology Institute of Biomedical Imaging