From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mailrelay.tugraz.at (mailrelay.tugraz.at [129.27.2.202]) by sourceware.org (Postfix) with ESMTPS id 2A79B3858D28 for ; Wed, 25 Oct 2023 05:26:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2A79B3858D28 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=tugraz.at Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tugraz.at ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2A79B3858D28 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=129.27.2.202 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698211609; cv=none; b=wjl+8LpiYQoUOorK8ZvQKIgF+wamG8wid/2OgCXfu78yxg66fjhd8Jqm+1NEBQsfZtTALGH0f6kr4duKETjsPiyeni/tBHBVTlSvGzayXv/LQOrr6osxk7Ue97GbG5olVkzpA8Bs19C+f3AC74gB/oJ8FSVhWWzaIf/D3JLaKu4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698211609; c=relaxed/simple; bh=uR6k2BpKH4jVZxqmkmdtVPWalRzgzB7fIpunDs4HvDQ=; h=DKIM-Signature:Message-ID:Subject:From:To:Date:MIME-Version; b=dM8Yxtd9OitstNX3BNuu4sXKdFU4AYQ3FNWaUJaSOSK/zqP6wr3gulcwpiMWq7YKYJwrQCOQczr0GczbnQKrCa4q2fzA9NZ9RAQJw5mORIvuk1DYdUF0ThgyYZqP7ohiaV5k88xSnVnRbi9V42HLHLIep28gk+hOJpxIGnPUh+g= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from vra-173-46.tugraz.at (vra-173-46.tugraz.at [129.27.173.46]) by mailrelay.tugraz.at (Postfix) with ESMTPSA id 4SFcqG3mgWz3wNk; Wed, 25 Oct 2023 07:26:34 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tugraz.at; s=mailrelay; t=1698211595; bh=dfwzvShDq3xtAiAGbKf1/QFiUsV/MTqI9zEx+el1MHk=; h=Subject:From:To:Cc:Date:In-Reply-To:References; b=HnyVw1IBI73tOcsFE12AKON1dtRa0uMBoco3uA8M91tSSrbKo2VK5/e8XTMCjUO+P 4jTDvU/faVpZ2gWTsf6e/v/X6tbneyevmTbbQFR5kJhHNNP8JEvEDL/KetYGCdaBw5 n4Zg7BAuqS8JUKsj1rsEOw3bbqk6nl4vM/CBuQRE= Message-ID: <0502f5b807f59d6e7972b80a644fa7e68ed9290c.camel@tugraz.at> Subject: Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896) From: Martin Uecker To: Qing Zhao Cc: Siddhesh Poyarekar , Richard Biener , Joseph Myers , Jakub Jelinek , gcc Patches , kees Cook , "isanbard@gmail.com" Date: Wed, 25 Oct 2023 07:26:34 +0200 In-Reply-To: <2A78AF07-4E87-43DD-9C1B-23A15C43A8E4@oracle.com> References: <9DDD0677-BFE7-4733-8C11-0FA8B3C25569@oracle.com> <283B5497-52BD-47D4-BC08-0982AB6740CA@oracle.com> <53e8ed5778d0e908d224d940ddc3d99575b83dd3.camel@tugraz.at> <168fea24-844e-4d1a-9361-afb6332b4f11@gotplt.org> <8C548B97-BD81-4EBB-A59C-F7E6E0A3C4F7@oracle.com> <92426898-afa7-47f7-9aab-5f2114eb826a@gotplt.org> <05d65224067ff72124d94b6b5274f14d013b64af.camel@tugraz.at> <2A78AF07-4E87-43DD-9C1B-23A15C43A8E4@oracle.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.4-2 MIME-Version: 1.0 X-TUG-Backscatter-control: G/VXY7/6zeyuAY/PU2/0qw X-Spam-Scanner: SpamAssassin 3.003001 X-Spam-Score-relay: -0.4 X-Scanned-By: MIMEDefang 2.74 on 129.27.10.116 X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Am Dienstag, dem 24.10.2023 um 22:51 +0000 schrieb Qing Zhao: >=20 > > On Oct 24, 2023, at 4:38 PM, Martin Uecker wrote: > >=20 > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao: > > > Hi, Sid, > > >=20 > > > Really appreciate for your example and detailed explanation. Very hel= pful. > > > I think that this example is an excellent example to show (almost) al= l the issues we need to consider. > > >=20 > > > I slightly modified this example to make it to be compilable and run-= able, as following:=20 > > > (but I still cannot make the incorrect reordering or DSE happening, a= nyway, the potential reordering possibility is there=E2=80=A6) > > >=20 > > > 1 #include > > > 2 struct A > > > 3 { > > > 4 size_t size; > > > 5 char buf[] __attribute__((counted_by(size))); > > > 6 }; > > > 7=20 > > > 8 static size_t > > > 9 get_size_from (void *ptr) > > > 10 { > > > 11 return __builtin_dynamic_object_size (ptr, 1); > > > 12 } > > > 13=20 > > > 14 void > > > 15 foo (size_t sz) > > > 16 { > > > 17 struct A *obj =3D __builtin_malloc (sizeof(struct A) + sz * sizeo= f(char)); > > > 18 obj->size =3D sz; > > > 19 obj->buf[0] =3D 2; > > > 20 __builtin_printf (=E2=80=9C%d\n", get_size_from (obj->buf)); > > > 21 return; > > > 22 } > > > 23=20 > > > 24 int main () > > > 25 { > > > 26 foo (20); > > > 27 return 0; > > > 28 } > > >=20 > > > With my GCC, it was compiled and worked: > > > [opc@qinzhao-ol8u3-x86 ]$ /home/opc/Install/latest-d/bin/gcc -O1 t5.= c > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out > > > 20 > > > Situation 1: With O1 and above, the routine =E2=80=9Cget_size_from=E2= =80=9D was inlined into =E2=80=9Cfoo=E2=80=9D, therefore, the call to __bdo= s is in the same routine as the instantiation of the object, and the TYPE i= nformation and the attached counted_by attribute information in the TYPE of= the object can be USED by the __bdos call to compute the final object size= .=20 > > >=20 > > > [opc@qinzhao-ol8u3-x86]$ /home/opc/Install/latest-d/bin/gcc -O0 t5.= c > > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out > > > -1 > > > Situation 2: With O0, the routine =E2=80=9Cget_size_from=E2=80=9D was= NOT inlined into =E2=80=9Cfoo=E2=80=9D, therefore, the call to __bdos is N= ot in the same routine as the instantiation of the object, As a result, the= TYPE info and the attached counted_by info of the object can NOT be USED b= y the __bdos call.=20 > > >=20 > > > Keep in mind of the above 2 situations, we will refer them in below: > > >=20 > > > 1. First, the problem we are trying to resolve is: > > >=20 > > > (Your description): > > >=20 > > > > the reordering of __bdos w.r.t. initialization of the size paramete= r but to also account for DSE of the assignment, we can abstract this probl= em to that of DFA being unable to see implicit use of the size parameter in= the __bdos call. > > >=20 > > > basically is correct. However, with the following exception: > > >=20 > > > The implicit use of the size parameter in the __bdos call is not alwa= ys there, it ONLY exists WHEN the __bdos is able to evaluated to an express= ion of the size parameter in the =E2=80=9Cobjsz=E2=80=9D phase, i.e., the = =E2=80=9CSituation 1=E2=80=9D of the above example.=20 > > > In the =E2=80=9CSituation 2=E2=80=9D, when the __bdos does not see th= e TYPE of the real object, it does not see the counted_by information from= the TYPE, therefore, it is not able to evaluate the size of the object th= rough the counted_by information. As a result, the implicit use of the siz= e parameter in the __bdos call does NOT exist at all. The optimizer can fr= eely reorder the initialization of the size parameter with the __bdos call = since there is no data flow dependency between these two.=20 > > >=20 > > > With this exception in mind, we can see that your proposed =E2=80=9Co= ption 2=E2=80=9D (making the type of size =E2=80=9Cvolatile=E2=80=9D) is to= o conservative, it will disable many optimizations unnecessarily, even th= ough it=E2=80=99s safe and simple to implement.=20 > > >=20 > > > As a compiler optimization person for many many years, I really don= =E2=80=99t want to take this approach at this moment. -:) > > >=20 > > > 2. Some facts I=E2=80=99d like to mention: > > >=20 > > > A. The incorrect reordering (or CSE) potential ONLY exists in the TR= EE optimization stage. During RTL stage, the __bdos call has already been = replaced by an expression of the size parameter or a constant, the data dep= endency is explicitly in the IR already. I believe that the data analysis = in RTL stage should pick up the data dependency correctly, No special handl= ing is needed in RTL. > > >=20 > > > B. If the __bdos call cannot see the real object , it has no way to g= et the =E2=80=9Ccounted_by=E2=80=9D field from the TYPE of the real object.= So, if we try to add the implicit use of the =E2=80=9Ccounted_by=E2=80=9D = field to the __bdos call, the object instantiation should be in the same ro= utine as the __bdos call. Both the FE and the gimplification phase are too= early to do this work.=20 > > >=20 > > > 2. Then, what=E2=80=99s the best approach to resolve this problem: > > >=20 > > > There were several suggestions so far: > > >=20 > > > A. Add an additional argument, the size parameter, to __bdos,=20 > > > A.1, during FE; > > > A.2, during gimplification phase; > > > B. Encode the implicit USE in the type of size, to make the size = =E2=80=9Cvolatile=E2=80=9D; > > > C. Encode the implicit USE in the type of buf, then update the opti= mization passes to use this implicit USE encoded in the type of buf. > > >=20 > > > As I explained in the above,=20 > > > ** Approach A (both A.1 and A.2) does not work; > > > ** Approach B will have big performance impact, I=E2=80=99d prefer no= t to take this approach at this moment. > > > ** Approach C will be a lot of change in GCC, and also not very neces= sary since the ONLY implicit use of the size parameter is in the __bdos cal= l when __bdos can see the real object. > > >=20 > > > So, all the above proposed approaches, A, B, C, are not very good.= =20 > > >=20 > > > Then, maybe the following might work better? > > >=20 > > > In the tree optimization stage,=20 > > > * After the inlining transformation applied, =20 > > > + * Before the data-flow related optimization happens,=20 > > > + * when the data flow analysis is constructed,=20 > > >=20 > > > For each call to __bdos, add the implicit use of size parameter.=20 > > >=20 > > > Is this doable?=20 > >=20 > > Here is another proposal: Add a new builtin function > >=20 > > __builtin_with_size(x, size) > >=20 > > that return x but behaves similar to an allocation > > function in that BDOS can look at the size argument > > to discover the size. > >=20 > > The FE insers this function when the field is accessed: > >=20 > > __builtin_with_size(x.buf, x.L); >=20 > Thanks for the proposal! >=20 > So what you suggested is: >=20 > For every x.buf, change it as a __builtin_with_size(x.buf, x.L) in the F= E, then the call to the _bdos (x.buf, 1) will > Become: >=20 > _bdos(__builtin_with_size(x.buf, x.L), 1)? >=20 > Then the implicit use of x.L in _bdos(x.buf.1) will become explicit? >=20 > This looks like a very promising solution. >=20 > Will study this a. Little bit more. Yes, the load will be created explicitely in the FE at the right position. The BDOS pass can then later propagate the size to where it is used: x =3D &__builtin_with_size(x.buf, x.L) ...other stuff is happening... __bdos(x, 1). See for a working example: https://godbolt.org/z/Ej3s1GToa which shows that reordering is still possible. I think this should be easy to implement because it is similar to how BDOS works with builtin allocation functions. And this feature seems generally useful.=20 I am not sure whether the builtin to take a pointer argument or the object itself. Note that the builtin does nothing. It just ensure that the size argument is evaluated at the right point in time. Maybe it should have arguments for min / max=20 subobject etc.. Not sure. Martin >=20 > Qing > >=20 > >=20 > > Martin > >=20 > >=20 > >=20 > > >=20 > > > Otherwise, we might need to take the =E2=80=9Cvolatile=E2=80=9D appro= ach.=20 > > >=20 > > > Let me know your suggestion and comment. > > >=20 > > > Thanks a lot. > > >=20 > > > Qing > > >=20 > > >=20 > > > > __bdos is the one such implicit user of the size parameter and you'= re proposing to solve this by encoding the relationship between buffer and = size at the __bdos call site. But what about the case when the instantiati= on of the object is not at the same place as the __bdos call site, i.e. the= DFA is unable to make that relationship? > > > >=20 > > > > The example Martin showed where the subobject gets "hidden" behind = a pointer was a trivial one where DFA *may* actually work in practice (beca= use the object-size pass can thread through these assignments) but think ab= out this one: > > > >=20 > > > > struct A > > > > { > > > > size_t size; > > > > char buf[] __attribute__((counted_by(size))); > > > > } > > > >=20 > > > > static size_t > > > > get_size_of (void *ptr) > > > > { > > > > return __bdos (ptr, 1); > > > > } > > > >=20 > > > > void > > > > foo (size_t sz) > > > > { > > > > struct A *obj =3D __builtin_malloc (sz); > > > > obj->size =3D sz; > > > >=20 > > > > ... > > > > __builtin_printf ("%zu\n", get_size_of (obj->array)); > > > > ... > > > > } > > > >=20 > > > > Until get_size_of is inlined, no DFA can see the __bdos call in the= same place as the point where obj is allocated. As a result, the assignme= nt to obj->size could get reordered (or the store eliminated) w.r.t. the __= bdos call until the inlining happens. > > > >=20 > > > > As a result, the relationship between buf and size established by t= he attribute needs to be encoded into the type somehow. There are two opti= ons: > > > >=20 > > > > Option 1: Encode the relationship in the type of buf > > > >=20 > > > > This is kinda what you end up doing with component_ref_has_counted_= by and it does show the relationship if one is looking (through that call),= but nothing more that can be used to, e.g. prevent reordering or tell the = optimizer that the reference to the buf member may imply a reference to the= size member as well. This could be remedied by somehow encoding the USES = relationship for size into the type of buf that the optimization passes can= see. I feel like this may be a bit convoluted to specify in a future lang= uage extension in a way that will actually be well understood by developers= , but it will likely generate faster runtime code. This will also likely r= equire a bigger change across passes. > > > >=20 > > > > Option 2: Encode the relationship in the type of size > > > >=20 > > > > The other option is to enhance the type of size somehow so that it = discourages reordering and store elimination, basically pessimizing code. = I think volatile semantics might be the way to do this and may even be stra= ightforward to specify in the future language extension given that it build= s on a known language construct and is thematically related. However it do= es pessimize output for code that implements __counted_by__. > > > >=20 > > > > Thanks, > > > > Sid >=20