From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x136.google.com (mail-lf1-x136.google.com [IPv6:2a00:1450:4864:20::136]) by sourceware.org (Postfix) with ESMTPS id 19E5A3857713 for ; Wed, 25 Oct 2023 10:25:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 19E5A3857713 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 19E5A3857713 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::136 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698229532; cv=none; b=fs90LAAbrWdF0ri1hfhQVy9PIWS/ikfJ98yqa1vSp6xKE7S/65JNp53ufeCORRpA0oG8346mVQvVb4YJFjfTQzQVMX3ppy3X1QpvX3VeLsuIdN+QTO5/dPhwXp2tTFWAr6qcpEKCM9bkSRlw+GGGkfwdftJKgTROlv+CRZeIVWI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698229532; c=relaxed/simple; bh=HToVsUp+faD51cbTFdjqLF/f6T11+dwKZCEY0Lwd/tI=; h=DKIM-Signature:From:Mime-Version:Subject:Date:Message-Id:To; b=jEOHXhEcz5FTW1QNg8PXIbjKZ/knuM9h/kYJ9snR949BhplbLU9W9gaRJoD/SSL9ZESE1wtjLbazF0S3H3hLFLJLHFZtDFrMovVhkQ0X4njIaNm+blYX0W0csFEuAG8zYiu9D72L9O8drdW0ctwCi2j/I17qLoeb14z7+1A/HhQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x136.google.com with SMTP id 2adb3069b0e04-508126afb9bso1431047e87.0 for ; Wed, 25 Oct 2023 03:25:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1698229528; x=1698834328; darn=gcc.gnu.org; h=to:in-reply-to:cc:references:message-id:date:subject:mime-version :from:content-transfer-encoding:from:to:cc:subject:date:message-id :reply-to; bh=Q069TZ6iCwJmpoO6yRZvap6gNyCPmqgZkH13NBJ0ab4=; b=Jr7OKyipleMHAavAVIJ6H5SrR5tzI2afbztNLrw8g+pPoJAVbnDtux4BsbCtNkNf5F kvLUfj1MotEOZewGnYjZDilBDqyznOw3uDHI3OqA6sPZa6iBNd3CWfbkSNaaK2rz7ipS b/W8nLdp/J1TKDkxKtM97wapT3TW/we0GPMw+K3fZsQ7kTjvM15X/edlYvPmGnAF6cXZ Z+UvrEuCMWjru2An5xqxcVOzi2LCr3oehf1VPLasmrwnLqjTcmH3cQsPWJ8epDAbdBHd DdqMieU7l3oiC/PPq6F6Xy8XnqPJNhM8bNlZASmOYzpV3zOTvRhCPwNCSNkfym7vWvhT 8Oew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698229528; x=1698834328; h=to:in-reply-to:cc:references:message-id:date:subject:mime-version :from:content-transfer-encoding:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Q069TZ6iCwJmpoO6yRZvap6gNyCPmqgZkH13NBJ0ab4=; b=ZEVnXFkTfCHy2nevSZG4bUhyfqNGcRBWu17A1gnPzT/99Uz5e6tUKnnPoqt+dWPZOL /zNwBnGw6xtfbUELU6OjurYAFlH9Fd+ComZ47X7cBwMAVvV5X4kvV5SZgnd9l8M4YdDu m01gpg8Y2+HKgk1M17RpRxE0PWpjCzHPQSFTW9zWp+no7gSjgaFBqCRiBR8D3hzo6Yuf +io8qHHlLgGCmfUN/fIaoMDjZC8g49miUpx/lauq6kktSKG9sHI/KS2cHdZbIABMMPO2 rd7ALoVhLzsxKfkaECfXGgb6uqGwxiIwoq8WMSlF4lgPt2ko6FHaobL4tMjmxxiABR60 Js7A== X-Gm-Message-State: AOJu0YxAuzeXpLCvtbOqCXv0+fb/0dKsuK0nlt7pMa5Qq5Y4GrVdAMsr 0u/Mpsk2l/Cxrry8So4ylQg= X-Google-Smtp-Source: AGHT+IGW3fDHKYmOrkXnpq74IA47KRgTHIQNkdoH/odHLCAiOEha4Xpe/Lw3yCo9lPcFHBZnKUAptw== X-Received: by 2002:a05:6512:55a:b0:4f8:7513:8cac with SMTP id h26-20020a056512055a00b004f875138cacmr10683272lfl.48.1698229528095; Wed, 25 Oct 2023 03:25:28 -0700 (PDT) Received: from smtpclient.apple ([2a02:3038:206:bd89:d92a:212f:d326:b356]) by smtp.gmail.com with ESMTPSA id s3-20020adfea83000000b0031980783d78sm11765308wrm.54.2023.10.25.03.25.27 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 25 Oct 2023 03:25:27 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Richard Biener Mime-Version: 1.0 (1.0) Subject: Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896) Date: Wed, 25 Oct 2023 12:25:16 +0200 Message-Id: References: <9cdc6bc40cf1d0e701a69e6e37cdffa89a0db690.camel@tugraz.at> Cc: Qing Zhao , Siddhesh Poyarekar , Joseph Myers , Jakub Jelinek , gcc Patches , kees Cook , isanbard@gmail.com In-Reply-To: <9cdc6bc40cf1d0e701a69e6e37cdffa89a0db690.camel@tugraz.at> To: Martin Uecker X-Mailer: iPhone Mail (20H30) X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,MIME_QP_LONG_LINE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > Am 25.10.2023 um 10:16 schrieb Martin Uecker : >=20 > =EF=BB=BFAm Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener= : >>=20 >>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker : >>>=20 >>> =EF=BB=BFAm Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao: >>>> Hi, Sid, >>>>=20 >>>> Really appreciate for your example and detailed explanation. Very helpf= ul. >>>> I think that this example is an excellent example to show (almost) all t= he issues we need to consider. >>>>=20 >>>> I slightly modified this example to make it to be compilable and run-ab= le, as following:=20 >>>> (but I still cannot make the incorrect reordering or DSE happening, any= way, the potential reordering possibility is there=E2=80=A6) >>>>=20 >>>> 1 #include >>>> 2 struct A >>>> 3 { >>>> 4 size_t size; >>>> 5 char buf[] __attribute__((counted_by(size))); >>>> 6 }; >>>> 7=20 >>>> 8 static size_t >>>> 9 get_size_from (void *ptr) >>>> 10 { >>>> 11 return __builtin_dynamic_object_size (ptr, 1); >>>> 12 } >>>> 13=20 >>>> 14 void >>>> 15 foo (size_t sz) >>>> 16 { >>>> 17 struct A *obj =3D __builtin_malloc (sizeof(struct A) + sz * sizeof(= char)); >>>> 18 obj->size =3D sz; >>>> 19 obj->buf[0] =3D 2; >>>> 20 __builtin_printf (=E2=80=9C%d\n", get_size_from (obj->buf)); >>>> 21 return; >>>> 22 } >>>> 23=20 >>>> 24 int main () >>>> 25 { >>>> 26 foo (20); >>>> 27 return 0; >>>> 28 } >>>>=20 >>>> With my GCC, it was compiled and worked: >>>> [opc@qinzhao-ol8u3-x86 ]$ /home/opc/Install/latest-d/bin/gcc -O1 t5.c >>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out >>>> 20 >>>> Situation 1: With O1 and above, the routine =E2=80=9Cget_size_from=E2=80= =9D was inlined into =E2=80=9Cfoo=E2=80=9D, therefore, the call to __bdos is= in the same routine as the instantiation of the object, and the TYPE inform= ation and the attached counted_by attribute information in the TYPE of the o= bject can be USED by the __bdos call to compute the final object size.=20 >>>>=20 >>>> [opc@qinzhao-ol8u3-x86]$ /home/opc/Install/latest-d/bin/gcc -O0 t5.c >>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out >>>> -1 >>>> Situation 2: With O0, the routine =E2=80=9Cget_size_from=E2=80=9D was N= OT inlined into =E2=80=9Cfoo=E2=80=9D, therefore, the call to __bdos is Not i= n the same routine as the instantiation of the object, As a result, the TYPE= info and the attached counted_by info of the object can NOT be USED by the _= _bdos call.=20 >>>>=20 >>>> Keep in mind of the above 2 situations, we will refer them in below: >>>>=20 >>>> 1. First, the problem we are trying to resolve is: >>>>=20 >>>> (Your description): >>>>=20 >>>>> the reordering of __bdos w.r.t. initialization of the size parameter b= ut to also account for DSE of the assignment, we can abstract this problem t= o that of DFA being unable to see implicit use of the size parameter in the _= _bdos call. >>>>=20 >>>> basically is correct. However, with the following exception: >>>>=20 >>>> The implicit use of the size parameter in the __bdos call is not always= there, it ONLY exists WHEN the __bdos is able to evaluated to an expression= of the size parameter in the =E2=80=9Cobjsz=E2=80=9D phase, i.e., the =E2=80= =9CSituation 1=E2=80=9D of the above example.=20 >>>> In the =E2=80=9CSituation 2=E2=80=9D, when the __bdos does not see the T= YPE of the real object, it does not see the counted_by information from the= TYPE, therefore, it is not able to evaluate the size of the object through= the counted_by information. As a result, the implicit use of the size para= meter in the __bdos call does NOT exist at all. The optimizer can freely re= order the initialization of the size parameter with the __bdos call since th= ere is no data flow dependency between these two.=20 >>>>=20 >>>> With this exception in mind, we can see that your proposed =E2=80=9Copt= ion 2=E2=80=9D (making the type of size =E2=80=9Cvolatile=E2=80=9D) is too c= onservative, it will disable many optimizations unnecessarily, even though= it=E2=80=99s safe and simple to implement.=20 >>>>=20 >>>> As a compiler optimization person for many many years, I really don=E2=80= =99t want to take this approach at this moment. -:) >>>>=20 >>>> 2. Some facts I=E2=80=99d like to mention: >>>>=20 >>>> A. The incorrect reordering (or CSE) potential ONLY exists in the TREE= optimization stage. During RTL stage, the __bdos call has already been rep= laced by an expression of the size parameter or a constant, the data depende= ncy is explicitly in the IR already. I believe that the data analysis in RT= L stage should pick up the data dependency correctly, No special handling is= needed in RTL. >>>>=20 >>>> B. If the __bdos call cannot see the real object , it has no way to get= the =E2=80=9Ccounted_by=E2=80=9D field from the TYPE of the real object. So= , if we try to add the implicit use of the =E2=80=9Ccounted_by=E2=80=9D fiel= d to the __bdos call, the object instantiation should be in the same routine= as the __bdos call. Both the FE and the gimplification phase are too early= to do this work.=20 >>>>=20 >>>> 2. Then, what=E2=80=99s the best approach to resolve this problem: >>>>=20 >>>> There were several suggestions so far: >>>>=20 >>>> A. Add an additional argument, the size parameter, to __bdos,=20 >>>> A.1, during FE; >>>> A.2, during gimplification phase; >>>> B. Encode the implicit USE in the type of size, to make the size =E2=80= =9Cvolatile=E2=80=9D; >>>> C. Encode the implicit USE in the type of buf, then update the optimi= zation passes to use this implicit USE encoded in the type of buf. >>>>=20 >>>> As I explained in the above,=20 >>>> ** Approach A (both A.1 and A.2) does not work; >>>> ** Approach B will have big performance impact, I=E2=80=99d prefer not t= o take this approach at this moment. >>>> ** Approach C will be a lot of change in GCC, and also not very necessa= ry since the ONLY implicit use of the size parameter is in the __bdos call w= hen __bdos can see the real object. >>>>=20 >>>> So, all the above proposed approaches, A, B, C, are not very good.=20 >>>>=20 >>>> Then, maybe the following might work better? >>>>=20 >>>> In the tree optimization stage,=20 >>>> * After the inlining transformation applied, =20 >>>> + * Before the data-flow related optimization happens,=20 >>>> + * when the data flow analysis is constructed,=20 >>>>=20 >>>> For each call to __bdos, add the implicit use of size parameter.=20 >>>>=20 >>>> Is this doable?=20 >>>=20 >>> Here is another proposal: Add a new builtin function >>>=20 >>> __builtin_with_size(x, size) >>>=20 >>> that return x but behaves similar to an allocation >>> function in that BDOS can look at the size argument >>> to discover the size. >>>=20 >>> The FE insers this function when the field is accessed: >>=20 >> When it=E2=80=99s set I suppose. Turn >>=20 >> X.l =3D n; >>=20 >> Into >>=20 >> X.l =3D __builtin_with_size (x.buf, n); >=20 > It would turn=20 >=20 > some_variable =3D (&) x.buf >=20 > into=20 >=20 > some_variable =3D __builtin_with_size ( (&) x.buf. x.len) Unless you use the address of x.Len this will not work when len is initializ= ed after buf. And the address will not have a meaningful data dependence. >=20 > So the later access to x.buf and not the initialization > of a member of the struct (which is too early). >>=20 >> And indeed we need sth like a fat pointer to reliably solve all the issue= s. >=20 > What happens for other languages such as FORTRAN=20 > and ADA do? Are those pointers lowered in the FE? Yes > To me it seems there are two sound ways to introduce > such information: >=20 > - either by using the type system. This works in > the FE in C using variably modified types >=20 > char buf[n]; > __auto_type p =3D &buf; >=20 > ... =3D sizeof (*p); >=20 > But if I understand Jakob's comment to some PR=20 > correctly the size information in the TREE_TYPE > is not processed correctly anymore in the > middle-end.=20 The type based info is lowered during gimplification and in particular for p= ointer types the middle-end quickly loses track of the original type. Richard=20 >=20 > - or one injects the information via some > tree node or builtin at certain points in > time as suggested here, and the compiler > derives the information from these points=20 > as tree-object-size does. =20 >=20 >=20 > The use of attributes seems fragile and - looking > at the access attribute also overly complex. And=20 > we somehow support this only for function types > and not elsewhere and also this then gets lost > during inlining. So I think for all this stuff > (nonnull, access, counted_by) I think a better > approach is needed. >=20 >=20 > Martin >=20 >=20 >>=20 >> Richard=20 >=20 >=20 >=20 >=20 >>=20 >>> __builtin_with_size(x.buf, x.L); >>>=20 >>>=20 >>> Martin >>>=20 >>>=20 >>>=20 >>>>=20 >>>> Otherwise, we might need to take the =E2=80=9Cvolatile=E2=80=9D approac= h.=20 >>>>=20 >>>> Let me know your suggestion and comment. >>>>=20 >>>> Thanks a lot. >>>>=20 >>>> Qing >>>>=20 >>>>=20 >>>>> __bdos is the one such implicit user of the size parameter and you're p= roposing to solve this by encoding the relationship between buffer and size a= t the __bdos call site. But what about the case when the instantiation of t= he object is not at the same place as the __bdos call site, i.e. the DFA is u= nable to make that relationship? >>>>>=20 >>>>> The example Martin showed where the subobject gets "hidden" behind a p= ointer was a trivial one where DFA *may* actually work in practice (because t= he object-size pass can thread through these assignments) but think about th= is one: >>>>>=20 >>>>> struct A >>>>> { >>>>> size_t size; >>>>> char buf[] __attribute__((counted_by(size))); >>>>> } >>>>>=20 >>>>> static size_t >>>>> get_size_of (void *ptr) >>>>> { >>>>> return __bdos (ptr, 1); >>>>> } >>>>>=20 >>>>> void >>>>> foo (size_t sz) >>>>> { >>>>> struct A *obj =3D __builtin_malloc (sz); >>>>> obj->size =3D sz; >>>>>=20 >>>>> ... >>>>> __builtin_printf ("%zu\n", get_size_of (obj->array)); >>>>> ... >>>>> } >>>>>=20 >>>>> Until get_size_of is inlined, no DFA can see the __bdos call in the sa= me place as the point where obj is allocated. As a result, the assignment t= o obj->size could get reordered (or the store eliminated) w.r.t. the __bdos c= all until the inlining happens. >>>>>=20 >>>>> As a result, the relationship between buf and size established by the a= ttribute needs to be encoded into the type somehow. There are two options: >>>>>=20 >>>>> Option 1: Encode the relationship in the type of buf >>>>>=20 >>>>> This is kinda what you end up doing with component_ref_has_counted_by a= nd it does show the relationship if one is looking (through that call), but n= othing more that can be used to, e.g. prevent reordering or tell the optimiz= er that the reference to the buf member may imply a reference to the size me= mber as well. This could be remedied by somehow encoding the USES relations= hip for size into the type of buf that the optimization passes can see. I f= eel like this may be a bit convoluted to specify in a future language extens= ion in a way that will actually be well understood by developers, but it wil= l likely generate faster runtime code. This will also likely require a bigg= er change across passes. >>>>>=20 >>>>> Option 2: Encode the relationship in the type of size >>>>>=20 >>>>> The other option is to enhance the type of size somehow so that it dis= courages reordering and store elimination, basically pessimizing code. I th= ink volatile semantics might be the way to do this and may even be straightf= orward to specify in the future language extension given that it builds on a= known language construct and is thematically related. However it does pess= imize output for code that implements __counted_by__. >>>>>=20 >>>>> Thanks, >>>>> Sid >>>>=20 >>>=20 >=20 > --=20 > Univ.-Prof. Dr. rer. nat. Martin Uecker > Graz University of Technology > Institute of Biomedical Imaging >=20 >=20