public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Qing Zhao <qing.zhao@oracle.com>
To: richard Biener <richard.guenther@gmail.com>,
	Joseph Myers <joseph@codesourcery.com>,
	Siddhesh Poyarekar <siddhesh@gotplt.org>,
	Martin Uecker <uecker@tugraz.at>,
	Jakub Jelinek <jakub@redhat.com>
Cc: Kees Cook <keescook@chromium.org>,
	"isanbard@gmail.com" <isanbard@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: RFC (V2) the proposal to resolve the missing dependency issue for counted_by attribute
Date: Thu, 9 Nov 2023 15:49:49 +0000	[thread overview]
Message-ID: <2D704EB7-B3FE-4280-ACC0-AA0C4236E39F@oracle.com> (raw)
In-Reply-To: <5B561B7B-ACD9-4378-A771-F1479A87382D@oracle.com>

Is it reasonable to add one option to disable the “counted_by” attribute?
(then no insertion of the new .ACCESS_WITH_SIZE into IL).  

The major reason is: some users might want to ignore all the “counted_by” attribute added in the source code,
We need to provide them a way to disable this feature.

thanks.

Qing

> On Nov 6, 2023, at 7:12 PM, Qing Zhao <qing.zhao@oracle.com> wrote:
> 
> Hi,
> 
> Attached is the 2nd version of the proposal based on all the discussion so far.
> 
> Let me know if you have more comment and suggestion.
> 
> Thanks a lot for all the help.
> 
> Qing
> ===========================================
> Represent the missing dependence for the "counted_by" attribute and its consumers 
> 
> Qing Zhao
> 
> 11/06/2023
> ==============================================
> 
> The whole discussion is at:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html
> 
> 1. The problem
> 
> There is a data dependency between the size assignment and the implicit use of the size information in the __builtin_dynamic_object_size that is missing in the IL (line 11 and line 13 in the below example). Such information missing will result incorrect code reordering and other code transformations. 
> 
>  1 struct A
>  2 {
>  3  size_t size;
>  4  char buf[] __attribute__((counted_by(size)));
>  5 };
>  6 
>  7 size_t 
>  8 foo (size_t sz)
>  9 {
> 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 11  obj->size = sz;
> 12  obj->buf[0] = 2;
> 13  return __builtin_dynamic_object_size (obj->buf, 1);
> 14 }
> 
> Please see a more complicate example in the Appendex 1.
> 
> We need to represent such data dependency correctly in the IL. 
> 
> 2. The solution:
> 
> 2.1 Summary
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound sanitizer, query the size information or ACCESS_MODE information from the new internal function;
> * When the size information and the "ACCESS_MODE" information are not used anymore, possibly at the 2nd object size phase, replace the internal function with the actual reference to the FAM field; 
> * Some adjustment to inlining heuristic, ipa alias analysis, and other SSA passes to mitigate the impact to the optimizer and code generation. 
> 
> 2.2 The new internal function 
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object, 
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents 
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes; 
> 4th argument "SIZE_OF_SIZE": how many bytes is the object that REF_TO_SIZE points;
> 5th argument "ACCESS_MODE": 
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 
> NOTEs, 
>  A. This new internal function is intended for a more general use from all the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode the "size" and "access_mode" information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503);
>  B. For "counted_by" the 3rd argument will be 1;
>  C. For "counted_by" and "alloc_size" attributes, the 5th argument will be -1;   
>  D. In this wrieup, we focus on the implementation details for the "counted_by" attribute. However, this function should be ready to be used by "access" and "alloc_size" without issue. 
> 
> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>  struct A
>  {
>   size_t size;
>   char buf[] __attribute__((counted_by(size)));
>  };
> 
> for any object with such type:
> 
>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The initialization to the size field should be done before the first reference to the FAM field,
> Otherwise, the behavior is undefined.
> Such additional requirement to the user will guarantee that the first reference to the FAM knows the size of the FAM.  
> 
> Another thing that need to be clarified is:
> A later reference to the FAM field will use the latest value assigned to the size field before that reference. For example, 
> obj->size = val1;
> ref1 (obj->buf);
> obj->size = val2;
> ref2 (obj->buf);
> in the above, "ref1" will use val1 and "ref2" will use val2. 
> This clarification will inform user that the dynamic array feature is fully supported.
> 
> We need to add the above additional requirement and clarification to the user documentation.
> The complete user documentation is in Appendix 2. 
> 
> 2.4 Replace the reference to a FAM field with the new function .ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small example,
>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>  if YES, replace the reference to "obj->buf" with a call to
>      .ACCESS_WITH_SIZE (obj->buf, &obj->size, 1, sizeof(obj->size), -1); 
> 
> This includes the case when the object is statically allocated and initialized as following example:
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = &x.buf;
> 
> The reference to x.buf should be replaced by the .ACCESS_WITH_SIZE (&x.buf, &x.size, 1, sizeof(x.size), -1).
> 
> 
> 2.5 Query the size info 
> 
> There are multiple consumers of the size info (and ACCESS_MODE info):
> 
>  * __builtin_dynamic_object_size;
>  * array bound sanitizer;
> 
> in these consumers, get the size info from the 2nd, 3rd, and 4th arguments of the call to
> ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, 1, sizeof(SIZE), -1)
> 
> 2.6 Eliminate the internal function when not useful anymore
> 
> After the last consumer of the size information in the ACCESS_WITH_SIZE, We should replace the internal call with its first argument.
> 
> Do it in the 2nd object size phase. 
> 
> 2.7 Adjustment to inlining heuristic, IPA alias analysis and other IPA analysis
> 
> the FE changes:
> 
> obj->buf
> 
> to
> 
> _1 = obj->buf;
> _2 = &(obj->size);
> .ACCESS_WITH_SIZE (_1, _2, 1, sizeof (obj->size), -1)
> 
> there are major two changes:
> 
>  A. the # of Insns of the routine will be increased,
>  B. additional calls to an non-pure internal routines.
> 
> In order to minimize the impact on code generation and optimizaitons, We might need to
> 
>  * Adjust Inlining decision to ignore the additional insns and calls;
>  * Adjust the routine "call_may_clobber_ref_p_1" in tree-ssa-alias.cc to exclude the new internal .ACCESS_WITH_SIZE from clobbering anything. 
> 
> 
> 3. The patch sets:
> 
> We will provide the following patches:
> 
>  3.1 Provide counted_by attribute to flexible array member field;
>      which includes:
>      * "counted_by" attribute documentation;  
>      * C FE handling of the new attribute; 
>        syntax checking, error reporting;
>      * testing cases; (aditional testing cases for staticly initialized object)
> 
>  3.2 Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.  
>      which includes:
>      * The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def. 
>      * C FE converts every reference to a FAM with "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE.
>        (build_component_ref in c_typeck.cc)
>      * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>        (After the size and access_mode information are used in the 2nd object size phase, we need to convert .ACCESS_WITH_SIZE back to its first argument as soon as possible in order to minimize the impact to other phases after object size phase. So, do this in the end of the 2nd object size phase)
>      * adjust inlining heuristic for the additional call. 
>      * adjust alias analysis to exclude the new internal from clobbering anything.
>        (call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
>      * verify a call to .ACCESS_WITH_SIZE in tree-cfg.cc.
>      * provide utility routines to get the size or ACCESS_MODE from the call to .ACCESS_WITH_SIZE. 
> 
>  3.3 Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
>      which includes: 
>      * use the size info of the .ACCESS_WITH_SIZE for sub-object.
>      * testing cases. (additional testing cases for dynamic array support)
> 
>  3.4 Use the .ACCESS_WITH_SIZE in bound sanitizer
>      which includes:
>      * use the size info of the .ACCESS_WITH_SIZE for bound sanitizer.
>      * testing cases. (additional testing cases for dynamic array support)
> 
>  3.5 Improve __bdos to use the counted_by info in whole-object size for the structure with FAM. 
>  3.6 Emit warnings when the user breaks the requirments for the new counted_by attribute
>      compilation time: -Wcounted-by
>      run time: -fsanitizer=counted-by
> 
>      * The initialization to the size field should be done before the first reference to the FAM field. 
>      * the array has at least # of elements specified by the size field all the time during the program.  
> 
> 
> 4. Future work
> 
>  4.1 Use the new .ACCESS_WITH_SIZE for other relative attributes, for example, "access" and "alloc_size", to resolve the known issues for them:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503
> 
>  4.2 Add "counted_by" attribute to general pointers as proposed by the -fbounds-safety proposal to LLVM from Apple (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854).  
> 
> the .ACCESS_WITH_SIZE approach should be adopted naturally. 
> 
> 
> Appendex 1. A more complicated example: 
> 
>  1 #include <malloc.h>
>  2 struct A
>  3 {
>  4  size_t size;
>  5  char buf[] __attribute__((counted_by(size)));
>  6 };
>  7 
>  8 static size_t
>  9 get_size_from (void *ptr)
> 10 {
> 11  return __builtin_dynamic_object_size (ptr, 1);
> 12 }
> 13 
> 14 void
> 15 foo (size_t sz)
> 16 {
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
> 21  return;
> 22 }
> 23 
> 24 int main ()
> 25 {
> 26  foo (20);
> 27  return 0;
> 28 }
> 
> In this example, the instantiation of the object at line 17 is not in the same routine as the _BDOS call at line 11. 
> 
> Appendex 2. The user documentation:
> 
> 'counted_by (COUNT)'
>     The 'counted_by' attribute may be attached to the flexible array
>     member of a structure.  It indicates that the number of the
>     elements of the array is given by the field named "COUNT" in the
>     same structure as the flexible array member.  GCC uses this
>     information to improve the results of the array bound sanitizer and
>     the '__builtin_dynamic_object_size'.
> 
>     For instance, the following code:
> 
>          struct P {
>            size_t count;
>            char other;
>            char array[] __attribute__ ((counted_by (count)));
>          } *p;
> 
>     specifies that the 'array' is a flexible array member whose number
>     of elements is given by the field 'count' in the same structure.
> 
>     The field that represents the number of the elements should have an
>     integer type. Otherwise, the compiler will report a warning and ignore 
>     the attribute.
> 
>     An explicit 'counted_by' annotation defines a relationship between
>     two objects, 'p->array' and 'p->count', and there are the following
>     requirementthat on the relationship between this pair:
> 
>     A. 'p->count' should be initialized before the first reference to 
>        'p->array'.
>     B. "p->array' has _at least_ 'p->count' number of elements available
>        all the time.
>        This requirement must hold even after any of these related objects
>        are updated during the program.
> 
>     It's the user's responsibility to make sure the above requirments to 
>     be kept all the time.  Otherwise the compiler will report warnings, 
>     at the same time, the results of the array bound sanitizer and the
>     '__builtin_dynamic_object_size' is undefined. 
> 
>     One important feature of the attribute is, A reference to the flexible
>     array member field will use the latest value assigned to the field that
>     represent the number of the elements before that reference. For example,
>       p->count = val1;
>       ref1 (p->array);
>       p->count = val2;
>       ref2 (p->array);
>     in the above, 'ref1' will use 'val1' as the number of the elements in
>     'p->array', and "ref2" will use 'val2' as the number of elements in
>     'p->array'.
> 
> 
> Appendex 3. Other approaches that have been discussed:
> 
> A.  Add an additional argument, the size parameter, to the call to __bdos, 
>      A.1, during FE;
>      A.2, during gimplification phase;
> B.  Encode the implicit USE in the type of size, to make the size as “volatile”;
> C.  Encode the implicit USE in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
> 
> ** Approach A (both A.1 and A.2) will not work if the object instantiation is not in the same routine as the __bdos call as shown in the above example;
> ** Approach B will have a bigger performance impact due to the "volatile" will inhibit a lot of optimizations; 
> ** Approach C will involve a lot of changes in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.


  reply	other threads:[~2023-11-09 15:49 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-07  0:12 Qing Zhao
2023-11-09 15:49 ` Qing Zhao [this message]
2023-11-09 15:57   ` Jakub Jelinek
2023-11-09 16:50     ` Jose E. Marchesi
2023-11-09 17:23       ` Qing Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2D704EB7-B3FE-4280-ACC0-AA0C4236E39F@oracle.com \
    --to=qing.zhao@oracle.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=isanbard@gmail.com \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=keescook@chromium.org \
    --cc=richard.guenther@gmail.com \
    --cc=siddhesh@gotplt.org \
    --cc=uecker@tugraz.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).