public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC: the proposal to resolve the missing dependency issue for counted_by attribute
@ 2023-10-31 16:26 Qing Zhao
  2023-10-31 17:35 ` Siddhesh Poyarekar
  2023-10-31 22:14 ` Joseph Myers
  0 siblings, 2 replies; 34+ messages in thread
From: Qing Zhao @ 2023-10-31 16:26 UTC (permalink / raw)
  To: Richard Biener, Joseph Myers, Siddhesh Poyarekar, Martin Uecker
  Cc: Kees Cook, Jakub Jelinek, isanbard, GCC Patches

Hi, 

I wrote a summary based on our extensive discussion, hopefully this can be served as an informal proposal. 

Please take a look at it and let me know any comment or suggestion.

There are some (???) in the section 3.2 and 3.6, those are my questions seeking for help.  -:)

Thanks again for all the help.

Qing.

========================================================
Represent the missing dependence for the "counted_by" attribute and its consumers 

Qing Zhao

10/30/2023
==============================================

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

There is a data dependency between the size assignment and the implicit use of the size information in the __builtin_dynamic_object_size that is missing in the IL (line 11 and line 13 in the below example). Such information missing will result incorrect code reordering and other code transformations. 

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
Please see a more complicate example in the Appendex 1.

We need to represent such data dependency correctly in the IL. 

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information for every FAM field access;
* In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound sanitizer, query the size information or ACCESS_MODE information from the new internal function;
* When the size information and the "ACCESS_MODE" information are not used anymore, possibly at the 2nd object size phase, replace the internal function with the actual FAM field access; 
* Some adjustment to inlining heuristic and some SSA passes to mitigate the impact to the optimizer and code generation. 

2.2 The new internal function 

  .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "PTR" same as the 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed object, 
  if the pointee of the "PTR" has a
    * real type, it's the number of the elements of the type;
    * void type, it's the number of bytes; 
3rd argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write

NOTEs, 
  A. This new internal function is intended for a more general use from all the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode the "size" and "access_mode" information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
  B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.   
  C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. However, this function should be ready to be used by "access" and "alloc_size" without issue. 

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The setting to the size field should be done before the first reference to the FAM field.

Such requirement to the user will guarantee that the first reference to the FAM knows the size of the FAM.  

We need to add this additional requirement to the user document.

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
  if YES, replace the reference to "obj->buf" with a call to
      .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 

2.5 Query the size info 

There are multiple consumers of the size info (and ACCESS_MODE info):

  * __builtin_dynamic_object_size;
  * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)

2.6 Eliminate the internal function when not useful anymore

After the last consumer of the size information in the ACCESS_WITH_SIZE, We should replace the internal call with its first argument.

Do it in the 2nd object size phase. 

2.7 Adjustment to inlining heuristic and other IPA analysis

the FE changes:

obj->buf

to

_1 = obj->buf;
_2 = obj->size;
.ACCESS_WITH_SIZE (_1, _2, -1)

there are major two changes:

  A. the # of LOADs, the # of Insns of the routine will be increased,
  B. additional calls to internal routines.

As a result:
  Inlining decision and some IPA analysis might be changed due to these changes. 

We need to adjust inlining heurisic and IPA analysis properly.

3. The patch sets:

We will provide the following patches:

  3.1 Provide counted_by attribute to flexible array member field;
      which includes:
      * "counted_by" attribute documentation;  
      * C FE handling of the new attribute; 
        syntax checking, error reporting;
      * testing cases;

  3.2 Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.  
      which includes:
      * The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def. 
      * C FE converts every access to a FAM with "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE. (where in the FE, which routine should look at???)
      * Convert every call to .ACCESS_WITH_SIZE to its first argument. where should we do this? in the end of the 2nd object size phase or in the expand phase? (I prefer to do this in the end of the 2nd object size phase, comments???)
      * adjust inlining heuristic for the additional call and loads (???).
      * Other phases need to be adjusted (???).
      * verify a call to .ACCESS_WITH_SIZE in tree-cfg.cc.
      * provide utility routines to get the size or ACCESS_MODE from the call to .ACCESS_WITH_SIZE. 

  3.3 Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
      which includes: 
      * use the size info of the .ACCESS_WITH_SIZE for sub-object.
      * testing cases.

  3.4 Use the .ACCESS_WITH_SIZE in bound sanitizer
      which includes:
      * use the size info of the .ACCESS_WITH_SIZE for bound sanitizer.
      * testing cases.

  3.5 Improve __bdos to use the counted_by info in whole-object size for the structure with FAM. 
  3.6 Support for dynamic array of FMA (???)  
      https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634568.html

  3.7 Emit warnings when the user breaks the requirments for the new counted_by attribute
      compilation time: -Wcounted-by
      run time: -fsanitizer=counted-by
      
      * The setting to the size field should be done before the first reference to the FAM field. 
      * the array has at least # of elements specified by the size field all the time during the program.  
 

4. Future work
  
  4.1 Use the new .ACCESS_WITH_SIZE for other relative attributes, for example, "access" and "alloc_size", to resolve the known issues for them:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503

  4.2 Add "counted_by" attribute to general pointers as proposed by the -fbounds-safety proposal to LLVM from Apple (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854).  

the .ACCESS_WITH_SIZE approach should be adopted naturally. 


Appendex 1. A more complicated example: 

  1 #include <malloc.h>
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7 
  8 static size_t
  9 get_size_from (void *ptr)
 10 {
 11  return __builtin_dynamic_object_size (ptr, 1);
 12 }
 13 
 14 void
 15 foo (size_t sz)
 16 {
 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 18  obj->size = sz;
 19  obj->buf[0] = 2;
 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
 21  return;
 22 }
 23 
 24 int main ()
 25 {
 26  foo (20);
 27  return 0;
 28 }

In this example, the instantiation of the object at line 17 is not in the same routine as the _BDOS call at line 11. 

Appendex 2. Other approaches that have been discussed:

A.  Add an additional argument, the size parameter, to the call to __bdos, 
      A.1, during FE;
      A.2, during gimplification phase;
B.  Encode the implicit USE in the type of size, to make the size as “volatile”;
C.  Encode the implicit USE in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.

** Approach A (both A.1 and A.2) will not work if the object instantiation is not in the same routine as the __bdos call as shown in the above example;
** Approach B will have a bigger performance impact due to the "volatile" will inhibit a lot of optimizations; 
** Approach C will involve a lot of changes in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-10-31 16:26 RFC: the proposal to resolve the missing dependency issue for counted_by attribute Qing Zhao
@ 2023-10-31 17:35 ` Siddhesh Poyarekar
  2023-10-31 18:35   ` Qing Zhao
  2023-10-31 22:14 ` Joseph Myers
  1 sibling, 1 reply; 34+ messages in thread
From: Siddhesh Poyarekar @ 2023-10-31 17:35 UTC (permalink / raw)
  To: Qing Zhao, Richard Biener, Joseph Myers, Martin Uecker
  Cc: Kees Cook, Jakub Jelinek, isanbard, GCC Patches

On 2023-10-31 12:26, Qing Zhao wrote:
> Hi,
> 
> I wrote a summary based on our extensive discussion, hopefully this can be served as an informal proposal.
> 
> Please take a look at it and let me know any comment or suggestion.
> 
> There are some (???) in the section 3.2 and 3.6, those are my questions seeking for help.  -:)
> 
> Thanks again for all the help.
> 
> Qing.
> 
> ========================================================
> Represent the missing dependence for the "counted_by" attribute and its consumers
> 
> Qing Zhao
> 
> 10/30/2023
> ==============================================
> 
> The whole discussion is at:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
> 
> 1. The problem
> 
> There is a data dependency between the size assignment and the implicit use of the size information in the __builtin_dynamic_object_size that is missing in the IL (line 11 and line 13 in the below example). Such information missing will result incorrect code reordering and other code transformations.
> 
>    1 struct A
>    2 {
>    3  size_t size;
>    4  char buf[] __attribute__((counted_by(size)));
>    5 };
>    6
>    7 size_t
>    8 foo (size_t sz)
>    9 {
>   10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>   11  obj->size = sz;
>   12  obj->buf[0] = 2;
>   13  return __builtin_dynamic_object_size (obj->buf, 1);
>   14 }
>    
> Please see a more complicate example in the Appendex 1.
> 
> We need to represent such data dependency correctly in the IL.
> 
> 2. The solution:
> 
> 2.1 Summary
> 
> * Add a new internal function "ACCESS_WITH_SIZE" to carry the size information for every FAM field access;
> * In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with the new internal function "ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound sanitizer, query the size information or ACCESS_MODE information from the new internal function;
> * When the size information and the "ACCESS_MODE" information are not used anymore, possibly at the 2nd object size phase, replace the internal function with the actual FAM field access;
> * Some adjustment to inlining heuristic and some SSA passes to mitigate the impact to the optimizer and code generation.
> 
> 2.2 The new internal function
> 
>    .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "PTR" same as the 1st argument;
> 
> 1st argument "PTR": Pointer to the object;
> 2nd argument "SIZE": The size of the pointed object,
>    if the pointee of the "PTR" has a
>      * real type, it's the number of the elements of the type;
>      * void type, it's the number of bytes;
> 3rd argument "ACCESS_MODE":
>    -1: Unknown access semantics
>     0: none
>     1: read_only
>     2: write_only
>     3: read_write
> 
> NOTEs,
>    A. This new internal function is intended for a more general use from all the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode the "size" and "access_mode" information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
>    B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.
>    C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. However, this function should be ready to be used by "access" and "alloc_size" without issue.
> 
> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>    struct A
>    {
>     size_t size;
>     char buf[] __attribute__((counted_by(size)));
>    };
> 
> for any object with such type:
> 
>    struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The setting to the size field should be done before the first reference to the FAM field.

A more flexible specification could be stating that validation for a 
reference to the FAM field will use the latest value assigned to the 
size field before that reference.  That will allow for situations like:

   o->size = val1;
   deref (o->buf);
   o->size = val2;

making it clear that deref will see val1 and not val2.

> 
> Such requirement to the user will guarantee that the first reference to the FAM knows the size of the FAM.
> 
> We need to add this additional requirement to the user document.
> 
> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small example,
>    check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>    if YES, replace the reference to "obj->buf" with a call to
>        .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> 
> 2.5 Query the size info
> 
> There are multiple consumers of the size info (and ACCESS_MODE info):
> 
>    * __builtin_dynamic_object_size;
>    * array bound sanitizer;
> 
> in these consumers, get the size info from the 2nd argument of the call to
> ACCESS_WITH_SIZE (PTR, SIZE, -1)
> 
> 2.6 Eliminate the internal function when not useful anymore
> 
> After the last consumer of the size information in the ACCESS_WITH_SIZE, We should replace the internal call with its first argument.
> 
> Do it in the 2nd object size phase.
> 
> 2.7 Adjustment to inlining heuristic and other IPA analysis
> 
> the FE changes:
> 
> obj->buf
> 
> to
> 
> _1 = obj->buf;
> _2 = obj->size;
> .ACCESS_WITH_SIZE (_1, _2, -1)
> 
> there are major two changes:
> 
>    A. the # of LOADs, the # of Insns of the routine will be increased,
>    B. additional calls to internal routines.
> 
> As a result:
>    Inlining decision and some IPA analysis might be changed due to these changes.
> 
> We need to adjust inlining heurisic and IPA analysis properly.
> 
> 3. The patch sets:
> 
> We will provide the following patches:
> 
>    3.1 Provide counted_by attribute to flexible array member field;
>        which includes:
>        * "counted_by" attribute documentation;
>        * C FE handling of the new attribute;
>          syntax checking, error reporting;
>        * testing cases;
> 
>    3.2 Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
>        which includes:
>        * The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def.
>        * C FE converts every access to a FAM with "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE. (where in the FE, which routine should look at???)
>        * Convert every call to .ACCESS_WITH_SIZE to its first argument. where should we do this? in the end of the 2nd object size phase or in the expand phase? (I prefer to do this in the end of the 2nd object size phase, comments???)

I'd say be like __bos here, i.e. replace instances in the objsz2 pass 
and also as a fallback, implement expand_builtin_access_with_size (like 
expand_builtin_object_size) to replace all remaining instances of the 
builtin with the first arg.

>        * adjust inlining heuristic for the additional call and loads (???).
>        * Other phases need to be adjusted (???).

Do we need this?  I thought expressing the dependency with the builtin 
ought to be sufficient.

>        * verify a call to .ACCESS_WITH_SIZE in tree-cfg.cc.
>        * provide utility routines to get the size or ACCESS_MODE from the call to .ACCESS_WITH_SIZE.
> 
>    3.3 Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
>        which includes:
>        * use the size info of the .ACCESS_WITH_SIZE for sub-object.
>        * testing cases.
> 
>    3.4 Use the .ACCESS_WITH_SIZE in bound sanitizer
>        which includes:
>        * use the size info of the .ACCESS_WITH_SIZE for bound sanitizer.
>        * testing cases.
> 
>    3.5 Improve __bdos to use the counted_by info in whole-object size for the structure with FAM.
>    3.6 Support for dynamic array of FMA (???)
>        https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634568.html

I'm not entirely convinced that it's an issue.  __bdos will just look at 
__counted_by__ and use its most recent value (using the above user 
specification) and with that it's just a matter of application 
preference.  If they want dynamic array behaviour, they could just 
annotate the array count member with __counted_by__ instead of the array 
capacity member.

Thanks,
Sid


> 
>    3.7 Emit warnings when the user breaks the requirments for the new counted_by attribute
>        compilation time: -Wcounted-by
>        run time: -fsanitizer=counted-by
>        
>        * The setting to the size field should be done before the first reference to the FAM field.
>        * the array has at least # of elements specified by the size field all the time during the program.
>   
> 
> 4. Future work
>    
>    4.1 Use the new .ACCESS_WITH_SIZE for other relative attributes, for example, "access" and "alloc_size", to resolve the known issues for them:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503
> 
>    4.2 Add "counted_by" attribute to general pointers as proposed by the -fbounds-safety proposal to LLVM from Apple (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854).
> 
> the .ACCESS_WITH_SIZE approach should be adopted naturally.
> 
> 
> Appendex 1. A more complicated example:
> 
>    1 #include <malloc.h>
>    2 struct A
>    3 {
>    4  size_t size;
>    5  char buf[] __attribute__((counted_by(size)));
>    6 };
>    7
>    8 static size_t
>    9 get_size_from (void *ptr)
>   10 {
>   11  return __builtin_dynamic_object_size (ptr, 1);
>   12 }
>   13
>   14 void
>   15 foo (size_t sz)
>   16 {
>   17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>   18  obj->size = sz;
>   19  obj->buf[0] = 2;
>   20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>   21  return;
>   22 }
>   23
>   24 int main ()
>   25 {
>   26  foo (20);
>   27  return 0;
>   28 }
> 
> In this example, the instantiation of the object at line 17 is not in the same routine as the _BDOS call at line 11.
> 
> Appendex 2. Other approaches that have been discussed:
> 
> A.  Add an additional argument, the size parameter, to the call to __bdos,
>        A.1, during FE;
>        A.2, during gimplification phase;
> B.  Encode the implicit USE in the type of size, to make the size as “volatile”;
> C.  Encode the implicit USE in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
> 
> ** Approach A (both A.1 and A.2) will not work if the object instantiation is not in the same routine as the __bdos call as shown in the above example;
> ** Approach B will have a bigger performance impact due to the "volatile" will inhibit a lot of optimizations;
> ** Approach C will involve a lot of changes in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-10-31 17:35 ` Siddhesh Poyarekar
@ 2023-10-31 18:35   ` Qing Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-10-31 18:35 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Jakub Jelinek
  Cc: Richard Biener, Joseph Myers, Martin Uecker, Kees Cook,
	Jakub Jelinek, isanbard, GCC Patches



> On Oct 31, 2023, at 1:35 PM, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 2023-10-31 12:26, Qing Zhao wrote:
>> Hi,
>> I wrote a summary based on our extensive discussion, hopefully this can be served as an informal proposal.
>> Please take a look at it and let me know any comment or suggestion.
>> There are some (???) in the section 3.2 and 3.6, those are my questions seeking for help.  -:)
>> Thanks again for all the help.
>> Qing.
>> ========================================================
>> Represent the missing dependence for the "counted_by" attribute and its consumers
>> Qing Zhao
>> 10/30/2023
>> ==============================================
>> The whole discussion is at:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
>> 1. The problem
>> There is a data dependency between the size assignment and the implicit use of the size information in the __builtin_dynamic_object_size that is missing in the IL (line 11 and line 13 in the below example). Such information missing will result incorrect code reordering and other code transformations.
>>   1 struct A
>>   2 {
>>   3  size_t size;
>>   4  char buf[] __attribute__((counted_by(size)));
>>   5 };
>>   6
>>   7 size_t
>>   8 foo (size_t sz)
>>   9 {
>>  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>  11  obj->size = sz;
>>  12  obj->buf[0] = 2;
>>  13  return __builtin_dynamic_object_size (obj->buf, 1);
>>  14 }
>>   Please see a more complicate example in the Appendex 1.
>> We need to represent such data dependency correctly in the IL.
>> 2. The solution:
>> 2.1 Summary
>> * Add a new internal function "ACCESS_WITH_SIZE" to carry the size information for every FAM field access;
>> * In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with the new internal function "ACCESS_WITH_SIZE";
>> * In every consumer of the size information, for example, BDOS or array bound sanitizer, query the size information or ACCESS_MODE information from the new internal function;
>> * When the size information and the "ACCESS_MODE" information are not used anymore, possibly at the 2nd object size phase, replace the internal function with the actual FAM field access;
>> * Some adjustment to inlining heuristic and some SSA passes to mitigate the impact to the optimizer and code generation.
>> 2.2 The new internal function
>>   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>> which returns the "PTR" same as the 1st argument;
>> 1st argument "PTR": Pointer to the object;
>> 2nd argument "SIZE": The size of the pointed object,
>>   if the pointee of the "PTR" has a
>>     * real type, it's the number of the elements of the type;
>>     * void type, it's the number of bytes;
>> 3rd argument "ACCESS_MODE":
>>   -1: Unknown access semantics
>>    0: none
>>    1: read_only
>>    2: write_only
>>    3: read_write
>> NOTEs,
>>   A. This new internal function is intended for a more general use from all the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode the "size" and "access_mode" information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
>>   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.
>>   C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. However, this function should be ready to be used by "access" and "alloc_size" without issue.
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> For the following structure including a FAM with a counted_by attribute:
>>   struct A
>>   {
>>    size_t size;
>>    char buf[] __attribute__((counted_by(size)));
>>   };
>> for any object with such type:
>>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> The setting to the size field should be done before the first reference to the FAM field.
> 
> A more flexible specification could be stating that validation for a reference to the FAM field will use the latest value assigned to the size field before that reference.  That will allow for situations like:
> 
>  o->size = val1;
>  deref (o->buf);
>  o->size = val2;
> 
> making it clear that deref will see val1 and not val2.

Good point! Yes, with the current design, this is reasonable. 
Will update the proposal with this.
> 
>> Such requirement to the user will guarantee that the first reference to the FAM knows the size of the FAM.
>> We need to add this additional requirement to the user document.
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> In C FE:
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>   if YES, replace the reference to "obj->buf" with a call to
>>       .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>> 2.5 Query the size info
>> There are multiple consumers of the size info (and ACCESS_MODE info):
>>   * __builtin_dynamic_object_size;
>>   * array bound sanitizer;
>> in these consumers, get the size info from the 2nd argument of the call to
>> ACCESS_WITH_SIZE (PTR, SIZE, -1)
>> 2.6 Eliminate the internal function when not useful anymore
>> After the last consumer of the size information in the ACCESS_WITH_SIZE, We should replace the internal call with its first argument.
>> Do it in the 2nd object size phase.
>> 2.7 Adjustment to inlining heuristic and other IPA analysis
>> the FE changes:
>> obj->buf
>> to
>> _1 = obj->buf;
>> _2 = obj->size;
>> .ACCESS_WITH_SIZE (_1, _2, -1)
>> there are major two changes:
>>   A. the # of LOADs, the # of Insns of the routine will be increased,
>>   B. additional calls to internal routines.
>> As a result:
>>   Inlining decision and some IPA analysis might be changed due to these changes.
>> We need to adjust inlining heurisic and IPA analysis properly.
>> 3. The patch sets:
>> We will provide the following patches:
>>   3.1 Provide counted_by attribute to flexible array member field;
>>       which includes:
>>       * "counted_by" attribute documentation;
>>       * C FE handling of the new attribute;
>>         syntax checking, error reporting;
>>       * testing cases;
>>   3.2 Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
>>       which includes:
>>       * The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def.
>>       * C FE converts every access to a FAM with "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE. (where in the FE, which routine should look at???)
>>       * Convert every call to .ACCESS_WITH_SIZE to its first argument. where should we do this? in the end of the 2nd object size phase or in the expand phase? (I prefer to do this in the end of the 2nd object size phase, comments???)
> 
> I'd say be like __bos here, i.e. replace instances in the objsz2 pass and also as a fallback, implement expand_builtin_access_with_size (like expand_builtin_object_size) to replace all remaining instances of the builtin with the first arg.

Okay. This is reasonable too.
> 
>>       * adjust inlining heuristic for the additional call and loads (???).
>>       * Other phases need to be adjusted (???).
> 
> Do we need this?  I thought expressing the dependency with the builtin ought to be sufficient.

Yes, I think we need this. Not for correctness, but for mitigating the performance impact due to the possible inlining decision change or other IPA analysis impacted by the new function calls.
Please see Jakub’s comment at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634343.html

> 
>>       * verify a call to .ACCESS_WITH_SIZE in tree-cfg.cc.
>>       * provide utility routines to get the size or ACCESS_MODE from the call to .ACCESS_WITH_SIZE.
>>   3.3 Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
>>       which includes:
>>       * use the size info of the .ACCESS_WITH_SIZE for sub-object.
>>       * testing cases.
>>   3.4 Use the .ACCESS_WITH_SIZE in bound sanitizer
>>       which includes:
>>       * use the size info of the .ACCESS_WITH_SIZE for bound sanitizer.
>>       * testing cases.
>>   3.5 Improve __bdos to use the counted_by info in whole-object size for the structure with FAM.
>>   3.6 Support for dynamic array of FMA (???)
>>       https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634568.html
> 
> I'm not entirely convinced that it's an issue.  __bdos will just look at __counted_by__ and use its most recent value (using the above user specification) and with that it's just a matter of application preference.  If they want dynamic array behaviour, they could just annotate the array count member with __counted_by__ instead of the array capacity member.

This makes sense.  Yes, will update the proposal with this.

Thanks.

Qing
> 
> Thanks,
> Sid
> 
> 
>>   3.7 Emit warnings when the user breaks the requirments for the new counted_by attribute
>>       compilation time: -Wcounted-by
>>       run time: -fsanitizer=counted-by
>>              * The setting to the size field should be done before the first reference to the FAM field.
>>       * the array has at least # of elements specified by the size field all the time during the program.
>>  4. Future work
>>      4.1 Use the new .ACCESS_WITH_SIZE for other relative attributes, for example, "access" and "alloc_size", to resolve the known issues for them:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503
>>   4.2 Add "counted_by" attribute to general pointers as proposed by the -fbounds-safety proposal to LLVM from Apple (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854).
>> the .ACCESS_WITH_SIZE approach should be adopted naturally.
>> Appendex 1. A more complicated example:
>>   1 #include <malloc.h>
>>   2 struct A
>>   3 {
>>   4  size_t size;
>>   5  char buf[] __attribute__((counted_by(size)));
>>   6 };
>>   7
>>   8 static size_t
>>   9 get_size_from (void *ptr)
>>  10 {
>>  11  return __builtin_dynamic_object_size (ptr, 1);
>>  12 }
>>  13
>>  14 void
>>  15 foo (size_t sz)
>>  16 {
>>  17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>  18  obj->size = sz;
>>  19  obj->buf[0] = 2;
>>  20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>>  21  return;
>>  22 }
>>  23
>>  24 int main ()
>>  25 {
>>  26  foo (20);
>>  27  return 0;
>>  28 }
>> In this example, the instantiation of the object at line 17 is not in the same routine as the _BDOS call at line 11.
>> Appendex 2. Other approaches that have been discussed:
>> A.  Add an additional argument, the size parameter, to the call to __bdos,
>>       A.1, during FE;
>>       A.2, during gimplification phase;
>> B.  Encode the implicit USE in the type of size, to make the size as “volatile”;
>> C.  Encode the implicit USE in the type of buf, then update the optimization passes to use this implicit USE encoded in the type of buf.
>> ** Approach A (both A.1 and A.2) will not work if the object instantiation is not in the same routine as the __bdos call as shown in the above example;
>> ** Approach B will have a bigger performance impact due to the "volatile" will inhibit a lot of optimizations;
>> ** Approach C will involve a lot of changes in GCC, and also not very necessary since the ONLY implicit use of the size parameter is in the __bdos call when __bdos can see the real object.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-10-31 16:26 RFC: the proposal to resolve the missing dependency issue for counted_by attribute Qing Zhao
  2023-10-31 17:35 ` Siddhesh Poyarekar
@ 2023-10-31 22:14 ` Joseph Myers
  2023-11-01 14:47   ` Qing Zhao
  1 sibling, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2023-10-31 22:14 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Richard Biener, Siddhesh Poyarekar, Martin Uecker, Kees Cook,
	Jakub Jelinek, isanbard, GCC Patches

On Tue, 31 Oct 2023, Qing Zhao wrote:

> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>   struct A
>   {
>    size_t size;
>    char buf[] __attribute__((counted_by(size)));
>   };
> 
> for any object with such type:
> 
>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The setting to the size field should be done before the first reference 
> to the FAM field.
> 
> Such requirement to the user will guarantee that the first reference to 
> the FAM knows the size of the FAM.
> 
> We need to add this additional requirement to the user document.

Make sure the manual is very specific about exactly when size is 
considered to be an accurate representation of the space available for buf 
(given that, after malloc or realloc, it's going to be temporarily 
inaccurate).  If the intent is that inaccurate size at such a time means 
undefined behavior, say so explicitly.

> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small example,
>   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>   if YES, replace the reference to "obj->buf" with a call to
>       .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 

This seems plausible - but you should also consider the case of static 
initializers - remember the GNU extension for statically allocated objects 
with flexible array members (unless you're not allowing it with 
counted_by).

static struct A x = { sizeof "hello", "hello" };
static char *y = &x.buf;

I'd expect that to be valid - and unless you say such a usage is invalid, 
you should avoid the replacement in such a static initializer context when 
the FAM reference is to an object with a constant address (if 
.ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
expression; if it works fine as a constant-address lvalue, then the 
replacement would be OK).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-10-31 22:14 ` Joseph Myers
@ 2023-11-01 14:47   ` Qing Zhao
  2023-11-01 15:00     ` Martin Uecker
  2023-11-02  7:57     ` Richard Biener
  0 siblings, 2 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-01 14:47 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Biener, Siddhesh Poyarekar, Martin Uecker, Kees Cook,
	Jakub Jelinek, isanbard, GCC Patches



> On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Tue, 31 Oct 2023, Qing Zhao wrote:
> 
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> 
>> For the following structure including a FAM with a counted_by attribute:
>> 
>>  struct A
>>  {
>>   size_t size;
>>   char buf[] __attribute__((counted_by(size)));
>>  };
>> 
>> for any object with such type:
>> 
>>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 
>> The setting to the size field should be done before the first reference 
>> to the FAM field.
>> 
>> Such requirement to the user will guarantee that the first reference to 
>> the FAM knows the size of the FAM.
>> 
>> We need to add this additional requirement to the user document.
> 
> Make sure the manual is very specific about exactly when size is 
> considered to be an accurate representation of the space available for buf 
> (given that, after malloc or realloc, it's going to be temporarily 
> inaccurate).  If the intent is that inaccurate size at such a time means 
> undefined behavior, say so explicitly.

Yes, good point. We need to define this clearly in the beginning. 
We need to explicit say that 

the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.

Is the above good enough?


> 
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> 
>> In C FE:
>> 
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>  if YES, replace the reference to "obj->buf" with a call to
>>      .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> 
> This seems plausible - but you should also consider the case of static 
> initializers - remember the GNU extension for statically allocated objects 
> with flexible array members (unless you're not allowing it with 
> counted_by).
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = &x.buf;
> 
> I'd expect that to be valid - and unless you say such a usage is invalid, 

At this moment, I think that this should be valid.

I,e, the following:

struct A
{
 size_t size;
 char buf[] __attribute__((counted_by(size)));
};

static struct A x = {sizeof "hello", "hello”};

Should be valid, and x.size represents the number of elements of x.buf. 
Both x.size and x.buf are initialized statically. 

> you should avoid the replacement in such a static initializer context when 
> the FAM reference is to an object with a constant address (if 
> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> expression; if it works fine as a constant-address lvalue, then the 
> replacement would be OK).

Then if such usage for the “counted_by” is valid, we need to replace the FAM 
reference by a call to  .ACCESS_WITH_SIZE as well.
Otherwise the “counted_by” relationship will be lost to the Middle end. 

With the current definition of .ACCESS_WITH_SIZE

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

Isn’t the PTR (return value of the call) a LVALUE? 

Qing
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-01 14:47   ` Qing Zhao
@ 2023-11-01 15:00     ` Martin Uecker
  2023-11-01 15:48       ` Qing Zhao
  2023-11-02  7:57     ` Richard Biener
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Uecker @ 2023-11-01 15:00 UTC (permalink / raw)
  To: Qing Zhao, Joseph Myers; +Cc: GCC Patches

Am Mittwoch, dem 01.11.2023 um 14:47 +0000 schrieb Qing Zhao:
> 
> > On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> > 
> > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > 
> > > 2.3 A new semantic requirement in the user documentation of "counted_by"
> > > 
> > > For the following structure including a FAM with a counted_by attribute:
> > > 
> > >  struct A
> > >  {
> > >   size_t size;
> > >   char buf[] __attribute__((counted_by(size)));
> > >  };
> > > 
> > > for any object with such type:
> > > 
> > >  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 
> > > The setting to the size field should be done before the first reference 
> > > to the FAM field.
> > > 
> > > Such requirement to the user will guarantee that the first reference to 
> > > the FAM knows the size of the FAM.
> > > 
> > > We need to add this additional requirement to the user document.
> > 
> > Make sure the manual is very specific about exactly when size is 
> > considered to be an accurate representation of the space available for buf 
> > (given that, after malloc or realloc, it's going to be temporarily 
> > inaccurate).  If the intent is that inaccurate size at such a time means 
> > undefined behavior, say so explicitly.
> 
> Yes, good point. We need to define this clearly in the beginning. 
> We need to explicit say that 
> 
> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.

It is defined by the latest "counted_by" value before x.buf
is referenced, but not the latest before x.buf is dereferenced.

> 
> Is the above good enough?
> 
> 
> > 
> > > 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > > 
> > > In C FE:
> > > 
> > > for every reference to a FAM, for example, "obj->buf" in the small example,
> > >  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > >  if YES, replace the reference to "obj->buf" with a call to
> > >      .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> > 
> > This seems plausible - but you should also consider the case of static 
> > initializers - remember the GNU extension for statically allocated objects 
> > with flexible array members (unless you're not allowing it with 
> > counted_by).
> > 
> > static struct A x = { sizeof "hello", "hello" };
> > static char *y = &x.buf;
> > 
> > I'd expect that to be valid - and unless you say such a usage is invalid, 
> 
> At this moment, I think that this should be valid.
> 
> I,e, the following:
> 
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
> 
> static struct A x = {sizeof "hello", "hello”};
> 
> Should be valid, and x.size represents the number of elements of x.buf. 
> Both x.size and x.buf are initialized statically. 

Joseph is talking about the compile-time initialization of y.

> 
> > you should avoid the replacement in such a static initializer context when 
> > the FAM reference is to an object with a constant address (if 
> > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> > expression; if it works fine as a constant-address lvalue, then the 
> > replacement would be OK).
> 
> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
> reference by a call to  .ACCESS_WITH_SIZE as well.
> Otherwise the “counted_by” relationship will be lost to the Middle end. 
> 
> With the current definition of .ACCESS_WITH_SIZE
> 
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> 
> Isn’t the PTR (return value of the call) a LVALUE? 

The question is whether we get an address constant
that can be used for compile-time initialization.

I think it would be good to collect a list of test
cases and to include this example.

Martin

> 
> Qing
> > 
> > -- 
> > Joseph S. Myers
> > joseph@codesourcery.com
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-01 15:00     ` Martin Uecker
@ 2023-11-01 15:48       ` Qing Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-01 15:48 UTC (permalink / raw)
  To: Martin Uecker; +Cc: Joseph Myers, GCC Patches



> On Nov 1, 2023, at 11:00 AM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> Am Mittwoch, dem 01.11.2023 um 14:47 +0000 schrieb Qing Zhao:
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>> 
>>>> For the following structure including a FAM with a counted_by attribute:
>>>> 
>>>> struct A
>>>> {
>>>>  size_t size;
>>>>  char buf[] __attribute__((counted_by(size)));
>>>> };
>>>> 
>>>> for any object with such type:
>>>> 
>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>> 
>>>> The setting to the size field should be done before the first reference 
>>>> to the FAM field.
>>>> 
>>>> Such requirement to the user will guarantee that the first reference to 
>>>> the FAM knows the size of the FAM.
>>>> 
>>>> We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is 
>>> considered to be an accurate representation of the space available for buf 
>>> (given that, after malloc or realloc, it's going to be temporarily 
>>> inaccurate).  If the intent is that inaccurate size at such a time means 
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning. 
>> We need to explicit say that 
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
> 
> It is defined by the latest "counted_by" value before x.buf
> is referenced, but not the latest before x.buf is dereferenced.

Then:

The size of the FAM is defined by the latest “counted_by” value before the FAM is referenced. 
It’s an undefined behavior when the “counted_by” value is not initialized before the FAM is referenced. 

> 
>> 
>> Is the above good enough?
>> 
>> 
>>> 
>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>> 
>>>> In C FE:
>>>> 
>>>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>> if YES, replace the reference to "obj->buf" with a call to
>>>>     .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
>>> 
>>> This seems plausible - but you should also consider the case of static 
>>> initializers - remember the GNU extension for statically allocated objects 
>>> with flexible array members (unless you're not allowing it with 
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = &x.buf;
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid, 
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello”};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf. 
>> Both x.size and x.buf are initialized statically. 
> 
> Joseph is talking about the compile-time initialization of y.

Okay, -:) 
so, this is the point where the x.buf is referenced,
 and I think that replacing this reference to a call to .ACCESS_WITH_SIZE is still needed.
Otherwise, the “counted_by” relationship will NOT be seen by the middle-end anymore.


> 
>> 
>>> you should avoid the replacement in such a static initializer context when 
>>> the FAM reference is to an object with a constant address (if 
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
>>> expression; if it works fine as a constant-address lvalue, then the 
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end. 
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE? 
> 
> The question is whether we get an address constant
> that can be used for compile-time initialization.

Oh, I see.

So, now, PTR is already an constant at FE, the replacement will be

.ACCESS_WITH_SIZE( CONSTANT_ADDRESS, SIZE, ACCESS_MODE)

This looks awkward….
Should we allow this?

If not allowed, then the “counted_by” attribute will not work for the static initialization. 

> 
> I think it would be good to collect a list of test
> cases and to include this example.

Yes, I will put this into the testing case list.

Qing
> 
> Martin
> 
>> 
>> Qing
>>> 
>>> -- 
>>> Joseph S. Myers
>>> joseph@codesourcery.com


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-01 14:47   ` Qing Zhao
  2023-11-01 15:00     ` Martin Uecker
@ 2023-11-02  7:57     ` Richard Biener
  2023-11-02  8:27       ` Jakub Jelinek
                         ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Richard Biener @ 2023-11-02  7:57 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Joseph Myers, Siddhesh Poyarekar, Martin Uecker, Kees Cook,
	Jakub Jelinek, isanbard, GCC Patches

On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > On Tue, 31 Oct 2023, Qing Zhao wrote:
> >
> >> 2.3 A new semantic requirement in the user documentation of "counted_by"
> >>
> >> For the following structure including a FAM with a counted_by attribute:
> >>
> >>  struct A
> >>  {
> >>   size_t size;
> >>   char buf[] __attribute__((counted_by(size)));
> >>  };
> >>
> >> for any object with such type:
> >>
> >>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> >>
> >> The setting to the size field should be done before the first reference
> >> to the FAM field.
> >>
> >> Such requirement to the user will guarantee that the first reference to
> >> the FAM knows the size of the FAM.
> >>
> >> We need to add this additional requirement to the user document.
> >
> > Make sure the manual is very specific about exactly when size is
> > considered to be an accurate representation of the space available for buf
> > (given that, after malloc or realloc, it's going to be temporarily
> > inaccurate).  If the intent is that inaccurate size at such a time means
> > undefined behavior, say so explicitly.
>
> Yes, good point. We need to define this clearly in the beginning.
> We need to explicit say that
>
> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
>
> Is the above good enough?
>
>
> >
> >> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> >>
> >> In C FE:
> >>
> >> for every reference to a FAM, for example, "obj->buf" in the small example,
> >>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> >>  if YES, replace the reference to "obj->buf" with a call to
> >>      .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> >
> > This seems plausible - but you should also consider the case of static
> > initializers - remember the GNU extension for statically allocated objects
> > with flexible array members (unless you're not allowing it with
> > counted_by).
> >
> > static struct A x = { sizeof "hello", "hello" };
> > static char *y = &x.buf;
> >
> > I'd expect that to be valid - and unless you say such a usage is invalid,
>
> At this moment, I think that this should be valid.
>
> I,e, the following:
>
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
>
> static struct A x = {sizeof "hello", "hello”};
>
> Should be valid, and x.size represents the number of elements of x.buf.
> Both x.size and x.buf are initialized statically.
>
> > you should avoid the replacement in such a static initializer context when
> > the FAM reference is to an object with a constant address (if
> > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > expression; if it works fine as a constant-address lvalue, then the
> > replacement would be OK).
>
> Then if such usage for the “counted_by” is valid, we need to replace the FAM
> reference by a call to  .ACCESS_WITH_SIZE as well.
> Otherwise the “counted_by” relationship will be lost to the Middle end.
>
> With the current definition of .ACCESS_WITH_SIZE
>
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>
> Isn’t the PTR (return value of the call) a LVALUE?

You probably want to specify that when a pointer to the array is taken the
pointer has to be to the first array element (or do we want to mangle the
'size' accordingly for the instrumentation?).  You also want to specify that
the 'size' associated with such pointer is assumed to be unchanging and
after changing the size such pointer has to be re-obtained.  Plus that
changes to the allocated object/size have to be performed through an
lvalue where the containing type and thus the 'counted_by' attribute is
visible.  That is,

size_t *s = &a.size;
*s = 1;

is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
awkward since for example that wouldn't support using posix_memalign
for allocation, though aligned_alloc would be fine).

Richard.

> Qing
> >
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02  7:57     ` Richard Biener
@ 2023-11-02  8:27       ` Jakub Jelinek
  2023-11-02 10:18         ` Richard Biener
  2023-11-02 13:50       ` Qing Zhao
  2023-11-03  0:13       ` Bill Wendling
  2 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2023-11-02  8:27 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches

On Thu, Nov 02, 2023 at 08:57:36AM +0100, Richard Biener wrote:
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.  That is,
> 
> size_t *s = &a.size;
> *s = 1;
> 
> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).

Depends on what behavior we want to guarantee and what kind of price we want
to pay for it.  If the size is .ACCESS_WITH_SIZE operand, the size used in
__bdos will be whatever counted_by size an array had upon taking address of
the array, wherever that happens in the program.  And while we can CSE
the calls, they'd be CSEd only if they have the same size.

Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
the arguments not the size value, but its address.  Then at __bdos time
we would dereference that pointer to get the size.
So,
struct S { int a; char b __attribute__((counted_by (a))) []; };
struct S s;
s.a = 5;
char *p = &s.b[2];
int i1 = __builtin_dynamic_object_size (p, 0);
s.a = 3;
int i2 = __builtin_dynamic_object_size (p, 0);
would then yield 3 and 1 rather than 3 and 3.  But dunno if we wouldn't
need to drop leaf attribute from __bdos to make that work, that would be
I think a significant case against doing that, because while in all the
current plans one just pay code performance price when using counted_by
attribute, even when not using __bdos for it, if we had to make __bdos
non-leaf we'd pay extra price even when nobody is using that attribute
just in -D_FORTIFY_SOURCE=3 / -fhardened compilations, which is how
several distros build basically everything.

	Jakub


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02  8:27       ` Jakub Jelinek
@ 2023-11-02 10:18         ` Richard Biener
  2023-11-02 10:39           ` Jakub Jelinek
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Biener @ 2023-11-02 10:18 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Qing Zhao, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches

On Thu, Nov 2, 2023 at 9:27 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Nov 02, 2023 at 08:57:36AM +0100, Richard Biener wrote:
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.  That is,
> >
> > size_t *s = &a.size;
> > *s = 1;
> >
> > is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> > awkward since for example that wouldn't support using posix_memalign
> > for allocation, though aligned_alloc would be fine).
>
> Depends on what behavior we want to guarantee and what kind of price we want
> to pay for it.  If the size is .ACCESS_WITH_SIZE operand, the size used in
> __bdos will be whatever counted_by size an array had upon taking address of
> the array, wherever that happens in the program.  And while we can CSE
> the calls, they'd be CSEd only if they have the same size.
>
> Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
> the arguments not the size value, but its address.  Then at __bdos time
> we would dereference that pointer to get the size.
> So,
> struct S { int a; char b __attribute__((counted_by (a))) []; };
> struct S s;
> s.a = 5;
> char *p = &s.b[2];
> int i1 = __builtin_dynamic_object_size (p, 0);
> s.a = 3;
> int i2 = __builtin_dynamic_object_size (p, 0);
> would then yield 3 and 1 rather than 3 and 3.

I fail to see how we can get the __builtin_dynamic_object_size call
data dependent on s.a, thus avoid re-ordering or even DSE of the
store.

Basically the model is that __builtin_dynamic_object_size will get
you the size at the point 'p' was formed from something that "last"
had the container with the counted_by attribute visible (plus adjustments
to 'p' inbetween that we are able to track).

s.a = 5;
char *p = &a.b[0];

will get you '5' as size,

char *p = &a.b[0];
s.a = 7;

will get you whatever was in 's.a' at the point of the address taking,
s.a  = 7 will _not_ be honored for __builtin_dynamic_object_size
calls on 'p'.

>  But dunno if we wouldn't
> need to drop leaf attribute from __bdos to make that work, that would be
> I think a significant case against doing that, because while in all the
> current plans one just pay code performance price when using counted_by
> attribute, even when not using __bdos for it, if we had to make __bdos
> non-leaf we'd pay extra price even when nobody is using that attribute
> just in -D_FORTIFY_SOURCE=3 / -fhardened compilations, which is how
> several distros build basically everything.
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 10:18         ` Richard Biener
@ 2023-11-02 10:39           ` Jakub Jelinek
  2023-11-02 11:52             ` Richard Biener
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2023-11-02 10:39 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches

On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
> > Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
> > the arguments not the size value, but its address.  Then at __bdos time
> > we would dereference that pointer to get the size.
> > So,
> > struct S { int a; char b __attribute__((counted_by (a))) []; };
> > struct S s;
> > s.a = 5;
> > char *p = &s.b[2];
> > int i1 = __builtin_dynamic_object_size (p, 0);
> > s.a = 3;
> > int i2 = __builtin_dynamic_object_size (p, 0);
> > would then yield 3 and 1 rather than 3 and 3.
> 
> I fail to see how we can get the __builtin_dynamic_object_size call
> data dependent on s.a, thus avoid re-ordering or even DSE of the
> store.

If &s.b[2] is lowered as
sz_1 = s.a;
tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
p_3 = &tmp_2[2];
then sure, there is no way, you get the size from that point.
tree-object-size.cc tracking then determines that in a particular
case the pointer is size associated with sz_1 and use that value
as the size (with the usual adjustments for pointer arithmetics and the
like).

What I meant is to emit
tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
p_5 = &tmp_4[2];
i.e. don't associate the pointer with a value of the size, but with
an address where to find the size (plus how large it is), basically escape
pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
so supposedly it can depend on what the escaped pointer points to.
We'd see that a particular pointer is size associated with &s.a address
and would use that address cast to the type of the third argument (to
preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
VN CSE it anyway if one has say
union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
 	  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
and
.ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
...
.ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
?

It would mean though that counted_by wouldn't be allowed to be a
bit-field...

	Jakub


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 10:39           ` Jakub Jelinek
@ 2023-11-02 11:52             ` Richard Biener
  2023-11-02 12:09               ` Jakub Jelinek
  2023-11-02 20:45               ` Qing Zhao
  0 siblings, 2 replies; 34+ messages in thread
From: Richard Biener @ 2023-11-02 11:52 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Qing Zhao, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches

On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
> > > Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
> > > the arguments not the size value, but its address.  Then at __bdos time
> > > we would dereference that pointer to get the size.
> > > So,
> > > struct S { int a; char b __attribute__((counted_by (a))) []; };
> > > struct S s;
> > > s.a = 5;
> > > char *p = &s.b[2];
> > > int i1 = __builtin_dynamic_object_size (p, 0);
> > > s.a = 3;
> > > int i2 = __builtin_dynamic_object_size (p, 0);
> > > would then yield 3 and 1 rather than 3 and 3.
> >
> > I fail to see how we can get the __builtin_dynamic_object_size call
> > data dependent on s.a, thus avoid re-ordering or even DSE of the
> > store.
>
> If &s.b[2] is lowered as
> sz_1 = s.a;
> tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
> p_3 = &tmp_2[2];
> then sure, there is no way, you get the size from that point.
> tree-object-size.cc tracking then determines that in a particular
> case the pointer is size associated with sz_1 and use that value
> as the size (with the usual adjustments for pointer arithmetics and the
> like).
>
> What I meant is to emit
> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
> p_5 = &tmp_4[2];
> i.e. don't associate the pointer with a value of the size, but with
> an address where to find the size (plus how large it is), basically escape
> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
> so supposedly it can depend on what the escaped pointer points to.

Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
escape point (quite bad IMHO) and __builtin_dynamic_object_size being
non-const (that's probably not too bad).

> We'd see that a particular pointer is size associated with &s.a address
> and would use that address cast to the type of the third argument (to
> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
> VN CSE it anyway if one has say
> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>           struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
> and
> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
> ...
> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
> ?

We'd probably CSE that - the usual issue of address-with-same-value.

> It would mean though that counted_by wouldn't be allowed to be a
> bit-field...

Yup.  We could also pass a pointer to the container though, that's good enough
for the escape, and pass the size by value in addition to that.

>         Jakub
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 11:52             ` Richard Biener
@ 2023-11-02 12:09               ` Jakub Jelinek
  2023-11-02 20:35                 ` Qing Zhao
  2023-11-02 20:47                 ` Qing Zhao
  2023-11-02 20:45               ` Qing Zhao
  1 sibling, 2 replies; 34+ messages in thread
From: Jakub Jelinek @ 2023-11-02 12:09 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches

On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
> > What I meant is to emit
> > tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
> > p_5 = &tmp_4[2];
> > i.e. don't associate the pointer with a value of the size, but with
> > an address where to find the size (plus how large it is), basically escape
> > pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
> > so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO)

That is why I've said we need to decide what cost we want to suffer because
of that.

> and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).

It is already pure,leaf,nothrow (unlike __builtin_object_size which is obviously
const,leaf,nothrow).  Because under the hood, it can read memory when
expanded.

> > We'd see that a particular pointer is size associated with &s.a address
> > and would use that address cast to the type of the third argument (to
> > preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
> > VN CSE it anyway if one has say
> > union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
> >           struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
> > and
> > .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
> > ...
> > .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
> > ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
> > It would mean though that counted_by wouldn't be allowed to be a
> > bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.

I was wondering about stuff like _BitInt.  But sure, counted_by is just an
extension, we can just refuse counting by _BitInt in addition to counting by
floating point, pointers, aggregates, bit-fields, or we could somehow encode
all the needed type's properties numerically into an integral constant.
Similarly for alias set (unless it uses 0 for reads).

	Jakub


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02  7:57     ` Richard Biener
  2023-11-02  8:27       ` Jakub Jelinek
@ 2023-11-02 13:50       ` Qing Zhao
  2023-11-02 13:54         ` Richard Biener
  2023-11-02 14:12         ` Martin Uecker
  2023-11-03  0:13       ` Bill Wendling
  2 siblings, 2 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-02 13:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Joseph Myers, Siddhesh Poyarekar, Martin Uecker, Kees Cook,
	Jakub Jelinek, isanbard, GCC Patches



> On Nov 2, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> 
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>> 
>>>> For the following structure including a FAM with a counted_by attribute:
>>>> 
>>>> struct A
>>>> {
>>>>  size_t size;
>>>>  char buf[] __attribute__((counted_by(size)));
>>>> };
>>>> 
>>>> for any object with such type:
>>>> 
>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>> 
>>>> The setting to the size field should be done before the first reference
>>>> to the FAM field.
>>>> 
>>>> Such requirement to the user will guarantee that the first reference to
>>>> the FAM knows the size of the FAM.
>>>> 
>>>> We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is
>>> considered to be an accurate representation of the space available for buf
>>> (given that, after malloc or realloc, it's going to be temporarily
>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning.
>> We need to explicit say that
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
>> 
>> Is the above good enough?
>> 
>> 
>>> 
>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>> 
>>>> In C FE:
>>>> 
>>>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>> if YES, replace the reference to "obj->buf" with a call to
>>>>     .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>> 
>>> This seems plausible - but you should also consider the case of static
>>> initializers - remember the GNU extension for statically allocated objects
>>> with flexible array members (unless you're not allowing it with
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = &x.buf;
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello”};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf.
>> Both x.size and x.buf are initialized statically.
>> 
>>> you should avoid the replacement in such a static initializer context when
>>> the FAM reference is to an object with a constant address (if
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>> expression; if it works fine as a constant-address lvalue, then the
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE?
> 
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).

Yes. Will add this into the user documentation.

>  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.

What do you mean by “re-obtained”? 

>  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.

Through an lvalue with the containing type?

Yes, will add this too. 


>  That is,
> 
> size_t *s = &a.size;
> *s = 1;
> 
> is invoking undefined behavior,

right.

> likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
Is there a small example for the undefined behavior for this?

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> --
>>> Joseph S. Myers
>>> joseph@codesourcery.com
>> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 13:50       ` Qing Zhao
@ 2023-11-02 13:54         ` Richard Biener
  2023-11-02 14:26           ` Qing Zhao
  2023-11-02 14:12         ` Martin Uecker
  1 sibling, 1 reply; 34+ messages in thread
From: Richard Biener @ 2023-11-02 13:54 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Joseph Myers, Siddhesh Poyarekar, Martin Uecker, Kees Cook,
	Jakub Jelinek, isanbard, GCC Patches

On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On Nov 2, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> >>>
> >>> On Tue, 31 Oct 2023, Qing Zhao wrote:
> >>>
> >>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
> >>>>
> >>>> For the following structure including a FAM with a counted_by attribute:
> >>>>
> >>>> struct A
> >>>> {
> >>>>  size_t size;
> >>>>  char buf[] __attribute__((counted_by(size)));
> >>>> };
> >>>>
> >>>> for any object with such type:
> >>>>
> >>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> >>>>
> >>>> The setting to the size field should be done before the first reference
> >>>> to the FAM field.
> >>>>
> >>>> Such requirement to the user will guarantee that the first reference to
> >>>> the FAM knows the size of the FAM.
> >>>>
> >>>> We need to add this additional requirement to the user document.
> >>>
> >>> Make sure the manual is very specific about exactly when size is
> >>> considered to be an accurate representation of the space available for buf
> >>> (given that, after malloc or realloc, it's going to be temporarily
> >>> inaccurate).  If the intent is that inaccurate size at such a time means
> >>> undefined behavior, say so explicitly.
> >>
> >> Yes, good point. We need to define this clearly in the beginning.
> >> We need to explicit say that
> >>
> >> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
> >>
> >> Is the above good enough?
> >>
> >>
> >>>
> >>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> >>>>
> >>>> In C FE:
> >>>>
> >>>> for every reference to a FAM, for example, "obj->buf" in the small example,
> >>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> >>>> if YES, replace the reference to "obj->buf" with a call to
> >>>>     .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> >>>
> >>> This seems plausible - but you should also consider the case of static
> >>> initializers - remember the GNU extension for statically allocated objects
> >>> with flexible array members (unless you're not allowing it with
> >>> counted_by).
> >>>
> >>> static struct A x = { sizeof "hello", "hello" };
> >>> static char *y = &x.buf;
> >>>
> >>> I'd expect that to be valid - and unless you say such a usage is invalid,
> >>
> >> At this moment, I think that this should be valid.
> >>
> >> I,e, the following:
> >>
> >> struct A
> >> {
> >> size_t size;
> >> char buf[] __attribute__((counted_by(size)));
> >> };
> >>
> >> static struct A x = {sizeof "hello", "hello”};
> >>
> >> Should be valid, and x.size represents the number of elements of x.buf.
> >> Both x.size and x.buf are initialized statically.
> >>
> >>> you should avoid the replacement in such a static initializer context when
> >>> the FAM reference is to an object with a constant address (if
> >>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> >>> expression; if it works fine as a constant-address lvalue, then the
> >>> replacement would be OK).
> >>
> >> Then if such usage for the “counted_by” is valid, we need to replace the FAM
> >> reference by a call to  .ACCESS_WITH_SIZE as well.
> >> Otherwise the “counted_by” relationship will be lost to the Middle end.
> >>
> >> With the current definition of .ACCESS_WITH_SIZE
> >>
> >> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> >>
> >> Isn’t the PTR (return value of the call) a LVALUE?
> >
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).
>
> Yes. Will add this into the user documentation.
>
> >  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.
>
> What do you mean by “re-obtained”?

do

p = &container.array[0];

after any adjustments to 'array' or 'len' again and base further accesses on
the new 'p'.

> >  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.
>
> Through an lvalue with the containing type?
>
> Yes, will add this too.
>
>
> >  That is,
> >
> > size_t *s = &a.size;
> > *s = 1;
> >
> > is invoking undefined behavior,
>
> right.
>
> > likewise modifying 'buf' (makes it a bit
> > awkward since for example that wouldn't support using posix_memalign
> > for allocation, though aligned_alloc would be fine).
> Is there a small example for the undefined behavior for this?

a.len = len;
posix_memalign (&a.buf, 16, len);

we would probably have to somehow instrument this.

> Qing
> >
> > Richard.
> >
> >> Qing
> >>>
> >>> --
> >>> Joseph S. Myers
> >>> joseph@codesourcery.com
> >>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 13:50       ` Qing Zhao
  2023-11-02 13:54         ` Richard Biener
@ 2023-11-02 14:12         ` Martin Uecker
  2023-11-02 15:41           ` Siddhesh Poyarekar
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Uecker @ 2023-11-02 14:12 UTC (permalink / raw)
  To: Qing Zhao, Richard Biener
  Cc: Joseph Myers, Siddhesh Poyarekar, Kees Cook, Jakub Jelinek,
	isanbard, GCC Patches

Am Donnerstag, dem 02.11.2023 um 13:50 +0000 schrieb Qing Zhao:
> 
> > On Nov 2, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> > 
> > On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > > 
> > > 
> > > 
> > > > On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> > > > 
> > > > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > > > 
> > > > > 2.3 A new semantic requirement in the user documentation of "counted_by"
> > > > > 
> > > > > For the following structure including a FAM with a counted_by attribute:
> > > > > 
> > > > > struct A
> > > > > {
> > > > >  size_t size;
> > > > >  char buf[] __attribute__((counted_by(size)));
> > > > > };
> > > > > 
> > > > > for any object with such type:
> > > > > 
> > > > > struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > > > 
> > > > > The setting to the size field should be done before the first reference
> > > > > to the FAM field.
> > > > > 
> > > > > Such requirement to the user will guarantee that the first reference to
> > > > > the FAM knows the size of the FAM.
> > > > > 
> > > > > We need to add this additional requirement to the user document.
> > > > 
> > > > Make sure the manual is very specific about exactly when size is
> > > > considered to be an accurate representation of the space available for buf
> > > > (given that, after malloc or realloc, it's going to be temporarily
> > > > inaccurate).  If the intent is that inaccurate size at such a time means
> > > > undefined behavior, say so explicitly.
> > > 
> > > Yes, good point. We need to define this clearly in the beginning.
> > > We need to explicit say that
> > > 
> > > the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
> > > 
> > > Is the above good enough?
> > > 
> > > 
> > > > 
> > > > > 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > > > > 
> > > > > In C FE:
> > > > > 
> > > > > for every reference to a FAM, for example, "obj->buf" in the small example,
> > > > > check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > > > > if YES, replace the reference to "obj->buf" with a call to
> > > > >     .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> > > > 
> > > > This seems plausible - but you should also consider the case of static
> > > > initializers - remember the GNU extension for statically allocated objects
> > > > with flexible array members (unless you're not allowing it with
> > > > counted_by).
> > > > 
> > > > static struct A x = { sizeof "hello", "hello" };
> > > > static char *y = &x.buf;
> > > > 
> > > > I'd expect that to be valid - and unless you say such a usage is invalid,
> > > 
> > > At this moment, I think that this should be valid.
> > > 
> > > I,e, the following:
> > > 
> > > struct A
> > > {
> > > size_t size;
> > > char buf[] __attribute__((counted_by(size)));
> > > };
> > > 
> > > static struct A x = {sizeof "hello", "hello”};
> > > 
> > > Should be valid, and x.size represents the number of elements of x.buf.
> > > Both x.size and x.buf are initialized statically.
> > > 
> > > > you should avoid the replacement in such a static initializer context when
> > > > the FAM reference is to an object with a constant address (if
> > > > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > > > expression; if it works fine as a constant-address lvalue, then the
> > > > replacement would be OK).
> > > 
> > > Then if such usage for the “counted_by” is valid, we need to replace the FAM
> > > reference by a call to  .ACCESS_WITH_SIZE as well.
> > > Otherwise the “counted_by” relationship will be lost to the Middle end.
> > > 
> > > With the current definition of .ACCESS_WITH_SIZE
> > > 
> > > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> > > 
> > > Isn’t the PTR (return value of the call) a LVALUE?
> > 
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).
> 
> Yes. Will add this into the user documentation.

This shouldn't be necessary. The object-size pass
can track pointer arithmeti if it comes after
inserting the .ACCESS_WITH_SIZE.

https://godbolt.org/z/fvc3aoPfd

> 
> >  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.
> 
> What do you mean by “re-obtained”? 
> 
> >  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.
> 
> Through an lvalue with the containing type?
> 
> Yes, will add this too. 

I do not understand this.  It shouldn't matter how
it is updated as long as taking the reference to buf
happens through an lvalue that has the attribute.
> 
> 
> >  That is,
> > 
> > size_t *s = &a.size;
> > *s = 1;
> > 
> > is invoking undefined behavior,
> 
> right.

size_t *s = &a.size;
*s = 1;
char *buf = a.buf;

Should work just fine as long as the reference to buf
comes after the store to the size (which can be indirect).


Martin

> 
> > likewise modifying 'buf' (makes it a bit
> > awkward since for example that wouldn't support using posix_memalign
> > for allocation, though aligned_alloc would be fine).
> Is there a small example for the undefined behavior for this?
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 13:54         ` Richard Biener
@ 2023-11-02 14:26           ` Qing Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-02 14:26 UTC (permalink / raw)
  To: Richard Biener, Kees Cook
  Cc: Joseph Myers, Siddhesh Poyarekar, Martin Uecker, Jakub Jelinek,
	isanbard, GCC Patches



> On Nov 2, 2023, at 9:54 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> 
>> 
>>> On Nov 2, 2023, at 3:57 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>>>>> 
>>>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>>>> 
>>>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>>>> 
>>>>>> For the following structure including a FAM with a counted_by attribute:
>>>>>> 
>>>>>> struct A
>>>>>> {
>>>>>> size_t size;
>>>>>> char buf[] __attribute__((counted_by(size)));
>>>>>> };
>>>>>> 
>>>>>> for any object with such type:
>>>>>> 
>>>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>> 
>>>>>> The setting to the size field should be done before the first reference
>>>>>> to the FAM field.
>>>>>> 
>>>>>> Such requirement to the user will guarantee that the first reference to
>>>>>> the FAM knows the size of the FAM.
>>>>>> 
>>>>>> We need to add this additional requirement to the user document.
>>>>> 
>>>>> Make sure the manual is very specific about exactly when size is
>>>>> considered to be an accurate representation of the space available for buf
>>>>> (given that, after malloc or realloc, it's going to be temporarily
>>>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>>>> undefined behavior, say so explicitly.
>>>> 
>>>> Yes, good point. We need to define this clearly in the beginning.
>>>> We need to explicit say that
>>>> 
>>>> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
>>>> 
>>>> Is the above good enough?
>>>> 
>>>> 
>>>>> 
>>>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>>>> 
>>>>>> In C FE:
>>>>>> 
>>>>>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>>>> if YES, replace the reference to "obj->buf" with a call to
>>>>>>    .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>>>> 
>>>>> This seems plausible - but you should also consider the case of static
>>>>> initializers - remember the GNU extension for statically allocated objects
>>>>> with flexible array members (unless you're not allowing it with
>>>>> counted_by).
>>>>> 
>>>>> static struct A x = { sizeof "hello", "hello" };
>>>>> static char *y = &x.buf;
>>>>> 
>>>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>>>> 
>>>> At this moment, I think that this should be valid.
>>>> 
>>>> I,e, the following:
>>>> 
>>>> struct A
>>>> {
>>>> size_t size;
>>>> char buf[] __attribute__((counted_by(size)));
>>>> };
>>>> 
>>>> static struct A x = {sizeof "hello", "hello”};
>>>> 
>>>> Should be valid, and x.size represents the number of elements of x.buf.
>>>> Both x.size and x.buf are initialized statically.
>>>> 
>>>>> you should avoid the replacement in such a static initializer context when
>>>>> the FAM reference is to an object with a constant address (if
>>>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>>>> expression; if it works fine as a constant-address lvalue, then the
>>>>> replacement would be OK).
>>>> 
>>>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>>>> reference by a call to  .ACCESS_WITH_SIZE as well.
>>>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>>>> 
>>>> With the current definition of .ACCESS_WITH_SIZE
>>>> 
>>>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>>>> 
>>>> Isn’t the PTR (return value of the call) a LVALUE?
>>> 
>>> You probably want to specify that when a pointer to the array is taken the
>>> pointer has to be to the first array element (or do we want to mangle the
>>> 'size' accordingly for the instrumentation?).
>> 
>> Yes. Will add this into the user documentation.
>> 
>>> You also want to specify that
>>> the 'size' associated with such pointer is assumed to be unchanging and
>>> after changing the size such pointer has to be re-obtained.
>> 
>> What do you mean by “re-obtained”?
> 
> do
> 
> p = &container.array[0];
> 
> after any adjustments to 'array' or 'len' again and base further accesses on
> the new 'p'.


Then for the following example form Kees:

	struct foo *f;
	char *p;
	int i;

	f = alloc(maximum_possible);
	f->count = 0;
	p = f->buf;

	for (i; data_is_available() && i < maximum_possible; i++) {
		f->count ++;
		p[i] = next_data_item();
	}

Will not work?

We have to change it as:

	struct foo *f;
	char *p;
	int i;

	f = alloc(maximum_possible);
	f->count = 0;
	p = f->buf;

	for (i; data_is_available() && i < maximum_possible; i++) {
		f->count ++;
                p = f->buf;             // add this?
		p[i] = next_data_item();
	}

Why?


> 
>>> Plus that
>>> changes to the allocated object/size have to be performed through an
>>> lvalue where the containing type and thus the 'counted_by' attribute is
>>> visible.
>> 
>> Through an lvalue with the containing type?
>> 
>> Yes, will add this too.
>> 
>> 
>>> That is,
>>> 
>>> size_t *s = &a.size;
>>> *s = 1;
>>> 
>>> is invoking undefined behavior,
>> 
>> right.
>> 
>>> likewise modifying 'buf' (makes it a bit
>>> awkward since for example that wouldn't support using posix_memalign
>>> for allocation, though aligned_alloc would be fine).
>> Is there a small example for the undefined behavior for this?
> 
> a.len = len;
> posix_memalign (&a.buf, 16, len);
> 
> we would probably have to somehow instrument this.

For the above, the buffer and the size are still synchronized even after the buffer is modified? Should we allow this?

Qing
> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Qing
>>>>> 
>>>>> --
>>>>> Joseph S. Myers
>>>>> joseph@codesourcery.com
>>>> 
>> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 14:12         ` Martin Uecker
@ 2023-11-02 15:41           ` Siddhesh Poyarekar
  0 siblings, 0 replies; 34+ messages in thread
From: Siddhesh Poyarekar @ 2023-11-02 15:41 UTC (permalink / raw)
  To: Martin Uecker, Qing Zhao, Richard Biener
  Cc: Joseph Myers, Kees Cook, Jakub Jelinek, isanbard, GCC Patches

On 2023-11-02 10:12, Martin Uecker wrote:
> This shouldn't be necessary. The object-size pass
> can track pointer arithmeti if it comes after
> inserting the .ACCESS_WITH_SIZE.
> 
> https://godbolt.org/z/fvc3aoPfd

The problem is dependency tracking through the pointer arithmetic, which 
Jakub suggested to work around by passing a reference to the size in 
.ACCESS_WITH_SIZE to avoid DCE/reordering.

Thanks,
Sid

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 12:09               ` Jakub Jelinek
@ 2023-11-02 20:35                 ` Qing Zhao
  2023-11-03  0:28                   ` Bill Wendling
  2023-11-02 20:47                 ` Qing Zhao
  1 sibling, 1 reply; 34+ messages in thread
From: Qing Zhao @ 2023-11-02 20:35 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Kees Cook
  Cc: Joseph Myers, Siddhesh Poyarekar, Martin Uecker, Kees Cook,
	isanbard, GCC Patches

Thanks a lot for raising these issues. 

If I understand correctly,  the major question we need to answer is:

For the following example: (Jakub mentioned this  in an early message)

  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
  2 struct S s;
  3 s.a = 5;
  4 char *p = &s.b[2];
  5 int i1 = __builtin_dynamic_object_size (p, 0);
  6 s.a = 3;
  7 int i2 = __builtin_dynamic_object_size (p, 0);

Should the 2nd __bdos call (line 7) get
	A. the latest value of s.a (line 6) for it’s size? 
Or 	B. the value when the s.b was referenced (line 3, line 4)?

A should be more convenient for the user to use the dynamic array feature.
With B, the user has to modify the source code (to add code to “re-obtain” 
the pointer after the size was adjusted at line 6) as mentioned by Richard. 

This depends on how we design the new internal function .ACCESS_WITH_SIZE

1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed. 

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.

PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)

With 1, We can only provide B, the user needs to modify the source code to get the full feature of dynamic array;
With 2, We can provide  A, the user will get full support to the dynamic array without restrictions in the source code. 

However, We have to pay additional cost for supporting A by using 2, which includes:

1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the IPA optimizations, more runtime overhead. 
    Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?

2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also impact some IPA optimizations, more runtime overhead. 

I think the following are the factors that make the decision:

1. How big the performance impact?
2. How important the dynamic array feature? Is adding some user restrictions as Richard mentioned feasible to support this feature?

Maybe we can implement 1 first, if the full support to the dynamic array is needed, we can add 2 then? 
Or, we can implement both, and compare the performance difference, then decide?

Qing




> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>>> p_5 = &tmp_4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with &s.a address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>          struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 11:52             ` Richard Biener
  2023-11-02 12:09               ` Jakub Jelinek
@ 2023-11-02 20:45               ` Qing Zhao
  1 sibling, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-02 20:45 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches



> On Nov 2, 2023, at 7:52 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek <jakub@redhat.com> wrote:
>> 
>> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
>>>> Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
>>>> the arguments not the size value, but its address.  Then at __bdos time
>>>> we would dereference that pointer to get the size.
>>>> So,
>>>> struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>> struct S s;
>>>> s.a = 5;
>>>> char *p = &s.b[2];
>>>> int i1 = __builtin_dynamic_object_size (p, 0);
>>>> s.a = 3;
>>>> int i2 = __builtin_dynamic_object_size (p, 0);
>>>> would then yield 3 and 1 rather than 3 and 3.
>>> 
>>> I fail to see how we can get the __builtin_dynamic_object_size call
>>> data dependent on s.a, thus avoid re-ordering or even DSE of the
>>> store.
>> 
>> If &s.b[2] is lowered as
>> sz_1 = s.a;
>> tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
>> p_3 = &tmp_2[2];
>> then sure, there is no way, you get the size from that point.
>> tree-object-size.cc tracking then determines that in a particular
>> case the pointer is size associated with sz_1 and use that value
>> as the size (with the usual adjustments for pointer arithmetics and the
>> like).
>> 
>> What I meant is to emit
>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>> p_5 = &tmp_4[2];
>> i.e. don't associate the pointer with a value of the size, but with
>> an address where to find the size (plus how large it is), basically escape
>> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
>> so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO) and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).
> 
>> We'd see that a particular pointer is size associated with &s.a address
>> and would use that address cast to the type of the third argument (to
>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>> VN CSE it anyway if one has say
>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>          struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
>> and
>> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
>> ...
>> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
>> ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
>> It would mean though that counted_by wouldn't be allowed to be a
>> bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.
Could you explain a little bit more here? Then the .ACCESS_WITH_SIZE will become

PTR = .ACCESS_WITH_SIZE (PTR, &PTR’s Container, SIZE, ACCESS_MODE)

??

> 
>>        Jakub
>> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 12:09               ` Jakub Jelinek
  2023-11-02 20:35                 ` Qing Zhao
@ 2023-11-02 20:47                 ` Qing Zhao
  1 sibling, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-02 20:47 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, isanbard, GCC Patches



> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>>> p_5 = &tmp_4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with &s.a address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>          struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).

counted_by currently is limited to INTEGER type. This should resolve this issue, right?

Qing
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02  7:57     ` Richard Biener
  2023-11-02  8:27       ` Jakub Jelinek
  2023-11-02 13:50       ` Qing Zhao
@ 2023-11-03  0:13       ` Bill Wendling
  2023-11-03 19:28         ` Qing Zhao
  2 siblings, 1 reply; 34+ messages in thread
From: Bill Wendling @ 2023-11-03  0:13 UTC (permalink / raw)
  To: Richard Biener
  Cc: Qing Zhao, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, Jakub Jelinek, GCC Patches

On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> >
> >
> >
> > > On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> > >
> > > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > >
> > >> 2.3 A new semantic requirement in the user documentation of "counted_by"
> > >>
> > >> For the following structure including a FAM with a counted_by attribute:
> > >>
> > >>  struct A
> > >>  {
> > >>   size_t size;
> > >>   char buf[] __attribute__((counted_by(size)));
> > >>  };
> > >>
> > >> for any object with such type:
> > >>
> > >>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > >>
> > >> The setting to the size field should be done before the first reference
> > >> to the FAM field.
> > >>
> > >> Such requirement to the user will guarantee that the first reference to
> > >> the FAM knows the size of the FAM.
> > >>
> > >> We need to add this additional requirement to the user document.
> > >
> > > Make sure the manual is very specific about exactly when size is
> > > considered to be an accurate representation of the space available for buf
> > > (given that, after malloc or realloc, it's going to be temporarily
> > > inaccurate).  If the intent is that inaccurate size at such a time means
> > > undefined behavior, say so explicitly.
> >
> > Yes, good point. We need to define this clearly in the beginning.
> > We need to explicit say that
> >
> > the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
> >
> > Is the above good enough?
> >
> >
> > >
> > >> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > >>
> > >> In C FE:
> > >>
> > >> for every reference to a FAM, for example, "obj->buf" in the small example,
> > >>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > >>  if YES, replace the reference to "obj->buf" with a call to
> > >>      .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> > >
> > > This seems plausible - but you should also consider the case of static
> > > initializers - remember the GNU extension for statically allocated objects
> > > with flexible array members (unless you're not allowing it with
> > > counted_by).
> > >
> > > static struct A x = { sizeof "hello", "hello" };
> > > static char *y = &x.buf;
> > >
> > > I'd expect that to be valid - and unless you say such a usage is invalid,
> >
> > At this moment, I think that this should be valid.
> >
> > I,e, the following:
> >
> > struct A
> > {
> >  size_t size;
> >  char buf[] __attribute__((counted_by(size)));
> > };
> >
> > static struct A x = {sizeof "hello", "hello”};
> >
> > Should be valid, and x.size represents the number of elements of x.buf.
> > Both x.size and x.buf are initialized statically.
> >
> > > you should avoid the replacement in such a static initializer context when
> > > the FAM reference is to an object with a constant address (if
> > > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > > expression; if it works fine as a constant-address lvalue, then the
> > > replacement would be OK).
> >
> > Then if such usage for the “counted_by” is valid, we need to replace the FAM
> > reference by a call to  .ACCESS_WITH_SIZE as well.
> > Otherwise the “counted_by” relationship will be lost to the Middle end.
> >
> > With the current definition of .ACCESS_WITH_SIZE
> >
> > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> >
> > Isn’t the PTR (return value of the call) a LVALUE?
>
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.  That is,
>
> size_t *s = &a.size;
> *s = 1;
>
> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
>
I believe Qing's original documentation for counted_by makes the
relationship between 'size' and the FAM very clear and that without
agreement it'll result in undefined behavior. Though it might be best
to state that in a strong way.

-bw

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-02 20:35                 ` Qing Zhao
@ 2023-11-03  0:28                   ` Bill Wendling
  2023-11-03  6:07                     ` Martin Uecker
  2023-11-03 19:33                     ` Qing Zhao
  0 siblings, 2 replies; 34+ messages in thread
From: Bill Wendling @ 2023-11-03  0:28 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Jakub Jelinek, Richard Biener, Kees Cook, Joseph Myers,
	Siddhesh Poyarekar, Martin Uecker, GCC Patches

On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
> Thanks a lot for raising these issues.
>
> If I understand correctly,  the major question we need to answer is:
>
> For the following example: (Jakub mentioned this  in an early message)
>
>   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>   2 struct S s;
>   3 s.a = 5;
>   4 char *p = &s.b[2];
>   5 int i1 = __builtin_dynamic_object_size (p, 0);
>   6 s.a = 3;
>   7 int i2 = __builtin_dynamic_object_size (p, 0);
>
> Should the 2nd __bdos call (line 7) get
>         A. the latest value of s.a (line 6) for it’s size?
> Or      B. the value when the s.b was referenced (line 3, line 4)?
>
I personally think it should be (A). The user is specifically
indicating that the size has somehow changed, and the compiler should
behave accordingly.

> A should be more convenient for the user to use the dynamic array feature.
> With B, the user has to modify the source code (to add code to “re-obtain”
> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>
> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>
> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>
> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>
> PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
>
> With 1, We can only provide B, the user needs to modify the source code to get the full feature of dynamic array;
> With 2, We can provide  A, the user will get full support to the dynamic array without restrictions in the source code.
>
My understanding of ACCESS_WITH_SIZE is that it's there to add an
explicit reference to SIZE so that the optimizers won't reorder the
code incorrectly. If that's the case, then it should act as if
ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
dereference into the FAM). We get that with (2) it appears. It would
be a major headache to make the user go throughout their code base to
ensure that SIZE was either unmodified, or if it was that extra code
must be added to ensure the expected behavior.

> However, We have to pay additional cost for supporting A by using 2, which includes:
>
> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the IPA optimizations, more runtime overhead.
>     Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?
>
> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also impact some IPA optimizations, more runtime overhead.
>
> I think the following are the factors that make the decision:
>
> 1. How big the performance impact?
> 2. How important the dynamic array feature? Is adding some user restrictions as Richard mentioned feasible to support this feature?
>
> Maybe we can implement 1 first, if the full support to the dynamic array is needed, we can add 2 then?
> Or, we can implement both, and compare the performance difference, then decide?
>
> Qing
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  0:28                   ` Bill Wendling
@ 2023-11-03  6:07                     ` Martin Uecker
  2023-11-03  6:22                       ` Jakub Jelinek
  2023-11-03 19:33                     ` Qing Zhao
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Uecker @ 2023-11-03  6:07 UTC (permalink / raw)
  To: Bill Wendling, Qing Zhao
  Cc: Jakub Jelinek, Richard Biener, Kees Cook, Joseph Myers,
	Siddhesh Poyarekar, GCC Patches

Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > 
> > Thanks a lot for raising these issues.
> > 
> > If I understand correctly,  the major question we need to answer is:
> > 
> > For the following example: (Jakub mentioned this  in an early message)
> > 
> >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> >   2 struct S s;
> >   3 s.a = 5;
> >   4 char *p = &s.b[2];
> >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> >   6 s.a = 3;
> >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > 
> > Should the 2nd __bdos call (line 7) get
> >         A. the latest value of s.a (line 6) for it’s size?
> > Or      B. the value when the s.b was referenced (line 3, line 4)?
> > 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.


One potential problem for A apart from the potential impact on
optimization is that the information may get lost more
easily. Consider:

char *p = &s.b[2];
f(&s);
int i = __bdos(p, 0);

If the compiler can not see into 'f', the information is lost
because f may have changed the size.

And if I understand it correctly, if the pointers escapes
with .ACCESS_WITH_SIZE, then this is already true for:

char *p = &s.b[2];
g();
int i = __bdos(p, 0);


If we make it UB to change the size, then I guess we could
also delay this choice.  Or we implement B but have a UBSan
option based on A that only verifies at run-time that the size 
did not change.


Martin


> 
> > A should be more convenient for the user to use the dynamic array feature.
> > With B, the user has to modify the source code (to add code to “re-obtain”
> > the pointer after the size was adjusted at line 6) as mentioned by Richard.
> > 
> > This depends on how we design the new internal function .ACCESS_WITH_SIZE
> > 
> > 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
> > 
> > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> > 
> > 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
> > 
> > PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
> > 
> > With 1, We can only provide B, the user needs to modify the source code to get the full feature of dynamic array;
> > With 2, We can provide  A, the user will get full support to the dynamic array without restrictions in the source code.
> > 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
> > However, We have to pay additional cost for supporting A by using 2, which includes:
> > 
> > 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the IPA optimizations, more runtime overhead.
> >     Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?
> > 
> > 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also impact some IPA optimizations, more runtime overhead.
> > 
> > I think the following are the factors that make the decision:
> > 
> > 1. How big the performance impact?
> > 2. How important the dynamic array feature? Is adding some user restrictions as Richard mentioned feasible to support this feature?
> > 
> > Maybe we can implement 1 first, if the full support to the dynamic array is needed, we can add 2 then?
> > Or, we can implement both, and compare the performance difference, then decide?
> > 
> > Qing
> > 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  6:07                     ` Martin Uecker
@ 2023-11-03  6:22                       ` Jakub Jelinek
  2023-11-03  6:32                         ` Martin Uecker
  2023-11-03 14:32                         ` Qing Zhao
  0 siblings, 2 replies; 34+ messages in thread
From: Jakub Jelinek @ 2023-11-03  6:22 UTC (permalink / raw)
  To: Martin Uecker
  Cc: Bill Wendling, Qing Zhao, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches

On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> > On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > > 
> > > Thanks a lot for raising these issues.
> > > 
> > > If I understand correctly,  the major question we need to answer is:
> > > 
> > > For the following example: (Jakub mentioned this  in an early message)
> > > 
> > >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> > >   2 struct S s;
> > >   3 s.a = 5;
> > >   4 char *p = &s.b[2];
> > >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> > >   6 s.a = 3;
> > >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > > 
> > > Should the 2nd __bdos call (line 7) get
> > >         A. the latest value of s.a (line 6) for it’s size?
> > > Or      B. the value when the s.b was referenced (line 3, line 4)?
> > > 
> > I personally think it should be (A). The user is specifically
> > indicating that the size has somehow changed, and the compiler should
> > behave accordingly.
> 
> 
> One potential problem for A apart from the potential impact on
> optimization is that the information may get lost more
> easily. Consider:
> 
> char *p = &s.b[2];
> f(&s);
> int i = __bdos(p, 0);
> 
> If the compiler can not see into 'f', the information is lost
> because f may have changed the size.

Why?  It doesn't really matter.  The options are
A. p is at &s.b[2] associated with &s.a and int type (or size of int
   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
   POV we can describe it with more detail that it doesn't modify anything
   in the pointed structure, just escapes the pointer; __bdos can stay
   leaf I believe; and when expanding __bdos later on, it would just
   dereference the associated pointer at that point (note, __bdos is
   pure, so it has vuse but not vdef and can load from memory); if
   f changes s.a, no problem, __bdos will load the changed value in there
B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
   __bdos later will use s.a value from the &s.b[2] spot

	Jakub


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  6:22                       ` Jakub Jelinek
@ 2023-11-03  6:32                         ` Martin Uecker
  2023-11-03 16:20                           ` Qing Zhao
  2023-11-03 14:32                         ` Qing Zhao
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Uecker @ 2023-11-03  6:32 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Bill Wendling, Qing Zhao, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches

Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
> > Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> > > On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> > > > 
> > > > Thanks a lot for raising these issues.
> > > > 
> > > > If I understand correctly,  the major question we need to answer is:
> > > > 
> > > > For the following example: (Jakub mentioned this  in an early message)
> > > > 
> > > >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> > > >   2 struct S s;
> > > >   3 s.a = 5;
> > > >   4 char *p = &s.b[2];
> > > >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> > > >   6 s.a = 3;
> > > >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > > > 
> > > > Should the 2nd __bdos call (line 7) get
> > > >         A. the latest value of s.a (line 6) for it’s size?
> > > > Or      B. the value when the s.b was referenced (line 3, line 4)?
> > > > 
> > > I personally think it should be (A). The user is specifically
> > > indicating that the size has somehow changed, and the compiler should
> > > behave accordingly.
> > 
> > 
> > One potential problem for A apart from the potential impact on
> > optimization is that the information may get lost more
> > easily. Consider:
> > 
> > char *p = &s.b[2];
> > f(&s);
> > int i = __bdos(p, 0);
> > 
> > If the compiler can not see into 'f', the information is lost
> > because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>    or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>    POV we can describe it with more detail that it doesn't modify anything
>    in the pointed structure, just escapes the pointer; __bdos can stay
>    leaf I believe; and when expanding __bdos later on, it would just
>    dereference the associated pointer at that point (note, __bdos is
>    pure, so it has vuse but not vdef and can load from memory); if
>    f changes s.a, no problem, __bdos will load the changed value in there

Ah, I right. Because of the reload it doesn't matter. 
Thank you for the explanation!

Martin

> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>    point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>    __bdos later will use s.a value from the &s.b[2] spot


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  6:22                       ` Jakub Jelinek
  2023-11-03  6:32                         ` Martin Uecker
@ 2023-11-03 14:32                         ` Qing Zhao
  2023-11-03 14:46                           ` Jakub Jelinek
  1 sibling, 1 reply; 34+ messages in thread
From: Qing Zhao @ 2023-11-03 14:32 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Uecker, Bill Wendling, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches



> On Nov 3, 2023, at 2:22 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
>>> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>> 
>>>> Thanks a lot for raising these issues.
>>>> 
>>>> If I understand correctly,  the major question we need to answer is:
>>>> 
>>>> For the following example: (Jakub mentioned this  in an early message)
>>>> 
>>>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>>  2 struct S s;
>>>>  3 s.a = 5;
>>>>  4 char *p = &s.b[2];
>>>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>>>  6 s.a = 3;
>>>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>>>> 
>>>> Should the 2nd __bdos call (line 7) get
>>>>        A. the latest value of s.a (line 6) for it’s size?
>>>> Or      B. the value when the s.b was referenced (line 3, line 4)?
>>>> 
>>> I personally think it should be (A). The user is specifically
>>> indicating that the size has somehow changed, and the compiler should
>>> behave accordingly.
>> 
>> 
>> One potential problem for A apart from the potential impact on
>> optimization is that the information may get lost more
>> easily. Consider:
>> 
>> char *p = &s.b[2];
>> f(&s);
>> int i = __bdos(p, 0);
>> 
>> If the compiler can not see into 'f', the information is lost
>> because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>   or whatever); .ACCESS_WITH_SIZE can't be pure,

.ACCESS_WITH_SIZE will only load the size from its address, no any write to memory.
It still can be PURE, right? (It will not be CONST anymore).

> but sure, for aliasing
>   POV we can describe it with more detail that it doesn't modify anything
>   in the pointed structure, just escapes the pointer;

If we need to do this, where in the gcc code we need to add these details?

> __bdos can stay
>   leaf I believe;

That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)

Qing

> and when expanding __bdos later on, it would just
>   dereference the associated pointer at that point (note, __bdos is
>   pure, so it has vuse but not vdef and can load from memory); if
>   f changes s.a, no problem, __bdos will load the changed value in there
> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>   __bdos later will use s.a value from the &s.b[2] spot
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03 14:32                         ` Qing Zhao
@ 2023-11-03 14:46                           ` Jakub Jelinek
  2023-11-03 15:22                             ` Qing Zhao
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2023-11-03 14:46 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Martin Uecker, Bill Wendling, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches

On Fri, Nov 03, 2023 at 02:32:04PM +0000, Qing Zhao wrote:
> > Why?  It doesn't really matter.  The options are
> > A. p is at &s.b[2] associated with &s.a and int type (or size of int
> >   or whatever); .ACCESS_WITH_SIZE can't be pure,
> 
> .ACCESS_WITH_SIZE will only load the size from its address, no any write to memory.
> It still can be PURE, right? (It will not be CONST anymore).

No, it can't be pure.  Because for the IL purposes, it needs to be treated
as if it saves that address of the counter into some unnamed global variable
somewhere.
> 
> > but sure, for aliasing
> >   POV we can describe it with more detail that it doesn't modify anything
> >   in the pointed structure, just escapes the pointer;
> 
> If we need to do this, where in the gcc code we need to add these details?

I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
expert here.

> > __bdos can stay
> >   leaf I believe;
> 
> That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)

No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
I believe recurse on the first argument of that call (say if one has
  ptr_3 = malloc (sz_1);
  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, &ptr_3[4], ...);
then supposedly __bdos later on should e.g. for 0/1 modes take minimum from
ptr_3 (the size actually allocated)) and the the counter.

	Jakub


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03 14:46                           ` Jakub Jelinek
@ 2023-11-03 15:22                             ` Qing Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-03 15:22 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Uecker, Bill Wendling, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches



> On Nov 3, 2023, at 10:46 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Fri, Nov 03, 2023 at 02:32:04PM +0000, Qing Zhao wrote:
>>> Why?  It doesn't really matter.  The options are
>>> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>>>  or whatever); .ACCESS_WITH_SIZE can't be pure,
>> 
>> .ACCESS_WITH_SIZE will only load the size from its address, no any write to memory.
>> It still can be PURE, right? (It will not be CONST anymore).
> 
> No, it can't be pure.  Because for the IL purposes, it needs to be treated
> as if it saves that address of the counter into some unnamed global variable
> somewhere.

Okay. I see.

>> 
>>> but sure, for aliasing
>>>  POV we can describe it with more detail that it doesn't modify anything
>>>  in the pointed structure, just escapes the pointer;
>> 
>> If we need to do this, where in the gcc code we need to add these details?
> 
> I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
> expert here.

Just checked these routines, looks like that some other non-pure internal functions are handled here too.
For example, 
      case IFN_UBSAN_BOUNDS:
      case IFN_UBSAN_VPTR:
      case IFN_UBSAN_OBJECT_SIZE:
      case IFN_UBSAN_PTR:
      case IFN_ASAN_CHECK:

Looks like the correct place to adjust the new .ACCESS_WITH_SIZE. 
> 
>>> __bdos can stay
>>>  leaf I believe;
>> 
>> That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)
> 
> No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
> something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
> I believe recurse on the first argument of that call (say if one has
>  ptr_3 = malloc (sz_1);
>  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, &ptr_3[4], ...);
> then supposedly __bdos later on should e.g. for 0/1 modes take minimum from
> ptr_3 (the size actually allocated)) and the the counter.

Yes, this is the situation in my mind too. 
I thought this might eliminate the LEAF feature from __bdos. -:) if not, that’s good.

Qing
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  6:32                         ` Martin Uecker
@ 2023-11-03 16:20                           ` Qing Zhao
  2023-11-03 16:30                             ` Jakub Jelinek
  0 siblings, 1 reply; 34+ messages in thread
From: Qing Zhao @ 2023-11-03 16:20 UTC (permalink / raw)
  To: Martin Uecker, Jakub Jelinek, Bill Wendling
  Cc: Richard Biener, Kees Cook, Joseph Myers, Siddhesh Poyarekar, GCC Patches

So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as following:

 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)

which returns the “REF_TO_OBJ" same as the 1st argument;

1st argument “REF_TO_OBJ": Reference to the object;
2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by the 1st argument, 
 if the object that the “REF_TO_OBJ” refered has a
   * real type, the SIZE that the “REF_TO_SIZE” referred is the number of the elements of the type;
   * void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 
3rd argument "ACCESS_MODE": 
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write

NOTEs, 
 A. This new internal function is intended for a more general use from all the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode the "size" and "access_mode" information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
 B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.   
 C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. However, this function should be ready to be used by "access" and "alloc_size" without issue. 

Although .ACCESS_WITH_SIZE is not PURE anymore, but it’s only read from the 2nd argument, and not modify anything in the pointed objects. So, we can adjust the IPA alias analysis phase with this details (ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1).

One more note: only the integer type is allowed for the SIZE, and in tree_object_size.cc, all the SIZE
 (in attributes “access”, “alloc_size”, etc) are converted to “sizetype”.  So, we don’t need to specify
The type of the size for “REF_TO_SIZE” since it’s always integer types and always converted to “sizetype” internally. 

Let me know any more comment or suggestion. 

Qing


On Nov 3, 2023, at 2:32 AM, Martin Uecker <uecker@tugraz.at> wrote:
> 
> 
> Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
>> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
>>>> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>>>> 
>>>>> Thanks a lot for raising these issues.
>>>>> 
>>>>> If I understand correctly,  the major question we need to answer is:
>>>>> 
>>>>> For the following example: (Jakub mentioned this  in an early message)
>>>>> 
>>>>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>>>  2 struct S s;
>>>>>  3 s.a = 5;
>>>>>  4 char *p = &s.b[2];
>>>>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>>>>  6 s.a = 3;
>>>>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>>>>> 
>>>>> Should the 2nd __bdos call (line 7) get
>>>>>        A. the latest value of s.a (line 6) for it’s size?
>>>>> Or      B. the value when the s.b was referenced (line 3, line 4)?
>>>>> 
>>>> I personally think it should be (A). The user is specifically
>>>> indicating that the size has somehow changed, and the compiler should
>>>> behave accordingly.
>>> 
>>> 
>>> One potential problem for A apart from the potential impact on
>>> optimization is that the information may get lost more
>>> easily. Consider:
>>> 
>>> char *p = &s.b[2];
>>> f(&s);
>>> int i = __bdos(p, 0);
>>> 
>>> If the compiler can not see into 'f', the information is lost
>>> because f may have changed the size.
>> 
>> Why?  It doesn't really matter.  The options are
>> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>>   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>>   POV we can describe it with more detail that it doesn't modify anything
>>   in the pointed structure, just escapes the pointer; __bdos can stay
>>   leaf I believe; and when expanding __bdos later on, it would just
>>   dereference the associated pointer at that point (note, __bdos is
>>   pure, so it has vuse but not vdef and can load from memory); if
>>   f changes s.a, no problem, __bdos will load the changed value in there
> 
> Ah, I right. Because of the reload it doesn't matter. 
> Thank you for the explanation!
> 
> Martin
> 
>> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>>   __bdos later will use s.a value from the &s.b[2] spot


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03 16:20                           ` Qing Zhao
@ 2023-11-03 16:30                             ` Jakub Jelinek
  2023-11-03 16:36                               ` Qing Zhao
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2023-11-03 16:30 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Martin Uecker, Bill Wendling, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches

On Fri, Nov 03, 2023 at 04:20:57PM +0000, Qing Zhao wrote:
> So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as following:
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the “REF_TO_OBJ" same as the 1st argument;
> 
> 1st argument “REF_TO_OBJ": Reference to the object;
> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by the 1st argument, 
>  if the object that the “REF_TO_OBJ” refered has a
>    * real type, the SIZE that the “REF_TO_SIZE” referred is the number of the elements of the type;
>    * void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 

No, you can't do this.  Conversions between pointers are mostly useless in
GIMPLE, , so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
as it could have some random completely unrelated type.
So, the multiplication factor needs to be encoded in the arguments rather
than derived from REF_TO_OBJ's type, and similarly the size of what
REF_TO_SIZE points to needs to be encoded somewhere.

> 3rd argument "ACCESS_MODE": 
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write

	Jakub


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03 16:30                             ` Jakub Jelinek
@ 2023-11-03 16:36                               ` Qing Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-03 16:36 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Uecker, Bill Wendling, Richard Biener, Kees Cook,
	Joseph Myers, Siddhesh Poyarekar, GCC Patches



> On Nov 3, 2023, at 12:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Fri, Nov 03, 2023 at 04:20:57PM +0000, Qing Zhao wrote:
>> So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as following:
>> 
>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
>> 
>> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> which returns the “REF_TO_OBJ" same as the 1st argument;
>> 
>> 1st argument “REF_TO_OBJ": Reference to the object;
>> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by the 1st argument, 
>> if the object that the “REF_TO_OBJ” refered has a
>>   * real type, the SIZE that the “REF_TO_SIZE” referred is the number of the elements of the type;
>>   * void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 
> 
> No, you can't do this.  Conversions between pointers are mostly useless in
> GIMPLE, , so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
> as it could have some random completely unrelated type.
> So, the multiplication factor needs to be encoded in the arguments rather
> than derived from REF_TO_OBJ's type, and similarly the size of what
> REF_TO_SIZE points to needs to be encoded somewhere.

Okay, I see, so 2 more arguments to the new function.

Qing
> 
>> 3rd argument "ACCESS_MODE": 
>> -1: Unknown access semantics
>>  0: none
>>  1: read_only
>>  2: write_only
>>  3: read_write
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  0:13       ` Bill Wendling
@ 2023-11-03 19:28         ` Qing Zhao
  0 siblings, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-03 19:28 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Richard Biener, Joseph Myers, Siddhesh Poyarekar, Martin Uecker,
	Kees Cook, Jakub Jelinek, GCC Patches



> On Nov 2, 2023, at 8:13 PM, Bill Wendling <isanbard@gmail.com> wrote:
> 
> On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>> 
>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>>>> 
>>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>>> 
>>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>>> 
>>>>> For the following structure including a FAM with a counted_by attribute:
>>>>> 
>>>>> struct A
>>>>> {
>>>>>  size_t size;
>>>>>  char buf[] __attribute__((counted_by(size)));
>>>>> };
>>>>> 
>>>>> for any object with such type:
>>>>> 
>>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>> 
>>>>> The setting to the size field should be done before the first reference
>>>>> to the FAM field.
>>>>> 
>>>>> Such requirement to the user will guarantee that the first reference to
>>>>> the FAM knows the size of the FAM.
>>>>> 
>>>>> We need to add this additional requirement to the user document.
>>>> 
>>>> Make sure the manual is very specific about exactly when size is
>>>> considered to be an accurate representation of the space available for buf
>>>> (given that, after malloc or realloc, it's going to be temporarily
>>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>>> undefined behavior, say so explicitly.
>>> 
>>> Yes, good point. We need to define this clearly in the beginning.
>>> We need to explicit say that
>>> 
>>> the size of the FAM is defined by the latest “counted_by” value. And it’s an undefined behavior when the size field is not defined when the FAM is referenced.
>>> 
>>> Is the above good enough?
>>> 
>>> 
>>>> 
>>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>>> 
>>>>> In C FE:
>>>>> 
>>>>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>>> if YES, replace the reference to "obj->buf" with a call to
>>>>>     .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>>> 
>>>> This seems plausible - but you should also consider the case of static
>>>> initializers - remember the GNU extension for statically allocated objects
>>>> with flexible array members (unless you're not allowing it with
>>>> counted_by).
>>>> 
>>>> static struct A x = { sizeof "hello", "hello" };
>>>> static char *y = &x.buf;
>>>> 
>>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>>> 
>>> At this moment, I think that this should be valid.
>>> 
>>> I,e, the following:
>>> 
>>> struct A
>>> {
>>> size_t size;
>>> char buf[] __attribute__((counted_by(size)));
>>> };
>>> 
>>> static struct A x = {sizeof "hello", "hello”};
>>> 
>>> Should be valid, and x.size represents the number of elements of x.buf.
>>> Both x.size and x.buf are initialized statically.
>>> 
>>>> you should avoid the replacement in such a static initializer context when
>>>> the FAM reference is to an object with a constant address (if
>>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>>> expression; if it works fine as a constant-address lvalue, then the
>>>> replacement would be OK).
>>> 
>>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>>> reference by a call to  .ACCESS_WITH_SIZE as well.
>>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>>> 
>>> With the current definition of .ACCESS_WITH_SIZE
>>> 
>>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>>> 
>>> Isn’t the PTR (return value of the call) a LVALUE?
>> 
>> You probably want to specify that when a pointer to the array is taken the
>> pointer has to be to the first array element (or do we want to mangle the
>> 'size' accordingly for the instrumentation?).  You also want to specify that
>> the 'size' associated with such pointer is assumed to be unchanging and
>> after changing the size such pointer has to be re-obtained.  Plus that
>> changes to the allocated object/size have to be performed through an
>> lvalue where the containing type and thus the 'counted_by' attribute is
>> visible.  That is,
>> 
>> size_t *s = &a.size;
>> *s = 1;
>> 
>> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
>> awkward since for example that wouldn't support using posix_memalign
>> for allocation, though aligned_alloc would be fine).
>> 
> I believe Qing's original documentation for counted_by makes the
> relationship between 'size' and the FAM very clear and that without
> agreement it'll result in undefined behavior. Though it might be best
> to state that in a strong way.

I will update the counted-by documentation with the following additions:

1. The initialization of the size field should be done before the first reference to the FAM;
And.
 2. A later reference to the FAM will use the latest value assigned to the size field before that reference;

I think adding these two on top of my current user documentation should be enough. 

I will update the counted-by user documentation and then send out for comment in a later email.

thanks.

Qing
> 
> -bw


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
  2023-11-03  0:28                   ` Bill Wendling
  2023-11-03  6:07                     ` Martin Uecker
@ 2023-11-03 19:33                     ` Qing Zhao
  1 sibling, 0 replies; 34+ messages in thread
From: Qing Zhao @ 2023-11-03 19:33 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Jakub Jelinek, Richard Biener, Kees Cook, Joseph Myers,
	Siddhesh Poyarekar, Martin Uecker, GCC Patches

Yes, after today’s discussion, I think we agreed on 

1. Passing the size field by reference to .ACCESS_WITH_SIZE as jakub suggested.
2. Then the compiler should be able to always use the latest value of size field for the reference to FAM.

As a result, no need to add code for pointer re-obtaining purpose in the source code. 

I will update the proposal one more time.

thanks.

Qing

> On Nov 2, 2023, at 8:28 PM, Bill Wendling <isanbard@gmail.com> wrote:
> 
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> Thanks a lot for raising these issues.
>> 
>> If I understand correctly,  the major question we need to answer is:
>> 
>> For the following example: (Jakub mentioned this  in an early message)
>> 
>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>  2 struct S s;
>>  3 s.a = 5;
>>  4 char *p = &s.b[2];
>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>  6 s.a = 3;
>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>> 
>> Should the 2nd __bdos call (line 7) get
>>        A. the latest value of s.a (line 6) for it’s size?
>> Or      B. the value when the s.b was referenced (line 3, line 4)?
>> 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.
> 
>> A should be more convenient for the user to use the dynamic array feature.
>> With B, the user has to modify the source code (to add code to “re-obtain”
>> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>> 
>> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>> 
>> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>> 
>> PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
>> 
>> With 1, We can only provide B, the user needs to modify the source code to get the full feature of dynamic array;
>> With 2, We can provide  A, the user will get full support to the dynamic array without restrictions in the source code.
>> 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
>> However, We have to pay additional cost for supporting A by using 2, which includes:
>> 
>> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the IPA optimizations, more runtime overhead.
>>    Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?
>> 
>> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also impact some IPA optimizations, more runtime overhead.
>> 
>> I think the following are the factors that make the decision:
>> 
>> 1. How big the performance impact?
>> 2. How important the dynamic array feature? Is adding some user restrictions as Richard mentioned feasible to support this feature?
>> 
>> Maybe we can implement 1 first, if the full support to the dynamic array is needed, we can add 2 then?
>> Or, we can implement both, and compare the performance difference, then decide?
>> 
>> Qing
>> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2023-11-03 19:34 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-31 16:26 RFC: the proposal to resolve the missing dependency issue for counted_by attribute Qing Zhao
2023-10-31 17:35 ` Siddhesh Poyarekar
2023-10-31 18:35   ` Qing Zhao
2023-10-31 22:14 ` Joseph Myers
2023-11-01 14:47   ` Qing Zhao
2023-11-01 15:00     ` Martin Uecker
2023-11-01 15:48       ` Qing Zhao
2023-11-02  7:57     ` Richard Biener
2023-11-02  8:27       ` Jakub Jelinek
2023-11-02 10:18         ` Richard Biener
2023-11-02 10:39           ` Jakub Jelinek
2023-11-02 11:52             ` Richard Biener
2023-11-02 12:09               ` Jakub Jelinek
2023-11-02 20:35                 ` Qing Zhao
2023-11-03  0:28                   ` Bill Wendling
2023-11-03  6:07                     ` Martin Uecker
2023-11-03  6:22                       ` Jakub Jelinek
2023-11-03  6:32                         ` Martin Uecker
2023-11-03 16:20                           ` Qing Zhao
2023-11-03 16:30                             ` Jakub Jelinek
2023-11-03 16:36                               ` Qing Zhao
2023-11-03 14:32                         ` Qing Zhao
2023-11-03 14:46                           ` Jakub Jelinek
2023-11-03 15:22                             ` Qing Zhao
2023-11-03 19:33                     ` Qing Zhao
2023-11-02 20:47                 ` Qing Zhao
2023-11-02 20:45               ` Qing Zhao
2023-11-02 13:50       ` Qing Zhao
2023-11-02 13:54         ` Richard Biener
2023-11-02 14:26           ` Qing Zhao
2023-11-02 14:12         ` Martin Uecker
2023-11-02 15:41           ` Siddhesh Poyarekar
2023-11-03  0:13       ` Bill Wendling
2023-11-03 19:28         ` Qing Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).