public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Michael Matz <matz@suse.de>
To: Qing Zhao <qing.zhao@oracle.com>
Cc: Martin Uecker <uecker@tugraz.at>,
	Kees Cook <keescook@chromium.org>,
	 Joseph Myers <joseph@codesourcery.com>,
	 Richard Biener <richard.guenther@gmail.com>,
	 jakub Jelinek <jakub@redhat.com>,
	 Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>,
	 Siddhesh Poyarekar <siddhesh@gotplt.org>,
	 "isanbard@gmail.com" <isanbard@gmail.com>
Subject: Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
Date: Thu, 10 Aug 2023 13:54:21 +0000 (UTC)	[thread overview]
Message-ID: <alpine.LSU.2.20.2308101243350.25429@wotan.suse.de> (raw)
In-Reply-To: <A98EAAAD-213F-4483-A670-244EABA5EC9F@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 5709 bytes --]

Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> > So, should the equivalent FAM struct also have this sizeof()?  If no: 
> > there should be a good argument why it shouldn't be similar to the non-FAM 
> > one.
> 
> The sizeof() of a structure with FAM is defined as: (after I searched online,
>  I think that the one in Wikipedia is the most reasonable one):
> https://en.wikipedia.org/wiki/Flexible_array_member

Well, wikipedia has it's uses.  Here we are in language lawyering land 
together with a discussion what makes most sense in many circumstances.  
FWIW, in this case I think the cited text from wikipedia is correct in the 
sense of "not wrong" but not helpful in the sense of "good advise".

> By definition, the sizeof() of a struct with FAM might not be the same 
> as the non-FAM one. i.e, for the following two structures, one with FAM, 
> the other with fixed array:
> 
> struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> struct foo_fix {int a; short b; char t[3]; } 
> 
> With current GCC:
> sizeof(foo_flex) == 8
> sizeof(foo_fix) == 12
> 
> I think that the current behavior of sizeof for structure with FAM in 
> GCC is correct.

It is, yes.

> The major issue is what was pointed out by Martin in the previous email:
> 
> Whether using the following formula is correct to compute the 
> allocation?
> 
> sizeof(struct foo_flex) + N * sizeof(foo->t);
> 
> As pointed out  in the wikipedia, the value computed by this formula might
> be bigger than the actual size since “sizeof(struct foo_flex)” might include 
> paddings that are used as part of the array.

That doesn't make the formula incorrect, but rather conservatively 
correct.  If you don't want to be conservative, then yes, you can use a 
different formula if you happen to know the layout rules your compiler at 
hand uses (or the ones prescribed by an ABI, if it does that).  I think 
it would be bad advise to the general population do advertise this scheme 
as better.

> So the more accurate formula should be
> 
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);

"* sizeof(foo->t[0])", but yes.

> For the question, whether the compiler needs to allocate paddings after 
> the FAM field, I don’t know the answer, and it’s not specified in the 
> standard either. Does it matter?

It matters for two things:

1) Abstract reasons: is there as reason to deviate 
from the normal rules?  If not: it shouldn't deviate.  Future 
extensibility: while it right now is not possible to form an array 
of FMA-structs (in C!), who's to say that it may not be eventually added?  
It seems a natural enough extension of an extension, and while it has 
certain implementation problems (the "real" size of the elements needs to 
be computable, and hence be part of the array type) those could be 
overcome.  At that point you _really_ want to have the elements aligned 
naturally, be compatible with sizeof, and be the same as an individual 
object.

2) Practical reasons: codegeneration works better if the accessible sizes 
of objects are a multiple of their alignment, as often you have 
instructions that can move around alignment-sized blobs (say, words).  If 
you don't pad out objects you have to be careful to use only byte accesses 
when you get to the end of an object.

Let me ask the question in the opposite way: what would you _gain_ by not 
allocating padding?  And is that enough to deviate from the most obvious 
choices?  (Do note that e.g. global or local FMA-typed objects don't exist 
in isolation, and their surrounding objects have their own alignment 
rules, which often will lead to tail padding anyway, even if you would use 
the non-conservative size calculation; the same applies for malloc'ed 
objects).

> > Note that if one choses to allocate less space than sizeof implies that 
> > this will have quite some consequences for code generation, in that 
> > sometimes the instruction sequences (e.g. for copying) need to be careful 
> > to never access tail padding that should be there in array context, but 
> > isn't there in single-object context.  I think this alone should make it 
> > clear that it's advisable that sizeof() and allocated size agree.
> 
> Sizeof by definition is return the size of the TYPE, not the size of the 
> allocated object.

Sure.  Outside special cases like FMA it's the same, though.  And there 
sizeof does include tail padding.

> > And then the next question is what __builtin_object_size should do with 
> > these: should it return the size with or without padding at end (i.e. 
> > could/should it return 9 even if sizeof is 12).  I can see arguments for 
> > both.
> 
> Currently, GCC’s __builtin_object_size use the following formula to 
> compute the object size for The structure with FAM:
> 
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
> 
> I think it’s correct.

See above.  It's non-conservatively correct.  And that may be the right 
choice for this builtin, considering its intended use-cases (strict 
checking of allowed access ranges) ...

> I think that the users might need to use this formula to compute the 
> allocation size for a structure with FAM too.

... but that doesn't automatically follows.  There's a difference between 
accessible memory range (for language semantic purposes) and allocated 
memory range.  I could imagine that somewhen __builtin_object_size doesn't 
include tail padding for normal non-FMA types either.  After all you can't 
rely on the contents there, though you won't get segfaults when you access 
it.  sizeof will continue to have to include it, though.  So that's a 
demonstration of why _bos is not the right thing to use for allocation 
purposes.


Ciao,
Michael.

  parent reply	other threads:[~2023-08-10 13:54 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-04 19:44 Qing Zhao
2023-08-04 19:44 ` [V2][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896) Qing Zhao
2023-08-04 19:44 ` [V2][PATCH 2/3] Use the counted_by atribute info in builtin object size [PR108896] Qing Zhao
2023-08-04 19:44 ` [V2][PATCH 3/3] Use the counted_by attribute information in bound sanitizer[PR108896] Qing Zhao
2023-08-07 16:16 ` [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896) Kees Cook
2023-08-07 16:33   ` Qing Zhao
2023-08-09 19:17     ` Kees Cook
2023-08-08 14:54   ` Martin Uecker
2023-08-08 16:18     ` Michael Matz
2023-08-08 19:58     ` Kees Cook
2023-08-09 16:05     ` Qing Zhao
2023-08-09 16:21       ` Michael Matz
2023-08-09 20:10         ` Qing Zhao
2023-08-10  6:58           ` Martin Uecker
2023-08-10 13:59             ` Qing Zhao
2023-08-10 14:38               ` Martin Uecker
2023-08-10 14:42                 ` Jakub Jelinek
2023-08-10 14:47                   ` Martin Uecker
2023-08-10 14:58                     ` Siddhesh Poyarekar
2023-08-10 15:18                       ` Martin Uecker
2023-08-10 16:28                         ` Qing Zhao
2023-08-10 16:30                         ` Siddhesh Poyarekar
2023-08-10 16:39                           ` Jakub Jelinek
2023-08-10 17:06                             ` Siddhesh Poyarekar
2023-08-16 21:42                               ` Qing Zhao
2023-08-10 18:18                             ` Qing Zhao
2023-08-10 14:02             ` Michael Matz
2023-08-10 13:54           ` Michael Matz [this message]
2023-08-09 20:34     ` Qing Zhao
2023-08-17  5:31 ` Kees Cook
2023-08-17  6:38   ` Kees Cook
2023-08-17 13:44     ` Qing Zhao
2023-08-17 16:54       ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.20.2308101243350.25429@wotan.suse.de \
    --to=matz@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=isanbard@gmail.com \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=keescook@chromium.org \
    --cc=qing.zhao@oracle.com \
    --cc=richard.guenther@gmail.com \
    --cc=siddhesh@gotplt.org \
    --cc=uecker@tugraz.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).