From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 10 Aug 2023 13:54:21 +0000 (UTC)
From: Michael Matz
To: Qing Zhao
cc: Martin Uecker, Kees Cook, Joseph Myers, Richard Biener, jakub Jelinek,
    Qing Zhao via Gcc-patches, Siddhesh Poyarekar, "isanbard@gmail.com"
Subject: Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
References: <20230804194431.993958-1-qing.zhao@oracle.com> <202308070858.D2FB43E@keescook> <5f76638c8cfca7611e955ef9fadacfd7f8dca0fb.camel@tugraz.at> <9E6E0BBA-A97F-4C94-B188-8E4A620B36DB@oracle.com>
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)
MIME-Version: 1.0
Content-Type: multipart/mixed; BOUNDARY="-1609957120-967838230-1691675663=:25429"

---1609957120-967838230-1691675663=:25429
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT

Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> > So, should the equivalent FAM struct also have this sizeof()? If no:
> > there should be a good argument why it shouldn't be similar to the non-FAM
> > one.
>
> The sizeof() of a structure with FAM is defined as: (after I searched online,
> I think that the one in Wikipedia is the most reasonable one):
> https://en.wikipedia.org/wiki/Flexible_array_member

Well, Wikipedia has its uses. Here we are in language-lawyering land,
together with a discussion of what makes most sense in many circumstances.
FWIW, in this case I think the cited text from Wikipedia is correct in the
sense of "not wrong" but not helpful in the sense of "good advice".

> By definition, the sizeof() of a struct with FAM might not be the same
> as the non-FAM one. i.e, for the following two structures, one with FAM,
> the other with fixed array:
>
> struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> struct foo_fix {int a; short b; char t[3]; }
>
> With current GCC:
> sizeof(foo_flex) == 8
> sizeof(foo_fix) == 12
>
> I think that the current behavior of sizeof for structure with FAM in
> GCC is correct.

It is, yes.

> The major issue is what was pointed out by Martin in the previous email:
>
> Whether using the following formula is correct to compute the
> allocation?
>
> sizeof(struct foo_flex) + N * sizeof(foo->t);
>
> As pointed out in the wikipedia, the value computed by this formula might
> be bigger than the actual size since “sizeof(struct foo_flex)” might include
> paddings that are used as part of the array.

That doesn't make the formula incorrect, but rather conservatively
correct. If you don't want to be conservative, then yes, you can use a
different formula if you happen to know the layout rules your compiler at
hand uses (or the ones prescribed by an ABI, if it does that). I think it
would be bad advice to the general population to advertise this scheme as
better.

> So the more accurate formula should be
>
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);

"* sizeof(foo->t[0])", but yes.

> For the question, whether the compiler needs to allocate paddings after
> the FAM field, I don’t know the answer, and it’s not specified in the
> standard either. Does it matter?

It matters for two things:

1) Abstract reasons: is there a reason to deviate from the normal rules?
If not: it shouldn't deviate. Future extensibility: while it right now is
not possible to form an array of FAM-structs (in C!), who's to say that it
may not be eventually added?
It seems a natural enough extension of an extension, and while it has
certain implementation problems (the "real" size of the elements needs to
be computable, and hence be part of the array type) those could be
overcome. At that point you _really_ want to have the elements aligned
naturally, be compatible with sizeof, and be the same as an individual
object.

2) Practical reasons: code generation works better if the accessible sizes
of objects are a multiple of their alignment, as often you have
instructions that can move around alignment-sized blobs (say, words). If
you don't pad out objects you have to be careful to use only byte accesses
when you get to the end of an object.

Let me ask the question in the opposite way: what would you _gain_ by not
allocating padding? And is that enough to deviate from the most obvious
choices? (Do note that e.g. global or local FAM-typed objects don't exist
in isolation, and their surrounding objects have their own alignment
rules, which often will lead to tail padding anyway, even if you would use
the non-conservative size calculation; the same applies for malloc'ed
objects).

> > Note that if one choses to allocate less space than sizeof implies that
> > this will have quite some consequences for code generation, in that
> > sometimes the instruction sequences (e.g. for copying) need to be careful
> > to never access tail padding that should be there in array context, but
> > isn't there in single-object context. I think this alone should make it
> > clear that it's advisable that sizeof() and allocated size agree.
>
> Sizeof by definition is return the size of the TYPE, not the size of the
> allocated object.

Sure. Outside special cases like FAM it's the same, though. And there
sizeof does include tail padding.

> > And then the next question is what __builtin_object_size should do with
> > these: should it return the size with or without padding at end (i.e.
> > could/should it return 9 even if sizeof is 12).
> > I can see arguments for
> > both.
>
> Currently, GCC’s __builtin_object_size use the following formula to
> compute the object size for The structure with FAM:
>
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
>
> I think it’s correct.

See above. It's non-conservatively correct. And that may be the right
choice for this builtin, considering its intended use-cases (strict
checking of allowed access ranges) ...

> I think that the users might need to use this formula to compute the
> allocation size for a structure with FAM too.

... but that doesn't automatically follow. There's a difference between
accessible memory range (for language semantic purposes) and allocated
memory range. I could imagine that at some point __builtin_object_size
doesn't include tail padding for normal non-FAM types either. After all
you can't rely on the contents there, though you won't get segfaults when
you access it. sizeof will continue to have to include it, though. So
that's a demonstration of why _bos is not the right thing to use for
allocation purposes.

Ciao,
Michael.
---1609957120-967838230-1691675663=:25429--
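
For reference, the two size computations discussed in this message can be
written down roughly as follows. This is only a sketch: the helper names
(foo_flex_size_conservative, foo_flex_size_tight, foo_flex_alloc) are made
up for illustration and appear neither in GCC nor in the patch series, the
standard offsetof macro stands in for the "offset(...)" shorthand, and
overflow of the size arithmetic is not checked.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdlib.h>  /* malloc */

struct foo_flex { int a; short b; char t[]; };

/* Conservative formula: sizeof(struct foo_flex) includes whatever tail
   padding the fixed part has, so the result may exceed what is strictly
   needed, but it never undercounts. */
static size_t foo_flex_size_conservative(size_t n)
{
    return sizeof(struct foo_flex)
           + n * sizeof(((struct foo_flex *)0)->t[0]);
}

/* Tight formula: counts from the offset of the flexible array member.
   For small n the result can be smaller than sizeof(struct foo_flex). */
static size_t foo_flex_size_tight(size_t n)
{
    return offsetof(struct foo_flex, t)
           + n * sizeof(((struct foo_flex *)0)->t[0]);
}

/* Hypothetical allocator for illustration: allocates with the
   conservative size, so the object is at least sizeof(struct foo_flex)
   bytes.  Returns NULL if malloc fails. */
static struct foo_flex *foo_flex_alloc(size_t n)
{
    /* offsetof of the FAM cannot exceed the struct size, so the
       conservative size is never the smaller of the two. */
    assert(foo_flex_size_conservative(n) >= foo_flex_size_tight(n));
    return malloc(foo_flex_size_conservative(n));
}
```

Allocating with the conservative size keeps the allocation at least as
large as sizeof says the type is, which is the agreement between sizeof()
and allocated size argued for above; the tight size corresponds to the
accessible range that the quoted __builtin_object_size formula describes.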