public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Qing Zhao <qing.zhao@oracle.com>
Cc: "joseph@codesourcery.com" <joseph@codesourcery.com>,
	"richard.guenther@gmail.com" <richard.guenther@gmail.com>,
	"jakub@redhat.com" <jakub@redhat.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	"siddhesh@gotplt.org" <siddhesh@gotplt.org>,
	"uecker@tugraz.at" <uecker@tugraz.at>,
	"isanbard@gmail.com" <isanbard@gmail.com>
Subject: Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)
Date: Mon, 17 Jul 2023 16:40:12 -0700	[thread overview]
Message-ID: <202307171612.406D82C39@keescook> (raw)
In-Reply-To: <ECA688BE-4F24-4A6F-9816-3519752530B1@oracle.com>

On Mon, Jul 17, 2023 at 09:17:48PM +0000, Qing Zhao wrote:
> 
> > On Jul 13, 2023, at 4:31 PM, Kees Cook <keescook@chromium.org> wrote:
> > 
> > In the bug, the problem is that "p" isn't known to be allocated, if I'm
> > reading that correctly?
> 
> I think that the major point in PR109557 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):
> 
> for the following pointer p.3_1, 
> 
> p.3_1 = p;
> _2 = __builtin_object_size (p.3_1, 0);
> 
> Question: why the size of p.3_1 cannot use the TYPE_SIZE of the pointee of p when the TYPE_SIZE can be determined at compile time?
> 
> Answer:  From just knowing the type of the pointee of p, the compiler cannot determine the size of the object.  

Why is that? "p" points to "struct P", which has a fixed size. There
must be an assumption somewhere that a pointer is allocated, otherwise
__bos would almost never work?

> Therefore the bug has been closed. 
> 
> In your following testing 5:
> 
> > I'm not sure this is a reasonable behavior, but
> > let me get into the specific test, which looks like this:
> > 
> > TEST(counted_by_seen_by_bdos)
> > {
> >        struct annotated *p;
> >        int index = MAX_INDEX + unconst;
> > 
> >        p = alloc_annotated(index);
> > 
> >        REPORT_SIZE(p->array);
> > /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
> >        /* Check array size alone. */
> > /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX);
> > /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->foo * sizeof(*p->array));
> >        /* Check check entire object size. */
> > /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
> > /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->foo * sizeof(*p->array));
> > }
> > 
> > Test 5 should pass as well, since, again, p can be examined. Passing p
> > to __bdos implies it is allocated and the __counted_by annotation can be
> > examined.
> 
> Since the call to the routine “alloc_annotated" cannot be inlined, GCC does not see any allocation calls for the pointer p.

Correct.

> At the same time, due to the same reason as PR109986, GCC cannot determine the size of the object by just knowing the TYPE_SIZE
> of the pointee of p.  

So the difference between test 3 and test 5 is that "p" is explicitly
dereferenced to find "array", and therefore an assumption is established
that "p" is allocated?

> So, this is exactly the same issue as PR109557.  It’s an existing behavior per the current __buitlin_object_size algorithm.
> I am still not very sure whether the situation in PR109557 can be improved or not, but either way, it’s a separate issue.

Okay, so the issue is one of object allocation visibility (or
assumptions there in)?

> Please see the new testing case I added for PR109557, you will see that the above case 5 is a similar case as the new testing case in PR109557.

I will ponder this a bit more to see if I can come up with a useful
test case to replace the part from "test 5" above.

> > 
> > If "return p->array[index];" would be caught by the sanitizer, then
> > it follows that __builtin_dynamic_object_size(p, 1) must also know the
> > size. i.e. both must examine "p" and its trailing flexible array and
> > __counted_by annotation.
> > 
> >> 
> >> 2. The common issue for  the failed testing 3, 4, 9, 10 is:
> >> 
> >> for the following annotated structure: 
> >> 
> >> ====
> >> struct annotated {
> >>        unsigned long flags;
> >>        size_t foo;
> >>        int array[] __attribute__((counted_by (foo)));
> >> };
> >> 
> >> 
> >> struct annotated *p;
> >> int index = 16;
> >> 
> >> p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real size 
> >> 
> >> p->foo = index + 2;  // p->foo was set by a different value than the real size of p->array as in test 9 and 10
> > 
> > Right, tests 9 and 10 are checking that the _smallest_ possible value of
> > the array is used. (There are two sources of information: the allocation
> > size and the size calculated by counted_by. The smaller of the two
> > should be used when both are available.)
> 
> The counted_by attribute is used to annotate a Flexible array member on how many elements it will have.
> However, if this information can not accurately reflect the real number of elements for the array allocated, 
> What’s the purpose of such information? 

For example, imagine code that allocates space for 100 elements since
the common case is that the number of elements will grow over time.
Elements are added as it goes. For example:

struct grows {
	int alloc_count;
	int valid_count;
	struct element item[] __counted_by(valid_count);
} *p;

void something(void)
{
	p = malloc(sizeof(*p) + sizeof(*p->item) * 100);
	p->alloc_count = 100;
	p->valid_count = 0;

	/* this loop doesn't check that we don't go over 100. */
	while (items_to_copy) {
		struct element *item_ptr = get_next_item();
		/* __counted_by stays in sync: */
		p->valid_count++;
		p->item[p->valid_count - 1] = *item_ptr;
	}
}

We would want to catch cases there p->item[] is accessed with an index
that is >= p->valid_count, even though the allocation is (currently)
larger.

However, if we ever reached valid_count >= alloc_count, we need to trap
too, since we can still "see" the true allocation size.

Now, the __alloc_size hint is visible in very few places, so if there is
a strong reason to do so, I can live with saying that __counted_by takes
full precedence over __alloc_size. It seems it should be possible to
compare when both are present, but I can live with __counted_by being
the universal truth. :)

> 
> >> or
> >> p->foo was not set to any value as in test 3 and 4
> > 
> > For tests 3 and 4, yes, this was my mistake. I have fixed up the tests
> > now. Bill noticed the same issue. Sorry for the think-o!
> > 
> >> ====
> >> 
> >> i.e, the value of p->foo is NOT synced with the number of elements allocated for the array p->array.  
> >> 
> >> I think that this should be considered as an user error, and the documentation of the attribute should include
> >> this requirement.  (In the LLVM’s RFC, such requirement was included in the programing model: 
> >> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
> >> 
> >> We can add a new warning option -Wcounted-by to report such user error if needed.
> >> 
> >> What’s your opinion on this?
> > 
> > I think it is correct to allow mismatch between allocation and
> > counted_by as long as only the least of the two is used.
> 
> What’s your mean by “only the least of the two is used”?

I was just restating the above: if size information is available via
both __alloc_size and __counted_by, the smaller of the two should be
used.

> 
> > This may be
> > desirable in a few situations. One example would be a large allocation
> > that is slowly filled up by the program.
> 
> So, for such situation, whenever the allocation is filled up, the field that hold the “counted_by” attribute should be increased at the same time,
> Then, the “counted_by” value always sync with the real allocation. 
> > I.e. the counted_by member is
> > slowly increased during runtime (but not beyond the true allocation size).
> 
> Then there should be source code to increase the “counted_by” field whenever the allocated space increased too. 
> > 
> > Of course allocation size is only available in limited situations, so
> > the loss of that info is fine: we have counted_by for everything else.
> 
> The point is: allocation size should synced with the value of “counted_by”. LLVM’s RFC also have the similar requirement:
> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18

Right, I'm saying it would be nice if __alloc_size was checked as well,
in the sense that if it is available, it knows without question what the
size of the allocation is. If __alloc_size and __counted_by conflict,
the smaller of the two should be the truth.

But, as I said, if there is some need to explicitly ignore __alloc_size
when __counted_by is present, I can live with it; we just need to
document it.

If the RFC and you agree that the __counted_by variable can only ever be
(re)assigned after the flex array has been (re)allocated, then I guess
we'll see how it goes. :) I think most places in the kernel using
__counted_by will be fine, but I suspect we may have cases where we need
to update it like in the loop I described above. If that's true, we can
revisit the requirement then. :)

-Kees

-- 
Kees Cook

  reply	other threads:[~2023-07-17 23:40 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-25 16:14 Qing Zhao
2023-05-25 16:14 ` [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896) Qing Zhao
2023-05-25 21:02   ` Joseph Myers
2023-05-26 13:32     ` Qing Zhao
2023-05-26 18:15       ` Joseph Myers
2023-05-26 19:09         ` Qing Zhao
2023-06-07 19:59         ` Qing Zhao
2023-06-07 20:53           ` Joseph Myers
2023-06-07 21:32             ` Qing Zhao
2023-06-07 22:05               ` Joseph Myers
2023-06-08 13:06                 ` Qing Zhao
2023-06-15 15:09                 ` Qing Zhao
2023-06-15 16:55                   ` Joseph Myers
2023-06-15 19:54                     ` Qing Zhao
2023-06-15 22:48                       ` Joseph Myers
2023-06-16 15:01                         ` Qing Zhao
2023-06-16  7:21                     ` Martin Uecker
2023-06-16 15:14                       ` Qing Zhao
2023-06-16 16:21                       ` Joseph Myers
2023-06-16 17:07                         ` Martin Uecker
2023-06-16 20:20                           ` Qing Zhao
2023-06-16 21:35                             ` Joseph Myers
2023-06-20 19:40                               ` Qing Zhao
2023-06-27 15:44                                 ` Qing Zhao
2023-05-25 16:14 ` [V1][PATCH 2/3] Use the element_count atribute info in builtin object size [PR108896] Qing Zhao
2023-05-27 10:20   ` Martin Uecker
2023-05-30 16:08     ` Qing Zhao
2023-05-25 16:14 ` [V1][PATCH 3/3] Use the element_count attribute information in bound sanitizer[PR108896] Qing Zhao
2023-05-26 16:12 ` [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896) Kees Cook
2023-05-30 21:44   ` Qing Zhao
2023-05-26 20:40 ` Kees Cook
2023-05-30 15:43   ` Qing Zhao
2023-07-06 18:56   ` Qing Zhao
2023-07-06 21:10     ` Martin Uecker
2023-07-07 15:47       ` Qing Zhao
2023-07-07 20:21         ` Qing Zhao
2023-07-13 20:31     ` Kees Cook
2023-07-17 21:17       ` Qing Zhao
2023-07-17 23:40         ` Kees Cook [this message]
2023-07-18 15:37           ` Qing Zhao
2023-07-18 16:03             ` Martin Uecker
2023-07-18 16:25               ` Qing Zhao
2023-07-18 16:50                 ` Martin Uecker
2023-07-18 18:53             ` Qing Zhao
2023-07-19  8:41           ` Martin Uecker
2023-07-19 16:16           ` Qing Zhao
2023-07-19 18:52           ` Qing Zhao
2023-07-31 20:14             ` Qing Zhao
2023-08-01 22:45               ` Kees Cook
2023-08-02  6:25                 ` Martin Uecker
2023-08-02 15:02                   ` Qing Zhao
2023-08-02 15:09                 ` Qing Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202307171612.406D82C39@keescook \
    --to=keescook@chromium.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=isanbard@gmail.com \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=qing.zhao@oracle.com \
    --cc=richard.guenther@gmail.com \
    --cc=siddhesh@gotplt.org \
    --cc=uecker@tugraz.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).