public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] cache compute_objsize results in strlen/sprintf (PR 97373)
Date: Thu, 5 Nov 2020 08:20:20 -0700	[thread overview]
Message-ID: <bd5c263a-4a5a-4e19-e445-31db57bb224e@gmail.com> (raw)
In-Reply-To: <CAFiYyc3ZsC3tWmvCAGUWyXxNd-kXk0UocB36obHDAeNfFmeG9A@mail.gmail.com>

On 11/5/20 12:31 AM, Richard Biener wrote:
> On Thu, Nov 5, 2020 at 1:59 AM Martin Sebor via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> To determine the target of a pointer expression and the offset into
>> it, the increasingly widely used compute_objsize function traverses
>> the IL following the DEF statements of pointer variables, aggregating
>> offsets from POINTER_PLUS assignments along the way.  It does that
>> for many statements that involve pointers, including calls to
>> built-in functions and (so far only) accesses to char arrays.  When
>> a function has many such statements with pointers to the same objects
>> but with different offsets, the traversal ends up visiting the same
>> pointer assignments repeatedly and unnecessarily.
>>
>> To avoid this repeated traversal, the attached patch adds the ability
>> to cache results obtained in prior calls to the function.  The cache
>> is optional and only used when enabled.
>>
>> To exercise the cache I have enabled it for the strlen pass (which
>> is probably the heaviest compute_objsize user).  That happens to
>> resolve PR 97373 which tracks the pass' failure to detect sprintf
>> overflowing allocated buffers at a non-constant offset.  I thought
>> about making this a separate patch but the sprintf/strlen changes
>> are completely mechanical so it didn't seem worth the effort.
>>
>> In the benchmarking I've done the cache isn't a huge win there but
>> it does have a measurable difference in the project I'm wrapping up
>> where most pointer assignments need to be examined.  The space used
>> for the cache is negligible on average: fewer than 20 entries per
>> Glibc function and about 6 for GCC.  The worst case in Glibc is
>> 6 thousand entries and 10k in GCC.  Since the entries are sizable
>> (216 bytes each) the worst case memory consumption can be reduced
>> by adding a level of indirection.  A further savings can be obtained
>> by replacing some of the offset_int members of the entries with
>> HOST_WIDE_INT.
>>
>> The efficiency benefits of the cache should increase further as more
>> of the access checking code is integrated into the same pass.  This
>> should eventually include the checking currently done in the built-in
>> expanders.
>>
>> Tested on x86_64-linux, along with Glibc and Binutils/GDB.
> 
> I'm quite sure the objsz pass already has a cache, why not
> re-use it instead of piggy-backing another one onto its machinery?

compute_objsize() and the objsz pass are completely independent.
The pass is also quite limited in that it doesn't make use of
ranges.  That limitation was also the main reason for introducing
the compute_objsize() function.

I'd love to see the objsize pass and compute_objsize() integrated
and exposed under an interface similar to range_query, with
the information available anywhere, and on demand.  I might tackle
it some day, but I expect it will be a nontrivial project, much
bigger than letting compute_objsize() do this simple caching for
the time being.

Martin

> 
> Richard.
> 
>> Martin
>>
>> PS The patch add the new pointer_query class (loosely modeled on
>> range_query) to builtins.{h,c}.  This should be only temporary,
>> until the access checking code is moved into a file (and ultimately
>> a pass) of its own.


  reply	other threads:[~2020-11-05 15:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-05  0:58 Martin Sebor
2020-11-05  7:31 ` Richard Biener
2020-11-05 15:20   ` Martin Sebor [this message]
2020-11-05 15:29     ` Jakub Jelinek
2020-11-05 16:23       ` Martin Sebor
2020-11-23 21:04 ` Jeff Law
2020-12-01 20:57   ` Martin Sebor
2020-12-01 21:21     ` Martin Sebor
2020-12-01 23:38       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd5c263a-4a5a-4e19-e445-31db57bb224e@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).