public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Aldy Hernandez <aldyh@redhat.com>, Jakub Jelinek <jakub@redhat.com>
Cc: Martin Sebor <msebor@redhat.com>, GCC patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Convert strlen pass from evrp to ranger.
Date: Fri, 8 Oct 2021 10:51:53 -0600	[thread overview]
Message-ID: <fa8ccb2e-39fd-48d9-fc29-1760bff311d5@gmail.com> (raw)
In-Reply-To: <20211008151222.37790-1-aldyh@redhat.com>

On 10/8/21 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
> The following patch converts the strlen pass from evrp to ranger,
> leaving DOM as the last remaining user.

Thanks for doing this.  I know I said I'd work on it but I'm still
bogged down in my stage 1 work that's not going so great :(  I just
have a few minor comments/questions on the strlen change (inline)
but I am a bit concerned about the test failure.

> 
> No additional cleanups have been done.  For example, the strlen pass
> still has uses of VR_ANTI_RANGE, and the sprintf still passes around
> pairs of integers instead of using a proper range.  Fixing this
> could further improve these passes.
> 
> As a further enhancement, if the relevant maintainers deem useful,
> the domwalk could be removed from strlen.  That is, unless the pass
> needs it for something else.
> 
> With ranger we are now able to remove the range calculation from
> before_dom_children entirely.  Just working with the ranger on-demand
> catches all the strlen and sprintf testcases with the exception of
> builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
> code.  I have XFAILed the test and documented what the problem is.

builtin-sprintf-warn-22.c is a regression test for a false positive
in Glibc.  If it fails we'll have to deal with the Glibc failure
again, which I would rather avoid.  Have you checked to see if
Glibc is affected by the change?

> 
> It looks like the same problem in the sprintf test triggers a false
> positive in gimple-ssa-warn-access.cc so I have added
> -Wno-format-overflow until it can be fixed.
> 
> I can expand on the false positive if necessary, but the gist is that
> this:
> 
>      _17 = strlen (_132);
>      _18 = strlen (_136);
>      _19 = _18 + _17;
>      if (_19 > 75)
>        goto <bb 59>; [0.00%]
>      else
>        goto <bb 61>; [100.00%]
> 
> ...dominates the sprintf in BB61.  This means that ranger can figure
> out that the _17 and _18 are [0, 75].  On the other hand, evrp
> returned a range of [0, 9223372036854775805] which presumably the
> sprintf code was ignoring as a false positive here:

This is a feature designed to avoid false positives when the sprintf
pass doesn't know anything about the strings (i.e., their lengths
are unconstrained by either the sizes of the arrays they're stored
in or any expressions like asserts involving their lengths).

It sounds like the strlen/ranger improvement partially propagates
constraints from subsequent expressions into the strlen results
but it doesn't go far enough for them to actually fully satisfy
the constraint, which is what in turn triggers the warning.

I.e., in the test:

void g (char *s1, char *s2)
{
   char b[1025];
   size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
   if (n + d + 1 >= 1025)
     return;

   sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }

the range of n and d is [0, INF] and so the sprintf call doesn't
trigger a warning.  With your change, because their range is
[0, 1023] each (and there's no way to express that their sum
is less than 1025), the warning triggers because it considers
the worst case scenario (the upper bounds of both).

> 
> 	      char sizstr[80];
> 	      ...
> 	      ...
> 	      char *s1 = print_generic_expr_to_str (sizrng[1]);
> 	      gcc_checking_assert (strlen (s0) + strlen (s1)
> 				   < sizeof sizstr - 4);
> 	      sprintf (sizstr, "[%s, %s]", s0, s1);
> 
> The warning triggers with:
> 
> gimple-ssa-warn-access.cc: In member function ‘void {anonymous}::pass_waccess::maybe_check_access_sizes(rdwr_map*, tree, tree, gimple*)’:
> gimple-ssa-warn-access.cc:2916:32: warning: ‘%s’ directive writing up to 75 bytes into a region of size between 2 and 77 [-Wformat-overflow=]
>   2916 |               sprintf (sizstr, "[%s, %s]", s0, s1);
>        |                                ^~~~~~~~~~
> gimple-ssa-warn-access.cc:2916:23: note: ‘sprintf’ output between 5 and 155 bytes into a destination of size 80
>   2916 |               sprintf (sizstr, "[%s, %s]", s0, s1);
>        |               ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

Yes, that does look like the same problem.  It's a side-effect
of the checking_assert.  What's troubling is that it's one that
has exactly the opposite effect of what's intended: it causes
warnings when it's intended to avoid them, which was the main
goal of the strlen/sprintf integration.

Suppressing the warning in these cases, while technically simple,
would be a design change.  We might just have to live with this.
The asserts still work to constrain individual lenghts, they just
won't work for more complex expressions involving relationships
between two or more strings.

> On a positive note, these changes found two possible sprintf overflow
> bugs in the C++ and Fortran front-ends which I have fixed below.

That's good to hear! :)

> 
> Bootstrap and regtested on x86-64 Linux.  I also ran it through our
> callgrind harness and there was no overall change in overall
> compilation time.
> 
> OK?
> 
...
> @@ -269,7 +270,7 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off,
>       return -1;
>   
>     value_range vr;
> -  if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt))
> +  if (!rvals->range_of_expr (vr, si->nonzero_chars, stmt))

We need stmt rather than si->stmt because the latter may be different
or null when the former will always refer to the current statement,
correct?

>       return -1;
>     value_range_kind rng = vr.kind ();
>     if (rng != VR_RANGE)
> @@ -324,7 +325,8 @@ get_next_strinfo (strinfo *si)
>      information.  */
>   
>   static int
> -get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out,
> +get_addr_stridx (tree exp, gimple *stmt,
> +		 tree ptr, unsigned HOST_WIDE_INT *offset_out,
>   		 range_query *rvals = NULL)

I think there still are quite a few calls to get_addr_stridx() and
get_addr() that don't pass in rvals.  I started doing this back in
the GCC 11 cycle but didn't finish the job.  Eventually, rvals
should be passed to all their callers (or they made class members
with a ranger member).  I mention this in case it has an effect on
the range propagation (since the pass also sets ranges).

>   {
>     HOST_WIDE_INT off;
...
> @@ -4653,7 +4662,7 @@ count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset,
>   	       && TREE_CODE (si->nonzero_chars) == SSA_NAME)
>   	{
>   	  value_range vr;
> -	  rvals->range_of_expr (vr, si->nonzero_chars, si->stmt);
> +	  rvals->range_of_expr (vr, si->nonzero_chars, stmt);


Same question about si->stmt.  (Just making sure I understand.)

Thanks
Martin

  reply	other threads:[~2021-10-08 16:51 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 15:12 Aldy Hernandez
2021-10-08 16:51 ` Martin Sebor [this message]
2021-10-08 17:56   ` Andrew MacLeod
2021-10-08 20:27     ` Martin Sebor
2021-10-09 15:04   ` Aldy Hernandez
2021-10-09 16:19     ` Martin Sebor
2021-10-09 17:59       ` Martin Sebor
2021-10-11  6:54         ` Aldy Hernandez
2021-10-09 18:47 ` Aldy Hernandez
2021-10-14 22:07   ` Martin Sebor
2021-10-14 23:45     ` Jeff Law
2021-10-15  0:47     ` Andrew MacLeod
2021-10-15 10:39       ` Aldy Hernandez
2021-10-17 22:49         ` Jeff Law
2021-10-18  7:43           ` Aldy Hernandez
2021-10-22 11:11         ` Aldy Hernandez
2021-10-29 20:04         ` Aldy Hernandez
2021-11-09  0:09         ` Jeff Law
2021-10-17 22:52 ` Jeff Law
2021-10-18  8:17   ` Aldy Hernandez
2021-10-20 20:58     ` Jeff Law
2021-10-21  7:42       ` Aldy Hernandez
2021-10-21 18:20         ` Jeff Law
2021-10-23 21:32           ` Jeff Law
2021-10-25  1:59             ` Jeff Law
2021-10-21 10:20       ` Richard Biener
2021-10-21 12:56         ` Aldy Hernandez
2021-10-21 13:14           ` Richard Biener
2021-10-21 13:30             ` Aldy Hernandez
2021-10-21 13:46               ` Richard Biener
2021-10-21 14:17                 ` Aldy Hernandez
2021-10-21 13:43           ` Jeff Law
2021-10-21 14:18             ` Aldy Hernandez
2021-10-25  2:15     ` Jeff Law
2021-10-25  4:42       ` Jeff Law
2021-10-25 11:27         ` Aldy Hernandez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa8ccb2e-39fd-48d9-fc29-1760bff311d5@gmail.com \
    --to=msebor@gmail.com \
    --cc=aldyh@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=msebor@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).