public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Qing Zhao <qing.zhao@oracle.com>
Cc: David Malcolm <dmalcolm@redhat.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	 Andrew Pinski <pinskia@gmail.com>,
	kees Cook <keescook@chromium.org>
Subject: Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading
Date: Mon, 3 Jun 2024 08:29:51 +0200	[thread overview]
Message-ID: <CAFiYyc3wNHmGh-Bq7NXtvmbnyVijFakJ-SZwRpde28BAB0FxBA@mail.gmail.com> (raw)
In-Reply-To: <6D93ECB0-17CF-4BF8-8644-83EC1955DCEC@oracle.com>

On Fri, May 31, 2024 at 11:23 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On May 23, 2024, at 07:46, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.zhao@oracle.com> wrote:
> >>
> >>
> >>
> >>> On May 22, 2024, at 03:38, Richard Biener <richard.guenther@gmail.com> wrote:
> >>>
> >>> On Tue, May 21, 2024 at 11:36 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >>>>
> >>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
> >>>>> Thanks for the comments and suggestions.
> >>>>>
> >>>>>> On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> >>>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
> >>>>>>>
> >>>>>>>> -Warray-bounds is an important option to enable linux kernal to
> >>>>>>>> keep
> >>>>>>>> the array out-of-bound errors out of the source tree.
> >>>>>>>>
> >>>>>>>> However, due to the false positive warnings reported in
> >>>>>>>> PR109071
> >>>>>>>> (-Warray-bounds false positive warnings due to code duplication
> >>>>>>>> from
> >>>>>>>> jump threading), -Warray-bounds=1 cannot be added on by
> >>>>>>>> default.
> >>>>>>>>
> >>>>>>>> Although it's impossible to elinimate all the false positive
> >>>>>>>> warnings
> >>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> >>>>>>>> documentation says "always out of bounds"), we should minimize
> >>>>>>>> the
> >>>>>>>> false positive warnings in -Warray-bounds=1.
> >>>>>>>>
> >>>>>>>> The root reason for the false positive warnings reported in
> >>>>>>>> PR109071 is:
> >>>>>>>>
> >>>>>>>> When the thread jump optimization tries to reduce the # of
> >>>>>>>> branches
> >>>>>>>> inside the routine, sometimes it needs to duplicate the code
> >>>>>>>> and
> >>>>>>>> split into two conditional pathes. for example:
> >>>>>>>>
> >>>>>>>> The original code:
> >>>>>>>>
> >>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
> >>>>>>>> {
> >>>>>>>> if (index >= 4)
> >>>>>>>>   warn ();
> >>>>>>>> *ptr = 0;
> >>>>>>>> *val = sg->vals[index];
> >>>>>>>> if (index >= 4)
> >>>>>>>>   warn ();
> >>>>>>>> *ptr = *val;
> >>>>>>>>
> >>>>>>>> return;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> With the thread jump, the above becomes:
> >>>>>>>>
> >>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
> >>>>>>>> {
> >>>>>>>> if (index >= 4)
> >>>>>>>>   {
> >>>>>>>>     warn ();
> >>>>>>>>     *ptr = 0;         // Code duplications since "warn" does
> >>>>>>>> return;
> >>>>>>>>     *val = sg->vals[index];   // same this line.
> >>>>>>>>                               // In this path, since it's
> >>>>>>>> under
> >>>>>>>> the condition
> >>>>>>>>                               // "index >= 4", the compiler
> >>>>>>>> knows
> >>>>>>>> the value
> >>>>>>>>                               // of "index" is larger then 4,
> >>>>>>>> therefore the
> >>>>>>>>                               // out-of-bound warning.
> >>>>>>>>     warn ();
> >>>>>>>>   }
> >>>>>>>> else
> >>>>>>>>   {
> >>>>>>>>     *ptr = 0;
> >>>>>>>>     *val = sg->vals[index];
> >>>>>>>>   }
> >>>>>>>> *ptr = *val;
> >>>>>>>> return;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> We can see, after the thread jump optimization, the # of
> >>>>>>>> branches
> >>>>>>>> inside
> >>>>>>>> the routine "sparx5_set" is reduced from 2 to 1, however,  due
> >>>>>>>> to
> >>>>>>>> the
> >>>>>>>> code duplication (which is needed for the correctness of the
> >>>>>>>> code),
> >>>>>>>> we
> >>>>>>>> got a false positive out-of-bound warning.
> >>>>>>>>
> >>>>>>>> In order to eliminate such false positive out-of-bound warning,
> >>>>>>>>
> >>>>>>>> A. Add one more flag for GIMPLE: is_splitted.
> >>>>>>>> B. During the thread jump optimization, when the basic blocks
> >>>>>>>> are
> >>>>>>>>  duplicated, mark all the STMTs inside the original and
> >>>>>>>> duplicated
> >>>>>>>>  basic blocks as "is_splitted";
> >>>>>>>> C. Inside the array bound checker, add the following new
> >>>>>>>> heuristic:
> >>>>>>>>
> >>>>>>>> If
> >>>>>>>>  1. the stmt is duplicated and splitted into two conditional
> >>>>>>>> paths;
> >>>>>>>> +  2. the warning level < 2;
> >>>>>>>> +  3. the current block is not dominating the exit block
> >>>>>>>> Then not report the warning.
> >>>>>>>>
> >>>>>>>> The false positive warnings are moved from -Warray-bounds=1 to
> >>>>>>>> -Warray-bounds=2 now.
> >>>>>>>>
> >>>>>>>> Bootstrapped and regression tested on both x86 and aarch64.
> >>>>>>>> adjusted
> >>>>>>>> -Warray-bounds-61.c due to the false positive warnings.
> >>>>>>>>
> >>>>>>>> Let me know if you have any comments and suggestions.
> >>>>>>>
> >>>>>>> At the last Cauldron I talked with David Malcolm about these kind
> >>>>>>> of
> >>>>>>> issues and thought of instead of suppressing diagnostics to
> >>>>>>> record
> >>>>>>> how a block was duplicated.  For jump threading my idea was to
> >>>>>>> record
> >>>>>>> the condition that was proved true when entering the path and do
> >>>>>>> this
> >>>>>>> by recording the corresponding locations
> >>>>>
> >>>>> Is only recording the location for the TRUE path  enough?
> >>>>> We might need to record the corresponding locations for both TRUE and
> >>>>> FALSE paths since the VRP might be more accurate on both paths.
> >>>>> Is only recording the location is enough?
> >>>>> Do we need to record the pointer to the original condition stmt?
> >>>>
> >>>> Just to be clear: I don't plan to work on this myself (I have far too
> >>>> much already to work on...); I'm assuming Richard Biener is
> >>>> hoping/planning to implement this himself.
> >>>
> >>> While I think some of this might be an improvement to those vast
> >>> number of "false positive" diagnostics we have from (too) late diagnostic
> >>> passes I do not have the cycles to work on this.
> >>
> >> I can study a little bit more on how to improve the diagnostics for PR 109071 first.
> >>
> >> FYI, I just updated PR109071’s description as: Need more context for -Warray-bounds warnings due to code duplication from jump threading.
> >>
> >> As we already agreed, this is NOT a false positive. It caught a real bug in linux kernel that need to be patched in the kernel source. (See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
> >>
> >> In order to add more context to the diagnostics to help the end user locate the bug accurately in their source code, compiler needs:
> >>
> >> 1. During jump threading phase, record the following information for each duplicated STMT:
> >>        A. A pointer to the COND stmt;
> >>        B. True/false for such COND
> >> 2. During array out-of-bound checking phase, whenever locate an out-of-bound case,
> >>        A. Use a rich_location for the diagnostic;
> >>        B. Create an instance of diagnositic_path, add events to this diagnostic_path based on the COND stmt, True/False info recorded in the STMT;
> >>        C. Call richloc.set_path() to associate the path with the rich_location;
> >>        D. Then issue warning with this rich_location instead of the current regular location.
> >>
> >> Any comment and suggestion to the above?
> >
> > I was originally thinking of using location_adhoc_data to store 1.A
> > and 1.B as a common bit to associate to each
> > copied stmt.  IIRC location_adhoc_data->data is the stmts associated
> > lexical block so we'd need to stuff another
> > 'data' field into this struct, like a "copy history" where we could
> > then even record copied-of-copy-of-X.
> > locataion_adhoc_data seems imperfectly packed right now, with a 32bit
> > hole before 'data' and 32bit unused
> > at its end, so we might get away without memory use by re-ordering
> > discriminator before 'data' ...
>
> Like this?
>
> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index e6e2b0897572..ee344f91333b 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h
> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
>  struct GTY(()) location_adhoc_data {
>    location_t locus;
>    source_range src_range;
> -  void * GTY((skip)) data;
>    unsigned discriminator;
> +  void * GTY((skip)) data;
> +  void * GTY((skip)) copy_history;
>  };
>    struct htab;

Yes.

> How about the copy_history? Do we need a new data structure (like the following, or other suggestion) for this? Where should I add this new data structure?

As it needs to be managed by libcpp it should be in this very same file.

> struct copy_history {
>   location_t condition;
>   Bool is_true_path;
> }

I think we want a pointer to the previous copy-of state as well in case a stmt
was copied twice.  We'll see whether a single (condition) location
plus edge flag
is sufficient.  I'd say we should plan for an enum to indicate the duplication
reason at least (jump threading, unswitching, unrolling come to my mind).  For
jump threading being able to say "when <condition> is true/false" is probably
good enough, though it might not always be easy to identify a single condition
here given a threading path starts at an incoming edge to a CFG merge and
will usually end with the outgoing edge of a condition that we can statically
evaluate.  The condition controlling the path entry might not be determined
fully by a single condition location.

Possibly building a full "diagnostic path" object at threading time might be
the only way to record all the facts, but that's also going to be more
expensive.
I can think of having a -fdiagnostic-try-to-explain-harder option to
clarify confusing
diagnostics people could use on-demand?

Richard.

> Qing
> >
> > Richard.
> >
> >> Thanks.
> >>
> >> Qing
> >>
> >>
> >>>
> >>> Richard.
> >>>
> >>>> But feel free to ask me any questions about the diagnostic_path
> >>>> machinery within the diagnostics subsystem.
> >>>>
> >>>> [...snip...]
> >>>>
> >>>> Regards
> >>>> Dave
>
>

  reply	other threads:[~2024-06-03  6:30 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-13 19:48 Qing Zhao
2024-05-13 20:46 ` Jeff Law
2024-05-13 21:41   ` Kees Cook
2024-05-13 23:38     ` Andrew Pinski
2024-05-14  0:14       ` Kees Cook
2024-05-14 14:57         ` Qing Zhao
2024-05-14 15:08           ` Jeff Law
2024-05-14 16:03             ` Qing Zhao
2024-05-14 13:08 ` Richard Biener
2024-05-14 14:17   ` Qing Zhao
2024-05-14 14:29     ` Richard Biener
2024-05-14 15:19       ` Qing Zhao
2024-05-14 17:14         ` Richard Biener
2024-05-14 17:21           ` Qing Zhao
2024-05-15  6:09             ` Richard Biener
2024-05-15 13:37               ` Qing Zhao
2024-05-14 19:50     ` Kees Cook
2024-05-15  5:50       ` Richard Biener
2024-05-15 14:00   ` David Malcolm
2024-05-21 15:13     ` Qing Zhao
2024-05-21 21:36       ` David Malcolm
2024-05-22  7:38         ` Richard Biener
2024-05-22 18:53           ` Qing Zhao
2024-05-23 11:46             ` Richard Biener
2024-05-23 14:03               ` Qing Zhao
2024-05-23 14:13                 ` David Malcolm
2024-05-23 14:23                   ` Qing Zhao
2024-05-31 21:22               ` Qing Zhao
2024-06-03  6:29                 ` Richard Biener [this message]
2024-06-03 14:33                   ` Qing Zhao
2024-06-03 14:48                   ` David Malcolm
2024-06-04  7:43                     ` Richard Biener
2024-06-04 20:30                       ` Qing Zhao
2024-06-05  7:26                         ` Richard Biener
2024-06-05 16:38                           ` Qing Zhao
2024-06-05 17:07                             ` Richard Biener
2024-06-05 17:58                               ` Qing Zhao
2024-06-07 19:13                                 ` Qing Zhao
2024-06-12 18:15                                   ` Qing Zhao
2024-05-14 16:43 ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc3wNHmGh-Bq7NXtvmbnyVijFakJ-SZwRpde28BAB0FxBA@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=keescook@chromium.org \
    --cc=pinskia@gmail.com \
    --cc=qing.zhao@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).