From: Ken Matsui <kmatsui@cs.washington.edu>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Ken Matsui <kmatsui@gcc.gnu.org>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y
Date: Thu, 7 Mar 2024 11:29:08 -0800 [thread overview]
Message-ID: <CAML+3pXodx1R9HoAyDZaSqVNC8=qjA4TU-e4A-FMo-t+W1ttCQ@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc3f4=WenmsdYPK+2X-My0NyBsEpMYeOZFRSZec0U-bwSw@mail.gmail.com>
On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> >
> > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> > > >
> > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > integers and CMP is <, <=, >, or >=. Similarly, 0 CMP (x - y) is
> > > > equivalent to y CMP x. As reported in PR middle-end/113680, this
> > > > equivalence does not hold for types other than signed integers. When
> > > > it comes to conditions, the former was translated to a combination of
> > > > sub and test, whereas the latter was translated to a single cmp.
> > > > Thus, this optimization pass tries to optimize the former to the
> > > > latter.
> > > >
> > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > as defined behavior, specifically, wrapping around according to two's
> > > > complement arithmetic. This has implications for optimizations that
> > > > rely on the standard behavior of signed integers, where overflow is
> > > > undefined. Consider the example given:
> > > >
> > > > long long llmax = __LONG_LONG_MAX__;
> > > > long long llmin = -llmax - 1;
> > > >
> > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > simplifies to `2 * llmax + 1`. Given that `llmax` is the maximum value
> > > > for a `long long`, this calculation overflows in a defined manner
> > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > produces a negative value due to two's complement wraparound.
> > > > Therefore, `llmax - llmin < 0` is true.
> > > >
> > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > is the maximum possible value and `llmin` is the minimum. Hence,
> > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > enabled. This is why this optimization pass is disabled under
> > > > `-fwrapv`.
> > > >
> > > > This optimization pass must run before the Jump Threading pass and the
> > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > >
> > > > (1)
> > > > int diff = x - y;
> > > > if (diff > 0)
> > > > foo();
> > > > if (diff < 0)
> > > > bar();
> > > >
> > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > because we know the postcondition of the first condition is diff <= 0,
> > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > after this VRP, we get:
> > > >
> > > > (2)
> > > > int diff = x - y;
> > > > if (x > y)
> > > > foo();
> > > > if (diff != 0)
> > > > bar();
> > > >
> > > > This generates sub and test for the second condition and cmp for the
> > > > first condition. However, if we apply this pass beforehand, we simply
> > > > get:
> > > >
> > > > (3)
> > > > int diff = x - y;
> > > > if (x > y)
> > > > foo();
> > > > if (x < y)
> > > > bar();
> > > >
> > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > will not be generated, which is more efficient.
> > > >
> > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > (3) above are recognized as different, which prevents TCO.
> > > >
> > > > PR middle-end/113680
> > >
> > > This shouldn't be done as a new optimization pass. It fits either
> > > the explicit code present in the forwprop pass or a new match.pd
> > > pattern. There's possible interaction with x - y value being used
> > > elsewhere and thus exposing a CSE opportunity as well as
> > > a comparison against zero being possibly implemented by
> > > a flag setting subtraction instruction.
> > >
> >
> > Thank you so much for your review! Although the forwprop pass runs
> > multiple times, we might not need to apply this optimization multiple
> > times. Would it be acceptable to add such optimization? More
> > generally, I would like to know how to determine where to put
> > optimization in the future.
>
> This kind of pattern matching expression simplification is best
> addressed by patterns in match.pd though historically the forwprop
> pass still catches cases not in match.pd in its
> forward_propagate_into_comparison_1 (and callers).
>
I see. When would patterns in match.pd be applied? Around forwprop
or somewhere else? (Also, could you please tell me a document I can
learn about these if it exists?) I ask because this optimization
should be applied before the Jump Threading and VRP passes.
> > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> >
> > > Our VN pass has some tricks to anticipate CSE opportunities
> > > like this, but it's not done "properly" in the way of anticipating
> > > both forms during PRE.
> > >
> > > I'll note we have
> > >
> > > /* (A - B) != 0 ? (A - B) : (B - A) same as (A - B) */
> > > (for cmp (ne ltgt)
> > >
> > > and similar which might be confused by canonicalizing to A != B.
> >
> > I will investigate and update my patch (after my final exam ends...)!
> >
> > > I'm also surprised we don't already have the pattern you add.
> >
> > Hmm, so am I.
>
> It looks like we do it for equality compares which can also handle
> types where overflow is undefined. -fdump-tree-all-folding shows
> in the 005t.original dump we simplify a - b != 0 via match.pd:6105:
>
> /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> ??? The transformation is valid for the other operators if overflow
> is undefined for the type, but performing it here badly interacts
> with the transformation in fold_cond_expr_with_comparison which
> attempts to synthetize ABS_EXPR. */
> (for cmp (eq ne)
> (for sub (minus pointer_diff)
> (simplify
> (cmp (sub@2 @0 @1) integer_zerop)
> (if (single_use (@2))
> (cmp @0 @1)))))
>
> and the comment explains why we don't perform the transform
> it the way you intend. That comment might no longer apply
> but I would guess there's testsuite fallout when you amend this
> pattern.
Thank you so much! I will check this out.
>
> Richard.
>
> > I saw that this optimization sometimes messes up VRP
> > while sometimes helping it (as I described in my comment). I also
> > need to research this.
> >
> > >
> > > Richard.
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > > > * common.opt: Define ftree-cmp
> > > > * doc/invoke.texi: Document ftree-cmp.
> > > > * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > > > * passes.def (pass_cmp): New optimization pass.
> > > > * timevar.def (TV_TREE_CMP): New variable for timing.
> > > > * tree-pass.h (make_pass_cmp): New declaration.
> > > > * tree-ssa-cmp.cc: New file.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/pr113680.c: New test.
> > > >
> > > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > > ---
> > > > gcc/Makefile.in | 1 +
> > > > gcc/common.opt | 4 +
> > > > gcc/doc/invoke.texi | 11 +-
> > > > gcc/opts.cc | 1 +
> > > > gcc/passes.def | 3 +
> > > > gcc/testsuite/gcc.dg/pr113680.c | 47 ++++++
> > > > gcc/timevar.def | 1 +
> > > > gcc/tree-pass.h | 1 +
> > > > gcc/tree-ssa-cmp.cc | 262 ++++++++++++++++++++++++++++++++
> > > > 9 files changed, 330 insertions(+), 1 deletion(-)
> > > > create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > > > create mode 100644 gcc/tree-ssa-cmp.cc
> > > >
> > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > index a74761b7ab3..935b80b6947 100644
> > > > --- a/gcc/Makefile.in
> > > > +++ b/gcc/Makefile.in
> > > > @@ -1731,6 +1731,7 @@ OBJS = \
> > > > tree-ssa-address.o \
> > > > tree-ssa-alias.o \
> > > > tree-ssa-ccp.o \
> > > > + tree-ssa-cmp.o \
> > > > tree-ssa-coalesce.o \
> > > > tree-ssa-copy.o \
> > > > tree-ssa-dce.o \
> > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > index 51c4a17da83..7c853224458 100644
> > > > --- a/gcc/common.opt
> > > > +++ b/gcc/common.opt
> > > > @@ -3053,6 +3053,10 @@ ftree-ch
> > > > Common Var(flag_tree_ch) Optimization
> > > > Enable loop header copying on trees.
> > > >
> > > > +ftree-cmp
> > > > +Common Var(flag_tree_cmp) Optimization
> > > > +Enable SSA comparison optimization on trees.
> > > > +
> > > > ftree-coalesce-inlined-vars
> > > > Common Ignore RejectNegative
> > > > Does nothing. Preserved for backward compatibility.
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index bdf05be387d..04762d490a3 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > -fsplit-wide-types -fsplit-wide-types-early -fssa-backprop -fssa-phiopt
> > > > -fstdarg-opt -fstore-merging -fstrict-aliasing -fipa-strict-aliasing
> > > > -fthread-jumps -ftracer -ftree-bit-ccp
> > > > --ftree-builtin-call-dce -ftree-ccp -ftree-ch
> > > > +-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-cmp
> > > > -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts
> > > > -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting
> > > > -ftree-loop-if-convert -ftree-loop-im
> > > > @@ -12377,6 +12377,7 @@ compilation time.
> > > > -ftree-bit-ccp
> > > > -ftree-ccp
> > > > -ftree-ch
> > > > +-ftree-cmp
> > > > -ftree-coalesce-vars
> > > > -ftree-copy-prop
> > > > -ftree-dce
> > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations. It also saves one jump. This flag
> > > > is enabled by default at @option{-O1} and higher. It is not enabled
> > > > for @option{-Os}, since it usually increases code size.
> > > >
> > > > +@opindex ftree-cmp
> > > > +@item -ftree-cmp
> > > > +Perform comparison optimization on trees. This decreases unnecessary
> > > > +instructions for comparisons. This flag is enabled by default at
> > > > +@option{-O1} and higher. This optimization is automatically disabled when
> > > > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > > > +overflow wrapping.
> > > > +
> > > > @opindex ftree-loop-optimize
> > > > @item -ftree-loop-optimize
> > > > Perform loop optimizations on trees. This flag is enabled by default
> > > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > > index 3333600e0ea..9ee64a48313 100644
> > > > --- a/gcc/opts.cc
> > > > +++ b/gcc/opts.cc
> > > > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> > > > { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> > > > { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > > > { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > > + { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > > > { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > > > { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > > > { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > index 1cbbd413097..e28a2d3c309 100644
> > > > --- a/gcc/passes.def
> > > > +++ b/gcc/passes.def
> > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3. If not see
> > > > /* After CCP we rewrite no longer addressed locals into SSA
> > > > form if possible. */
> > > > NEXT_PASS (pass_forwprop);
> > > > + /* Do cmp before early_thread_jumps and vrp since it may
> > > > + modify conditions. */
> > > > + NEXT_PASS (pass_cmp);
> > > > NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> > > > NEXT_PASS (pass_sra_early);
> > > > /* pass_build_ealias is a dummy pass that ensures that we
> > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > > > new file mode 100644
> > > > index 00000000000..59cd865aa21
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > > @@ -0,0 +1,47 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > > +
> > > > +volatile int i = 0;
> > > > +
> > > > +void foo() { i = 1; }
> > > > +void bar() { i = 2; }
> > > > +
> > > > +void f1(int x, int y)
> > > > +{
> > > > + int diff = x - y;
> > > > + if (diff > 0)
> > > > + foo();
> > > > + if (diff < 0)
> > > > + bar();
> > > > +}
> > > > +
> > > > +void f2(int x, int y)
> > > > +{
> > > > + if ((x - y) > 0)
> > > > + foo();
> > > > + if ((x - y) < 0)
> > > > + bar();
> > > > +}
> > > > +
> > > > +void f3(int x, int y)
> > > > +{
> > > > + if (x > y)
> > > > + foo();
> > > > + if (x < y)
> > > > + bar();
> > > > +}
> > > > +
> > > > +void f4(int x, int y)
> > > > +{
> > > > + int diff = x - y;
> > > > + if (diff > 0)
> > > > + foo();
> > > > + if (x < y)
> > > > + bar();
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > > index 8e2168e0817..468266d0c56 100644
> > > > --- a/gcc/timevar.def
> > > > +++ b/gcc/timevar.def
> > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH , "tree eh")
> > > > DEFTIMEVAR (TV_TREE_CFG , "tree CFG construction")
> > > > DEFTIMEVAR (TV_TREE_CLEANUP_CFG , "tree CFG cleanup")
> > > > DEFTIMEVAR (TV_TREE_TAIL_MERGE , "tree tail merge")
> > > > +DEFTIMEVAR (TV_TREE_CMP , "tree cmp")
> > > > DEFTIMEVAR (TV_TREE_VRP , "tree VRP")
> > > > DEFTIMEVAR (TV_TREE_VRP_THREADER , "tree VRP threader")
> > > > DEFTIMEVAR (TV_TREE_EARLY_VRP , "tree Early VRP")
> > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > > index 29267589eeb..71789a279c0 100644
> > > > --- a/gcc/tree-pass.h
> > > > +++ b/gcc/tree-pass.h
> > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> > > > extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> > > > extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> > > > extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > > > extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> > > > extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> > > > extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > > new file mode 100644
> > > > index 00000000000..41d4a58a5b5
> > > > --- /dev/null
> > > > +++ b/gcc/tree-ssa-cmp.cc
> > > > @@ -0,0 +1,262 @@
> > > > +/* Global, SSA-based optimizations using comparison identities.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > +
> > > > +This file is part of GCC.
> > > > +
> > > > +GCC is free software; you can redistribute it and/or modify it
> > > > +under the terms of the GNU General Public License as published by the
> > > > +Free Software Foundation; either version 3, or (at your option) any
> > > > +later version.
> > > > +
> > > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> > > > +for more details.
> > > > +
> > > > +You should have received a copy of the GNU General Public License
> > > > +along with GCC; see the file COPYING3. If not see
> > > > +<http://www.gnu.org/licenses/>. */
> > > > +
> > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > > > + CMP is <, <=, >, or >=. Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > > > + As reported in PR middle-end/113680, this equivalence does not hold for
> > > > + types other than signed integers. When it comes to conditions, the former
> > > > + was translated to a combination of sub and test, whereas the latter was
> > > > + translated to a single cmp. Thus, this optimization pass tries to optimize
> > > > + the former to the latter.
> > > > +
> > > > + When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > > > + defined behavior, specifically, wrapping around according to two's
> > > > + complement arithmetic. This has implications for optimizations that
> > > > + rely on the standard behavior of signed integers, where overflow is
> > > > + undefined. Consider the example given:
> > > > +
> > > > + long long llmax = __LONG_LONG_MAX__;
> > > > + long long llmin = -llmax - 1;
> > > > +
> > > > + Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > + simplifies to `2 * llmax + 1`. Given that `llmax` is the maximum value for
> > > > + a `long long`, this calculation overflows in a defined manner
> > > > + (wrapping around), which under `-fwrapv` is a legal operation that produces
> > > > + a negative value due to two's complement wraparound. Therefore,
> > > > + `llmax - llmin < 0` is true.
> > > > +
> > > > + However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > > > + maximum possible value and `llmin` is the minimum. Hence, optimizations
> > > > + that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > > > + cannot be safely applied when `-fwrapv` is enabled. This is why this
> > > > + optimization pass is disabled under `-fwrapv`.
> > > > +
> > > > + This optimization pass must run before the Jump Threading pass and
> > > > + the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > +
> > > > + (1)
> > > > + int diff = x - y;
> > > > + if (diff > 0)
> > > > + foo();
> > > > + if (diff < 0)
> > > > + bar();
> > > > +
> > > > + The second condition would be converted to diff != 0 in the VRP pass because
> > > > + we know the postcondition of the first condition is diff <= 0, and then
> > > > + diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > > > + we get:
> > > > +
> > > > + (2)
> > > > + int diff = x - y;
> > > > + if (x > y)
> > > > + foo();
> > > > + if (diff != 0)
> > > > + bar();
> > > > +
> > > > + This generates sub and test for the second condition and cmp for the first
> > > > + condition. However, if we apply this pass beforehand, we simply get:
> > > > +
> > > > + (3)
> > > > + int diff = x - y;
> > > > + if (x > y)
> > > > + foo();
> > > > + if (x < y)
> > > > + bar();
> > > > +
> > > > + In this code, diff will be eliminated as a dead code, and sub and test will
> > > > + not be generated, which is more efficient.
> > > > +
> > > > + For the Jump Threading pass, without this optimization pass, (1) and (3)
> > > > + above are recognized as different, which prevents TCO. */
> > > > +
> > > > +#include "config.h"
> > > > +#include "system.h"
> > > > +#include "coretypes.h"
> > > > +#include "backend.h"
> > > > +#include "tree.h"
> > > > +#include "gimple.h"
> > > > +#include "tree-pass.h"
> > > > +#include "ssa.h"
> > > > +#include "gimple-iterator.h"
> > > > +#include "domwalk.h"
> > > > +
> > > > +/* True if VAR is a signed integer, false otherwise. */
> > > > +
> > > > +static bool
> > > > +is_signed_integer(const_tree var)
> > > > +{
> > > > + return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > > > + && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > > +}
> > > > +
> > > > +/* If EXPR is an integer zero, return it. If EXPR is SSA_NAME, and its
> > > > + defining statement is a subtraction of two signed integers, return the
> > > > + MINUS_EXPR. Otherwise, return NULL_TREE. */
> > > > +
> > > > +static const_tree
> > > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > > +{
> > > > + if (integer_zerop (expr))
> > > > + return expr;
> > > > +
> > > > + if (TREE_CODE (expr) != SSA_NAME)
> > > > + return NULL_TREE;
> > > > +
> > > > + gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > > > + if (!is_gimple_assign (defining_stmt)
> > > > + || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > > > + return NULL_TREE;
> > > > +
> > > > + tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > > > + if (!is_signed_integer (rhs1))
> > > > + return NULL_TREE;
> > > > +
> > > > + tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > > > + if (!is_signed_integer (rhs2))
> > > > + return NULL_TREE;
> > > > +
> > > > + return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > > +}
> > > > +
> > > > +/* Optimize:
> > > > +
> > > > + 1. (x - y) CMP 0
> > > > + 2. 0 CMP (x - y)
> > > > +
> > > > + to:
> > > > +
> > > > + 1. x CMP y
> > > > + 2. y CMP x
> > > > +
> > > > + where CMP is either <, <=, >, or >=. */
> > > > +
> > > > +static void
> > > > +optimize_signed_comparison (gcond *stmt)
> > > > +{
> > > > + const enum tree_code code = gimple_cond_code (stmt);
> > > > + if (code < LT_EXPR || GE_EXPR < code)
> > > > + return;
> > > > +
> > > > + const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > > > + if (lhs == NULL_TREE)
> > > > + return;
> > > > +
> > > > + const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > > > + if (rhs == NULL_TREE)
> > > > + return;
> > > > +
> > > > + if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > > > + {
> > > > + /* Case 1: (x - y) CMP 0 */
> > > > + tree x = TREE_OPERAND (lhs, 0);
> > > > + tree y = TREE_OPERAND (lhs, 1);
> > > > +
> > > > + /* => x CMP y */
> > > > + gimple_cond_set_lhs (stmt, x);
> > > > + gimple_cond_set_rhs (stmt, y);
> > > > + update_stmt (stmt);
> > > > + }
> > > > + else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > > > + {
> > > > + /* Case 2: 0 CMP (x - y) */
> > > > + tree x = TREE_OPERAND (rhs, 0);
> > > > + tree y = TREE_OPERAND (rhs, 1);
> > > > +
> > > > + /* => y CMP x */
> > > > + gimple_cond_set_lhs (stmt, y);
> > > > + gimple_cond_set_rhs (stmt, x);
> > > > + update_stmt (stmt);
> > > > + }
> > > > +}
> > > > +
> > > > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > > > + 0, and replace them with the equivalent comparison of the operands of
> > > > + the MINUS_EXPR without the subtraction. */
> > > > +
> > > > +namespace {
> > > > +
> > > > +const pass_data pass_data_cmp =
> > > > +{
> > > > + GIMPLE_PASS, /* type */
> > > > + "cmp", /* name */
> > > > + OPTGROUP_NONE, /* optinfo_flags */
> > > > + TV_TREE_CMP, /* tv_id */
> > > > + PROP_ssa, /* properties_required */
> > > > + 0, /* properties_provided */
> > > > + 0, /* properties_destroyed */
> > > > + 0, /* todo_flags_start */
> > > > + TODO_remove_unused_locals, /* todo_flags_finish */
> > > > +};
> > > > +
> > > > +class pass_cmp : public gimple_opt_pass
> > > > +{
> > > > +public:
> > > > + pass_cmp (gcc::context *ctxt)
> > > > + : gimple_opt_pass (pass_data_cmp, ctxt)
> > > > + {}
> > > > +
> > > > + /* opt_pass methods: */
> > > > + bool gate (function *) final override
> > > > + {
> > > > + /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > > > + in the comment at the top of this file. */
> > > > + return flag_tree_cmp != 0 && flag_wrapv == 0;
> > > > + }
> > > > +
> > > > + unsigned int execute (function *) final override;
> > > > +
> > > > +}; // class pass_cmp
> > > > +
> > > > +class cmp_dom_walker : public dom_walker
> > > > +{
> > > > +public:
> > > > + cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > > +
> > > > + void after_dom_children (basic_block) final override;
> > > > +};
> > > > +
> > > > +void
> > > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > > +{
> > > > + gimple_stmt_iterator gsi;
> > > > +
> > > > + for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > + {
> > > > + gimple *stmt = gsi_stmt (gsi);
> > > > + if (gimple_code (stmt) == GIMPLE_COND)
> > > > + optimize_signed_comparison (as_a <gcond *> (stmt));
> > > > + }
> > > > +}
> > > > +
> > > > +
> > > > +unsigned int
> > > > +pass_cmp::execute (function *)
> > > > +{
> > > > + calculate_dominance_info (CDI_DOMINATORS);
> > > > + cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > + return 0;
> > > > +}
> > > > +
> > > > +} // anon namespace
> > > > +
> > > > +gimple_opt_pass *
> > > > +make_pass_cmp (gcc::context *ctxt)
> > > > +{
> > > > + return new pass_cmp (ctxt);
> > > > +}
> > > > --
> > > > 2.44.0
> > > >
next prev parent reply other threads:[~2024-03-07 19:29 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-04 20:38 Ken Matsui
2024-03-04 21:28 ` [PATCH v2] " Ken Matsui
2024-03-05 8:38 ` [PATCH] " Richard Biener
2024-03-05 12:51 ` Ken Matsui
2024-03-05 15:58 ` Richard Biener
2024-03-07 19:29 ` Ken Matsui [this message]
2024-03-08 6:48 ` Richard Biener
2024-03-08 17:49 ` Ken Matsui
2024-03-11 7:22 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAML+3pXodx1R9HoAyDZaSqVNC8=qjA4TU-e4A-FMo-t+W1ttCQ@mail.gmail.com' \
--to=kmatsui@cs.washington.edu \
--cc=gcc-patches@gcc.gnu.org \
--cc=kmatsui@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).