From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=TCrD=KR=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lf1-x12b.google.com (mail-lf1-x12b.google.com [IPv6:2a00:1450:4864:20::12b])
	by sourceware.org (Postfix) with ESMTPS id 53A533858422;
	Mon, 11 Mar 2024 07:22:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 53A533858422
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 53A533858422
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12b
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710141745; cv=none;
	b=JhpaOfsoZrtub8oAwK4cwsZorSr1uMFb7PnLLBHWdwU7imD+ch1LAO8D5DmKGgpLik+fyrFKUZqpmqG9mEuntVDU+PWH15jDvEBt2kUWPbOh+w5jnW6bpuhQnXk3j7n6FPzZTipf4Tf+0PqFhX3jvurQ1LKVtSH9B9M2lnoPhQU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1710141745; c=relaxed/simple;
	bh=CaJY5vPG8d8yTkEmhCq/Ty2FqBHN5z5eB+Q9S7hdAz0=;
	h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=U0NaSgzky0wTdWQUd33Brl79TAyRamgSH0lltSf2o2bG4nmnAV13jWsP/hzux3/3Q/mGvSKFj4lqKK9IReyF/bCbKsxI8TDusf6qtXWY94NGsemtbY7+iqK4R/V6wnOxjeagqio5yvnCd8h3d0rhsfOrmqgfmB/PoJnlt9378J0=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-lf1-x12b.google.com with SMTP id 2adb3069b0e04-513a08f2263so1181566e87.3;
        Mon, 11 Mar 2024 00:22:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1710141739; x=1710746539; darn=gcc.gnu.org;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=TZpn+CbGZmeIrRBzVbdJ52B/dB7XWpx5VDrcCdrD76o=;
        b=l28mSgae5XKH2WUywv9I2Cp9haJKjvSd+841O6wyKK7sr8UPvIEILTzhdTu3bdUfsN
         20+Jy7NsCLGkku+ZBvINjxVg87t+SDRQ12UaflUnex8rqvxoyTZN/YOIh8sSsayydocl
         EwyUq/vORpm10vHx7WIdYJseHBp52mH4UobJ6vwD+ghXtFhV6oD2YZuMxhChA51Xaw+1
         DKJqIJk3OAQah074kZx67AEGVjmOnOuKxnJ0s/QO6GLoaKA0qd/BdbrMbsFXi9zht4Tk
         QdranPGV+3E/O9HnG8tUH8AIxSQ/9Gm94/2v6sVeIxN8pye2ywldHLiKMz7j9AOLToxJ
         QJmA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1710141739; x=1710746539;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=TZpn+CbGZmeIrRBzVbdJ52B/dB7XWpx5VDrcCdrD76o=;
        b=FHv7eaGKqlXWlvd1KZVfxCDT+vVRRb6tFr7sMB3YAydUck6JmsCs6dRLKErQ67W++u
         wZJ9Lk63n61wHG5G7VaHK7lpr6THpG4DhTCFmDf4SBCPw3Nz1SLOW0B5dAmnFxU7+M/V
         dEqA3/k2Kc/WFizyobjLNDfhZKBdYfYy6JiJD+n1R7qMjU8fScmWlK+IBVI/bK6Sswzq
         CaQa5f6JNnuLFMVgx5uP5QsN1rjyHsVn/lB8uTdsmKiHLLRajGbBHifnjqr0kjyBLtKq
         fsXbjzuJ+2FLID8p27wLY+adQrq00DeUwocJjIz/HlbO5J9S1xFa6RiPSQnua8xcxfQA
         1AAw==
X-Forwarded-Encrypted: i=1; AJvYcCXyHaW/4/E8DmepAGXKMyiW7jPsisq6G2um/nXRI6nMTp9qLTRjSS1LyM9ILDsyXW1fN+foJ7nTppodoRPmf9xS63sv0MJYgA==
X-Gm-Message-State: AOJu0Yz0cjJoKg+gz9DI79Ail905B+dP72MEPfK9Mw3+npnCANva6vsx
	nuX15XPB2J5UXXwbx1QUNkE9gakLJpJWq/gHT7XSAH0e8cT9WAHrdGJp/gTVMtnOPjtz1ub7drY
	DAdOx4gER2a52BONYmsRwUdap/LUFqzdu
X-Google-Smtp-Source: AGHT+IGwmK9c4T2pS29dUFoUpg7kLPoQlkZKL8lUVoW+X6d6N1wRKXrq8zOU7+0/U1wDMzB7AVyaCn2MElcnD7Y/sQI=
X-Received: by 2002:a05:6512:4002:b0:513:aef9:7159 with SMTP id
 br2-20020a056512400200b00513aef97159mr730533lfb.39.1710141738274; Mon, 11 Mar
 2024 00:22:18 -0700 (PDT)
MIME-Version: 1.0
References: <20240304203830.3740968-1-kmatsui@gcc.gnu.org> <CAFiYyc2Ha7aMg-duH6a_ETL9XLa__L9ZZh8XJ12XYHAcuE6Ntw@mail.gmail.com>
 <CAML+3pWhwo9-4M6ADFKrvtNno4a6Xt2vqQ5P+Vmd=RWxvkJBSw@mail.gmail.com>
 <CAFiYyc3f4=WenmsdYPK+2X-My0NyBsEpMYeOZFRSZec0U-bwSw@mail.gmail.com>
 <CAML+3pXodx1R9HoAyDZaSqVNC8=qjA4TU-e4A-FMo-t+W1ttCQ@mail.gmail.com>
 <CAFiYyc1z=Ey+0bJuvzsmG4g4D=B7Hcq7k_qkv88F1Fr4bMgnig@mail.gmail.com> <CAML+3pUcefcGgtmO_jK06VV8gEUL7iaif0jERDLuzNDHQTxKaw@mail.gmail.com>
In-Reply-To: <CAML+3pUcefcGgtmO_jK06VV8gEUL7iaif0jERDLuzNDHQTxKaw@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 11 Mar 2024 08:22:06 +0100
Message-ID: <CAFiYyc0gLPvaP7Dks5rzknUyE_3np3grL9=QNxChquT91msyjg@mail.gmail.com>
Subject: Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y
To: Ken Matsui <kmatsui@cs.washington.edu>
Cc: Ken Matsui <kmatsui@gcc.gnu.org>, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Fri, Mar 8, 2024 at 6:50=E2=80=AFPM Ken Matsui <kmatsui@cs.washington.ed=
u> wrote:
>
> On Thu, Mar 7, 2024 at 10:49=E2=80=AFPM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Mar 7, 2024 at 8:29=E2=80=AFPM Ken Matsui <kmatsui@cs.washingto=
n.edu> wrote:
> > >
> > > On Tue, Mar 5, 2024 at 7:58=E2=80=AFAM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 1:51=E2=80=AFPM Ken Matsui <kmatsui@cs.washi=
ngton.edu> wrote:
> > > > >
> > > > > On Tue, Mar 5, 2024 at 12:38=E2=80=AFAM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Mar 4, 2024 at 9:40=E2=80=AFPM Ken Matsui <kmatsui@gcc.=
gnu.org> wrote:
> > > > > > >
> > > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are sign=
ed
> > > > > > > integers and CMP is <, <=3D, >, or >=3D.  Similarly, 0 CMP (x=
 - y) is
> > > > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, =
this
> > > > > > > equivalence does not hold for types other than signed integer=
s.  When
> > > > > > > it comes to conditions, the former was translated to a combin=
ation of
> > > > > > > sub and test, whereas the latter was translated to a single c=
mp.
> > > > > > > Thus, this optimization pass tries to optimize the former to =
the
> > > > > > > latter.
> > > > > > >
> > > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed =
integers
> > > > > > > as defined behavior, specifically, wrapping around according =
to two's
> > > > > > > complement arithmetic.  This has implications for optimizatio=
ns that
> > > > > > > rely on the standard behavior of signed integers, where overf=
low is
> > > > > > > undefined.  Consider the example given:
> > > > > > >
> > > > > > >         long long llmax =3D __LONG_LONG_MAX__;
> > > > > > >         long long llmin =3D -llmax - 1;
> > > > > > >
> > > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - =
1)`, which
> > > > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the max=
imum value
> > > > > > > for a `long long`, this calculation overflows in a defined ma=
nner
> > > > > > > (wrapping around), which under `-fwrapv` is a legal operation=
 that
> > > > > > > produces a negative value due to two's complement wraparound.
> > > > > > > Therefore, `llmax - llmin < 0` is true.
> > > > > > >
> > > > > > > However, the direct comparison `llmax < llmin` is false since=
 `llmax`
> > > > > > > is the maximum possible value and `llmin` is the minimum.  He=
nce,
> > > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0`=
 to
> > > > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fw=
rapv` is
> > > > > > > enabled.  This is why this optimization pass is disabled unde=
r
> > > > > > > `-fwrapv`.
> > > > > > >
> > > > > > > This optimization pass must run before the Jump Threading pas=
s and the
> > > > > > > VRP pass, as it may modify conditions. For example, in the VR=
P pass:
> > > > > > >
> > > > > > >         (1)
> > > > > > >           int diff =3D x - y;
> > > > > > >           if (diff > 0)
> > > > > > >             foo();
> > > > > > >           if (diff < 0)
> > > > > > >             bar();
> > > > > > >
> > > > > > > The second condition would be converted to diff !=3D 0 in the=
 VRP pass
> > > > > > > because we know the postcondition of the first condition is d=
iff <=3D 0,
> > > > > > > and then diff !=3D 0 is cheaper than diff < 0. If we apply th=
is pass
> > > > > > > after this VRP, we get:
> > > > > > >
> > > > > > >         (2)
> > > > > > >           int diff =3D x - y;
> > > > > > >           if (x > y)
> > > > > > >             foo();
> > > > > > >           if (diff !=3D 0)
> > > > > > >             bar();
> > > > > > >
> > > > > > > This generates sub and test for the second condition and cmp =
for the
> > > > > > > first condition. However, if we apply this pass beforehand, w=
e simply
> > > > > > > get:
> > > > > > >
> > > > > > >         (3)
> > > > > > >           int diff =3D x - y;
> > > > > > >           if (x > y)
> > > > > > >             foo();
> > > > > > >           if (x < y)
> > > > > > >             bar();
> > > > > > >
> > > > > > > In this code, diff will be eliminated as a dead code, and sub=
 and test
> > > > > > > will not be generated, which is more efficient.
> > > > > > >
> > > > > > > For the Jump Threading pass, without this optimization pass, =
(1) and
> > > > > > > (3) above are recognized as different, which prevents TCO.
> > > > > > >
> > > > > > >         PR middle-end/113680
> > > > > >
> > > > > > This shouldn't be done as a new optimization pass.  It fits eit=
her
> > > > > > the explicit code present in the forwprop pass or a new match.p=
d
> > > > > > pattern.  There's possible interaction with x - y value being u=
sed
> > > > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > > > a comparison against zero being possibly implemented by
> > > > > > a flag setting subtraction instruction.
> > > > > >
> > > > >
> > > > > Thank you so much for your review!  Although the forwprop pass ru=
ns
> > > > > multiple times, we might not need to apply this optimization mult=
iple
> > > > > times.  Would it be acceptable to add such optimization?  More
> > > > > generally, I would like to know how to determine where to put
> > > > > optimization in the future.
> > > >
> > > > This kind of pattern matching expression simplification is best
> > > > addressed by patterns in match.pd though historically the forwprop
> > > > pass still catches cases not in match.pd in its
> > > > forward_propagate_into_comparison_1 (and callers).
> > > >
> > >
> > > I see.  When would patterns in match.pd be applied?  Around forwprop
> > > or somewhere else?  (Also, could you please tell me a document I can
> > > learn about these if it exists?)  I ask because this optimization
> > > should be applied before the Jump Threading and VRP passes.
> >
> > match.pd patterns get used whenever an expression is simplified.  And
> > yes, the forwprop pass tries to simplify expressions rooted at each stm=
t.
> > In general each pass changing a statement or one of its operands should
> > re-simplify the expression rooted at it, so the theory is that all expr=
essions
> > always remain simplified.
> >
> > There's no documentation about the patterns present in match.pd if
> > you meant that.
>
> What I meant was the high-level structure around match.pd and other
> passes, but it is answered!  That makes sense.  Thank you for your
> time!

In theory there's doc/passes.texi but we do a very bad job at keeping
that up-to-date.

> > I'm usually writing small testcases to "query" match.pd
> > and use -fdump-tree-all-folding looking into .original (when folded at
> > generic time), .gimple, .ccp and .forwprop which are the usual early
> > places where simplifications are triggered.
>
> This is really helpful, thank you!
>
> >
> > Richard.
> >
> > > > > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCours=
e
> > > > >
> > > > > > Our VN pass has some tricks to anticipate CSE opportunities
> > > > > > like this, but it's not done "properly" in the way of anticipat=
ing
> > > > > > both forms during PRE.
> > > > > >
> > > > > > I'll note we have
> > > > > >
> > > > > >  /* (A - B) !=3D 0 ? (A - B) : (B - A)    same as (A - B) */
> > > > > >  (for cmp (ne ltgt)
> > > > > >
> > > > > > and similar which might be confused by canonicalizing to A !=3D=
 B.
> > > > >
> > > > > I will investigate and update my patch (after my final exam ends.=
..)!
> > > > >
> > > > > > I'm also surprised we don't already have the pattern you add.
> > > > >
> > > > > Hmm, so am I.
> > > >
> > > > It looks like we do it for equality compares which can also handle
> > > > types where overflow is undefined.  -fdump-tree-all-folding shows
> > > > in the 005t.original dump we simplify a - b !=3D 0 via match.pd:610=
5:
> > > >
> > > > /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> > > >    ??? The transformation is valid for the other operators if overf=
low
> > > >    is undefined for the type, but performing it here badly interact=
s
> > > >    with the transformation in fold_cond_expr_with_comparison which
> > > >    attempts to synthetize ABS_EXPR.  */
> > > > (for cmp (eq ne)
> > > >  (for sub (minus pointer_diff)
> > > >   (simplify
> > > >    (cmp (sub@2 @0 @1) integer_zerop)
> > > >    (if (single_use (@2))
> > > >     (cmp @0 @1)))))
> > > >
> > > > and the comment explains why we don't perform the transform
> > > > it the way you intend.  That comment might no longer apply
> > > > but I would guess there's testsuite fallout when you amend this
> > > > pattern.
> > >
> > > Thank you so much!  I will check this out.
> > >
> > > >
> > > > Richard.
> > > >
> > > > >  I saw that this optimization sometimes messes up VRP
> > > > > while sometimes helping it (as I described in my comment).  I als=
o
> > > > > need to research this.
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > > > > > >         * common.opt: Define ftree-cmp
> > > > > > >         * doc/invoke.texi: Document ftree-cmp.
> > > > > > >         * opts.cc (default_options_table): Handle OPT_ftree_c=
mp.
> > > > > > >         * passes.def (pass_cmp): New optimization pass.
> > > > > > >         * timevar.def (TV_TREE_CMP): New variable for timing.
> > > > > > >         * tree-pass.h (make_pass_cmp): New declaration.
> > > > > > >         * tree-ssa-cmp.cc: New file.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >         * gcc.dg/pr113680.c: New test.
> > > > > > >
> > > > > > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > > > > > ---
> > > > > > >  gcc/Makefile.in                 |   1 +
> > > > > > >  gcc/common.opt                  |   4 +
> > > > > > >  gcc/doc/invoke.texi             |  11 +-
> > > > > > >  gcc/opts.cc                     |   1 +
> > > > > > >  gcc/passes.def                  |   3 +
> > > > > > >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> > > > > > >  gcc/timevar.def                 |   1 +
> > > > > > >  gcc/tree-pass.h                 |   1 +
> > > > > > >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++=
++++++++++
> > > > > > >  9 files changed, 330 insertions(+), 1 deletion(-)
> > > > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > > > > > >  create mode 100644 gcc/tree-ssa-cmp.cc
> > > > > > >
> > > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > > > index a74761b7ab3..935b80b6947 100644
> > > > > > > --- a/gcc/Makefile.in
> > > > > > > +++ b/gcc/Makefile.in
> > > > > > > @@ -1731,6 +1731,7 @@ OBJS =3D \
> > > > > > >         tree-ssa-address.o \
> > > > > > >         tree-ssa-alias.o \
> > > > > > >         tree-ssa-ccp.o \
> > > > > > > +       tree-ssa-cmp.o \
> > > > > > >         tree-ssa-coalesce.o \
> > > > > > >         tree-ssa-copy.o \
> > > > > > >         tree-ssa-dce.o \
> > > > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > > > index 51c4a17da83..7c853224458 100644
> > > > > > > --- a/gcc/common.opt
> > > > > > > +++ b/gcc/common.opt
> > > > > > > @@ -3053,6 +3053,10 @@ ftree-ch
> > > > > > >  Common Var(flag_tree_ch) Optimization
> > > > > > >  Enable loop header copying on trees.
> > > > > > >
> > > > > > > +ftree-cmp
> > > > > > > +Common Var(flag_tree_cmp) Optimization
> > > > > > > +Enable SSA comparison optimization on trees.
> > > > > > > +
> > > > > > >  ftree-coalesce-inlined-vars
> > > > > > >  Common Ignore RejectNegative
> > > > > > >  Does nothing.  Preserved for backward compatibility.
> > > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > > > index bdf05be387d..04762d490a3 100644
> > > > > > > --- a/gcc/doc/invoke.texi
> > > > > > > +++ b/gcc/doc/invoke.texi
> > > > > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > > >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop=
  -fssa-phiopt
> > > > > > >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-stric=
t-aliasing
> > > > > > >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > > > > > > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > > > > > > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> > > > > > >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-d=
ominator-opts
> > > > > > >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> > > > > > >  -ftree-loop-if-convert  -ftree-loop-im
> > > > > > > @@ -12377,6 +12377,7 @@ compilation time.
> > > > > > >  -ftree-bit-ccp
> > > > > > >  -ftree-ccp
> > > > > > >  -ftree-ch
> > > > > > > +-ftree-cmp
> > > > > > >  -ftree-coalesce-vars
> > > > > > >  -ftree-copy-prop
> > > > > > >  -ftree-dce
> > > > > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimiz=
ations.  It also saves one jump.  This flag
> > > > > > >  is enabled by default at @option{-O1} and higher.  It is not=
 enabled
> > > > > > >  for @option{-Os}, since it usually increases code size.
> > > > > > >
> > > > > > > +@opindex ftree-cmp
> > > > > > > +@item -ftree-cmp
> > > > > > > +Perform comparison optimization on trees.  This decreases un=
necessary
> > > > > > > +instructions for comparisons.  This flag is enabled by defau=
lt at
> > > > > > > +@option{-O1} and higher.  This optimization is automatically=
 disabled when
> > > > > > > +@option{-fwrapv} is specified to ensure correct behavior und=
er signed integer
> > > > > > > +overflow wrapping.
> > > > > > > +
> > > > > > >  @opindex ftree-loop-optimize
> > > > > > >  @item -ftree-loop-optimize
> > > > > > >  Perform loop optimizations on trees.  This flag is enabled b=
y default
> > > > > > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > > > > > index 3333600e0ea..9ee64a48313 100644
> > > > > > > --- a/gcc/opts.cc
> > > > > > > +++ b/gcc/opts.cc
> > > > > > > @@ -589,6 +589,7 @@ static const struct default_options defau=
lt_options_table[] =3D
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1=
 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > > > > > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > > > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > > > > index 1cbbd413097..e28a2d3c309 100644
> > > > > > > --- a/gcc/passes.def
> > > > > > > +++ b/gcc/passes.def
> > > > > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If =
not see
> > > > > > >           /* After CCP we rewrite no longer addressed locals =
into SSA
> > > > > > >              form if possible.  */
> > > > > > >           NEXT_PASS (pass_forwprop);
> > > > > > > +         /* Do cmp before early_thread_jumps and vrp since i=
t may
> > > > > > > +            modify conditions.  */
> > > > > > > +         NEXT_PASS (pass_cmp);
> > > > > > >            NEXT_PASS (pass_early_thread_jumps, /*first=3D*/tr=
ue);
> > > > > > >           NEXT_PASS (pass_sra_early);
> > > > > > >           /* pass_build_ealias is a dummy pass that ensures t=
hat we
> > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/=
gcc.dg/pr113680.c
> > > > > > > new file mode 100644
> > > > > > > index 00000000000..59cd865aa21
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > > > @@ -0,0 +1,47 @@
> > > > > > > +/* { dg-do compile } */
> > > > > > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > > > > > +
> > > > > > > +volatile int i =3D 0;
> > > > > > > +
> > > > > > > +void foo() { i =3D 1; }
> > > > > > > +void bar() { i =3D 2; }
> > > > > > > +
> > > > > > > +void f1(int x, int y)
> > > > > > > +{
> > > > > > > +  int diff =3D x - y;
> > > > > > > +  if (diff > 0)
> > > > > > > +    foo();
> > > > > > > +  if (diff < 0)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +void f2(int x, int y)
> > > > > > > +{
> > > > > > > +  if ((x - y) > 0)
> > > > > > > +    foo();
> > > > > > > +  if ((x - y) < 0)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +void f3(int x, int y)
> > > > > > > +{
> > > > > > > +  if (x > y)
> > > > > > > +    foo();
> > > > > > > +  if (x < y)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +void f4(int x, int y)
> > > > > > > +{
> > > > > > > +  int diff =3D x - y;
> > > > > > > +  if (diff > 0)
> > > > > > > +    foo();
> > > > > > > +  if (x < y)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > > > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86=
-*-* x86_64-*-* } } } } */
> > > > > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { tar=
get { i?86-*-* x86_64-*-* } } } } */
> > > > > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-*=
 x86_64-*-* } } } } */
> > > > > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-=
* x86_64-*-* } } } } */
> > > > > > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > > > > > index 8e2168e0817..468266d0c56 100644
> > > > > > > --- a/gcc/timevar.def
> > > > > > > +++ b/gcc/timevar.def
> > > > > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   =
, "tree eh")
> > > > > > >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG cons=
truction")
> > > > > > >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG clea=
nup")
> > > > > > >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > > > > > > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> > > > > > >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> > > > > > >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> > > > > > >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > > > > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > > > > > index 29267589eeb..71789a279c0 100644
> > > > > > > --- a/gcc/tree-pass.h
> > > > > > > +++ b/gcc/tree-pass.h
> > > > > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc=
::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctx=
t);
> > > > > > >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt=
);
> > > > > > >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > > > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_split_paths (gcc::context =
*ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *c=
txt);
> > > > > > >  extern gimple_opt_pass *make_pass_build_alias (gcc::context =
*ctxt);
> > > > > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > > > > > new file mode 100644
> > > > > > > index 00000000000..41d4a58a5b5
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/tree-ssa-cmp.cc
> > > > > > > @@ -0,0 +1,262 @@
> > > > > > > +/* Global, SSA-based optimizations using comparison identiti=
es.
> > > > > > > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > > > +
> > > > > > > +This file is part of GCC.
> > > > > > > +
> > > > > > > +GCC is free software; you can redistribute it and/or modify =
it
> > > > > > > +under the terms of the GNU General Public License as publish=
ed by the
> > > > > > > +Free Software Foundation; either version 3, or (at your opti=
on) any
> > > > > > > +later version.
> > > > > > > +
> > > > > > > +GCC is distributed in the hope that it will be useful, but W=
ITHOUT
> > > > > > > +ANY WARRANTY; without even the implied warranty of MERCHANTA=
BILITY or
> > > > > > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Publi=
c License
> > > > > > > +for more details.
> > > > > > > +
> > > > > > > +You should have received a copy of the GNU General Public Li=
cense
> > > > > > > +along with GCC; see the file COPYING3.  If not see
> > > > > > > +<http://www.gnu.org/licenses/>.  */
> > > > > > > +
> > > > > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are =
signed integers and
> > > > > > > +   CMP is <, <=3D, >, or >=3D.  Similarly, 0 CMP (x - y) is =
equivalent to y CMP x.
> > > > > > > +   As reported in PR middle-end/113680, this equivalence doe=
s not hold for
> > > > > > > +   types other than signed integers.  When it comes to condi=
tions, the former
> > > > > > > +   was translated to a combination of sub and test, whereas =
the latter was
> > > > > > > +   translated to a single cmp.  Thus, this optimization pass=
 tries to optimize
> > > > > > > +   the former to the latter.
> > > > > > > +
> > > > > > > +   When `-fwrapv` is enabled, GCC treats the overflow of sig=
ned integers as
> > > > > > > +   defined behavior, specifically, wrapping around according=
 to two's
> > > > > > > +   complement arithmetic.  This has implications for optimiz=
ations that
> > > > > > > +   rely on the standard behavior of signed integers, where o=
verflow is
> > > > > > > +   undefined.  Consider the example given:
> > > > > > > +
> > > > > > > +       long long llmax =3D __LONG_LONG_MAX__;
> > > > > > > +       long long llmin =3D -llmax - 1;
> > > > > > > +
> > > > > > > +   Here, `llmax - llmin` effectively becomes `llmax - (-llma=
x - 1)`, which
> > > > > > > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the=
 maximum value for
> > > > > > > +   a `long long`, this calculation overflows in a defined ma=
nner
> > > > > > > +   (wrapping around), which under `-fwrapv` is a legal opera=
tion that produces
> > > > > > > +   a negative value due to two's complement wraparound.  The=
refore,
> > > > > > > +   `llmax - llmin < 0` is true.
> > > > > > > +
> > > > > > > +   However, the direct comparison `llmax < llmin` is false s=
ince `llmax` is the
> > > > > > > +   maximum possible value and `llmin` is the minimum.  Hence=
, optimizations
> > > > > > > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP=
 y` (and vice versa)
> > > > > > > +   cannot be safely applied when `-fwrapv` is enabled.  This=
 is why this
> > > > > > > +   optimization pass is disabled under `-fwrapv`.
> > > > > > > +
> > > > > > > +   This optimization pass must run before the Jump Threading=
 pass and
> > > > > > > +   the VRP pass, as it may modify conditions. For example, i=
n the VRP pass:
> > > > > > > +
> > > > > > > +       (1)
> > > > > > > +         int diff =3D x - y;
> > > > > > > +         if (diff > 0)
> > > > > > > +           foo();
> > > > > > > +         if (diff < 0)
> > > > > > > +           bar();
> > > > > > > +
> > > > > > > +   The second condition would be converted to diff !=3D 0 in=
 the VRP pass because
> > > > > > > +   we know the postcondition of the first condition is diff =
<=3D 0, and then
> > > > > > > +   diff !=3D 0 is cheaper than diff < 0. If we apply this pa=
ss after this VRP,
> > > > > > > +   we get:
> > > > > > > +
> > > > > > > +       (2)
> > > > > > > +         int diff =3D x - y;
> > > > > > > +         if (x > y)
> > > > > > > +           foo();
> > > > > > > +         if (diff !=3D 0)
> > > > > > > +           bar();
> > > > > > > +
> > > > > > > +   This generates sub and test for the second condition and =
cmp for the first
> > > > > > > +   condition. However, if we apply this pass beforehand, we =
simply get:
> > > > > > > +
> > > > > > > +       (3)
> > > > > > > +         int diff =3D x - y;
> > > > > > > +         if (x > y)
> > > > > > > +           foo();
> > > > > > > +         if (x < y)
> > > > > > > +           bar();
> > > > > > > +
> > > > > > > +   In this code, diff will be eliminated as a dead code, and=
 sub and test will
> > > > > > > +   not be generated, which is more efficient.
> > > > > > > +
> > > > > > > +   For the Jump Threading pass, without this optimization pa=
ss, (1) and (3)
> > > > > > > +   above are recognized as different, which prevents TCO.  *=
/
> > > > > > > +
> > > > > > > +#include "config.h"
> > > > > > > +#include "system.h"
> > > > > > > +#include "coretypes.h"
> > > > > > > +#include "backend.h"
> > > > > > > +#include "tree.h"
> > > > > > > +#include "gimple.h"
> > > > > > > +#include "tree-pass.h"
> > > > > > > +#include "ssa.h"
> > > > > > > +#include "gimple-iterator.h"
> > > > > > > +#include "domwalk.h"
> > > > > > > +
> > > > > > > +/* True if VAR is a signed integer, false otherwise.  */
> > > > > > > +
> > > > > > > +static bool
> > > > > > > +is_signed_integer(const_tree var)
> > > > > > > +{
> > > > > > > +  return (TREE_CODE (TREE_TYPE (var)) =3D=3D INTEGER_TYPE
> > > > > > > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NA=
ME, and its
> > > > > > > +   defining statement is a subtraction of two signed integer=
s, return the
> > > > > > > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > > > > > > +
> > > > > > > +static const_tree
> > > > > > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > > > > > +{
> > > > > > > +  if (integer_zerop (expr))
> > > > > > > +    return expr;
> > > > > > > +
> > > > > > > +  if (TREE_CODE (expr) !=3D SSA_NAME)
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  gimple *defining_stmt =3D SSA_NAME_DEF_STMT (expr);
> > > > > > > +  if (!is_gimple_assign (defining_stmt)
> > > > > > > +      || gimple_assign_rhs_code (defining_stmt) !=3D MINUS_E=
XPR)
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  tree rhs1 =3D gimple_assign_rhs1 (defining_stmt);
> > > > > > > +  if (!is_signed_integer (rhs1))
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  tree rhs2 =3D gimple_assign_rhs2 (defining_stmt);
> > > > > > > +  if (!is_signed_integer (rhs2))
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Optimize:
> > > > > > > +
> > > > > > > +       1. (x - y) CMP 0
> > > > > > > +       2. 0 CMP (x - y)
> > > > > > > +
> > > > > > > +   to:
> > > > > > > +
> > > > > > > +       1. x CMP y
> > > > > > > +       2. y CMP x
> > > > > > > +
> > > > > > > +   where CMP is either <, <=3D, >, or >=3D.  */
> > > > > > > +
> > > > > > > +static void
> > > > > > > +optimize_signed_comparison (gcond *stmt)
> > > > > > > +{
> > > > > > > +  const enum tree_code code =3D gimple_cond_code (stmt);
> > > > > > > +  if (code < LT_EXPR || GE_EXPR < code)
> > > > > > > +    return;
> > > > > > > +
> > > > > > > +  const_tree lhs =3D prop_integer_zero_or_minus_expr (gimple=
_cond_lhs (stmt));
> > > > > > > +  if (lhs =3D=3D NULL_TREE)
> > > > > > > +    return;
> > > > > > > +
> > > > > > > +  const_tree rhs =3D prop_integer_zero_or_minus_expr (gimple=
_cond_rhs (stmt));
> > > > > > > +  if (rhs =3D=3D NULL_TREE)
> > > > > > > +    return;
> > > > > > > +
> > > > > > > +  if (TREE_CODE (lhs) =3D=3D MINUS_EXPR && integer_zerop (rh=
s))
> > > > > > > +    {
> > > > > > > +      /* Case 1: (x - y) CMP 0  */
> > > > > > > +      tree x =3D TREE_OPERAND (lhs, 0);
> > > > > > > +      tree y =3D TREE_OPERAND (lhs, 1);
> > > > > > > +
> > > > > > > +      /* =3D> x CMP y  */
> > > > > > > +      gimple_cond_set_lhs (stmt, x);
> > > > > > > +      gimple_cond_set_rhs (stmt, y);
> > > > > > > +      update_stmt (stmt);
> > > > > > > +    }
> > > > > > > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) =3D=3D MIN=
US_EXPR)
> > > > > > > +    {
> > > > > > > +      /* Case 2: 0 CMP (x - y)  */
> > > > > > > +      tree x =3D TREE_OPERAND (rhs, 0);
> > > > > > > +      tree y =3D TREE_OPERAND (rhs, 1);
> > > > > > > +
> > > > > > > +      /* =3D> y CMP x  */
> > > > > > > +      gimple_cond_set_lhs (stmt, y);
> > > > > > > +      gimple_cond_set_rhs (stmt, x);
> > > > > > > +      update_stmt (stmt);
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Find signed integer comparisons where the operands are MI=
NUS_EXPR and
> > > > > > > +   0, and replace them with the equivalent comparison of the=
 operands of
> > > > > > > +   the MINUS_EXPR without the subtraction.  */
> > > > > > > +
> > > > > > > +namespace {
> > > > > > > +
> > > > > > > +const pass_data pass_data_cmp =3D
> > > > > > > +{
> > > > > > > +  GIMPLE_PASS, /* type */
> > > > > > > +  "cmp", /* name */
> > > > > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > > > > +  TV_TREE_CMP, /* tv_id */
> > > > > > > +  PROP_ssa, /* properties_required */
> > > > > > > +  0, /* properties_provided */
> > > > > > > +  0, /* properties_destroyed */
> > > > > > > +  0, /* todo_flags_start */
> > > > > > > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > > > > > > +};
> > > > > > > +
> > > > > > > +class pass_cmp : public gimple_opt_pass
> > > > > > > +{
> > > > > > > +public:
> > > > > > > +  pass_cmp (gcc::context *ctxt)
> > > > > > > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > > > > > > +  {}
> > > > > > > +
> > > > > > > +  /* opt_pass methods: */
> > > > > > > +  bool gate (function *) final override
> > > > > > > +    {
> > > > > > > +      /* If -fwrapv is enabled, this pass cannot be safely a=
pplied as described
> > > > > > > +        in the comment at the top of this file.  */
> > > > > > > +      return flag_tree_cmp !=3D 0 && flag_wrapv =3D=3D 0;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  unsigned int execute (function *) final override;
> > > > > > > +
> > > > > > > +}; // class pass_cmp
> > > > > > > +
> > > > > > > +class cmp_dom_walker : public dom_walker
> > > > > > > +{
> > > > > > > +public:
> > > > > > > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > > > > > +
> > > > > > > +  void after_dom_children (basic_block) final override;
> > > > > > > +};
> > > > > > > +
> > > > > > > +void
> > > > > > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > > > > > +{
> > > > > > > +  gimple_stmt_iterator gsi;
> > > > > > > +
> > > > > > > +  for (gsi =3D gsi_after_labels (bb); !gsi_end_p (gsi); gsi_=
next (&gsi))
> > > > > > > +    {
> > > > > > > +      gimple *stmt =3D gsi_stmt (gsi);
> > > > > > > +      if (gimple_code (stmt) =3D=3D GIMPLE_COND)
> > > > > > > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +
> > > > > > > +unsigned int
> > > > > > > +pass_cmp::execute (function *)
> > > > > > > +{
> > > > > > > +  calculate_dominance_info (CDI_DOMINATORS);
> > > > > > > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > > > > +  return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +} // anon namespace
> > > > > > > +
> > > > > > > +gimple_opt_pass *
> > > > > > > +make_pass_cmp (gcc::context *ctxt)
> > > > > > > +{
> > > > > > > +  return new pass_cmp (ctxt);
> > > > > > > +}
> > > > > > > --
> > > > > > > 2.44.0
> > > > > > >