From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12b.google.com (mail-lf1-x12b.google.com [IPv6:2a00:1450:4864:20::12b]) by sourceware.org (Postfix) with ESMTPS id 53A533858422; Mon, 11 Mar 2024 07:22:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 53A533858422 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 53A533858422 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710141745; cv=none; b=JhpaOfsoZrtub8oAwK4cwsZorSr1uMFb7PnLLBHWdwU7imD+ch1LAO8D5DmKGgpLik+fyrFKUZqpmqG9mEuntVDU+PWH15jDvEBt2kUWPbOh+w5jnW6bpuhQnXk3j7n6FPzZTipf4Tf+0PqFhX3jvurQ1LKVtSH9B9M2lnoPhQU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710141745; c=relaxed/simple; bh=CaJY5vPG8d8yTkEmhCq/Ty2FqBHN5z5eB+Q9S7hdAz0=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=U0NaSgzky0wTdWQUd33Brl79TAyRamgSH0lltSf2o2bG4nmnAV13jWsP/hzux3/3Q/mGvSKFj4lqKK9IReyF/bCbKsxI8TDusf6qtXWY94NGsemtbY7+iqK4R/V6wnOxjeagqio5yvnCd8h3d0rhsfOrmqgfmB/PoJnlt9378J0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x12b.google.com with SMTP id 2adb3069b0e04-513a08f2263so1181566e87.3; Mon, 11 Mar 2024 00:22:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1710141739; x=1710746539; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=TZpn+CbGZmeIrRBzVbdJ52B/dB7XWpx5VDrcCdrD76o=; b=l28mSgae5XKH2WUywv9I2Cp9haJKjvSd+841O6wyKK7sr8UPvIEILTzhdTu3bdUfsN 20+Jy7NsCLGkku+ZBvINjxVg87t+SDRQ12UaflUnex8rqvxoyTZN/YOIh8sSsayydocl EwyUq/vORpm10vHx7WIdYJseHBp52mH4UobJ6vwD+ghXtFhV6oD2YZuMxhChA51Xaw+1 DKJqIJk3OAQah074kZx67AEGVjmOnOuKxnJ0s/QO6GLoaKA0qd/BdbrMbsFXi9zht4Tk QdranPGV+3E/O9HnG8tUH8AIxSQ/9Gm94/2v6sVeIxN8pye2ywldHLiKMz7j9AOLToxJ QJmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710141739; x=1710746539; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TZpn+CbGZmeIrRBzVbdJ52B/dB7XWpx5VDrcCdrD76o=; b=FHv7eaGKqlXWlvd1KZVfxCDT+vVRRb6tFr7sMB3YAydUck6JmsCs6dRLKErQ67W++u wZJ9Lk63n61wHG5G7VaHK7lpr6THpG4DhTCFmDf4SBCPw3Nz1SLOW0B5dAmnFxU7+M/V dEqA3/k2Kc/WFizyobjLNDfhZKBdYfYy6JiJD+n1R7qMjU8fScmWlK+IBVI/bK6Sswzq CaQa5f6JNnuLFMVgx5uP5QsN1rjyHsVn/lB8uTdsmKiHLLRajGbBHifnjqr0kjyBLtKq fsXbjzuJ+2FLID8p27wLY+adQrq00DeUwocJjIz/HlbO5J9S1xFa6RiPSQnua8xcxfQA 1AAw== X-Forwarded-Encrypted: i=1; AJvYcCXyHaW/4/E8DmepAGXKMyiW7jPsisq6G2um/nXRI6nMTp9qLTRjSS1LyM9ILDsyXW1fN+foJ7nTppodoRPmf9xS63sv0MJYgA== X-Gm-Message-State: AOJu0Yz0cjJoKg+gz9DI79Ail905B+dP72MEPfK9Mw3+npnCANva6vsx nuX15XPB2J5UXXwbx1QUNkE9gakLJpJWq/gHT7XSAH0e8cT9WAHrdGJp/gTVMtnOPjtz1ub7drY DAdOx4gER2a52BONYmsRwUdap/LUFqzdu X-Google-Smtp-Source: AGHT+IGwmK9c4T2pS29dUFoUpg7kLPoQlkZKL8lUVoW+X6d6N1wRKXrq8zOU7+0/U1wDMzB7AVyaCn2MElcnD7Y/sQI= X-Received: by 2002:a05:6512:4002:b0:513:aef9:7159 with SMTP id br2-20020a056512400200b00513aef97159mr730533lfb.39.1710141738274; Mon, 11 Mar 2024 00:22:18 -0700 (PDT) MIME-Version: 1.0 References: <20240304203830.3740968-1-kmatsui@gcc.gnu.org> In-Reply-To: From: Richard Biener Date: Mon, 11 Mar 2024 08:22:06 +0100 Message-ID: Subject: Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y To: Ken Matsui Cc: Ken Matsui , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Mar 8, 2024 at 6:50=E2=80=AFPM Ken Matsui wrote: > > On Thu, Mar 7, 2024 at 10:49=E2=80=AFPM Richard Biener > wrote: > > > > On Thu, Mar 7, 2024 at 8:29=E2=80=AFPM Ken Matsui wrote: > > > > > > On Tue, Mar 5, 2024 at 7:58=E2=80=AFAM Richard Biener > > > wrote: > > > > > > > > On Tue, Mar 5, 2024 at 1:51=E2=80=AFPM Ken Matsui wrote: > > > > > > > > > > On Tue, Mar 5, 2024 at 12:38=E2=80=AFAM Richard Biener > > > > > wrote: > > > > > > > > > > > > On Mon, Mar 4, 2024 at 9:40=E2=80=AFPM Ken Matsui wrote: > > > > > > > > > > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are sign= ed > > > > > > > integers and CMP is <, <=3D, >, or >=3D. Similarly, 0 CMP (x= - y) is > > > > > > > equivalent to y CMP x. As reported in PR middle-end/113680, = this > > > > > > > equivalence does not hold for types other than signed integer= s. When > > > > > > > it comes to conditions, the former was translated to a combin= ation of > > > > > > > sub and test, whereas the latter was translated to a single c= mp. > > > > > > > Thus, this optimization pass tries to optimize the former to = the > > > > > > > latter. > > > > > > > > > > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed = integers > > > > > > > as defined behavior, specifically, wrapping around according = to two's > > > > > > > complement arithmetic. This has implications for optimizatio= ns that > > > > > > > rely on the standard behavior of signed integers, where overf= low is > > > > > > > undefined. Consider the example given: > > > > > > > > > > > > > > long long llmax =3D __LONG_LONG_MAX__; > > > > > > > long long llmin =3D -llmax - 1; > > > > > > > > > > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - = 1)`, which > > > > > > > simplifies to `2 * llmax + 1`. Given that `llmax` is the max= imum value > > > > > > > for a `long long`, this calculation overflows in a defined ma= nner > > > > > > > (wrapping around), which under `-fwrapv` is a legal operation= that > > > > > > > produces a negative value due to two's complement wraparound. > > > > > > > Therefore, `llmax - llmin < 0` is true. > > > > > > > > > > > > > > However, the direct comparison `llmax < llmin` is false since= `llmax` > > > > > > > is the maximum possible value and `llmin` is the minimum. He= nce, > > > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0`= to > > > > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fw= rapv` is > > > > > > > enabled. This is why this optimization pass is disabled unde= r > > > > > > > `-fwrapv`. > > > > > > > > > > > > > > This optimization pass must run before the Jump Threading pas= s and the > > > > > > > VRP pass, as it may modify conditions. For example, in the VR= P pass: > > > > > > > > > > > > > > (1) > > > > > > > int diff =3D x - y; > > > > > > > if (diff > 0) > > > > > > > foo(); > > > > > > > if (diff < 0) > > > > > > > bar(); > > > > > > > > > > > > > > The second condition would be converted to diff !=3D 0 in the= VRP pass > > > > > > > because we know the postcondition of the first condition is d= iff <=3D 0, > > > > > > > and then diff !=3D 0 is cheaper than diff < 0. If we apply th= is pass > > > > > > > after this VRP, we get: > > > > > > > > > > > > > > (2) > > > > > > > int diff =3D x - y; > > > > > > > if (x > y) > > > > > > > foo(); > > > > > > > if (diff !=3D 0) > > > > > > > bar(); > > > > > > > > > > > > > > This generates sub and test for the second condition and cmp = for the > > > > > > > first condition. However, if we apply this pass beforehand, w= e simply > > > > > > > get: > > > > > > > > > > > > > > (3) > > > > > > > int diff =3D x - y; > > > > > > > if (x > y) > > > > > > > foo(); > > > > > > > if (x < y) > > > > > > > bar(); > > > > > > > > > > > > > > In this code, diff will be eliminated as a dead code, and sub= and test > > > > > > > will not be generated, which is more efficient. > > > > > > > > > > > > > > For the Jump Threading pass, without this optimization pass, = (1) and > > > > > > > (3) above are recognized as different, which prevents TCO. > > > > > > > > > > > > > > PR middle-end/113680 > > > > > > > > > > > > This shouldn't be done as a new optimization pass. It fits eit= her > > > > > > the explicit code present in the forwprop pass or a new match.p= d > > > > > > pattern. There's possible interaction with x - y value being u= sed > > > > > > elsewhere and thus exposing a CSE opportunity as well as > > > > > > a comparison against zero being possibly implemented by > > > > > > a flag setting subtraction instruction. > > > > > > > > > > > > > > > > Thank you so much for your review! Although the forwprop pass ru= ns > > > > > multiple times, we might not need to apply this optimization mult= iple > > > > > times. Would it be acceptable to add such optimization? More > > > > > generally, I would like to know how to determine where to put > > > > > optimization in the future. > > > > > > > > This kind of pattern matching expression simplification is best > > > > addressed by patterns in match.pd though historically the forwprop > > > > pass still catches cases not in match.pd in its > > > > forward_propagate_into_comparison_1 (and callers). > > > > > > > > > > I see. When would patterns in match.pd be applied? Around forwprop > > > or somewhere else? (Also, could you please tell me a document I can > > > learn about these if it exists?) I ask because this optimization > > > should be applied before the Jump Threading and VRP passes. > > > > match.pd patterns get used whenever an expression is simplified. And > > yes, the forwprop pass tries to simplify expressions rooted at each stm= t. > > In general each pass changing a statement or one of its operands should > > re-simplify the expression rooted at it, so the theory is that all expr= essions > > always remain simplified. > > > > There's no documentation about the patterns present in match.pd if > > you meant that. > > What I meant was the high-level structure around match.pd and other > passes, but it is answered! That makes sense. Thank you for your > time! In theory there's doc/passes.texi but we do a very bad job at keeping that up-to-date. > > I'm usually writing small testcases to "query" match.pd > > and use -fdump-tree-all-folding looking into .original (when folded at > > generic time), .gimple, .ccp and .forwprop which are the usual early > > places where simplifications are triggered. > > This is really helpful, thank you! > > > > > Richard. > > > > > > > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCours= e > > > > > > > > > > > Our VN pass has some tricks to anticipate CSE opportunities > > > > > > like this, but it's not done "properly" in the way of anticipat= ing > > > > > > both forms during PRE. > > > > > > > > > > > > I'll note we have > > > > > > > > > > > > /* (A - B) !=3D 0 ? (A - B) : (B - A) same as (A - B) */ > > > > > > (for cmp (ne ltgt) > > > > > > > > > > > > and similar which might be confused by canonicalizing to A !=3D= B. > > > > > > > > > > I will investigate and update my patch (after my final exam ends.= ..)! > > > > > > > > > > > I'm also surprised we don't already have the pattern you add. > > > > > > > > > > Hmm, so am I. > > > > > > > > It looks like we do it for equality compares which can also handle > > > > types where overflow is undefined. -fdump-tree-all-folding shows > > > > in the 005t.original dump we simplify a - b !=3D 0 via match.pd:610= 5: > > > > > > > > /* Transform comparisons of the form X - Y CMP 0 to X CMP Y. > > > > ??? The transformation is valid for the other operators if overf= low > > > > is undefined for the type, but performing it here badly interact= s > > > > with the transformation in fold_cond_expr_with_comparison which > > > > attempts to synthetize ABS_EXPR. */ > > > > (for cmp (eq ne) > > > > (for sub (minus pointer_diff) > > > > (simplify > > > > (cmp (sub@2 @0 @1) integer_zerop) > > > > (if (single_use (@2)) > > > > (cmp @0 @1))))) > > > > > > > > and the comment explains why we don't perform the transform > > > > it the way you intend. That comment might no longer apply > > > > but I would guess there's testsuite fallout when you amend this > > > > pattern. > > > > > > Thank you so much! I will check this out. > > > > > > > > > > > Richard. > > > > > > > > > I saw that this optimization sometimes messes up VRP > > > > > while sometimes helping it (as I described in my comment). I als= o > > > > > need to research this. > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > * Makefile.in: Add tree-ssa-cmp.o to OBJS. > > > > > > > * common.opt: Define ftree-cmp > > > > > > > * doc/invoke.texi: Document ftree-cmp. > > > > > > > * opts.cc (default_options_table): Handle OPT_ftree_c= mp. > > > > > > > * passes.def (pass_cmp): New optimization pass. > > > > > > > * timevar.def (TV_TREE_CMP): New variable for timing. > > > > > > > * tree-pass.h (make_pass_cmp): New declaration. > > > > > > > * tree-ssa-cmp.cc: New file. > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > * gcc.dg/pr113680.c: New test. > > > > > > > > > > > > > > Signed-off-by: Ken Matsui > > > > > > > --- > > > > > > > gcc/Makefile.in | 1 + > > > > > > > gcc/common.opt | 4 + > > > > > > > gcc/doc/invoke.texi | 11 +- > > > > > > > gcc/opts.cc | 1 + > > > > > > > gcc/passes.def | 3 + > > > > > > > gcc/testsuite/gcc.dg/pr113680.c | 47 ++++++ > > > > > > > gcc/timevar.def | 1 + > > > > > > > gcc/tree-pass.h | 1 + > > > > > > > gcc/tree-ssa-cmp.cc | 262 ++++++++++++++++++++++= ++++++++++ > > > > > > > 9 files changed, 330 insertions(+), 1 deletion(-) > > > > > > > create mode 100644 gcc/testsuite/gcc.dg/pr113680.c > > > > > > > create mode 100644 gcc/tree-ssa-cmp.cc > > > > > > > > > > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > > > > > > index a74761b7ab3..935b80b6947 100644 > > > > > > > --- a/gcc/Makefile.in > > > > > > > +++ b/gcc/Makefile.in > > > > > > > @@ -1731,6 +1731,7 @@ OBJS =3D \ > > > > > > > tree-ssa-address.o \ > > > > > > > tree-ssa-alias.o \ > > > > > > > tree-ssa-ccp.o \ > > > > > > > + tree-ssa-cmp.o \ > > > > > > > tree-ssa-coalesce.o \ > > > > > > > tree-ssa-copy.o \ > > > > > > > tree-ssa-dce.o \ > > > > > > > diff --git a/gcc/common.opt b/gcc/common.opt > > > > > > > index 51c4a17da83..7c853224458 100644 > > > > > > > --- a/gcc/common.opt > > > > > > > +++ b/gcc/common.opt > > > > > > > @@ -3053,6 +3053,10 @@ ftree-ch > > > > > > > Common Var(flag_tree_ch) Optimization > > > > > > > Enable loop header copying on trees. > > > > > > > > > > > > > > +ftree-cmp > > > > > > > +Common Var(flag_tree_cmp) Optimization > > > > > > > +Enable SSA comparison optimization on trees. > > > > > > > + > > > > > > > ftree-coalesce-inlined-vars > > > > > > > Common Ignore RejectNegative > > > > > > > Does nothing. Preserved for backward compatibility. > > > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > > > > > index bdf05be387d..04762d490a3 100644 > > > > > > > --- a/gcc/doc/invoke.texi > > > > > > > +++ b/gcc/doc/invoke.texi > > > > > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}. > > > > > > > -fsplit-wide-types -fsplit-wide-types-early -fssa-backprop= -fssa-phiopt > > > > > > > -fstdarg-opt -fstore-merging -fstrict-aliasing -fipa-stric= t-aliasing > > > > > > > -fthread-jumps -ftracer -ftree-bit-ccp > > > > > > > --ftree-builtin-call-dce -ftree-ccp -ftree-ch > > > > > > > +-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-cmp > > > > > > > -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-d= ominator-opts > > > > > > > -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting > > > > > > > -ftree-loop-if-convert -ftree-loop-im > > > > > > > @@ -12377,6 +12377,7 @@ compilation time. > > > > > > > -ftree-bit-ccp > > > > > > > -ftree-ccp > > > > > > > -ftree-ch > > > > > > > +-ftree-cmp > > > > > > > -ftree-coalesce-vars > > > > > > > -ftree-copy-prop > > > > > > > -ftree-dce > > > > > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimiz= ations. It also saves one jump. This flag > > > > > > > is enabled by default at @option{-O1} and higher. It is not= enabled > > > > > > > for @option{-Os}, since it usually increases code size. > > > > > > > > > > > > > > +@opindex ftree-cmp > > > > > > > +@item -ftree-cmp > > > > > > > +Perform comparison optimization on trees. This decreases un= necessary > > > > > > > +instructions for comparisons. This flag is enabled by defau= lt at > > > > > > > +@option{-O1} and higher. This optimization is automatically= disabled when > > > > > > > +@option{-fwrapv} is specified to ensure correct behavior und= er signed integer > > > > > > > +overflow wrapping. > > > > > > > + > > > > > > > @opindex ftree-loop-optimize > > > > > > > @item -ftree-loop-optimize > > > > > > > Perform loop optimizations on trees. This flag is enabled b= y default > > > > > > > diff --git a/gcc/opts.cc b/gcc/opts.cc > > > > > > > index 3333600e0ea..9ee64a48313 100644 > > > > > > > --- a/gcc/opts.cc > > > > > > > +++ b/gcc/opts.cc > > > > > > > @@ -589,6 +589,7 @@ static const struct default_options defau= lt_options_table[] =3D > > > > > > > { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1= }, > > > > > > > { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 }, > > > > > > > { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 }, > > > > > > > + { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 }, > > > > > > > { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 }, > > > > > > > { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 }, > > > > > > > { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 }, > > > > > > > diff --git a/gcc/passes.def b/gcc/passes.def > > > > > > > index 1cbbd413097..e28a2d3c309 100644 > > > > > > > --- a/gcc/passes.def > > > > > > > +++ b/gcc/passes.def > > > > > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3. If = not see > > > > > > > /* After CCP we rewrite no longer addressed locals = into SSA > > > > > > > form if possible. */ > > > > > > > NEXT_PASS (pass_forwprop); > > > > > > > + /* Do cmp before early_thread_jumps and vrp since i= t may > > > > > > > + modify conditions. */ > > > > > > > + NEXT_PASS (pass_cmp); > > > > > > > NEXT_PASS (pass_early_thread_jumps, /*first=3D*/tr= ue); > > > > > > > NEXT_PASS (pass_sra_early); > > > > > > > /* pass_build_ealias is a dummy pass that ensures t= hat we > > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/= gcc.dg/pr113680.c > > > > > > > new file mode 100644 > > > > > > > index 00000000000..59cd865aa21 > > > > > > > --- /dev/null > > > > > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c > > > > > > > @@ -0,0 +1,47 @@ > > > > > > > +/* { dg-do compile } */ > > > > > > > +/* { dg-options "-Os -fdump-tree-optimized" } */ > > > > > > > + > > > > > > > +volatile int i =3D 0; > > > > > > > + > > > > > > > +void foo() { i =3D 1; } > > > > > > > +void bar() { i =3D 2; } > > > > > > > + > > > > > > > +void f1(int x, int y) > > > > > > > +{ > > > > > > > + int diff =3D x - y; > > > > > > > + if (diff > 0) > > > > > > > + foo(); > > > > > > > + if (diff < 0) > > > > > > > + bar(); > > > > > > > +} > > > > > > > + > > > > > > > +void f2(int x, int y) > > > > > > > +{ > > > > > > > + if ((x - y) > 0) > > > > > > > + foo(); > > > > > > > + if ((x - y) < 0) > > > > > > > + bar(); > > > > > > > +} > > > > > > > + > > > > > > > +void f3(int x, int y) > > > > > > > +{ > > > > > > > + if (x > y) > > > > > > > + foo(); > > > > > > > + if (x < y) > > > > > > > + bar(); > > > > > > > +} > > > > > > > + > > > > > > > +void f4(int x, int y) > > > > > > > +{ > > > > > > > + int diff =3D x - y; > > > > > > > + if (diff > 0) > > > > > > > + foo(); > > > > > > > + if (x < y) > > > > > > > + bar(); > > > > > > > +} > > > > > > > + > > > > > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */ > > > > > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86= -*-* x86_64-*-* } } } } */ > > > > > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { tar= get { i?86-*-* x86_64-*-* } } } } */ > > > > > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-*= x86_64-*-* } } } } */ > > > > > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-= * x86_64-*-* } } } } */ > > > > > > > diff --git a/gcc/timevar.def b/gcc/timevar.def > > > > > > > index 8e2168e0817..468266d0c56 100644 > > > > > > > --- a/gcc/timevar.def > > > > > > > +++ b/gcc/timevar.def > > > > > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH = , "tree eh") > > > > > > > DEFTIMEVAR (TV_TREE_CFG , "tree CFG cons= truction") > > > > > > > DEFTIMEVAR (TV_TREE_CLEANUP_CFG , "tree CFG clea= nup") > > > > > > > DEFTIMEVAR (TV_TREE_TAIL_MERGE , "tree tail merge") > > > > > > > +DEFTIMEVAR (TV_TREE_CMP , "tree cmp") > > > > > > > DEFTIMEVAR (TV_TREE_VRP , "tree VRP") > > > > > > > DEFTIMEVAR (TV_TREE_VRP_THREADER , "tree VRP threader") > > > > > > > DEFTIMEVAR (TV_TREE_EARLY_VRP , "tree Early VRP") > > > > > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h > > > > > > > index 29267589eeb..71789a279c0 100644 > > > > > > > --- a/gcc/tree-pass.h > > > > > > > +++ b/gcc/tree-pass.h > > > > > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc= ::context *ctxt); > > > > > > > extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctx= t); > > > > > > > extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt= ); > > > > > > > extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); > > > > > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt); > > > > > > > extern gimple_opt_pass *make_pass_split_paths (gcc::context = *ctxt); > > > > > > > extern gimple_opt_pass *make_pass_build_ssa (gcc::context *c= txt); > > > > > > > extern gimple_opt_pass *make_pass_build_alias (gcc::context = *ctxt); > > > > > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc > > > > > > > new file mode 100644 > > > > > > > index 00000000000..41d4a58a5b5 > > > > > > > --- /dev/null > > > > > > > +++ b/gcc/tree-ssa-cmp.cc > > > > > > > @@ -0,0 +1,262 @@ > > > > > > > +/* Global, SSA-based optimizations using comparison identiti= es. > > > > > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > > > > > + > > > > > > > +This file is part of GCC. > > > > > > > + > > > > > > > +GCC is free software; you can redistribute it and/or modify = it > > > > > > > +under the terms of the GNU General Public License as publish= ed by the > > > > > > > +Free Software Foundation; either version 3, or (at your opti= on) any > > > > > > > +later version. > > > > > > > + > > > > > > > +GCC is distributed in the hope that it will be useful, but W= ITHOUT > > > > > > > +ANY WARRANTY; without even the implied warranty of MERCHANTA= BILITY or > > > > > > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Publi= c License > > > > > > > +for more details. > > > > > > > + > > > > > > > +You should have received a copy of the GNU General Public Li= cense > > > > > > > +along with GCC; see the file COPYING3. If not see > > > > > > > +. */ > > > > > > > + > > > > > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are = signed integers and > > > > > > > + CMP is <, <=3D, >, or >=3D. Similarly, 0 CMP (x - y) is = equivalent to y CMP x. > > > > > > > + As reported in PR middle-end/113680, this equivalence doe= s not hold for > > > > > > > + types other than signed integers. When it comes to condi= tions, the former > > > > > > > + was translated to a combination of sub and test, whereas = the latter was > > > > > > > + translated to a single cmp. Thus, this optimization pass= tries to optimize > > > > > > > + the former to the latter. > > > > > > > + > > > > > > > + When `-fwrapv` is enabled, GCC treats the overflow of sig= ned integers as > > > > > > > + defined behavior, specifically, wrapping around according= to two's > > > > > > > + complement arithmetic. This has implications for optimiz= ations that > > > > > > > + rely on the standard behavior of signed integers, where o= verflow is > > > > > > > + undefined. Consider the example given: > > > > > > > + > > > > > > > + long long llmax =3D __LONG_LONG_MAX__; > > > > > > > + long long llmin =3D -llmax - 1; > > > > > > > + > > > > > > > + Here, `llmax - llmin` effectively becomes `llmax - (-llma= x - 1)`, which > > > > > > > + simplifies to `2 * llmax + 1`. Given that `llmax` is the= maximum value for > > > > > > > + a `long long`, this calculation overflows in a defined ma= nner > > > > > > > + (wrapping around), which under `-fwrapv` is a legal opera= tion that produces > > > > > > > + a negative value due to two's complement wraparound. The= refore, > > > > > > > + `llmax - llmin < 0` is true. > > > > > > > + > > > > > > > + However, the direct comparison `llmax < llmin` is false s= ince `llmax` is the > > > > > > > + maximum possible value and `llmin` is the minimum. Hence= , optimizations > > > > > > > + that rely on the equivalence of `(x - y) CMP 0` to `x CMP= y` (and vice versa) > > > > > > > + cannot be safely applied when `-fwrapv` is enabled. This= is why this > > > > > > > + optimization pass is disabled under `-fwrapv`. > > > > > > > + > > > > > > > + This optimization pass must run before the Jump Threading= pass and > > > > > > > + the VRP pass, as it may modify conditions. For example, i= n the VRP pass: > > > > > > > + > > > > > > > + (1) > > > > > > > + int diff =3D x - y; > > > > > > > + if (diff > 0) > > > > > > > + foo(); > > > > > > > + if (diff < 0) > > > > > > > + bar(); > > > > > > > + > > > > > > > + The second condition would be converted to diff !=3D 0 in= the VRP pass because > > > > > > > + we know the postcondition of the first condition is diff = <=3D 0, and then > > > > > > > + diff !=3D 0 is cheaper than diff < 0. If we apply this pa= ss after this VRP, > > > > > > > + we get: > > > > > > > + > > > > > > > + (2) > > > > > > > + int diff =3D x - y; > > > > > > > + if (x > y) > > > > > > > + foo(); > > > > > > > + if (diff !=3D 0) > > > > > > > + bar(); > > > > > > > + > > > > > > > + This generates sub and test for the second condition and = cmp for the first > > > > > > > + condition. However, if we apply this pass beforehand, we = simply get: > > > > > > > + > > > > > > > + (3) > > > > > > > + int diff =3D x - y; > > > > > > > + if (x > y) > > > > > > > + foo(); > > > > > > > + if (x < y) > > > > > > > + bar(); > > > > > > > + > > > > > > > + In this code, diff will be eliminated as a dead code, and= sub and test will > > > > > > > + not be generated, which is more efficient. > > > > > > > + > > > > > > > + For the Jump Threading pass, without this optimization pa= ss, (1) and (3) > > > > > > > + above are recognized as different, which prevents TCO. *= / > > > > > > > + > > > > > > > +#include "config.h" > > > > > > > +#include "system.h" > > > > > > > +#include "coretypes.h" > > > > > > > +#include "backend.h" > > > > > > > +#include "tree.h" > > > > > > > +#include "gimple.h" > > > > > > > +#include "tree-pass.h" > > > > > > > +#include "ssa.h" > > > > > > > +#include "gimple-iterator.h" > > > > > > > +#include "domwalk.h" > > > > > > > + > > > > > > > +/* True if VAR is a signed integer, false otherwise. */ > > > > > > > + > > > > > > > +static bool > > > > > > > +is_signed_integer(const_tree var) > > > > > > > +{ > > > > > > > + return (TREE_CODE (TREE_TYPE (var)) =3D=3D INTEGER_TYPE > > > > > > > + && !TYPE_UNSIGNED (TREE_TYPE (var))); > > > > > > > +} > > > > > > > + > > > > > > > +/* If EXPR is an integer zero, return it. If EXPR is SSA_NA= ME, and its > > > > > > > + defining statement is a subtraction of two signed integer= s, return the > > > > > > > + MINUS_EXPR. Otherwise, return NULL_TREE. */ > > > > > > > + > > > > > > > +static const_tree > > > > > > > +prop_integer_zero_or_minus_expr (const_tree expr) > > > > > > > +{ > > > > > > > + if (integer_zerop (expr)) > > > > > > > + return expr; > > > > > > > + > > > > > > > + if (TREE_CODE (expr) !=3D SSA_NAME) > > > > > > > + return NULL_TREE; > > > > > > > + > > > > > > > + gimple *defining_stmt =3D SSA_NAME_DEF_STMT (expr); > > > > > > > + if (!is_gimple_assign (defining_stmt) > > > > > > > + || gimple_assign_rhs_code (defining_stmt) !=3D MINUS_E= XPR) > > > > > > > + return NULL_TREE; > > > > > > > + > > > > > > > + tree rhs1 =3D gimple_assign_rhs1 (defining_stmt); > > > > > > > + if (!is_signed_integer (rhs1)) > > > > > > > + return NULL_TREE; > > > > > > > + > > > > > > > + tree rhs2 =3D gimple_assign_rhs2 (defining_stmt); > > > > > > > + if (!is_signed_integer (rhs2)) > > > > > > > + return NULL_TREE; > > > > > > > + > > > > > > > + return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2); > > > > > > > +} > > > > > > > + > > > > > > > +/* Optimize: > > > > > > > + > > > > > > > + 1. (x - y) CMP 0 > > > > > > > + 2. 0 CMP (x - y) > > > > > > > + > > > > > > > + to: > > > > > > > + > > > > > > > + 1. x CMP y > > > > > > > + 2. y CMP x > > > > > > > + > > > > > > > + where CMP is either <, <=3D, >, or >=3D. */ > > > > > > > + > > > > > > > +static void > > > > > > > +optimize_signed_comparison (gcond *stmt) > > > > > > > +{ > > > > > > > + const enum tree_code code =3D gimple_cond_code (stmt); > > > > > > > + if (code < LT_EXPR || GE_EXPR < code) > > > > > > > + return; > > > > > > > + > > > > > > > + const_tree lhs =3D prop_integer_zero_or_minus_expr (gimple= _cond_lhs (stmt)); > > > > > > > + if (lhs =3D=3D NULL_TREE) > > > > > > > + return; > > > > > > > + > > > > > > > + const_tree rhs =3D prop_integer_zero_or_minus_expr (gimple= _cond_rhs (stmt)); > > > > > > > + if (rhs =3D=3D NULL_TREE) > > > > > > > + return; > > > > > > > + > > > > > > > + if (TREE_CODE (lhs) =3D=3D MINUS_EXPR && integer_zerop (rh= s)) > > > > > > > + { > > > > > > > + /* Case 1: (x - y) CMP 0 */ > > > > > > > + tree x =3D TREE_OPERAND (lhs, 0); > > > > > > > + tree y =3D TREE_OPERAND (lhs, 1); > > > > > > > + > > > > > > > + /* =3D> x CMP y */ > > > > > > > + gimple_cond_set_lhs (stmt, x); > > > > > > > + gimple_cond_set_rhs (stmt, y); > > > > > > > + update_stmt (stmt); > > > > > > > + } > > > > > > > + else if (integer_zerop (lhs) && TREE_CODE (rhs) =3D=3D MIN= US_EXPR) > > > > > > > + { > > > > > > > + /* Case 2: 0 CMP (x - y) */ > > > > > > > + tree x =3D TREE_OPERAND (rhs, 0); > > > > > > > + tree y =3D TREE_OPERAND (rhs, 1); > > > > > > > + > > > > > > > + /* =3D> y CMP x */ > > > > > > > + gimple_cond_set_lhs (stmt, y); > > > > > > > + gimple_cond_set_rhs (stmt, x); > > > > > > > + update_stmt (stmt); > > > > > > > + } > > > > > > > +} > > > > > > > + > > > > > > > +/* Find signed integer comparisons where the operands are MI= NUS_EXPR and > > > > > > > + 0, and replace them with the equivalent comparison of the= operands of > > > > > > > + the MINUS_EXPR without the subtraction. */ > > > > > > > + > > > > > > > +namespace { > > > > > > > + > > > > > > > +const pass_data pass_data_cmp =3D > > > > > > > +{ > > > > > > > + GIMPLE_PASS, /* type */ > > > > > > > + "cmp", /* name */ > > > > > > > + OPTGROUP_NONE, /* optinfo_flags */ > > > > > > > + TV_TREE_CMP, /* tv_id */ > > > > > > > + PROP_ssa, /* properties_required */ > > > > > > > + 0, /* properties_provided */ > > > > > > > + 0, /* properties_destroyed */ > > > > > > > + 0, /* todo_flags_start */ > > > > > > > + TODO_remove_unused_locals, /* todo_flags_finish */ > > > > > > > +}; > > > > > > > + > > > > > > > +class pass_cmp : public gimple_opt_pass > > > > > > > +{ > > > > > > > +public: > > > > > > > + pass_cmp (gcc::context *ctxt) > > > > > > > + : gimple_opt_pass (pass_data_cmp, ctxt) > > > > > > > + {} > > > > > > > + > > > > > > > + /* opt_pass methods: */ > > > > > > > + bool gate (function *) final override > > > > > > > + { > > > > > > > + /* If -fwrapv is enabled, this pass cannot be safely a= pplied as described > > > > > > > + in the comment at the top of this file. */ > > > > > > > + return flag_tree_cmp !=3D 0 && flag_wrapv =3D=3D 0; > > > > > > > + } > > > > > > > + > > > > > > > + unsigned int execute (function *) final override; > > > > > > > + > > > > > > > +}; // class pass_cmp > > > > > > > + > > > > > > > +class cmp_dom_walker : public dom_walker > > > > > > > +{ > > > > > > > +public: > > > > > > > + cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {} > > > > > > > + > > > > > > > + void after_dom_children (basic_block) final override; > > > > > > > +}; > > > > > > > + > > > > > > > +void > > > > > > > +cmp_dom_walker::after_dom_children (basic_block bb) > > > > > > > +{ > > > > > > > + gimple_stmt_iterator gsi; > > > > > > > + > > > > > > > + for (gsi =3D gsi_after_labels (bb); !gsi_end_p (gsi); gsi_= next (&gsi)) > > > > > > > + { > > > > > > > + gimple *stmt =3D gsi_stmt (gsi); > > > > > > > + if (gimple_code (stmt) =3D=3D GIMPLE_COND) > > > > > > > + optimize_signed_comparison (as_a (stmt)); > > > > > > > + } > > > > > > > +} > > > > > > > + > > > > > > > + > > > > > > > +unsigned int > > > > > > > +pass_cmp::execute (function *) > > > > > > > +{ > > > > > > > + calculate_dominance_info (CDI_DOMINATORS); > > > > > > > + cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > +} // anon namespace > > > > > > > + > > > > > > > +gimple_opt_pass * > > > > > > > +make_pass_cmp (gcc::context *ctxt) > > > > > > > +{ > > > > > > > + return new pass_cmp (ctxt); > > > > > > > +} > > > > > > > -- > > > > > > > 2.44.0 > > > > > > >