From: "Roger Sayle"
To: "'GCC Patches'"
Cc: "'Richard Biener'"
Subject: [PATCH] PR tree-optimization/71343: Value number X<<2 as X*4.
Date: Tue, 13 Sep 2022 18:54:58 +0100

This patch is the second part of a fix for PR tree-optimization/71343, implementing Richard Biener's suggestion of using tree-ssa's value numbering instead of match.pd.
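To recap, the missed optimization in the PR is that GCC fails to see that a left shift by a constant and a multiplication by the corresponding power of two compute the same value, so comparisons like the one below are not folded to a constant.  This is test1 from the attached gcc.dg/pr71343-2.c, reproduced here for convenience:

unsigned int test1 (unsigned int a, unsigned int b)
{
  /* Both sides compute the same value, so at -O2 this folds to 1.  */
  return (a << 2) + (b << 2) == a * 4 + b * 4;
}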
The change is that when assigning a value number for the expression X << C, we use the same value number as for the equivalent multiplication X * (1 << C).

gcc/ChangeLog
        PR tree-optimization/71343
        * tree-ssa-sccvn.cc (visit_nary_op) <case LSHIFT_EXPR>: Make the
        value number of the expression X << C the same as the value
        number for the multiplication X * (1<<C).

gcc/testsuite/ChangeLog
        PR tree-optimization/71343
        * gcc.dg/pr71343-2.c: New test case.


> -----Original Message-----
> From: Richard Biener
> Sent: 08 August 2022 12:42
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [PATCH] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C) as (X&Y)<<C.
>
> On Mon, Aug 8, 2022 at 10:07 AM Roger Sayle wrote:
> >
> > This patch resolves PR tree-optimization/71343, a missed-optimization
> > enhancement request where GCC fails to see that (a<<2)+(b<<2) == a*4+b*4.
> > This requires two related (sets of) optimizations to be added to match.pd.
> >
> > The first is that (X<<C) op (Y<<C) can be simplified to (X op Y)<<C,
> > for many binary operators, including AND, IOR, XOR, and (if overflow
> > isn't an issue) PLUS and MINUS.  Likewise, the right shifts (both
> > logical and arithmetic) and bit-wise logical operators can be
> > simplified in a similar fashion.  These all reduce the number of
> > GIMPLE binary operations from 3 to 2, by combining/eliminating a shift
> > operation.
> >
> > The second optimization reflects that the middle-end doesn't impose a
> > canonical form on multiplications by powers of two, vs. left shifts,
> > instead leaving these operations as specified by the programmer unless
> > there's a good reason to change them.  Hence, GIMPLE code may contain
> > the expressions "X * 8" and "X << 3" even though these represent the
> > same value/computation.  The tweak to match.pd is that comparison
> > operations whose operands are equivalent non-canonical expressions can
> > be taught their equivalence.  Hence "(X * 8) == (X << 3)" will always
> > evaluate to true, and "(X<<2) > 4*X" will always evaluate to false.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
>
> +/* Shifts by constants distribute over several binary operations,
> +   hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
> +(for op (plus minus)
> + (simplify
> +  (op (lshift:s @0 INTEGER_CST@1) (lshift:s @2 INTEGER_CST@1))
> +  (if (INTEGRAL_TYPE_P (type)
> +       && TYPE_OVERFLOW_WRAPS (type)
> +       && !TYPE_SATURATING (type)
> +       && tree_fits_shwi_p (@1)
> +       && tree_to_shwi (@1) > 0
> +       && tree_to_shwi (@1) < TYPE_PRECISION (type))
>
> I do wonder why we need to restrict this to shifts by constants?
> Any out-of-bound shift was already there, no?
>
> +/* Some tree expressions are intentionally non-canonical.
> +   We handle the comparison of the equivalent forms here.  */
> +(for cmp (eq le ge)
> + (simplify
> +  (cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && tree_fits_shwi_p (@1)
> +       && tree_to_shwi (@1) > 0
> +       && tree_to_shwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
> +       && wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> +   { constant_boolean_node (true, type); })))
> +
> +(for cmp (ne lt gt)
> + (simplify
> +  (cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && tree_fits_shwi_p (@1)
> +       && tree_to_shwi (@1) > 0
> +       && tree_to_shwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
> +       && wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> +   { constant_boolean_node (false, type); })))
>
> hmm.  I wonder if it makes more sense to handle this in value-numbering.
> tree-ssa-sccvn.cc:visit_nary_op handles some cases that are not exactly
> canonicalization issues but the shift vs mult could be handled there by just
> performing the alternate lookup.  That would also enable CSE and by means of
> that of course the comparisons you do above.
>
> Thanks,
> Richard.
>
> >
> > 2022-08-08  Roger Sayle
> >
> > gcc/ChangeLog
> >         PR tree-optimization/71343
> >         * match.pd (op (lshift @0 @1) (lshift @2 @1)): Optimize the
> >         expression (X<<C) op (Y<<C) to (X op Y)<<C.
> >         (op (rshift @0 @1) (rshift @2 @1)): Likewise, simplify (X>>C)^(Y>>C)
> >         to (X^Y)>>C for binary logical operators, AND, IOR and XOR.
> >         (cmp:c (lshift @0) (mult @1)): Optimize comparisons between
> >         shifts by integer constants and multiplications by powers of 2.
> >
> > gcc/testsuite/ChangeLog
> >         PR tree-optimization/71343
> >         * gcc.dg/pr71343-1.c: New test case.
> >         * gcc.dg/pr71343-2.c: Likewise.
> >
> >
> > Thanks in advance,
> > Roger
> > --

[patchvn.txt]

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 74b8d8d..2644446 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5312,6 +5312,30 @@ visit_nary_op (tree lhs, gassign *stmt)
             }
         }
       break;
+    case LSHIFT_EXPR:
+      /* For X << C, use the value number of X * (1 << C).  */
+      if (INTEGRAL_TYPE_P (type))
+        {
+          tree rhs2 = gimple_assign_rhs2 (stmt);
+          if (TREE_CODE (rhs2) == INTEGER_CST
+              && tree_fits_uhwi_p (rhs2)
+              && tree_to_uhwi (rhs2) < TYPE_PRECISION (type))
+            {
+              wide_int w = wi::set_bit_in_zero (tree_to_uhwi (rhs2),
+                                                TYPE_PRECISION (type));
+              gimple_match_op match_op (gimple_match_cond::UNCOND,
+                                        MULT_EXPR, type, rhs1,
+                                        wide_int_to_tree (type, w));
+              result = vn_nary_build_or_lookup (&match_op);
+              if (result)
+                {
+                  bool changed = set_ssa_val_to (lhs, result);
+                  vn_nary_op_insert_stmt (stmt, result);
+                  return changed;
+                }
+            }
+        }
+      break;
     default:
       break;
     }
diff --git a/gcc/testsuite/gcc.dg/pr71343-2.c b/gcc/testsuite/gcc.dg/pr71343-2.c
new file mode 100644
index 0000000..11800a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr71343-2.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int test1(unsigned int a , unsigned int b)
+{
+  return (a << 2) + (b << 2) == a * 4 + b * 4;
+}
+
+unsigned int test2(unsigned int a , unsigned int b)
+{
+  return (a << 2) + (b << 2) == (a + b) << 2;
+}
+
+unsigned int test3(unsigned int a , unsigned int b)
+{
+  return a * 4 + b * 4 == (a + b) * 4;
+}
+
+unsigned int test4(unsigned int a , unsigned int b)
+{
+  return (a + b) << 2 == (a + b) * 4;
+}
+
+unsigned int test5(unsigned int a , unsigned int b)
+{
+  return (a << 2) + (b << 2) == (a + b) * 4;
+}
+
+unsigned int test6(unsigned int a , unsigned int b)
+{
+  return (a + b) << 2 == a * 4 + b * 4;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 6 "optimized" } } */
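As an aside, the CSE that the alternate lookup enables is not limited to folding the comparisons in the testcase above.  A hypothetical example (illustrative only, not one of the new testsuite files) where giving the shift and the multiplication the same value number lets FRE remove the redundancy:

/* Hypothetical example, not part of the patch: with X << 2 value
   numbered as X * 4, t1 and t2 receive the same value number, so
   t1 - t2 is optimized to zero at -O2.  */
unsigned int redundant (unsigned int a)
{
  unsigned int t1 = a << 2;
  unsigned int t2 = a * 4;
  return t1 - t2;
}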