Date: Tue, 10 Sep 2019 07:45:00 -0000
From: Jakub Jelinek
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Optimize (A / (cast) (1 << B)) -> (A >> B) (PR middle-end/91680)
Message-ID: <20190910074528.GZ2120@tucnak>

Hi!
The following patch optimizes (A / (cast) (1 << B)) -> (A >> B) in addition
to the already supported (A / (1 << B)) -> (A >> B).  The patch only supports
a same-precision cast (i.e. a sign change) or a widening cast; in the widening
case, either a zero extension (then the extended value is known to be a power
of two and all is fine), or a sign extension provided the first operand has
all upper bits, starting with the sign bit of the narrower type, clear.  That
is because (unsigned long long) (1 << 31) is 0xffffffff80000000ULL and we need
to make sure that dividing by that huge value is equal to >> 31.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-10  Jakub Jelinek

	PR middle-end/91680
	* match.pd ((A / (1 << B)) -> (A >> B)): Allow widening cast from
	the shift type to type.

	* gcc.dg/tree-ssa/pr91680.c: New test.
	* g++.dg/torture/pr91680.C: New test.

--- gcc/match.pd.jj	2019-09-09 17:33:08.551582878 +0200
+++ gcc/match.pd	2019-09-09 20:05:25.966107441 +0200
@@ -305,13 +305,29 @@ (define_operator_list COND_TERNARY
 /* (A / (1 << B)) -> (A >> B).
    Only for unsigned A.  For signed A, this would not preserve rounding
    toward zero.
-   For example: (-1 / ( 1 << B)) != -1 >> B.  */
+   For example: (-1 / ( 1 << B)) != -1 >> B.
+   Also allow widening conversions, like:
+   (A / (unsigned long long) (1U << B)) -> (A >> B)
+   or
+   (A / (unsigned long long) (1 << B)) -> (A >> B).
+   If the left shift is signed, it can be done only if the upper bits
+   of A starting from shift's type sign bit are zero, as
+   (unsigned long long) (1 << 31) is -2147483648ULL, not 2147483648ULL,
+   so it is valid only if A >> 31 is zero.  */
 (simplify
- (trunc_div @0 (lshift integer_onep@1 @2))
+ (trunc_div @0 (convert? (lshift integer_onep@1 @2)))
  (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
       && (!VECTOR_TYPE_P (type)
	   || target_supports_op_p (type, RSHIFT_EXPR, optab_vector)
-	   || target_supports_op_p (type, RSHIFT_EXPR, optab_scalar)))
+	   || target_supports_op_p (type, RSHIFT_EXPR, optab_scalar))
+      && (useless_type_conversion_p (type, TREE_TYPE (@1))
+	   || (element_precision (type) >= element_precision (TREE_TYPE (@1))
+	       && (TYPE_UNSIGNED (TREE_TYPE (@1))
+		   || (element_precision (type)
+		       == element_precision (TREE_TYPE (@1)))
+		   || (get_nonzero_bits (@0)
+		       & wi::mask (element_precision (TREE_TYPE (@1)) - 1, true,
+				   element_precision (type))) == 0))))
  (rshift @0 @2)))
 /* Preserve explicit divisions by 0: the C++ front-end wants to detect
--- gcc/testsuite/gcc.dg/tree-ssa/pr91680.c.jj	2019-09-09 20:18:04.349619827 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr91680.c	2019-09-09 20:18:33.922171856 +0200
@@ -0,0 +1,37 @@
+/* PR middle-end/91680 */
+/* { dg-do compile { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-forwprop1" } */
+/* { dg-final { scan-tree-dump-times " / " 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times " >> " 3 "forwprop1" } } */
+
+__attribute__((noipa)) unsigned long long
+foo (unsigned char x)
+{
+  unsigned long long q = 1 << x;
+  return 256 / q;
+}
+
+__attribute__((noipa)) unsigned long long
+bar (unsigned char x)
+{
+  unsigned long long q = 1U << x;
+  return 256 / q;
+}
+
+__attribute__((noipa)) unsigned long long
+baz (unsigned char x, unsigned long long y)
+{
+  /* This can't be optimized, at least not in C++ and maybe not
+     in C89, because for x == 31 q is -2147483648ULL, not
+     2147483648ULL, and e.g. 2147483648ULL >> 31 is 1, while
+     2147483648ULL / -2147483648ULL is 0.  */
+  unsigned long long q = 1 << x;
+  return y / q;
+}
+
+__attribute__((noipa)) unsigned long long
+qux (unsigned char x, unsigned long long y)
+{
+  unsigned long long q = 1U << x;
+  return y / q;
+}
--- gcc/testsuite/g++.dg/torture/pr91680.C.jj	2019-09-09 20:25:51.775539166 +0200
+++ gcc/testsuite/g++.dg/torture/pr91680.C	2019-09-09 20:26:18.610132667 +0200
@@ -0,0 +1,35 @@
+/* PR middle-end/91680 */
+/* { dg-do run { target { ilp32 || lp64 } } } */
+
+extern "C" void abort ();
+
+#include "../../gcc.dg/tree-ssa/pr91680.c"
+
+int
+main ()
+{
+  unsigned char i;
+  for (i = 0; i < __SIZEOF_INT__ * __CHAR_BIT__; i++)
+    {
+      volatile unsigned long long q = 1 << i;
+      if (foo (i) != 256 / q)
+	abort ();
+      q = 1U << i;
+      if (bar (i) != 256 / q)
+	abort ();
+      q = 1 << i;
+      if (baz (i, (1U << i) - 1) != ((1U << i) - 1) / q)
+	abort ();
+      if (baz (i, 1U << i) != (1U << i) / q)
+	abort ();
+      if (baz (i, -1) != -1 / q)
+	abort ();
+      q = 1U << i;
+      if (qux (i, (1U << i) - 1) != ((1U << i) - 1) / q)
+	abort ();
+      if (qux (i, 1U << i) != (1U << i) / q)
+	abort ();
+      if (qux (i, -1) != -1 / q)
+	abort ();
+    }
+}

	Jakub