Date: Tue, 10 Sep 2019 07:45:00 -0000
From: Jakub Jelinek
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Optimize (A / (cast) (1 << B)) -> (A >> B) (PR middle-end/91680)
Message-ID: <20190910074528.GZ2120@tucnak>

Hi!
The following patch optimizes (A / (cast) (1 << B)) -> (A >> B) in addition
to the already supported (A / (1 << B)) -> (A >> B).  The patch only supports
a same-precision cast (i.e. a sign change) or a widening cast; in the widening
case, either a zero extension (then the extended value is known to be a power
of two and all is fine), or a sign extension provided the first operand has
all upper bits, starting with the sign bit of the narrower type, clear.  That
is because (unsigned long long) (1 << 31) is 0xffffffff80000000ULL and we need
to make sure that dividing by that huge value is equal to >> 31.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-10  Jakub Jelinek

	PR middle-end/91680
	* match.pd ((A / (1 << B)) -> (A >> B)): Allow widening cast from
	the shift type to type.

	* gcc.dg/tree-ssa/pr91680.c: New test.
	* g++.dg/torture/pr91680.C: New test.

--- gcc/match.pd.jj	2019-09-09 17:33:08.551582878 +0200
+++ gcc/match.pd	2019-09-09 20:05:25.966107441 +0200
@@ -305,13 +305,29 @@ (define_operator_list COND_TERNARY
 /* (A / (1 << B)) -> (A >> B).
    Only for unsigned A.  For signed A, this would not preserve rounding
    toward zero.
-   For example: (-1 / ( 1 << B)) != -1 >> B.  */
+   For example: (-1 / ( 1 << B)) != -1 >> B.
+   Also allow widening conversions, like:
+   (A / (unsigned long long) (1U << B)) -> (A >> B)
+   or
+   (A / (unsigned long long) (1 << B)) -> (A >> B).
+   If the left shift is signed, it can be done only if the upper bits
+   of A starting from shift's type sign bit are zero, as
+   (unsigned long long) (1 << 31) is -2147483648ULL, not 2147483648ULL,
+   so it is valid only if A >> 31 is zero.  */
 (simplify
- (trunc_div @0 (lshift integer_onep@1 @2))
+ (trunc_div @0 (convert? (lshift integer_onep@1 @2)))
  (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
       && (!VECTOR_TYPE_P (type)
	   || target_supports_op_p (type, RSHIFT_EXPR, optab_vector)
-	   || target_supports_op_p (type, RSHIFT_EXPR, optab_scalar)))
+	   || target_supports_op_p (type, RSHIFT_EXPR, optab_scalar))
+      && (useless_type_conversion_p (type, TREE_TYPE (@1))
+	   || (element_precision (type) >= element_precision (TREE_TYPE (@1))
+	       && (TYPE_UNSIGNED (TREE_TYPE (@1))
+		   || (element_precision (type)
+		       == element_precision (TREE_TYPE (@1)))
+		   || (get_nonzero_bits (@0)
+		       & wi::mask (element_precision (TREE_TYPE (@1)) - 1, true,
+				   element_precision (type))) == 0))))
  (rshift @0 @2)))
 /* Preserve explicit divisions by 0: the C++ front-end wants to detect
--- gcc/testsuite/gcc.dg/tree-ssa/pr91680.c.jj	2019-09-09 20:18:04.349619827 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr91680.c	2019-09-09 20:18:33.922171856 +0200
@@ -0,0 +1,37 @@
+/* PR middle-end/91680 */
+/* { dg-do compile { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-forwprop1" } */
+/* { dg-final { scan-tree-dump-times " / " 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times " >> " 3 "forwprop1" } } */
+
+__attribute__((noipa)) unsigned long long
+foo (unsigned char x)
+{
+  unsigned long long q = 1 << x;
+  return 256 / q;
+}
+
+__attribute__((noipa)) unsigned long long
+bar (unsigned char x)
+{
+  unsigned long long q = 1U << x;
+  return 256 / q;
+}
+
+__attribute__((noipa)) unsigned long long
+baz (unsigned char x, unsigned long long y)
+{
+  /* This can't be optimized, at least not in C++ and maybe not
+     in C89, because for x == 31 q is -2147483648ULL, not
+     2147483648ULL, and e.g. 2147483648ULL >> 31 is 1, while
+     2147483648ULL / -2147483648ULL is 0.  */
+  unsigned long long q = 1 << x;
+  return y / q;
+}
+
+__attribute__((noipa)) unsigned long long
+qux (unsigned char x, unsigned long long y)
+{
+  unsigned long long q = 1U << x;
+  return y / q;
+}
--- gcc/testsuite/g++.dg/torture/pr91680.C.jj	2019-09-09 20:25:51.775539166 +0200
+++ gcc/testsuite/g++.dg/torture/pr91680.C	2019-09-09 20:26:18.610132667 +0200
@@ -0,0 +1,35 @@
+/* PR middle-end/91680 */
+/* { dg-do run { target { ilp32 || lp64 } } } */
+
+extern "C" void abort ();
+
+#include "../../gcc.dg/tree-ssa/pr91680.c"
+
+int
+main ()
+{
+  unsigned char i;
+  for (i = 0; i < __SIZEOF_INT__ * __CHAR_BIT__; i++)
+    {
+      volatile unsigned long long q = 1 << i;
+      if (foo (i) != 256 / q)
+	abort ();
+      q = 1U << i;
+      if (bar (i) != 256 / q)
+	abort ();
+      q = 1 << i;
+      if (baz (i, (1U << i) - 1) != ((1U << i) - 1) / q)
+	abort ();
+      if (baz (i, 1U << i) != (1U << i) / q)
+	abort ();
+      if (baz (i, -1) != -1 / q)
+	abort ();
+      q = 1U << i;
+      if (qux (i, (1U << i) - 1) != ((1U << i) - 1) / q)
+	abort ();
+      if (qux (i, 1U << i) != (1U << i) / q)
+	abort ();
+      if (qux (i, -1) != -1 / q)
+	abort ();
+    }
+}

	Jakub