From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 91499 invoked by alias); 29 Jul 2015 17:19:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 91481 invoked by uid 89); 29 Jul 2015 17:19:05 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 29 Jul 2015 17:18:54 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42431) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1ZKV0G-0002Yl-IB for gcc-patches@gnu.org; Wed, 29 Jul 2015 13:18:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZKV08-0004dA-PG for gcc-patches@gnu.org; Wed, 29 Jul 2015 13:18:52 -0400 Received: from relay1.mentorg.com ([192.94.38.131]:36714) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZKV08-0004aU-Co for gcc-patches@gnu.org; Wed, 29 Jul 2015 13:18:44 -0400 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZKV06-0002Lz-00 from Tom_deVries@mentor.com ; Wed, 29 Jul 2015 10:18:42 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Wed, 29 Jul 2015 18:18:40 +0100 Message-ID: <55B90AD9.9040707@mentor.com> Date: Wed, 29 Jul 2015 17:29:00 -0000 From: Tom de Vries User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Richard Biener CC: "gcc-patches@gnu.org" Subject: Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1 References: <55B24E1F.1070705@mentor.com> <55B770A1.6010308@mentor.com> <55B8B758.8010902@mentor.com> In-Reply-To: Content-Type: multipart/mixed; boundary="------------060502000605070603000504" X-detected-operating-system: by eggs.gnu.org: Windows NT kernel [generic] [fuzzy] X-Received-From: 192.94.38.131 X-SW-Source: 2015-07/txt/msg02487.txt.bz2 --------------060502000605070603000504 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Content-length: 8883 On 29/07/15 14:00, Richard Biener wrote: > On Wed, Jul 29, 2015 at 1:22 PM, Tom de Vries wrote: >> On 29/07/15 10:09, Richard Biener wrote: >>> >>> On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries >>> wrote: >>>> >>>> On 28/07/15 09:59, Richard Biener wrote: >>>>> >>>>> >>>>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries >>>>> >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> this patch allows parallelization and vectorization of reduction >>>>>> operators >>>>>> that are guaranteed to not overflow (such as min and max operators), >>>>>> independent of the overflow behaviour of the type. >>>>>> >>>>>> Bootstrapped and reg-tested on x86_64. >>>>>> >>>>>> OK for trunk? >>>>> >>>>> >>>>> >>>>> Hmm, I don't like that no_overflow_tree_code function. We have a much >>>>> more >>>>> clear understanding which codes may overflow or trap. Thus please add >>>>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} >>>>> like >>>>> >>>> >>>> Done. >>>> >>>>> bool >>>>> operation_overflow_traps (tree type, enum tree_code code) >>>>> { >>>>> if (!ANY_INTEGRAL_TYPE_P (type) >>>> >>>> >>>> >>>> I've changed this test into a gcc_checking_assert. >>>> >>>> >>>>> || !TYPE_OVERFLOW_TRAPS (type)) >>>>> return false; >>>>> switch (code) >>>>> { >>>>> case PLUS_EXPR: >>>>> case MINUS_EXPR: >>>>> case MULT_EXPR: >>>>> case LSHIFT_EXPR: >>>>> /* Can overflow in various ways */ >>>>> case TRUNC_DIV_EXPR: >>>>> case EXACT_DIV_EXPR: >>>>> case FLOOR_DIV_EXPR: >>>>> case CEIL_DIV_EXPR: >>>>> /* For INT_MIN / -1 */ >>>>> case NEGATE_EXPR: >>>>> case ABS_EXPR: >>>>> /* For -INT_MIN */ >>>>> return true; >>>>> default: >>>>> return false; >>>>> } >>>>> } >>>>> >>>>> and similar variants for _wraps and _undefined. I think we decided at >>>>> some point >>>>> the compiler should not take advantage of the fact that lshift or >>>>> *_div have undefined >>>>> behavior on signed integer overflow, similar we only take advantage of >>>>> integral-type >>>>> overflow behavior, not vector or complex. So we could reduce the >>>>> number of cases >>>>> the functions return true if we document that it returns true only for >>>>> the cases where >>>>> the compiler needs to / may assume wrapping behavior does not take >>>>> place. >>>>> As for _traps for example we only have optabs and libfuncs for >>>>> plus,minus,mult,negate >>>>> and abs. >>>> >>>> >>>> >>>> I've tried to capture all of this in the three new functions: >>>> - operation_overflows_and_traps >>>> - operation_no_overflow_or_wraps >>>> - operation_overflows_and_undefined (unused atm) >>>> >>>> I've also added the graphite bit. >>>> >>>> OK for trunk, if bootstrap and reg-test succeeds? >>> >>> >>> +/* Returns true if CODE operating on operands of type TYPE can overflow, >>> and >>> + fwrapv generates trapping insns for CODE. */ >>> >>> ftrapv >>> >> >> Done. >> >>> +bool >>> +operation_overflows_and_traps (tree type, enum tree_code code) >>> +{ >>> >>> operation_overflow_traps >>> >>> is better wording. Meaning that when the operation overflows then it >>> traps. >>> >> >> AFAIU, the purpose of the function is to enable optimizations when it >> returns false, that is: >> - if the operation doesn't overflow, or >> - if the operation overflows, but doesn't trap. >> >> The name operation_overflow_traps does not make clear what it returns when >> the operation doesn't overflow. If the name doesn't make it clear, you need >> to be conservative, that is, return true. Which defies the purpose of the >> function. >> >> I've changed the name to operation_no_trapping_overflow (and inverted logic >> in the function). >> >> But perhaps you want operation_overflow_traps with a conservative return for >> non-overflow operations, and use it like this: >> ... >> else if (INTEGRAL_TYPE_P (type) && check_reduction) >> { >> if (operation_overflows (type, code) >> && operation_overflow_traps (type, code)) >> { >> /* Changing the order of operations changes the semantics. */ >> ... >> ? > > I think operation_no_trapping_overflow has the same wording issue as > operation_overflow_traps but I'm not a native speaker Hmm, I'm also not a native speaker. I think I understand what you mean, if operation_no_trapping_overflow is read with stress on 'trapping', we have the same ambiguity issue. [ Possibility: a more verbose variant, but I hope no ambiguity: operation_can_overflow_and_trap ? ] > so I'll take your > word that operation_no_trapping_overflow is non-ambiguous iff the > operation cannot overflow. > > And no, I didn't mean to use it in combination with operation_overflows. > >>> + /* We don't take advantage of integral type overflow behaviour for >>> complex and >>> + vector types. */ >>> >>> We don't generate instructions that trap on overflow for complex or vector >>> types >>> >> >> Done. >> >>> + if (!INTEGRAL_TYPE_P (type)) >>> + return true; >>> >>> + switch (code) >>> + { >>> + case PLUS_EXPR: >>> + case MINUS_EXPR: >>> + case MULT_EXPR: >>> + case LSHIFT_EXPR: >>> + /* Can overflow in various ways. */ >>> >>> we don't have a trapping optab for lshift >>> >>> + case TRUNC_DIV_EXPR: >>> + case EXACT_DIV_EXPR: >>> + case FLOOR_DIV_EXPR: >>> + case CEIL_DIV_EXPR: >>> >>> nor division. See optabs.c:optab_for_tree_code. I suggest to only return >>> true >>> for those. >>> >> >> Before the logic inversion, we return false for these (And also for >> operators that do not overflow). >> >>> +/* Returns true if CODE operating on operands of type TYPE cannot >>> overflow, or >>> + wraps on overflow. */ >>> + >>> +bool >>> +operation_no_overflow_or_wraps (tree type, enum tree_code code) >>> +{ >>> + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); >>> >>> operation_overflow_wraps () >>> >>> is still my preferred name. >>> >> >> The name operation_overflow_wraps doesn't make clear what it returns if the >> operation doesn't overflow. And I didn't manage to come up with a better >> name sofar. >> >> Btw, I wonder about something like vector max operation. The current >> implementation of operation_no_overflow_or_wraps returns false. Could we do: >> ... >> /* We don't take advantage of integral type overflow behaviour for >> complex and vector types. */ >> if (!INTEGRAL_TYPE_P (type)) >> return !operation_overflows (type, code); >> ... >> ? > > Yes, we can use operation_overflows and the existing overflow macros as well: > > if (!INTEGRAL_TYPE_P (type) > || TYPE_OVERFLOW_WRAPS (type) > || !operation_overflows (type, code)) > For an unsigned vector add, this would evaluate to true. You remarked "similar we only take advantage of integral-type overflow behavior, not vector or complex". Was that remark then limited to exploiting undefined behaviour? > and get rid of operation_overflow_{wraps,undefined} That's not what I meant, but ok. > unless we want to take > advantage of the cases the compiler doesn't take advantage of the overflow > behavior. I thought the purpose of the functions was to specify in one location and clearly documented the situations that the compiler is allowed to take advantage of the overflow behavior. > I think keeping the traps variant separate makes sense because > of the clear facts on what trapping optabs we implement. > >>> +bool >>> +operation_overflow_and_undefined (tree type, enum tree_code code) >>> +{ >>> + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); >>> + >>> >>> operation_overflow_undefined () >>> >> >> The name operation_overflow_undefined doesn't make clear what it returns if >> the operation doesn't overflow. I've changed it into >> operation_undefined_overflow. >> >>> If you like to keep an explicit operation_can_overflow then there is the >>> opportunity to split out the switch statement from >>> operation_overflow_wraps >>> and operation_overflow_undefined. >>> > > Why 'operation_overflows' ... it's operation_can_overflow, it clearly doesn't > always overflow... Done. > Also it doesn't need its type argument (the assert > doesn't make much sense, for example fixed-point types can overflow > as well, likewise real types). Done. > > I'm fine with operation_no_trapping_overflow as in the patch, likewise > with operation_overflows apart from its name. > > operation_no_overflow_or_wraps can be replaced by > ANY_INTEGRAL_TYPE_P () > && (TYPE_OVERFLOW_WRAPS () || !operation_can_overflows (code)) > > conservatively operation_undefined_overflow could be treated the same > and I'd prefer to do it that way for now. > Dropped wrap/undefined functions. OK like this? Thanks, - Tom --------------060502000605070603000504 Content-Type: text/x-patch; name="0001-Allow-non-overflow-ops-in-reductions.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="0001-Allow-non-overflow-ops-in-reductions.patch" Content-length: 8868 Allow non-overflow ops in reductions 2015-07-28 Tom de Vries * tree.c (operation_can_overflow, operation_no_trapping_overflow): New function. * tree.h (operation_can_overflow, operation_no_trapping_overflow): Declare. * tree-vect-loop.c (vect_is_simple_reduction_1): Use operation_no_trapping_overflow. Allow non-overflow operations. * graphite-sese-to-poly.c (is_reduction_operation_p): Allow non-overflow operations. * gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute optimize ("-ftree-parallelize-loops=0"). Add successful scans for 2 detected reductions. Add xfail scans for 3 detected reductions. * gcc.dg/autopar/reduc-2short.c: Same. * gcc.dg/autopar/reduc-8.c (init_arrays): Mark with attribute optimize ("-ftree-parallelize-loops=0"). Add successful scans for 2 detected reductions. * gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min and max reductions. --- gcc/graphite-sese-to-poly.c | 7 ++- gcc/testsuite/gcc.dg/autopar/reduc-2char.c | 10 ++-- gcc/testsuite/gcc.dg/autopar/reduc-2short.c | 10 ++-- gcc/testsuite/gcc.dg/autopar/reduc-8.c | 7 +-- gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c | 2 +- gcc/tree-vect-loop.c | 6 ++- gcc/tree.c | 69 ++++++++++++++++++++++++++ gcc/tree.h | 2 + 8 files changed, 98 insertions(+), 15 deletions(-) diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index c583f16..fdcc790 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -2614,8 +2614,11 @@ is_reduction_operation_p (gimple stmt) if (FLOAT_TYPE_P (type)) return flag_associative_math; - return (INTEGRAL_TYPE_P (type) - && TYPE_OVERFLOW_WRAPS (type)); + if (ANY_INTEGRAL_TYPE_P (type)) + return (TYPE_OVERFLOW_WRAPS (type) + || !operation_can_overflow (code)); + + return false; } /* Returns true when PHI contains an argument ARG. */ diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c index 14867f3..a2dad44 100644 --- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c @@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result) abort (); } - __attribute__((noinline)) - void init_arrays () +void __attribute__((noinline)) + __attribute__((optimize ("-ftree-parallelize-loops=0"))) +init_arrays () { int i; @@ -60,7 +61,10 @@ int main (void) } -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */ + +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */ /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c index 7c19cc5..a50e14f 100644 --- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c @@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result) abort (); } - __attribute__((noinline)) - void init_arrays () +void __attribute__((noinline)) + __attribute__((optimize ("-ftree-parallelize-loops=0"))) +init_arrays () { int i; @@ -58,7 +59,8 @@ int main (void) return 0; } +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */ -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */ /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */ - diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c index 1d05c48..18ba03d 100644 --- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c +++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c @@ -40,7 +40,8 @@ testmin (const T *c, T init, T result) abort (); } -int main (void) +int __attribute__((optimize ("-ftree-parallelize-loops=0"))) +main (void) { static signed char A[N] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, @@ -84,5 +85,5 @@ int main (void) } -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */ -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c index 2129717..86f9b90 100644 --- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c +++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c @@ -46,4 +46,4 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index c31bfbd..59c75af 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2615,7 +2615,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi, } else if (INTEGRAL_TYPE_P (type) && check_reduction) { - if (TYPE_OVERFLOW_TRAPS (type)) + if (!operation_no_trapping_overflow (type, code)) { /* Changing the order of operations changes the semantics. */ if (dump_enabled_p ()) @@ -2624,7 +2624,9 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi, " (overflow traps): "); return NULL; } - if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type)) + if (need_wrapping_integral_overflow + && !TYPE_OVERFLOW_WRAPS (type) + && operation_can_overflow (code)) { /* Changing the order of operations changes the semantics. */ if (dump_enabled_p ()) diff --git a/gcc/tree.c b/gcc/tree.c index 94263af..3c2c20a 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -7597,6 +7597,75 @@ commutative_ternary_tree_code (enum tree_code code) return false; } +/* Returns true if CODE can overflow. */ + +bool +operation_can_overflow (enum tree_code code) +{ + switch (code) + { + case PLUS_EXPR: + case MINUS_EXPR: + case MULT_EXPR: + case LSHIFT_EXPR: + /* Can overflow in various ways. */ + return true; + case TRUNC_DIV_EXPR: + case EXACT_DIV_EXPR: + case FLOOR_DIV_EXPR: + case CEIL_DIV_EXPR: + /* For INT_MIN / -1. */ + return true; + case NEGATE_EXPR: + case ABS_EXPR: + /* For -INT_MIN. */ + return true; + default: + /* These operators cannot overflow. */ + return false; + } +} + +/* Returns true if CODE operating on operands of type TYPE doesn't overflow, or + ftrapv doesn't generate trapping insns for CODE. */ + +bool +operation_no_trapping_overflow (tree type, enum tree_code code) +{ + gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type)); + + /* We don't generate instructions that trap on overflow for complex or vector + types. */ + if (!INTEGRAL_TYPE_P (type)) + return true; + + if (!TYPE_OVERFLOW_TRAPS (type)) + return true; + + switch (code) + { + case PLUS_EXPR: + case MINUS_EXPR: + case MULT_EXPR: + case NEGATE_EXPR: + case ABS_EXPR: + /* These operators can overflow, and -ftrapv generates trapping code for + these. */ + return false; + case TRUNC_DIV_EXPR: + case EXACT_DIV_EXPR: + case FLOOR_DIV_EXPR: + case CEIL_DIV_EXPR: + case LSHIFT_EXPR: + /* These operators can overflow, but -ftrapv does not generate trapping + code for these. */ + return true; + default: + /* These operators cannot overflow. */ + return true; + } +} + namespace inchash { diff --git a/gcc/tree.h b/gcc/tree.h index 6df2217..d280ea7 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -4369,6 +4369,8 @@ extern int type_num_arguments (const_tree); extern bool associative_tree_code (enum tree_code); extern bool commutative_tree_code (enum tree_code); extern bool commutative_ternary_tree_code (enum tree_code); +extern bool operation_can_overflow (enum tree_code); +extern bool operation_no_trapping_overflow (tree, enum tree_code); extern tree upper_bound_in_type (tree, tree); extern tree lower_bound_in_type (tree, tree); extern int operand_equal_for_phi_arg_p (const_tree, const_tree); -- 1.9.1 --------------060502000605070603000504--