From: Christophe Lyon
Date: Tue, 05 Sep 2017 06:38:00 -0000
Subject: Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands
To: Jeff Law
Cc: Richard Biener, GCC Patches

Hi Jeff,

On 3 September 2017 at 16:44, Jeff Law wrote:
> On 01/13/2016 05:30 AM, Richard Biener wrote:
>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law wrote:
>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law wrote:
>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>>
>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>
>>>>>>> Plus doing it in TER is almost certainly more complex than getting
>>>>>>> it right in reassoc to begin with.
>>>>>>
>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>
>>>>> What's best for that expression would depend on factors like whether
>>>>> or not the target can exploit ILP, i.e. (a & b) & (1 & c) exposes more
>>>>> parallelism while (((a & b) & c) & 1) is not good for parallelism, but
>>>>> does expose the bit test.
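As a quick self-contained illustration (the names a, b and c come from the
discussion above; the function names and the exhaustive check are made up),
the three groupings are equivalent in value, and the only difference is
whether the grouping exposes a bit test or two independent ANDs.  A minimal
sketch, buildable as C++:

#include <cassert>

/* Three equivalent groupings of a three-operand AND with the constant 1.  */
static unsigned f_const_first (unsigned a, unsigned b, unsigned c)
{ return ((a & 1) & b) & c; }   /* constant applied first */

static unsigned f_const_last (unsigned a, unsigned b, unsigned c)
{ return ((a & b) & c) & 1; }   /* constant last: maps onto a bit test */

static unsigned f_parallel (unsigned a, unsigned b, unsigned c)
{ return (a & b) & (1 & c); }   /* two independent ANDs: more ILP */

int main ()
{
  /* Exhaustively check a few small values: all three groupings agree.  */
  for (unsigned a = 0; a < 8; a++)
    for (unsigned b = 0; b < 8; b++)
      for (unsigned c = 0; c < 8; c++)
        {
          assert (f_const_first (a, b, c) == f_const_last (a, b, c));
          assert (f_const_last (a, b, c) == f_parallel (a, b, c));
        }
  return 0;
}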
>>>>>
>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>>> there's no ILP or chance of creating a bit test.  My patch shuffles
>>>>> things around, but still doesn't expose the ILP or bit test in the
>>>>> 4 operand case.  Based on the comments in reassoc, it didn't seem like
>>>>> the author thought anything beyond the 3-operand case was worth
>>>>> handling.  So my patch just handles the 3-operand case.
>>>>>
>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>
>>>>> It just wants the constant last in the sequence.  That exposes bit
>>>>> clear, set, flip, test, etc idioms.
>>>>
>>>> But those don't feed another bit operation, right?  Thus we'd like to see
>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>>>> are designed to feed conditionals (aka CC consuming ops)?
>>>
>>> At the gimple level they could feed a conditional, or be part of a series
>>> of ops on an SSA_NAME that eventually gets stored to memory, etc.  At the
>>> RTL level they'll feed CC consumers and bit manipulations of pseudos or
>>> memory.
>>>
>>> For the 3-op case, we always want the constant last.  For the 4-op case
>>> it's less clear.  Though ((a & b) & c) & 1 is certainly better than
>>> ((a & b) & 1) & c.
>>
>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would
>> be to sort constants at the opposite end.
>>
>> That might be too invasive right now but there is another "convenient" place:
>>
>>           /* If the operand vector is now empty, all operands were
>>              consumed by the __builtin_powi optimization.  */
>>           ...
>>           else
>>             {
>>               machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>               int ops_num = ops.length ();
>>               int width = get_reassociation_width (ops_num, rhs_code, mode);
>>               tree new_lhs = lhs;
>>
>>               if (dump_file && (dump_flags & TDF_DETAILS))
>>                 fprintf (dump_file,
>>                          "Width = %d was chosen for reassociation\n", width);
>>
>> at this point you can check rhs_code and move the (single) constant entry
>> in ops (if there is any constant entry) from .last () to the beginning.
>>
>> That'll get the 4 operand case correct as well and properly models
>> "constant last" for the specified operation kind.
>
> Resurrecting an old thread...  Just now getting around to flushing this
> out of the queue.
>
> To recap, given something like (x & y) & C reassociation will turn that
> into (x & C) & y.  It's functionally equivalent, but it will inhibit
> generation of bit test instructions.
>
> I originally hacked up swap_ops_for_binary_stmt.  You requested that
> change be made in reassociate_bb so that it would apply to cases where
> there are more than 3 args.
>
> So that's what this patch does.  OK for the trunk now?
>
> Bootstrapped and regression tested on x86_64.  Also tested the new
> testcase on m68k.
>
>
> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
> Author: Jeff Law
> Date:   Sun Sep 3 10:42:30 2017 -0400
>
>     2017-09-03  Jeff Law
>
>             PR tree-optimization/64910
>             * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>             swap the first and last operand if the last is a constant.
>
>             PR tree-optimization/64910
>             * gcc.dg/tree-ssa/pr64910-2.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3f632ca31c2..2c9a8c8265a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2017-09-03  Jeff Law
> +
> +        PR tree-optimization/64910
> +        * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
> +        swap the first and last operand if the last is a constant.
> +
>  2017-09-01  Segher Boessenkool
>
>          PR rtl-optimization/82024
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 4ead57edfa2..766677c899b 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-09-03  Jeff Law
> +
> +        PR tree-optimization/64910
> +        * gcc.dg/tree-ssa/pr64910-2.c: New test.
> +
>  2017-09-01  Jakub Jelinek
>
>          PR target/81766
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> new file mode 100644
> index 00000000000..2e3d6790776
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> @@ -0,0 +1,85 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
> +
> +/* We want to make sure that we reassociate in a way that has the
> +   constant last.  With the constant last, it's more likely to result
> +   in a bitfield test on targets with such capabilities.  */
> +
> +extern void boo ();
> +
> +int b2b_uc (unsigned char u, unsigned char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_us (unsigned short u, unsigned short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ui (unsigned int u, unsigned int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ul (unsigned long u, unsigned long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ull (unsigned long long u, unsigned long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_sc (signed char u, signed char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ss (signed short u, signed short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_si (signed int u, signed int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sl (signed long u, signed long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sll (signed long long u, signed long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +/* The AND of U & W should go into a temporary, which is then ANDed
> +   with the constant.
> +
> +   First verify that we have the right number of ANDs between U and W.  */
> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
> +
> +/* Then verify that we have the right number of ANDS between a temporary
> +   and the constant.  */
> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */

This part of the new testcase fails on aarch64 & arm:
FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1 "_[0-9]+ & 32;" 10

Christophe

> +
> +/* Each function has one AND.  It will have either a second AND or TEST.  So
> +   we can count the number of AND and TEST instructions.  They must be 2X
> +   the number of test functions in this file.  */
> +/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
> +
> +/* Similarly on the m68k.  The code for the long long tests is suboptimal,
> +   which we catch via the second pattern and xfail.  */
> +/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
> +/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
> +
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index 561acea4dcc..76048196b27 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -5762,6 +5762,18 @@ reassociate_bb (basic_block bb)
>               fprintf (dump_file,
>                        "Width = %d was chosen for reassociation\n", width);
>
> +
> +          /* For binary bit operations, if the last operand in
> +             OPS is a constant, move it to the front.  This
> +             helps ensure that we generate (X & Y) & C rather
> +             than (X & C) & Y.  The former will often match
> +             a canonical bit test when we get to RTL.  */
> +          if ((rhs_code == BIT_AND_EXPR
> +               || rhs_code == BIT_IOR_EXPR
> +               || rhs_code == BIT_XOR_EXPR)
> +              && TREE_CODE (ops.last ()->op) == INTEGER_CST)
> +            std::swap (*ops[0], *ops[ops_num - 1]);
> +
>            if (width > 1
>                && ops.length () > 3)
>              rewrite_expr_tree_parallel (as_a <gassign *> (stmt),
>
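To see the reordering of the hunk above in isolation, here is a minimal
standalone sketch (not GCC code: operand entries are modelled as plain
strings and the INTEGER_CST check as a small helper, both made up for
illustration) of moving a trailing constant operand to the front of the
operand vector:

#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

/* Stand-in for "is this operand an INTEGER_CST?" in this sketch.  */
static bool is_constant (const std::string &op)
{
  return !op.empty () && std::isdigit ((unsigned char) op[0]);
}

int main ()
{
  /* Operands of x & y & z & 1, with the constant sorted last, roughly
     as the reassociation operand vector might hold them.  */
  std::vector<std::string> ops = { "x", "y", "z", "1" };

  /* For bitwise ops, if the last operand is a constant, swap it with
     the first one, mirroring the std::swap in the hunk above.  */
  if (is_constant (ops.back ()))
    std::swap (ops.front (), ops.back ());

  assert (ops.front () == "1" && ops.back () == "x");
  return 0;
}

Per the comment in the patch, putting the constant at the front of the
operand vector is what makes the rebuilt expression come out as
(X & Y) & C rather than (X & C) & Y.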