public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Christophe Lyon <christophe.lyon@linaro.org>
To: Jeff Law <law@redhat.com>
Cc: Richard Biener <richard.guenther@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands
Date: Tue, 05 Sep 2017 06:38:00 -0000	[thread overview]
Message-ID: <CAKdteOaseuPmr8xquhtNMimn_O1dFX45MKreQRDWswwVMwdKcQ@mail.gmail.com> (raw)
In-Reply-To: <0ae5510d-3dc4-7038-ebb2-7291abd22f27@redhat.com>

Hi Jeff,


On 3 September 2017 at 16:44, Jeff Law <law@redhat.com> wrote:
> On 01/13/2016 05:30 AM, Richard Biener wrote:
>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <law@redhat.com> wrote:
>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>>
>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>
>>>>>>
>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>
>>>>>>> Plus doing it in TER is almost certainly more complex than getting it
>>>>>>> right
>>>>>>> in reassoc to begin with.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>
>>>>>
>>>>> What's best for that expression would depend on factors like whether or
>>>>> not
>>>>> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>>> parallelism
>>>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose
>>>>> the
>>>>> bit test.
>>>>>
>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>>> there's
>>>>> no ILP or chance of creating a bit test.  My patch shuffles things
>>>>> around,
>>>>> but still doesn't expose the ILP or bit test in the 4 operand case.
>>>>> Based
>>>>> on the comments in reassoc, it didn't seem like the author thought
>>>>> anything
>>>>> beyond the 3-operand case was worth handling. So my patch just handles
>>>>> the
>>>>> 3-operand case.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>
>>>>>
>>>>> It just wants the constant last in the sequence.  That exposes bit clear,
>>>>> set, flip, test, etc idioms.
>>>>
>>>>
>>>> But those don't feed another bit operation, right?  Thus we'd like to see
>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>>>> are designed to feed conditionals (aka CC consuming ops)?
>>>
>>> At the gimple level they could feed a conditional, or be part of a series of
>>> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
>>> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>>>
>>> For the 3-op case, we always want the constant last.  For the 4-op case it's
>>> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1)
>>> & c.
>>
>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would be
>> to sort constants at the opposite end.
>>
>> That might be too invasive right now but there is another "convenient" place:
>>
>>               /* If the operand vector is now empty, all operands were
>>                  consumed by the __builtin_powi optimization.  */
>> ...
>>               else
>>                 {
>>                   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>                   int ops_num = ops.length ();
>>                   int width = get_reassociation_width (ops_num, rhs_code, mode);
>>                   tree new_lhs = lhs;
>>
>>                   if (dump_file && (dump_flags & TDF_DETAILS))
>>                     fprintf (dump_file,
>>                              "Width = %d was chosen for
>> reassociation\n", width);
>>
>> at this point you can check rhs_code and move the (single) constant
>> entry in ops (if there is any constant entry) from .last () to the beginning.
>>
>> That'll get the 4 operand case correct as well and properly models
>> "constant last" for the specified operation kind.
> Resurrecting an old thread...  Just now getting around to flushing this
> out of the queue.
>
> To recap, given something like (x & y) & C reassociation will turn that
> into (x & C) & y.  It's functionally equivalent, but it will inhibit
> generation of bit test instructions.
>
> I originally hacked up swap_ops_for_binary_stmt.  You requested that
> change be made in reassociate_bb so that it would apply to cases where
> there are more than 3 args.
>
> So that's what this patch does.   OK for the trunk now?
>
> Bootstrapped and regression tested on x86_64.  Also tested the new
> testcase on m68k.
>
>
> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
> Author: Jeff Law <law@torsion.usersys.redhat.com>
> Date:   Sun Sep 3 10:42:30 2017 -0400
>
>     2017-09-03  Jeff Law  <law@redhat.com>
>
>             PR tree-optimization/64910
>             * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>             swap the first and last operand if the last is a constant.
>
>             PR tree-optimization/64910
>             * gcc.dg/tree-ssa/pr64910-2.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3f632ca31c2..2c9a8c8265a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2017-09-03  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/64910
> +       * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
> +       swap the first and last operand if the last is a constant.
> +
>  2017-09-01  Segher Boessenkool  <segher@kernel.crashing.org>
>
>         PR rtl-optimization/82024
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 4ead57edfa2..766677c899b 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-09-03  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/64910
> +       * gcc.dg/tree-ssa/pr64910-2.c: New test.
> +
>  2017-09-01  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/81766
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> new file mode 100644
> index 00000000000..2e3d6790776
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> @@ -0,0 +1,85 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
> +
> +/* We want to make sure that we reassociate in a way that has the
> +   constant last.  With the constant last, it's more likely to result
> +   in a bitfield test on targets with such capabilities.  */
> +
> +extern void boo ();
> +
> +int b2b_uc (unsigned char u, unsigned char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_us (unsigned short u, unsigned short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ui (unsigned int u, unsigned int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ul (unsigned long u, unsigned long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ull (unsigned long long u, unsigned long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_sc (signed char u, signed char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ss (signed short u, signed short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_si (signed int u, signed int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sl (signed long u, signed long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sll (signed long long u, signed long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +/* The AND of U & W should go into a temporary, when is then ANDed
> +   with the constant.
> +
> +   First verify that we have the right number of ANDs between U and W.  */
> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
> +
> +/* Then verify that we have the right number of ANDS between a temporary
> +   and the constant.  */
> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */

This part of the new testcase fails on aarch64 & arm:
FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1
"_[0-9]+ & 32;" 10

Christophe

> +
> +/* Each function has one AND.  It will have either a second AND or TEST.  So
> +   we can count the number of AND and TEST instructions.  They must be 2X
> +   the number of test functions in this file.  */
> +/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
> +
> +/* Similarly on the m68k.  The code for the long long tests is suboptimal,
> +   which catch via the second pattern and xfail.  */
> +/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
> +/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
> +
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index 561acea4dcc..76048196b27 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -5762,6 +5762,18 @@ reassociate_bb (basic_block bb)
>                     fprintf (dump_file,
>                              "Width = %d was chosen for reassociation\n", width);
>
> +
> +                 /* For binary bit operations, if the last operand in
> +                    OPS is a constant, move it to the front.  This
> +                    helps ensure that we generate (X & Y) & C rather
> +                    than (X & C) & Y.  The former will often match
> +                    a canonical bit test when we get to RTL.  */
> +                 if ((rhs_code == BIT_AND_EXPR
> +                      || rhs_code == BIT_IOR_EXPR
> +                      || rhs_code == BIT_XOR_EXPR)
> +                     && TREE_CODE (ops.last ()->op) == INTEGER_CST)
> +                   std::swap (*ops[0], *ops[ops_num - 1]);
> +
>                   if (width > 1
>                       && ops.length () > 3)
>                     rewrite_expr_tree_parallel (as_a <gassign *> (stmt),
>

  parent reply	other threads:[~2017-09-05  6:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-21 23:39 Jeff Law
2016-01-08  9:37 ` Richard Biener
2016-01-08 23:20   ` Jeff Law
2016-01-11 10:32     ` Richard Biener
2016-01-12  5:10       ` Jeff Law
2016-01-12 15:11         ` Richard Biener
2016-01-13  6:40           ` Jeff Law
2016-01-13 12:30             ` Richard Biener
2017-09-03 14:45               ` Jeff Law
2017-09-04  9:36                 ` Richard Biener
2017-09-05  6:38                 ` Christophe Lyon [this message]
2017-09-05 17:26                   ` Jeff Law
2017-09-06  5:21                     ` Jeff Law
2017-09-06  9:26                       ` Jakub Jelinek
2017-10-13 16:37                         ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKdteOaseuPmr8xquhtNMimn_O1dFX45MKreQRDWswwVMwdKcQ@mail.gmail.com \
    --to=christophe.lyon@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).