public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits
@ 2021-09-08 11:55 npiggin at gmail dot com
  2021-09-09 19:40 ` [Bug target/102239] " segher at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: npiggin at gmail dot com @ 2021-09-08 11:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

            Bug ID: 102239
           Summary: powerpc suboptimal boolean test of contiguous bits
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: npiggin at gmail dot com
  Target Milestone: ---
            Target: powerpc64le-linux-gnu

gcc version 11.2.1 20210815 (Debian 11.2.0-2) 

Build flags -O2

--- test.c ---
void foo(long arg)
{
        if (arg & ((1UL << 33) | (1UL << 34)))
                asm volatile("# if");
        else
                asm volatile("# else");
}
---

generates:

foo:
        rldicr 3,3,29,1
        srdi. 3,3,29
        beq 0,.L6
        # if
        blr
.L6:
        # else
        blr

This test of multiple contiguous bits could be tested with a single instruction
with rldicr. or rldicl.

Those instructions do tend to be more expensive than the Rc=0 form (C2 cracked
on POWER9, slower pipe class on POWER10), but I think they should be better
than the current 2-instruction sequence.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
@ 2021-09-09 19:40 ` segher at gcc dot gnu.org
  2021-11-24  7:17 ` luoxhu at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: segher at gcc dot gnu.org @ 2021-09-09 19:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-09-09
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Confirmed.

So the relevant insn

(parallel [(set (reg:CC 123)
                (compare:CC (and:DI (reg:DI 124)
                                    (const_int 25769803776 [0x600000000]))
                            (const_int 0 [0])))
           (clobber (scratch:DI))])

is matched by *and<mode>3_2insn but not by any pattern that ends up as just
one insn.  Not *and<mode>3_mask_dot, because that doesn't do a shift first,
is just an AND and there are no machine insns to do that; but there is no
pattern for what we can do.

*rotl<mode>3_mask_dot cannot do this either; the base and the dot2 of that
cannot be done, they return a shifted result, but that doesn't matter for
comparing it to 0.  So we should add a specialised version.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
  2021-09-09 19:40 ` [Bug target/102239] " segher at gcc dot gnu.org
@ 2021-11-24  7:17 ` luoxhu at gcc dot gnu.org
  2021-11-24 23:22 ` segher at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-11-24  7:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luoxhu at gcc dot gnu.org

--- Comment #2 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #1)
> Confirmed.
> 
> So the relevant insn
> 
> (parallel [(set (reg:CC 123)
>                 (compare:CC (and:DI (reg:DI 124)
>                                     (const_int 25769803776 [0x600000000]))
>                             (const_int 0 [0])))
>            (clobber (scratch:DI))])
> 
> is matched by *and<mode>3_2insn but not by any pattern that ends up as just
> one insn.  Not *and<mode>3_mask_dot, because that doesn't do a shift first,
> is just an AND and there are no machine insns to do that; but there is no
> pattern for what we can do.
> 
> *rotl<mode>3_mask_dot cannot do this either; the base and the dot2 of that
> cannot be done, they return a shifted result, but that doesn't matter for
> comparing it to 0.  So we should add a specialised version.

Seems different with what you describe, in combine, it was combined to
anddi3_2insn_dot:

(insn 9 8 10 2 (parallel [
            (set (reg:CC 122)
                (compare:CC (and:DI (reg:DI 123)
                        (const_int 25769803776 [0x600000000]))
                    (const_int 0 [0])))
            (clobber (scratch:DI))
        ]) "pr102239.c":3:6 210 {*anddi3_2insn_dot}
     (expr_list:REG_DEAD (reg:DI 123)
        (nil)))
(jump_insn 10 9 11 2 (set (pc)
        (if_then_else (eq (reg:CC 122)
                (const_int 0 [0]))
            (label_ref 15)
            (pc))) "pr102239.c":3:6 868 {*cbranch}
     (expr_list:REG_DEAD (reg:CC 122)
        (int_list:REG_BR_PROB 536870916 (nil)))



Then in pr102239.c.302r.split2, it is split by "*and<mode>3_2insn_dot" to
rotldi3_mask+lshrdi3_dot:

Splitting with gen_split_80 (rs6000.md:3721)

(insn 32 8 33 2 (set (reg:DI 3 3 [124])
        (and:DI (ashift:DI (reg:DI 3 3 [123])
                (const_int 29 [0x1d]))
            (const_int -4611686018427387904 [0xc000000000000000])))
"pr102239.c":3:6 236 {*rotldi3_mask}
     (nil))
(insn 33 32 10 2 (parallel [
            (set (reg:CC 100 0 [122])
                (compare:CC (lshiftrt:DI (reg:DI 3 3 [124])
                        (const_int 29 [0x1d]))
                    (const_int 0 [0])))
            (clobber (reg:DI 3 3 [124]))
        ]) "pr102239.c":3:6 281 {*lshrdi3_dot}
     (nil))


Why this difference happens?

0x600000000 is not a valid mask for anddi3_2insn_dot:


 "(<MODE>mode == Pmode || UINTVAL (operands[2]) <= 0x7fffffff)
   && rs6000_is_valid_2insn_and (operands[2], <MODE>mode)
   && !(rs6000_is_valid_and_mask (operands[2], <MODE>mode)
        || logical_const_operand (operands[2], <MODE>mode))"


(gdb) p UINTVAL (operands[2]) <= 0x7fffffff
$84 = false
(gdb) p rs6000_is_valid_2insn_and (operands[2], E_DImode)
$85 = true
(gdb) p logical_const_operand (operands[2], E_DImode)
$86 = false
(gdb) p rs6000_is_valid_and_mask (operands[2], E_DImode)
$87 = false
(gdb) p Pmode
$88 = DImode

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
  2021-09-09 19:40 ` [Bug target/102239] " segher at gcc dot gnu.org
  2021-11-24  7:17 ` luoxhu at gcc dot gnu.org
@ 2021-11-24 23:22 ` segher at gcc dot gnu.org
  2021-11-26  8:50 ` luoxhu at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: segher at gcc dot gnu.org @ 2021-11-24 23:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to luoxhu from comment #2)
> (In reply to Segher Boessenkool from comment #1)
> > Confirmed.
> > 
> > So the relevant insn
> > 
> > (parallel [(set (reg:CC 123)
> >                 (compare:CC (and:DI (reg:DI 124)
> >                                     (const_int 25769803776 [0x600000000]))
> >                             (const_int 0 [0])))
> >            (clobber (scratch:DI))])
> > 
> > is matched by *and<mode>3_2insn but not by any pattern that ends up as just
> > one insn.  Not *and<mode>3_mask_dot, because that doesn't do a shift first,
> > is just an AND and there are no machine insns to do that; but there is no
> > pattern for what we can do.
> > 
> > *rotl<mode>3_mask_dot cannot do this either; the base and the dot2 of that
> > cannot be done, they return a shifted result, but that doesn't matter for
> > comparing it to 0.  So we should add a specialised version.
> 
> Seems different with what you describe, in combine, it was combined to
> anddi3_2insn_dot:
> 
> (insn 9 8 10 2 (parallel [
>             (set (reg:CC 122)
>                 (compare:CC (and:DI (reg:DI 123)
>                         (const_int 25769803776 [0x600000000]))
>                     (const_int 0 [0])))
>             (clobber (scratch:DI))
>         ]) "pr102239.c":3:6 210 {*anddi3_2insn_dot}
>      (expr_list:REG_DEAD (reg:DI 123)
>         (nil)))
> (jump_insn 10 9 11 2 (set (pc)
>         (if_then_else (eq (reg:CC 122)
>                 (const_int 0 [0]))
>             (label_ref 15)
>             (pc))) "pr102239.c":3:6 868 {*cbranch}
>      (expr_list:REG_DEAD (reg:CC 122)
>         (int_list:REG_BR_PROB 536870916 (nil)))
> 
> 
> 
> Then in pr102239.c.302r.split2, it is split by "*and<mode>3_2insn_dot" to
> rotldi3_mask+lshrdi3_dot:
> 
> Splitting with gen_split_80 (rs6000.md:3721)
> 
> (insn 32 8 33 2 (set (reg:DI 3 3 [124])
>         (and:DI (ashift:DI (reg:DI 3 3 [123])
>                 (const_int 29 [0x1d]))
>             (const_int -4611686018427387904 [0xc000000000000000])))
> "pr102239.c":3:6 236 {*rotldi3_mask}
>      (nil))
> (insn 33 32 10 2 (parallel [
>             (set (reg:CC 100 0 [122])
>                 (compare:CC (lshiftrt:DI (reg:DI 3 3 [124])
>                         (const_int 29 [0x1d]))
>                     (const_int 0 [0])))
>             (clobber (reg:DI 3 3 [124]))
>         ]) "pr102239.c":3:6 281 {*lshrdi3_dot}
>      (nil))
> 
> 
> Why this difference happens?

What difference?  It is split by the same pattern that matched it, and that
is the 2insn pattern.  I'm not sure what problem you see?

> 0x600000000 is not a valid mask for anddi3_2insn_dot:

It should be though!

>  "(<MODE>mode == Pmode || UINTVAL (operands[2]) <= 0x7fffffff)
>    && rs6000_is_valid_2insn_and (operands[2], <MODE>mode)
>    && !(rs6000_is_valid_and_mask (operands[2], <MODE>mode)
> 	|| logical_const_operand (operands[2], <MODE>mode))"
> 
> 
> (gdb) p UINTVAL (operands[2]) <= 0x7fffffff
> $84 = false

But we have Pmode here so all is fine.

> (gdb) p rs6000_is_valid_2insn_and (operands[2], E_DImode)
> $85 = true
> (gdb) p logical_const_operand (operands[2], E_DImode)
> $86 = false
> (gdb) p rs6000_is_valid_and_mask (operands[2], E_DImode)
> $87 = false
> (gdb) p Pmode
> $88 = DImode

Yes, and <MODE>mode is DImode as well.

My point is that we do not have any pattern that can recognise this that
will end up as just one machine insn.  anddi3_2insn_dot split to two
machine insns, and it should: its name indicates we want that!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (2 preceding siblings ...)
  2021-11-24 23:22 ` segher at gcc dot gnu.org
@ 2021-11-26  8:50 ` luoxhu at gcc dot gnu.org
  2021-11-26 15:53 ` segher at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-11-26  8:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #4 from luoxhu at gcc dot gnu.org ---
Simply adjust the sequence of dot instruction could produce expected code, is
this correct?


foo:
.LFB0:
        .cfi_startproc
        rldicr. 3,3,29,1
        beq 0,.L2
#APP
 # 10 "pr102239.c" 1
        # if
 # 0 "" 2
#NO_APP
        blr
        .p2align 4,,15
.L2:
#APP
 # 12 "pr102239.c" 1
        # else
 # 0 "" 2
#NO_APP
        blr



 git diff
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c9ce0550df1..2f0b5992bbf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -11749,9 +11749,9 @@ rs6000_emit_2insn_and (machine_mode mode, rtx
*operands, bool expand, int dot)
        {
          rtx tmp = gen_rtx_ASHIFT (mode, operands[1], GEN_INT (shift));
          tmp = gen_rtx_AND (mode, tmp, GEN_INT (val << shift));
-         emit_move_insn (operands[0], tmp);
-         tmp = gen_rtx_LSHIFTRT (mode, operands[0], GEN_INT (shift));
          rs6000_emit_dot_insn (operands[0], tmp, dot, dot ? operands[3] : 0);
+         tmp = gen_rtx_LSHIFTRT (mode, operands[0], GEN_INT (shift));
+         emit_move_insn (operands[0], tmp);
        }
       return;
     }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (3 preceding siblings ...)
  2021-11-26  8:50 ` luoxhu at gcc dot gnu.org
@ 2021-11-26 15:53 ` segher at gcc dot gnu.org
  2021-11-29  3:31 ` luoxhu at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: segher at gcc dot gnu.org @ 2021-11-26 15:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to luoxhu from comment #4)
> Simply adjust the sequence of dot instruction could produce expected code,
> is this correct?

No it isn't.  Sorry.

> foo:
> .LFB0:
>         .cfi_startproc
>         rldicr. 3,3,29,1
>         beq 0,.L2

This is fine, but only because it tests the EQ bit (not the LT or GT bits).
So the generated RTL for this insn (the 2insn one) is not correct.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (4 preceding siblings ...)
  2021-11-26 15:53 ` segher at gcc dot gnu.org
@ 2021-11-29  3:31 ` luoxhu at gcc dot gnu.org
  2021-11-29  6:36 ` luoxhu at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-11-29  3:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #6 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #5)
> (In reply to luoxhu from comment #4)
> > Simply adjust the sequence of dot instruction could produce expected code,
> > is this correct?
> 
> No it isn't.  Sorry.

Sorry I don't understand what is wrong...

> 
> > foo:
> > .LFB0:
> >         .cfi_startproc
> >         rldicr. 3,3,29,1
> >         beq 0,.L2
> 
> This is fine, but only because it tests the EQ bit (not the LT or GT bits).
> So the generated RTL for this insn (the 2insn one) is not correct.

The generated RTL in pr102239.c.300r.split2 is:

(insn 32 8 33 2 (parallel [
            (set (reg:CC 100 0 [123])
                (compare:CC (and:DI (ashift:DI (reg:DI 3 3 [124])
                            (const_int 29 [0x1d]))
                        (const_int -4611686018427387904 [0xc000000000000000]))
                    (const_int 0 [0])))
            (clobber (reg:DI 3 3 [125]))
        ]) "pr102239.c":4:6 238 {*rotldi3_mask_dot}
     (nil))
(insn 33 32 10 2 (set (reg:DI 3 3 [125])
        (lshiftrt:DI (reg:DI 3 3 [125])
            (const_int 29 [0x1d]))) "pr102239.c":4:6 278 {lshrdi3}
     (nil))
(jump_insn 10 33 11 2 (set (pc)
        (if_then_else (eq (reg:CC 100 0 [123])
                (const_int 0 [0]))
            (label_ref 15)
            (pc))) "pr102239.c":4:6 868 {*cbranch}
     (int_list:REG_BR_PROB 536870916 (nil))
 -> 15)


rotldi3_mask_dot is what you mentioned in c#1, it is a shifted result and not
matter for comparing to 0:

> *rotl<mode>3_mask_dot cannot do this either; the base and the dot2 of that
> cannot be done, they return a shifted result, but that doesn't matter for
> comparing it to 0.  So we should add a specialised version.

What specialized version to add?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (5 preceding siblings ...)
  2021-11-29  3:31 ` luoxhu at gcc dot gnu.org
@ 2021-11-29  6:36 ` luoxhu at gcc dot gnu.org
  2021-11-29 10:05 ` segher at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-11-29  6:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #7 from luoxhu at gcc dot gnu.org ---
 1| Dump of assembler code for function foo:
 2|    0x00000000100005e0 <+0>: rldicr. r3,r3,29,1
 3+>   0x00000000100005e4 <+4>: beq     0x100005f0 <foo+16>
 4|    0x00000000100005e8 <+8>: blr
 5|    0x00000000100005ec <+12>:    ori     r2,r2,0
 6|    0x00000000100005f0 <+16>:    blr
 7|    0x00000000100005f4 <+20>:    .long 0x0
 8|    0x00000000100005f8 <+24>:    .long 0x0

(gdb) si
0x00000000100005e4 in foo ()
1: /x $r3 = 0xc000000000000000
2: /x $cr = 0x82000282

cr0 is negative if only rotldi3_mask_dot, but it was 0x42000282 on master code.


BTW, clang also generated instructions with two rorates:

foo(long):                                # @foo(long)
        rldicl 3, 3, 31, 33
        rldicl. 3, 3, 33, 29
        beq     0, .LBB0_2
        blr
.LBB0_2:
        blr
        .long   0
        .quad   0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (6 preceding siblings ...)
  2021-11-29  6:36 ` luoxhu at gcc dot gnu.org
@ 2021-11-29 10:05 ` segher at gcc dot gnu.org
  2021-11-30  5:28 ` luoxhu at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: segher at gcc dot gnu.org @ 2021-11-29 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to luoxhu from comment #6)
> > > foo:
> > > .LFB0:
> > >         .cfi_startproc
> > >         rldicr. 3,3,29,1
> > >         beq 0,.L2
> > 
> > This is fine, but only because it tests the EQ bit (not the LT or GT bits).
> > So the generated RTL for this insn (the 2insn one) is not correct.
> 
> The generated RTL in pr102239.c.300r.split2 is:
> 
> (insn 32 8 33 2 (parallel [
>             (set (reg:CC 100 0 [123])
>                 (compare:CC (and:DI (ashift:DI (reg:DI 3 3 [124])
>                             (const_int 29 [0x1d]))
>                         (const_int -4611686018427387904
> [0xc000000000000000]))
>                     (const_int 0 [0])))
>             (clobber (reg:DI 3 3 [125]))
>         ]) "pr102239.c":4:6 238 {*rotldi3_mask_dot}
>      (nil))
> (insn 33 32 10 2 (set (reg:DI 3 3 [125])
>         (lshiftrt:DI (reg:DI 3 3 [125])
>             (const_int 29 [0x1d]))) "pr102239.c":4:6 278 {lshrdi3}
>      (nil))
> (jump_insn 10 33 11 2 (set (pc)
>         (if_then_else (eq (reg:CC 100 0 [123])
>                 (const_int 0 [0]))
>             (label_ref 15)
>             (pc))) "pr102239.c":4:6 868 {*cbranch}
>      (int_list:REG_BR_PROB 536870916 (nil))
>  -> 15)

So combine will have to look at insn 10 as well when it does the combination
(it often already does, via "other_insn") -- but also it does have to know
an "eq" is okay here, and that requires a new pattern.

> rotldi3_mask_dot is what you mentioned in c#1, it is a shifted result and
> not matter for comparing to 0:

It does matter, if what you are want to see is if it is smaller than zero or
greater than zero.  CCmode includes those things.  There is a CCEQmode for
if only the EQ bit is set correctly.

> > *rotl<mode>3_mask_dot cannot do this either; the base and the dot2 of that
> > cannot be done, they return a shifted result, but that doesn't matter for
> > comparing it to 0.  So we should add a specialised version.
> 
> What specialized version to add?

Some pattern that just does this as an rldicr, as a single insn.  It will
have to be excluded by the 2insn thing (it is only a single insn itself!),
and it will have to have comparison mode CCEQ only.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (7 preceding siblings ...)
  2021-11-29 10:05 ` segher at gcc dot gnu.org
@ 2021-11-30  5:28 ` luoxhu at gcc dot gnu.org
  2021-11-30 22:59 ` segher at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-11-30  5:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #9 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #8)
> (In reply to luoxhu from comment #6)
> > > > foo:
> > > > .LFB0:
> > > >         .cfi_startproc
> > > >         rldicr. 3,3,29,1
> > > >         beq 0,.L2
> > > 
> > > This is fine, but only because it tests the EQ bit (not the LT or GT bits).
> > > So the generated RTL for this insn (the 2insn one) is not correct.
> > 
> > The generated RTL in pr102239.c.300r.split2 is:
> > 
> > (insn 32 8 33 2 (parallel [
> >             (set (reg:CC 100 0 [123])
> >                 (compare:CC (and:DI (ashift:DI (reg:DI 3 3 [124])
> >                             (const_int 29 [0x1d]))
> >                         (const_int -4611686018427387904
> > [0xc000000000000000]))
> >                     (const_int 0 [0])))
> >             (clobber (reg:DI 3 3 [125]))
> >         ]) "pr102239.c":4:6 238 {*rotldi3_mask_dot}
> >      (nil))
> > (insn 33 32 10 2 (set (reg:DI 3 3 [125])
> >         (lshiftrt:DI (reg:DI 3 3 [125])
> >             (const_int 29 [0x1d]))) "pr102239.c":4:6 278 {lshrdi3}
> >      (nil))
> > (jump_insn 10 33 11 2 (set (pc)
> >         (if_then_else (eq (reg:CC 100 0 [123])
> >                 (const_int 0 [0]))
> >             (label_ref 15)
> >             (pc))) "pr102239.c":4:6 868 {*cbranch}
> >      (int_list:REG_BR_PROB 536870916 (nil))
> >  -> 15)
> 
> So combine will have to look at insn 10 as well when it does the combination
> (it often already does, via "other_insn") -- but also it does have to know
> an "eq" is okay here, and that requires a new pattern.
> 
> > rotldi3_mask_dot is what you mentioned in c#1, it is a shifted result and
> > not matter for comparing to 0:
> 
> It does matter, if what you are want to see is if it is smaller than zero or
> greater than zero.  CCmode includes those things.  There is a CCEQmode for
> if only the EQ bit is set correctly.

Got it, thanks. As the example in c#7.  If CCmode is LT, rotate data to highest
bits will get negative result and set CR0 to negative, which is unexpected. 


> 
> > > *rotl<mode>3_mask_dot cannot do this either; the base and the dot2 of that
> > > cannot be done, they return a shifted result, but that doesn't matter for
> > > comparing it to 0.  So we should add a specialised version.
> > 
> > What specialized version to add?
> 
> Some pattern that just does this as an rldicr, as a single insn.  It will
> have to be excluded by the 2insn thing (it is only a single insn itself!),
> and it will have to have comparison mode CCEQ only.


I was motivated by the clang code, and tried to rotate the data to LSB instead,
it doesn't suffer from CCmode issue again?  Will this be simpler than the
combine & new pattern solution?

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c9ce0550df1..d2a5b916b1d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -11747,11 +11747,11 @@ rs6000_emit_2insn_and (machine_mode mode, rtx
*operands, bool expand, int dot)
        }
       else
        {
-         rtx tmp = gen_rtx_ASHIFT (mode, operands[1], GEN_INT (shift));
-         tmp = gen_rtx_AND (mode, tmp, GEN_INT (val << shift));
-         emit_move_insn (operands[0], tmp);
-         tmp = gen_rtx_LSHIFTRT (mode, operands[0], GEN_INT (shift));
+         rtx tmp = gen_rtx_LSHIFTRT (mode, operands[1], GEN_INT (ne));
+         tmp = gen_rtx_AND (mode, tmp, GEN_INT (val >> ne));
          rs6000_emit_dot_insn (operands[0], tmp, dot, dot ? operands[3] : 0);
+         tmp = gen_rtx_ASHIFT (mode, operands[0], GEN_INT (ne));
+         emit_move_insn (operands[0], tmp);
        }
       return;


RTL  pr102239.c.300r.split2:

(insn 32 8 33 2 (parallel [
            (set (reg:CC 100 0 [123])
                (compare:CC (and:DI (lshiftrt:DI (reg:DI 3 3 [124])
                            (const_int 33 [0x21]))
                        (const_int 3 [0x3]))
                    (const_int 0 [0])))
            (clobber (reg:DI 3 3 [125]))
        ]) "pr102239.c":4:6 238 {*rotldi3_mask_dot}
     (nil))
(insn 33 32 10 2 (set (reg:DI 3 3 [125])
        (ashift:DI (reg:DI 3 3 [125])
            (const_int 33 [0x21]))) "pr102239.c":4:6 268 {ashldi3}
     (nil))
(jump_insn 10 33 11 2 (set (pc)
        (if_then_else (eq (reg:CC 100 0 [123])
                (const_int 0 [0]))
            (label_ref 15)
            (pc))) "pr102239.c":4:6 868 {*cbranch}
     (int_list:REG_BR_PROB 536870916 (nil))
 -> 15)


ASM pr102239.s:

foo:
.LFB0:
        .cfi_startproc
        rldicl. 3,3,31,62
        beq 0,.L2
#APP
 # 5 "pr102239.c" 1
        # if
 # 0 "" 2
#NO_APP
        blr
        .p2align 4,,15
.L2:
#APP

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (8 preceding siblings ...)
  2021-11-30  5:28 ` luoxhu at gcc dot gnu.org
@ 2021-11-30 22:59 ` segher at gcc dot gnu.org
  2021-12-01  7:46 ` luoxhu at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: segher at gcc dot gnu.org @ 2021-11-30 22:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #10 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to luoxhu from comment #9)
> > It does matter, if what you are want to see is if it is smaller than zero or
> > greater than zero.  CCmode includes those things.  There is a CCEQmode for
> > if only the EQ bit is set correctly.
> 
> Got it, thanks. As the example in c#7.  If CCmode is LT, rotate data to
> highest bits will get negative result and set CR0 to negative, which is
> unexpected. 

CCmode is all three (or four, for non-fast FP) comparison results at once.
You can use for example both LT and EQ on the same result.  In CCEQmode
only the EQ bit is valid.

> > Some pattern that just does this as an rldicr, as a single insn.  It will
> > have to be excluded by the 2insn thing (it is only a single insn itself!),
> > and it will have to have comparison mode CCEQ only.
> 
> I was motivated by the clang code, and tried to rotate the data to LSB
> instead, it doesn't suffer from CCmode issue again?  Will this be simpler
> than the combine & new pattern solution?

It is incorrect, in a similar way.

It also is sub-optimal, it is better to tell GCC that this pattern will be
only one machine insn, so that it can consider that when choosing the best
insn patterns to use.

Also note the comment right before this code:

  /* If it is one stretch of ones, it is DImode; shift left, mask, then
     shift right.  This generates better code than doing the masks without
     shifts, or shifting first right and then left.  */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (9 preceding siblings ...)
  2021-11-30 22:59 ` segher at gcc dot gnu.org
@ 2021-12-01  7:46 ` luoxhu at gcc dot gnu.org
  2022-01-11  9:23 ` cvs-commit at gcc dot gnu.org
  2022-01-12  0:27 ` luoxhu at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-12-01  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #11 from luoxhu at gcc dot gnu.org ---


+(define_insn_and_split "*anddi3_insn_dot"
+ [(set (pc)
+    (if_then_else (eq (and:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r")
+                             (match_operand:DI 2 "const_int_operand" "n,n"))
+                     (const_int 0))
+                 (label_ref (match_operand 3 ""))
+                 (pc)))
+  (clobber (match_scratch:DI 0 "=r,r"))]
+  "rs6000_is_valid_2insn_and (operands[2], DImode)
+   && !(rs6000_is_valid_and_mask (operands[2], DImode)
+       || logical_const_operand (operands[2], DImode))"
+  "#"
+  "&& reload_completed"
+  [(pc)]
+{
+   int nb, ne;
+   if (rs6000_is_valid_mask (operands[2], &nb, &ne, DImode) && nb >= ne)
+     {
+       unsigned HOST_WIDE_INT val = INTVAL (operands[2]);
+       int shift = 63 - nb;
+       rtx tmp = gen_rtx_ASHIFT (DImode, operands[1], GEN_INT (shift));
+       tmp = gen_rtx_AND (DImode, tmp, GEN_INT (val << shift));
+       rtx cr0 = gen_rtx_REG (CCmode, CR0_REGNO);
+       rs6000_emit_dot_insn (operands[0], tmp, 1, cr0);
+       rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[3]);
+       rtx cond = gen_rtx_EQ (CCEQmode, cr0, const0_rtx);
+       rtx ite = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, loc_ref, pc_rtx);
+       emit_jump_insn (gen_rtx_SET (pc_rtx, ite));
+       DONE;
+     }
+   else
+     FAIL;
+}
+  [(set_attr "type" "shift")
+   (set_attr "dot" "yes")
+   (set_attr "length" "8,12")])
+


This pattern could combine the two instructions from

     9: {r123:CC=cmp(r124:DI&0x600000000,0);clobber scratch;}
           REG_DEAD r124:DI
     10: pc={(r123:CC==0)?L15:pc}
          REG_DEAD r123:CC

to:

     10: {pc={(r124:DI&0x600000000==0)?L15:pc};clobber scratch;}

then split2 will split it to one rotate dot instruction, is this OK?


(insn 32 9 33 2 (parallel [
            (set (reg:CC 100 0)
                (compare:CC (and:DI (ashift:DI (reg:DI 3 3 [124])
                            (const_int 29 [0x1d]))
                        (const_int -4611686018427387904 [0xc000000000000000]))
                    (const_int 0 [0])))
            (clobber (reg:DI 3 3 [125]))
        ]) "pr102239.c":4:6 239 {*rotldi3_mask_dot}
     (nil))
(jump_insn 33 32 11 2 (set (pc)
        (if_then_else (eq:CCEQ (reg:CC 100 0)
                (const_int 0 [0]))
            (label_ref 15)
            (pc))) "pr102239.c":4:6 869 {*cbranch}
     (int_list:REG_BR_PROB 536870916 (nil))
 -> 15)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (10 preceding siblings ...)
  2021-12-01  7:46 ` luoxhu at gcc dot gnu.org
@ 2022-01-11  9:23 ` cvs-commit at gcc dot gnu.org
  2022-01-12  0:27 ` luoxhu at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-11  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xiong Hu Luo <luoxhu@gcc.gnu.org>:

https://gcc.gnu.org/g:19d81fda48f30c4fc11c8912749351acd9159c17

commit r12-6433-g19d81fda48f30c4fc11c8912749351acd9159c17
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Sun Dec 12 23:17:13 2021 -0600

    rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]

    Add specialized version to combine two instructions from

     9: {r123:CC=cmp(r124:DI&0x600000000,0);clobber scratch;}
           REG_DEAD r124:DI
     10: pc={(r123:CC==0)?L15:pc}
          REG_DEAD r123:CC

    to:

     10: {pc={(r123:DI&0x600000000==0)?L15:pc};clobber scratch;clobber %0:CC;}

    then split2 will split it to one rotate dot instruction (to save one
    rotate back instruction) as shifted result doesn't matter when comparing
    to 0 in CCEQmode.

    Bootstrapped and regression tested pass on Power 8/9/10.

    gcc/ChangeLog:

            PR target/102239
            * config/rs6000/rs6000-protos.h (rs6000_is_valid_rotate_dot_mask):
New
            declare.
            * config/rs6000/rs6000.c (rs6000_is_valid_rotate_dot_mask): New
            function.
            * config/rs6000/rs6000.md (*branch_anddi3_dot): New.

    gcc/testsuite/ChangeLog:

            PR target/102239
            * gcc.target/powerpc/pr102239.c: New test.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/102239] powerpc suboptimal boolean test of contiguous bits
  2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
                   ` (11 preceding siblings ...)
  2022-01-11  9:23 ` cvs-commit at gcc dot gnu.org
@ 2022-01-12  0:27 ` luoxhu at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2022-01-12  0:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from luoxhu at gcc dot gnu.org ---
Fixed on master.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-12  0:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08 11:55 [Bug c/102239] New: powerpc suboptimal boolean test of contiguous bits npiggin at gmail dot com
2021-09-09 19:40 ` [Bug target/102239] " segher at gcc dot gnu.org
2021-11-24  7:17 ` luoxhu at gcc dot gnu.org
2021-11-24 23:22 ` segher at gcc dot gnu.org
2021-11-26  8:50 ` luoxhu at gcc dot gnu.org
2021-11-26 15:53 ` segher at gcc dot gnu.org
2021-11-29  3:31 ` luoxhu at gcc dot gnu.org
2021-11-29  6:36 ` luoxhu at gcc dot gnu.org
2021-11-29 10:05 ` segher at gcc dot gnu.org
2021-11-30  5:28 ` luoxhu at gcc dot gnu.org
2021-11-30 22:59 ` segher at gcc dot gnu.org
2021-12-01  7:46 ` luoxhu at gcc dot gnu.org
2022-01-11  9:23 ` cvs-commit at gcc dot gnu.org
2022-01-12  0:27 ` luoxhu at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).