public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] match.pd: Convert {I,X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
@ 2024-01-08 14:43 Uros Bizjak
  2024-01-08 16:57 ` [PATCH] match.pd: Convert {I, X}OR " Andrew Pinski
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2024-01-08 14:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2112 bytes --]

Instead of converting XOR or PLUS of two values, ANDed with two constants that
have no bits in common, to IOR expression, convert IOR or XOR of said two
ANDed values to PLUS expression.

If we consider the following testcase:

--cut here--
unsigned int foo (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r + p + 2;
}

unsigned int bar (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r | p | 2;
}
--cut here--

the above testcase compiles (x86_64 -O2) to:

foo:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        leal    2(%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        movl    %edi, %eax
        orl     $2, %eax
        ret

There is no further simplification possible in any case, we can't combine
OR with a PLUS in the first case, and we don't have OR instruction with
multiple inputs in the second case.

If we switch around the logic in the conversion and convert from IOR/XOR
to PLUS, then the resulting assembly reads:

foo:
        andl    $-4, %esi
        andl    $1, %edi
        leal    2(%rsi,%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        leal    (%rdi,%rsi), %eax
        orl     $2, %eax
        ret

On x86, the conversion can now use LEA instruction, which is much more
usable than OR instruction.  In the first case, LEA implements three input
ADD instruction, while in the second case, even though the instruction
can't be combined with a follow-up OR, the non-destructive LEA avoids a move.

    PR target/108477

gcc/ChangeLog:

    * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
    Do not convert PLUS of two values, ANDed with two constants
    that have no bits in common to IOR exporession, convert
    IOR or XOR of said two ANDed values to PLUS expression.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr108477.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1643 bytes --]

diff --git a/gcc/match.pd b/gcc/match.pd
index 7b4b15acc41..deac18a7635 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1830,18 +1830,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && element_precision (type) <= element_precision (TREE_TYPE (@1)))
    (bit_not (rop (convert @0) (convert @1))))))
 
-/* If we are XORing or adding two BIT_AND_EXPR's, both of which are and'ing
+/* If we are ORing or XORing two BIT_AND_EXPR's, both of which are and'ing
    with a constant, and the two constants have no bits in common,
-   we should treat this as a BIT_IOR_EXPR since this may produce more
+   we should treat this as a PLUS_EXPR since this may produce more
    simplifications.  */
-(for op (bit_xor plus)
+(for op (bit_ior bit_xor)
  (simplify
   (op (convert1? (bit_and@4 @0 INTEGER_CST@1))
       (convert2? (bit_and@5 @2 INTEGER_CST@3)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
        && tree_nop_conversion_p (type, TREE_TYPE (@2))
        && (wi::to_wide (@1) & wi::to_wide (@3)) == 0)
-   (bit_ior (convert @4) (convert @5)))))
+   (plus (convert @4) (convert @5)))))
 
 /* (X | Y) ^ X -> Y & ~ X*/
 (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr108477.c b/gcc/testsuite/gcc.target/i386/pr108477.c
new file mode 100644
index 00000000000..fb320a84c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108477.c
@@ -0,0 +1,13 @@
+/* PR target/108477 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att" } */
+
+unsigned int foo (unsigned int a, unsigned int b)
+{
+  unsigned int r = a & 0x1;
+  unsigned int p = b & ~0x3;
+
+  return r + p + 2;
+}
+
+/* { dg-final { scan-assembler-not "orl" } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-08 14:43 [PATCH] match.pd: Convert {I,X}OR of two values ANDed with alien CSTs to PLUS [PR108477] Uros Bizjak
@ 2024-01-08 16:57 ` Andrew Pinski
  2024-01-08 17:01   ` Jeff Law
  2024-01-08 20:10   ` Uros Bizjak
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Pinski @ 2024-01-08 16:57 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Biener

On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Instead of converting XOR or PLUS of two values, ANDed with two constants that
> have no bits in common, to IOR expression, convert IOR or XOR of said two
> ANDed values to PLUS expression.

I think this only helps targets which have leal like instruction. Also
I think it is the same issue as I recorded as PR 111763 .  I suspect
BIT_IOR is more of a Canonical form for GIMPLE while we should handle
this in expand to decide if we want to use PLUS or IOR.

Thanks,
Andrew Pinski

>
> If we consider the following testcase:
>
> --cut here--
> unsigned int foo (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r + p + 2;
> }
>
> unsigned int bar (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r | p | 2;
> }
> --cut here--
>
> the above testcase compiles (x86_64 -O2) to:
>
> foo:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         leal    2(%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         movl    %edi, %eax
>         orl     $2, %eax
>         ret
>
> There is no further simplification possible in any case, we can't combine
> OR with a PLUS in the first case, and we don't have OR instruction with
> multiple inputs in the second case.
>
> If we switch around the logic in the conversion and convert from IOR/XOR
> to PLUS, then the resulting assembly reads:
>
> foo:
>         andl    $-4, %esi
>         andl    $1, %edi
>         leal    2(%rsi,%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         leal    (%rdi,%rsi), %eax
>         orl     $2, %eax
>         ret
>
> On x86, the conversion can now use LEA instruction, which is much more
> usable than OR instruction.  In the first case, LEA implements three input
> ADD instruction, while in the second case, even though the instruction
> can't be combined with a follow-up OR, the non-destructive LEA avoids a move.
>
>     PR target/108477
>
> gcc/ChangeLog:
>
>     * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
>     Do not convert PLUS of two values, ANDed with two constants
>     that have no bits in common to IOR exporession, convert
>     IOR or XOR of said two ANDed values to PLUS expression.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/i386/pr108477.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for mainline?
>
> Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-08 16:57 ` [PATCH] match.pd: Convert {I, X}OR " Andrew Pinski
@ 2024-01-08 17:01   ` Jeff Law
  2024-01-09  8:35     ` Richard Biener
  2024-01-08 20:10   ` Uros Bizjak
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Law @ 2024-01-08 17:01 UTC (permalink / raw)
  To: Andrew Pinski, Uros Bizjak; +Cc: gcc-patches, Richard Biener



On 1/8/24 09:57, Andrew Pinski wrote:
> On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> Instead of converting XOR or PLUS of two values, ANDed with two constants that
>> have no bits in common, to IOR expression, convert IOR or XOR of said two
>> ANDed values to PLUS expression.
> 
> I think this only helps targets which have leal like instruction. Also
> I think it is the same issue as I recorded as PR 111763 .  I suspect
> BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> this in expand to decide if we want to use PLUS or IOR.
Actually there's benefit on RISC-V to using PLUS over IOR/XOR when 
there's no bits in common.  In fact, I've been asked to do that by 
Andrew W. for a case where we know ahead of time there's no bits in 
common in a sequence that currently uses IOR.

Specifically it can allow more use of the compact instructions as the 
compact PLUS allows the full set of hard registers while compact IOR/XOR 
only allow a subset of registers.

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-08 16:57 ` [PATCH] match.pd: Convert {I, X}OR " Andrew Pinski
  2024-01-08 17:01   ` Jeff Law
@ 2024-01-08 20:10   ` Uros Bizjak
  2024-01-09  8:53     ` Richard Biener
  1 sibling, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2024-01-08 20:10 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches, Richard Biener, Jeff Law

On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > ANDed values to PLUS expression.
>
> I think this only helps targets which have leal like instruction. Also
> I think it is the same issue as I recorded as PR 111763 .  I suspect
> BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> this in expand to decide if we want to use PLUS or IOR.

For the pr108477.c testcase, expand pass expands:

  r_3 = a_2(D) & 1;
 p_5 = b_4(D) & 4294967292;
 _1 = r_3 | p_5;
 _6 = _1 + 2;
 return _6;

The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
we need to determine values of constants. Is this information
available in the expand pass?

IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
the shown testcase would be beneficial when constructing control
register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
sequence in this case.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-08 17:01   ` Jeff Law
@ 2024-01-09  8:35     ` Richard Biener
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2024-01-09  8:35 UTC (permalink / raw)
  To: Jeff Law; +Cc: Andrew Pinski, Uros Bizjak, gcc-patches

On Mon, 8 Jan 2024, Jeff Law wrote:

> 
> 
> On 1/8/24 09:57, Andrew Pinski wrote:
> > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >>
> >> Instead of converting XOR or PLUS of two values, ANDed with two constants
> >> that
> >> have no bits in common, to IOR expression, convert IOR or XOR of said two
> >> ANDed values to PLUS expression.
> > 
> > I think this only helps targets which have leal like instruction. Also
> > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > this in expand to decide if we want to use PLUS or IOR.
> Actually there's benefit on RISC-V to using PLUS over IOR/XOR when there's no
> bits in common.  In fact, I've been asked to do that by Andrew W. for a case
> where we know ahead of time there's no bits in common in a sequence that
> currently uses IOR.
> 
> Specifically it can allow more use of the compact instructions as the compact
> PLUS allows the full set of hard registers while compact IOR/XOR only allow a
> subset of registers.

Still the benefit of IOR is no undefined overflow issues and thus better
re-association (on GIMPLE).  Of course in the end what is better
"depends", but when "depends" depends on what the target can do better
it's indeed better to leave that to either RTL expansion or
instruction selection.

Richard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-08 20:10   ` Uros Bizjak
@ 2024-01-09  8:53     ` Richard Biener
  2024-01-09  9:25       ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2024-01-09  8:53 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Mon, 8 Jan 2024, Uros Bizjak wrote:

> On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> >
> > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > ANDed values to PLUS expression.
> >
> > I think this only helps targets which have leal like instruction. Also
> > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > this in expand to decide if we want to use PLUS or IOR.
> 
> For the pr108477.c testcase, expand pass expands:
> 
>   r_3 = a_2(D) & 1;
>  p_5 = b_4(D) & 4294967292;
>  _1 = r_3 | p_5;
>  _6 = _1 + 2;
>  return _6;
> 
> The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> we need to determine values of constants. Is this information
> available in the expand pass?

If there's single-uses then TER makes this info available.

> IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> the shown testcase would be beneficial when constructing control
> register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> sequence in this case.

The other possibility is to expose LEA as optab and making GIMPLE
instruction selection generate a direct internal function for that
(that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
but that has constraints on the addends mode (ptr_mode) which might
not fit what the target can do?  Otherwise that would be an existing
way to do this computation as well.

Richard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09  8:53     ` Richard Biener
@ 2024-01-09  9:25       ` Uros Bizjak
  2024-01-09  9:39         ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2024-01-09  9:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 8 Jan 2024, Uros Bizjak wrote:
>
> > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> > >
> > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > ANDed values to PLUS expression.
> > >
> > > I think this only helps targets which have leal like instruction. Also
> > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > this in expand to decide if we want to use PLUS or IOR.
> >
> > For the pr108477.c testcase, expand pass expands:
> >
> >   r_3 = a_2(D) & 1;
> >  p_5 = b_4(D) & 4294967292;
> >  _1 = r_3 | p_5;
> >  _6 = _1 + 2;
> >  return _6;
> >
> > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > we need to determine values of constants. Is this information
> > available in the expand pass?
>
> If there's single-uses then TER makes this info available.
>
> > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > the shown testcase would be beneficial when constructing control
> > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > sequence in this case.
>
> The other possibility is to expose LEA as optab and making GIMPLE
> instruction selection generate a direct internal function for that
> (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> but that has constraints on the addends mode (ptr_mode) which might
> not fit what the target can do?  Otherwise that would be an existing
> way to do this computation as well.

I think there is no need for a new optab. If we can determine at
expand time that ANDed values are fed to the IOR/XOR expressions, then
we can check the constants and emit PLUS RTX instead. RTL combine pass
will then create LEA instruction from separate PLUS instructions.

So, we can emit:

op0 = and (a, CST1)
op1 = and (b, CST2)
op2 = plus (op0, op1)

RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0

and

op0 = and (a, CST1)
op1 = plus (op0, CST2)

RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0

The above transformation is valid for IOR and XOR.

x86 can't combine IOR/XOR in any meaningful way, but can combine the
sequence of PLUS (together with MULT) RTXes to LEA.

(BTW: I am not versed in the expand stuff, so a disclaimer is at hand ;) )

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09  9:25       ` Uros Bizjak
@ 2024-01-09  9:39         ` Richard Biener
  2024-01-09  9:53           ` Jakub Jelinek
  2024-01-09 10:02           ` Uros Bizjak
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Biener @ 2024-01-09  9:39 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Tue, 9 Jan 2024, Uros Bizjak wrote:

> On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> >
> > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > >
> > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > ANDed values to PLUS expression.
> > > >
> > > > I think this only helps targets which have leal like instruction. Also
> > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > this in expand to decide if we want to use PLUS or IOR.
> > >
> > > For the pr108477.c testcase, expand pass expands:
> > >
> > >   r_3 = a_2(D) & 1;
> > >  p_5 = b_4(D) & 4294967292;
> > >  _1 = r_3 | p_5;
> > >  _6 = _1 + 2;
> > >  return _6;
> > >
> > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > we need to determine values of constants. Is this information
> > > available in the expand pass?
> >
> > If there's single-uses then TER makes this info available.
> >
> > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > the shown testcase would be beneficial when constructing control
> > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > sequence in this case.
> >
> > The other possibility is to expose LEA as optab and making GIMPLE
> > instruction selection generate a direct internal function for that
> > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > but that has constraints on the addends mode (ptr_mode) which might
> > not fit what the target can do?  Otherwise that would be an existing
> > way to do this computation as well.
> 
> I think there is no need for a new optab. If we can determine at
> expand time that ANDed values are fed to the IOR/XOR expressions, then
> we can check the constants and emit PLUS RTX instead. RTL combine pass
> will then create LEA instruction from separate PLUS instructions.
> 
> So, we can emit:
> 
> op0 = and (a, CST1)
> op1 = and (b, CST2)
> op2 = plus (op0, op1)
> 
> RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> 
> and
> 
> op0 = and (a, CST1)
> op1 = plus (op0, CST2)
> 
> RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> 
> The above transformation is valid for IOR and XOR.
> 
> x86 can't combine IOR/XOR in any meaningful way, but can combine the
> sequence of PLUS (together with MULT) RTXes to LEA.

Btw, this looks like a three-insn combination even with IOR so a
pattern for this case would work as well?

Deciding whether to use PLUS or IOR at RTL expansion time would
need to somehow figure out what's better for the target in question.
I'm not sure how to do that?

Richard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09  9:39         ` Richard Biener
@ 2024-01-09  9:53           ` Jakub Jelinek
  2024-01-09 10:02           ` Uros Bizjak
  1 sibling, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2024-01-09  9:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, Andrew Pinski, gcc-patches, Jeff Law

On Tue, Jan 09, 2024 at 10:39:50AM +0100, Richard Biener wrote:
> > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > sequence of PLUS (together with MULT) RTXes to LEA.
> 
> Btw, this looks like a three-insn combination even with IOR so a
> pattern for this case would work as well?
> 
> Deciding whether to use PLUS or IOR at RTL expansion time would
> need to somehow figure out what's better for the target in question.
> I'm not sure how to do that?

Maybe a new optab which would be used when expanding
BIT_IOR_EXPR/BIT_XOR_EXPR with operands which have no common bits
(say in (get_nonzero_bits (arg1) & get_nonzero_bits (arg2)) == 0
sense)?
x86 and riscv could just emit a PLUS in those cases...

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09 10:02           ` Uros Bizjak
@ 2024-01-09 10:01             ` Richard Biener
  2024-01-09 10:19               ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2024-01-09 10:01 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Tue, 9 Jan 2024, Uros Bizjak wrote:

> On Tue, Jan 9, 2024 at 10:44?AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> >
> > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > > >
> > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > >
> > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > > ANDed values to PLUS expression.
> > > > > >
> > > > > > I think this only helps targets which have leal like instruction. Also
> > > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > > this in expand to decide if we want to use PLUS or IOR.
> > > > >
> > > > > For the pr108477.c testcase, expand pass expands:
> > > > >
> > > > >   r_3 = a_2(D) & 1;
> > > > >  p_5 = b_4(D) & 4294967292;
> > > > >  _1 = r_3 | p_5;
> > > > >  _6 = _1 + 2;
> > > > >  return _6;
> > > > >
> > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > > we need to determine values of constants. Is this information
> > > > > available in the expand pass?
> > > >
> > > > If there's single-uses then TER makes this info available.
> > > >
> > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > > the shown testcase would be beneficial when constructing control
> > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > > sequence in this case.
> > > >
> > > > The other possibility is to expose LEA as optab and making GIMPLE
> > > > instruction selection generate a direct internal function for that
> > > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > > but that has constraints on the addends mode (ptr_mode) which might
> > > > not fit what the target can do?  Otherwise that would be an existing
> > > > way to do this computation as well.
> > >
> > > I think there is no need for a new optab. If we can determine at
> > > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > > will then create LEA instruction from separate PLUS instructions.
> > >
> > > So, we can emit:
> > >
> > > op0 = and (a, CST1)
> > > op1 = and (b, CST2)
> > > op2 = plus (op0, op1)
> > >
> > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> > >
> > > and
> > >
> > > op0 = and (a, CST1)
> > > op1 = plus (op0, CST2)
> > >
> > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> > >
> > > The above transformation is valid for IOR and XOR.
> > >
> > > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > > sequence of PLUS (together with MULT) RTXes to LEA.
> >
> > Btw, this looks like a three-insn combination even with IOR so a
> > pattern for this case would work as well?
> 
> IIUC the question: x86 does not have three-input IOR, but we want to
> emulate it with LEA (three-input PLUS, but one of the arguments has to
> be constant).

But couldn't you have a define_insn matching LEA but with IOR instead
of PLUS (and with the appropriate constraints?).  Maybe it could
also be combine trying PLUS instead of IOR if that's possible
(looking at the constants).

> We can do the conversion only when masking constants do
> not intersect. *However*, the original problem is that the compiler is
> converting PLUS to IOR, preventing the generation of LEA. As shown in
> the original PR, we would like to *prevent* the conversion from PLUS
> to IOR on x86, because it interferes with (a & CST1 + b & CST2 + CST3)
> - we would really want to generate LEA in this case.

I'm throwing in the argument that the user could have written
a & CST1 | b & CST2 + CST3 as well.

> > Deciding whether to use PLUS or IOR at RTL expansion time would
> > need to somehow figure out what's better for the target in question.
> > I'm not sure how to do that?
> 
> I think that a target hook would be needed. In addition to x86, Jeff
> presented a very specialized use case for RISC-V that can't be
> described otherwise.
> 
> Uros.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09  9:39         ` Richard Biener
  2024-01-09  9:53           ` Jakub Jelinek
@ 2024-01-09 10:02           ` Uros Bizjak
  2024-01-09 10:01             ` Richard Biener
  1 sibling, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2024-01-09 10:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Tue, Jan 9, 2024 at 10:44 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 9 Jan 2024, Uros Bizjak wrote:
>
> > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > >
> > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > >
> > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > ANDed values to PLUS expression.
> > > > >
> > > > > I think this only helps targets which have leal like instruction. Also
> > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > this in expand to decide if we want to use PLUS or IOR.
> > > >
> > > > For the pr108477.c testcase, expand pass expands:
> > > >
> > > >   r_3 = a_2(D) & 1;
> > > >  p_5 = b_4(D) & 4294967292;
> > > >  _1 = r_3 | p_5;
> > > >  _6 = _1 + 2;
> > > >  return _6;
> > > >
> > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > we need to determine values of constants. Is this information
> > > > available in the expand pass?
> > >
> > > If there's single-uses then TER makes this info available.
> > >
> > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > the shown testcase would be beneficial when constructing control
> > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > sequence in this case.
> > >
> > > The other possibility is to expose LEA as optab and making GIMPLE
> > > instruction selection generate a direct internal function for that
> > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > but that has constraints on the addends mode (ptr_mode) which might
> > > not fit what the target can do?  Otherwise that would be an existing
> > > way to do this computation as well.
> >
> > I think there is no need for a new optab. If we can determine at
> > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > will then create LEA instruction from separate PLUS instructions.
> >
> > So, we can emit:
> >
> > op0 = and (a, CST1)
> > op1 = and (b, CST2)
> > op2 = plus (op0, op1)
> >
> > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> >
> > and
> >
> > op0 = and (a, CST1)
> > op1 = plus (op0, CST2)
> >
> > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> >
> > The above transformation is valid for IOR and XOR.
> >
> > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > sequence of PLUS (together with MULT) RTXes to LEA.
>
> Btw, this looks like a three-insn combination even with IOR so a
> pattern for this case would work as well?

IIUC the question: x86 does not have three-input IOR, but we want to
emulate it with LEA (three-input PLUS, but one of the arguments has to
be constant). We can do the conversion only when masking constants do
not intersect. *However*, the original problem is that the compiler is
converting PLUS to IOR, preventing the generation of LEA. As shown in
the original PR, we would like to *prevent* the conversion from PLUS
to IOR on x86, because it interferes with (a & CST1 + b & CST2 + CST3)
- we would really want to generate LEA in this case.

> Deciding whether to use PLUS or IOR at RTL expansion time would
> need to somehow figure out what's better for the target in question.
> I'm not sure how to do that?

I think that a target hook would be needed. In addition to x86, Jeff
presented a very specialized use case for RISC-V that can't be
described otherwise.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09 10:01             ` Richard Biener
@ 2024-01-09 10:19               ` Uros Bizjak
  2024-01-09 10:27                 ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2024-01-09 10:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Tue, Jan 9, 2024 at 11:06 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 9 Jan 2024, Uros Bizjak wrote:
>
> > On Tue, Jan 9, 2024 at 10:44?AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> > >
> > > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > > > >
> > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > > > ANDed values to PLUS expression.
> > > > > > >
> > > > > > > I think this only helps targets which have leal like instruction. Also
> > > > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > > > this in expand to decide if we want to use PLUS or IOR.
> > > > > >
> > > > > > For the pr108477.c testcase, expand pass expands:
> > > > > >
> > > > > >   r_3 = a_2(D) & 1;
> > > > > >  p_5 = b_4(D) & 4294967292;
> > > > > >  _1 = r_3 | p_5;
> > > > > >  _6 = _1 + 2;
> > > > > >  return _6;
> > > > > >
> > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > > > we need to determine values of constants. Is this information
> > > > > > available in the expand pass?
> > > > >
> > > > > If there's single-uses then TER makes this info available.
> > > > >
> > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > > > the shown testcase would be beneficial when constructing control
> > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > > > sequence in this case.
> > > > >
> > > > > The other possibility is to expose LEA as optab and making GIMPLE
> > > > > instruction selection generate a direct internal function for that
> > > > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > > > but that has constraints on the addends mode (ptr_mode) which might
> > > > > not fit what the target can do?  Otherwise that would be an existing
> > > > > way to do this computation as well.
> > > >
> > > > I think there is no need for a new optab. If we can determine at
> > > > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > > > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > > > will then create LEA instruction from separate PLUS instructions.
> > > >
> > > > So, we can emit:
> > > >
> > > > op0 = and (a, CST1)
> > > > op1 = and (b, CST2)
> > > > op2 = plus (op0, op1)
> > > >
> > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> > > >
> > > > and
> > > >
> > > > op0 = and (a, CST1)
> > > > op1 = plus (op0, CST2)
> > > >
> > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> > > >
> > > > The above transformation is valid for IOR and XOR.
> > > >
> > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > > > sequence of PLUS (together with MULT) RTXes to LEA.
> > >
> > > Btw, this looks like a three-insn combination even with IOR so a
> > > pattern for this case would work as well?
> >
> > IIUC the question: x86 does not have three-input IOR, but we want to
> > emulate it with LEA (three-input PLUS, but one of the arguments has to
> > be constant).
>
> But couldn't you have a define_insn matching LEA but with IOR instead
> of PLUS (and with the appropriate constraints?).  Maybe it could
> also be combine trying PLUS instead of IOR if that's possible
> (looking at the constants).

we would have to include masking ANDs in the define_insn and add
additional conditions regarding mask constants in the insn constraint.

So, this define_insn would look something like:

(ior (ior (and (op1, CST1), and (op2, CST2)), CST3))

with (CST1 & CST2 & CST3) == 0 condition.

and combinations with PLUS / MULT RTXes including all the variants
with two arguments.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
  2024-01-09 10:19               ` Uros Bizjak
@ 2024-01-09 10:27                 ` Uros Bizjak
  0 siblings, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2024-01-09 10:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Pinski, gcc-patches, Jeff Law

On Tue, Jan 9, 2024 at 11:19 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Jan 9, 2024 at 11:06 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> >
> > > On Tue, Jan 9, 2024 at 10:44?AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> > > >
> > > > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > > > > >
> > > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > > > > ANDed values to PLUS expression.
> > > > > > > >
> > > > > > > > I think this only helps targets which have leal like instruction. Also
> > > > > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > > > > this in expand to decide if we want to use PLUS or IOR.
> > > > > > >
> > > > > > > For the pr108477.c testcase, expand pass expands:
> > > > > > >
> > > > > > >   r_3 = a_2(D) & 1;
> > > > > > >  p_5 = b_4(D) & 4294967292;
> > > > > > >  _1 = r_3 | p_5;
> > > > > > >  _6 = _1 + 2;
> > > > > > >  return _6;
> > > > > > >
> > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > > > > we need to determine values of constants. Is this information
> > > > > > > available in the expand pass?
> > > > > >
> > > > > > If there's single-uses then TER makes this info available.
> > > > > >
> > > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > > > > the shown testcase would be beneficial when constructing control
> > > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > > > > sequence in this case.
> > > > > >
> > > > > > The other possibility is to expose LEA as optab and making GIMPLE
> > > > > > instruction selection generate a direct internal function for that
> > > > > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > > > > but that has constraints on the addends mode (ptr_mode) which might
> > > > > > not fit what the target can do?  Otherwise that would be an existing
> > > > > > way to do this computation as well.
> > > > >
> > > > > I think there is no need for a new optab. If we can determine at
> > > > > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > > > > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > > > > will then create LEA instruction from separate PLUS instructions.
> > > > >
> > > > > So, we can emit:
> > > > >
> > > > > op0 = and (a, CST1)
> > > > > op1 = and (b, CST2)
> > > > > op2 = plus (op0, op1)
> > > > >
> > > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> > > > >
> > > > > and
> > > > >
> > > > > op0 = and (a, CST1)
> > > > > op1 = plus (op0, CST2)
> > > > >
> > > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> > > > >
> > > > > The above transformation is valid for IOR and XOR.
> > > > >
> > > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > > > > sequence of PLUS (together with MULT) RTXes to LEA.
> > > >
> > > > Btw, this looks like a three-insn combination even with IOR so a
> > > > pattern for this case would work as well?
> > >
> > > IIUC the question: x86 does not have three-input IOR, but we want to
> > > emulate it with LEA (three-input PLUS, but one of the arguments has to
> > > be constant).
> >
> > But couldn't you have a define_insn matching LEA but with IOR instead
> > of PLUS (and with the appropriate constraints?).  Maybe it could
> > also be combine trying PLUS instead of IOR if that's possible
> > (looking at the constants).
>
> we would have to include masking ANDs in the define_insn and add
> additional conditions regarding mask constants in the insn constraint.
>
> So, this define_insn would look something like:
>
> (ior (ior (and (op1, CST1), and (op2, CST2)), CST3))
>
> with (CST1 & CST2 & CST3) == 0 condition.
>
> and combinations with PLUS / MULT RTXes including all the variants
> with two arguments.

Oh, and then we would have to split masking ANDs out of the above
instruction. We can't clobber the inputs, so this should be made with
a temporary registers before reload...

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-01-09 10:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-08 14:43 [PATCH] match.pd: Convert {I,X}OR of two values ANDed with alien CSTs to PLUS [PR108477] Uros Bizjak
2024-01-08 16:57 ` [PATCH] match.pd: Convert {I, X}OR " Andrew Pinski
2024-01-08 17:01   ` Jeff Law
2024-01-09  8:35     ` Richard Biener
2024-01-08 20:10   ` Uros Bizjak
2024-01-09  8:53     ` Richard Biener
2024-01-09  9:25       ` Uros Bizjak
2024-01-09  9:39         ` Richard Biener
2024-01-09  9:53           ` Jakub Jelinek
2024-01-09 10:02           ` Uros Bizjak
2024-01-09 10:01             ` Richard Biener
2024-01-09 10:19               ` Uros Bizjak
2024-01-09 10:27                 ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).