[middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

public inbox for gcc-patches@gcc.gnu.org

* [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
@ 2024-01-18 19:54 Roger Sayle
  2024-01-19 11:03 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Roger Sayle @ 2024-01-18 19:54 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2075 bytes --]


This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.
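
As a minimal sketch of the disjointness argument (not part of the patch;
the function names below are hypothetical), the high word of a double-word
left shift can be modelled in C on 32-bit halves.  Because (hi << c) has
its low c bits clear and (lo >> (32 - c)) can only set those bits, IOR,
XOR and PLUS should all give the same result for 0 < c < 32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the high 32-bit word of the
   64-bit value (hi:lo) shifted left by c, for 0 < c < 32.  */
uint32_t high_word_ior(uint32_t hi, uint32_t lo, unsigned c)
{
    return (hi << c) | (lo >> (32 - c));
}

uint32_t high_word_xor(uint32_t hi, uint32_t lo, unsigned c)
{
    return (hi << c) ^ (lo >> (32 - c));
}

uint32_t high_word_plus(uint32_t hi, uint32_t lo, unsigned c)
{
    /* The two operands have no set bits in common, so the addition
       never produces a carry and matches the IOR bit for bit.  */
    return (hi << c) + (lo >> (32 - c));
}

/* Compare all three forms against a genuine 64-bit shift.  */
void check_disjoint_union(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    const size_t n = sizeof samples / sizeof samples[0];

    for (unsigned c = 1; c < 32; c++)
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++) {
                uint32_t hi = samples[i], lo = samples[j];
                uint64_t wide = (((uint64_t)hi << 32) | lo) << c;
                uint32_t expect = (uint32_t)(wide >> 32);

                assert(high_word_ior(hi, lo, c) == expect);
                assert(high_word_xor(hi, lo, c) == expect);
                assert(high_word_plus(hi, lo, c) == expect);
            }
}
```

Calling check_disjoint_union() from a test driver exercises every shift
count from 1 to 31; c == 0 and c >= 32 are deliberately excluded, since
the C shifts themselves would be undefined or the halves would overlap.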

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:

unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
        to generate PLUS instead of IOR when unioning disjoint bitfields.
        * optabs.cc (expand_subword_shift): Likewise.
        (expand_binop): Likewise for double-word rotate.


Thanks in advance,
Roger
--


[-- Attachment #2: patchex.txt --]
[-- Type: text/plain, Size: 2707 bytes --]

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 5916d6ed1bc..d1900f97f0c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2610,10 +2610,11 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
 	  else if (methods == OPTAB_LIB_WIDEN)
 	    {
 	      /* If we have been unable to open-code this by a rotation,
-		 do it as the IOR of two shifts.  I.e., to rotate A
-		 by N bits, compute
+		 do it as the IOR or PLUS of two shifts.  I.e., to rotate
+		 A by N bits, compute
 		 (A << N) | ((unsigned) A >> ((-N) & (C - 1)))
-		 where C is the bitsize of A.
+		 where C is the bitsize of A.  If N cannot be zero,
+		 use PLUS instead of IOR.
 
 		 It is theoretically possible that the target machine might
 		 not be able to perform either shift and hence we would
@@ -2650,8 +2651,9 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
 	      temp1 = expand_shift_1 (left ? RSHIFT_EXPR : LSHIFT_EXPR,
 				      mode, shifted, other_amount,
 				      subtarget, 1);
-	      return expand_binop (mode, ior_optab, temp, temp1, target,
-				   unsignedp, methods);
+	      return expand_binop (mode,
+				   CONST_INT_P (op1) ? add_optab : ior_optab,
+				   temp, temp1, target, unsignedp, methods);
 	    }
 
 	  temp = expand_binop (mode,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..dcd3e406719 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -566,8 +566,8 @@ expand_subword_shift (scalar_int_mode op1_mode, optab binoptab,
       if (tmp == 0)
 	return false;
 
-      /* Now OR in the bits carried over from OUTOF_INPUT.  */
-      if (!force_expand_binop (word_mode, ior_optab, tmp, carries,
+      /* Now OR/PLUS in the bits carried over from OUTOF_INPUT.  */
+      if (!force_expand_binop (word_mode, add_optab, tmp, carries,
 			       into_target, unsignedp, methods))
 	return false;
     }
@@ -1937,7 +1937,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
 				     NULL_RTX, unsignedp, next_methods);
 
 	  if (into_temp1 != 0 && into_temp2 != 0)
-	    inter = expand_binop (word_mode, ior_optab, into_temp1, into_temp2,
+	    inter = expand_binop (word_mode, add_optab, into_temp1, into_temp2,
 				  into_target, unsignedp, next_methods);
 	  else
 	    inter = 0;
@@ -1953,7 +1953,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
 				      NULL_RTX, unsignedp, next_methods);
 
 	  if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
-	    inter = expand_binop (word_mode, ior_optab,
+	    inter = expand_binop (word_mode, add_optab,
 				  outof_temp1, outof_temp2,
 				  outof_target, unsignedp, next_methods);
 


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-18 19:54 [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates Roger Sayle
@ 2024-01-19 11:03 ` Richard Biener
  2024-01-19 13:26   ` Roger Sayle
  2024-01-19 16:05 ` Georg-Johann Lay
  2024-06-09  1:48 ` Jeff Law
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2024-01-19 11:03 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Thu, Jan 18, 2024 at 8:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch tweaks RTL expansion of multi-word shifts and rotates to use
> PLUS rather than IOR for disjunctive operations.  During expansion of
> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> where the constants C1 and C2 guarantee that bits don't overlap.
> Hence the IOR can be performed by any any_or_plus operation, such as
> IOR, XOR or PLUS; for word-size operations where carry chains aren't
> an issue these should all be equally fast (single-cycle) instructions.
> The benefit of this change is that targets with shift-and-add insns,
> like x86's lea, can benefit from the LSHIFT-ADD form.
>
> An example of a backend that benefits is ARC, which is demonstrated
> by these two simple functions:
>
> unsigned long long foo(unsigned long long x) { return x<<2; }
>
> which with -O2 is currently compiled to:
>
> foo:    lsr     r2,r0,30
>         asl_s   r1,r1,2
>         asl_s   r0,r0,2
>         j_s.d   [blink]
>         or_s    r1,r1,r2
>
> with this patch becomes:
>
> foo:    lsr     r2,r0,30
>         add2    r1,r2,r1
>         j_s.d   [blink]
>         asl_s   r0,r0,2
>
> unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
>
> which with -O2 is currently compiled to 6 insns + return:
>
> bar:    lsr     r12,r0,30
>         asl_s   r3,r1,2
>         asl_s   r0,r0,2
>         lsr_s   r1,r1,30
>         or_s    r0,r0,r1
>         j_s.d   [blink]
>         or      r1,r12,r3
>
> with this patch becomes 4 insns + return:
>
> bar:    lsr     r3,r1,30
>         lsr     r2,r0,30
>         add2    r1,r2,r1
>         j_s.d   [blink]
>         add2    r0,r3,r0
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

For expand_shift_1 you add

+                where C is the bitsize of A.  If N cannot be zero,
+                use PLUS instead of IOR.

but I don't see a check ensuring this other than maybe CONST_INT_P (op1)
suggesting that we never end up with const0_rtx here.  OTOH why is
N zero a problem and why is it not in the optabs.cc case where I don't
see any such check (at least not obvious)?

Since this doesn't seem to fix a regression it probably has to wait for
stage1 to re-open.

Thanks,
Richard.

>
> 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
>         to generate PLUS instead of IOR when unioning disjoint bitfields.
>         * optabs.cc (expand_subword_shift): Likewise.
>         (expand_binop): Likewise for double-word rotate.
>
>
> Thanks in advance,
> Roger
> --
>


* RE: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-19 11:03 ` Richard Biener
@ 2024-01-19 13:26   ` Roger Sayle
  2024-01-19 13:49     ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Sayle @ 2024-01-19 13:26 UTC (permalink / raw)
  To: 'Richard Biener'; +Cc: gcc-patches

Hi Richard,

Thanks for the speedy review.  I completely agree this patch
can wait for stage1, but it's related to some recent work Andrew
Pinski has been doing in match.pd, so I thought I'd share it.

Hypothetically, recognizing (x<<4)+(x>>60) as a rotation at the
tree-level might lead to a code quality regression, if RTL
expansion doesn't know to lower it back to use PLUS on
those targets with lea but without rotate.

> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 19 January 2024 11:04
> On Thu, Jan 18, 2024 at 8:55 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch tweaks RTL expansion of multi-word shifts and rotates to
> > use PLUS rather than IOR for disjunctive operations.  During expansion
> > of these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> > where the constants C1 and C2 guarantee that bits don't overlap.
> > Hence the IOR can be performed by any any_or_plus operation, such as
> > IOR, XOR or PLUS; for word-size operations where carry chains aren't
> > an issue these should all be equally fast (single-cycle) instructions.
> > The benefit of this change is that targets with shift-and-add insns,
> > like x86's lea, can benefit from the LSHIFT-ADD form.
> >
> > An example of a backend that benefits is ARC, which is demonstrated by
> > these two simple functions:
> >
> > unsigned long long foo(unsigned long long x) { return x<<2; }
> >
> > which with -O2 is currently compiled to:
> >
> > foo:    lsr     r2,r0,30
> >         asl_s   r1,r1,2
> >         asl_s   r0,r0,2
> >         j_s.d   [blink]
> >         or_s    r1,r1,r2
> >
> > with this patch becomes:
> >
> > foo:    lsr     r2,r0,30
> >         add2    r1,r2,r1
> >         j_s.d   [blink]
> >         asl_s   r0,r0,2
> >
> > unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62);
> > }
> >
> > which with -O2 is currently compiled to 6 insns + return:
> >
> > bar:    lsr     r12,r0,30
> >         asl_s   r3,r1,2
> >         asl_s   r0,r0,2
> >         lsr_s   r1,r1,30
> >         or_s    r0,r0,r1
> >         j_s.d   [blink]
> >         or      r1,r12,r3
> >
> > with this patch becomes 4 insns + return:
> >
> > bar:    lsr     r3,r1,30
> >         lsr     r2,r0,30
> >         add2    r1,r2,r1
> >         j_s.d   [blink]
> >         add2    r0,r3,r0
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> For expand_shift_1 you add
> 
> +                where C is the bitsize of A.  If N cannot be zero,
> +                use PLUS instead of IOR.
> 
> but I don't see a check ensuring this other than maybe CONST_INT_P (op1)
> suggesting that we never end up with const0_rtx here.  OTOH why is N zero a
> problem and why is it not in the optabs.cc case where I don't see any such check
> (at least not obvious)?

Excellent question.   A common mistake in writing a rotate function in C
or C++ is to write something like (x>>n)|(x<<(64-n)) or (x<<n)|(x>>(64-n))
which invokes undefined behavior when n == 0.  It's OK to recognize these
as rotates (relying on the undefined behavior), but correct/portable code
(and RTL) needs the correct idiom (x>>n)|(x<<((-n)&63)), which never invokes
undefined behaviour.  One interesting property of this idiom is that a shift
by zero is then calculated as (x>>0)|(x<<0), which is x|x.  This reveals
the problem: for all non-zero values the IOR can be replaced by PLUS,
but for zero shifts, X|X isn't the same as X+X or X^X.

This only applies for single word rotations, and not multi-word shifts
nor multi-word rotates, which explains why this test is only in one place.
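
To make the zero case concrete, here is a small C sketch of the
single-word idiom (the function names are made up for illustration, and
it assumes 0 <= n < 64 as the idiom itself does).  It should show that
the IOR and PLUS forms agree for every non-zero rotate count and only
differ when n == 0:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right by n using the portable idiom; requires n < 64.  */
uint64_t rotr_ior(uint64_t x, unsigned n)
{
    return (x >> n) | (x << ((-n) & 63));
}

/* The same expression with PLUS in place of IOR.  This is only a
   rotate when n != 0; for n == 0 it computes x + x.  */
uint64_t rotr_plus(uint64_t x, unsigned n)
{
    return (x >> n) + (x << ((-n) & 63));
}

void check_rotate_idiom(void)
{
    static const uint64_t samples[] = {
        0u, 1u, 3u, 0x8000000000000000u,
        0x0123456789abcdefu, 0xffffffffffffffffu
    };
    const size_t count = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < count; i++) {
        uint64_t x = samples[i];

        /* Non-zero counts: the two shifted halves are disjoint.  */
        for (unsigned n = 1; n < 64; n++)
            assert(rotr_ior(x, n) == rotr_plus(x, n));

        /* Zero count: IOR gives x | x == x, PLUS gives x + x.  */
        assert(rotr_ior(x, 0) == x);
        assert(rotr_plus(x, 0) == (uint64_t)(x + x));
    }
}
```

For example, rotr_plus(3, 0) evaluates to 6 rather than 3, which is the
case the CONST_INT_P test is there to exclude.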

In theory, we could use ranger to check whether a rotate by a variable
amount can ever be by zero bits, but the simplification used here is to
continue using IOR for variable shifts, and PLUS for fixed/known shift
values.  The last remaining insight is that we only need to check for
CONST_INT_P, as rotations/shifts by const0_rtx are handled earlier in
this function (and eliminated by the tree-optimizers), i.e. rotation by
a known constant is implicitly a rotation by a known non-zero constant.

This is a little clearer if you read/cite more of the comment that was
changed.  Fortunately, this case is also well covered by the testsuite.
I'd be happy to change the code to read:

	(CONST_INT_P (op1) && op1 != const0_rtx)
	? add_optab
	: ior_optab

But the test "if (op1 == const0_rtx)" already appears on line 2570
of expmed.cc.

> Since this doesn't seem to fix a regression it probably has to wait for
> stage1 to re-open.
> 
> Thanks,
> Richard.
> 
> > 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
> >         to generate PLUS instead of IOR when unioning disjoint bitfields.
> >         * optabs.cc (expand_subword_shift): Likewise.
> >         (expand_binop): Likewise for double-word rotate.
> >

Thanks again.


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-19 13:26   ` Roger Sayle
@ 2024-01-19 13:49     ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2024-01-19 13:49 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Fri, Jan 19, 2024 at 2:26 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
>
> Thanks for the speedy review.  I completely agree this patch
> can wait for stage1, but it's related to some recent work Andrew
> Pinski has been doing in match.pd, so I thought I'd share it.
>
> Hypothetically, recognizing (x<<4)+(x>>60) as a rotation at the
> tree-level might lead to a code quality regression, if RTL
> expansion doesn't know to lower it back to use PLUS on
> those targets with lea but without rotate.
>
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: 19 January 2024 11:04
> > On Thu, Jan 18, 2024 at 8:55 PM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch tweaks RTL expansion of multi-word shifts and rotates to
> > > use PLUS rather than IOR for disjunctive operations.  During expansion
> > > of these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> > > where the constants C1 and C2 guarantee that bits don't overlap.
> > > Hence the IOR can be performed by any any_or_plus operation, such as
> > > IOR, XOR or PLUS; for word-size operations where carry chains aren't
> > > an issue these should all be equally fast (single-cycle) instructions.
> > > The benefit of this change is that targets with shift-and-add insns,
> > > like x86's lea, can benefit from the LSHIFT-ADD form.
> > >
> > > An example of a backend that benefits is ARC, which is demonstrated by
> > > these two simple functions:
> > >
> > > unsigned long long foo(unsigned long long x) { return x<<2; }
> > >
> > > which with -O2 is currently compiled to:
> > >
> > > foo:    lsr     r2,r0,30
> > >         asl_s   r1,r1,2
> > >         asl_s   r0,r0,2
> > >         j_s.d   [blink]
> > >         or_s    r1,r1,r2
> > >
> > > with this patch becomes:
> > >
> > > foo:    lsr     r2,r0,30
> > >         add2    r1,r2,r1
> > >         j_s.d   [blink]
> > >         asl_s   r0,r0,2
> > >
> > > unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62);
> > > }
> > >
> > > which with -O2 is currently compiled to 6 insns + return:
> > >
> > > bar:    lsr     r12,r0,30
> > >         asl_s   r3,r1,2
> > >         asl_s   r0,r0,2
> > >         lsr_s   r1,r1,30
> > >         or_s    r0,r0,r1
> > >         j_s.d   [blink]
> > >         or      r1,r12,r3
> > >
> > > with this patch becomes 4 insns + return:
> > >
> > > bar:    lsr     r3,r1,30
> > >         lsr     r2,r0,30
> > >         add2    r1,r2,r1
> > >         j_s.d   [blink]
> > >         add2    r0,r3,r0
> > >
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> >
> > For expand_shift_1 you add
> >
> > +                where C is the bitsize of A.  If N cannot be zero,
> > +                use PLUS instead of IOR.
> >
> > but I don't see a check ensuring this other than maybe CONST_INT_P (op1)
> > suggesting that we never end up with const0_rtx here.  OTOH why is N zero a
> > problem and why is it not in the optabs.cc case where I don't see any such check
> > (at least not obvious)?
>
> Excellent question.   A common mistake in writing a rotate function in C
> or C++ is to write something like (x>>n)|(x<<(64-n)) or (x<<n)|(x>>(64-n))
> which invokes undefined behavior when n == 0.  It's OK to recognize these
> as rotates (relying on the undefined behavior), but correct/portable code
> (and RTL) needs the correct idiom (x>>n)|(x<<((-n)&63)), which never invokes
> undefined behaviour.  One interesting property of this idiom is that a shift
> by zero is then calculated as (x>>0)|(x<<0), which is x|x.  This reveals
> the problem: for all non-zero values the IOR can be replaced by PLUS,
> but for zero shifts, X|X isn't the same as X+X or X^X.
>
> This only applies for single word rotations, and not multi-word shifts
> nor multi-word rotates, which explains why this test is only in one place.
>
> In theory, we could use ranger to check whether a rotate by a variable
> amount can ever be by zero bits, but the simplification used here is to
> continue using IOR for variable shifts, and PLUS for fixed/known shift
> values.  The last remaining insight is that we only need to check for
> CONST_INT_P, as rotations/shifts by const0_rtx are handled earlier in
> this function (and eliminated by the tree-optimizers), i.e. rotation by
> a known constant is implicitly a rotation by a known non-zero constant.

Ah, I see.  It wasn't obvious the expmed.cc case was for rotations only.

The patch is OK as-is for stage1 (which also gives others plenty of time
to comment).

I wonder if you can add a testcase though?

Thanks,
Richard.

> This is a little clearer if you read/cite more of the comment that was
> changed.  Fortunately, this case is also well covered by the testsuite.
> I'd be happy to change the code to read:
>
>         (CONST_INT_P (op1) && op1 != const0_rtx)
>         ? add_optab
>         : ior_optab
>
> But the test "if (op1 == const0_rtx)" already appears on line 2570
> of expmed.cc.
>
>
> > Since this doesn't seem to fix a regression it probably has to wait for
> > stage1 to re-open.
> >
> > Thanks,
> > Richard.
> >
> > > 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
> > >         to generate PLUS instead of IOR when unioning disjoint bitfields.
> > >         * optabs.cc (expand_subword_shift): Likewise.
> > >         (expand_binop): Likewise for double-word rotate.
> > >
>
>
> Thanks again.
>


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-18 19:54 [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates Roger Sayle
  2024-01-19 11:03 ` Richard Biener
@ 2024-01-19 16:05 ` Georg-Johann Lay
  2024-01-19 16:50   ` Jeff Law
  2024-01-22  7:45   ` Richard Biener
  2024-06-09  1:48 ` Jeff Law
  2 siblings, 2 replies; 12+ messages in thread
From: Georg-Johann Lay @ 2024-01-19 16:05 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches



Am 18.01.24 um 20:54 schrieb Roger Sayle:
> 
> This patch tweaks RTL expansion of multi-word shifts and rotates to use
> PLUS rather than IOR for disjunctive operations.  During expansion of
> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> where the constants C1 and C2 guarantee that bits don't overlap.
> Hence the IOR can be performed by any any_or_plus operation, such as
> IOR, XOR or PLUS; for word-size operations where carry chains aren't
> an issue these should all be equally fast (single-cycle) instructions.
> The benefit of this change is that targets with shift-and-add insns,
> like x86's lea, can benefit from the LSHIFT-ADD form.
> 
> An example of a backend that benefits is ARC, which is demonstrated
> by these two simple functions:

But there are also back-ends where this is bad.

The reason is that with ORI, the back-end needs only to operate on
those sub-words where the sub-mask is non-zero.  But for PLUS this
is not the case because the back-end does not know that the intermediate
carry will be zero.  Hence, with PLUS, more instructions are needed.
An example is AVR, but maybe many more targets with multi-word operations
are affected in a bad way.

Take for example the case with 2 words and a value of 1.

LO |= 1
HI |= 0

can be optimized to

LO |= 1

but for addition this is not the case:

LO += 1
HI +=c 0 ;; Does not know that always carry = 0.
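
The two-word case above can be modelled roughly in C as follows.  This is
only an illustrative model, not AVR code generation: the struct, the
function names and the "ops" counter are all hypothetical, and each
counted operation stands in for one word-sized instruction.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit value held as two 8-bit words.  */
struct word2 { uint8_t lo; uint8_t hi; };

/* IOR with a compile-time constant k: words whose constant part is
   zero need no instruction at all.  */
struct word2 ior_const(struct word2 v, uint16_t k, unsigned *ops)
{
    uint8_t klo = (uint8_t)(k & 0xff);
    uint8_t khi = (uint8_t)(k >> 8);

    *ops = 0;
    if (klo != 0) { v.lo = (uint8_t)(v.lo | klo); ++*ops; }
    if (khi != 0) { v.hi = (uint8_t)(v.hi | khi); ++*ops; }
    return v;
}

/* PLUS with the same constant: the high word must still absorb the
   carry out of the low word, because nothing tells the back-end
   that the carry is always zero.  */
struct word2 add_const(struct word2 v, uint16_t k, unsigned *ops)
{
    uint8_t klo = (uint8_t)(k & 0xff);
    uint8_t khi = (uint8_t)(k >> 8);
    unsigned sum = (unsigned)v.lo + klo;
    unsigned carry = sum >> 8;

    v.lo = (uint8_t)sum;
    v.hi = (uint8_t)(v.hi + khi + carry);
    *ops = 2;
    return v;
}

void check_carry_model(void)
{
    unsigned ior_ops, add_ops;

    /* Bit 0 of lo is clear, so IOR and PLUS agree on the value...  */
    struct word2 v = { 0xfe, 0x12 };
    struct word2 a = ior_const(v, 1, &ior_ops);
    struct word2 b = add_const(v, 1, &add_ops);
    assert(a.lo == 0xff && a.hi == 0x12);
    assert(b.lo == a.lo && b.hi == a.hi);

    /* ...but PLUS costs one more word operation in this model.  */
    assert(ior_ops == 1 && add_ops == 2);

    /* In general the carry does matter, hence the extra operation.  */
    struct word2 w = { 0xff, 0x12 };
    struct word2 c = add_const(w, 1, &add_ops);
    assert(c.lo == 0x00 && c.hi == 0x13);
}
```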

Johann


> 
> unsigned long long foo(unsigned long long x) { return x<<2; }
> 
> which with -O2 is currently compiled to:
> 
> foo:    lsr     r2,r0,30
>          asl_s   r1,r1,2
>          asl_s   r0,r0,2
>          j_s.d   [blink]
>          or_s    r1,r1,r2
> 
> with this patch becomes:
> 
> foo:    lsr     r2,r0,30
>          add2    r1,r2,r1
>          j_s.d   [blink]
>          asl_s   r0,r0,2
> 
> unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
> 
> which with -O2 is currently compiled to 6 insns + return:
> 
> bar:    lsr     r12,r0,30
>          asl_s   r3,r1,2
>          asl_s   r0,r0,2
>          lsr_s   r1,r1,30
>          or_s    r0,r0,r1
>          j_s.d   [blink]
>          or      r1,r12,r3
> 
> with this patch becomes 4 insns + return:
> 
> bar:    lsr     r3,r1,30
>          lsr     r2,r0,30
>          add2    r1,r2,r1
>          j_s.d   [blink]
>          add2    r0,r3,r0
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 
> 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
>          to generate PLUS instead of IOR when unioning disjoint bitfields.
>          * optabs.cc (expand_subword_shift): Likewise.
>          (expand_binop): Likewise for double-word rotate.
> 
> 
> Thanks in advance,
> Roger
> --
> 


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-19 16:05 ` Georg-Johann Lay
@ 2024-01-19 16:50   ` Jeff Law
  2024-01-20  9:31     ` Uros Bizjak
  2024-01-22  7:45   ` Richard Biener
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff Law @ 2024-01-19 16:50 UTC (permalink / raw)
  To: Georg-Johann Lay, Roger Sayle, gcc-patches



On 1/19/24 09:05, Georg-Johann Lay wrote:
> 
> 
> Am 18.01.24 um 20:54 schrieb Roger Sayle:
>>
>> This patch tweaks RTL expansion of multi-word shifts and rotates to use
>> PLUS rather than IOR for disjunctive operations.  During expansion of
>> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
>> where the constants C1 and C2 guarantee that bits don't overlap.
>> Hence the IOR can be performed by any any_or_plus operation, such as
>> IOR, XOR or PLUS; for word-size operations where carry chains aren't
>> an issue these should all be equally fast (single-cycle) instructions.
>> The benefit of this change is that targets with shift-and-add insns,
>> like x86's lea, can benefit from the LSHIFT-ADD form.
>>
>> An example of a backend that benefits is ARC, which is demonstrated
>> by these two simple functions:
> 
> But there are also back-ends where this is bad.
> 
> The reason is that with ORI, the back-end needs only to operate on
> those sub-words where the sub-mask is non-zero.  But for PLUS this
> is not the case because the back-end does not know that the intermediate
> carry will be zero.  Hence, with PLUS, more instructions are needed.
> An example is AVR, but maybe many more targets with multi-word operations
> are affected in a bad way.
> 
> Take for example the case with 2 words and a value of 1.
> 
> LO |= 1
> HI |= 0
> 
> can be optimized to
> 
> LO |= 1
> 
> but for addition this is not the case:
> 
> LO += 1
> HI +=c 0 ;; Does not know that always carry = 0.
I think it's clear that the decision is target-specific, and possibly
uarch-specific within a target.

Which means that expmed is probably the right place and that we're going
to need to look for a good way for the target to control it.  I suspect
rtx_cost isn't likely a good fit.

Jeff


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-19 16:50   ` Jeff Law
@ 2024-01-20  9:31     ` Uros Bizjak
  0 siblings, 0 replies; 12+ messages in thread
From: Uros Bizjak @ 2024-01-20  9:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: Georg-Johann Lay, Roger Sayle, gcc-patches

On Fri, Jan 19, 2024 at 5:50 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 1/19/24 09:05, Georg-Johann Lay wrote:
> >
> >
> > Am 18.01.24 um 20:54 schrieb Roger Sayle:
> >>
> >> This patch tweaks RTL expansion of multi-word shifts and rotates to use
> >> PLUS rather than IOR for disjunctive operations.  During expansion of
> >> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> >> where the constants C1 and C2 guarantee that bits don't overlap.
> >> Hence the IOR can be performed by any any_or_plus operation, such as
> >> IOR, XOR or PLUS; for word-size operations where carry chains aren't
> >> an issue these should all be equally fast (single-cycle) instructions.
> >> The benefit of this change is that targets with shift-and-add insns,
> >> like x86's lea, can benefit from the LSHIFT-ADD form.
> >>
> >> An example of a backend that benefits is ARC, which is demonstrated
> >> by these two simple functions:
> >
> > But there are also back-ends where this is bad.
> >
> > The reason is that with ORI, the back-end needs only to operate on
> > those sub-words where the sub-mask is non-zero.  But for PLUS this
> > is not the case because the back-end does not know that the intermediate
> > carry will be zero.  Hence, with PLUS, more instructions are needed.
> > An example is AVR, but maybe many more targets with multi-word operations
> > are affected in a bad way.
> >
> > Take for example the case with 2 words and a value of 1.
> >
> > LO |= 1
> > HI |= 0
> >
> > can be optimized to
> >
> > LO |= 1
> >
> > but for addition this is not the case:
> >
> > LO += 1
> > HI +=c 0 ;; Does not know that always carry = 0.
> I think it's clear that the decision is target-specific, and possibly
> uarch-specific within a target.
>
> Which means that expmed is probably the right place and that we're going
> to need to look for a good way for the target to control it.  I suspect
> rtx_cost isn't likely a good fit.

Perhaps related are PR108477 [1] and the patch at [2], where x86 would
prefer PLUS instead of {X,I}OR when the operands of the {X,I}OR have
disjoint bits.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108477
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642164.html

Uros.


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-19 16:05 ` Georg-Johann Lay
  2024-01-19 16:50   ` Jeff Law
@ 2024-01-22  7:45   ` Richard Biener
  2024-01-22 15:51     ` Jeff Law
  2024-01-24 15:49     ` Georg-Johann Lay
  1 sibling, 2 replies; 12+ messages in thread
From: Richard Biener @ 2024-01-22  7:45 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: Roger Sayle, gcc-patches

On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay <avr@gjlay.de> wrote:
>
>
>
> Am 18.01.24 um 20:54 schrieb Roger Sayle:
> >
> > This patch tweaks RTL expansion of multi-word shifts and rotates to use
> > PLUS rather than IOR for disjunctive operations.  During expansion of
> > these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> > where the constants C1 and C2 guarantee that bits don't overlap.
> > Hence the IOR can be performed by any any_or_plus operation, such as
> > IOR, XOR or PLUS; for word-size operations where carry chains aren't
> > an issue these should all be equally fast (single-cycle) instructions.
> > The benefit of this change is that targets with shift-and-add insns,
> > like x86's lea, can benefit from the LSHIFT-ADD form.
> >
> > An example of a backend that benefits is ARC, which is demonstrated
> > by these two simple functions:
>
> But there are also back-ends where this is bad.
>
> The reason is that with ORI, the back-end needs only to operate on
> those sub-words where the sub-mask is non-zero.  But for PLUS this
> is not the case because the back-end does not know that the intermediate
> carry will be zero.  Hence, with PLUS, more instructions are needed.
> An example is AVR, but maybe many more targets with multi-word operations
> are affected in a bad way.
>
> Take for example the case with 2 words and a value of 1.
>
> LO |= 1
> HI |= 0
>
> can be optimized to
>
> LO |= 1
>
> but for addition this is not the case:
>
> LO += 1
> HI +=c 0 ;; Does not know that always carry = 0.

I wonder if the PLUS can be done on the lowpart only to make this
detail obvious?

> Johann
>
>
> >
> > unsigned long long foo(unsigned long long x) { return x<<2; }
> >
> > which with -O2 is currently compiled to:
> >
> > foo:    lsr     r2,r0,30
> >          asl_s   r1,r1,2
> >          asl_s   r0,r0,2
> >          j_s.d   [blink]
> >          or_s    r1,r1,r2
> >
> > with this patch becomes:
> >
> > foo:    lsr     r2,r0,30
> >          add2    r1,r2,r1
> >          j_s.d   [blink]
> >          asl_s   r0,r0,2
> >
> > unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
> >
> > which with -O2 is currently compiled to 6 insns + return:
> >
> > bar:    lsr     r12,r0,30
> >          asl_s   r3,r1,2
> >          asl_s   r0,r0,2
> >          lsr_s   r1,r1,30
> >          or_s    r0,r0,r1
> >          j_s.d   [blink]
> >          or      r1,r12,r3
> >
> > with this patch becomes 4 insns + return:
> >
> > bar:    lsr     r3,r1,30
> >          lsr     r2,r0,30
> >          add2    r1,r2,r1
> >          j_s.d   [blink]
> >          add2    r0,r3,r0
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
> >          to generate PLUS instead of IOR when unioning disjoint bitfields.
> >          * optabs.cc (expand_subword_shift): Likewise.
> >          (expand_binop): Likewise for double-word rotate.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-22  7:45   ` Richard Biener
@ 2024-01-22 15:51     ` Jeff Law
  2024-01-24 15:49     ` Georg-Johann Lay
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff Law @ 2024-01-22 15:51 UTC (permalink / raw)
  To: Richard Biener, Georg-Johann Lay; +Cc: Roger Sayle, gcc-patches



On 1/22/24 00:45, Richard Biener wrote:
> On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay <avr@gjlay.de> wrote:
>>
>>
>>
>> Am 18.01.24 um 20:54 schrieb Roger Sayle:
>>>
>>> This patch tweaks RTL expansion of multi-word shifts and rotates to use
>>> PLUS rather than IOR for disjunctive operations.  During expansion of
>>> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
>>> where the constants C1 and C2 guarantee that bits don't overlap.
>>> Hence the IOR can be performed by any any_or_plus operation, such as
>>> IOR, XOR or PLUS; for word-size operations where carry chains aren't
>>> an issue these should all be equally fast (single-cycle) instructions.
>>> The benefit of this change is that targets with shift-and-add insns,
>>> like x86's lea, can benefit from the LSHIFT-ADD form.
>>>
>>> An example of a backend that benefits is ARC, which is demonstrated
>>> by these two simple functions:
>>
>> But there are also back-ends where this is bad.
>>
>> The reason is that with IOR, the back-end needs to operate only on
>> those sub-words where the sub-mask is non-zero.  But for PLUS this
>> is not the case, because the back-end does not know that the intermediate
>> carry will be zero.  Hence, with PLUS, more instructions are needed.
>> An example is AVR, but possibly many more targets with multi-word
>> operations are adversely affected.
>>
>> Take for example the case with 2 words and a value of 1.
>>
>> LO |= 1
>> HI |= 0
>>
>> can be optimized to
>>
>> LO |= 1
>>
>> but for addition this is not the case:
>>
>> LO += 1
>> HI +=c 0 ;; Does not know that always carry = 0.
> 
> I wonder if the PLUS can be done on the lowpart only to make this
> detail obvious?
In theory, yes.   This class of problems has often been punted to the 
target expanders (far from ideal).

I still suspect the way forward here is to have the exp* code query one 
or more target properties to guide IOR vs PLUS selection.
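
As a rough illustration of that idea, the selection could look
something like the sketch below.  Everything in it is hypothetical:
the struct, its two fields, and the function are invented for this
example and do not correspond to any existing GCC hook or macro.  The
priority order (prefer IOR whenever word-sized operations are split
across narrower registers) is likewise just one assumption about how
such a query might be resolved.

```c
#include <stdbool.h>

/* Hypothetical target description; NOT an actual GCC interface. */
typedef struct {
    bool has_shift_add_insn;        /* target has a fused shift-and-add insn */
    bool word_spans_multiple_regs;  /* word-sized ops are split across
                                       narrower hardware registers */
} target_props;

typedef enum { COMBINE_IOR, COMBINE_PLUS } combine_op;

/* Pick the operation used to merge two bit-disjoint values. */
static combine_op choose_combine_op(target_props t)
{
    /* PLUS would need a carry chain here, so IOR is cheaper. */
    if (t.word_spans_multiple_regs)
        return COMBINE_IOR;
    /* PLUS lets the target match its shift-and-add pattern. */
    if (t.has_shift_add_insn)
        return COMBINE_PLUS;
    /* No reason to change the existing behaviour. */
    return COMBINE_IOR;
}
```

Under these assumptions an ARC-like description would get PLUS, while
an AVR-like one would keep IOR.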

Jeff



* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-22  7:45   ` Richard Biener
  2024-01-22 15:51     ` Jeff Law
@ 2024-01-24 15:49     ` Georg-Johann Lay
  2024-01-25  9:20       ` Richard Biener
  1 sibling, 1 reply; 12+ messages in thread
From: Georg-Johann Lay @ 2024-01-24 15:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: Roger Sayle, gcc-patches



Am 22.01.24 um 08:45 schrieb Richard Biener:
> On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay <avr@gjlay.de> wrote:
>>
>>
>>
>> Am 18.01.24 um 20:54 schrieb Roger Sayle:
>>>
>>> This patch tweaks RTL expansion of multi-word shifts and rotates to use
>>> PLUS rather than IOR for disjunctive operations.  During expansion of
>>> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
>>> where the constants C1 and C2 guarantee that bits don't overlap.
>>> Hence the IOR can be performed by any any_or_plus operation, such as
>>> IOR, XOR or PLUS; for word-size operations where carry chains aren't
>>> an issue these should all be equally fast (single-cycle) instructions.
>>> The benefit of this change is that targets with shift-and-add insns,
>>> like x86's lea, can benefit from the LSHIFT-ADD form.
>>>
>>> An example of a backend that benefits is ARC, which is demonstrated
>>> by these two simple functions:
>>
>> But there are also back-ends where this is bad.
>>
>> The reason is that with IOR, the back-end needs to operate only on
>> those sub-words where the sub-mask is non-zero.  But for PLUS this
>> is not the case, because the back-end does not know that the intermediate
>> carry will be zero.  Hence, with PLUS, more instructions are needed.
>> An example is AVR, but possibly many more targets with multi-word
>> operations are adversely affected.
>>
>> Take for example the case with 2 words and a value of 1.
>>
>> LO |= 1
>> HI |= 0
>>
>> can be optimized to
>>
>> LO |= 1
>>
>> but for addition this is not the case:
>>
>> LO += 1
>> HI +=c 0 ;; Does not know that always carry = 0.
> 
> I wonder if the PLUS can be done on the lowpart only to make this
> detail obvious?

For AVR, word_mode is HImode, but the hardware has only 8-bit registers.

Moreover splitting insns is not wanted or not possible (due to CCmode).

Johann

>>> unsigned long long foo(unsigned long long x) { return x<<2; }
>>>
>>> which with -O2 is currently compiled to:
>>>
>>> foo:    lsr     r2,r0,30
>>>           asl_s   r1,r1,2
>>>           asl_s   r0,r0,2
>>>           j_s.d   [blink]
>>>           or_s    r1,r1,r2
>>>
>>> with this patch becomes:
>>>
>>> foo:    lsr     r2,r0,30
>>>           add2    r1,r2,r1
>>>           j_s.d   [blink]
>>>           asl_s   r0,r0,2
>>>
>>> unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
>>>
>>> which with -O2 is currently compiled to 6 insns + return:
>>>
>>> bar:    lsr     r12,r0,30
>>>           asl_s   r3,r1,2
>>>           asl_s   r0,r0,2
>>>           lsr_s   r1,r1,30
>>>           or_s    r0,r0,r1
>>>           j_s.d   [blink]
>>>           or      r1,r12,r3
>>>
>>> with this patch becomes 4 insns + return:
>>>
>>> bar:    lsr     r3,r1,30
>>>           lsr     r2,r0,30
>>>           add2    r1,r2,r1
>>>           j_s.d   [blink]
>>>           add2    r0,r3,r0
>>>
>>>
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>>
>>>
>>> 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
>>>
>>> gcc/ChangeLog
>>>           * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
>>>           to generate PLUS instead of IOR when unioning disjoint bitfields.
>>>           * optabs.cc (expand_subword_shift): Likewise.
>>>           (expand_binop): Likewise for double-word rotate.
>>>
>>>
>>> Thanks in advance,
>>> Roger


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-24 15:49     ` Georg-Johann Lay
@ 2024-01-25  9:20       ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2024-01-25  9:20 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: Roger Sayle, gcc-patches

On Wed, Jan 24, 2024 at 4:50 PM Georg-Johann Lay <avr@gjlay.de> wrote:
>
>
>
> Am 22.01.24 um 08:45 schrieb Richard Biener:
> > On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay <avr@gjlay.de> wrote:
> >>
> >>
> >>
> >> Am 18.01.24 um 20:54 schrieb Roger Sayle:
> >>>
> >>> This patch tweaks RTL expansion of multi-word shifts and rotates to use
> >>> PLUS rather than IOR for disjunctive operations.  During expansion of
> >>> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> >>> where the constants C1 and C2 guarantee that bits don't overlap.
> >>> Hence the IOR can be performed by any any_or_plus operation, such as
> >>> IOR, XOR or PLUS; for word-size operations where carry chains aren't
> >>> an issue these should all be equally fast (single-cycle) instructions.
> >>> The benefit of this change is that targets with shift-and-add insns,
> >>> like x86's lea, can benefit from the LSHIFT-ADD form.
> >>>
> >>> An example of a backend that benefits is ARC, which is demonstrated
> >>> by these two simple functions:
> >>
> >> But there are also back-ends where this is bad.
> >>
> >> The reason is that with IOR, the back-end needs to operate only on
> >> those sub-words where the sub-mask is non-zero.  But for PLUS this
> >> is not the case, because the back-end does not know that the intermediate
> >> carry will be zero.  Hence, with PLUS, more instructions are needed.
> >> An example is AVR, but possibly many more targets with multi-word
> >> operations are adversely affected.
> >>
> >> Take for example the case with 2 words and a value of 1.
> >>
> >> LO |= 1
> >> HI |= 0
> >>
> >> can be optimized to
> >>
> >> LO |= 1
> >>
> >> but for addition this is not the case:
> >>
> >> LO += 1
> >> HI +=c 0 ;; Does not know that always carry = 0.
> >
> > I wonder if the PLUS can be done on the lowpart only to make this
> > detail obvious?
>
> For AVR, word_mode is HImode, but the hardware has only 8-bit registers.
>
> Moreover splitting insns is not wanted or not possible (due to CCmode).

Btw, it would be nice to have test coverage on AVR for the cases we're
talking about (if there isn't already).  That makes sure we don't regress
with whatever solution we end up with.
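
A possible starting point for such coverage, sketched in plain C: it
reuses foo and bar from the original submission and compares them
against reference results assembled from 32-bit halves.  The helper
names (ref_shl2, ref_rotl2, count_failures) and the input values are
made up for this example, and it assumes unsigned long long is exactly
64 bits wide.  This checks run-time correctness only; guarding code
size on AVR would additionally need some form of assembler scan, which
is not shown here.

```c
#include <stdint.h>

/* The two functions from the original submission. */
unsigned long long foo(unsigned long long x) { return x << 2; }
unsigned long long bar(unsigned long long x) { return (x << 2) | (x >> 62); }

/* Reference: 64-bit left shift by 2, built from 32-bit halves. */
static uint64_t ref_shl2(uint64_t x)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    uint32_t rhi = (hi << 2) | (lo >> 30);
    uint32_t rlo = lo << 2;
    return ((uint64_t)rhi << 32) | rlo;
}

/* Reference: 64-bit rotate left by 2, built from 32-bit halves. */
static uint64_t ref_rotl2(uint64_t x)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    uint32_t rhi = (hi << 2) | (lo >> 30);
    uint32_t rlo = (lo << 2) | (hi >> 30);
    return ((uint64_t)rhi << 32) | rlo;
}

/* Returns the number of inputs for which foo or bar disagrees with
   its reference. */
int count_failures(void)
{
    static const uint64_t inputs[] = {
        0x0000000000000000ULL, 0x0000000000000001ULL,
        0x00000000C0000000ULL, 0x8000000000000000ULL,
        0xC000000000000000ULL, 0x0123456789ABCDEFULL,
        0xFFFFFFFFFFFFFFFFULL
    };
    int failures = 0;
    for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++) {
        if (foo(inputs[i]) != ref_shl2(inputs[i]))
            failures++;
        if (bar(inputs[i]) != ref_rotl2(inputs[i]))
            failures++;
    }
    return failures;
}
```

Wrapped in a main that aborts on a non-zero count, this could serve as
a simple execution test for whichever IOR/PLUS choice ends up being
made.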

Richard.

> Johann
>
> >>> unsigned long long foo(unsigned long long x) { return x<<2; }
> >>>
> >>> which with -O2 is currently compiled to:
> >>>
> >>> foo:    lsr     r2,r0,30
> >>>           asl_s   r1,r1,2
> >>>           asl_s   r0,r0,2
> >>>           j_s.d   [blink]
> >>>           or_s    r1,r1,r2
> >>>
> >>> with this patch becomes:
> >>>
> >>> foo:    lsr     r2,r0,30
> >>>           add2    r1,r2,r1
> >>>           j_s.d   [blink]
> >>>           asl_s   r0,r0,2
> >>>
> >>> unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
> >>>
> >>> which with -O2 is currently compiled to 6 insns + return:
> >>>
> >>> bar:    lsr     r12,r0,30
> >>>           asl_s   r3,r1,2
> >>>           asl_s   r0,r0,2
> >>>           lsr_s   r1,r1,30
> >>>           or_s    r0,r0,r1
> >>>           j_s.d   [blink]
> >>>           or      r1,r12,r3
> >>>
> >>> with this patch becomes 4 insns + return:
> >>>
> >>> bar:    lsr     r3,r1,30
> >>>           lsr     r2,r0,30
> >>>           add2    r1,r2,r1
> >>>           j_s.d   [blink]
> >>>           add2    r0,r3,r0
> >>>
> >>>
> >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >>> and make -k check, both with and without --target_board=unix{-m32}
> >>> with no new failures.  Ok for mainline?
> >>>
> >>>
> >>> 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
> >>>
> >>> gcc/ChangeLog
> >>>           * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
> >>>           to generate PLUS instead of IOR when unioning disjoint bitfields.
> >>>           * optabs.cc (expand_subword_shift): Likewise.
> >>>           (expand_binop): Likewise for double-word rotate.
> >>>
> >>>
> >>> Thanks in advance,
> >>> Roger


* Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
  2024-01-18 19:54 [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates Roger Sayle
  2024-01-19 11:03 ` Richard Biener
  2024-01-19 16:05 ` Georg-Johann Lay
@ 2024-06-09  1:48 ` Jeff Law
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff Law @ 2024-06-09  1:48 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches



On 1/18/24 12:54 PM, Roger Sayle wrote:
> 
> This patch tweaks RTL expansion of multi-word shifts and rotates to use
> PLUS rather than IOR for disjunctive operations.  During expansion of
> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> where the constants C1 and C2 guarantee that bits don't overlap.
> Hence the IOR can be performed by any any_or_plus operation, such as
> IOR, XOR or PLUS; for word-size operations where carry chains aren't
> an issue these should all be equally fast (single-cycle) instructions.
> The benefit of this change is that targets with shift-and-add insns,
> like x86's lea, can benefit from the LSHIFT-ADD form.
> 
> An example of a backend that benefits is ARC, which is demonstrated
> by these two simple functions:
> 
> unsigned long long foo(unsigned long long x) { return x<<2; }
> 
> which with -O2 is currently compiled to:
> 
> foo:    lsr     r2,r0,30
>          asl_s   r1,r1,2
>          asl_s   r0,r0,2
>          j_s.d   [blink]
>          or_s    r1,r1,r2
> 
> with this patch becomes:
> 
> foo:    lsr     r2,r0,30
>          add2    r1,r2,r1
>          j_s.d   [blink]
>          asl_s   r0,r0,2
> 
> unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
> 
> which with -O2 is currently compiled to 6 insns + return:
> 
> bar:    lsr     r12,r0,30
>          asl_s   r3,r1,2
>          asl_s   r0,r0,2
>          lsr_s   r1,r1,30
>          or_s    r0,r0,r1
>          j_s.d   [blink]
>          or      r1,r12,r3
> 
> with this patch becomes 4 insns + return:
> 
> bar:    lsr     r3,r1,30
>          lsr     r2,r0,30
>          add2    r1,r2,r1
>          j_s.d   [blink]
>          add2    r0,r3,r0
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 
> 2024-01-18  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
>          to generate PLUS instead of IOR when unioning disjoint bitfields.
>          * optabs.cc (expand_subword_shift): Likewise.
>          (expand_binop): Likewise for double-word rotate.
Also note that on some targets like RISC-V, there's more freedom to
generate compressed instructions from "add" rather than "or".
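
For reference, the property the patch relies on (the two shifted
halves share no set bits, so IOR, XOR and PLUS all give the same
result) can be checked with a short C sketch.  The function names are
illustrative only, and 32-bit words are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* High word of a double-word left shift by c, for 0 < c < 32.
   (hi << c) has its low c bits clear, and (lo >> (32 - c)) has only
   its low c bits possibly set, so the two operands never overlap. */
static uint32_t hi_word_ior(uint32_t hi, uint32_t lo, unsigned c)
{
    assert(c > 0 && c < 32);
    return (hi << c) | (lo >> (32 - c));
}

static uint32_t hi_word_xor(uint32_t hi, uint32_t lo, unsigned c)
{
    assert(c > 0 && c < 32);
    return (hi << c) ^ (lo >> (32 - c));
}

static uint32_t hi_word_plus(uint32_t hi, uint32_t lo, unsigned c)
{
    assert(c > 0 && c < 32);
    /* No carry can occur because the operands are bit-disjoint. */
    return (hi << c) + (lo >> (32 - c));
}
```

Since the three forms are interchangeable here, the choice between
them is purely a matter of which one the target encodes or schedules
best.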

Anyway, given the time elapsed since submission, I went ahead and 
retested on x86, then committed & pushed to the trunk.

Thanks!

jeff



Thread overview: 12+ messages
2024-01-18 19:54 [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates Roger Sayle
2024-01-19 11:03 ` Richard Biener
2024-01-19 13:26   ` Roger Sayle
2024-01-19 13:49     ` Richard Biener
2024-01-19 16:05 ` Georg-Johann Lay
2024-01-19 16:50   ` Jeff Law
2024-01-20  9:31     ` Uros Bizjak
2024-01-22  7:45   ` Richard Biener
2024-01-22 15:51     ` Jeff Law
2024-01-24 15:49     ` Georg-Johann Lay
2024-01-25  9:20       ` Richard Biener
2024-06-09  1:48 ` Jeff Law
