[Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension
@ 2023-04-22  0:12 law at gcc dot gnu.org
  2023-04-24  2:46 ` [Bug rtl-optimization/109592] " wangfeng at eswincomputing dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2023-04-22  0:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

            Bug ID: 109592
           Summary: Failure to recognize shifts as sign/zero extension
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This is a trivial sign extension:

int sextb32(int x)
{ return (x << 24) >> 24; }

Yet on RV64 with ZBB enabled we get:

sextb32:
        slliw   a0,a0,24        # 6     [c=4 l=4]  ashlsi3
        sraiw   a0,a0,24        # 13    [c=8 l=4]  *ashrsi3_extend
        ret             # 21    [c=0 l=4]  simple_return

We actually get a good form to optimize in simplify_binary_operation_1:

> #0  simplify_context::simplify_binary_operation (this=0x7fffffffda68, code=ASHIFTRT, mode=E_SImode,     op0=0x7fffea11eb40, op1=0x7fffea009610) at /home/jlaw/riscv-persist/ventana/gcc/gcc/simplify-rtx.cc:2558
> 2558      gcc_assert (GET_RTX_CLASS (code) != RTX_COMPARE);
> (gdb) p code
> $24 = ASHIFTRT
> (gdb) p mode
> $25 = E_SImode
> (gdb) p debug_rtx (op0)
> (ashift:SI (subreg/s/u:SI (reg/v:DI 74 [ x ]) 0)
>     (const_int 24 [0x18]))
> $26 = void
> (gdb) p debug_rtx (op1)
> (const_int 24 [0x18])
> $27 = void

So that's (ashiftrt (ashift (object) 24) 24), ie sign extension. 

I suspect if we fix simplify_binary_operation_1 then we'll see this get
simplified by fwprop.  I also suspect we could construct a zero extension
variant.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
@ 2023-04-24  2:46 ` wangfeng at eswincomputing dot com
  2023-04-24  3:52 ` wangfeng at eswincomputing dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: wangfeng at eswincomputing dot com @ 2023-04-24  2:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

Feng Wang <wangfeng at eswincomputing dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wangfeng at eswincomputing dot com

--- Comment #1 from Feng Wang <wangfeng at eswincomputing dot com> ---
Hi Jeff,

I have modified some code according to your suggestion.
In simplify-rtx.cc I add below part in canonicalize_shift:
      if ((code == ASHIFTRT)
          && is_a <scalar_int_mode> (mode, &int_mode)
          && GET_CODE (op0) == ASHIFT
          && CONST_INT_P (op1)
          && is_a <scalar_int_mode> (GET_MODE (op0), &inner_mode)
          && CONST_INT_P (XEXP (op0, 1))
          && (INTVAL (XEXP (op0, 1)) == INTVAL (op1)))
        {
          tem = XEXP (op0, 0);
          if (SUBREG_P(XEXP (op0, 0))
              && is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (XEXP (op0, 0))),
                                         &inner_mode)
              && GET_MODE_BITSIZE (inner_mode) > GET_MODE_BITSIZE (int_mode))
          {
            tem = XEXP (XEXP (op0, 0), 0);
          }

          if ((INTVAL (op1) + 8 == 32) || (INTVAL (op1) + 8 == 64))
          {
            op0 = lowpart_subreg (E_QImode, tem, GET_MODE(tem));
            return simplify_gen_unary (SIGN_EXTEND, int_mode, op0, int_mode);
          }

          if ((INTVAL (op1) + 16 == 32) || (INTVAL (op1) + 16 == 64))
          {
            op0 = lowpart_subreg (E_HImode, tem, GET_MODE(tem));
            return simplify_gen_unary (SIGN_EXTEND, int_mode, op0, int_mode);
          }
        }

and at the same time in fwprop.cc I add below conditions in classify_result to
judge whether replacement is PROFITABLE
    /* If have load byte or load half pattern, we can covert
     ashiftrt(ashift(object)const)const to load byte or half form object */
  if (GET_CODE(old_rtx) == ASHIFTRT
      && GET_CODE(new_rtx) == SIGN_EXTEND
      && (GET_MODE(new_rtx) == E_SImode
           && (((GET_MODE(XEXP(new_rtx, 0)) == E_QImode) 
                 && HAVE_extendqisi2)
            || ((GET_MODE(XEXP(new_rtx, 0)) == E_HImode) 
                 && HAVE_extendhisi2)))
          || (GET_MODE(new_rtx) == E_DImode
           && (((GET_MODE(XEXP(new_rtx, 0)) == E_QImode) 
                 && HAVE_extendqidi2)
            || ((GET_MODE(XEXP(new_rtx, 0)) == E_HImode)
                 && HAVE_extendhidi2))))
    return PROFITABLE;

Please check if it is reasonable and look forward to further discussion with
you.Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
  2023-04-24  2:46 ` [Bug rtl-optimization/109592] " wangfeng at eswincomputing dot com
@ 2023-04-24  3:52 ` wangfeng at eswincomputing dot com
  2023-04-24  7:02 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: wangfeng at eswincomputing dot com @ 2023-04-24  3:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #2 from Feng Wang <wangfeng at eswincomputing dot com> ---
(In reply to Feng Wang from comment #1)
> Hi Jeff,
> 
> I have modified some code according to your suggestion.
> In simplify-rtx.cc I add below part in canonicalize_shift:
>       if ((code == ASHIFTRT)
>           && is_a <scalar_int_mode> (mode, &int_mode)
>           && GET_CODE (op0) == ASHIFT
>           && CONST_INT_P (op1)
>           && is_a <scalar_int_mode> (GET_MODE (op0), &inner_mode)
>           && CONST_INT_P (XEXP (op0, 1))
>           && (INTVAL (XEXP (op0, 1)) == INTVAL (op1)))
>         {
>           tem = XEXP (op0, 0);
>           if (SUBREG_P(XEXP (op0, 0))
>               && is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (XEXP (op0,
> 0))),
>                                          &inner_mode)
>               && GET_MODE_BITSIZE (inner_mode) > GET_MODE_BITSIZE (int_mode))
>           {
>             tem = XEXP (XEXP (op0, 0), 0);
>           }
> 
>           if ((INTVAL (op1) + 8 == 32) || (INTVAL (op1) + 8 == 64))
>           {
>             op0 = lowpart_subreg (E_QImode, tem, GET_MODE(tem));
>             return simplify_gen_unary (SIGN_EXTEND, int_mode, op0, int_mode);
>           }
> 
>           if ((INTVAL (op1) + 16 == 32) || (INTVAL (op1) + 16 == 64))
>           {
>             op0 = lowpart_subreg (E_HImode, tem, GET_MODE(tem));
>             return simplify_gen_unary (SIGN_EXTEND, int_mode, op0, int_mode);
>           }
>         }
> 
> and at the same time in fwprop.cc I add below conditions in classify_result
> to judge whether replacement is PROFITABLE
>     /* If have load byte or load half pattern, we can covert
>      ashiftrt(ashift(object)const)const to load byte or half form object */
>   if (GET_CODE(old_rtx) == ASHIFTRT
>       && GET_CODE(new_rtx) == SIGN_EXTEND
>       && ((GET_MODE(new_rtx) == E_SImode
>            && ((GET_MODE(XEXP(new_rtx, 0)) == E_QImode 
>                 && HAVE_extendqisi2)
>             || (GET_MODE(XEXP(new_rtx, 0)) == E_HImode
>                 && HAVE_extendhisi2)))
>           ||(GET_MODE(new_rtx) == E_DImode
>            && ((GET_MODE(XEXP(new_rtx, 0)) == E_QImode
>                  && HAVE_extendqidi2)
>             || (GET_MODE(XEXP(new_rtx, 0)) == E_HImode
>                  && HAVE_extendhidi2))))
>     return PROFITABLE;
> 
> Please check if it is reasonable and look forward to further discussion with
> you.Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
  2023-04-24  2:46 ` [Bug rtl-optimization/109592] " wangfeng at eswincomputing dot com
  2023-04-24  3:52 ` wangfeng at eswincomputing dot com
@ 2023-04-24  7:02 ` rguenth at gcc dot gnu.org
  2023-04-28 20:26 ` law at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-24  7:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-04-24
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #0)
> This is a trivial sign extension:
> 
> int sextb32(int x)
> { return (x << 24) >> 24; }

Btw, during the various attempts at GIMPLE narrowing/widening passes we
thought it might be interesting to add {S,Z}EXT_EXPR to GIMPLE.  For
ZEXT_EXPR the canonical variant is currently a BIT_AND_EXPR with an
appropriate mask, so ZEXT_EXPR wouldn't be any shorter (thus I'm not sure
we want to add it).  For SEXT_EXPR the canonical form is
truncation to signed and then a sign extending cast - thus two conversions.
But at least I don't see the above being canonicalized to that (not sure
if it would help RTL expansion).

We fail to simplify

int sextb32(int x)
{ return (x << 24) >> 24 == (int)(signed char)x; }

because of the missed canonicalization.  Note for (x << 27) >> 27 we'd
end up with bit-precision casts which is ugly and likely unwanted.
A SEXT_EXPR (x, 8) (the extend-from bit position would be required constant)
would be a simplification.

I suspect most ISAs cannot sign-extend from arbitrary bit positions but
only handle QI, HI and SImode though.

Without SEXT_EXPR the bit-precision cases might instead argue for the
shift sequence to be canonical (but generally conversions are easier
to combine with, and conversions have implementation defined behavior
while shifts of signed values have undefinedness issues).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-04-24  7:02 ` rguenth at gcc dot gnu.org
@ 2023-04-28 20:26 ` law at gcc dot gnu.org
  2023-05-11  7:56 ` wangfeng at eswincomputing dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2023-04-28 20:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
If we need to handle subregs here, I would suggest something like this

if (SUBREG_P (XEXP (op0, 0))
    && subreg_lowpart_p (op0)
    ... other tests ...

That way we know we're extracting the low word of the subreg.  But I'm not sure
at all why we need to handle them in this code.  I would expect generic
optimizers to strip away the subregs in the result if they are extraneous.

It's not clear why you check the size of the subreg modes.  It seems like this
optimization should work even for a paradoxical subreg (bitsize of inner will
be smaller than bitsize of outer).  

In general if you only have one statement in an arm of an IF-THEN-ELSE, then it
need not be inside a { } block.

Rather than using magic numbers like

INTVAL (op1) + 8 == 32

Instead use mode information.

INTVAL (op) + GET_MODE_BITSIZE (QImode) == GET_MODE_BITSIZE (SImode)
// code for QI->SI expansion

Then repeat for the other mode combinations.

Note that we probably should go ahead and support QI->HI.  While it doesn't
happen for RISC-V, it could likely happen on other architectures.  So you end
up wanting to supprot

QI->HI, QI->SI QI->DI
HI->SI, HI->DI
SI->DI

I don't know if it happens in practice, so check first to see what we do for a
zero extension variant of your original test.  If we need to handle that too,
it can be easily done by changing the shifts we recognize.

Anyway, it looks like you're on the right track.  I would suggest further
discussions happen on gcc-patches.

Anyway, it definitely looks like you're on the right track.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-04-28 20:26 ` law at gcc dot gnu.org
@ 2023-05-11  7:56 ` wangfeng at eswincomputing dot com
  2023-05-11 14:33 ` law at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: wangfeng at eswincomputing dot com @ 2023-05-11  7:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #5 from Feng Wang <wangfeng at eswincomputing dot com> ---
I found something interesting, these operations such as
zero_extend,sign_extend,zero_extract,sign_extract will be considered as
compound operation and will transform to the approriate shifts and AND
operations(This proc is in expand_compound_operation). So if the sign_extend op
generated earlier,it will be converted by expand_compound_operation again
during the after pass such as combine pass. 
In the combine pass, it will perform the opposite operation as above.Through
make_compound and make_extraction functions two shifts insns will combine into
sign_extend or zero_extend if the pattern exist,and then judge the cost of
it.It is profitable replacement only when the arch supports sign_extend and
costs less than two shifts insns.
So I think the first patch is enough,
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg303789.html
it handles the introduction of subregs when compiling and processing 32-bit
using a 64 bit compiler.
The first patch process the below rtl.
(set (reg:SI 137)
    (ashiftrt:SI (subreg:SI (ashift:DI (reg:DI 140)
                (const_int 24 [0x18])) 0)
        (const_int 24 [0x18])))
During extract_left_shift processing,it wants to get reg:DI 140,but now the
extraction is failed due to the subreg is in the front of the ashift. So I
think we just need jump the subreg is enough to ensure normal subsequent
processing.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-05-11  7:56 ` wangfeng at eswincomputing dot com
@ 2023-05-11 14:33 ` law at gcc dot gnu.org
  2023-05-29 16:35 ` law at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2023-05-11 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I would still rather not introduce special cases for SUBREGs if we can avoid
it.  I think the question remains whether or not patching simplify-rtx's
canonicalize_shift is sufficient to fix this problem (perhaps with the
adjustment to fwprop as well).  If they are, then they would be much preferred
over the original patch which special cased SUBREGs.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-05-11 14:33 ` law at gcc dot gnu.org
@ 2023-05-29 16:35 ` law at gcc dot gnu.org
  2023-05-30  6:17 ` wangfeng at eswincomputing dot com
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2023-05-29 16:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Attached is what I cobbled together.  It doesn't use magic numbers.  But it
doesn't yet handle zero extensions in the simplify-rtx code.  But I think it
shows the overall direction fairly well.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-05-29 16:35 ` law at gcc dot gnu.org
@ 2023-05-30  6:17 ` wangfeng at eswincomputing dot com
  2023-05-30 20:40 ` law at gcc dot gnu.org
  2023-05-30 20:41 ` law at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: wangfeng at eswincomputing dot com @ 2023-05-30  6:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #8 from Feng Wang <wangfeng at eswincomputing dot com> ---
(In reply to Jeffrey A. Law from comment #7)
> Attached is what I cobbled together.  It doesn't use magic numbers.  But it
> doesn't yet handle zero extensions in the simplify-rtx code.  But I think it
> shows the overall direction fairly well.
Sorry, I can't find the attachement. I think we need to consider more here,such
as the cost of zero/sign extensions, only convert shifts to zero/sign
extensions when it cost less in certain hardware architectures. At the same we
also consider the shift ins is more suitable for combine pass.Maybe We should
consider that whether the conversion prevents a better combination in combine
pass(I need study a suitable test case).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-05-30  6:17 ` wangfeng at eswincomputing dot com
@ 2023-05-30 20:40 ` law at gcc dot gnu.org
  2023-05-30 20:41 ` law at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2023-05-30 20:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #9 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Weird, I don't see the attachment either.  I'll extract & upload it again.

WRT costing.  fwprop and combine will both query the target rtx costs and will
reject when the target costing model indicates the change isn't actually
profitable.

As you'd noted before, combine will internally transform a sign/zero extension
into a pair of shifts.  The whole point of that internal canonicalization is to
expose cases where the shifts can combine with other nearby operations.  So
there's no significant risk to detecting and creating the extension form
earlier.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension
  2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-05-30 20:40 ` law at gcc dot gnu.org
@ 2023-05-30 20:41 ` law at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2023-05-30 20:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #10 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Created attachment 55218
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55218&action=edit
(Incomplete) Patch

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-30 20:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-22  0:12 [Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension law at gcc dot gnu.org
2023-04-24  2:46 ` [Bug rtl-optimization/109592] " wangfeng at eswincomputing dot com
2023-04-24  3:52 ` wangfeng at eswincomputing dot com
2023-04-24  7:02 ` rguenth at gcc dot gnu.org
2023-04-28 20:26 ` law at gcc dot gnu.org
2023-05-11  7:56 ` wangfeng at eswincomputing dot com
2023-05-11 14:33 ` law at gcc dot gnu.org
2023-05-29 16:35 ` law at gcc dot gnu.org
2023-05-30  6:17 ` wangfeng at eswincomputing dot com
2023-05-30 20:40 ` law at gcc dot gnu.org
2023-05-30 20:41 ` law at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).