public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/95632] New: Redundant zero extension
@ 2020-06-11  4:06 bina2374 at gmail dot com
  2020-06-12  4:25 ` [Bug target/95632] " wilson at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: bina2374 at gmail dot com @ 2020-06-11  4:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

            Bug ID: 95632
           Summary: Redundant zero extension
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bina2374 at gmail dot com
                CC: kito at gcc dot gnu.org, wilson at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv32-unknown-elf

Command line: bin/riscv64-unknown-elf-gcc -march=rv32imafc -mabi=ilp32f -O2
foo.c -S

==========
 C Source
==========
unsigned short foo(unsigned short crc) {
  crc ^= 0x4002;
  crc >>= 1;
  crc |= 0x8000;

  return crc;
}

=========
 GCC asm
=========
foo:
        li      a5,-24576  #
        addi    a5,a5,1    # a5 = 0xffffa001
        srli    a0,a0,1
        xor     a0,a0,a5
        slli    a0,a0,16   # 
        srli    a0,a0,16   # redundant zero-extension
        ret

=======================
 Ideal Code Generation 
=======================
foo:
        li      a5,40960   #
        addi    a5,a5,1    # a5 = 0x0000a001
        srli    a0,a0,1
        xor     a0,a0,a5
        ret

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
@ 2020-06-12  4:25 ` wilson at gcc dot gnu.org
  2020-06-15  9:49 ` bina2374 at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-06-12  4:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-06-12
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
We sign extend HImode constants as that is the natural thing to do to make
arithmetic work.  This does mean that unsigned short logical operations need a
zero extend after the operation which might otherwise be unnecessary.  This
can't be handled at rtl generation time as we don't know if the constant will
be used for arithmetic or logicals or signed or unsigned.  But maybe an
optimization pass could go over the code and convert HImode constants to signed
or unsigned as appropriate to reduce the number of sign/zero extend operations.
 We have the ree pass that we might be able to extend to handle this.

Handling this in combine requires a 4->3 splitter which is something combine
doesn't do.  We could work around that by not splitting constants before
combine, but that would be a major change and probably not beneficial, as we
wouldn't be able to easily optimize the high part of the constants anymore.

Another approach here might be to split the xor along with the constant.  If we
generated something like
        srli    a0,a0,1
        xori    a0,a0,1
        li      a5,-24576
        xor     a0,a0,a5
then we can optimize away the following zero extend with a 3->2 splitter which
combine already supports via find_split_point.  We can still optimize the high
part of the constant. Since the immediates are sign extended, if the low part
of the immediate has the sign bit set, we would have to invert the high part of
the immediate to get the right result.  At least I think that works, I haven't
double checked it yet.  This only works for or if the low part doesn't have the
sign bit set.  And this only works for and if the low part does have the sign
bit set.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
  2020-06-12  4:25 ` [Bug target/95632] " wilson at gcc dot gnu.org
@ 2020-06-15  9:49 ` bina2374 at gmail dot com
  2020-06-16  0:01 ` wilson at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bina2374 at gmail dot com @ 2020-06-15  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #2 from Mel Chen <bina2374 at gmail dot com> ---
(In reply to Jim Wilson from comment #1)
> We sign extend HImode constants as that is the natural thing to do to make
> arithmetic work.  This does mean that unsigned short logical operations need
> a zero extend after the operation which might otherwise be unnecessary. 
> This can't be handled at rtl generation time as we don't know if the
> constant will be used for arithmetic or logicals or signed or unsigned.  But
> maybe an optimization pass could go over the code and convert HImode
> constants to signed or unsigned as appropriate to reduce the number of
> sign/zero extend operations.  We have the ree pass that we might be able to
> extend to handle this.

Extend ree pass is a good way, but now it seems only scanning XXX_extend.
Because the zero_extend has been split to 2 shift instructions before ree pass,
do we need to keep zero_extend until ree pass? Or is there any other way to
know that the shift pair was a zero_extend?
> 
> Handling this in combine requires a 4->3 splitter which is something combine
> doesn't do.  We could work around that by not splitting constants before
> combine, but that would be a major change and probably not beneficial, as we
> wouldn't be able to easily optimize the high part of the constants anymore.

I agree. This way is a bit risky.
> 
> Another approach here might be to split the xor along with the constant.  If
> we generated something like
> 	srli	a0,a0,1
>         xori    a0,a0,1
> 	li	a5,-24576
> 	xor	a0,a0,a5
> then we can optimize away the following zero extend with a 3->2 splitter
> which combine already supports via find_split_point.  We can still optimize
> the high part of the constant. Since the immediates are sign extended, if
> the low part of the immediate has the sign bit set, we would have to invert
> the high part of the immediate to get the right result.  At least I think
> that works, I haven't double checked it yet.  This only works for or if the
> low part doesn't have the sign bit set.  And this only works for and if the
> low part does have the sign bit set.

I'm not sure how difficult it is to split 1 xor to 2 xor before combine pass,
but I have another proposal:

The following dump is combine dump:
Trying 8, 9, 10 -> 11:
    8: r79:SI=0xffffffffffffa000
    9: r78:SI=r79:SI+0x1
      REG_DEAD r79:SI
      REG_EQUAL 0xffffffffffffa001
   10: r77:SI=r72:SI^r78:SI
      REG_DEAD r78:SI
      REG_DEAD r72:SI
   11: r80:SI=zero_extend(r77:SI#0)
      REG_DEAD r77:SI
Failed to match this instruction:
(set (reg:SI 80)
    (xor:SI (reg:SI 72 [ _4 ])
        (const_int 40961 [0xa001])))

Is it possible to pretend that we have a pattern that can match xor (reg:SI
80), (reg: SI 72), 0xa001 in combine pass?
And then, if the constant part is too large to put in to the immediate part, it
can be split to 2 xor in split pass.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
  2020-06-12  4:25 ` [Bug target/95632] " wilson at gcc dot gnu.org
  2020-06-15  9:49 ` bina2374 at gmail dot com
@ 2020-06-16  0:01 ` wilson at gcc dot gnu.org
  2020-06-16  0:02 ` wilson at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-06-16  0:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #3 from Jim Wilson <wilson at gcc dot gnu.org> ---
It isn't possible to have patterns that match only in combine.  If we add a
pattern to accept (xor (reg) (large constant)) then it could match in any
optimization pass, and could prevent us from optimizing away redundant lui
instructions.

There is a representation issue here with constants.  If we split them early,
then optimizing redundant lui is easy.  If we split them late, then optimizing
redundant lui is hard.  There are also other optimizations that may be easy or
hard depending on whether constants are split early or late.  Currently, we
always split constants early, and changing that will have a major impact on the
code optimization, which may be good or bad, but more likely will be good for
some programs and bad for others.  I'd rather not change this as it will be a
major project to deal with the problems caused by the change.

Hence my suggestion at RTL generation time to split xor with constants
differently.  I have a proof of concept patch for that, but it needs a lot of
cleanup to be useful, and a lot of testing to verify that it improves code more
often than it harms code.

As for ree, splitters after register allocation traditionally check
reload_completed which is a global variable set near the end of the last
register allocation pass.  The split2 pass happens between reload and ree. 
Maybe moving ree before split2 would help RISC-V, but might hurt other targets.
 Or might help for some programs and hurt for others.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
                   ` (2 preceding siblings ...)
  2020-06-16  0:01 ` wilson at gcc dot gnu.org
@ 2020-06-16  0:02 ` wilson at gcc dot gnu.org
  2020-06-16  7:15 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-06-16  0:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #4 from Jim Wilson <wilson at gcc dot gnu.org> ---
Created attachment 48737
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48737&action=edit
proof of concept patch for changing xor with a large constant

needs cleanup and testing to be useful

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
                   ` (3 preceding siblings ...)
  2020-06-16  0:02 ` wilson at gcc dot gnu.org
@ 2020-06-16  7:15 ` ubizjak at gmail dot com
  2020-06-16  7:21 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2020-06-16  7:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Mel Chen from comment #2)
> Is it possible to pretend that we have a pattern that can match xor (reg:SI
> 80), (reg: SI 72), 0xa001 in combine pass?
> And then, if the constant part is too large to put in to the immediate part,
> it can be split to 2 xor in split pass.

Please note that the combine pass has its own (rather limited) splitter, it is
documented in the second part of "Defining How to Split Instructions"
paragraph. The example is dealing with the instruction that has too large
immediate part, and looks similar to your problem.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
                   ` (4 preceding siblings ...)
  2020-06-16  7:15 ` ubizjak at gmail dot com
@ 2020-06-16  7:21 ` ubizjak at gmail dot com
  2021-05-30 22:26 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2020-06-16  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Mel Chen from comment #2)
> > Is it possible to pretend that we have a pattern that can match xor (reg:SI
> > 80), (reg: SI 72), 0xa001 in combine pass?
> > And then, if the constant part is too large to put in to the immediate part,
> > it can be split to 2 xor in split pass.
> 
> Please note that the combine pass has its own (rather limited) splitter, it
> is documented in the second part of "Defining How to Split Instructions"
> paragraph. The example is dealing with the instruction that has too large
> immediate part, and looks similar to your problem.

Oh, I missed the discussion above. In this case, x86 implements pre-reload
splits, please see patterns decorated with ix86_pre_reload_split condition.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
                   ` (5 preceding siblings ...)
  2020-06-16  7:21 ` ubizjak at gmail dot com
@ 2021-05-30 22:26 ` pinskia at gcc dot gnu.org
  2022-12-27 23:31 ` cvs-commit at gcc dot gnu.org
  2022-12-27 23:32 ` law at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-05-30 22:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
                   ` (6 preceding siblings ...)
  2021-05-30 22:26 ` pinskia at gcc dot gnu.org
@ 2022-12-27 23:31 ` cvs-commit at gcc dot gnu.org
  2022-12-27 23:32 ` law at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-12-27 23:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:2e886eef7f2b5aadb00171af868f0895b647c3a4

commit r13-4907-g2e886eef7f2b5aadb00171af868f0895b647c3a4
Author: Raphael Moreira Zinsly <rzinsly@ventanamicro.com>
Date:   Tue Dec 27 18:29:25 2022 -0500

    RISC-V: Produce better code with complex constants [PR95632] [PR106602]

    gcc/Changelog:
            PR target/95632
            PR target/106602
            * config/riscv/riscv.md: New pattern to simulate complex
            const_int loads.

    gcc/testsuite/ChangeLog:
            * gcc.target/riscv/pr95632.c: New test.
            * gcc.target/riscv/pr106602.c: New test.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/95632] Redundant zero extension
  2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
                   ` (7 preceding siblings ...)
  2022-12-27 23:31 ` cvs-commit at gcc dot gnu.org
@ 2022-12-27 23:32 ` law at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: law at gcc dot gnu.org @ 2022-12-27 23:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Fixed on the trunk.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-12-27 23:32 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11  4:06 [Bug target/95632] New: Redundant zero extension bina2374 at gmail dot com
2020-06-12  4:25 ` [Bug target/95632] " wilson at gcc dot gnu.org
2020-06-15  9:49 ` bina2374 at gmail dot com
2020-06-16  0:01 ` wilson at gcc dot gnu.org
2020-06-16  0:02 ` wilson at gcc dot gnu.org
2020-06-16  7:15 ` ubizjak at gmail dot com
2020-06-16  7:21 ` ubizjak at gmail dot com
2021-05-30 22:26 ` pinskia at gcc dot gnu.org
2022-12-27 23:31 ` cvs-commit at gcc dot gnu.org
2022-12-27 23:32 ` law at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).