public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/110423] New: Redundant constants not gettign eliminated on RISCV.
@ 2023-06-27  0:54 vineetg at gcc dot gnu.org
  2023-06-27  0:59 ` [Bug rtl-optimization/110423] Redundant constants not getting " vineetg at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: vineetg at gcc dot gnu.org @ 2023-06-27  0:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423

            Bug ID: 110423
           Summary: Redundant constants not gettign eliminated on RISCV.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vineetg at gcc dot gnu.org
  Target Milestone: ---

Redundant constants, across basic blocks, don't seem to be eliminated robustly
by gcc. I'd reported this last year [1] and this is just recapturing that info
as a bugzilla PR.

[1] https://gcc.gnu.org/pipermail/gcc/2022-October/239645.html

When analyzing coremark build for RISC-V, noticed redundant constants 
not being eliminated. While this is a recurrent issue with RV, this 
specific instance is not unique to RV as I can trigger similar output on 
aarch64 with -fno-if-conversion, hence something which could be 
addressed in common passes.

-O3 -march=rv64gc_zba_zbb

crcu8:
        xor     a3,a0,a1
        andi    a3,a3,1
        srli    a4,a0,1
        srli    a5,a1,1
        beq     a3,zero,.L2

        li      a3,-24576       # 0xFFFF_A000
        addi    a3,a3,1         # 0xFFFF_A001
        xor     a5,a5,a3
        zext.h  a5,a5

.L2:
        xor     a4,a4,a5        
        andi    a4,a4,1 
        srli    a3,a0,2 
        srli    a5,a5,1 
        beq     a4,zero,.L3     

        li      a4,-24576       # 0xFFFF_A000
        addi    a4,a4,1         # 0xFFFF_A001
        xor     a5,a5,a4        
        zext.h  a5,a5           

.L3:
        xor     a3,a3,a5        
        andi    a3,a3,1 
        srli    a4,a0,3 
        srli    a5,a5,1 
        beq     a3,zero,.L4

        li      a3,-24576       # 0xFFFF_A000
        addi    a3,a3,1         # 0xFFFF_A001
[...]

.L8
        andi    a3,a5,1
        srli    a0,a5,1
        beq     a3,a4,.L9
        li      a5,-24576       # 0xFFFF_A000
        addi    a5,a5,1         # 0xFFFF_A001
        xor     a0,a0,a5
        slli    a0,a0,48
        srli    a0,a0,48
.L9:
        ret


cse can't handle this: as explained by Jeff in [2] EBB can have jumps out but
not jumps in, which misses the cfg paths needed to be traversed to find the
equivalents.

[2] https://gcc.gnu.org/pipermail/gcc/2022-October/239646.html

Note that since gcc 13.1, this specific test generates different code since the
match.pd change 6508d5e5a1a ("match.pd: rewrite select to branchless
expression") now removes the branches and the arithmatic needing the large
const. 
But this test is very convenient, so I'm continuing to use it and just revert
the match.pd change in my local gcc build.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug rtl-optimization/110423] Redundant constants not getting eliminated on RISCV.
  2023-06-27  0:54 [Bug rtl-optimization/110423] New: Redundant constants not gettign eliminated on RISCV vineetg at gcc dot gnu.org
@ 2023-06-27  0:59 ` vineetg at gcc dot gnu.org
  2023-06-28 13:45 ` law at gcc dot gnu.org
  2023-07-08  0:07 ` vineetg at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: vineetg at gcc dot gnu.org @ 2023-06-27  0:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423

--- Comment #1 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
Created attachment 55402
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55402&action=edit
test case (but needs reverting upstream 6508d5e5a1a)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug rtl-optimization/110423] Redundant constants not getting eliminated on RISCV.
  2023-06-27  0:54 [Bug rtl-optimization/110423] New: Redundant constants not gettign eliminated on RISCV vineetg at gcc dot gnu.org
  2023-06-27  0:59 ` [Bug rtl-optimization/110423] Redundant constants not getting " vineetg at gcc dot gnu.org
@ 2023-06-28 13:45 ` law at gcc dot gnu.org
  2023-07-08  0:07 ` vineetg at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: law at gcc dot gnu.org @ 2023-06-28 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So there is another broad approach we can take here.

As Vineet mentioned, this isn't really a job for PRE/LCM as those are
formulated around a requirement that they never insert an expression evaluation
in any path that did not have an evaluation before.  ie no speculative constant
loads.

We could potentially relax that condition.  I'm not sure we'd formulate it as a
PRE/LCM problem, but it gives you a sense of how we could tackle this.  The
difficulty would be in the heuristics for when to apply this transformation
since it will make some codes slower and may increase register pressure.  This
is derived heavily from Click's work in the 90s.  This would happen in gimple
most likely, though I guess one could do it in RTL if they have a high pain
threshold.

In the simplest way to think about the placement algorithm is to find the
blocks where all the uses of any given constant C occur.  A trivially correct
placement of load of that constant would be the entry block as it must dominate
every block in that set.  Of course that would make the placement quite
speculative and lengthen live ranges.  That's usually referred to an an early
placement.

Next find the latest placement for the constant load that covers all the uses. 
That will be the lowest common ancestor in the dominator tree of the set of
blocks that use the constant.

If you were to imagine a path through the dominator tree starting at the early
placement (entry) and ending at the lowest common ancestor, any block on that
path could be selected for generating the constant load and would cover every
use with that single load.  Within the set of blocks on that path, find the set
with the lowest loop nesting, then within that reduced set find those with the
deepest control nesting (or lowest estimated frequency counts).  There may be
more than one block in that final set.  Any are valid and "reasonable" choices.


Click's paper is much more general, but the same concepts apply.  His paper
doesn't cover anything like bifurcating the graph (thus allowing multiple
constant loads in an effort to reduce undesired speculation or register
allocation conflicts).

We might be able to get away with this precisely because these are constant
loads and thus subject to rematerialization later if register pressure is high.

https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug rtl-optimization/110423] Redundant constants not getting eliminated on RISCV.
  2023-06-27  0:54 [Bug rtl-optimization/110423] New: Redundant constants not gettign eliminated on RISCV vineetg at gcc dot gnu.org
  2023-06-27  0:59 ` [Bug rtl-optimization/110423] Redundant constants not getting " vineetg at gcc dot gnu.org
  2023-06-28 13:45 ` law at gcc dot gnu.org
@ 2023-07-08  0:07 ` vineetg at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: vineetg at gcc dot gnu.org @ 2023-07-08  0:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423

--- Comment #3 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #2)

> This is derived heavily from Click's work in the 90s. 
> This would happen in gimple most likely, though I guess one could do it in
> RTL if they have a high pain threshold.

If a gimple pass, it won't help catch the late reload induced
rematerializations, which is seen on a lot of SPEC workloads, e.g. cactu for
stack addressing. Although I guess Manolis' fold const offset pass patch would
help things a bit.

> Click's paper is much more general, but the same concepts apply.  His paper
> doesn't cover anything like bifurcating the graph (thus allowing multiple
> constant loads in an effort to reduce undesired speculation or register
> allocation conflicts).
> 
> We might be able to get away with this precisely because these are constant
> loads and thus subject to rematerialization later if register pressure is
> high.
> 
> https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.
> pdf

The prospect of implementing Cliff's Global Value Numbering is very exciting,
however I would like to start small. Started digging into gcse.cc Hoist pass,
granted this is still pre-reload. It seems Hoist has some global redundancy
elimination capabilities for constants, added by Maxim Kuvyrkov back in 2010. I
need to see what it can and can not do.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-08  0:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-27  0:54 [Bug rtl-optimization/110423] New: Redundant constants not gettign eliminated on RISCV vineetg at gcc dot gnu.org
2023-06-27  0:59 ` [Bug rtl-optimization/110423] Redundant constants not getting " vineetg at gcc dot gnu.org
2023-06-28 13:45 ` law at gcc dot gnu.org
2023-07-08  0:07 ` vineetg at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).