public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/107167] New: It looks like GCC wastes registers on trivial computations when result can be cached
@ 2022-10-06  5:33 unlvsur at live dot com
  2022-10-06  5:55 ` [Bug rtl-optimization/107167] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: unlvsur at live dot com @ 2022-10-06  5:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107167

            Bug ID: 107167
           Summary: It looks like GCC wastes registers on trivial
                    computations when result can be cached
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

I do not know whether this matters much on targets that provide plenty of
registers (like aarch64 or loongarch64). However, it looks like a real issue
for x86_64, which provides only 16 general-purpose registers (and since %rsp
is reserved, only 15 are available).
Take this example:

https://godbolt.org/z/77rEsr1PG

#include <bit>

unsigned Sigma1(unsigned x) noexcept
{
    return std::rotr(x, 6) ^ std::rotr(x, 11) ^ std::rotr(x, 25);
}


GCC generates code like this, presumably to avoid dependencies between the
rotations:
Sigma1m(unsigned int):
        movl    %edi, %eax
        movl    %edi, %edx
        roll    $7, %edi
        rorl    $6, %eax
        rorl    $11, %edx
        xorl    %edx, %eax
        xorl    %edi, %eax
        ret

However:
mySigma1m(unsigned int):
        movl    %edi, %eax
        rorl    $6, %edi
        rorl    $11, %eax
        xorl    %edi, %eax
        rorl    $19, %edi
        xorl    %edi, %eax
        ret

This version uses one fewer register (two instead of three). That becomes a
serious problem in larger computations where registers are scarce.

The first version also needs one more instruction, which can affect the
instruction cache.
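
To check that the hand-written sequence computes the same value, here is a
minimal sketch of both forms in C++. The names rotr32, sigma1_direct and
sigma1_chained are made up for illustration, and rotr32 is a hand-rolled
stand-in for std::rotr so the snippet does not need C++20. It relies on
rotr(rotr(x, 6), 19) == rotr(x, 25), which is what lets the second version
keep reusing %edi. Whether a compiler preserves this shape after optimization
is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helper: rotate a 32-bit value right by n bits (n taken mod 32).
inline std::uint32_t rotr32(std::uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

// Direct form, as in the reported source: three independent rotations of x.
inline std::uint32_t sigma1_direct(std::uint32_t x)
{
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}

// Chained form, mirroring the hand-written assembly: the value rotated by 6
// is rotated by a further 19 to obtain the rotation by 25, so only two
// live values are needed at any point.
inline std::uint32_t sigma1_chained(std::uint32_t x)
{
    std::uint32_t t = rotr32(x, 6);          // t   = rotr(x, 6)
    std::uint32_t acc = t ^ rotr32(x, 11);   // acc = rotr(x, 6) ^ rotr(x, 11)
    t = rotr32(t, 19);                       // t   = rotr(x, 25)
    return acc ^ t;
}
```

Both functions should return the same result for every input; the chained
form only trades independent rotations for a short dependency chain.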

Aggressively using every available register may not give the best overall
result: a local maximum is not necessarily the global maximum.
I am not sure.


end of thread, other threads:[~2022-10-07  1:41 UTC | newest]

Thread overview: 7+ messages
2022-10-06  5:33 [Bug rtl-optimization/107167] New: It looks like GCC wastes registers on trivial computations when result can be cached unlvsur at live dot com
2022-10-06  5:55 ` [Bug rtl-optimization/107167] " pinskia at gcc dot gnu.org
2022-10-06  6:00 ` unlvsur at live dot com
2022-10-06  6:03 ` unlvsur at live dot com
2022-10-06  6:05 ` unlvsur at live dot com
2022-10-06  6:47 ` pinskia at gcc dot gnu.org
2022-10-07  1:41 ` unlvsur at live dot com
