public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/82580] Optimize comparisons for __int128 on x86-64
       [not found] <bug-82580-4@http.gcc.gnu.org/bugzilla/>
@ 2023-07-07 20:23 ` roger at nextmovesoftware dot com
  2024-01-22  9:55 ` ubizjak at gmail dot com
  2024-02-01 20:30 ` ubizjak at gmail dot com
  2 siblings, 0 replies; 3+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-07 20:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82580

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #16 from Roger Sayle <roger at nextmovesoftware dot com> ---
Advance warning that the testcase pr82580.c will start FAILing due to
differences in register allocation following improvements to __int128 parameter
passing as explained in
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html.
We might need additional reload alternatives/preferences to ensure that we
don't generate a movzbl.  Hopefully, Jakub and/or Uros have some suggestions
for how best this can be fixed.

Previously, the SUBREGs and CLOBBERs generated by middle-end RTL expansion
(unintentionally) ensured that rdx and rax would never be used for __int128
arguments, which conveniently allowed the use of xor eax,eax; setc al in
peephole2 as AX_REG wasn't live.  Now reload has more freedom, it elects to use
rax as at this point the backend hasn't expressed any preference that it would
like eax reserved for producing the result.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug target/82580] Optimize comparisons for __int128 on x86-64
       [not found] <bug-82580-4@http.gcc.gnu.org/bugzilla/>
  2023-07-07 20:23 ` [Bug target/82580] Optimize comparisons for __int128 on x86-64 roger at nextmovesoftware dot com
@ 2024-01-22  9:55 ` ubizjak at gmail dot com
  2024-02-01 20:30 ` ubizjak at gmail dot com
  2 siblings, 0 replies; 3+ messages in thread
From: ubizjak at gmail dot com @ 2024-01-22  9:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82580

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #17 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Roger Sayle from comment #16)
> Advance warning that the testcase pr82580.c will start FAILing due to
> differences in register allocation following improvements to __int128
> parameter passing as explained in
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html.
> We might need additional reload alternatives/preferences to ensure that we
> don't generate a movzbl.  Hopefully, Jakub and/or Uros have some suggestions
> for how best this can be fixed.
> 
> Previously, the SUBREGs and CLOBBERs generated by middle-end RTL expansion
> (unintentionally) ensured that rdx and rax would never be used for __int128
> arguments, which conveniently allowed the use of xor eax,eax; setc al in
> peephole2 as AX_REG wasn't live.  Now reload has more freedom, it elects to
> use rax as at this point the backend hasn't expressed any preference that it
> would like eax reserved for producing the result.

A different regression happens with pr82580.c, f0 function. Without the patch,
the compiler generates:

f0:
        xorq    %rdi, %rdx
        xorq    %rcx, %rsi
        xorl    %eax, %eax
        orq     %rsi, %rdx
        sete    %al
        ret

But with the patch:

f0:
        xchgq   %rdi, %rsi
        movq    %rdx, %r8
        movq    %rcx, %rax
        movq    %rsi, %rdx
        movq    %rdi, %rcx
        xorq    %rax, %rcx
        xorq    %r8, %rdx
        xorl    %eax, %eax
        orq     %rcx, %rdx
        sete    %al
        ret

It looks to me that *concatditi3_3 ties two registers together so RA now tries
to satisfy *concatditi3_3 constraints *and* *cmpti_doubleword constraints.

The gcc.target/i386/pr43644-2.c mitigates this issue with
*addti3_doubleword_concat pattern that combines *addti3_doubleword with concat
insn, but doubleword compares (and other doubleword insn besides addti3) do not
provide these compound instructions.

So, without a common strategy to use doubleword_concat patterns for all double
word instructions, it is questionable if the complications with concat insn are
worth the pain of providing (many?) doubleword_concat patterns.

The real issue is with x86_64 doubleword arguments. Unfortunately, the ABI
specifies RDI/RSI to pass the double word argument, while the compiler regalloc
order sequence is RSI/RDI. IMO, we can try to swap RDI and RSI in the order and
RA would be able to allocate registers in the same optimal way as for x86_32
with -mregparm=3, even without synthetic concat patterns.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug target/82580] Optimize comparisons for __int128 on x86-64
       [not found] <bug-82580-4@http.gcc.gnu.org/bugzilla/>
  2023-07-07 20:23 ` [Bug target/82580] Optimize comparisons for __int128 on x86-64 roger at nextmovesoftware dot com
  2024-01-22  9:55 ` ubizjak at gmail dot com
@ 2024-02-01 20:30 ` ubizjak at gmail dot com
  2 siblings, 0 replies; 3+ messages in thread
From: ubizjak at gmail dot com @ 2024-02-01 20:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82580

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #17)

> A different regression happens with pr82580.c, f0 function. Without the
> patch, the compiler generates:

Fixed by r14-8713-g44764984cf24e27cf7756cffd197283b9c62db8b

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-02-01 20:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-82580-4@http.gcc.gnu.org/bugzilla/>
2023-07-07 20:23 ` [Bug target/82580] Optimize comparisons for __int128 on x86-64 roger at nextmovesoftware dot com
2024-01-22  9:55 ` ubizjak at gmail dot com
2024-02-01 20:30 ` ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).