From: "ubizjak at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/82580] Optimize comparisons for __int128 on x86-64
Date: Mon, 22 Jan 2024 09:55:10 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82580

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #17 from Uroš Bizjak ---
(In reply to Roger Sayle from comment #16)
> Advance warning that the testcase pr82580.c will start FAILing due to
> differences in register allocation following improvements to __int128
> parameter passing as explained in
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html.
> We might need additional reload alternatives/preferences to ensure that we
> don't generate a movzbl.  Hopefully, Jakub and/or Uros have some suggestions
> for how best this can be fixed.
>
> Previously, the SUBREGs and CLOBBERs generated by middle-end RTL expansion
> (unintentionally) ensured that rdx and rax would never be used for __int128
> arguments, which conveniently allowed the use of xor eax,eax; setc al in
> peephole2 as AX_REG wasn't live.  Now reload has more freedom, it elects to
> use rax as at this point the backend hasn't expressed any preference that it
> would like eax reserved for producing the result.

A different regression happens with the f0 function in pr82580.c. Without
the patch, the compiler generates:

f0:
        xorq    %rdi, %rdx
        xorq    %rcx, %rsi
        xorl    %eax, %eax
        orq     %rsi, %rdx
        sete    %al
        ret

But with the patch:

f0:
        xchgq   %rdi, %rsi
        movq    %rdx, %r8
        movq    %rcx, %rax
        movq    %rsi, %rdx
        movq    %rdi, %rcx
        xorq    %rax, %rcx
        xorq    %r8, %rdx
        xorl    %eax, %eax
        orq     %rcx, %rdx
        sete    %al
        ret

It looks to me that *concatditi3_3 ties two registers together, so RA now
tries to satisfy the *concatditi3_3 constraints *and* the *cmpti_doubleword
constraints.

The gcc.target/i386/pr43644-2.c testcase mitigates this issue with the
*addti3_doubleword_concat pattern, which combines *addti3_doubleword with
the concat insn, but doubleword compares (and other doubleword insns besides
addti3) do not provide these compound instructions.

So, without a common strategy to use doubleword_concat patterns for all
doubleword instructions, it is questionable whether the complications with
the concat insn are worth the pain of providing (many?) doubleword_concat
patterns.

The real issue is with x86_64 doubleword arguments. Unfortunately, the ABI
specifies RDI/RSI to pass the doubleword argument, while the compiler's
regalloc order sequence is RSI/RDI.
IMO, we can try to swap RDI and RSI in the order, and RA would be able to
allocate registers in the same optimal way as for x86_32 with -mregparm=3,
even without synthetic concat patterns.
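
For reference, a minimal C sketch of the comparison under discussion. The
source of f0 in pr82580.c is not quoted in this comment, so the body below is
an assumption inferred from the assembly (two xors, an or and a sete amount to
a 128-bit equality test). The names u128, make_u128 and f0_by_halves are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Requires a compiler and target with __int128 support (GCC or Clang on
   x86-64, for example). */
typedef unsigned __int128 u128;

static_assert(sizeof(u128) == 16, "u128 must be 128 bits wide");

/* Assumed shape of f0: the real testcase body is not shown in this thread. */
int f0(u128 x, u128 y)
{
    return x == y;
}

/* Hypothetical hand-split version of the same test on 64-bit halves.
   The comments map each step to the shorter assembly listing. */
int f0_by_halves(uint64_t x_lo, uint64_t x_hi, uint64_t y_lo, uint64_t y_hi)
{
    uint64_t lo = x_lo ^ y_lo;   /* xorq %rdi, %rdx */
    uint64_t hi = x_hi ^ y_hi;   /* xorq %rcx, %rsi */
    return (lo | hi) == 0;       /* orq %rsi, %rdx; sete %al */
}

/* Hypothetical helper: build a 128-bit value from two 64-bit halves. */
u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}
```

Compiling this with gcc -O2 -S on x86-64 should allow the output for f0 to be
compared against the two listings. Under the SysV ABI, f0_by_halves receives
its four arguments in RDI, RSI, RDX and RCX, the same registers that carry the
low and high halves of x and y in f0, but without any TImode value having to
be assembled from them, so it may be useful as a baseline.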