From: "unlvsur at live dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/107167] New: It looks like GCC wastes registers on trivial computations when result can be cached
Date: Thu, 06 Oct 2022 05:33:59 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107167

            Bug ID: 107167
           Summary: It looks like GCC wastes registers on trivial computations
                    when result can be cached
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

I do not know whether it is a big issue or not with targets
that provide tons of available registers (like aarch64 or loongarch64).
However, this looks like a big issue for x86_64, which only provides 16 general
purpose registers (plus %rsp is reserved, so 15 available registers).

Take this example:
https://godbolt.org/z/77rEsr1PG

#include <bit>

unsigned Sigma1(unsigned x) noexcept
{
        return std::rotr(x,6)^std::rotr(x,11)^std::rotr(x,25);
}

GCC generates code like this to avoid dependencies:

Sigma1m(unsigned int):
        movl    %edi, %eax
        movl    %edi, %edx
        roll    $7, %edi
        rorl    $6, %eax
        rorl    $11, %edx
        xorl    %edx, %eax
        xorl    %edi, %eax
        ret

However, this version:

mySigma1m(unsigned int):
        movl    %edi, %eax
        rorl    $6, %edi
        rorl    $11, %eax
        xorl    %edi, %eax
        rorl    $19, %edi
        xorl    %edi, %eax
        ret

saves one register (%edx is never touched). It works because the already
rotated value in %edi is reused: rotating right by 6 and then by 19 more gives
the rotation by 25. That becomes a huge problem when tons of computation are
involved and registers are in short supply. The first version also needs one
more instruction, which can affect the code cache.

Aggressively utilizing all registers may not give the best results. Local
maximum =/= Global maximum. I don't know.