From: "tkoenig at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/108826] New: Inefficient address generation on POWER and RISC-V
Date: Thu, 16 Feb 2023 17:53:31 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826

            Bug ID: 108826
           Summary: Inefficient address generation on POWER and RISC-V
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

For the code (reduced from embench)

struct {
  unsigned int table[4][100];
} * _nettle_aes_decrypt_T;
unsigned int _nettle_aes_decrypt_w1;
void _nettle_aes_decrypt() {
  _nettle_aes_decrypt_T->table[2][0] =
      _nettle_aes_decrypt_T->table[2][_nettle_aes_decrypt_w1 >> 6 & 5];
}

current trunk generates

0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     _nettle_aes_decrypt,.-_nettle_aes_decrypt
        addis 9,2,.LANCHOR0+8@toc@ha
        lwz 9,.LANCHOR0+8@toc@l(9)
        addis 10,2,.LANCHOR0@toc@ha
        ld 10,.LANCHOR0@toc@l(10)
        srwi 9,9,6
        andi. 9,9,0x5
        addi 9,9,200
        sldi 9,9,2
        lwzx 9,10,9
        stw 9,800(10)
        blr

After the TOC setup and the two loads, this shifts the value right by 6,
does the and, adds 200, and then shifts the result left by 2. The two
shifts are not necessary: ((w >> 6) & 5) << 2 is the same as
(w >> 4) & 20, and 200 << 2 is just the constant offset 800.

A better alternative would be something like (please excuse any errors)

        srwi 9,9,4
        andi. 9,9,20
        add 9,9,10
        lwz 9,800(9)
        stw 9,800(10)

saving an instruction.

RISC-V does something similar. According to godbolt:

        lui     a5,%hi(_nettle_aes_decrypt_w1)
        lw      a5,%lo(_nettle_aes_decrypt_w1)(a5)
        lui     a4,%hi(_nettle_aes_decrypt_T)
        ld      a4,%lo(_nettle_aes_decrypt_T)(a4)
        srliw   a5,a5,6
        andi    a5,a5,5
        addi    a5,a5,200
        slli    a5,a5,2
        add     a5,a4,a5
        lw      a5,0(a5)
        sw      a5,800(a4)
        ret

(which is why I think this is a general RTL optimization issue).

x86 is much better:

        movl    _nettle_aes_decrypt_w1(%rip), %eax
        movq    _nettle_aes_decrypt_T(%rip), %rdx
        shrl    $6, %eax
        andl    $5, %eax
        movl    800(%rdx,%rax,4), %eax
        movl    %eax, 800(%rdx)
        ret

but only because it can use its complex addressing modes.
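
For reference, the arithmetic behind the suggested sequence can be checked with a small C sketch. It is only an illustration of the identity, not a statement about what the RTL passes do; the function names (offset_as_generated, offset_folded, check_offsets_agree) are made up for this example and do not appear in GCC or in the test case.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset as the current POWER/RISC-V code computes it:
   shift right by 6, mask with 5, add 200 (row 2 * 100 columns),
   then scale by sizeof(unsigned int) == 4. */
uint64_t offset_as_generated(uint32_t w)
{
    uint64_t idx = (w >> 6) & 5;
    return (idx + 200) << 2;
}

/* Same offset with the scaling folded into the shift and the mask.
   The constant 800 can then go into the load's displacement field. */
uint64_t offset_folded(uint32_t w)
{
    return (uint64_t)((w >> 4) & 20) + 800;
}

/* Only bits 6 and 8 of w influence the result; sweep a wider range
   anyway, plus the all-ones value. */
void check_offsets_agree(void)
{
    for (uint32_t w = 0; w < (1u << 12); ++w)
        assert(offset_as_generated(w) == offset_folded(w));
    assert(offset_as_generated(UINT32_MAX) == offset_folded(UINT32_MAX));
}
```

Calling check_offsets_agree() from a test driver should pass without tripping an assertion, which is the property the shorter instruction sequence relies on.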