Subject: [Bug target/100745] GCC generates suboptimal assembly from vector extensions on AArch64
From: "ajidala at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Date: Mon, 24 May 2021 20:27:45 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.2.0
X-Bugzilla-Status: UNCONFIRMED

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100745

--- Comment #1 from Nicolas F. ---
I'll attach a second version of profile.c, with the vector extension code that's actually going to be used in mpv (some cleanup has been done). Performance is unchanged.
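For readers unfamiliar with GCC vector extensions: they let plain C declare fixed-width vector types with `__attribute__((vector_size(N)))` and operate on them with ordinary arithmetic operators, which GCC lowers to SIMD instructions (NEON on AArch64). The sketch below is a hypothetical kernel, not the mpv code attached to this bug; the function name `scale_bias` and what it computes are illustrative assumptions. Compiling something like it with `gcc -O3 -S` for an AArch64 target is one way to inspect the register usage discussed below.

```c
#include <stddef.h>
#include <string.h>

/* 16-byte vector of four floats; on AArch64 this fits one 128-bit NEON register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical kernel (not from mpv): out[i] = a[i] * scale + bias. */
static void scale_bias(float *out, const float *a, size_t n, float scale, float bias)
{
    /* Broadcast the scalars into every lane. */
    v4sf vs = {scale, scale, scale, scale};
    v4sf vb = {bias, bias, bias, bias};
    size_t i = 0;

    /* Main loop: four lanes per iteration using ordinary C operators. */
    for (; i + 4 <= n; i += 4) {
        v4sf v;
        memcpy(&v, a + i, sizeof v);   /* load without assuming alignment */
        v = v * vs + vb;
        memcpy(out + i, &v, sizeof v); /* store without assuming alignment */
    }

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++)
        out[i] = a[i] * scale + bias;
}
```

With inputs `{1, 2, 3, 4, 5, 6}`, `scale = 2` and `bias = 0.5`, the first four results come from the vector loop and the last two from the scalar tail.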
Some absolute numbers from gcc 11.1.0:

$ ./profile
old: 811703
nicolas: 262007 (3.10x as fast)
niklas: 679524 (1.19x as fast)

Some absolute numbers from Clang -O3:

$ ./profile
old: 1547552
nicolas: 269081 (5.75x as fast)
niklas: 246508 (6.28x as fast)

As you can see, Clang does significantly worse on the C version (yay GCC!), but significantly better on the vector version, and, most importantly, better in absolute terms: more than twice as fast as GCC's code.

Looking at GCC's assembly output, I can see some odd choices, such as shuffling vectors around on the stack instead of using the other scratch registers (v21-v30), whereas Clang does use those scratch registers.
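The "x as fast" figures above are the "old" count divided by each variant's count (lower counts are better; the unit is not stated). The following small C check reproduces that arithmetic using only the numbers reported above; the struct and helper names are hypothetical. It also makes the cross-compiler gap explicit: Clang's niklas count is about 2.76x lower than GCC's, while for nicolas the two compilers are within about 3% of each other.

```c
#include <stdio.h>

/* Reported counts from ./profile (lower is better; unit not stated). */
typedef struct {
    const char *compiler;
    double old_c;   /* baseline labelled "old" */
    double nicolas; /* variant labelled "nicolas" */
    double niklas;  /* variant labelled "niklas" */
} timings;

static const timings gcc_11_1 = {"gcc 11.1.0", 811703, 262007, 679524};
static const timings clang_o3 = {"clang -O3", 1547552, 269081, 246508};

/* "N x as fast" = baseline count divided by the new count. */
static double speedup(double baseline, double t)
{
    return baseline / t;
}

/* Print both speedups relative to the "old" baseline, to two decimals. */
static void report(const timings *r)
{
    printf("%s: nicolas %.2fx, niklas %.2fx\n", r->compiler,
           speedup(r->old_c, r->nicolas), speedup(r->old_c, r->niklas));
}
```

Calling `report(&gcc_11_1)` from a `main` prints `gcc 11.1.0: nicolas 3.10x, niklas 1.19x`, matching the output quoted above; `speedup(gcc_11_1.niklas, clang_o3.niklas)` gives the GCC-to-Clang ratio for the niklas variant.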