From mboxrd@z Thu Jan 1 00:00:00 1970
From: "rearnsha at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106187] armhf: Miscompilation at O2 level (O0 / O1 are working)
Date: Thu, 14 Jul 2022 13:18:26 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.4.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Status: WAITING
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187

--- Comment #25 from Richard Earnshaw ---
A quick status update.

I've managed to reduce the testcase to the latest attachment. The program is heavily reduced (so some bits likely don't make much sense), but the test still 'passes' when compiled with -fno-strict-aliasing and fails with the same error when that option is omitted.
Looking at the assembler output of

  void hwy::N_EMU128::TestMulAdd::operator() >(float, hwy::N_EMU128::Simd) [clone .isra.0]

we see (correct on left, incorrect on right):

    add      r3, sp, #148             add      r3, sp, #148
    vmov.f32 s14, #3.0e+0             vmov.f32 s14, #3.0e+0
[1] mov      r6, r4                   mov      r6, r4
    vmov.f32 s15, #2.0e+0             vmov.f32 s15, #2.0e+0
    add      r8, sp, #100             add      r8, sp, #100
    add      lr, sp, #132             add      lr, sp, #132
    ldm      r3, {r0, r1, r2, r3}     ldm      r3, {r0, r1, r2, r3}
    vstr.32  s14, [sp, #152]          vstr.32  s14, [sp, #152]
    vmov.f32 s14, #4.0e+0             vmov.f32 s14, #4.0e+0
[2] stm      r4, {r0, r1, r2, r3}   | stm      r5, {r0, r1, r2, r3}
    add      ip, sp, #116             add      ip, sp, #116
    vstr.32  s14, [sp, #156]          vstr.32  s14, [sp, #156]
    vmov.f32 s14, #5.0e+0             vmov.f32 s14, #5.0e+0
    stm      r5, {r0, r1, r2, r3}   <
    add      r5, sp, #36              add      r5, sp, #36
    add      r10, sp, #196            add      r10, sp, #196
    vstr.32  s14, [sp, #160]          vstr.32  s14, [sp, #160]
    add      r9, sp, #152             add      r9, sp, #152
[3] vldr.32  s14, [r6]                vldr.32  s14, [r6]
[4] stm      r8, {r0, r1, r2, r3}   | stm      r4, {r0, r1, r2, r3}
    vmul.f32 s15, s14, s15            vmul.f32 s15, s14, s15
                                    > stm      r8, {r0, r1, r2, r3}

At [1] we see that r6 and r4 hold the same value, and at [3] s14 is loaded using r6 as the base address. In the good code on the left the STM to r4 is at [2]; in the incorrect code it does not occur until [4], i.e. immediately after the load at [3], so the load reads that location before the store has written it.

I need to dig a bit deeper now on this specific function to see if the alias information is correct, or if it has somehow been lost/corrupted during the compilation.
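To make the ordering hazard concrete, here is a minimal C++ sketch of the two schedules. It is only a model of the store/load order seen in the listing, not the reduced testcase and not a statement about the root cause. `Vec4`, `store_then_load` and `load_then_store` are hypothetical names introduced for illustration, and the values used are arbitrary.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 16-byte block standing in for the four words moved by each STM.
struct Vec4 {
    float lane[4];
};

// Models the good schedule: the block store (marker [2]) happens before the
// load (marker [3]) that reads the same address through a second pointer.
float store_then_load(Vec4& dst, const Vec4& src, float k) {
    std::memcpy(&dst, &src, sizeof(Vec4));  // models: stm r4, {r0, r1, r2, r3}
    float s14 = dst.lane[0];                // models: vldr.32 s14, [r6], with r6 == r4
    return s14 * k;                         // models: vmul.f32 s15, s14, s15
}

// Models the bad schedule: the load is performed first, so it observes the
// old contents of dst; the store only lands afterwards (marker [4]).
float load_then_store(Vec4& dst, const Vec4& src, float k) {
    float s14 = dst.lane[0];                // load sees stale data
    std::memcpy(&dst, &src, sizeof(Vec4));  // store arrives too late for the load
    return s14 * k;
}
```

Calling both functions with a zero-initialised `dst`, a `src` whose first lane is 2.0f and `k` equal to 2.0f gives 4.0f for the first ordering and 0.0f for the second, even though `dst` ends up with identical contents in both cases. That matches the failure mode described above: memory is eventually correct, but the value already loaded into the register is not.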