From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 6ABC63856DC5; Thu, 14 Jul 2022 13:18:26 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6ABC63856DC5
From: "rearnsha at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106187] armhf: Miscompilation at O2 level (O0 / O1 are
 working)
Date: Thu, 14 Jul 2022 13:18:26 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.4.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rearnsha at gcc dot gnu.org
X-Bugzilla-Status: WAITING
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-106187-4-VkPwdzyoKJ@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106187-4@http.gcc.gnu.org/bugzilla/>
References: <bug-106187-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 14 Jul 2022 13:18:26 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187
--- Comment #25 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
A quick status update.

I've managed to reduce the testcase to the latest attachment.  The program is
heavily reduced (so some parts likely don't make much sense), but the test still
'passes' when compiled with -fno-strict-aliasing and fails with the same error
when that option is omitted.
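For context on why that option matters: -fstrict-aliasing, which GCC enables at -O2 and above, lets the compiler assume that accesses through pointers to incompatible types never refer to the same object, so it may reorder them. The sketch below is a generic illustration with hypothetical function names, not code from the reduced testcase. Whether the testcase really contains such an access, or whether the alias information itself is wrong, is what still has to be determined.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(int) == sizeof(float), "sketch assumes 32-bit int and float");

// Hypothetical helper. Under -fstrict-aliasing GCC may assume the store
// through int* cannot modify an object read through float*, so it is free
// to move the load of *f ahead of the store to *i. Passing pointers to the
// same object here is undefined behaviour in ISO C++.
float store_int_then_load_float(float* f, int* i) {
    *i = 0x40400000;  // bit pattern of 3.0f
    return *f;
}

// Hypothetical well-defined alternative: std::memcpy copies bytes, which may
// alias any object, so the store must stay ahead of the load regardless of
// -fstrict-aliasing.
float store_bits_then_load_float(float* f, int bits) {
    std::memcpy(f, &bits, sizeof bits);
    return *f;
}
```

Called with two distinct objects, the first function returns the unchanged float; called with the same object it has no defined result, and that is precisely the case -fno-strict-aliasing makes the compiler handle conservatively.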

Looking at the assembler output of

  void hwy::N_EMU128::TestMulAdd::operator()<float, hwy::N_EMU128::Simd<float, 4u, 0> >(float, hwy::N_EMU128::Simd<float, 4u, 0>) [clone .isra.0]

we see (correct on left, incorrect on right):


        add     r3, sp, #148                    add     r3, sp, #148
        vmov.f32        s14, #3.0e+0            vmov.f32        s14, #3.0e+0
[1]     mov     r6, r4                          mov     r6, r4
        vmov.f32        s15, #2.0e+0            vmov.f32        s15, #2.0e+0
        add     r8, sp, #100                    add     r8, sp, #100
        add     lr, sp, #132                    add     lr, sp, #132
        ldm     r3, {r0, r1, r2, r3}            ldm     r3, {r0, r1, r2, r3}
        vstr.32 s14, [sp, #152]                 vstr.32 s14, [sp, #152]
        vmov.f32        s14, #4.0e+0            vmov.f32        s14, #4.0e+0
[2]     stm     r4, {r0, r1, r2, r3}  |         stm     r5, {r0, r1, r2, r3}
        add     ip, sp, #116                    add     ip, sp, #116
        vstr.32 s14, [sp, #156]                 vstr.32 s14, [sp, #156]
        vmov.f32        s14, #5.0e+0            vmov.f32        s14, #5.0e+0
        stm     r5, {r0, r1, r2, r3}  <
        add     r5, sp, #36                     add     r5, sp, #36
        add     r10, sp, #196                   add     r10, sp, #196
        vstr.32 s14, [sp, #160]                 vstr.32 s14, [sp, #160]
        add     r9, sp, #152                    add     r9, sp, #152
[3]     vldr.32 s14, [r6]                       vldr.32 s14, [r6]
[4]     stm     r8, {r0, r1, r2, r3}  |         stm     r4, {r0, r1, r2, r3}
        vmul.f32        s15, s14, s15           vmul.f32        s15, s14, s15
                                      >         stm     r8, {r0, r1, r2, r3}

At [1] we see that r6 and r4 hold the same value.  We also see that at [3] a
register (s14) is loaded using r6 as the base.  In the good code on the left,
the STM to r4 is at [2], before that load; in the incorrect code it does not
occur until [4], i.e. immediately after the load at [3], so the load reads the
old contents of that memory rather than the newly stored value.
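The hazard described above can be modelled in a few lines. This is a toy model under stated assumptions, not GCC code: a small word array stands in for the stack frame, array indices stand in for the addresses held in r4 and r6, and the helpers stm4 and vldr are hypothetical stand-ins for the STM and VLDR.32 instructions in the listing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uint32_t;

// Stand-in for the stack frame; zero-initialised to represent stale contents.
struct Frame {
    std::array<Word, 4> mem{};
};

// Models "stm rN, {r0, r1, r2, r3}": store four words starting at base.
void stm4(Frame& f, std::size_t base, const std::array<Word, 4>& regs) {
    for (std::size_t k = 0; k < 4; ++k) f.mem[base + k] = regs[k];
}

// Models "vldr.32 s14, [rN]": load one word from base.
Word vldr(const Frame& f, std::size_t base) { return f.mem[base]; }

// Left-hand (correct) order: the store at [2] precedes the load at [3].
Word correct_order(const std::array<Word, 4>& regs) {
    Frame f;
    const std::size_t r4 = 0, r6 = r4;  // [1] mov r6, r4
    stm4(f, r4, regs);                  // [2] stm r4, {r0-r3}
    return vldr(f, r6);                 // [3] sees the stored word
}

// Right-hand (incorrect) order: the load is issued before the store, now at [4].
Word incorrect_order(const std::array<Word, 4>& regs) {
    Frame f;
    const std::size_t r4 = 0, r6 = r4;  // [1] mov r6, r4
    Word s14 = vldr(f, r6);             // [3] reads the stale word
    stm4(f, r4, regs);                  // [4] store arrives too late
    return s14;
}
```

Because r4 and r6 are equal, the only difference between the two functions is the order of the two accesses, and the reordered version returns the stale zero instead of the first stored word.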

I now need to dig a bit deeper into this specific function to see whether the
alias information is correct, or whether it has somehow been lost or corrupted
during compilation.