From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109093] [13 regression] csmith: a February runtime bug ?
Date: Mon, 13 Mar 2023 13:06:33 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: needs-reduction, wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Changed-Fields: cc
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #5 from Jakub Jelinek ---
My patch just caused far more .DEFERRED_INITs to be optimized away for dead
variables (though, as can be seen on #c0, apparently not all).
What I see on the #c0 testcase looks like an x86 backend bug to me.
In func_2.constprop.0.isra.0 there is in optimized dump:
  uint64_t * * * * const * l_2254[6];
variable and the IL mentions it just in
  l_2254 = .DEFERRED_INIT (48, 2, &"l_2254"[0]);
and
  l_2254 ={v} {CLOBBER(eol)};
(the latter in 2 spots) statements. Why the .DEFERRED_INIT hasn't been DSEd
is certainly a question.
Anyway, l_2254 has 128-bit alignment (supposedly due to ix86_local_alignment
and psABI requirements).
Expansion expands that .DEFERRED_INIT into:

(insn 23 22 24 5 (parallel [
            (set (reg:DI 162)
                (plus:DI (reg/f:DI 19 frame)
                    (const_int -48 [0xffffffffffffffd0])))
            (clobber (reg:CC 17 flags))
        ]) "runData/keep/in.16651.c":199:34 247 {*adddi_1}
     (nil))
(insn 24 23 25 5 (set (reg:V32QI 163)
        (const_vector:V32QI [
                (const_int 0 [0]) repeated x32
            ])) "runData/keep/in.16651.c":199:34 1823 {movv32qi_internal}
     (nil))
(insn 25 24 26 5 (set (mem/c:V16QI (reg:DI 162) [0 MEM [(void *)_157]+0 S16 A128])
        (vec_select:V16QI (reg:V32QI 163)
            (parallel [
                    (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3])
                    (const_int 4 [0x4]) (const_int 5 [0x5]) (const_int 6 [0x6]) (const_int 7 [0x7])
                    (const_int 8 [0x8]) (const_int 9 [0x9]) (const_int 10 [0xa]) (const_int 11 [0xb])
                    (const_int 12 [0xc]) (const_int 13 [0xd]) (const_int 14 [0xe]) (const_int 15 [0xf])
                ]))) "runData/keep/in.16651.c":199:34 4383 {vec_extract_lo_v32qi}
     (nil))
(insn 26 25 27 5 (set (mem/c:V16QI (plus:DI (reg:DI 162)
                (const_int 16 [0x10])) [0 MEM [(void *)_157]+16 S16 A128])
        (vec_select:V16QI (reg:V32QI 163)
            (parallel [
                    (const_int 16 [0x10]) (const_int 17 [0x11]) (const_int 18 [0x12]) (const_int 19 [0x13])
                    (const_int 20 [0x14]) (const_int 21 [0x15]) (const_int 22 [0x16]) (const_int 23 [0x17])
                    (const_int 24 [0x18]) (const_int 25 [0x19]) (const_int 26 [0x1a]) (const_int 27 [0x1b])
                    (const_int 28 [0x1c]) (const_int 29 [0x1d]) (const_int 30 [0x1e]) (const_int 31 [0x1f])
                ]))) "runData/keep/in.16651.c":199:34 4384 {vec_extract_hi_v32qi}
     (nil))
(insn 27 26 28 5 (set (mem/c:V16QI (plus:DI (reg:DI 162)
                (const_int 32 [0x20])) [0 MEM [(void *)_157]+32 S16 A128])
        (subreg:V16QI (reg:V32QI 163) 0)) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))

cmpelim dump still has:

(insn 279 6 25 4 (set (reg/f:DI 38 r10 [215])
        (plus:DI (reg/f:DI 7 sp)
            (const_int -48 [0xffffffffffffffd0]))) 241 {*leadi}
     (expr_list:REG_EQUIV (plus:DI (reg/f:DI 19 frame)
            (const_int -48 [0xffffffffffffffd0]))
        (nil)))
(insn 25 279 26 4 (set (reg:V16QI 21 xmm1 [orig:218 MEM [(void *)_157] ] [218])
        (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (expr_list:REG_EQUIV (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])
        (nil)))
(insn 26 25 34 4 (set (reg:V16QI 20 xmm0 [orig:219 MEM [(void *)_157]+16 ] [219])
        (reg:V16QI 21 xmm1)) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (expr_list:REG_EQUIV (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])
        (nil)))

before the loop and

(insn 289 22 290 5 (set (mem/c:V16QI (reg/f:DI 38 r10 [215]) [0 MEM [(void *)_157]+0 S16 A128])
        (reg:V16QI 21 xmm1 [orig:218 MEM [(void *)_157] ] [218])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))
(insn 290 289 291 5 (set (mem/c:V16QI (plus:DI (reg/f:DI 38 r10 [215])
                (const_int 16 [0x10])) [0 MEM [(void *)_157]+16 S16 A128])
        (reg:V16QI 20 xmm0 [orig:219 MEM [(void *)_157]+16 ] [219])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))
(insn 291 290 29 5 (set (mem/c:V16QI (plus:DI (reg/f:DI 38 r10 [215])
                (const_int 32 [0x20])) [0 MEM [(void *)_157]+32 S16 A128])
        (reg:V16QI 20 xmm0 [orig:219 MEM [(void *)_157]+16 ] [219])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))

in the loop.
stack_alignment_needed is 128, but then pro_and_epilogue decides to do:

(insn/f 337 315 338 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg/f:DI 6 bp)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 338 337 339 2 (set (reg/f:DI 6 bp)
        (reg/f:DI 7 sp)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 339 338 340 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg:DI 41 r13)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 340 339 341 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg:DI 40 r12)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 341 340 342 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg:DI 3 bx)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn 342 341 343 2 (set (mem/v:BLK (scratch:DI) [0 A8])
        (unspec:BLK [
                (mem/v:BLK (scratch:DI) [0 A8])
            ] UNSPEC_MEMORY_BLOCKAGE)) "runData/keep/in.16651.c":157:16 -1
     (nil))
...
(insn 279 6 25 3 (set (reg/f:DI 38 r10 [215])
        (plus:DI (reg/f:DI 7 sp)
            (const_int -48 [0xffffffffffffffd0]))) 241 {*leadi}
     (expr_list:REG_EQUIV (plus:DI (reg/f:DI 19 frame)
            (const_int -48 [0xffffffffffffffd0]))
        (nil)))

which ends up:

        pushq   %rbp
.LCFI5:
        movq    %rsp, %rbp
.LCFI6:
        pushq   %r13
        pushq   %r12
        pushq   %rbx
...
        leaq    -48(%rsp), %r10
...
        vmovdqa %xmm1, (%r10)
        vmovdqa %xmm0, 16(%r10)
        movl    $8, %esi
        vmovdqa %xmm0, 32(%r10)

But this results in unaligned stores: on entry to an x86_64 function
(%rsp & 15) == 8, so after pushq %rbp the frame pointer %rbp is 16-byte
aligned. The prologue then does 3 more pushes (24 bytes) and l_2254 is
allocated 48 bytes below that, i.e. at %rbp - 72, so all 3 vector stores
are unaligned ((%r10 & 15) == 8).