From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/110237] New: gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
Date: Tue, 13 Jun 2023 13:50:27 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

            Bug ID: 110237
           Summary: gcc.dg/torture/pr58955-2.c is miscompiled by RTL
                    scheduling after reload
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When compiling the testcase with fully masked AVX512 vectorization,

  -march=znver4 --param=vect-partial-vector-usage=2 -fdiagnostics-plain-output -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions

RTL sched2 is presented with

(insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] )
                        (const_int -4 [0xfffffffffffffffc])))) [1 MEM [(int *)vectp_b.12_28]+0 S64 A32])
        (vec_merge:V16SI (reg:V16SI 21 xmm1 [118])
            (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                    (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] )
                            (const_int -4 [0xfffffffffffffffc])))) [1 MEM [(int *)vectp_b.12_28]+0 S64 A32])
            (reg:HI 69 k1 [116]))) "/space/rguenther/src/gcc11queue/gcc/testsuite/gcc.dg/torture/pr58955-2.c":12:12 1942 {avx512f_storev16si_mask}
     (expr_list:REG_DEAD (reg:HI 69 k1 [116])
        (expr_list:REG_DEAD (reg:DI 40 r12 [orig:90 _22 ] [90])
            (expr_list:REG_DEAD (reg:V16SI 21 xmm1 [118])
                (nil)))))
...
(insn 41 39 42 3 (set (reg:CCZ 17 flags)
        (compare:CCZ (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] )
                        (const_int 4 [0x4]))) [1 b[1]+0 S4 A32])
            (const_int 1 [0x1]))) "/space/rguenther/src/gcc11queue/gcc/testsuite/gcc.dg/torture/pr58955-2.c":15:6 11 {*cmpsi_1}
     (nil))

and it moves the masked store across the load of one of the destination's
elements:

-   32: xmm0:V16QI=vec_duplicate(bx:QI)
-      REG_DEAD bx:QI
-   33: NOTE_INSN_DELETED
-   34: k1:HI=unspec[xmm0:V16QI,[`*.LC0'],0x6] 146
-      REG_DEAD xmm0:V16QI
    36: cx:SI=0x1
       REG_EQUIV 0x1
+   41: flags:CCZ=cmp([const(`b'+0x4)],0x1)
+   32: xmm0:V16QI=vec_duplicate(bx:QI)
+      REG_DEAD bx:QI
    35: xmm1:V16SI=vec_duplicate(cx:SI)
       REG_DEAD cx:SI
       REG_EQUIV const_vector
+   34: k1:HI=unspec[xmm0:V16QI,[`*.LC0'],0x6] 146
+      REG_DEAD xmm0:V16QI
+   39: [`a']=0x2
    38: [r12:DI+const(`b'-0x4)]=vec_merge(xmm1:V16SI,[r12:DI+const(`b'-0x4)],k1:HI)
       REG_DEAD k1:HI
       REG_DEAD r12:DI
       REG_DEAD xmm1:V16SI
-   39: [`a']=0x2
-   41: flags:CCZ=cmp([const(`b'+0x4)],0x1)

The address of the masked store is
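computed oddly though:

   14: r12:DI=dx:DI<<0x2+0x4
      REG_DEAD dx:DI

and in the end the assembly contains

        leaq    4(,%rdx,4), %r12
...
        cmpl    $1, b+4(%rip)
...
        vmovdqu32       %zmm1, b-4(%r12){%k1}

(%rdx is zero). With %rdx = 0, %r12 = 4, so the store's effective address
is b-4+4 = b, and its 64-byte range [b, b+64) contains b[1] at b+4, the
location the hoisted cmpl reads.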