From: "wilson at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102139] New: -O3 miscompile due to slp-vectorize on strict align target
Date: Tue, 31 Aug 2021 00:15:01 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139

            Bug ID: 102139
           Summary: -O3 miscompile due to slp-vectorize on strict align target
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

This was originally reported here.
https://github.com/riscv/riscv-gcc/issues/289

This testcase is miscompiled at -O3 for a riscv64 target, though this is not a
bug in the riscv64 port.
I think it will fail for any strict align target.

    typedef unsigned short uint16_t;

    void zero_two_uint16(uint16_t* ptr) {
      ptr[0] = 0;
      ptr[1] = 0;
    }

    void zero(uint16_t* ptr) {
      for (int i = 0; i < 16; ++i) {
        zero_two_uint16(ptr);
        ptr += 2;
      }
    }

The output is

    zero:
            sd      zero,0(a0)
            sd      zero,8(a0)
            sd      zero,16(a0)
            sd      zero,24(a0)
            sd      zero,32(a0)
            sd      zero,40(a0)
            sd      zero,48(a0)
            sd      zero,56(a0)
            ret

which fails due to unaligned accesses as a0 only has 2 byte alignment.

A git bisect tracked the problem down to this commit.

    commit f5e18dd
    Author: Kewen Lin linkw@gcc.gnu.org
    Date:   Tue Nov 3 02:51:47 2020 +0000

        pass: Run cleanup passes before SLP [PR96789]
        ...

I get correct code if I disable the fre4 pass, which is the fre pass inside
pre_slp_scalar_cleanup which was added by this patch.

The 169t.vectorize pass adds an address alignment check, and then emits a loop
with double-word stores if aligned, and a loop with half-word stores if
unaligned. 172t.cunroll fully unrolls both loops. The 173t.fre4 pass deletes
a phi node before the half-word stores. The 172t output has

      [local count: 12627204]:
      # ptr_3 = PHI
      # ivtmp_15 = PHI <16(2)>
      *ptr_3 = 0;

and the 173t.fre4 output has

      [local count: 12627204]:
      *ptr_4(D) = 0;

In the 175t.slp1 pass, the block of half-word stores gets vectorized which is
wrong. Then later 207t.dce7 notices duplicate code and deletes the second
block of stores.

Comparing the full slp1 dump with fre4 disabled versus the unmodified slp1
dump, I see that the first significant difference is when computing pointer
alignment.
With fre4 disabled, I get

    tmp.c:4:10: note:   recording new base alignment for vectp_ptr.8_125
      alignment:    8
      misalignment: 0
      based on:     MEM [(uint16_t *)vectp_ptr.8_125] = { 0, 0, 0, 0 };
    tmp.c:4:10: note:   recording new base alignment for ptr_3
      alignment:    2
      misalignment: 0
      based on:     *ptr_3 = 0;
    tmp.c:4:10: note:   === vect_slp_analyze_instance_alignment ===
    tmp.c:4:10: note:   vect_compute_data_ref_alignment:
    tmp.c:4:10: note:   can't force alignment of ref: *ptr_3

It then refuses to vectorize. With the unmodified compiler I get

    tmp.c:4:10: note:   recording new base alignment for ptr_4(D)
      alignment:    8
      misalignment: 0
      based on:     MEM [(uint16_t *)ptr_4(D)] = { 0, 0, 0, 0 };
    tmp.c:4:10: note:   === vect_slp_analyze_instance_alignment ===
    tmp.c:4:10: note:   vect_compute_data_ref_alignment:
    tmp.c:4:10: missed:   misalign = 0 bytes of ref *ptr_4(D)

and then goes ahead and vectorizes which is wrong.

Maybe fre4 shouldn't optimize away a phi node when the pointers have different
alignment? I noticed that before slp1 runs, the double-word store block has

    # ALIGN = 8, MISALIGN = 0

but the half-word store block does not. After slp1 runs, both the double-word
store and the half-word store block have these notes.
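For illustration only, here is a minimal C sketch of what the alignment-versioned code is supposed to do: take the double-word path only when the address is 8-byte aligned, and fall back to half-word stores otherwise. This is not GCC output. The names zero_versioned and check_zero_at_offset are hypothetical helpers made up for this sketch, and the memcpy stands in for a double-word store so the sketch stays within the C aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the two loop versions described above:
   double-word stores when ptr is 8-byte aligned, half-word stores
   otherwise.  Zeroes 32 uint16_t elements, like zero() in the testcase. */
void zero_versioned(uint16_t *ptr)
{
    if (((uintptr_t)ptr % 8) == 0) {
        /* Aligned version: eight 8-byte stores (memcpy stands in for sd). */
        const uint64_t z = 0;
        for (int i = 0; i < 8; ++i)
            memcpy((unsigned char *)ptr + 8 * i, &z, sizeof z);
    } else {
        /* Unaligned version: thirty-two 2-byte stores. */
        for (int i = 0; i < 32; ++i)
            ptr[i] = 0;
    }
}

/* Hypothetical check: run zero_versioned on a pointer that is 'off'
   half-words past an 8-byte aligned base (off in 0..3).  Returns 1 if
   exactly the 32 target elements were zeroed and the guard elements
   around them were left untouched, 0 otherwise. */
int check_zero_at_offset(int off)
{
    _Alignas(8) uint16_t buf[36];

    assert(off >= 0 && off <= 3);
    memset(buf, 0xff, sizeof buf);
    zero_versioned(buf + off);

    for (int i = 0; i < 36; ++i) {
        uint16_t expected = (i >= off && i < off + 32) ? 0 : 0xffff;
        if (buf[i] != expected)
            return 0;
    }
    return 1;
}
```

Calling check_zero_at_offset(1) exercises the case from this report, a pointer with only 2 byte alignment. The miscompiled code behaves as if the else branch had been replaced by the aligned one, which is only valid when the alignment check passed.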