From: "guojie at loongson dot cn"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/111403] LoongArch: Wrong code with -O -mlasx -fopenmp-simd
Date: Sun, 08 Oct 2023 08:45:46 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403

Guo Jie changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojie at loongson dot cn

--- Comment #2 from Guo Jie ---
It seems that “omp simd reduction” does not interact correctly with “loop peeling”, which is probably what produces the wrong code in this test case.

LoongArch tree vect pass dump:

  # “omp simd” temporary arrays.
  struct S D.3833[8];
  struct S D.3832[8];
  ...

  # prologue loop.
  [local count: 723433550]:
  MEM [(struct S *)&D.3832][0].s = 0;
  _44 = D.3832[0].s;
  _41 = (long unsigned int) i_1;
  _58 = _41 * 4;
  _59 = a_18(D) + _58;
  _60 = _59->s;
  _61 = _44 + _60;
  D.3832[0].s = _61;
  _64 = D.3833[0].s;
  _65 = D.3832[0].s;
  _66 = _64 + _65;
  D.3833[0].s = _66;
  # Save temporary reduction results.
  MEM [(struct S *)&D.3832][0].s = _66;
  _69 = b_28(D) + _58;
  _70 = MEM [(const struct S &)&D.3832][0].s;
  _69->s = _70;
  i_72 = i_1 + 1;
  ivtmp_73 = ivtmp_2 - 1;
  ivtmp_78 = ivtmp_77 + 1;
  if (ivtmp_78 < prolog_loop_niters.42_7)
    goto ; [85.71%]
  else
    goto ; [14.29%]

  [local count: 620085901]:
  goto ; [100.00%]

  # vector body loop.
  [local count: 118111599]:
  # i_48 = PHI
  # ivtmp_55 = PHI
  # vectp_a.50_126 = PHI
  # vectp_b.58_158 = PHI
  # ivtmp_161 = PHI
  MEM [(struct S *)&D.3832] = { 0, 0, 0, 0, 0, 0, 0, 0 };
  _16 = (long unsigned int) i_48;
  _17 = _16 * 4;
  _19 = a_18(D) + _17;
  vect__20.52_128 = MEM [(int *)vectp_a.50_126];
  _20 = _19->s;
  MEM [(int *)&D.3832] = vect__20.52_128;
  vect__24.54_131 = MEM [(int *)&D.3833];  # Wrong value.
  ...
  vect__26.56_133 = vect__20.52_128 + vect__24.54_131;
  ...
  if (ivtmp_162 < bnd.44_109)
    goto ; [0.00%]
  else
    goto ; [100.00%]
  ...

The temporary reduction result of the prologue loop is stored only in D.3833[0]; all other elements of D.3833 remain 0. When the vector body then loads D.3833 as a whole vector (vect__24.54_131) and adds it to the loaded elements of a, only the first element of vect__26.56_133 includes the scalar reduction result of the prologue loop.

I think a reasonable fix is to broadcast the scalar reduction result of the prologue loop to all elements of D.3833.