From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id CE48B3858C83 for ; Tue, 26 Apr 2022 14:43:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CE48B3858C83 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8006B23A; Tue, 26 Apr 2022 07:43:15 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E25FF3F73B; Tue, 26 Apr 2022 07:43:14 -0700 (PDT) From: Richard Sandiford To: "Andre Vieira \(lists\)" Mail-Followup-To: "Andre Vieira \(lists\)" , "gcc-patches\@gcc.gnu.org" , Richard Biener , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , Richard Biener Subject: Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment References: <8462f41b-895f-9aca-499e-7713ec161673@arm.com> Date: Tue, 26 Apr 2022 15:43:13 +0100 In-Reply-To: <8462f41b-895f-9aca-499e-7713ec161673@arm.com> (Andre Vieira's message of "Tue, 26 Apr 2022 15:34:12 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Apr 2022 14:43:17 -0000 "Andre Vieira (lists)" writes: > Hi, > > This patch disables epilogue vectorization when we are peeling for=20 > alignment in the prologue and we can't guarantee the main vectorized=20 > loop is entered.=C2=A0 This is to prevent executing vectorized code with = an=20 > unaligned access if the target has indicated it wants to peel for=20 > alignment. We take this conservative approach as we currently do not=20 > distinguish between peeling for alignment for correctness or for=20 > performance. > > A better codegen would be to make it skip to the scalar epilogue in case= =20 > the main loop isn't entered when alignment peeling is required. However,= =20 > that would require a more aggressive change to the codebase which we=20 > chose to avoid at this point of development.=C2=A0 We can revisit this op= tion=20 > during stage 1 if we choose to. > > Bootstrapped on aarch64-none-linux and regression tested on=20 > aarch64-none-elf. > > gcc/ChangeLog: > > =C2=A0=C2=A0=C2=A0 PR tree-optimization/105219 > =C2=A0=C2=A0=C2=A0 * tree-vect-loop.cc (vect_epilogue_when_peeling_for_a= lignment): New=20 > function. > =C2=A0=C2=A0=C2=A0 (vect_analyze_loop): Use vect_epilogue_when_peeling_f= or_alignment=20 > to determine > =C2=A0=C2=A0=C2=A0 whether to vectorize epilogue. > =C2=A0=C2=A0=C2=A0 * testsuite/gcc.target/aarch64/pr105219.c: New. > =C2=A0=C2=A0=C2=A0 * testsuite/gcc.target/aarch64/pr105219-2.c: New. > =C2=A0=C2=A0=C2=A0 * testsuite/gcc.target/aarch64/pr105219-3.c: New. > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-2.c b/gcc/testsuit= e/gcc.target/aarch64/pr105219-2.c > new file mode 100644 > index 0000000000000000000000000000000000000000..c97d1dc100181b77af0766e08= 407e1e352f604fe > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c > @@ -0,0 +1,29 @@ > +/* { dg-do run } */ > +/* { dg-options "-O3 -march=3Darmv8.2-a -mtune=3Dthunderx -fno-vect-cost= -model" } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=3D*" } { "-ma= rch=3Darmv8.2-a" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=3D*" } { "-mt= une=3Dthunderx" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=3D*" } } */ I think this should be in gcc.dg/vect, with the options forced for { target aarch64 }. Are the skips necessary? It looks like the test should work correctly for all options/targets. > +/* PR 105219. */ > +int data[128]; > + > +void __attribute((noipa)) > +foo (int *data, int n) > +{ > + for (int i =3D 0; i < n; ++i) > + data[i] =3D i; > +} > + > +int main() > +{ > + for (int start =3D 0; start < 16; ++start) > + for (int n =3D 1; n < 3*16; ++n) > + { > + __builtin_memset (data, 0, sizeof (data)); > + foo (&data[start], n); > + for (int j =3D 0; j < n; ++j) > + if (data[start + j] !=3D j) > + __builtin_abort (); > + } > + return 0; > +} > + > diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-3.c b/gcc/testsuit= e/gcc.target/aarch64/pr105219-3.c > new file mode 100644 > index 0000000000000000000000000000000000000000..444352fc051b787369f6f1be6= 236d1ff0fc2d392 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=3D*" } { "-ma= rch=3Darmv8.2-a" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=3D*" } { "-mt= une=3Dthunderx" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=3D*" } } */ > +/* { dg-options "-O3 -march=3Darmv8.2-a -mtune=3Dthunderx -fno-vect-cost= -model -fdump-tree-vect-all" } */ > +/* PR 105219. */ > +int data[128]; > + > +void foo (void) > +{ > + for (int i =3D 0; i < 9; ++i) > + data[i + 1] =3D i; > +} > + > +/* { dg-final { scan-tree-dump "EPILOGUE VECTORIZED" "vect" } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219.c b/gcc/testsuite/= gcc.target/aarch64/pr105219.c > new file mode 100644 > index 0000000000000000000000000000000000000000..bbdefb549f6a4e803852f69d2= 0ce1ef9152a526c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219.c > @@ -0,0 +1,28 @@ > +/* { dg-do run { target aarch64_sve128_hw } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=3D*" } { "-ma= rch=3Darmv8.2-a+sve" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=3D*" } { "-mt= une=3Dthunderx" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=3D*" } } */ > +/* { dg-skip-if "incompatible options" { *-*-* } { "-msve-vector-bits=3D= *" } { "-msve-vector-bits=3D128" } } */ > +/* { dg-options "-O3 -march=3Darmv8.2-a+sve -msve-vector-bits=3D128 -mtu= ne=3Dthunderx" } */ Same here. > +/* PR 105219. */ > +int a; > +char b[60]; > +short c[18]; > +short d[4][19]; > +long long f; > +void e(int g, int h, short k[][19]) { > + for (signed i =3D 0; i < 3; i +=3D 2) > + for (signed j =3D 1; j < h + 14; j++) { > + b[i * 14 + j] =3D 1; > + c[i + j] =3D k[2][j]; > + a =3D g ? k[i][j] : 0; > + } > +} > +int main() { > + e(9, 1, d); > + for (long l =3D 0; l < 6; ++l) > + for (long m =3D 0; m < 4; ++m) > + f ^=3D b[l + m * 4]; > + if (f) > + __builtin_abort (); > +} > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index d7bc34636bd52b2f67cdecd3dc16fcff684dba07..a23e6181dec8126bcb691ea94= 74095bf65483863 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -2942,6 +2942,38 @@ vect_analyze_loop_1 (class loop *loop, vec_info_sh= ared *shared, > return opt_loop_vec_info::success (loop_vinfo); > } >=20=20 > +/* Function vect_epilogue_when_peeling_for_alignment > + > + PR 105219: If we are peeling for alignment in the prologue then we do= not > + vectorize the epilogue unless we are certain we will enter the main > + vectorized loop. This is to prevent entering the vectorized epilogue= in > + case there aren't enough iterations to enter the main loop. > +*/ > + > +static bool > +vect_epilogue_when_peeling_for_alignment (loop_vec_info loop_vinfo) > +{ > + if (vect_use_loop_mask_for_alignment_p (loop_vinfo)) > + return true; > + > + int prologue_peeling =3D LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo); > + if (prologue_peeling > 0 && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) > + { > + poly_uint64 niters_for_main > + =3D upper_bound (LOOP_VINFO_VECT_FACTOR (loop_vinfo), > + LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)); > + niters_for_main > + =3D upper_bound (niters_for_main, > + LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo)); > + niters_for_main +=3D prologue_peeling; > + if (maybe_le (LOOP_VINFO_INT_NITERS (loop_vinfo), niters_for_main)) > + return false; > + } > + else if (prologue_peeling < 0) I was surprised that this tests < 0 rather than !=3D 0. If that's the right behaviour, could you add a comment saying why? I would have expected: prologue_peeling > 0 && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) to be handled more conservatively than: prologue_peeling > 0 && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) LGTM otherwise, but Richard B should have the final say. Thanks, Richard > + return false; > + return true; > +} > + > /* Function vect_analyze_loop. >=20=20 > Apply a set of analyses on LOOP, and create a loop_vec_info struct > @@ -3151,7 +3183,8 @@ vect_analyze_loop (class loop *loop, vec_info_share= d *shared) > } > } > /* For now only allow one epilogue loop. */ > - if (first_loop_vinfo->epilogue_vinfos.is_empty ()) > + if (first_loop_vinfo->epilogue_vinfos.is_empty () > + && vect_epilogue_when_peeling_for_alignment (first_loop_vinfo)) > { > first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo); > poly_uint64 th =3D LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);