From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 69A2B385840F
 for <gcc-patches@gcc.gnu.org>; Thu,  4 Nov 2021 17:48:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 69A2B385840F
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 075771063;
 Thu,  4 Nov 2021 10:48:37 -0700 (PDT)
Received: from localhost (unknown [10.32.98.88])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 68C2B3F719;
 Thu,  4 Nov 2021 10:48:36 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Mail-Followup-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>,
 Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 richard.sandiford@arm.com
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [AArch64] Fix NEON load/store gimple lowering and big-endian
 testisms
References: <0911f6c6-7832-026c-36c1-cbb576944a62@arm.com>
Date: Thu, 04 Nov 2021 17:48:35 +0000
In-Reply-To: <0911f6c6-7832-026c-36c1-cbb576944a62@arm.com> (Andre Vieira's
 message of "Thu, 4 Nov 2021 09:55:37 +0000")
Message-ID: <mpta6ij7qx8.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, BODY_8BITS,
 GIT_PATCH_0, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 04 Nov 2021 17:48:39 -0000

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> Hi,
>
> This should address the ubsan bootstrap build and big-endian testisms=20
> reported against the last NEON load/store gimple lowering patch. I also=20
> fixed a follow-up issue where the alias information was leading to a bad=
=20
> codegen transformation. The NEON intrinsics specifications do not forbid=
=20
> the use of memory accesses with different pointer types. In fact you=20
> will see intrinsic user code loading a int16x8_t vector from an int=20
> pointer, so we must make sure GCC is aware a NEON memory access of an=20
> 'int' pointer can alias with a 'short' pointer.
>
> Bootstrapped aarch64-linux-gnu (also did an ubsan bootstrap).
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
>  =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * config/aarch64/aarch64-buil=
tins.c=20
> (aarch64_general_gimple_fold_builtin): Change pointer alignment and alias.
>
> gcc/testsuite/ChangeLog:
>
>  =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * gcc.target/aarch64/fmla_int=
rinsic_1.c: Fix big-endian testism.
>  =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * gcc.target/aarch64/fmls_int=
rinsic_1.c: Likewise.
>  =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * gcc.target/aarch64/fmul_int=
rinsic_1.c: Likewise.
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/a=
arch64-builtins.c
> index a815e4cfbccab692ca688ba87c71b06c304abbfb..fc8fcb02c55e22963d2a3bf77=
b4749eb5b1c1561 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2486,16 +2486,22 @@ aarch64_general_gimple_fold_builtin (unsigned int=
 fcode, gcall *stmt,
>  	    aarch64_simd_type_info simd_type
>  	      =3D aarch64_simd_types[mem_type];
>  	    tree elt_ptr_type =3D build_pointer_type (simd_type.eltype);
> +	    elt_ptr_type =3D build_distinct_type_copy (elt_ptr_type);
> +	    TYPE_REF_CAN_ALIAS_ALL (elt_ptr_type) =3D true;
>  	    tree zero =3D build_zero_cst (elt_ptr_type);
>  	    gimple_seq stmts =3D NULL;
>  	    tree base =3D gimple_convert (&stmts, elt_ptr_type,
>  					args[0]);

This conversion seems redundant.  Do things work if we use args[0]
directly?

> +	    /* Use element type alignment.  */
> +	    tree access_type
> +	      =3D build_aligned_type (simd_type.itype,
> +				    TYPE_ALIGN (TREE_TYPE (simd_type.itype)));

I think simd_type.eltype is more natural than TREE_TYPE (simd_type.itype)
here, to match the pointer target type.

Same idea for the stores.

>  	    if (stmts)
>  	      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>  	    new_stmt
>  	      =3D gimple_build_assign (gimple_get_lhs (stmt),
>  				     fold_build2 (MEM_REF,
> -						  simd_type.itype,
> +						  access_type,
>  						  base, zero));
>  	  }
>  	break;
> @@ -2508,17 +2514,22 @@ aarch64_general_gimple_fold_builtin (unsigned int=
 fcode, gcall *stmt,
>  	    aarch64_simd_type_info simd_type
>  	      =3D aarch64_simd_types[mem_type];
>  	    tree elt_ptr_type =3D build_pointer_type (simd_type.eltype);
> +	    elt_ptr_type =3D build_distinct_type_copy (elt_ptr_type);
> +	    TYPE_REF_CAN_ALIAS_ALL (elt_ptr_type) =3D true;
>  	    tree zero =3D build_zero_cst (elt_ptr_type);
>  	    gimple_seq stmts =3D NULL;
>  	    tree base =3D gimple_convert (&stmts, elt_ptr_type,
>  					args[0]);
> +	    /* Use element type alignment.  */
> +	    tree access_type
> +	      =3D build_aligned_type (simd_type.itype,
> +				    TYPE_ALIGN (TREE_TYPE (simd_type.itype)));
>  	    if (stmts)
>  	      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>  	    new_stmt
> -	      =3D gimple_build_assign (fold_build2 (MEM_REF,
> -				     simd_type.itype,
> -				     base,
> -				     zero), args[1]);
> +	      =3D gimple_build_assign (fold_build2 (MEM_REF, access_type,
> +						  base, zero),
> +				     args[1]);
>  	  }
>  	break;
>=20=20
> diff --git a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c b/gcc/te=
stsuite/gcc.target/aarch64/fmla_intrinsic_1.c
> index adb787a8599af23847dd62dcd153d7cfe43dacc0..c1aeb06e74753052c2ee441b3=
61b92148f1b4b0a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
> @@ -107,10 +107,12 @@ main (int argc, char **argv)
>=20=20
>  /* vfma_lane_f64.
>     vfma_laneq_f64. */
> -/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\=
[0-9\]+\, d\[0-9\]+" 2 } } */
> +/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\=
[0-9\]+\, d\[0-9\]+" 1 { target aarch64_big_endian } } } */
> +/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\=
[0-9\]+\, d\[0-9\]+" 2 { target aarch64_little_endian } } } */

Could you explain this in more detail?  What happens for little-endian
vs. big-endian?

Thanks,
Richard

>=20=20
>  /* vfmaq_lane_f64.
>     vfmaq_laneq_f64.  */
> -/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
> +/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 { target aarch64_big_endian } } } */
> +/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 { target aarch64_little_endian } } } */
>=20=20
>=20=20
> diff --git a/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c b/gcc/te=
stsuite/gcc.target/aarch64/fmls_intrinsic_1.c
> index 865def28c3f4d04042ab495d232bb865cabb2b50..3137ea91e809e37de589091e9=
bbd43bfe4d221a1 100644
> --- a/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
> @@ -108,10 +108,12 @@ main (int argc, char **argv)
>=20=20
>  /* vfms_lane_f64.
>     vfms_laneq_f64.  */
> -/* { dg-final { scan-assembler-times "fmsub\\td\[0-9\]+\, d\[0-9\]+\, d\=
[0-9\]+\, d\[0-9\]+" 2 } } */
> +/* { dg-final { scan-assembler-times "fmsub\\td\[0-9\]+\, d\[0-9\]+\, d\=
[0-9\]+\, d\[0-9\]+" 1 { target aarch64_big_endian } } } */
> +/* { dg-final { scan-assembler-times "fmsub\\td\[0-9\]+\, d\[0-9\]+\, d\=
[0-9\]+\, d\[0-9\]+" 2 { target aarch64_little_endian } } } */
>=20=20
>  /* vfmsq_lane_f64.
>     vfmsq_laneq_f64.  */
> -/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
> +/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 { target aarch64_big_endian } } } */
> +/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 { target aarch64_little_endian } } } */
>=20=20
>=20=20
> diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c b/gcc/te=
stsuite/gcc.target/aarch64/fmul_intrinsic_1.c
> index d01095e81c1e45dc1da998aa337ba551b3752ebe..7d4829c40d7042226f2f09fab=
9fdfa7c3dd211c4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c
> @@ -107,10 +107,12 @@ main (int argc, char **argv)
>=20=20
>  /* vmul_lane_f64.
>     Vmul_laneq_f64. */
> -/* { dg-final { scan-assembler-times "fmul\\td\[0-9\]+, d\[0-9\]+, d\[0-=
9\]+" 2 } } */
> +/* { dg-final { scan-assembler-times "fmul\\td\[0-9\]+, d\[0-9\]+, d\[0-=
9\]+" 1 { target aarch64_big_endian } } } */
> +/* { dg-final { scan-assembler-times "fmul\\td\[0-9\]+, d\[0-9\]+, d\[0-=
9\]+" 2 { target aarch64_little_endian } } } */
>=20=20
>  /* vmulq_lane_f64.
>     vmulq_laneq_f64.  */
> -/* { dg-final { scan-assembler-times "fmul\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
> +/* { dg-final { scan-assembler-times "fmul\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 { target aarch64_big_endian } } } */
> +/* { dg-final { scan-assembler-times "fmul\\tv\[0-9\]+\.2d, v\[0-9\]+\.2=
d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 { target aarch64_little_endian } } } */
>=20=20
>=20=20