From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Fix SLP layout handling of masked loads [PR106794]
Date: Sat, 17 Sep 2022 07:04:00 +0200 (CEST) [thread overview]
Message-ID: <alpine.DEB.2.22.394.2209170703460.5923@rguenther-XPS-13-9380> (raw)
In-Reply-To: <mptczbv4udm.fsf@arm.com>
On Fri, 16 Sep 2022, Richard Sandiford wrote:
> PR106794 shows that I'd forgotten about masked loads when
> doing the SLP layout changes. These loads can't currently
> be permuted independently of their mask input, so during
> construction they never get a load permutation.
>
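As a rough scalar illustration (my own sketch, not taken from the patch): an SLP "load permutation" records which element of a grouped access each vector lane reads. A loop whose two lanes read their group's elements in swapped order would get permutation {1, 0}:

```c
#include <assert.h>

/* Scalar model of an SLP load permutation: each iteration forms a
   load group of two elements, and the two SLP lanes read group
   elements {1, 0}, i.e. in a different order than they appear in
   memory.  Hypothetical example, not from the patch.  */
static void swap_pairs (int *out, const int *a, int n_pairs)
{
  for (int i = 0; i < n_pairs; i++)
    {
      out[2 * i]     = a[2 * i + 1]; /* lane 0 reads group element 1 */
      out[2 * i + 1] = a[2 * i];     /* lane 1 reads group element 0 */
    }
}
```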
> (If we did support permuting masked loads in future, the mask
> would need to be in the right order for the load, rather than in
> the order implied by the result of the permutation. Since masked
> loads can't be partly or fully scalarised in the way that normal
> permuted loads can be, there's probably no benefit to fusing the
> permutation and the load. Permutation after the fact is probably
> good enough.)
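For reference, a masked load is a conditional element-wise read: lane i touches memory only where the mask is set. A minimal scalar model (my own sketch, mirroring the conditional pattern in the pr106914 testcase; the vectorizer emits this as a single IFN_MASK_LOAD rather than per-lane branches):

```c
#include <assert.h>

/* Scalar model of a masked load: lane i reads src[i] only where
   mask[i] is nonzero; masked-off lanes take the fallback value.
   Because each lane's read is tied to the same lane of the mask,
   permuting the loaded lanes would also require permuting the
   mask input.  */
static void mask_load (int *dst, const int *mask, const int *src,
                       int fallback, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = mask[i] ? src[i] : fallback;
}
```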
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
OK.
Thanks,
Richard.
> Richard
>
>
> gcc/
> PR tree-optimization/106794
> PR tree-optimization/106914
> * tree-vect-slp.cc (vect_optimize_slp_pass::internal_node_cost):
> Only consider loads that already have a permutation.
> (vect_optimize_slp_pass::start_choosing_layouts): Assert that
> loads with permutations are leaf nodes. Prevent any kind of grouped
> access from changing layout if it doesn't have a load permutation.
>
> gcc/testsuite/
> * gcc.dg/vect/pr106914.c: New test.
> * g++.dg/vect/pr106794.cc: Likewise.
> ---
> gcc/testsuite/g++.dg/vect/pr106794.cc | 40 +++++++++++++++++++++++++++
> gcc/testsuite/gcc.dg/vect/pr106914.c | 15 ++++++++++
> gcc/tree-vect-slp.cc | 30 ++++++++++++++------
> 3 files changed, 77 insertions(+), 8 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/vect/pr106794.cc
> create mode 100644 gcc/testsuite/gcc.dg/vect/pr106914.c
>
> diff --git a/gcc/testsuite/g++.dg/vect/pr106794.cc b/gcc/testsuite/g++.dg/vect/pr106794.cc
> new file mode 100644
> index 00000000000..f056563c4e1
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr106794.cc
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +/* { dg-additional-options "-march=bdver2" { target x86_64-*-* i?86-*-* } } */
> +
> +template <class T> struct Vector3 {
> + Vector3();
> + Vector3(T, T, T);
> + T length() const;
> + T x, y, z;
> +};
> +template <class T>
> +Vector3<T>::Vector3(T _x, T _y, T _z) : x(_x), y(_y), z(_z) {}
> +Vector3<float> cross(Vector3<float> a, Vector3<float> b) {
> + return Vector3<float>(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z,
> + a.x * b.y - a.y * b.x);
> +}
> +template <class T> T Vector3<T>::length() const { return z; }
> +int generateNormals_i;
> +float generateNormals_p2_0, generateNormals_p0_0;
> +struct SphereMesh {
> + void generateNormals();
> + float vertices;
> +};
> +void SphereMesh::generateNormals() {
> + Vector3<float> *faceNormals = new Vector3<float>;
> + for (int j; j; j++) {
> + float *p0 = &vertices + 3, *p1 = &vertices + j * 3, *p2 = &vertices + 3,
> + *p3 = &vertices + generateNormals_i + j * 3;
> + Vector3<float> v0(p1[0] - generateNormals_p0_0, p1[1] - 1, p1[2] - 2),
> + v1(0, 1, 2);
> + if (v0.length())
> + v1 = Vector3<float>(p3[0] - generateNormals_p2_0, p3[1] - p2[1],
> + p3[2] - p2[2]);
> + else
> + v1 = Vector3<float>(generateNormals_p0_0 - p3[0], p0[1] - p3[1],
> + p0[2] - p3[2]);
> + Vector3<float> faceNormal = cross(v0, v1);
> + faceNormals[j] = faceNormal;
> + }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr106914.c b/gcc/testsuite/gcc.dg/vect/pr106914.c
> new file mode 100644
> index 00000000000..9d9b3e30081
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr106914.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-fprofile-generate" } */
> +/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
> +
> +int *mask_slp_int64_t_8_2_x, *mask_slp_int64_t_8_2_y, *mask_slp_int64_t_8_2_z;
> +
> +void
> +__attribute__mask_slp_int64_t_8_2() {
> + for (int i; i; i += 8) {
> + mask_slp_int64_t_8_2_x[i + 6] =
> + mask_slp_int64_t_8_2_y[i + 6] ? mask_slp_int64_t_8_2_z[i] : 1;
> + mask_slp_int64_t_8_2_x[i + 7] =
> + mask_slp_int64_t_8_2_y[i + 7] ? mask_slp_int64_t_8_2_z[i + 7] : 2;
> + }
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index ca3422c2a1e..229f2663ebc 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -4494,7 +4494,8 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
> stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
> if (rep
> && STMT_VINFO_DATA_REF (rep)
> - && DR_IS_READ (STMT_VINFO_DATA_REF (rep)))
> + && DR_IS_READ (STMT_VINFO_DATA_REF (rep))
> + && SLP_TREE_LOAD_PERMUTATION (node).exists ())
> {
> auto_load_permutation_t tmp_perm;
> tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
> @@ -4569,8 +4570,12 @@ vect_optimize_slp_pass::start_choosing_layouts ()
> if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
> {
> /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
> - unpermuted, record a layout that reverses this permutation. */
> - gcc_assert (partition.layout == 0);
> + unpermuted, record a layout that reverses this permutation.
> +
> + We would need more work to cope with loads that are internally
> + permuted and also have inputs (such as masks for
> + IFN_MASK_LOADs). */
> + gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
> if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
> continue;
> dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt);
> @@ -4684,12 +4689,21 @@ vect_optimize_slp_pass::start_choosing_layouts ()
> vertex.weight = vect_slp_node_weight (node);
>
> /* We do not handle stores with a permutation, so all
> - incoming permutations must have been materialized. */
> + incoming permutations must have been materialized.
> +
> + We also don't handle masked grouped loads, which lack a
> + permutation vector. In this case the memory locations
> + form an implicit second input to the loads, on top of the
> + explicit mask input, and the memory input's layout cannot
> + be changed.
> +
> + On the other hand, we do support permuting gather loads and
> + masked gather loads, where each scalar load is independent
> + of the others. This can be useful if the address/index input
> + benefits from permutation. */
> if (STMT_VINFO_DATA_REF (rep)
> - && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
> - /* ??? We're forcing materialization in place
> - of the child here, we'd need special handling
> - in materialization to leave layout -1 here. */
> + && STMT_VINFO_GROUPED_ACCESS (rep)
> + && !SLP_TREE_LOAD_PERMUTATION (node).exists ())
> partition.layout = 0;
>
> /* We cannot change the layout of an operation that is
> --
> 2.25.1
>
>
Thread overview:
  2022-09-16  8:05 Richard Sandiford
  2022-09-17  5:04 Richard Biener [this message]