Date: Sat, 17 Sep 2022 07:04:00 +0200 (CEST)
From: Richard Biener
To: Richard Sandiford
cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Fix SLP layout handling of masked loads [PR106794]

On Fri, 16 Sep 2022, Richard Sandiford wrote:

> PR106794 shows that I'd forgotten about masked loads when
> doing the SLP layout changes. These loads can't currently
> be permuted independently of their mask input, so during
> construction they never get a load permutation.
>
> (If we did support permuting masked loads in future, the mask
> would need to be in the right order for the load, rather than in
> the order implied by the result of the permutation. Since masked
> loads can't be partly or fully scalarised in the way that normal
> permuted loads can be, there's probably no benefit to fusing the
> permutation and the load. Permutation after the fact is probably
> good enough.)
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> 	PR tree-optimization/106794
> 	PR tree-optimization/106914
> 	* tree-vect-slp.cc (vect_optimize_slp_pass::internal_node_cost):
> 	Only consider loads that already have a permutation.
> 	(vect_optimize_slp_pass::start_choosing_layouts): Assert that
> 	loads with permutations are leaf nodes. Prevent any kind of grouped
> 	access from changing layout if it doesn't have a load permutation.
>
> gcc/testsuite/
> 	* gcc.dg/vect/pr106914.c: New test.
> 	* g++.dg/vect/pr106794.cc: Likewise.
> ---
>  gcc/testsuite/g++.dg/vect/pr106794.cc | 40 +++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/vect/pr106914.c  | 15 ++++++++++
>  gcc/tree-vect-slp.cc                  | 30 ++++++++++++++------
>  3 files changed, 77 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/vect/pr106794.cc
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr106914.c
>
> diff --git a/gcc/testsuite/g++.dg/vect/pr106794.cc b/gcc/testsuite/g++.dg/vect/pr106794.cc
> new file mode 100644
> index 00000000000..f056563c4e1
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr106794.cc
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +/* { dg-additional-options "-march=bdver2" { target x86_64-*-* i?86-*-* } } */
> +
> +template <typename T> struct Vector3 {
> +  Vector3();
> +  Vector3(T, T, T);
> +  T length() const;
> +  T x, y, z;
> +};
> +template <typename T>
> +Vector3<T>::Vector3(T _x, T _y, T _z) : x(_x), y(_y), z(_z) {}
> +Vector3<float> cross(Vector3<float> a, Vector3<float> b) {
> +  return Vector3<float>(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z,
> +                        a.x * b.y - a.y * b.x);
> +}
> +template <typename T> T Vector3<T>::length() const { return z; }
> +int generateNormals_i;
> +float generateNormals_p2_0, generateNormals_p0_0;
> +struct SphereMesh {
> +  void generateNormals();
> +  float vertices;
> +};
> +void SphereMesh::generateNormals() {
> +  Vector3<float> *faceNormals = new Vector3<float>;
> +  for (int j; j; j++) {
> +    float *p0 = &vertices + 3, *p1 = &vertices + j * 3, *p2 = &vertices + 3,
> +          *p3 = &vertices + generateNormals_i + j * 3;
> +    Vector3<float> v0(p1[0] - generateNormals_p0_0, p1[1] - 1, p1[2] - 2),
> +        v1(0, 1, 2);
> +    if (v0.length())
> +      v1 = Vector3<float>(p3[0] - generateNormals_p2_0, p3[1] - p2[1],
> +                          p3[2] - p2[2]);
> +    else
> +      v1 = Vector3<float>(generateNormals_p0_0 - p3[0], p0[1] - p3[1],
> +                          p0[2] - p3[2]);
> +    Vector3<float> faceNormal = cross(v0, v1);
> +    faceNormals[j] = faceNormal;
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr106914.c b/gcc/testsuite/gcc.dg/vect/pr106914.c
> new file mode 100644
> index 00000000000..9d9b3e30081
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr106914.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-fprofile-generate" } */
> +/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
> +
> +int *mask_slp_int64_t_8_2_x, *mask_slp_int64_t_8_2_y, *mask_slp_int64_t_8_2_z;
> +
> +void
> +__attribute__mask_slp_int64_t_8_2() {
> +  for (int i; i; i += 8) {
> +    mask_slp_int64_t_8_2_x[i + 6] =
> +        mask_slp_int64_t_8_2_y[i + 6] ? mask_slp_int64_t_8_2_z[i] : 1;
> +    mask_slp_int64_t_8_2_x[i + 7] =
> +        mask_slp_int64_t_8_2_y[i + 7] ? mask_slp_int64_t_8_2_z[i + 7] : 2;
> +  }
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index ca3422c2a1e..229f2663ebc 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -4494,7 +4494,8 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
>    stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
>    if (rep
>        && STMT_VINFO_DATA_REF (rep)
> -      && DR_IS_READ (STMT_VINFO_DATA_REF (rep)))
> +      && DR_IS_READ (STMT_VINFO_DATA_REF (rep))
> +      && SLP_TREE_LOAD_PERMUTATION (node).exists ())
>      {
>        auto_load_permutation_t tmp_perm;
>        tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
> @@ -4569,8 +4570,12 @@ vect_optimize_slp_pass::start_choosing_layouts ()
> 	  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
> 	    {
> 	      /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
> -		 unpermuted, record a layout that reverses this permutation.  */
> -	      gcc_assert (partition.layout == 0);
> +		 unpermuted, record a layout that reverses this permutation.
> +
> +		 We would need more work to cope with loads that are internally
> +		 permuted and also have inputs (such as masks for
> +		 IFN_MASK_LOADs).  */
> +	      gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
> 	      if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
> 		continue;
> 	      dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt);
> @@ -4684,12 +4689,21 @@ vect_optimize_slp_pass::start_choosing_layouts ()
> 	  vertex.weight = vect_slp_node_weight (node);
>
> 	  /* We do not handle stores with a permutation, so all
> -	     incoming permutations must have been materialized.  */
> +	     incoming permutations must have been materialized.
> +
> +	     We also don't handle masked grouped loads, which lack a
> +	     permutation vector.  In this case the memory locations
> +	     form an implicit second input to the loads, on top of the
> +	     explicit mask input, and the memory input's layout cannot
> +	     be changed.
> +
> +	     On the other hand, we do support permuting gather loads and
> +	     masked gather loads, where each scalar load is independent
> +	     of the others.  This can be useful if the address/index input
> +	     benefits from permutation.  */
> 	  if (STMT_VINFO_DATA_REF (rep)
> -	      && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
> -	    /* ??? We're forcing materialization in place
> -	       of the child here, we'd need special handling
> -	       in materialization to leave layout -1 here.  */
> +	      && STMT_VINFO_GROUPED_ACCESS (rep)
> +	      && !SLP_TREE_LOAD_PERMUTATION (node).exists ())
> 	    partition.layout = 0;
>
> 	  /* We cannot change the layout of an operation that is
> --
> 2.25.1
>
>
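For readers following along: the sketch below is a hedged, runnable illustration of the kind of loop the fix is about — it is not the committed testcase, and the function name, driver, and data are invented for this example. On a target with masked vector loads (e.g. -mavx512vl), the two conditional reads per 8-lane group can become IFN_MASK_LOADs, and such SLP load nodes are built without a load permutation, which is exactly the case the patch teaches the layout pass to leave alone.

```c
/* Hypothetical reduction modelled on gcc.dg/vect/pr106914.c: lanes 6
   and 7 of each 8-wide group conditionally read z[] under masks taken
   from y[].  Note the asymmetry between the lanes (z[i] vs. z[i + 7]),
   mirroring the testcase: the reads are not a plain contiguous pair,
   yet their layout must stay fixed because the mask input pins the
   lane order.  */
void
mask_slp (int *x, const int *y, const int *z, int n)
{
  for (int i = 0; i + 7 < n; i += 8)
    {
      /* Active lane reads z[]; inactive lane takes the constant.  */
      x[i + 6] = y[i + 6] ? z[i] : 1;
      x[i + 7] = y[i + 7] ? z[i + 7] : 2;
    }
}
```

Scalar execution is of course unchanged by vectorization; only the code generation strategy differs, which is why a compile-only dg test suffices in the patch itself.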