Date: Wed, 22 May 2024 12:58:58 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: tamar.christina@arm.com, richard.sandiford@arm.com
Subject: Re: [PATCH 4/4] Testsuite updates

On Tue, 21 May 2024, Richard Biener wrote:

> The gcc.dg/vect/slp-12a.c case is interesting as we currently split
> the 8 store group into lanes 0-5 which we SLP with an unroll factor
> of two (on x86-64 with SSE) and the remaining two lanes are using
> interleaving vectorization with a final unroll factor of four. Thus
> we're using hybrid SLP within a single store group. After the change
> we discover the same 0-5 lane SLP part as well as two single-lane
> parts feeding the full store group. But that results in a load
> permutation that isn't supported (I have WIP patches to rectify that).
> So we end up cancelling SLP and vectorizing the whole loop with
> interleaving which is IMO good and results in better code.
>
> This is similar for gcc.target/i386/pr52252-atom.c where interleaving
> generates much better code than hybrid SLP. I'm unsure how to update
> the testcase though.
>
> gcc.dg/vect/slp-21.c runs into similar situations. Note that when we
> discard an instance while analyzing SLP operations we currently
> force the full loop to have no SLP because hybrid detection is
> broken. It's probably not worth fixing this at this moment.
>
> For gcc.dg/vect/pr97428.c we are not splitting the 16 store group
> into two but merge the two 8 lane loads into one before doing the
> store and thus have only a single SLP instance. A similar situation
> happens in gcc.dg/vect/slp-11c.c but the branches feeding the
> single SLP store only have a single lane. Likewise for
> gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.
>
> gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
> with a SLP store group of size two but two single-lane branches.
>
> gcc.target/i386/pr98928.c ICEs in SLP permute optimization
> because we don't expect a constant and internal branch to be
> merged with a permute node in
> vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
> permutes merging two SLP nodes are two-operator nodes right now).
> This still requires fixing.
>
> The whole series has been bootstrapped and tested on
> x86_64-unknown-linux-gnu with the gcc.target/i386/pr98928.c FAIL
> unfixed.
>
> Comments welcome (and hello ARM CI), RISC-V and other arch
> testing appreciated. Unless there are comments to the contrary
> I plan to push patch 1 and 2 tomorrow.

RISC-V CI didn't trigger (not sure what magic is required). Both
ARM and AARCH64 show that the "vectorizing stmts using SLP" scans are
a bit fragile because we sometimes cancel SLP because we want to use
load/store-lanes.

I have locally scrapped the SLP scanning for gcc.dg/vect/slp-21.c where
it doesn't really matter (and once we are finished with all-SLP it will
matter nowhere). I've conditionalized the outcome based on
vect_load_lanes for gcc.dg/vect/slp-11c.c and gcc.dg/vect/slp-cond-1.c.

On AARCH64 additionally gcc.target/aarch64/sve/mask_struct_store_4.c
ICEs; I have a fix for that. gcc.target/aarch64/pr99873_2.c FAILs
because with a single SLP store group merged from two two-lane load
groups we cancel the SLP and want to use load/store-lanes. I'll leave
this FAILing, or shall I XFAIL it?

Thanks,
Richard.

> Thanks,
> Richard.
>
>	* gcc.dg/vect/pr97428.c: Expect a single store SLP group.
>	* gcc.dg/vect/slp-11c.c: Likewise.
>	* gcc.dg/vect/vect-complex-5.c: Likewise.
>	* gcc.dg/vect/slp-12a.c: Do not expect SLP.
>	* gcc.dg/vect/slp-21.c: Likewise.
>	* gcc.dg/vect/slp-cond-1.c: Expect one more SLP.
>	* gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
>	* gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
> ---
>  gcc/testsuite/gcc.dg/vect/pr97428.c          |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-11c.c          |  5 +++--
>  gcc/testsuite/gcc.dg/vect/slp-12a.c          |  6 +++++-
>  gcc/testsuite/gcc.dg/vect/slp-21.c           | 19 +++++--------------
>  gcc/testsuite/gcc.dg/vect/slp-cond-1.c       |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-gather-2.c    |  1 -
>  gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
>  8 files changed, 18 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c b/gcc/testsuite/gcc.dg/vect/pr97428.c
> index 60dd984cfd3..3cc9976c00c 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr97428.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
> @@ -44,5 +44,5 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
>  /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" "vect" } } */
>  /* We're not able to peel & apply re-aligning to make accesses well-aligned for !vect_hw_misalign,
>     but we could by peeling the stores for alignment and applying re-aligning loads. */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { ! vect_hw_misalign } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_hw_misalign } } } } */
>  /* { dg-final { scan-tree-dump-not "gap of 6 elements" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
> index 0f680cd4e60..169b0d10eec 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
> @@ -13,7 +13,8 @@ main1 ()
>    unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
>    float out[N*8];
>
> -  /* Different operations - not SLPable. */
> +  /* Different operations - we SLP the store and split the group to two
> +     single-lane branches. */
>    for (i = 0; i < N*4; i++)
>      {
>        out[i*2] = ((float) in[i*2] * 2 + 6) ;
> @@ -44,4 +45,4 @@ int main (void)
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
> index 973de6ada21..2f98dc9da0b 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
> @@ -40,6 +40,10 @@ main1 ()
>        out[i*8 + 3] = b3 - 1;
>        out[i*8 + 4] = b4 - 8;
>        out[i*8 + 5] = b5 - 7;
> +      /* Due to the use in the ia[i] store we keep the feeding expression
> +         in the form ((in[i*8 + 6] + 11) * 3 - 3) while other expressions
> +         got associated as for example (in[i*5 + 5] * 4 + 33). That
> +         causes SLP discovery to fail. */
>        out[i*8 + 6] = b6 - 3;
>        out[i*8 + 7] = b7 - 7;
>
> @@ -76,5 +80,5 @@ int main (void)
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
> index 58751688414..dc153a53b47 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
> @@ -12,6 +12,7 @@ main1 ()
>    unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, a0, a1, a2, a3, b5;
>    unsigned short in[N*8];
>
> +#pragma GCC novector
>    for (i = 0; i < N*8; i++)
>      {
>        in[i] = i;
> @@ -202,18 +203,8 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target { vect_strided4 || vect_extract_even_odd } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided4 || vect_extract_even_odd } } } } } */
> -/* Some targets can vectorize the second of the three main loops using
> -   hybrid SLP. For 128-bit vectors, the required 4->3 permutations are:
> -
> -   { 0, 1, 2, 4, 5, 6, 8, 9 }
> -   { 2, 4, 5, 6, 8, 9, 10, 12 }
> -   { 5, 6, 8, 9, 10, 12, 13, 14 }
> -
> -   Not all vect_perm targets support that, and it's a bit too specific to have
> -   its own effective-target selector, so we just test targets directly. */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided4 } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { vect_strided4 || vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided4 || vect_extract_even_odd } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
> index 450c7141c96..16ab0cc7605 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
> @@ -125,4 +125,4 @@ main ()
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
> index addcf60438c..ac562dc475c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
> @@ -41,4 +41,4 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> index 4c23b808333..10e64e64d47 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> @@ -36,6 +36,5 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
>      }
>  }
>
> -/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
>  /* { dg-final { scan-tree-dump "different gather base" vect { target { ! vect_gather_load_ifn } } } } */
>  /* { dg-final { scan-tree-dump "different gather scale" vect { target { ! vect_gather_load_ifn } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom.c b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
> index 11f94411575..02736d56d31 100644
> --- a/gcc/testsuite/gcc.target/i386/pr52252-atom.c
> +++ b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
> @@ -25,4 +25,5 @@ matrix_mul (byte *in, byte *out, int size)
>      }
>  }
>
> -/* { dg-final { scan-assembler "palignr" } } */
> +/* We are no longer using hybrid SLP. */
> +/* { dg-final { scan-assembler "palignr" { xfail *-*-* } } } */

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
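For reference, the new comment in the slp-12a.c hunk says lane 6 keeps the form ((in[i*8 + 6] + 11) * 3 - 3) while other lanes are reassociated, e.g. to (in * 4 + 33). The two shapes compute the same value; what differs is the operation sequence (plus/mult/minus versus mult/plus), and per the comment that mismatch is what makes SLP discovery fail. The sketch below is a minimal check of that arithmetic, assuming unsigned int operands (so the arithmetic wraps and reassociation is value-preserving). The lane 5 source form (x + 10) * 4 - 7 is an inference from out[i*8 + 5] = b5 - 7 and the quoted "+ 33" (40 - 7 = 33), and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helpers mirroring two lanes of the slp-12a.c loop body.
   Assumption: operands are unsigned int, so arithmetic wraps and the
   reassociated forms below are exactly equal to the written forms.  */

/* Lane 5 as written: b5 = (x + 10) * 4, stored as b5 - 7.
   The "+ 10" is inferred from the quoted "+ 33" (4 * 10 - 7 = 33).  */
unsigned int
lane5_as_written (unsigned int x)
{
  return (x + 10) * 4 - 7;
}

/* Lane 5 after reassociation, the (x * 4 + 33) shape quoted in the comment.  */
unsigned int
lane5_reassociated (unsigned int x)
{
  return x * 4 + 33;
}

/* Lane 6 as kept: b6 also feeds the ia[i] store, so the shape
   stays ((x + 11) * 3 - 3).  */
unsigned int
lane6_as_kept (unsigned int x)
{
  return (x + 11) * 3 - 3;
}

/* What lane 6 would look like had it been reassociated like lane 5.  */
unsigned int
lane6_reassociated (unsigned int x)
{
  return x * 3 + 30;
}

/* Assert that both rewrites preserve the value for input X; only the
   sequence of operations differs between the paired forms.  */
void
check_lanes (unsigned int x)
{
  assert (lane5_as_written (x) == lane5_reassociated (x));
  assert (lane6_as_kept (x) == lane6_reassociated (x));
}
```

With x = 1 both lane 5 forms yield 37 and both lane 6 forms yield 33, so the values agree even though the per-lane operation sequences cannot be matched up by SLP discovery.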