Date: Tue, 21 May 2024 14:48:14 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: tamar.christina@arm.com, richard.sandiford@arm.com
Subject: [PATCH 4/4] Testsuite updates
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Message-Id: <20240521124822.E5A0313A1E@imap1.dmz-prg2.suse.org>

The gcc.dg/vect/slp-12a.c case is interesting as we currently split
the 8 store group into lanes 0-5 which we SLP with an unroll factor
of two (on x86-64 with SSE) and the remaining two lanes are using
interleaving vectorization with a final unroll factor of four. Thus
we're using hybrid SLP within a single store group. After the change
we discover the same 0-5 lane SLP part as well as two single-lane
parts feeding the full store group. But that results in a load
permutation that isn't supported (I have WIP patches to rectify that).
So we end up cancelling SLP and vectorizing the whole loop with
interleaving which is IMO good and results in better code.

This is similar for gcc.target/i386/pr52252-atom.c where interleaving
generates much better code than hybrid SLP. I'm unsure how to update
the testcase though.

gcc.dg/vect/slp-21.c runs into similar situations. Note that when we
discard an instance while analyzing SLP operations, we currently
force the full loop to have no SLP because hybrid detection is
broken. It's probably not worth fixing this at this moment.

For gcc.dg/vect/pr97428.c we are not splitting the 16 store group
into two but merge the two 8 lane loads into one before doing the
store and thus have only a single SLP instance. A similar situation
happens in gcc.dg/vect/slp-11c.c but the branches feeding the
single SLP store only have a single lane. Likewise for
gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.

gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
with an SLP store group of size two but two single-lane branches.

gcc.target/i386/pr98928.c ICEs in SLP permute optimization
because we don't expect a constant and internal branch to be
merged with a permute node in
vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
permutes merging two SLP nodes are two-operator nodes right now).
This still requires fixing.

The whole series has been bootstrapped and tested on
x86_64-unknown-linux-gnu with the gcc.target/i386/pr98928.c FAIL
unfixed.

Comments welcome (and hello ARM CI), RISC-V and other arch
testing appreciated. Unless there are comments to the contrary I
plan to push patches 1 and 2 tomorrow.

Thanks,
Richard.

	* gcc.dg/vect/pr97428.c: Expect a single store SLP group.
	* gcc.dg/vect/slp-11c.c: Likewise.
	* gcc.dg/vect/vect-complex-5.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Do not expect SLP.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Expect one more SLP.
	* gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
	* gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
---
 gcc/testsuite/gcc.dg/vect/pr97428.c          |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-11c.c          |  5 +++--
 gcc/testsuite/gcc.dg/vect/slp-12a.c          |  6 +++++-
 gcc/testsuite/gcc.dg/vect/slp-21.c           | 19 +++++--------------
 gcc/testsuite/gcc.dg/vect/slp-cond-1.c       |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-2.c    |  1 -
 gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
 8 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c b/gcc/testsuite/gcc.dg/vect/pr97428.c
index 60dd984cfd3..3cc9976c00c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -44,5 +44,5 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
 /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" "vect" } } */
 /* We're not able to peel & apply re-aligning to make accesses well-aligned for !vect_hw_misalign,
    but we could by peeling the stores for alignment and applying re-aligning loads. */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { ! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-not "gap of 6 elements" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
index 0f680cd4e60..169b0d10eec 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
@@ -13,7 +13,8 @@ main1 ()
   unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
   float out[N*8];
 
-  /* Different operations - not SLPable. */
+  /* Different operations - we SLP the store and split the group to two
+     single-lane branches. */
   for (i = 0; i < N*4; i++)
     {
       out[i*2] = ((float) in[i*2] * 2 + 6) ;
@@ -44,4 +45,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index 973de6ada21..2f98dc9da0b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -40,6 +40,10 @@ main1 ()
       out[i*8 + 3] = b3 - 1;
       out[i*8 + 4] = b4 - 8;
       out[i*8 + 5] = b5 - 7;
+      /* Due to the use in the ia[i] store we keep the feeding expression
+         in the form ((in[i*8 + 6] + 11) * 3 - 3) while other expressions
+         got associated as for example (in[i*5 + 5] * 4 + 33). That
+         causes SLP discovery to fail. */
       out[i*8 + 6] = b6 - 3;
       out[i*8 + 7] = b7 - 7;
 
@@ -76,5 +80,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 58751688414..dc153a53b47 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -12,6 +12,7 @@ main1 ()
   unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, a0, a1, a2, a3, b5;
   unsigned short in[N*8];
 
+#pragma GCC novector
   for (i = 0; i < N*8; i++)
     {
       in[i] = i;
@@ -202,18 +203,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target { vect_strided4 || vect_extract_even_odd } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided4 || vect_extract_even_odd } } } } } */
-/* Some targets can vectorize the second of the three main loops using
-   hybrid SLP. For 128-bit vectors, the required 4->3 permutations are:
-
-   { 0, 1, 2, 4, 5, 6, 8, 9 }
-   { 2, 4, 5, 6, 8, 9, 10, 12 }
-   { 5, 6, 8, 9, 10, 12, 13, 14 }
-
-   Not all vect_perm targets support that, and it's a bit too specific to have
-   its own effective-target selector, so we just test targets directly. */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided4 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { vect_strided4 || vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided4 || vect_extract_even_odd } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
index 450c7141c96..16ab0cc7605 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
@@ -125,4 +125,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
index addcf60438c..ac562dc475c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
@@ -41,4 +41,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
index 4c23b808333..10e64e64d47 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
@@ -36,6 +36,5 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
     }
 }
 
-/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
 /* { dg-final { scan-tree-dump "different gather base" vect { target { ! vect_gather_load_ifn } } } } */
 /* { dg-final { scan-tree-dump "different gather scale" vect { target { ! vect_gather_load_ifn } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom.c b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
index 11f94411575..02736d56d31 100644
--- a/gcc/testsuite/gcc.target/i386/pr52252-atom.c
+++ b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
@@ -25,4 +25,5 @@ matrix_mul (byte *in, byte *out, int size)
     }
 }
 
-/* { dg-final { scan-assembler "palignr" } } */
+/* We are no longer using hybrid SLP. */
+/* { dg-final { scan-assembler "palignr" { xfail *-*-* } } } */
-- 
2.35.3