Date: Wed, 26 Jun 2024 15:59:18 +0200 (CEST)
From: Richard Biener
To: Tamar Christina
cc: gcc-patches@gcc.gnu.org, nd, jlaw@ventanamicro.com
Subject: RE: [PATCH]middle-end: Implement conditonal store vectorizer pattern [PR115531]
Message-ID: <76788902-3pp7-3390-984s-q3170p9rs195@fhfr.qr>
References: <5r1oo799-nq2o-670q-2s2p-3r38no245837@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 26 Jun 2024, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener
> > Sent: Wednesday, June 26, 2024 2:23 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd; jlaw@ventanamicro.com
> > Subject: Re: [PATCH]middle-end: Implement conditonal store vectorizer pattern [PR115531]
> >
> > On Tue, 25 Jun 2024, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This adds a conditional store optimization for the vectorizer as a pattern.
> > > The vectorizer already supports modifying memory accesses because of the
> > > pattern based gather/scatter recognition.
> > >
> > > Doing it in the vectorizer allows us to still keep the ability to vectorize such
> > > loops for architectures that don't have MASK_STORE support, whereas doing this
> > > in ifcvt makes us commit to MASK_STORE.
> > >
> > > Concretely for this loop:
> > >
> > > void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > {
> > >   if (stride <= 1)
> > >     return;
> > >
> > >   for (int i = 0; i < n; i++)
> > >     {
> > >       int res = c[i];
> > >       int t = b[i+stride];
> > >       if (a[i] != 0)
> > >         res = t;
> > >       c[i] = res;
> > >     }
> > > }
> > >
> > > today we generate:
> > >
> > > .L3:
> > >         ld1b    z29.s, p7/z, [x0, x5]
> > >         ld1w    z31.s, p7/z, [x2, x5, lsl 2]
> > >         ld1w    z30.s, p7/z, [x1, x5, lsl 2]
> > >         cmpne   p15.b, p6/z, z29.b, #0
> > >         sel     z30.s, p15, z30.s, z31.s
> > >         st1w    z30.s, p7, [x2, x5, lsl 2]
> > >         add     x5, x5, x4
> > >         whilelo p7.s, w5, w3
> > >         b.any   .L3
> > >
> > > which in gimple is:
> > >
> > >   vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
> > >   vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
> > >   vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
> > >   mask__34.16_79 = vect__9.15_77 != { 0, ... };
> > >   vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74, vect_res_18.9_68>;
> > >   .MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80);
> > >
> > > A MASK_STORE is already conditional, so there's no need to perform the load of
> > > the old values and the VEC_COND_EXPR.
> > > This patch makes it so we generate:
> > >
> > >   vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
> > >   vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
> > >   mask__34.16_79 = vect__9.15_77 != { 0, ... };
> > >   .MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_res_18.9_68);
> > >
> > > which generates:
> > >
> > > .L3:
> > >         ld1b    z30.s, p7/z, [x0, x5]
> > >         ld1w    z31.s, p7/z, [x1, x5, lsl 2]
> > >         cmpne   p7.b, p7/z, z30.b, #0
> > >         st1w    z31.s, p7, [x2, x5, lsl 2]
> > >         add     x5, x5, x4
> > >         whilelo p7.s, w5, w3
> > >         b.any   .L3
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > The idea looks good but I wonder if it's not slower in practice.
> > The issue with masked stores, in particular those where any elements
> > are actually masked out, is that such stores do not forward on any
> > uarch I know.  They also usually have a penalty for the merging
> > (the load has to be carried out anyway).
> >
>
> Yes, but when the predicate has all bits set it usually does.
> But forwarding aside, this gets rid of the select and the additional load,
> so purely from an instruction latency perspective it's a win.
>
> > So - can you do an actual benchmark on real hardware where the
> > loop has (way) more than one vector iteration and where there's
> > at least one masked element during each vector iteration?
> >
>
> Sure, this optimization comes from exchange2, where vectorizing with SVE
> ends up being slower than not vectorizing.  This change makes the vectorization
> profitable and recovers about a 3% difference overall between vectorizing and not.
>
> I did run microbenchmarks over all current and future Arm cores and it was a
> universal win.
>
> I can run more benchmarks with various masks, but as mentioned above, even
> without forwarding you still have two fewer instructions, so it's almost always
> going to win.
>
> > > Ok for master?
> >
> > Few comments below.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/115531
> > > 	* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
> > > 	(vect_recog_cond_store_pattern): New.
> > > 	(vect_vect_recog_func_ptrs): Use it.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	PR tree-optimization/115531
> > > 	* gcc.dg/vect/vect-conditional_store_1.c: New test.
> > > 	* gcc.dg/vect/vect-conditional_store_2.c: New test.
> > > 	* gcc.dg/vect/vect-conditional_store_3.c: New test.
> > > 	* gcc.dg/vect/vect-conditional_store_4.c: New test.
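
[Editor's note: the scalar sketch below is not part of the patch; it only
illustrates why the rewrite described above is legal.  The else-value of the
VEC_COND_EXPR is the value just loaded from the very address the store writes
to, so lanes whose mask bit is clear can simply keep their old memory
contents and the store may be skipped for them.  The function names are made
up for the illustration.]

  /* What the loop does before the pattern is applied: always store,
     selecting between the new value and the previously loaded one.  */
  void foo1_before (char *restrict a, int *restrict b, int *restrict c,
                    int n, int stride)
  {
    for (int i = 0; i < n; i++)
      {
        int res = c[i];                  /* load the old value        */
        int t = b[i + stride];
        c[i] = (a[i] != 0) ? t : res;    /* select, then always store */
      }
  }

  /* Equivalent form matched into .MASK_STORE: store only where the
     condition holds; untouched elements keep their previous value.  */
  void foo1_after (char *restrict a, int *restrict b, int *restrict c,
                   int n, int stride)
  {
    for (int i = 0; i < n; i++)
      {
        int t = b[i + stride];
        if (a[i] != 0)
          c[i] = t;
      }
  }
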
> > >
> > > ---
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..3884a3c3d0a2dc2258097348c75bb7c0b3b37c72
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res = c[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] != 0)
> > > +        res = t;
> > > +      c[i] = res;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..bc965a244f147c199b1726e5f6b44229539cd225
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo2 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res = c[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] != 0)
> > > +        t = res;
> > > +      c[i] = t;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..ab6889f967b330a652917925c2748b16af59b9fd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res = c[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] >= 0)
> > > +        t = res;
> > > +      c[i] = t;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..3bfe2f81dc2d47096aa23529d43263be52cd422c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo4 (signed char *restrict a, int *restrict b, int *restrict c, int *restrict d, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res1 = c[i];
> > > +      int res2 = d[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] > 0)
> > > +        t = res1;
> > > +      else if (a[i] < 0)
> > > +        t = res2 * 2;
> > > +
> > > +      c[i] = t;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index cef901808eb97c0c92b51da20535ea7f397b4742..f0da6c932f48d2992a501d0ced3efc8924912c77 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -6397,6 +6397,177 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> > >    return pattern_stmt;
> > >  }
> > >
> > > +/* Helper method of vect_recog_cond_store_pattern, checks to see if COND_ARG
> > > +   is points to a load statement that reads the same data as that of
> > > +   STORE_VINFO.  */
> > > +
> > > +static bool
> > > +vect_cond_store_pattern_same_ref (loop_vec_info loop_vinfo,
> > > +				  stmt_vec_info store_vinfo, tree cond_arg)
> > > +{
> > > +  stmt_vec_info load_stmt_vinfo = loop_vinfo->lookup_def (cond_arg);
> > > +  if (!load_stmt_vinfo
> > > +      || !STMT_VINFO_DATA_REF (load_stmt_vinfo)
> > > +      || DR_IS_WRITE (STMT_VINFO_DATA_REF (load_stmt_vinfo))
> >
> > can you use !DR_IS_READ?
> >
> > > +      || !same_data_refs (STMT_VINFO_DATA_REF (store_vinfo),
> > > +			  STMT_VINFO_DATA_REF (load_stmt_vinfo)))
> > > +    return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Function vect_recog_cond_store_pattern
> > > +
> > > +   Try to find the following pattern:
> > > +
> > > +   x = *_3;
> > > +   c = a CMP b;
> > > +   y = c ? t_20 : x;
> > > +   *_3 = y;
> > > +
> > > +   where the store of _3 happens on a conditional select on a value loaded
> > > +   from the same location.  In such case we can elide the initial load if
> > > +   MASK_STORE is supported and instead only conditionally write out the result.
> > > +
> > > +   The pattern produces for the above:
> > > +
> > > +   c = a CMP b;
> > > +   .MASK_STORE (_3, c, t_20)
> > > +
> > > +   Input:
> > > +
> > > +   * STMT_VINFO: The stmt from which the pattern search begins.  In the
> > > +   example, when this function is called with _3 then the search begins.
> > > +
> > > +   Output:
> > > +
> > > +   * TYPE_OUT: The type of the output of this pattern.
> > > +
> > > +   * Return value: A new stmt that will be used to replace the sequence.  */
> > > +
> > > +static gimple *
> > > +vect_recog_cond_store_pattern (vec_info *vinfo,
> > > +			       stmt_vec_info stmt_vinfo, tree *type_out)
> > > +{
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +  if (!loop_vinfo)
> > > +    return NULL;
> >
> > Why only for loops?
> > We run BB vect for if-converted loop bodies
> > if loop vect failed on them, for example.  Or is it that you imply
> > this is only profitable when loop masking is applied - which of course
> > you do not check?
> >
>
> I don't think it's possible when masking isn't applied, no?
> The check is implicit in checking for MASK_STORE support,
> or can you have masked store support without masking?

Sure, if-conversion can apply a .MASK_STORE for a conditional store.
That's exactly the case you are matching here - there's no need for
a loop mask.

> > > +  gimple *store_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > > +
> > > +  /* Needs to be a gimple store where we have DR info for.  */
> > > +  if (!STMT_VINFO_DATA_REF (stmt_vinfo)
> > > +      || !DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_vinfo))
> > > +      || !gimple_store_p (store_stmt))
> > > +    return NULL;
> > > +
> > > +  tree st_rhs = gimple_assign_rhs1 (store_stmt);
> > > +  tree st_lhs = gimple_assign_lhs (store_stmt);
> > > +
> > > +  if (TREE_CODE (st_rhs) != SSA_NAME)
> > > +    return NULL;
> > > +
> > > +  gassign *cond_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (st_rhs));
> > > +  if (!cond_stmt || gimple_assign_rhs_code (cond_stmt) != COND_EXPR)
> > > +    return NULL;
> > > +
> > > +  /* Check if the else value matches the original loaded one.  */
> > > +  bool invert = false;
> > > +  tree cmp_ls = gimple_arg (cond_stmt, 0);
> > > +  tree cond_arg1 = gimple_arg (cond_stmt, 1);
> > > +  tree cond_arg2 = gimple_arg (cond_stmt, 2);
> > > +
> > > +  if (!vect_cond_store_pattern_same_ref (loop_vinfo, stmt_vinfo, cond_arg2)
> > > +      && !(invert = vect_cond_store_pattern_same_ref (loop_vinfo, stmt_vinfo,
> > > +						      cond_arg1)))
> > > +    return NULL;
> > > +
> > > +  vect_pattern_detected ("vect_recog_cond_store_pattern", store_stmt);
> > > +
> > > +  tree scalar_type = TREE_TYPE (st_rhs);
> > > +  if (VECTOR_TYPE_P (scalar_type))
> > > +    return NULL;
> > > +
> > > +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > > +  if (vectype == NULL_TREE)
> > > +    return NULL;
> > > +
> > > +  machine_mode mask_mode;
> > > +  machine_mode vecmode = TYPE_MODE (vectype);
> > > +  if (!targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode)
> > > +      || !can_vec_mask_load_store_p (vecmode, mask_mode, false))
> > > +    return NULL;
> > > +
> > > +  /* Convert the mask to the right form.  */
> > > +  tree gs_vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
> >
> > same as vectype above?  You sometimes use 'vinfo' and sometimes
> > 'loop_vinfo'.
> >
> > > +  tree cookie = build_int_cst (build_pointer_type (scalar_type),
> > > +			       TYPE_ALIGN (scalar_type));
> >
> > please do this next to the use.  It's also wrong, you need to
> > preserve alias info and alignment of the ref properly - see if-conversion
> > on how to do that.
> >
> > > +  tree base = TREE_OPERAND (st_lhs, 0);
> >
> > You assume this is a MEM_REF?  I think you want build_fold_addr_expr
> > (st_lhs) and you need to be prepared to put this to a separate stmt if
> > it's not invariant.  See if-conversion again.
> >
> > > +  tree cond_store_arg = cond_arg1;
> > > +
> > > +  /* If we have to invert the condition, i.e. use the true argument rather than
> > > +     the false argument, we should check whether we can just invert the
> > > +     comparison or if we have to negate the result.  */
> > > +  if (invert)
> > > +    {
> > > +      gimple *cond = SSA_NAME_DEF_STMT (cmp_ls);
> > > +      /* We need to use the false parameter of the conditional select.  */
> > > +      cond_store_arg = cond_arg2;
> > > +      tree_code new_code = ERROR_MARK;
> > > +      tree mask_vec_type, itype;
> > > +      gassign *conv;
> > > +      tree var = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> > > +
> > > +      if (is_gimple_assign (cond)
> > > +	  && TREE_CODE_CLASS (gimple_assign_rhs_code (cond)) == tcc_comparison)
> > > +	{
> > > +	  tree_code cond_code = gimple_assign_rhs_code (cond);
> > > +	  tree cond_expr0 = gimple_assign_rhs1 (cond);
> > > +	  tree cond_expr1 = gimple_assign_rhs2 (cond);
> > > +
> > > +	  /* We have to invert the comparison, see if we can flip it.  */
> > > +	  bool honor_nans = HONOR_NANS (TREE_TYPE (cond_expr0));
> > > +	  new_code = invert_tree_comparison (cond_code, honor_nans);
> > > +	  if (new_code != ERROR_MARK)
> > > +	    {
> > > +	      itype = TREE_TYPE(cond_expr0);
> > > +	      conv = gimple_build_assign (var, new_code, cond_expr0,
> > > +					  cond_expr1);
> > > +	    }
> >
> > I think this is premature optimization here.  Actual inversion should
> > be cheaper than having a second comparison.  So just invert.
>
> Fair, the reason I did so was because the vectorizer already tracks masks
> and their inverses.  So if the negated version was live we wouldn't materialize
> it.  That said, that also means I can just negate and leave it to the same code
> to track, so I'll just try negating.
>
> >
> > > +	}
> > > +
> > > +      if (new_code == ERROR_MARK)
> > > +	{
> > > +	  /* We couldn't flip the condition, so invert the mask instead.  */
> > > +	  itype = TREE_TYPE (cmp_ls);
> > > +	  conv = gimple_build_assign (var, BIT_XOR_EXPR, cmp_ls,
> > > +				      build_int_cst (itype, 1));
> > > +	}
> > > +
> > > +      mask_vec_type = get_mask_type_for_scalar_type (loop_vinfo, itype);
> > > +      append_pattern_def_seq (vinfo, stmt_vinfo, conv, mask_vec_type, itype);
> > > +      /* Then prepare the boolean mask as the mask conversion pattern
> > > +	 won't hit on the pattern statement.  */
> > > +      cmp_ls = build_mask_conversion (vinfo, var, gs_vectype, stmt_vinfo);
> >
> > Isn't this somewhat redundant with the below call?
> >
> > I fear bad [non-]interactions with bool pattern recognition btw.
>
> So this is again another issue where patterns don't apply to newly produced patterns
> and so they can't serve as roots for new patterns.  This is why the scatter/gather
> pattern addition refactored part of the work into these helper functions.
>
> I did actually try to just add a secondary loop that iterates over newly produced
> patterns, but you later run into problems where a new pattern completely cancels out
> an old pattern rather than just extending it.
>
> So at the moment, unless the code ends up being hybrid, whatever the bool recog pattern
> does is just ignored as irrelevant.
>
> But if we don't invert the compare then it should be simpler as the original compare is
> never in a pattern.
>
> I'll respin with these changes.

Thanks.
Richard.

> Thanks,
> Tamar
>
> >
> > > +    }
> > > +
> > > +  tree mask = vect_convert_mask_for_vectype (cmp_ls, gs_vectype, stmt_vinfo,
> > > +					     loop_vinfo);
> > > +  gcall *call
> > > +    = gimple_build_call_internal (IFN_MASK_STORE, 4, base, cookie, mask,
> > > +				  cond_store_arg);
> > > +  gimple_set_location (call, gimple_location (store_stmt));
> > > +  gimple_set_lhs (call, make_ssa_name (integer_type_node));
> > > +
> > > +  /* Copy across relevant vectorization info and associate DR with the
> > > +     new pattern statement instead of the original statement.  */
> > > +  stmt_vec_info pattern_stmt_info = loop_vinfo->add_stmt (call);
> > > +  loop_vinfo->move_dr (pattern_stmt_info, stmt_vinfo);
> > > +
> > > +  *type_out = vectype;
> > > +  return call;
> > > +}
> > > +
> > >  /* Return true if TYPE is a non-boolean integer type.  These are the types
> > >     that we want to consider for narrowing.  */
> > >
> > > @@ -7061,6 +7232,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> > >       of mask conversion that are needed for gather and scatter
> > >       internal functions.  */
> > >    { vect_recog_gather_scatter_pattern, "gather_scatter" },
> > > +  { vect_recog_cond_store_pattern, "cond_store" },
> > >    { vect_recog_mask_conversion_pattern, "mask_conversion" },
> > >    { vect_recog_widen_plus_pattern, "widen_plus" },
> > >    { vect_recog_widen_minus_pattern, "widen_minus" },
> > >
> >
> > --
> > Richard Biener
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
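
[Editor's note: Richard's "see if-conversion on how to do that" remark above
refers to the way tree-if-conv.cc builds the address and the alias/alignment
"cookie" arguments of a .MASK_STORE.  The sketch below is paraphrased from
memory rather than copied from GCC, so treat the exact calls as an
approximation; the helper name is made up.]

  /* Build a masked store roughly the way if-conversion does: take the
     address of the stored reference (gimplifying it to a separate statement
     if it is not invariant) and encode the alias set and alignment of the
     original reference in the second argument.  */
  static gcall *
  build_mask_store_like_ifcvt (gimple_stmt_iterator *gsi, tree lhs,
                               tree mask, tree rhs)
  {
    /* The reference must be addressable before we take its address.  */
    mark_addressable (lhs);

    /* &lhs, forced into a separate stmt when it is not invariant.  */
    tree addr = force_gimple_operand_gsi (gsi, build_fold_addr_expr (lhs),
                                          true, NULL_TREE, true,
                                          GSI_SAME_STMT);

    /* An INTEGER_CST whose type carries the alias information of the
       original reference and whose value is its alignment, so neither is
       lost by the rewrite (unlike a cookie built from the scalar type and
       TYPE_ALIGN alone).  */
    tree ptr = build_int_cst (reference_alias_ptr_type (lhs),
                              get_object_alignment (lhs));

    return gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
                                       mask, rhs);
  }
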