Date: Wed, 26 Jun 2024 15:59:18 +0200 (CEST)
From: Richard Biener
To: Tamar Christina
cc: gcc-patches@gcc.gnu.org, nd, jlaw@ventanamicro.com
Subject: RE: [PATCH]middle-end: Implement conditonal store vectorizer pattern [PR115531]
Message-ID: <76788902-3pp7-3390-984s-q3170p9rs195@fhfr.qr>
References: <5r1oo799-nq2o-670q-2s2p-3r38no245837@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 26 Jun 2024, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener
> > Sent: Wednesday, June 26, 2024 2:23 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd; jlaw@ventanamicro.com
> > Subject: Re: [PATCH]middle-end: Implement conditonal store vectorizer pattern [PR115531]
> >
> > On Tue, 25 Jun 2024, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This adds a conditional store optimization for the vectorizer as a pattern.
> > > The vectorizer already supports modifying memory accesses because of the
> > > pattern based gather/scatter recognition.
> > >
> > > Doing it in the vectorizer allows us to still keep the ability to vectorize such
> > > loops for architectures that don't have MASK_STORE support, whereas doing this
> > > in ifcvt makes us commit to MASK_STORE.
> > >
> > > Concretely for this loop:
> > >
> > > void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > {
> > >   if (stride <= 1)
> > >     return;
> > >
> > >   for (int i = 0; i < n; i++)
> > >     {
> > >       int res = c[i];
> > >       int t = b[i+stride];
> > >       if (a[i] != 0)
> > >         res = t;
> > >       c[i] = res;
> > >     }
> > > }
> > >
> > > today we generate:
> > >
> > > .L3:
> > >         ld1b    z29.s, p7/z, [x0, x5]
> > >         ld1w    z31.s, p7/z, [x2, x5, lsl 2]
> > >         ld1w    z30.s, p7/z, [x1, x5, lsl 2]
> > >         cmpne   p15.b, p6/z, z29.b, #0
> > >         sel     z30.s, p15, z30.s, z31.s
> > >         st1w    z30.s, p7, [x2, x5, lsl 2]
> > >         add     x5, x5, x4
> > >         whilelo p7.s, w5, w3
> > >         b.any   .L3
> > >
> > > which in gimple is:
> > >
> > >   vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
> > >   vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
> > >   vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
> > >   mask__34.16_79 = vect__9.15_77 != { 0, ... };
> > >   vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74, vect_res_18.9_68>;
> > >   .MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80);
> > >
> > > A MASK_STORE is already conditional, so there's no need to perform the load of
> > > the old values and the VEC_COND_EXPR.
> > > This patch makes it so we generate:
> > >
> > >   vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
> > >   vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
> > >   mask__34.16_79 = vect__9.15_77 != { 0, ... };
> > >   .MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_res_18.9_68);
> > >
> > > which generates:
> > >
> > > .L3:
> > >         ld1b    z30.s, p7/z, [x0, x5]
> > >         ld1w    z31.s, p7/z, [x1, x5, lsl 2]
> > >         cmpne   p7.b, p7/z, z30.b, #0
> > >         st1w    z31.s, p7, [x2, x5, lsl 2]
> > >         add     x5, x5, x4
> > >         whilelo p7.s, w5, w3
> > >         b.any   .L3
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > The idea looks good but I wonder if it's not slower in practice.
> > The issue with masked stores, in particular those where any elements
> > are actually masked out, is that such stores do not forward on any
> > uarch I know.  They also usually have a penalty for the merging
> > (the load has to be carried out anyway).
> >
>
> Yes, but when the predicate has all bits set it usually does.
> But forwarding aside, this gets rid of the select and the additional load,
> so purely from an instruction latency perspective it's a win.
>
> > So - can you do an actual benchmark on real hardware where the
> > loop has (way) more than one vector iteration and where there's
> > at least one masked element during each vector iteration?
> >
>
> Sure, this optimization comes from exchange2, where vectorizing with SVE
> ends up being slower than not vectorizing.  This change makes the vectorization
> profitable and recovers about a 3% difference overall between vectorizing and not.
>
> I did run microbenchmarks over all current and future Arm cores and it was a
> universal win.
>
> I can run more benchmarks with various masks, but as mentioned above, even
> without forwarding you still have two fewer instructions, so it's almost always
> going to win.
>
> > > Ok for master?
> >
> > Few comments below.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/115531
> > > 	* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
> > > 	(vect_recog_cond_store_pattern): New.
> > > 	(vect_vect_recog_func_ptrs): Use it.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	PR tree-optimization/115531
> > > 	* gcc.dg/vect/vect-conditional_store_1.c: New test.
> > > 	* gcc.dg/vect/vect-conditional_store_2.c: New test.
> > > 	* gcc.dg/vect/vect-conditional_store_3.c: New test.
> > > 	* gcc.dg/vect/vect-conditional_store_4.c: New test.
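
[Editor's note: the scalar sketch below is not part of the patch; it only
illustrates why the rewrite described above is legal.  The else-value of the
VEC_COND_EXPR is the value just loaded from the very address the store writes
to, so lanes whose mask bit is clear can simply keep their old memory
contents and the store may be skipped for them.  The function names are made
up for the illustration.]

  /* What the loop does before the pattern is applied: always store,
     selecting between the new value and the previously loaded one.  */
  void foo1_before (char *restrict a, int *restrict b, int *restrict c,
                    int n, int stride)
  {
    for (int i = 0; i < n; i++)
      {
        int res = c[i];                  /* load the old value        */
        int t = b[i + stride];
        c[i] = (a[i] != 0) ? t : res;    /* select, then always store */
      }
  }

  /* Equivalent form matched into .MASK_STORE: store only where the
     condition holds; untouched elements keep their previous value.  */
  void foo1_after (char *restrict a, int *restrict b, int *restrict c,
                   int n, int stride)
  {
    for (int i = 0; i < n; i++)
      {
        int t = b[i + stride];
        if (a[i] != 0)
          c[i] = t;
      }
  }
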
> > >
> > > ---
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..3884a3c3d0a2dc2258097348c75bb7c0b3b37c72
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res = c[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] != 0)
> > > +        res = t;
> > > +      c[i] = res;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..bc965a244f147c199b1726e5f6b44229539cd225
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo2 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res = c[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] != 0)
> > > +        t = res;
> > > +      c[i] = t;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..ab6889f967b330a652917925c2748b16af59b9fd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res = c[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] >= 0)
> > > +        t = res;
> > > +      c[i] = t;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..3bfe2f81dc2d47096aa23529d43263be52cd422c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-require-effective-target vect_masked_store } */
> > > +
> > > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > > +
> > > +void foo4 (signed char *restrict a, int *restrict b, int *restrict c, int *restrict d, int n, int stride)
> > > +{
> > > +  if (stride <= 1)
> > > +    return;
> > > +
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      int res1 = c[i];
> > > +      int res2 = d[i];
> > > +      int t = b[i+stride];
> > > +      if (a[i] > 0)
> > > +        t = res1;
> > > +      else if (a[i] < 0)
> > > +        t = res2 * 2;
> > > +
> > > +      c[i] = t;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index cef901808eb97c0c92b51da20535ea7f397b4742..f0da6c932f48d2992a501d0ced3efc8924912c77 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -6397,6 +6397,177 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> > >    return pattern_stmt;
> > >  }
> > >
> > > +/* Helper method of vect_recog_cond_store_pattern, checks to see if COND_ARG
> > > +   is points to a load statement that reads the same data as that of
> > > +   STORE_VINFO.  */
> > > +
> > > +static bool
> > > +vect_cond_store_pattern_same_ref (loop_vec_info loop_vinfo,
> > > +				  stmt_vec_info store_vinfo, tree cond_arg)
> > > +{
> > > +  stmt_vec_info load_stmt_vinfo = loop_vinfo->lookup_def (cond_arg);
> > > +  if (!load_stmt_vinfo
> > > +      || !STMT_VINFO_DATA_REF (load_stmt_vinfo)
> > > +      || DR_IS_WRITE (STMT_VINFO_DATA_REF (load_stmt_vinfo))
> >
> > can you use !DR_IS_READ?
> >
> > > +      || !same_data_refs (STMT_VINFO_DATA_REF (store_vinfo),
> > > +			  STMT_VINFO_DATA_REF (load_stmt_vinfo)))
> > > +    return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Function vect_recog_cond_store_pattern
> > > +
> > > +   Try to find the following pattern:
> > > +
> > > +   x = *_3;
> > > +   c = a CMP b;
> > > +   y = c ? t_20 : x;
> > > +   *_3 = y;
> > > +
> > > +   where the store of _3 happens on a conditional select on a value loaded
> > > +   from the same location.  In such case we can elide the initial load if
> > > +   MASK_STORE is supported and instead only conditionally write out the result.
> > > +
> > > +   The pattern produces for the above:
> > > +
> > > +   c = a CMP b;
> > > +   .MASK_STORE (_3, c, t_20)
> > > +
> > > +   Input:
> > > +
> > > +   * STMT_VINFO: The stmt from which the pattern search begins.  In the
> > > +   example, when this function is called with _3 then the search begins.
> > > +
> > > +   Output:
> > > +
> > > +   * TYPE_OUT: The type of the output of this pattern.
> > > +
> > > +   * Return value: A new stmt that will be used to replace the sequence.  */
> > > +
> > > +static gimple *
> > > +vect_recog_cond_store_pattern (vec_info *vinfo,
> > > +			       stmt_vec_info stmt_vinfo, tree *type_out)
> > > +{
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +  if (!loop_vinfo)
> > > +    return NULL;
> >
> > Why only for loops?
> > We run BB vect for if-converted loop bodies
> > if loop vect failed on them, for example.  Or is it that you imply
> > this is only profitable when loop masking is applied - which of course
> > you do not check?
> >
>
> I don't think it's possible when masking isn't applied, no?
> The check is implicit in checking for MASK_STORE support,
> or can you have masked store support without masking?

Sure, if-conversion can apply a .MASK_STORE for a conditional store.
That's exactly the case you are matching here - there's no need for
a loop mask.

> > > +  gimple *store_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > > +
> > > +  /* Needs to be a gimple store where we have DR info for.  */
> > > +  if (!STMT_VINFO_DATA_REF (stmt_vinfo)
> > > +      || !DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_vinfo))
> > > +      || !gimple_store_p (store_stmt))
> > > +    return NULL;
> > > +
> > > +  tree st_rhs = gimple_assign_rhs1 (store_stmt);
> > > +  tree st_lhs = gimple_assign_lhs (store_stmt);
> > > +
> > > +  if (TREE_CODE (st_rhs) != SSA_NAME)
> > > +    return NULL;
> > > +
> > > +  gassign *cond_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (st_rhs));
> > > +  if (!cond_stmt || gimple_assign_rhs_code (cond_stmt) != COND_EXPR)
> > > +    return NULL;
> > > +
> > > +  /* Check if the else value matches the original loaded one.  */
> > > +  bool invert = false;
> > > +  tree cmp_ls = gimple_arg (cond_stmt, 0);
> > > +  tree cond_arg1 = gimple_arg (cond_stmt, 1);
> > > +  tree cond_arg2 = gimple_arg (cond_stmt, 2);
> > > +
> > > +  if (!vect_cond_store_pattern_same_ref (loop_vinfo, stmt_vinfo, cond_arg2)
> > > +      && !(invert = vect_cond_store_pattern_same_ref (loop_vinfo, stmt_vinfo,
> > > +						      cond_arg1)))
> > > +    return NULL;
> > > +
> > > +  vect_pattern_detected ("vect_recog_cond_store_pattern", store_stmt);
> > > +
> > > +  tree scalar_type = TREE_TYPE (st_rhs);
> > > +  if (VECTOR_TYPE_P (scalar_type))
> > > +    return NULL;
> > > +
> > > +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > > +  if (vectype == NULL_TREE)
> > > +    return NULL;
> > > +
> > > +  machine_mode mask_mode;
> > > +  machine_mode vecmode = TYPE_MODE (vectype);
> > > +  if (!targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode)
> > > +      || !can_vec_mask_load_store_p (vecmode, mask_mode, false))
> > > +    return NULL;
> > > +
> > > +  /* Convert the mask to the right form.  */
> > > +  tree gs_vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
> >
> > same as vectype above?  You sometimes use 'vinfo' and sometimes
> > 'loop_vinfo'.
> >
> > > +  tree cookie = build_int_cst (build_pointer_type (scalar_type),
> > > +			       TYPE_ALIGN (scalar_type));
> >
> > please do this next to the use.  It's also wrong, you need to
> > preserve alias info and alignment of the ref properly - see if-conversion
> > on how to do that.
> >
> > > +  tree base = TREE_OPERAND (st_lhs, 0);
> >
> > You assume this is a MEM_REF?  I think you want build_fold_addr_expr
> > (st_lhs) and you need to be prepared to put this to a separate stmt if
> > it's not invariant.  See if-conversion again.
> >
> > > +  tree cond_store_arg = cond_arg1;
> > > +
> > > +  /* If we have to invert the condition, i.e. use the true argument rather than
> > > +     the false argument, we should check whether we can just invert the
> > > +     comparison or if we have to negate the result.  */
> > > +  if (invert)
> > > +    {
> > > +      gimple *cond = SSA_NAME_DEF_STMT (cmp_ls);
> > > +      /* We need to use the false parameter of the conditional select.  */
> > > +      cond_store_arg = cond_arg2;
> > > +      tree_code new_code = ERROR_MARK;
> > > +      tree mask_vec_type, itype;
> > > +      gassign *conv;
> > > +      tree var = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> > > +
> > > +      if (is_gimple_assign (cond)
> > > +	  && TREE_CODE_CLASS (gimple_assign_rhs_code (cond)) == tcc_comparison)
> > > +	{
> > > +	  tree_code cond_code = gimple_assign_rhs_code (cond);
> > > +	  tree cond_expr0 = gimple_assign_rhs1 (cond);
> > > +	  tree cond_expr1 = gimple_assign_rhs2 (cond);
> > > +
> > > +	  /* We have to invert the comparison, see if we can flip it.  */
> > > +	  bool honor_nans = HONOR_NANS (TREE_TYPE (cond_expr0));
> > > +	  new_code = invert_tree_comparison (cond_code, honor_nans);
> > > +	  if (new_code != ERROR_MARK)
> > > +	    {
> > > +	      itype = TREE_TYPE(cond_expr0);
> > > +	      conv = gimple_build_assign (var, new_code, cond_expr0,
> > > +					  cond_expr1);
> > > +	    }
> >
> > I think this is premature optimization here.  Actual inversion should
> > be cheaper than having a second comparison.  So just invert.
>
> Fair, the reason I did so was because the vectorizer already tracks masks
> and their inverses.  So if the negated version was live we wouldn't materialize
> it.  That said, that also means I can just negate and leave it to the same code
> to track, so I'll just try negating.
>
> >
> > > +	}
> > > +
> > > +      if (new_code == ERROR_MARK)
> > > +	{
> > > +	  /* We couldn't flip the condition, so invert the mask instead.  */
> > > +	  itype = TREE_TYPE (cmp_ls);
> > > +	  conv = gimple_build_assign (var, BIT_XOR_EXPR, cmp_ls,
> > > +				      build_int_cst (itype, 1));
> > > +	}
> > > +
> > > +      mask_vec_type = get_mask_type_for_scalar_type (loop_vinfo, itype);
> > > +      append_pattern_def_seq (vinfo, stmt_vinfo, conv, mask_vec_type, itype);
> > > +      /* Then prepare the boolean mask as the mask conversion pattern
> > > +	 won't hit on the pattern statement.  */
> > > +      cmp_ls = build_mask_conversion (vinfo, var, gs_vectype, stmt_vinfo);
> >
> > Isn't this somewhat redundant with the below call?
> >
> > I fear bad [non-]interactions with bool pattern recognition btw.
>
> So this is again another issue where patterns don't apply to newly produced patterns
> and so they can't serve as roots for new patterns.  This is why the scatter/gather
> pattern addition refactored part of the work into these helper functions.
>
> I did actually try to just add a secondary loop that iterates over newly produced
> patterns, but you later run into problems where a new pattern completely cancels out
> an old pattern rather than just extending it.
>
> So at the moment, unless the code ends up being hybrid, whatever the bool recog pattern
> does is just ignored as irrelevant.
>
> But if we don't invert the compare then it should be simpler as the original compare is
> never in a pattern.
>
> I'll respin with these changes.

Thanks.
Richard.

> Thanks,
> Tamar
>
> >
> > > +    }
> > > +
> > > +  tree mask = vect_convert_mask_for_vectype (cmp_ls, gs_vectype, stmt_vinfo,
> > > +					     loop_vinfo);
> > > +  gcall *call
> > > +    = gimple_build_call_internal (IFN_MASK_STORE, 4, base, cookie, mask,
> > > +				  cond_store_arg);
> > > +  gimple_set_location (call, gimple_location (store_stmt));
> > > +  gimple_set_lhs (call, make_ssa_name (integer_type_node));
> > > +
> > > +  /* Copy across relevant vectorization info and associate DR with the
> > > +     new pattern statement instead of the original statement.  */
> > > +  stmt_vec_info pattern_stmt_info = loop_vinfo->add_stmt (call);
> > > +  loop_vinfo->move_dr (pattern_stmt_info, stmt_vinfo);
> > > +
> > > +  *type_out = vectype;
> > > +  return call;
> > > +}
> > > +
> > >  /* Return true if TYPE is a non-boolean integer type.  These are the types
> > >     that we want to consider for narrowing.  */
> > >
> > > @@ -7061,6 +7232,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> > >       of mask conversion that are needed for gather and scatter
> > >       internal functions.  */
> > >    { vect_recog_gather_scatter_pattern, "gather_scatter" },
> > > +  { vect_recog_cond_store_pattern, "cond_store" },
> > >    { vect_recog_mask_conversion_pattern, "mask_conversion" },
> > >    { vect_recog_widen_plus_pattern, "widen_plus" },
> > >    { vect_recog_widen_minus_pattern, "widen_minus" },
> > >
> >
> > --
> > Richard Biener
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
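
[Editor's note: Richard's "see if-conversion on how to do that" remark above
refers to the way tree-if-conv.cc builds the address and the alias/alignment
"cookie" arguments of a .MASK_STORE.  The sketch below is paraphrased from
memory rather than copied from GCC, so treat the exact calls as an
approximation; the helper name is made up.]

  /* Build a masked store roughly the way if-conversion does: take the
     address of the stored reference (gimplifying it to a separate statement
     if it is not invariant) and encode the alias set and alignment of the
     original reference in the second argument.  */
  static gcall *
  build_mask_store_like_ifcvt (gimple_stmt_iterator *gsi, tree lhs,
                               tree mask, tree rhs)
  {
    /* The reference must be addressable before we take its address.  */
    mark_addressable (lhs);

    /* &lhs, forced into a separate stmt when it is not invariant.  */
    tree addr = force_gimple_operand_gsi (gsi, build_fold_addr_expr (lhs),
                                          true, NULL_TREE, true,
                                          GSI_SAME_STMT);

    /* An INTEGER_CST whose type carries the alias information of the
       original reference and whose value is its alignment, so neither is
       lost by the rewrite (unlike a cookie built from the scalar type and
       TYPE_ALIGN alone).  */
    tree ptr = build_int_cst (reference_alias_ptr_type (lhs),
                              get_object_alignment (lhs));

    return gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
                                       mask, rhs);
  }
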