public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Improve PR92819
From: Richard Biener @ 2019-12-05 13:07 UTC
  To: gcc-patches


After the PR92818 commit, one function in the testcase below
still does not use bit-inserts.  The following fixes this.
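
For readers unfamiliar with the term: a bit-insert here means "keep the
vector and overwrite one lane".  The following is only an illustrative
sketch and not part of the patch.  It assumes the GCC/Clang vector
extensions, and the names foo_ctor, foo_insert, bar_ctor, bar_insert and
check_equivalent are hypothetical.  Each *_insert function is a
hand-written equivalent of the constructor form used in the testcase.

```c
#include <assert.h>

typedef double v2df __attribute__((vector_size (16)));

/* Constructor forms, as written in the testcase.  */
static v2df
foo_ctor (v2df x, double *p)
{
  return (v2df) { x[0], *p };
}

static v2df
bar_ctor (v2df x, double *p)
{
  return (v2df) { *p, x[1] };
}

/* Hand-written equivalents: keep x and overwrite a single lane.  */
static v2df
foo_insert (v2df x, double *p)
{
  x[1] = *p;	/* insert into the last lane */
  return x;
}

static v2df
bar_insert (v2df x, double *p)
{
  x[0] = *p;	/* insert into the first lane */
  return x;
}

/* Hypothetical self-check: both forms must agree lane by lane.  */
static void
check_equivalent (v2df x, double v)
{
  v2df a = foo_ctor (x, &v), b = foo_insert (x, &v);
  v2df c = bar_ctor (x, &v), d = bar_insert (x, &v);
  assert (a[0] == b[0] && a[1] == b[1]);
  assert (c[0] == d[0] && c[1] == d[1]);
}
```

The testcase expects forwprop1 to express both constructor forms as
BIT_INSERT_EXPR, which is what the scan-tree-dump-times count of 2 checks.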

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-12-05  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/92819
	* match.pd (VEC_PERM_EXPR -> BIT_INSERT_EXPR): Handle inserts
	into the last lane.  For two-element vectors try inserting
	into the last lane when inserting into the first fails.

	* gcc.target/i386/pr92819-1.c: New testcase.

Index: gcc/testsuite/gcc.target/i386/pr92819-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr92819-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr92819-1.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -msse2 -fdump-tree-forwprop1" } */
+
+typedef double v2df __attribute__((vector_size (16)));
+
+v2df
+foo (v2df x, double *p)
+{
+  return (v2df) { x[0], *p };
+}
+
+v2df
+bar (v2df x, double *p)
+{
+  return (v2df) { *p, x[1] };
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 2 "forwprop1" } } */
+/* { dg-final { scan-assembler "movhpd" } } */
+/* { dg-final { scan-assembler "movlpd" } } */
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 278998)
+++ gcc/match.pd	(working copy)
@@ -6032,7 +6032,8 @@ (define_operator_list COND_TERNARY
 		|| TREE_CODE (cop1) == VECTOR_CST
 		|| TREE_CODE (cop1) == CONSTRUCTOR))
           {
-	    if (sel.series_p (1, 1, nelts + 1, 1))
+	    bool insert_first_p = sel.series_p (1, 1, nelts + 1, 1);
+	    if (insert_first_p)
 	      {
 	        /* After canonicalizing the first elt to come from the
 		   first vector we only can insert the first elt from
@@ -6041,13 +6042,19 @@ (define_operator_list COND_TERNARY
 		if ((ins = fold_read_from_vector (cop0, sel[0])))
 		  op0 = op1;
 	      }
-	    else
+	    /* The above can fail for two-element vectors which always
+	       appear to insert the first element, so try inserting
+	       into the second lane as well.  For more than two
+	       elements that's wasted time.  */
+	    if (!insert_first_p || (!ins && maybe_eq (nelts, 2u)))
 	      {
 	        unsigned int encoded_nelts = sel.encoding ().encoded_nelts ();
 		for (at = 0; at < encoded_nelts; ++at)
 		  if (maybe_ne (sel[at], at))
 		    break;
-		if (at < encoded_nelts && sel.series_p (at + 1, 1, at + 1, 1))
+		if (at < encoded_nelts
+		    && (known_eq (at + 1, nelts)
+			|| sel.series_p (at + 1, 1, at + 1, 1)))
 		  {
 		    if (known_lt (poly_uint64 (sel[at]), nelts))
 		      ins = fold_read_from_vector (cop0, sel[at]);
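
The lane-selection logic in the match.pd hunks can be modelled outside of
GCC.  The sketch below is a simplified stand-in under stated assumptions,
not the actual implementation: selectors are plain arrays rather than
vec_perm_indices, the element count is a plain unsigned (no poly_int), and
whether fold_read_from_vector succeeds is reduced to two hypothetical
flags, op0_readable and op1_readable.  The names insert_plan, is_series
and plan_insert are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the single-lane insert that was found.  */
struct insert_plan
{
  bool found;		/* permute can become one insert */
  unsigned lane;	/* destination lane */
  unsigned src;		/* selector value of the inserted element */
  bool into_op1;	/* true: base vector is op1, false: op0 */
};

/* True if sel[start..n-1] is base, base + 1, ...  */
static bool
is_series (const unsigned *sel, unsigned n, unsigned start, unsigned base)
{
  for (unsigned i = start; i < n; ++i)
    if (sel[i] != base + (i - start))
      return false;
  return true;
}

static struct insert_plan
plan_insert (const unsigned *sel, unsigned n,
	     bool op0_readable, bool op1_readable)
{
  struct insert_plan plan = { false, 0, 0, false };
  assert (n >= 2);

  /* Lanes 1.. all come from op1 in place, so the permute looks like
     "insert op0[sel[0]] into lane 0 of op1".  */
  bool insert_first_p = is_series (sel, n, 1, n + 1);
  if (insert_first_p && op0_readable)
    return (struct insert_plan) { true, 0, sel[0], true };

  /* Otherwise, or when the first attempt failed on a two-element
     vector, look for the single lane that differs from the identity.  */
  if (!insert_first_p || n == 2)
    {
      unsigned at = 0;
      while (at < n && sel[at] == at)
	++at;
      /* The at + 1 == n test models the known_eq (at + 1, nelts) check:
	 an insert into the last lane has no trailing series.  */
      if (at < n && (at + 1 == n || is_series (sel, n, at + 1, at + 1)))
	{
	  bool readable = sel[at] < n ? op0_readable : op1_readable;
	  if (readable)
	    plan = (struct insert_plan) { true, at, sel[at], false };
	}
    }
  return plan;
}
```

For the two-element selector { 0, 3 } with only the second operand
readable, the first attempt fails and the fallback reports an insert into
lane 1, which is the case the patch adds.  For { 0, 1, 2, 7 } the
differing lane is the last one, so the at + 1 == n test is what accepts it.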

* Re: [PATCH] Improve PR92819
From: Richard Biener @ 2019-12-06  7:52 UTC
  To: gcc-patches

On Thu, 5 Dec 2019, Richard Biener wrote:

> 
> After the PR92818 commit, one function in the testcase below
> still does not use bit-inserts.  The following fixes this.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Committed with the following gcc.target/i386/pr92803.c adjustment:

Index: gcc/testsuite/gcc.target/i386/pr92803.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr92803.c	(revision 278992)
+++ gcc/testsuite/gcc.target/i386/pr92803.c	(working copy)
@@ -31,8 +31,10 @@ barf (v8sf x)
   return (v4sf) { x[4], x[5], 1.0f, 2.0f };
 }
 
-/* We expect all CTORs to turn into permutes, the FP converting ones
+/* For bar we do two inserts, first zero, then convert, then insert *p.  */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 2 "forwprop1" } } */
+/* We expect all other CTORs to turn into permutes, the FP converting ones
    to two each with the one with constants possibly elided in the future
    by converting 3.0f and 1.0f "back" to integers.  */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 6 "forwprop1" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 5 "forwprop1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 4 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 3 "forwprop1" { xfail *-*-* } } } */


> Richard.
> 
> 2019-12-05  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/92819
> 	* match.pd (VEC_PERM_EXPR -> BIT_INSERT_EXPR): Handle inserts
> 	into the last lane.  For two-element vectors try inserting
> 	into the last lane when inserting into the first fails.
> 
> 	* gcc.target/i386/pr92819-1.c: New testcase.
> 
> Index: gcc/testsuite/gcc.target/i386/pr92819-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr92819-1.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/i386/pr92819-1.c	(working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -msse2 -fdump-tree-forwprop1" } */
> +
> +typedef double v2df __attribute__((vector_size (16)));
> +
> +v2df
> +foo (v2df x, double *p)
> +{
> +  return (v2df) { x[0], *p };
> +}
> +
> +v2df
> +bar (v2df x, double *p)
> +{
> +  return (v2df) { *p, x[1] };
> +}
> +
> +/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 2 "forwprop1" } } */
> +/* { dg-final { scan-assembler "movhpd" } } */
> +/* { dg-final { scan-assembler "movlpd" } } */
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd	(revision 278998)
> +++ gcc/match.pd	(working copy)
> @@ -6032,7 +6032,8 @@ (define_operator_list COND_TERNARY
>  		|| TREE_CODE (cop1) == VECTOR_CST
>  		|| TREE_CODE (cop1) == CONSTRUCTOR))
>            {
> -	    if (sel.series_p (1, 1, nelts + 1, 1))
> +	    bool insert_first_p = sel.series_p (1, 1, nelts + 1, 1);
> +	    if (insert_first_p)
>  	      {
>  	        /* After canonicalizing the first elt to come from the
>  		   first vector we only can insert the first elt from
> @@ -6041,13 +6042,19 @@ (define_operator_list COND_TERNARY
>  		if ((ins = fold_read_from_vector (cop0, sel[0])))
>  		  op0 = op1;
>  	      }
> -	    else
> +	    /* The above can fail for two-element vectors which always
> +	       appear to insert the first element, so try inserting
> +	       into the second lane as well.  For more than two
> +	       elements that's wasted time.  */
> +	    if (!insert_first_p || (!ins && maybe_eq (nelts, 2u)))
>  	      {
>  	        unsigned int encoded_nelts = sel.encoding ().encoded_nelts ();
>  		for (at = 0; at < encoded_nelts; ++at)
>  		  if (maybe_ne (sel[at], at))
>  		    break;
> -		if (at < encoded_nelts && sel.series_p (at + 1, 1, at + 1, 1))
> +		if (at < encoded_nelts
> +		    && (known_eq (at + 1, nelts)
> +			|| sel.series_p (at + 1, 1, at + 1, 1)))
>  		  {
>  		    if (known_lt (poly_uint64 (sel[at]), nelts))
>  		      ins = fold_read_from_vector (cop0, sel[at]);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
