public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)
@ 2018-11-13 23:37 Jakub Jelinek
  2018-11-14  8:29 ` Richard Biener
  2018-11-21 10:20 ` Patch ping (Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)) Jakub Jelinek
  0 siblings, 2 replies; 8+ messages in thread
From: Jakub Jelinek @ 2018-11-13 23:37 UTC (permalink / raw)
  To: Uros Bizjak, Richard Biener, Jeff Law; +Cc: gcc-patches

Hi!

As mentioned in the PR, it is unclear if zero_extract is well defined
if the second argument is 0.  x86 intrinsic require bzhi and bextr to be
well defined in those cases (extraction of 0 bits results in 0), but
e.g. combiner hapilly transforms that into << 64 >> 64, simplify-rtx.c,
while it folds it into 0, might invoke UB in the compiler etc.

So, is (zero_extract x 0 y) well defined in the middle-end or not?
If it is well defined, then what about (sign_extract x 0 y), does it also
yield 0, undefined, something else?

If neither of those are well defined, the following patch provides a backend
workaround for that, by making sure the second argument of zero_extract is
never 0 on x86.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2018-11-13  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/87817
	* config/i386/i386.md (nmi2_bzhi_<mode>3, *bmi2_bzhi_<mode>3,
	*bmi2_bzhi_<mode>3_1, *bmi2_bzhi_<mode>3_1_ccz): Use IF_THEN_ELSE
	in the pattern to avoid triggering UB when operands[2] is zero.
	(tbm_bextri_<mode>): New expander.  Renamed the old define_insn to ...
	(*tbm_bextri_<mode>): ... this.

--- gcc/config/i386/i386.md.jj	2018-11-09 14:02:00.030267540 +0100
+++ gcc/config/i386/i386.md	2018-11-13 15:45:33.034870609 +0100
@@ -13614,12 +13614,15 @@ (define_insn "*bmi_blsr_<mode>_ccz"
 (define_expand "bmi2_bzhi_<mode>3"
   [(parallel
     [(set (match_operand:SWI48 0 "register_operand")
-	  (zero_extract:SWI48
-	    (match_operand:SWI48 1 "nonimmediate_operand")
-	    (umin:SWI48
-	      (and:SWI48 (match_operand:SWI48 2 "register_operand")
-			 (const_int 255))
-	      (match_dup 3))
+	  (if_then_else:SWI48
+	    (ne:QI (and:SWI48 (match_operand:SWI48 2 "register_operand")
+			      (const_int 255))
+		   (const_int 0))
+	    (zero_extract:SWI48
+	      (match_operand:SWI48 1 "nonimmediate_operand")
+	      (umin:SWI48 (and:SWI48 (match_dup 2) (const_int 255))
+			  (match_dup 3))
+	      (const_int 0))
 	    (const_int 0)))
      (clobber (reg:CC FLAGS_REG))])]
   "TARGET_BMI2"
@@ -13627,12 +13630,15 @@ (define_expand "bmi2_bzhi_<mode>3"
 
 (define_insn "*bmi2_bzhi_<mode>3"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-	(zero_extract:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")
-	  (umin:SWI48
-	    (and:SWI48 (match_operand:SWI48 2 "register_operand" "r")
-		       (const_int 255))
-	    (match_operand:SWI48 3 "const_int_operand" "n"))
+	(if_then_else:SWI48
+	  (ne:QI (and:SWI48 (match_operand:SWI48 2 "register_operand" "r")
+			    (const_int 255))
+		 (const_int 0))
+	  (zero_extract:SWI48
+	    (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+	    (umin:SWI48 (and:SWI48 (match_dup 2) (const_int 255))
+			(match_operand:SWI48 3 "const_int_operand" "n"))
+	    (const_int 0))
 	  (const_int 0)))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI2 && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
@@ -13643,11 +13649,13 @@ (define_insn "*bmi2_bzhi_<mode>3"
 
 (define_insn "*bmi2_bzhi_<mode>3_1"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-	(zero_extract:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")
-	  (umin:SWI48
-	    (zero_extend:SWI48 (match_operand:QI 2 "register_operand" "r"))
-	    (match_operand:SWI48 3 "const_int_operand" "n"))
+	(if_then_else:SWI48
+	  (ne:QI (match_operand:QI 2 "register_operand" "r") (const_int 0))
+	  (zero_extract:SWI48
+	    (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+	    (umin:SWI48 (zero_extend:SWI48 (match_dup 2))
+			(match_operand:SWI48 3 "const_int_operand" "n"))
+	    (const_int 0))
 	  (const_int 0)))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI2 && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
@@ -13659,11 +13667,13 @@ (define_insn "*bmi2_bzhi_<mode>3_1"
 (define_insn "*bmi2_bzhi_<mode>3_1_ccz"
   [(set (reg:CCZ FLAGS_REG)
 	(compare:CCZ
-	  (zero_extract:SWI48
-	    (match_operand:SWI48 1 "nonimmediate_operand" "rm")
-	    (umin:SWI48
-	      (zero_extend:SWI48 (match_operand:QI 2 "register_operand" "r"))
-	      (match_operand:SWI48 3 "const_int_operand" "n"))
+	  (if_then_else:SWI48
+	    (ne:QI (match_operand:QI 2 "register_operand" "r") (const_int 0))
+	    (zero_extract:SWI48
+	      (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+	      (umin:SWI48 (zero_extend:SWI48 (match_dup 2))
+			  (match_operand:SWI48 3 "const_int_operand" "n"))
+	      (const_int 0))
 	    (const_int 0))
 	(const_int 0)))
    (clobber (match_scratch:SWI48 0 "=r"))]
@@ -13696,7 +13706,28 @@ (define_insn "bmi2_pext_<mode>3"
    (set_attr "mode" "<MODE>")])
 
 ;; TBM instructions.
-(define_insn "tbm_bextri_<mode>"
+(define_expand "tbm_bextri_<mode>"
+  [(parallel
+    [(set (match_operand:SWI48 0 "register_operand")
+	  (zero_extract:SWI48
+	    (match_operand:SWI48 1 "nonimmediate_operand")
+	    (match_operand 2 "const_0_to_255_operand" "N")
+	    (match_operand 3 "const_0_to_255_operand" "N")))
+     (clobber (reg:CC FLAGS_REG))])]
+  "TARGET_TBM"
+{
+  if (operands[2] == const0_rtx
+      || INTVAL (operands[3]) >= <MODE_SIZE> * BITS_PER_UNIT)
+    {
+      emit_move_insn (operands[0], const0_rtx);
+      DONE;
+    }
+  if (INTVAL (operands[2]) + INTVAL (operands[3])
+      > <MODE_SIZE> * BITS_PER_UNIT)
+    operands[2] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - INTVAL (operands[3]));
+})
+
+(define_insn "*tbm_bextri_<mode>"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
         (zero_extract:SWI48
           (match_operand:SWI48 1 "nonimmediate_operand" "rm")

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)
  2018-11-13 23:37 [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817) Jakub Jelinek
@ 2018-11-14  8:29 ` Richard Biener
  2018-11-14  8:58   ` Jakub Jelinek
  2018-11-14 12:21   ` [PATCH] Fold x86 bzhi with zero last operand " Jakub Jelinek
  2018-11-21 10:20 ` Patch ping (Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)) Jakub Jelinek
  1 sibling, 2 replies; 8+ messages in thread
From: Richard Biener @ 2018-11-14  8:29 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, Jeff Law, gcc-patches

On Wed, 14 Nov 2018, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, it is unclear if zero_extract is well defined
> if the second argument is 0.  x86 intrinsic require bzhi and bextr to be
> well defined in those cases (extraction of 0 bits results in 0), but
> e.g. combiner hapilly transforms that into << 64 >> 64, simplify-rtx.c,
> while it folds it into 0, might invoke UB in the compiler etc.

So where do the zero_extracts come from?  Can we somehow avoid
zero-sized bit-extracts at expansion time by folding them to zero?

> So, is (zero_extract x 0 y) well defined in the middle-end or not?
> If it is well defined, then what about (sign_extract x 0 y), does it also
> yield 0, undefined, something else?

I don't think those should be well-defined.  If we would decide to
nail them down then both should yield zero indeed.

> If neither of those are well defined, the following patch provides a backend
> workaround for that, by making sure the second argument of zero_extract is
> never 0 on x86.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> 2018-11-13  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/87817
> 	* config/i386/i386.md (nmi2_bzhi_<mode>3, *bmi2_bzhi_<mode>3,
> 	*bmi2_bzhi_<mode>3_1, *bmi2_bzhi_<mode>3_1_ccz): Use IF_THEN_ELSE
> 	in the pattern to avoid triggering UB when operands[2] is zero.
> 	(tbm_bextri_<mode>): New expander.  Renamed the old define_insn to ...
> 	(*tbm_bextri_<mode>): ... this.
> 
> --- gcc/config/i386/i386.md.jj	2018-11-09 14:02:00.030267540 +0100
> +++ gcc/config/i386/i386.md	2018-11-13 15:45:33.034870609 +0100
> @@ -13614,12 +13614,15 @@ (define_insn "*bmi_blsr_<mode>_ccz"
>  (define_expand "bmi2_bzhi_<mode>3"
>    [(parallel
>      [(set (match_operand:SWI48 0 "register_operand")
> -	  (zero_extract:SWI48
> -	    (match_operand:SWI48 1 "nonimmediate_operand")
> -	    (umin:SWI48
> -	      (and:SWI48 (match_operand:SWI48 2 "register_operand")
> -			 (const_int 255))
> -	      (match_dup 3))
> +	  (if_then_else:SWI48
> +	    (ne:QI (and:SWI48 (match_operand:SWI48 2 "register_operand")
> +			      (const_int 255))
> +		   (const_int 0))
> +	    (zero_extract:SWI48
> +	      (match_operand:SWI48 1 "nonimmediate_operand")
> +	      (umin:SWI48 (and:SWI48 (match_dup 2) (const_int 255))
> +			  (match_dup 3))
> +	      (const_int 0))
>  	    (const_int 0)))
>       (clobber (reg:CC FLAGS_REG))])]
>    "TARGET_BMI2"
> @@ -13627,12 +13630,15 @@ (define_expand "bmi2_bzhi_<mode>3"
>  
>  (define_insn "*bmi2_bzhi_<mode>3"
>    [(set (match_operand:SWI48 0 "register_operand" "=r")
> -	(zero_extract:SWI48
> -	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> -	  (umin:SWI48
> -	    (and:SWI48 (match_operand:SWI48 2 "register_operand" "r")
> -		       (const_int 255))
> -	    (match_operand:SWI48 3 "const_int_operand" "n"))
> +	(if_then_else:SWI48
> +	  (ne:QI (and:SWI48 (match_operand:SWI48 2 "register_operand" "r")
> +			    (const_int 255))
> +		 (const_int 0))
> +	  (zero_extract:SWI48
> +	    (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> +	    (umin:SWI48 (and:SWI48 (match_dup 2) (const_int 255))
> +			(match_operand:SWI48 3 "const_int_operand" "n"))
> +	    (const_int 0))
>  	  (const_int 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_BMI2 && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
> @@ -13643,11 +13649,13 @@ (define_insn "*bmi2_bzhi_<mode>3"
>  
>  (define_insn "*bmi2_bzhi_<mode>3_1"
>    [(set (match_operand:SWI48 0 "register_operand" "=r")
> -	(zero_extract:SWI48
> -	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> -	  (umin:SWI48
> -	    (zero_extend:SWI48 (match_operand:QI 2 "register_operand" "r"))
> -	    (match_operand:SWI48 3 "const_int_operand" "n"))
> +	(if_then_else:SWI48
> +	  (ne:QI (match_operand:QI 2 "register_operand" "r") (const_int 0))
> +	  (zero_extract:SWI48
> +	    (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> +	    (umin:SWI48 (zero_extend:SWI48 (match_dup 2))
> +			(match_operand:SWI48 3 "const_int_operand" "n"))
> +	    (const_int 0))
>  	  (const_int 0)))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_BMI2 && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
> @@ -13659,11 +13667,13 @@ (define_insn "*bmi2_bzhi_<mode>3_1"
>  (define_insn "*bmi2_bzhi_<mode>3_1_ccz"
>    [(set (reg:CCZ FLAGS_REG)
>  	(compare:CCZ
> -	  (zero_extract:SWI48
> -	    (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> -	    (umin:SWI48
> -	      (zero_extend:SWI48 (match_operand:QI 2 "register_operand" "r"))
> -	      (match_operand:SWI48 3 "const_int_operand" "n"))
> +	  (if_then_else:SWI48
> +	    (ne:QI (match_operand:QI 2 "register_operand" "r") (const_int 0))
> +	    (zero_extract:SWI48
> +	      (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> +	      (umin:SWI48 (zero_extend:SWI48 (match_dup 2))
> +			  (match_operand:SWI48 3 "const_int_operand" "n"))
> +	      (const_int 0))
>  	    (const_int 0))
>  	(const_int 0)))
>     (clobber (match_scratch:SWI48 0 "=r"))]
> @@ -13696,7 +13706,28 @@ (define_insn "bmi2_pext_<mode>3"
>     (set_attr "mode" "<MODE>")])
>  
>  ;; TBM instructions.
> -(define_insn "tbm_bextri_<mode>"
> +(define_expand "tbm_bextri_<mode>"
> +  [(parallel
> +    [(set (match_operand:SWI48 0 "register_operand")
> +	  (zero_extract:SWI48
> +	    (match_operand:SWI48 1 "nonimmediate_operand")
> +	    (match_operand 2 "const_0_to_255_operand" "N")
> +	    (match_operand 3 "const_0_to_255_operand" "N")))
> +     (clobber (reg:CC FLAGS_REG))])]
> +  "TARGET_TBM"
> +{
> +  if (operands[2] == const0_rtx
> +      || INTVAL (operands[3]) >= <MODE_SIZE> * BITS_PER_UNIT)
> +    {
> +      emit_move_insn (operands[0], const0_rtx);
> +      DONE;
> +    }
> +  if (INTVAL (operands[2]) + INTVAL (operands[3])
> +      > <MODE_SIZE> * BITS_PER_UNIT)
> +    operands[2] = GEN_INT (<MODE_SIZE> * BITS_PER_UNIT - INTVAL (operands[3]));
> +})
> +
> +(define_insn "*tbm_bextri_<mode>"
>    [(set (match_operand:SWI48 0 "register_operand" "=r")
>          (zero_extract:SWI48
>            (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)
  2018-11-14  8:29 ` Richard Biener
@ 2018-11-14  8:58   ` Jakub Jelinek
  2018-11-14 12:21   ` [PATCH] Fold x86 bzhi with zero last operand " Jakub Jelinek
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Jelinek @ 2018-11-14  8:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, Jeff Law, gcc-patches

On Wed, Nov 14, 2018 at 09:29:15AM +0100, Richard Biener wrote:
> On Wed, 14 Nov 2018, Jakub Jelinek wrote:
> > As mentioned in the PR, it is unclear if zero_extract is well defined
> > if the second argument is 0.  x86 intrinsic require bzhi and bextr to be
> > well defined in those cases (extraction of 0 bits results in 0), but
> > e.g. combiner hapilly transforms that into << 64 >> 64, simplify-rtx.c,
> > while it folds it into 0, might invoke UB in the compiler etc.
> 
> So where do the zero_extracts come from?  Can we somehow avoid
> zero-sized bit-extracts at expansion time by folding them to zero?

That is what the patch does for tbm_bextri.  Both the size and position
arguments must be constant in that instruction, so if for some reason
0 size or position >= bitsize makes it through (there is gimple folder to
optimize that), the expander will move 0 into the destination, plus
canonicalize if size + position > bitsize otherwise, so that
size + position == bitsize.

The patch can't do this for bzhi, where the operand is register_operand.
Even if we add a gimple folder to handle this (will do it, but didn't want
to test it together with the patch), the problematic constant can appear
there during RTL optimizations.  Which is why I've used if_then_else,
so if constant appears there, we fold it right.

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Fold x86 bzhi with zero last operand (PR rtl-optimization/87817)
  2018-11-14  8:29 ` Richard Biener
  2018-11-14  8:58   ` Jakub Jelinek
@ 2018-11-14 12:21   ` Jakub Jelinek
  2018-11-14 12:28     ` Uros Bizjak
  2018-11-14 12:38     ` Richard Biener
  1 sibling, 2 replies; 8+ messages in thread
From: Jakub Jelinek @ 2018-11-14 12:21 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Biener, gcc-patches

On Wed, Nov 14, 2018 at 09:29:15AM +0100, Richard Biener wrote:
> So where do the zero_extracts come from?  Can we somehow avoid
> zero-sized bit-extracts at expansion time by folding them to zero?

The following patch implements the missing folding.
Note, as I've mentioned, it is just an optimization, as if we are unlucky
enough, a constant could be propagated to the insn only at RTL optimization
time.  So the other patch is needed, even when this patch also fixes the
bmi2-bzhi-2.c testcase execution.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-11-14  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/87817
	* config/i386/i386.c (ix86_fold_builtin): For _bzhi_u{32,64} if
	last argument has low 8 bits clear, fold to 0.

	* gcc.target/i386/bmi2-bzhi-3.c (main): Add a couple of new tests.

--- gcc/config/i386/i386.c.jj	2018-11-14 00:55:50.199357949 +0100
+++ gcc/config/i386/i386.c	2018-11-14 10:28:29.492491865 +0100
@@ -32671,6 +32671,8 @@ ix86_fold_builtin (tree fndecl, int n_ar
 	      unsigned int idx = tree_to_uhwi (args[1]) & 0xff;
 	      if (idx >= TYPE_PRECISION (TREE_TYPE (args[0])))
 		return args[0];
+	      if (idx == 0)
+		return build_int_cst (TREE_TYPE (TREE_TYPE (fndecl)), 0);
 	      if (!tree_fits_uhwi_p (args[0]))
 		break;
 	      unsigned HOST_WIDE_INT res = tree_to_uhwi (args[0]);
--- gcc/testsuite/gcc.target/i386/bmi2-bzhi-3.c.jj	2016-10-31 13:28:07.961422421 +0100
+++ gcc/testsuite/gcc.target/i386/bmi2-bzhi-3.c	2018-11-14 10:38:40.352417418 +0100
@@ -58,7 +58,11 @@ main ()
     link_error ();
   if (_bzhi_u32 (c, 32) != c
       || _bzhi_u32 (c, 64) != c
-      || _bzhi_u32 (c, 255) != c)
+      || _bzhi_u32 (c, 255) != c
+      || _bzhi_u32 (c, 544) != c
+      || _bzhi_u32 (c, 0) != 0
+      || _bzhi_u32 (c, 256) != 0
+      || _bzhi_u32 (c, 1024) != 0)
     link_error ();
 #ifdef __x86_64__
   if (f21 () != 0 || f22 (-1ULL) != 0
@@ -70,7 +74,11 @@ main ()
       || f33 () != -1ULL || f34 (-1ULL) != -1ULL)
     link_error ();
   if (_bzhi_u64 (d, 64) != d
-      || _bzhi_u64 (d, 255) != d)
+      || _bzhi_u64 (d, 255) != d
+      || _bzhi_u64 (d, 576) != d
+      || _bzhi_u64 (d, 0) != 0
+      || _bzhi_u64 (d, 256) != 0
+      || _bzhi_u64 (d, 512) != 0)
     link_error ();
 #endif
   return 0;


	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fold x86 bzhi with zero last operand (PR rtl-optimization/87817)
  2018-11-14 12:21   ` [PATCH] Fold x86 bzhi with zero last operand " Jakub Jelinek
@ 2018-11-14 12:28     ` Uros Bizjak
  2018-11-14 12:38     ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2018-11-14 12:28 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Guenther, gcc-patches

On Wed, Nov 14, 2018 at 1:21 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Nov 14, 2018 at 09:29:15AM +0100, Richard Biener wrote:
> > So where do the zero_extracts come from?  Can we somehow avoid
> > zero-sized bit-extracts at expansion time by folding them to zero?
>
> The following patch implements the missing folding.
> Note, as I've mentioned, it is just an optimization, as if we are unlucky
> enough, a constant could be propagated to the insn only at RTL optimization
> time.  So the other patch is needed, even when this patch also fixes the
> bmi2-bzhi-2.c testcase execution.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-11-14  Jakub Jelinek  <jakub@redhat.com>
>
>         PR rtl-optimization/87817
>         * config/i386/i386.c (ix86_fold_builtin): For _bzhi_u{32,64} if
>         last argument has low 8 bits clear, fold to 0.
>
>         * gcc.target/i386/bmi2-bzhi-3.c (main): Add a couple of new tests.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2018-11-14 00:55:50.199357949 +0100
> +++ gcc/config/i386/i386.c      2018-11-14 10:28:29.492491865 +0100
> @@ -32671,6 +32671,8 @@ ix86_fold_builtin (tree fndecl, int n_ar
>               unsigned int idx = tree_to_uhwi (args[1]) & 0xff;
>               if (idx >= TYPE_PRECISION (TREE_TYPE (args[0])))
>                 return args[0];
> +             if (idx == 0)
> +               return build_int_cst (TREE_TYPE (TREE_TYPE (fndecl)), 0);
>               if (!tree_fits_uhwi_p (args[0]))
>                 break;
>               unsigned HOST_WIDE_INT res = tree_to_uhwi (args[0]);
> --- gcc/testsuite/gcc.target/i386/bmi2-bzhi-3.c.jj      2016-10-31 13:28:07.961422421 +0100
> +++ gcc/testsuite/gcc.target/i386/bmi2-bzhi-3.c 2018-11-14 10:38:40.352417418 +0100
> @@ -58,7 +58,11 @@ main ()
>      link_error ();
>    if (_bzhi_u32 (c, 32) != c
>        || _bzhi_u32 (c, 64) != c
> -      || _bzhi_u32 (c, 255) != c)
> +      || _bzhi_u32 (c, 255) != c
> +      || _bzhi_u32 (c, 544) != c
> +      || _bzhi_u32 (c, 0) != 0
> +      || _bzhi_u32 (c, 256) != 0
> +      || _bzhi_u32 (c, 1024) != 0)
>      link_error ();
>  #ifdef __x86_64__
>    if (f21 () != 0 || f22 (-1ULL) != 0
> @@ -70,7 +74,11 @@ main ()
>        || f33 () != -1ULL || f34 (-1ULL) != -1ULL)
>      link_error ();
>    if (_bzhi_u64 (d, 64) != d
> -      || _bzhi_u64 (d, 255) != d)
> +      || _bzhi_u64 (d, 255) != d
> +      || _bzhi_u64 (d, 576) != d
> +      || _bzhi_u64 (d, 0) != 0
> +      || _bzhi_u64 (d, 256) != 0
> +      || _bzhi_u64 (d, 512) != 0)
>      link_error ();
>  #endif
>    return 0;
>
>
>         Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fold x86 bzhi with zero last operand (PR rtl-optimization/87817)
  2018-11-14 12:21   ` [PATCH] Fold x86 bzhi with zero last operand " Jakub Jelinek
  2018-11-14 12:28     ` Uros Bizjak
@ 2018-11-14 12:38     ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2018-11-14 12:38 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, gcc-patches

On Wed, 14 Nov 2018, Jakub Jelinek wrote:

> On Wed, Nov 14, 2018 at 09:29:15AM +0100, Richard Biener wrote:
> > So where do the zero_extracts come from?  Can we somehow avoid
> > zero-sized bit-extracts at expansion time by folding them to zero?
> 
> The following patch implements the missing folding.
> Note, as I've mentioned, it is just an optimization, as if we are unlucky
> enough, a constant could be propagated to the insn only at RTL optimization
> time.  So the other patch is needed, even when this patch also fixes the
> bmi2-bzhi-2.c testcase execution.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2018-11-14  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/87817
> 	* config/i386/i386.c (ix86_fold_builtin): For _bzhi_u{32,64} if
> 	last argument has low 8 bits clear, fold to 0.
> 
> 	* gcc.target/i386/bmi2-bzhi-3.c (main): Add a couple of new tests.
> 
> --- gcc/config/i386/i386.c.jj	2018-11-14 00:55:50.199357949 +0100
> +++ gcc/config/i386/i386.c	2018-11-14 10:28:29.492491865 +0100
> @@ -32671,6 +32671,8 @@ ix86_fold_builtin (tree fndecl, int n_ar
>  	      unsigned int idx = tree_to_uhwi (args[1]) & 0xff;
>  	      if (idx >= TYPE_PRECISION (TREE_TYPE (args[0])))
>  		return args[0];
> +	      if (idx == 0)
> +		return build_int_cst (TREE_TYPE (TREE_TYPE (fndecl)), 0);
>  	      if (!tree_fits_uhwi_p (args[0]))
>  		break;
>  	      unsigned HOST_WIDE_INT res = tree_to_uhwi (args[0]);
> --- gcc/testsuite/gcc.target/i386/bmi2-bzhi-3.c.jj	2016-10-31 13:28:07.961422421 +0100
> +++ gcc/testsuite/gcc.target/i386/bmi2-bzhi-3.c	2018-11-14 10:38:40.352417418 +0100
> @@ -58,7 +58,11 @@ main ()
>      link_error ();
>    if (_bzhi_u32 (c, 32) != c
>        || _bzhi_u32 (c, 64) != c
> -      || _bzhi_u32 (c, 255) != c)
> +      || _bzhi_u32 (c, 255) != c
> +      || _bzhi_u32 (c, 544) != c
> +      || _bzhi_u32 (c, 0) != 0
> +      || _bzhi_u32 (c, 256) != 0
> +      || _bzhi_u32 (c, 1024) != 0)
>      link_error ();
>  #ifdef __x86_64__
>    if (f21 () != 0 || f22 (-1ULL) != 0
> @@ -70,7 +74,11 @@ main ()
>        || f33 () != -1ULL || f34 (-1ULL) != -1ULL)
>      link_error ();
>    if (_bzhi_u64 (d, 64) != d
> -      || _bzhi_u64 (d, 255) != d)
> +      || _bzhi_u64 (d, 255) != d
> +      || _bzhi_u64 (d, 576) != d
> +      || _bzhi_u64 (d, 0) != 0
> +      || _bzhi_u64 (d, 256) != 0
> +      || _bzhi_u64 (d, 512) != 0)
>      link_error ();
>  #endif
>    return 0;
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Patch ping (Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817))
  2018-11-13 23:37 [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817) Jakub Jelinek
  2018-11-14  8:29 ` Richard Biener
@ 2018-11-21 10:20 ` Jakub Jelinek
  2018-11-21 10:32   ` Uros Bizjak
  1 sibling, 1 reply; 8+ messages in thread
From: Jakub Jelinek @ 2018-11-21 10:20 UTC (permalink / raw)
  To: Uros Bizjak, Richard Biener, Jeff Law; +Cc: gcc-patches

Hi!

On Wed, Nov 14, 2018 at 12:37:02AM +0100, Jakub Jelinek wrote:
> 2018-11-13  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/87817
> 	* config/i386/i386.md (nmi2_bzhi_<mode>3, *bmi2_bzhi_<mode>3,
> 	*bmi2_bzhi_<mode>3_1, *bmi2_bzhi_<mode>3_1_ccz): Use IF_THEN_ELSE
> 	in the pattern to avoid triggering UB when operands[2] is zero.
> 	(tbm_bextri_<mode>): New expander.  Renamed the old define_insn to ...
> 	(*tbm_bextri_<mode>): ... this.

I'd like to ping this patch, while the folding committed for the PR
often triggers and so the RTL passes see literal zero propagated there less
often, e.g. the testcase with:
-O2 -mbmi2 -fno-tree-ccp -fno-tree-forwprop -fno-tree-fre -fno-tree-pre -fno-tree-vrp -fno-tree-dominator-opts -fno-code-hoisting
is still miscompiled and there could be other reasons why a zero appears
only after expansion.

From what I understood, the agreement was that zero_extract with 0 size
(either literal or just at runtime is incorrect in the middle-end).

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Patch ping (Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817))
  2018-11-21 10:20 ` Patch ping (Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)) Jakub Jelinek
@ 2018-11-21 10:32   ` Uros Bizjak
  0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2018-11-21 10:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Guenther, Jeff Law, gcc-patches

On Wed, Nov 21, 2018 at 11:20 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> On Wed, Nov 14, 2018 at 12:37:02AM +0100, Jakub Jelinek wrote:
> > 2018-11-13  Jakub Jelinek  <jakub@redhat.com>
> >
> >       PR rtl-optimization/87817
> >       * config/i386/i386.md (nmi2_bzhi_<mode>3, *bmi2_bzhi_<mode>3,
> >       *bmi2_bzhi_<mode>3_1, *bmi2_bzhi_<mode>3_1_ccz): Use IF_THEN_ELSE
> >       in the pattern to avoid triggering UB when operands[2] is zero.
> >       (tbm_bextri_<mode>): New expander.  Renamed the old define_insn to ...
> >       (*tbm_bextri_<mode>): ... this.

OK.

I thought that I already approved the patch. Oh well...

Thanks,
Uros.

> I'd like to ping this patch, while the folding committed for the PR
> often triggers and so the RTL passes see literal zero propagated there less
> often, e.g. the testcase with:
> -O2 -mbmi2 -fno-tree-ccp -fno-tree-forwprop -fno-tree-fre -fno-tree-pre -fno-tree-vrp -fno-tree-dominator-opts -fno-code-hoisting
> is still miscompiled and there could be other reasons why a zero appears
> only after expansion.
>
> From what I understood, the agreement was that zero_extract with 0 size
> (either literal or just at runtime is incorrect in the middle-end).
>
>         Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-11-21 10:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-13 23:37 [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817) Jakub Jelinek
2018-11-14  8:29 ` Richard Biener
2018-11-14  8:58   ` Jakub Jelinek
2018-11-14 12:21   ` [PATCH] Fold x86 bzhi with zero last operand " Jakub Jelinek
2018-11-14 12:28     ` Uros Bizjak
2018-11-14 12:38     ` Richard Biener
2018-11-21 10:20 ` Patch ping (Re: [PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)) Jakub Jelinek
2018-11-21 10:32   ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).