public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, ARM] correctly encode the CC reg data flow
@ 2016-12-18 13:15 Bernd Edlinger
  2017-01-13 13:50 ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 21+ messages in thread
From: Bernd Edlinger @ 2016-12-18 13:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Kyrill Tkachov, Wilco Dijkstra

[-- Attachment #1: Type: text/plain, Size: 1812 bytes --]

Hi,

this is related to PR77308, the follow-up patch will depend on this one.

When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
before reload, a mis-compilation in libgcc function __gnu_satfractdasq
was discovered, see [1] for more details.

The reason seems to be that when the *arm_cmpdi_insn is directly
followed by a *arm_cmpdi_unsigned instruction, both are split
up into this:

   [(set (reg:CC CC_REGNUM)
         (compare:CC (match_dup 0) (match_dup 1)))
    (parallel [(set (reg:CC CC_REGNUM)
                    (compare:CC (match_dup 3) (match_dup 4)))
               (set (match_dup 2)
                    (minus:SI (match_dup 5)
                             (ltu:SI (reg:CC_C CC_REGNUM) (const_int 
0))))])]

   [(set (reg:CC CC_REGNUM)
         (compare:CC (match_dup 2) (match_dup 3)))
    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
               (set (reg:CC CC_REGNUM)
                    (compare:CC (match_dup 0) (match_dup 1))))]

The problem is that the reg:CC from the *subsi3_carryin_compare
is not mentioning that the reg:CC is also dependent on the reg:CC
from before.  Therefore the *arm_cmpsi_insn appears to be
redundant and thus got removed, because the data values are identical.

I think that applies to a number of similar pattern where data
flow is happening through the CC reg.

So this is a kind of correctness issue, and should be fixed
independently from the optimization issue PR77308.

Therefore I think the patterns need to specify the true
value that will be in the CC reg, in order for cse to
know what the instructions are really doing.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.


[1] https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00680.html

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr77308-5.diff --]
[-- Type: text/x-patch; name="patch-pr77308-5.diff", Size: 10117 bytes --]

2016-12-10  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
	subsi3_carryin_compare, subsi3_carryin_compare_const,
	negdi2_compare, *negsi2_carryin_compare,
	*arm_cmpdi_insn): Fix the CC reg dataflow.

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 243515)
+++ gcc/config/arm/arm.md	(working copy)
@@ -669,17 +669,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_V CC_REGNUM)
 		   (ne:CC_V
-		    (plus:DI (plus:DI
-			      (sign_extend:DI (match_dup 4))
-			      (sign_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (sign_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI (plus:SI
-					  (match_dup 4) (match_dup 5))
-					 (ltu:SI (reg:CC_C CC_REGNUM)
-						 (const_int 0))))])]
+		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
+				       (sign_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (sign_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -713,13 +711,13 @@
   [(set (reg:CC_V CC_REGNUM)
 	(ne:CC_V
 	  (plus:DI
-	   (plus:DI
-	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (sign_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+				   (ltu:SI (reg:CC_C CC_REGNUM)
+					   (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
 	(plus:SI
 	 (plus:SI (match_dup 1) (match_dup 2))
@@ -748,17 +746,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_C CC_REGNUM)
 		   (ne:CC_C
-		    (plus:DI (plus:DI
-			      (zero_extend:DI (match_dup 4))
-			      (zero_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (zero_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI
-				 (plus:SI (match_dup 4) (match_dup 5))
-				 (ltu:SI (reg:CC_C CC_REGNUM)
-					 (const_int 0))))])]
+		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
+				       (zero_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (zero_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -777,17 +773,16 @@
   [(set (reg:CC_C CC_REGNUM)
 	(ne:CC_C
 	  (plus:DI
-	   (plus:DI
-	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (zero_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (zero_extend:DI
+	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
-	(plus:SI
-	 (plus:SI (match_dup 1) (match_dup 2))
-	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "adcs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1086,8 +1081,8 @@
 })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1098,10 +1093,14 @@
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (match_dup 1) (match_dup 2)))
 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 4) (match_dup 5)))
-	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_C CC_REGNUM)
+		   (compare:CC_C
+		     (zero_extend:DI (match_dup 4))
+		     (plus:DI (zero_extend:DI (match_dup 5))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 3)
+		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1156,13 +1155,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_C CC_REGNUM)
+	(compare:CC_C
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1170,13 +1171,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_C CC_REGNUM)
+	(compare:CC_C
+	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
   [(set_attr "conds" "set")
@@ -4633,8 +4636,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (const_int 0)
 	  (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4646,8 +4649,12 @@
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0)
 					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_C CC_REGNUM)
+		   (compare:CC_C
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
 	     (set (match_dup 2)
 		  (minus:SI
 		   (minus:SI (const_int 0) (match_dup 3))
@@ -4705,12 +4712,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (const_int 0)
-		    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_C CC_REGNUM)
+	(compare:CC_C
+	  (const_int 0)
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-	(minus:SI (minus:SI (const_int 0)
-			    (match_dup 1))
+	(minus:SI (minus:SI (const_int 0) (match_dup 1))
 		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7359,12 +7368,15 @@
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+	(compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_C CC_REGNUM)
+		   (compare:CC_C
+		     (zero_extend:DI (match_dup 3))
+		     (plus:DI (zero_extend:DI (match_dup 4))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 2)
+		   (minus:SI (match_dup 5)
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2016-12-18 13:15 [PATCH, ARM] correctly encode the CC reg data flow Bernd Edlinger
@ 2017-01-13 13:50 ` Richard Earnshaw (lists)
  2017-01-13 16:10   ` Bernd Edlinger
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2017-01-13 13:50 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

On 18/12/16 12:58, Bernd Edlinger wrote:
> Hi,
> 
> this is related to PR77308, the follow-up patch will depend on this one.
> 
> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
> was discovered, see [1] for more details.
> 
> The reason seems to be that when the *arm_cmpdi_insn is directly
> followed by a *arm_cmpdi_unsigned instruction, both are split
> up into this:
> 
>    [(set (reg:CC CC_REGNUM)
>          (compare:CC (match_dup 0) (match_dup 1)))
>     (parallel [(set (reg:CC CC_REGNUM)
>                     (compare:CC (match_dup 3) (match_dup 4)))
>                (set (match_dup 2)
>                     (minus:SI (match_dup 5)
>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 
> 0))))])]
> 
>    [(set (reg:CC CC_REGNUM)
>          (compare:CC (match_dup 2) (match_dup 3)))
>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>                (set (reg:CC CC_REGNUM)
>                     (compare:CC (match_dup 0) (match_dup 1))))]
> 
> The problem is that the reg:CC from the *subsi3_carryin_compare
> is not mentioning that the reg:CC is also dependent on the reg:CC
> from before.  Therefore the *arm_cmpsi_insn appears to be
> redundant and thus got removed, because the data values are identical.
> 
> I think that applies to a number of similar pattern where data
> flow is happening through the CC reg.
> 
> So this is a kind of correctness issue, and should be fixed
> independently from the optimization issue PR77308.
> 
> Therefore I think the patterns need to specify the true
> value that will be in the CC reg, in order for cse to
> know what the instructions are really doing.
> 
> 
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
> 

I agree you've found a valid problem here, but I have some issues with
the patch itself.


(define_insn_and_split "subdi3_compare1"
  [(set (reg:CC_NCV CC_REGNUM)
	(compare:CC_NCV
	  (match_operand:DI 1 "register_operand" "r")
	  (match_operand:DI 2 "register_operand" "r")))
   (set (match_operand:DI 0 "register_operand" "=&r")
	(minus:DI (match_dup 1) (match_dup 2)))]
  "TARGET_32BIT"
  "#"
  "&& reload_completed"
  [(parallel [(set (reg:CC CC_REGNUM)
		   (compare:CC (match_dup 1) (match_dup 2)))
	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
   (parallel [(set (reg:CC_C CC_REGNUM)
		   (compare:CC_C
		     (zero_extend:DI (match_dup 4))
		     (plus:DI (zero_extend:DI (match_dup 5))
			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
	      (set (match_dup 3)
		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]


This pattern is now no-longer self consistent in that before the split
the overall result for the condition register is in mode CC_NCV, but
afterwards it is just CC_C.

I think CC_NCV is correct mode (the N, C and V bits all correctly
reflect the result of the 64-bit comparison), but that then implies that
the cc mode of subsi3_carryin_compare is incorrect as well and should in
fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
that CC_NCV is the correct mode for this operation

I'm not sure if there are other consequences that will fall out from
fixing this (it's possible that we might need a change to select_cc_mode
as well).

R.

> 
> Thanks
> Bernd.
> 
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00680.html
> 
> 
> patch-pr77308-5.diff
> 
> 
> 2016-12-10  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	PR target/77308
> 	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
> 	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
> 	subsi3_carryin_compare, subsi3_carryin_compare_const,
> 	negdi2_compare, *negsi2_carryin_compare,
> 	*arm_cmpdi_insn): Fix the CC reg dataflow.
> 
> Index: gcc/config/arm/arm.md
> ===================================================================
> --- gcc/config/arm/arm.md	(revision 243515)
> +++ gcc/config/arm/arm.md	(working copy)
> @@ -669,17 +669,15 @@
>  	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
>     (parallel [(set (reg:CC_V CC_REGNUM)
>  		   (ne:CC_V
> -		    (plus:DI (plus:DI
> -			      (sign_extend:DI (match_dup 4))
> -			      (sign_extend:DI (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -		    (plus:DI (sign_extend:DI
> -			      (plus:SI (match_dup 4) (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> -	     (set (match_dup 3) (plus:SI (plus:SI
> -					  (match_dup 4) (match_dup 5))
> -					 (ltu:SI (reg:CC_C CC_REGNUM)
> -						 (const_int 0))))])]
> +		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
> +				       (sign_extend:DI (match_dup 5)))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +		    (sign_extend:DI
> +		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
> +	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +					  (ltu:SI (reg:CC_C CC_REGNUM)
> +						  (const_int 0))))])]
>    "
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
> @@ -713,13 +711,13 @@
>    [(set (reg:CC_V CC_REGNUM)
>  	(ne:CC_V
>  	  (plus:DI
> -	   (plus:DI
> -	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
> -	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
> -	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -	  (plus:DI (sign_extend:DI
> -		    (plus:SI (match_dup 1) (match_dup 2)))
> -		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	    (plus:DI
> +	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +				   (ltu:SI (reg:CC_C CC_REGNUM)
> +					   (const_int 0))))))
>     (set (match_operand:SI 0 "register_operand" "=r")
>  	(plus:SI
>  	 (plus:SI (match_dup 1) (match_dup 2))
> @@ -748,17 +746,15 @@
>  	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
>     (parallel [(set (reg:CC_C CC_REGNUM)
>  		   (ne:CC_C
> -		    (plus:DI (plus:DI
> -			      (zero_extend:DI (match_dup 4))
> -			      (zero_extend:DI (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -		    (plus:DI (zero_extend:DI
> -			      (plus:SI (match_dup 4) (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> -	     (set (match_dup 3) (plus:SI
> -				 (plus:SI (match_dup 4) (match_dup 5))
> -				 (ltu:SI (reg:CC_C CC_REGNUM)
> -					 (const_int 0))))])]
> +		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
> +				       (zero_extend:DI (match_dup 5)))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +		    (zero_extend:DI
> +		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
> +	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +					  (ltu:SI (reg:CC_C CC_REGNUM)
> +						  (const_int 0))))])]
>    "
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
> @@ -777,17 +773,16 @@
>    [(set (reg:CC_C CC_REGNUM)
>  	(ne:CC_C
>  	  (plus:DI
> -	   (plus:DI
> -	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> -	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
> -	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -	  (plus:DI (zero_extend:DI
> -		    (plus:SI (match_dup 1) (match_dup 2)))
> -		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	    (plus:DI
> +	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +	  (zero_extend:DI
> +	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
>     (set (match_operand:SI 0 "register_operand" "=r")
> -	(plus:SI
> -	 (plus:SI (match_dup 1) (match_dup 2))
> -	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "adcs%?\\t%0, %1, %2"
>    [(set_attr "conds" "set")
> @@ -1086,8 +1081,8 @@
>  })
>  
>  (define_insn_and_split "subdi3_compare1"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +	(compare:CC_NCV
>  	  (match_operand:DI 1 "register_operand" "r")
>  	  (match_operand:DI 2 "register_operand" "r")))
>     (set (match_operand:DI 0 "register_operand" "=&r")
> @@ -1098,10 +1093,14 @@
>    [(parallel [(set (reg:CC CC_REGNUM)
>  		   (compare:CC (match_dup 1) (match_dup 2)))
>  	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -		   (compare:CC (match_dup 4) (match_dup 5)))
> -	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> -			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
> +   (parallel [(set (reg:CC_C CC_REGNUM)
> +		   (compare:CC_C
> +		     (zero_extend:DI (match_dup 4))
> +		     (plus:DI (zero_extend:DI (match_dup 5))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	      (set (match_dup 3)
> +		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> +			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
>      operands[0] = gen_lowpart (SImode, operands[0]);
> @@ -1156,13 +1155,15 @@
>  )
>  
>  (define_insn "*subsi3_carryin_compare"
> -  [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
> -                    (match_operand:SI 2 "s_register_operand" "r")))
> +  [(set (reg:CC_C CC_REGNUM)
> +	(compare:CC_C
> +	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -        (minus:SI (minus:SI (match_dup 1)
> -                            (match_dup 2))
> -                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
> +		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "sbcs\\t%0, %1, %2"
>    [(set_attr "conds" "set")
> @@ -1170,13 +1171,15 @@
>  )
>  
>  (define_insn "*subsi3_carryin_compare_const"
> -  [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
> -                    (match_operand:SI 2 "arm_not_operand" "K")))
> +  [(set (reg:CC_C CC_REGNUM)
> +	(compare:CC_C
> +	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -        (minus:SI (plus:SI (match_dup 1)
> -                           (match_dup 2))
> -                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "sbcs\\t%0, %1, #%B2"
>    [(set_attr "conds" "set")
> @@ -4633,8 +4636,8 @@
>  
>  
>  (define_insn_and_split "negdi2_compare"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +	(compare:CC_NCV
>  	  (const_int 0)
>  	  (match_operand:DI 1 "register_operand" "0,r")))
>     (set (match_operand:DI 0 "register_operand" "=r,&r")
> @@ -4646,8 +4649,12 @@
>  		   (compare:CC (const_int 0) (match_dup 1)))
>  	      (set (match_dup 0) (minus:SI (const_int 0)
>  					   (match_dup 1)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -		   (compare:CC (const_int 0) (match_dup 3)))
> +   (parallel [(set (reg:CC_C CC_REGNUM)
> +		   (compare:CC_C
> +		     (const_int 0)
> +		     (plus:DI
> +		       (zero_extend:DI (match_dup 3))
> +		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>  	     (set (match_dup 2)
>  		  (minus:SI
>  		   (minus:SI (const_int 0) (match_dup 3))
> @@ -4705,12 +4712,14 @@
>  )
>  
>  (define_insn "*negsi2_carryin_compare"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC (const_int 0)
> -		    (match_operand:SI 1 "s_register_operand" "r")))
> +  [(set (reg:CC_C CC_REGNUM)
> +	(compare:CC_C
> +	  (const_int 0)
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -	(minus:SI (minus:SI (const_int 0)
> -			    (match_dup 1))
> +	(minus:SI (minus:SI (const_int 0) (match_dup 1))
>  		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_ARM"
>    "rscs\\t%0, %1, #0"
> @@ -7359,12 +7368,15 @@
>    "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
>    "&& reload_completed"
>    [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_dup 0) (match_dup 1)))
> -   (parallel [(set (reg:CC CC_REGNUM)
> -                   (compare:CC (match_dup 3) (match_dup 4)))
> -              (set (match_dup 2)
> -                   (minus:SI (match_dup 5)
> -                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
> +	(compare:CC (match_dup 0) (match_dup 1)))
> +   (parallel [(set (reg:CC_C CC_REGNUM)
> +		   (compare:CC_C
> +		     (zero_extend:DI (match_dup 3))
> +		     (plus:DI (zero_extend:DI (match_dup 4))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	      (set (match_dup 2)
> +		   (minus:SI (match_dup 5)
> +			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
>      operands[0] = gen_lowpart (SImode, operands[0]);
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-01-13 13:50 ` Richard Earnshaw (lists)
@ 2017-01-13 16:10   ` Bernd Edlinger
  2017-01-13 18:29     ` Bernd Edlinger
  0 siblings, 1 reply; 21+ messages in thread
From: Bernd Edlinger @ 2017-01-13 16:10 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
> On 18/12/16 12:58, Bernd Edlinger wrote:
>> Hi,
>>
>> this is related to PR77308, the follow-up patch will depend on this one.
>>
>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>> was discovered, see [1] for more details.
>>
>> The reason seems to be that when the *arm_cmpdi_insn is directly
>> followed by a *arm_cmpdi_unsigned instruction, both are split
>> up into this:
>>
>>    [(set (reg:CC CC_REGNUM)
>>          (compare:CC (match_dup 0) (match_dup 1)))
>>     (parallel [(set (reg:CC CC_REGNUM)
>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>                (set (match_dup 2)
>>                     (minus:SI (match_dup 5)
>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>> 0))))])]
>>
>>    [(set (reg:CC CC_REGNUM)
>>          (compare:CC (match_dup 2) (match_dup 3)))
>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>                (set (reg:CC CC_REGNUM)
>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>
>> The problem is that the reg:CC from the *subsi3_carryin_compare
>> is not mentioning that the reg:CC is also dependent on the reg:CC
>> from before.  Therefore the *arm_cmpsi_insn appears to be
>> redundant and thus got removed, because the data values are identical.
>>
>> I think that applies to a number of similar pattern where data
>> flow is happening through the CC reg.
>>
>> So this is a kind of correctness issue, and should be fixed
>> independently from the optimization issue PR77308.
>>
>> Therefore I think the patterns need to specify the true
>> value that will be in the CC reg, in order for cse to
>> know what the instructions are really doing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>
> I agree you've found a valid problem here, but I have some issues with
> the patch itself.
>
>
> (define_insn_and_split "subdi3_compare1"
>   [(set (reg:CC_NCV CC_REGNUM)
> 	(compare:CC_NCV
> 	  (match_operand:DI 1 "register_operand" "r")
> 	  (match_operand:DI 2 "register_operand" "r")))
>    (set (match_operand:DI 0 "register_operand" "=&r")
> 	(minus:DI (match_dup 1) (match_dup 2)))]
>   "TARGET_32BIT"
>   "#"
>   "&& reload_completed"
>   [(parallel [(set (reg:CC CC_REGNUM)
> 		   (compare:CC (match_dup 1) (match_dup 2)))
> 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>    (parallel [(set (reg:CC_C CC_REGNUM)
> 		   (compare:CC_C
> 		     (zero_extend:DI (match_dup 4))
> 		     (plus:DI (zero_extend:DI (match_dup 5))
> 			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> 	      (set (match_dup 3)
> 		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> 			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>
>
> This pattern is now no-longer self consistent in that before the split
> the overall result for the condition register is in mode CC_NCV, but
> afterwards it is just CC_C.
>
> I think CC_NCV is correct mode (the N, C and V bits all correctly
> reflect the result of the 64-bit comparison), but that then implies that
> the cc mode of subsi3_carryin_compare is incorrect as well and should in
> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
> that CC_NCV is the correct mode for this operation
>
> I'm not sure if there are other consequences that will fall out from
> fixing this (it's possible that we might need a change to select_cc_mode
> as well).
>

Yes, this is still a bit awkward...

The N and V bit will be the correct result for the subdi3_compare1
a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
only gets the C bit correct, the expression for N and V is a different
one.

It probably works, because the subsi3_carryin_compare instruction sets
more CC bits than the pattern does explicitly specify the value.
We know the subsi3_carryin_compare also computes the NV bits, but it is
hard to write down the correct rtl expression for it.

In theory the pattern should describe everything correctly,
maybe, like:

set (reg:CC_C CC_REGNUM)
     (compare:CC_C
       (zero_extend:DI (match_dup 4))
       (plus:DI (zero_extend:DI (match_dup 5))
                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
set (reg:CC_NV CC_REGNUM)
     (compare:CC_NV
      (match_dup 4))
      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
set (match_dup 3)
     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))


But I doubt that will work to set CC_REGNUM with two different modes
in parallel?

Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
defines C from the DImode result, and NV from the SImode result,
similar to the CC_NOOVmode, that also leaves something open what
bits it really defines?


What do you think?


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-01-13 16:10   ` Bernd Edlinger
@ 2017-01-13 18:29     ` Bernd Edlinger
  2017-01-18 15:43       ` Bernd Edlinger
  0 siblings, 1 reply; 21+ messages in thread
From: Bernd Edlinger @ 2017-01-13 18:29 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

[-- Attachment #1: Type: text/plain, Size: 6110 bytes --]

On 01/13/17 17:10, Bernd Edlinger wrote:
> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is related to PR77308, the follow-up patch will depend on this one.
>>>
>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>> was discovered, see [1] for more details.
>>>
>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>> up into this:
>>>
>>>    [(set (reg:CC CC_REGNUM)
>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>                (set (match_dup 2)
>>>                     (minus:SI (match_dup 5)
>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>> 0))))])]
>>>
>>>    [(set (reg:CC CC_REGNUM)
>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>                (set (reg:CC CC_REGNUM)
>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>
>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>> redundant and thus got removed, because the data values are identical.
>>>
>>> I think that applies to a number of similar pattern where data
>>> flow is happening through the CC reg.
>>>
>>> So this is a kind of correctness issue, and should be fixed
>>> independently from the optimization issue PR77308.
>>>
>>> Therefore I think the patterns need to specify the true
>>> value that will be in the CC reg, in order for cse to
>>> know what the instructions are really doing.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>>
>> I agree you've found a valid problem here, but I have some issues with
>> the patch itself.
>>
>>
>> (define_insn_and_split "subdi3_compare1"
>>   [(set (reg:CC_NCV CC_REGNUM)
>>     (compare:CC_NCV
>>       (match_operand:DI 1 "register_operand" "r")
>>       (match_operand:DI 2 "register_operand" "r")))
>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>   "TARGET_32BIT"
>>   "#"
>>   "&& reload_completed"
>>   [(parallel [(set (reg:CC CC_REGNUM)
>>            (compare:CC (match_dup 1) (match_dup 2)))
>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>            (compare:CC_C
>>              (zero_extend:DI (match_dup 4))
>>              (plus:DI (zero_extend:DI (match_dup 5))
>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>           (set (match_dup 3)
>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>
>>
>> This pattern is now no-longer self consistent in that before the split
>> the overall result for the condition register is in mode CC_NCV, but
>> afterwards it is just CC_C.
>>
>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>> reflect the result of the 64-bit comparison), but that then implies that
>> the cc mode of subsi3_carryin_compare is incorrect as well and should in
>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
>> that CC_NCV is the correct mode for this operation
>>
>> I'm not sure if there are other consequences that will fall out from
>> fixing this (it's possible that we might need a change to select_cc_mode
>> as well).
>>
>
> Yes, this is still a bit awkward...
>
> The N and V bit will be the correct result for the subdi3_compare1
> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
> only gets the C bit correct, the expression for N and V is a different
> one.
>
> It probably works, because the subsi3_carryin_compare instruction sets
> more CC bits than the pattern does explicitly specify the value.
> We know the subsi3_carryin_compare also computes the NV bits, but it is
> hard to write down the correct rtl expression for it.
>
> In theory the pattern should describe everything correctly,
> maybe, like:
>
> set (reg:CC_C CC_REGNUM)
>     (compare:CC_C
>       (zero_extend:DI (match_dup 4))
>       (plus:DI (zero_extend:DI (match_dup 5))
>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> set (reg:CC_NV CC_REGNUM)
>     (compare:CC_NV
>      (match_dup 4))
>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
> set (match_dup 3)
>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>
>
> But I doubt that will work to set CC_REGNUM with two different modes
> in parallel?
>
> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
> defines C from the DImode result, and NV from the SImode result,
> similar to the CC_NOOVmode, that also leaves something open what
> bits it really defines?
>
>
> What do you think?
>
>
> Thanks
> Bernd.

I think maybe the right solution is to invent a new CCmode
that defines C as if the comparison is done in DImode
but N and V as if the comparison is done in SImode.

I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
furthermore I think the CC_NOOV should be renamed to CC_NZ (because
only N and Z are set correctly), but in a different patch of course.

Attached is a new version that implements the new CCmode.

How do you like this new version?

It seems to be able to build a cross-compiler at least.

I will start a new bootstrap with this new patch, but that can take some
time until I have definitive results.

Is there still a chance that it can go into gcc-7 or should it wait
for the next stage1?

Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr77308-5.diff --]
[-- Type: text/x-patch; name="patch-pr77308-5.diff", Size: 10956 bytes --]

2016-01-13  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* config/arm/arm-modes.def (CC_NCV_CIC): New mode.
	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
	subsi3_carryin_compare, subsi3_carryin_compare_const,
	negdi2_compare, *negsi2_carryin_compare,
	*arm_cmpdi_insn): Fix the CC reg dataflow.

Index: gcc/config/arm/arm-modes.def
===================================================================
--- gcc/config/arm/arm-modes.def	(revision 244439)
+++ gcc/config/arm/arm-modes.def	(working copy)
@@ -38,6 +38,8 @@
    (used for DImode unsigned comparisons).
    CC_NCVmode should be used if only the N, C, and V flags are correct
    (used for DImode signed comparisons).
+   CC_NCV_CICmode defines N and V in SImode and C in DImode
+   (used for carryin_compare patterns).
    CCmode should be used otherwise.  */
 
 CC_MODE (CC_NOOV);
@@ -44,6 +46,7 @@
 CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
+CC_MODE (CC_NCV_CIC);
 CC_MODE (CC_SWP);
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 244439)
+++ gcc/config/arm/arm.md	(working copy)
@@ -669,17 +669,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_V CC_REGNUM)
 		   (ne:CC_V
-		    (plus:DI (plus:DI
-			      (sign_extend:DI (match_dup 4))
-			      (sign_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (sign_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI (plus:SI
-					  (match_dup 4) (match_dup 5))
-					 (ltu:SI (reg:CC_C CC_REGNUM)
-						 (const_int 0))))])]
+		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
+				       (sign_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (sign_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -713,13 +711,13 @@
   [(set (reg:CC_V CC_REGNUM)
 	(ne:CC_V
 	  (plus:DI
-	   (plus:DI
-	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (sign_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+				   (ltu:SI (reg:CC_C CC_REGNUM)
+					   (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
 	(plus:SI
 	 (plus:SI (match_dup 1) (match_dup 2))
@@ -748,17 +746,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_C CC_REGNUM)
 		   (ne:CC_C
-		    (plus:DI (plus:DI
-			      (zero_extend:DI (match_dup 4))
-			      (zero_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (zero_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI
-				 (plus:SI (match_dup 4) (match_dup 5))
-				 (ltu:SI (reg:CC_C CC_REGNUM)
-					 (const_int 0))))])]
+		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
+				       (zero_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (zero_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -777,17 +773,16 @@
   [(set (reg:CC_C CC_REGNUM)
 	(ne:CC_C
 	  (plus:DI
-	   (plus:DI
-	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (zero_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (zero_extend:DI
+	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
-	(plus:SI
-	 (plus:SI (match_dup 1) (match_dup 2))
-	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "adcs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1086,8 +1081,8 @@
 })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1098,10 +1093,14 @@
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (match_dup 1) (match_dup 2)))
 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 4) (match_dup 5)))
-	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 4))
+		     (plus:DI (zero_extend:DI (match_dup 5))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 3)
+		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1157,13 +1156,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1171,13 +1172,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
   [(set_attr "conds" "set")
@@ -4634,8 +4637,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (const_int 0)
 	  (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4647,8 +4650,12 @@
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0)
 					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
 	     (set (match_dup 2)
 		  (minus:SI
 		   (minus:SI (const_int 0) (match_dup 3))
@@ -4707,12 +4714,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (const_int 0)
-		    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (const_int 0)
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-	(minus:SI (minus:SI (const_int 0)
-			    (match_dup 1))
+	(minus:SI (minus:SI (const_int 0) (match_dup 1))
 		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7361,12 +7370,15 @@
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+	(compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 3))
+		     (plus:DI (zero_extend:DI (match_dup 4))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 2)
+		   (minus:SI (match_dup 5)
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-01-13 18:29     ` Bernd Edlinger
@ 2017-01-18 15:43       ` Bernd Edlinger
  2017-04-20 19:10         ` [PING] " Bernd Edlinger
  2017-09-04 13:55         ` Kyrill Tkachov
  0 siblings, 2 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-01-18 15:43 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

[-- Attachment #1: Type: text/plain, Size: 8247 bytes --]

On 01/13/17 19:28, Bernd Edlinger wrote:
> On 01/13/17 17:10, Bernd Edlinger wrote:
>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this is related to PR77308, the follow-up patch will depend on this
>>>> one.
>>>>
>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>>> was discovered, see [1] for more details.
>>>>
>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>> up into this:
>>>>
>>>>    [(set (reg:CC CC_REGNUM)
>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>                (set (match_dup 2)
>>>>                     (minus:SI (match_dup 5)
>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>> 0))))])]
>>>>
>>>>    [(set (reg:CC CC_REGNUM)
>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>                (set (reg:CC CC_REGNUM)
>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>
>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>> redundant and thus got removed, because the data values are identical.
>>>>
>>>> I think that applies to a number of similar pattern where data
>>>> flow is happening through the CC reg.
>>>>
>>>> So this is a kind of correctness issue, and should be fixed
>>>> independently from the optimization issue PR77308.
>>>>
>>>> Therefore I think the patterns need to specify the true
>>>> value that will be in the CC reg, in order for cse to
>>>> know what the instructions are really doing.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>> Is it OK for trunk?
>>>>
>>>
>>> I agree you've found a valid problem here, but I have some issues with
>>> the patch itself.
>>>
>>>
>>> (define_insn_and_split "subdi3_compare1"
>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>     (compare:CC_NCV
>>>       (match_operand:DI 1 "register_operand" "r")
>>>       (match_operand:DI 2 "register_operand" "r")))
>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>   "TARGET_32BIT"
>>>   "#"
>>>   "&& reload_completed"
>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>            (compare:CC_C
>>>              (zero_extend:DI (match_dup 4))
>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>           (set (match_dup 3)
>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>
>>>
>>> This pattern is now no-longer self consistent in that before the split
>>> the overall result for the condition register is in mode CC_NCV, but
>>> afterwards it is just CC_C.
>>>
>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>> reflect the result of the 64-bit comparison), but that then implies that
>>> the cc mode of subsi3_carryin_compare is incorrect as well and should in
>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
>>> that CC_NCV is the correct mode for this operation
>>>
>>> I'm not sure if there are other consequences that will fall out from
>>> fixing this (it's possible that we might need a change to select_cc_mode
>>> as well).
>>>
>>
>> Yes, this is still a bit awkward...
>>
>> The N and V bit will be the correct result for the subdi3_compare1
>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>> only gets the C bit correct, the expression for N and V is a different
>> one.
>>
>> It probably works, because the subsi3_carryin_compare instruction sets
>> more CC bits than the pattern does explicitly specify the value.
>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>> hard to write down the correct rtl expression for it.
>>
>> In theory the pattern should describe everything correctly,
>> maybe, like:
>>
>> set (reg:CC_C CC_REGNUM)
>>     (compare:CC_C
>>       (zero_extend:DI (match_dup 4))
>>       (plus:DI (zero_extend:DI (match_dup 5))
>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>> set (reg:CC_NV CC_REGNUM)
>>     (compare:CC_NV
>>      (match_dup 4))
>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>> set (match_dup 3)
>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>
>>
>> But I doubt that will work to set CC_REGNUM with two different modes
>> in parallel?
>>
>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>> defines C from the DImode result, and NV from the SImode result,
>> similar to the CC_NOOVmode, that also leaves something open what
>> bits it really defines?
>>
>>
>> What do you think?
>>
>>
>> Thanks
>> Bernd.
>
> I think maybe the right solution is to invent a new CCmode
> that defines C as if the comparison is done in DImode
> but N and V as if the comparison is done in SImode.
>
> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
> only N and Z are set correctly), but in a different patch of course.
>
> Attached is a new version that implements the new CCmode.
>
> How do you like this new version?
>
> It seems to be able to build a cross-compiler at least.
>
> I will start a new bootstrap with this new patch, but that can take some
> time until I have definitive results.
>
> Is there still a chance that it can go into gcc-7 or should it wait
> for the next stage1?
>
> Thanks
> Bernd.


I thought I should also look at where the subdi_compare1 amd the
negdi2_compare patterns are used, and look if the caller is fine with
not having all CC bits available.

And indeed usubv<mode>4 turns out to be questionabe, because it
emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
CCmode) which is inconsistent when subdi3_compare1 no longer uses
CCmode.

To correct this, the branch should use CC_Cmode which is always defined.

So I tried to test this pattern, with the following test programs,
and found that the code actually improves when the branch uses CC_Cmode
instead of CCmode, both for SImode and DImode data, which was a bit
surprising.

I used this test program to see how the usubv<mode>4 pattern works:

cat test.c (DImode)
unsigned long long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}


unpatched code used 8 byte more stack than patched,
because the DImode subtraction is effectively done twice.

cat test1.c (SImode)
unsigned long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}

which generates (unpatched):
         cmp     r3, r0
         sub     ip, r3, r0

instead of expected (patched):
	subs	r3, r3, r2


The condition is extracted by ifconversion and/or combine
and complicates the resulting code instead of simplifying.

I think this happens only when the branch and the subsi/di3_compare1
is using the same CC mode.

That does not happen when the CC modes disagree, as with the
proposed patch.  All other uses of the pattern are already using
CC_Cmode or CC_Vmode in the branch, and these do not change.

Attached is an updated version of the patch, that happens to
improve the code generation of the usubsi4 and usubdi4 pattern,
as a side effect.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr77308-5.diff --]
[-- Type: text/x-patch; name="patch-pr77308-5.diff", Size: 11237 bytes --]

2016-01-13  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* config/arm/arm-modes.def (CC_NCV_CIC): New mode.
	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
	subsi3_carryin_compare, subsi3_carryin_compare_const,
	negdi2_compare, *negsi2_carryin_compare,
	*arm_cmpdi_insn): Fix the CC reg dataflow.
	(usubv<mode>4): Use CC_Cmode for the branch.

Index: gcc/config/arm/arm-modes.def
===================================================================
--- gcc/config/arm/arm-modes.def	(revision 244439)
+++ gcc/config/arm/arm-modes.def	(working copy)
@@ -38,6 +38,8 @@
    (used for DImode unsigned comparisons).
    CC_NCVmode should be used if only the N, C, and V flags are correct
    (used for DImode signed comparisons).
+   CC_NCV_CICmode defines N and V in SImode and C in DImode
+   (used for carryin_compare patterns).
    CCmode should be used otherwise.  */
 
 CC_MODE (CC_NOOV);
@@ -44,6 +46,7 @@
 CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
+CC_MODE (CC_NCV_CIC);
 CC_MODE (CC_SWP);
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 244439)
+++ gcc/config/arm/arm.md	(working copy)
@@ -669,17 +669,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_V CC_REGNUM)
 		   (ne:CC_V
-		    (plus:DI (plus:DI
-			      (sign_extend:DI (match_dup 4))
-			      (sign_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (sign_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI (plus:SI
-					  (match_dup 4) (match_dup 5))
-					 (ltu:SI (reg:CC_C CC_REGNUM)
-						 (const_int 0))))])]
+		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
+				       (sign_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (sign_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -713,13 +711,13 @@
   [(set (reg:CC_V CC_REGNUM)
 	(ne:CC_V
 	  (plus:DI
-	   (plus:DI
-	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (sign_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+				   (ltu:SI (reg:CC_C CC_REGNUM)
+					   (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
 	(plus:SI
 	 (plus:SI (match_dup 1) (match_dup 2))
@@ -748,17 +746,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_C CC_REGNUM)
 		   (ne:CC_C
-		    (plus:DI (plus:DI
-			      (zero_extend:DI (match_dup 4))
-			      (zero_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (zero_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI
-				 (plus:SI (match_dup 4) (match_dup 5))
-				 (ltu:SI (reg:CC_C CC_REGNUM)
-					 (const_int 0))))])]
+		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
+				       (zero_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (zero_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -777,17 +773,16 @@
   [(set (reg:CC_C CC_REGNUM)
 	(ne:CC_C
 	  (plus:DI
-	   (plus:DI
-	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (zero_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (zero_extend:DI
+	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
-	(plus:SI
-	 (plus:SI (match_dup 1) (match_dup 2))
-	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "adcs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1080,14 +1075,14 @@
   "TARGET_32BIT"
 {
   emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  arm_gen_unlikely_cbranch (EQ, CC_Cmode, operands[3]);
 
   DONE;
 })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1098,10 +1093,14 @@
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (match_dup 1) (match_dup 2)))
 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 4) (match_dup 5)))
-	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 4))
+		     (plus:DI (zero_extend:DI (match_dup 5))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 3)
+		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1157,13 +1156,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1171,13 +1172,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
   [(set_attr "conds" "set")
@@ -4634,8 +4637,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (const_int 0)
 	  (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4647,8 +4650,12 @@
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0)
 					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
 	     (set (match_dup 2)
 		  (minus:SI
 		   (minus:SI (const_int 0) (match_dup 3))
@@ -4707,12 +4714,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (const_int 0)
-		    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (const_int 0)
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-	(minus:SI (minus:SI (const_int 0)
-			    (match_dup 1))
+	(minus:SI (minus:SI (const_int 0) (match_dup 1))
 		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7361,12 +7370,15 @@
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+	(compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 3))
+		     (plus:DI (zero_extend:DI (match_dup 4))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 2)
+		   (minus:SI (match_dup 5)
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING] [PATCH, ARM] correctly encode the CC reg data flow
  2017-01-18 15:43       ` Bernd Edlinger
@ 2017-04-20 19:10         ` Bernd Edlinger
  2017-04-29 17:32           ` [PING**2] " Bernd Edlinger
  2017-09-04 13:55         ` Kyrill Tkachov
  1 sibling, 1 reply; 21+ messages in thread
From: Bernd Edlinger @ 2017-04-20 19:10 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

Ping...

for this patch:
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html

On 01/18/17 16:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>> one.
>>>>>
>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>>>> was discovered, see [1] for more details.
>>>>>
>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>> up into this:
>>>>>
>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>                (set (match_dup 2)
>>>>>                     (minus:SI (match_dup 5)
>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>> 0))))])]
>>>>>
>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>                (set (reg:CC CC_REGNUM)
>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>
>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>> redundant and thus got removed, because the data values are identical.
>>>>>
>>>>> I think that applies to a number of similar pattern where data
>>>>> flow is happening through the CC reg.
>>>>>
>>>>> So this is a kind of correctness issue, and should be fixed
>>>>> independently from the optimization issue PR77308.
>>>>>
>>>>> Therefore I think the patterns need to specify the true
>>>>> value that will be in the CC reg, in order for cse to
>>>>> know what the instructions are really doing.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>>
>>>> I agree you've found a valid problem here, but I have some issues with
>>>> the patch itself.
>>>>
>>>>
>>>> (define_insn_and_split "subdi3_compare1"
>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>     (compare:CC_NCV
>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>   "TARGET_32BIT"
>>>>   "#"
>>>>   "&& reload_completed"
>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>            (compare:CC_C
>>>>              (zero_extend:DI (match_dup 4))
>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>           (set (match_dup 3)
>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>
>>>>
>>>> This pattern is now no-longer self consistent in that before the split
>>>> the overall result for the condition register is in mode CC_NCV, but
>>>> afterwards it is just CC_C.
>>>>
>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>> reflect the result of the 64-bit comparison), but that then implies
>>>> that
>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>> should in
>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>> agree
>>>> that CC_NCV is the correct mode for this operation
>>>>
>>>> I'm not sure if there are other consequences that will fall out from
>>>> fixing this (it's possible that we might need a change to
>>>> select_cc_mode
>>>> as well).
>>>>
>>>
>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bit will be the correct result for the subdi3_compare1
>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>> only gets the C bit correct, the expression for N and V is a different
>>> one.
>>>
>>> It probably works, because the subsi3_carryin_compare instruction sets
>>> more CC bits than the pattern does explicitly specify the value.
>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>> hard to write down the correct rtl expression for it.
>>>
>>> In theory the pattern should describe everything correctly,
>>> maybe, like:
>>>
>>> set (reg:CC_C CC_REGNUM)
>>>     (compare:CC_C
>>>       (zero_extend:DI (match_dup 4))
>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>> set (reg:CC_NV CC_REGNUM)
>>>     (compare:CC_NV
>>>      (match_dup 4))
>>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>>> set (match_dup 3)
>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>
>>>
>>> But I doubt that will work to set CC_REGNUM with two different modes
>>> in parallel?
>>>
>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>> defines C from the DImode result, and NV from the SImode result,
>>> similar to the CC_NOOVmode, that also leaves something open what
>>> bits it really defines?
>>>
>>>
>>> What do you think?
>>>
>>>
>>> Thanks
>>> Bernd.
>>
>> I think maybe the right solution is to invent a new CCmode
>> that defines C as if the comparison is done in DImode
>> but N and V as if the comparison is done in SImode.
>>
>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>> only N and Z are set correctly), but in a different patch of course.
>>
>> Attached is a new version that implements the new CCmode.
>>
>> How do you like this new version?
>>
>> It seems to be able to build a cross-compiler at least.
>>
>> I will start a new bootstrap with this new patch, but that can take some
>> time until I have definitive results.
>>
>> Is there still a chance that it can go into gcc-7 or should it wait
>> for the next stage1?
>>
>> Thanks
>> Bernd.
>
>
> I thought I should also look at where the subdi_compare1 amd the
> negdi2_compare patterns are used, and look if the caller is fine with
> not having all CC bits available.
>
> And indeed usubv<mode>4 turns out to be questionabe, because it
> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
> CCmode) which is inconsistent when subdi3_compare1 no longer uses
> CCmode.
>
> To correct this, the branch should use CC_Cmode which is always defined.
>
> So I tried to test this pattern, with the following test programs,
> and found that the code actually improves when the branch uses CC_Cmode
> instead of CCmode, both for SImode and DImode data, which was a bit
> surprising.
>
> I used this test program to see how the usubv<mode>4 pattern works:
>
> cat test.c (DImode)
> unsigned long long x, y, z;
> int b;
> void test()
> {
>   b = __builtin_sub_overflow (y,z, &x);
> }
>
>
> unpatched code used 8 byte more stack than patched,
> because the DImode subtraction is effectively done twice.
>
> cat test1.c (SImode)
> unsigned long x, y, z;
> int b;
> void test()
> {
>   b = __builtin_sub_overflow (y,z, &x);
> }
>
> which generates (unpatched):
>         cmp     r3, r0
>         sub     ip, r3, r0
>
> instead of expected (patched):
>     subs    r3, r3, r2
>
>
> The condition is extracted by ifconversion and/or combine
> and complicates the resulting code instead of simplifying.
>
> I think this happens only when the branch and the subsi/di3_compare1
> is using the same CC mode.
>
> That does not happen when the CC modes disagree, as with the
> proposed patch.  All other uses of the pattern are already using
> CC_Cmode or CC_Vmode in the branch, and these do not change.
>
> Attached is an updated version of the patch, that happens to
> improve the code generation of the usubsi4 and usubdi4 pattern,
> as a side effect.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING**2] [PATCH, ARM] correctly encode the CC reg data flow
  2017-04-20 19:10         ` [PING] " Bernd Edlinger
@ 2017-04-29 17:32           ` Bernd Edlinger
  2017-05-12 16:50             ` [PING**3] " Bernd Edlinger
  0 siblings, 1 reply; 21+ messages in thread
From: Bernd Edlinger @ 2017-04-29 17:32 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

Ping...

On 04/20/17 20:11, Bernd Edlinger wrote:
> Ping...
>
> for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>
> On 01/18/17 16:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>> one.
>>>>>>
>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>> before reload, a mis-compilation in libgcc function
>>>>>> __gnu_satfractdasq
>>>>>> was discovered, see [1] for more details.
>>>>>>
>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>> up into this:
>>>>>>
>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>                (set (match_dup 2)
>>>>>>                     (minus:SI (match_dup 5)
>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>> 0))))])]
>>>>>>
>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>
>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>> redundant and thus got removed, because the data values are
>>>>>> identical.
>>>>>>
>>>>>> I think that applies to a number of similar pattern where data
>>>>>> flow is happening through the CC reg.
>>>>>>
>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>> independently from the optimization issue PR77308.
>>>>>>
>>>>>> Therefore I think the patterns need to specify the true
>>>>>> value that will be in the CC reg, in order for cse to
>>>>>> know what the instructions are really doing.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>
>>>>> I agree you've found a valid problem here, but I have some issues with
>>>>> the patch itself.
>>>>>
>>>>>
>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>     (compare:CC_NCV
>>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>   "TARGET_32BIT"
>>>>>   "#"
>>>>>   "&& reload_completed"
>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>            (compare:CC_C
>>>>>              (zero_extend:DI (match_dup 4))
>>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>           (set (match_dup 3)
>>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>
>>>>>
>>>>> This pattern is now no-longer self consistent in that before the split
>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>> afterwards it is just CC_C.
>>>>>
>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>> that
>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>> should in
>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>> agree
>>>>> that CC_NCV is the correct mode for this operation
>>>>>
>>>>> I'm not sure if there are other consequences that will fall out from
>>>>> fixing this (it's possible that we might need a change to
>>>>> select_cc_mode
>>>>> as well).
>>>>>
>>>>
>>>> Yes, this is still a bit awkward...
>>>>
>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>> only gets the C bit correct, the expression for N and V is a different
>>>> one.
>>>>
>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>> more CC bits than the pattern does explicitly specify the value.
>>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>>> hard to write down the correct rtl expression for it.
>>>>
>>>> In theory the pattern should describe everything correctly,
>>>> maybe, like:
>>>>
>>>> set (reg:CC_C CC_REGNUM)
>>>>     (compare:CC_C
>>>>       (zero_extend:DI (match_dup 4))
>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>> set (reg:CC_NV CC_REGNUM)
>>>>     (compare:CC_NV
>>>>      (match_dup 4))
>>>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>> 0)))
>>>> set (match_dup 3)
>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>
>>>>
>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>> in parallel?
>>>>
>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>> defines C from the DImode result, and NV from the SImode result,
>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>> bits it really defines?
>>>>
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>> I think maybe the right solution is to invent a new CCmode
>>> that defines C as if the comparison is done in DImode
>>> but N and V as if the comparison is done in SImode.
>>>
>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>> only N and Z are set correctly), but in a different patch of course.
>>>
>>> Attached is a new version that implements the new CCmode.
>>>
>>> How do you like this new version?
>>>
>>> It seems to be able to build a cross-compiler at least.
>>>
>>> I will start a new bootstrap with this new patch, but that can take some
>>> time until I have definitive results.
>>>
>>> Is there still a chance that it can go into gcc-7 or should it wait
>>> for the next stage1?
>>>
>>> Thanks
>>> Bernd.
>>
>>
>> I thought I should also look at where the subdi_compare1 amd the
>> negdi2_compare patterns are used, and look if the caller is fine with
>> not having all CC bits available.
>>
>> And indeed usubv<mode>4 turns out to be questionabe, because it
>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>> CCmode.
>>
>> To correct this, the branch should use CC_Cmode which is always defined.
>>
>> So I tried to test this pattern, with the following test programs,
>> and found that the code actually improves when the branch uses CC_Cmode
>> instead of CCmode, both for SImode and DImode data, which was a bit
>> surprising.
>>
>> I used this test program to see how the usubv<mode>4 pattern works:
>>
>> cat test.c (DImode)
>> unsigned long long x, y, z;
>> int b;
>> void test()
>> {
>>   b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>>
>> unpatched code used 8 byte more stack than patched,
>> because the DImode subtraction is effectively done twice.
>>
>> cat test1.c (SImode)
>> unsigned long x, y, z;
>> int b;
>> void test()
>> {
>>   b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>> which generates (unpatched):
>>         cmp     r3, r0
>>         sub     ip, r3, r0
>>
>> instead of expected (patched):
>>     subs    r3, r3, r2
>>
>>
>> The condition is extracted by ifconversion and/or combine
>> and complicates the resulting code instead of simplifying.
>>
>> I think this happens only when the branch and the subsi/di3_compare1
>> is using the same CC mode.
>>
>> That does not happen when the CC modes disagree, as with the
>> proposed patch.  All other uses of the pattern are already using
>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>
>> Attached is an updated version of the patch, that happens to
>> improve the code generation of the usubsi4 and usubdi4 pattern,
>> as a side effect.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING**3] [PATCH, ARM] correctly encode the CC reg data flow
  2017-04-29 17:32           ` [PING**2] " Bernd Edlinger
@ 2017-05-12 16:50             ` Bernd Edlinger
  2017-06-01 16:01               ` [PING**4] " Bernd Edlinger
       [not found]               ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
  0 siblings, 2 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-05-12 16:50 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

Ping...

On 04/29/17 19:21, Bernd Edlinger wrote:
> Ping...
>
> On 04/20/17 20:11, Bernd Edlinger wrote:
>> Ping...
>>
>> for this patch:
>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>
>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>> one.
>>>>>>>
>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>> __gnu_satfractdasq
>>>>>>> was discovered, see [1] for more details.
>>>>>>>
>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>> up into this:
>>>>>>>
>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>                (set (match_dup 2)
>>>>>>>                     (minus:SI (match_dup 5)
>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>>> 0))))])]
>>>>>>>
>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>
>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>> redundant and thus got removed, because the data values are
>>>>>>> identical.
>>>>>>>
>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>> flow is happening through the CC reg.
>>>>>>>
>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>> independently from the optimization issue PR77308.
>>>>>>>
>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>> know what the instructions are really doing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>>
>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>> with
>>>>>> the patch itself.
>>>>>>
>>>>>>
>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>     (compare:CC_NCV
>>>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>   "TARGET_32BIT"
>>>>>>   "#"
>>>>>>   "&& reload_completed"
>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>> 2)))])
>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>            (compare:CC_C
>>>>>>              (zero_extend:DI (match_dup 4))
>>>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>           (set (match_dup 3)
>>>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>
>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>> split
>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>> afterwards it is just CC_C.
>>>>>>
>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>>> that
>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>> should in
>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>> agree
>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>
>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>> fixing this (it's possible that we might need a change to
>>>>>> select_cc_mode
>>>>>> as well).
>>>>>>
>>>>>
>>>>> Yes, this is still a bit awkward...
>>>>>
>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>> one.
>>>>>
>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>> it is
>>>>> hard to write down the correct rtl expression for it.
>>>>>
>>>>> In theory the pattern should describe everything correctly,
>>>>> maybe, like:
>>>>>
>>>>> set (reg:CC_C CC_REGNUM)
>>>>>     (compare:CC_C
>>>>>       (zero_extend:DI (match_dup 4))
>>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>     (compare:CC_NV
>>>>>      (match_dup 4))
>>>>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>> 0)))
>>>>> set (match_dup 3)
>>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>
>>>>>
>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>> in parallel?
>>>>>
>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>> bits it really defines?
>>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>
>>>> I think maybe the right solution is to invent a new CCmode
>>>> that defines C as if the comparison is done in DImode
>>>> but N and V as if the comparison is done in SImode.
>>>>
>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>> only N and Z are set correctly), but in a different patch of course.
>>>>
>>>> Attached is a new version that implements the new CCmode.
>>>>
>>>> How do you like this new version?
>>>>
>>>> It seems to be able to build a cross-compiler at least.
>>>>
>>>> I will start a new bootstrap with this new patch, but that can take
>>>> some
>>>> time until I have definitive results.
>>>>
>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>> for the next stage1?
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>>
>>> I thought I should also look at where the subdi_compare1 amd the
>>> negdi2_compare patterns are used, and look if the caller is fine with
>>> not having all CC bits available.
>>>
>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>> CCmode.
>>>
>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>
>>> So I tried to test this pattern, with the following test programs,
>>> and found that the code actually improves when the branch uses CC_Cmode
>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>> surprising.
>>>
>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>
>>> cat test.c (DImode)
>>> unsigned long long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>   b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>>
>>> unpatched code used 8 byte more stack than patched,
>>> because the DImode subtraction is effectively done twice.
>>>
>>> cat test1.c (SImode)
>>> unsigned long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>   b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>> which generates (unpatched):
>>>         cmp     r3, r0
>>>         sub     ip, r3, r0
>>>
>>> instead of expected (patched):
>>>     subs    r3, r3, r2
>>>
>>>
>>> The condition is extracted by ifconversion and/or combine
>>> and complicates the resulting code instead of simplifying.
>>>
>>> I think this happens only when the branch and the subsi/di3_compare1
>>> is using the same CC mode.
>>>
>>> That does not happen when the CC modes disagree, as with the
>>> proposed patch.  All other uses of the pattern are already using
>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>
>>> Attached is an updated version of the patch, that happens to
>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>> as a side effect.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING**4] [PATCH, ARM] correctly encode the CC reg data flow
  2017-05-12 16:50             ` [PING**3] " Bernd Edlinger
@ 2017-06-01 16:01               ` Bernd Edlinger
       [not found]               ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
  1 sibling, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-06-01 16:01 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

Ping...

On 05/12/17 18:49, Bernd Edlinger wrote:
> Ping...
> 
> On 04/29/17 19:21, Bernd Edlinger wrote:
>> Ping...
>>
>> On 04/20/17 20:11, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> for this patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>>
>>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>> one.
>>>>>>>>
>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>> __gnu_satfractdasq
>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>
>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>> up into this:
>>>>>>>>
>>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>                (set (match_dup 2)
>>>>>>>>                     (minus:SI (match_dup 5)
>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>>>>> (const_int
>>>>>>>> 0))))])]
>>>>>>>>
>>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>
>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>> identical.
>>>>>>>>
>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>> flow is happening through the CC reg.
>>>>>>>>
>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>
>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>> know what the instructions are really doing.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>>
>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>> with
>>>>>>> the patch itself.
>>>>>>>
>>>>>>>
>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>     (compare:CC_NCV
>>>>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>   "TARGET_32BIT"
>>>>>>>   "#"
>>>>>>>   "&& reload_completed"
>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>> 2)))])
>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>            (compare:CC_C
>>>>>>>              (zero_extend:DI (match_dup 4))
>>>>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>           (set (match_dup 3)
>>>>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>>
>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>> split
>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>> afterwards it is just CC_C.
>>>>>>>
>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>>>> that
>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>> should in
>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>> agree
>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>
>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>> select_cc_mode
>>>>>>> as well).
>>>>>>>
>>>>>>
>>>>>> Yes, this is still a bit awkward...
>>>>>>
>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>> only gets the C bit correct, the expression for N and V is a 
>>>>>> different
>>>>>> one.
>>>>>>
>>>>>> It probably works, because the subsi3_carryin_compare instruction 
>>>>>> sets
>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>> it is
>>>>>> hard to write down the correct rtl expression for it.
>>>>>>
>>>>>> In theory the pattern should describe everything correctly,
>>>>>> maybe, like:
>>>>>>
>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>     (compare:CC_C
>>>>>>       (zero_extend:DI (match_dup 4))
>>>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>     (compare:CC_NV
>>>>>>      (match_dup 4))
>>>>>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>> 0)))
>>>>>> set (match_dup 3)
>>>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>
>>>>>>
>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>> in parallel?
>>>>>>
>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>> bits it really defines?
>>>>>>
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>
>>>>> I think maybe the right solution is to invent a new CCmode
>>>>> that defines C as if the comparison is done in DImode
>>>>> but N and V as if the comparison is done in SImode.
>>>>>
>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>
>>>>> Attached is a new version that implements the new CCmode.
>>>>>
>>>>> How do you like this new version?
>>>>>
>>>>> It seems to be able to build a cross-compiler at least.
>>>>>
>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>> some
>>>>> time until I have definitive results.
>>>>>
>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>> for the next stage1?
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>
>>>>
>>>> I thought I should also look at where the subdi_compare1 amd the
>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>> not having all CC bits available.
>>>>
>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>> CCmode.
>>>>
>>>> To correct this, the branch should use CC_Cmode which is always 
>>>> defined.
>>>>
>>>> So I tried to test this pattern, with the following test programs,
>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>> surprising.
>>>>
>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>
>>>> cat test.c (DImode)
>>>> unsigned long long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>>
>>>> unpatched code used 8 byte more stack than patched,
>>>> because the DImode subtraction is effectively done twice.
>>>>
>>>> cat test1.c (SImode)
>>>> unsigned long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>> which generates (unpatched):
>>>>         cmp     r3, r0
>>>>         sub     ip, r3, r0
>>>>
>>>> instead of expected (patched):
>>>>     subs    r3, r3, r2
>>>>
>>>>
>>>> The condition is extracted by ifconversion and/or combine
>>>> and complicates the resulting code instead of simplifying.
>>>>
>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>> is using the same CC mode.
>>>>
>>>> That does not happen when the CC modes disagree, as with the
>>>> proposed patch.  All other uses of the pattern are already using
>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>
>>>> Attached is an updated version of the patch, that happens to
>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>> as a side effect.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>> Is it OK for trunk?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING**5] [PATCH, ARM] correctly encode the CC reg data flow
       [not found]               ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
@ 2017-06-14 12:34                 ` Bernd Edlinger
       [not found]                 ` <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
  1 sibling, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-06-14 12:34 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

Ping...

On 06/01/17 18:00, Bernd Edlinger wrote:
> Ping...
> 
> On 05/12/17 18:49, Bernd Edlinger wrote:
>> Ping...
>>
>> On 04/29/17 19:21, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/20/17 20:11, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> for this patch:
>>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>>>
>>>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> this is related to PR77308, the follow-up patch will depend on 
>>>>>>>>> this
>>>>>>>>> one.
>>>>>>>>>
>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>> __gnu_satfractdasq
>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>
>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>> up into this:
>>>>>>>>>
>>>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>                (set (match_dup 2)
>>>>>>>>>                     (minus:SI (match_dup 5)
>>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>>>>>> (const_int
>>>>>>>>> 0))))])]
>>>>>>>>>
>>>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>
>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>> identical.
>>>>>>>>>
>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>
>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>
>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>> with
>>>>>>>> the patch itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>     (compare:CC_NCV
>>>>>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>   "TARGET_32BIT"
>>>>>>>>   "#"
>>>>>>>>   "&& reload_completed"
>>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>> 2)))])
>>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>            (compare:CC_C
>>>>>>>>              (zero_extend:DI (match_dup 4))
>>>>>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>           (set (match_dup 3)
>>>>>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>
>>>>>>>>
>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>> split
>>>>>>>> the overall result for the condition register is in mode CC_NCV, 
>>>>>>>> but
>>>>>>>> afterwards it is just CC_C.
>>>>>>>>
>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>>>>> that
>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>> should in
>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>> agree
>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>
>>>>>>>> I'm not sure if there are other consequences that will fall out 
>>>>>>>> from
>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>> select_cc_mode
>>>>>>>> as well).
>>>>>>>>
>>>>>>>
>>>>>>> Yes, this is still a bit awkward...
>>>>>>>
>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>> only gets the C bit correct, the expression for N and V is a 
>>>>>>> different
>>>>>>> one.
>>>>>>>
>>>>>>> It probably works, because the subsi3_carryin_compare instruction 
>>>>>>> sets
>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>> it is
>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>
>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>> maybe, like:
>>>>>>>
>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>     (compare:CC_C
>>>>>>>       (zero_extend:DI (match_dup 4))
>>>>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>     (compare:CC_NV
>>>>>>>      (match_dup 4))
>>>>>>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>>> 0)))
>>>>>>> set (match_dup 3)
>>>>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>
>>>>>>>
>>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>>> in parallel?
>>>>>>>
>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>> bits it really defines?
>>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>
>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>> that defines C as if the comparison is done in DImode
>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>
>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>
>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>
>>>>>> How do you like this new version?
>>>>>>
>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>
>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>> some
>>>>>> time until I have definitive results.
>>>>>>
>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>> for the next stage1?
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>
>>>>>
>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>> not having all CC bits available.
>>>>>
>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>> CCmode.
>>>>>
>>>>> To correct this, the branch should use CC_Cmode which is always 
>>>>> defined.
>>>>>
>>>>> So I tried to test this pattern, with the following test programs,
>>>>> and found that the code actually improves when the branch uses 
>>>>> CC_Cmode
>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>> surprising.
>>>>>
>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>
>>>>> cat test.c (DImode)
>>>>> unsigned long long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>>
>>>>> unpatched code used 8 byte more stack than patched,
>>>>> because the DImode subtraction is effectively done twice.
>>>>>
>>>>> cat test1.c (SImode)
>>>>> unsigned long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>> which generates (unpatched):
>>>>>         cmp     r3, r0
>>>>>         sub     ip, r3, r0
>>>>>
>>>>> instead of expected (patched):
>>>>>     subs    r3, r3, r2
>>>>>
>>>>>
>>>>> The condition is extracted by ifconversion and/or combine
>>>>> and complicates the resulting code instead of simplifying.
>>>>>
>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>> is using the same CC mode.
>>>>>
>>>>> That does not happen when the CC modes disagree, as with the
>>>>> proposed patch.  All other uses of the pattern are already using
>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>
>>>>> Attached is an updated version of the patch, that happens to
>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>> as a side effect.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING**6] [PATCH, ARM] correctly encode the CC reg data flow
       [not found]                 ` <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
@ 2017-07-05 18:11                   ` Bernd Edlinger
  0 siblings, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-07-05 18:11 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra

Ping...

On 06/14/17 14:33, Bernd Edlinger wrote:
> Ping...
> 
> On 06/01/17 18:00, Bernd Edlinger wrote:
>> Ping...
>>
>> On 05/12/17 18:49, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/29/17 19:21, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> On 04/20/17 20:11, Bernd Edlinger wrote:
>>>>> Ping...
>>>>>
>>>>> for this patch:
>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>>>>
>>>>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> this is related to PR77308, the follow-up patch will depend on 
>>>>>>>>>> this
>>>>>>>>>> one.
>>>>>>>>>>
>>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>>> __gnu_satfractdasq
>>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>>
>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>> up into this:
>>>>>>>>>>
>>>>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>                (set (match_dup 2)
>>>>>>>>>>                     (minus:SI (match_dup 5)
>>>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>>>>>>> (const_int
>>>>>>>>>> 0))))])]
>>>>>>>>>>
>>>>>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>>
>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>>> identical.
>>>>>>>>>>
>>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>>
>>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>>
>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>> with
>>>>>>>>> the patch itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>     (compare:CC_NCV
>>>>>>>>>       (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>       (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>     (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>   "TARGET_32BIT"
>>>>>>>>>   "#"
>>>>>>>>>   "&& reload_completed"
>>>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>           (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>>> 2)))])
>>>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>            (compare:CC_C
>>>>>>>>>              (zero_extend:DI (match_dup 4))
>>>>>>>>>              (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>           (set (match_dup 3)
>>>>>>>>>            (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>>> split
>>>>>>>>> the overall result for the condition register is in mode 
>>>>>>>>> CC_NCV, but
>>>>>>>>> afterwards it is just CC_C.
>>>>>>>>>
>>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>>> reflect the result of the 64-bit comparison), but that then 
>>>>>>>>> implies
>>>>>>>>> that
>>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>>> should in
>>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>>> agree
>>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>>
>>>>>>>>> I'm not sure if there are other consequences that will fall out 
>>>>>>>>> from
>>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>>> select_cc_mode
>>>>>>>>> as well).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>
>>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>>> only gets the C bit correct, the expression for N and V is a 
>>>>>>>> different
>>>>>>>> one.
>>>>>>>>
>>>>>>>> It probably works, because the subsi3_carryin_compare 
>>>>>>>> instruction sets
>>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>>> it is
>>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>>
>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>> maybe, like:
>>>>>>>>
>>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>>     (compare:CC_C
>>>>>>>>       (zero_extend:DI (match_dup 4))
>>>>>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>>     (compare:CC_NV
>>>>>>>>      (match_dup 4))
>>>>>>>>      (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>>>> 0)))
>>>>>>>> set (match_dup 3)
>>>>>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>
>>>>>>>>
>>>>>>>> But I doubt that will work to set CC_REGNUM with two different 
>>>>>>>> modes
>>>>>>>> in parallel?
>>>>>>>>
>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>>> bits it really defines?
>>>>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>>
>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>
>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>>
>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>
>>>>>>> How do you like this new version?
>>>>>>>
>>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>>
>>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>>> some
>>>>>>> time until I have definitive results.
>>>>>>>
>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>> for the next stage1?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>
>>>>>>
>>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>>> not having all CC bits available.
>>>>>>
>>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>>> CCmode.
>>>>>>
>>>>>> To correct this, the branch should use CC_Cmode which is always 
>>>>>> defined.
>>>>>>
>>>>>> So I tried to test this pattern, with the following test programs,
>>>>>> and found that the code actually improves when the branch uses 
>>>>>> CC_Cmode
>>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>>> surprising.
>>>>>>
>>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>>
>>>>>> cat test.c (DImode)
>>>>>> unsigned long long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>>
>>>>>> unpatched code used 8 byte more stack than patched,
>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>
>>>>>> cat test1.c (SImode)
>>>>>> unsigned long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>   b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>> which generates (unpatched):
>>>>>>         cmp     r3, r0
>>>>>>         sub     ip, r3, r0
>>>>>>
>>>>>> instead of expected (patched):
>>>>>>     subs    r3, r3, r2
>>>>>>
>>>>>>
>>>>>> The condition is extracted by ifconversion and/or combine
>>>>>> and complicates the resulting code instead of simplifying.
>>>>>>
>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>> is using the same CC mode.
>>>>>>
>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>> proposed patch.  All other uses of the pattern are already using
>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>
>>>>>> Attached is an updated version of the patch, that happens to
>>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>>> as a side effect.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-01-18 15:43       ` Bernd Edlinger
  2017-04-20 19:10         ` [PING] " Bernd Edlinger
@ 2017-09-04 13:55         ` Kyrill Tkachov
  2017-09-04 19:54           ` Bernd Edlinger
       [not found]           ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
  1 sibling, 2 replies; 21+ messages in thread
From: Kyrill Tkachov @ 2017-09-04 13:55 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

Hi Bernd,

On 18/01/17 15:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>> one.
>>>>>
>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>>>> was discovered, see [1] for more details.
>>>>>
>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>> up into this:
>>>>>
>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>           (compare:CC (match_dup 0) (match_dup 1)))
>>>>>      (parallel [(set (reg:CC CC_REGNUM)
>>>>>                      (compare:CC (match_dup 3) (match_dup 4)))
>>>>>                 (set (match_dup 2)
>>>>>                      (minus:SI (match_dup 5)
>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>> 0))))])]
>>>>>
>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>           (compare:CC (match_dup 2) (match_dup 3)))
>>>>>      (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>                 (set (reg:CC CC_REGNUM)
>>>>>                      (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>
>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>> redundant and thus got removed, because the data values are identical.
>>>>>
>>>>> I think that applies to a number of similar pattern where data
>>>>> flow is happening through the CC reg.
>>>>>
>>>>> So this is a kind of correctness issue, and should be fixed
>>>>> independently from the optimization issue PR77308.
>>>>>
>>>>> Therefore I think the patterns need to specify the true
>>>>> value that will be in the CC reg, in order for cse to
>>>>> know what the instructions are really doing.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>> I agree you've found a valid problem here, but I have some issues with
>>>> the patch itself.
>>>>
>>>>
>>>> (define_insn_and_split "subdi3_compare1"
>>>>    [(set (reg:CC_NCV CC_REGNUM)
>>>>      (compare:CC_NCV
>>>>        (match_operand:DI 1 "register_operand" "r")
>>>>        (match_operand:DI 2 "register_operand" "r")))
>>>>     (set (match_operand:DI 0 "register_operand" "=&r")
>>>>      (minus:DI (match_dup 1) (match_dup 2)))]
>>>>    "TARGET_32BIT"
>>>>    "#"
>>>>    "&& reload_completed"
>>>>    [(parallel [(set (reg:CC CC_REGNUM)
>>>>             (compare:CC (match_dup 1) (match_dup 2)))
>>>>            (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>     (parallel [(set (reg:CC_C CC_REGNUM)
>>>>             (compare:CC_C
>>>>               (zero_extend:DI (match_dup 4))
>>>>               (plus:DI (zero_extend:DI (match_dup 5))
>>>>                    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>            (set (match_dup 3)
>>>>             (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>
>>>>
>>>> This pattern is now no-longer self consistent in that before the split
>>>> the overall result for the condition register is in mode CC_NCV, but
>>>> afterwards it is just CC_C.
>>>>
>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>> reflect the result of the 64-bit comparison), but that then implies that
>>>> the cc mode of subsi3_carryin_compare is incorrect as well and should in
>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
>>>> that CC_NCV is the correct mode for this operation
>>>>
>>>> I'm not sure if there are other consequences that will fall out from
>>>> fixing this (it's possible that we might need a change to select_cc_mode
>>>> as well).
>>>>
>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bit will be the correct result for the subdi3_compare1
>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>> only gets the C bit correct, the expression for N and V is a different
>>> one.
>>>
>>> It probably works, because the subsi3_carryin_compare instruction sets
>>> more CC bits than the pattern does explicitly specify the value.
>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>> hard to write down the correct rtl expression for it.
>>>
>>> In theory the pattern should describe everything correctly,
>>> maybe, like:
>>>
>>> set (reg:CC_C CC_REGNUM)
>>>      (compare:CC_C
>>>        (zero_extend:DI (match_dup 4))
>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>> set (reg:CC_NV CC_REGNUM)
>>>      (compare:CC_NV
>>>       (match_dup 4))
>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>>> set (match_dup 3)
>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>
>>>
>>> But I doubt that will work to set CC_REGNUM with two different modes
>>> in parallel?
>>>
>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>> defines C from the DImode result, and NV from the SImode result,
>>> similar to the CC_NOOVmode, that also leaves something open what
>>> bits it really defines?
>>>
>>>
>>> What do you think?
>>>
>>>
>>> Thanks
>>> Bernd.
>> I think maybe the right solution is to invent a new CCmode
>> that defines C as if the comparison is done in DImode
>> but N and V as if the comparison is done in SImode.
>>
>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>> only N and Z are set correctly), but in a different patch of course.
>>
>> Attached is a new version that implements the new CCmode.
>>
>> How do you like this new version?
>>
>> It seems to be able to build a cross-compiler at least.
>>
>> I will start a new bootstrap with this new patch, but that can take some
>> time until I have definitive results.
>>
>> Is there still a chance that it can go into gcc-7 or should it wait
>> for the next stage1?
>>
>> Thanks
>> Bernd.
>
> I thought I should also look at where the subdi_compare1 amd the
> negdi2_compare patterns are used, and look if the caller is fine with
> not having all CC bits available.
>
> And indeed usubv<mode>4 turns out to be questionabe, because it
> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
> CCmode) which is inconsistent when subdi3_compare1 no longer uses
> CCmode.
>
> To correct this, the branch should use CC_Cmode which is always defined.
>
> So I tried to test this pattern, with the following test programs,
> and found that the code actually improves when the branch uses CC_Cmode
> instead of CCmode, both for SImode and DImode data, which was a bit
> surprising.
>
> I used this test program to see how the usubv<mode>4 pattern works:
>
> cat test.c (DImode)
> unsigned long long x, y, z;
> int b;
> void test()
> {
>     b = __builtin_sub_overflow (y,z, &x);
> }
>
>
> unpatched code used 8 byte more stack than patched,
> because the DImode subtraction is effectively done twice.
>
> cat test1.c (SImode)
> unsigned long x, y, z;
> int b;
> void test()
> {
>     b = __builtin_sub_overflow (y,z, &x);
> }
>
> which generates (unpatched):
>           cmp     r3, r0
>           sub     ip, r3, r0
>
> instead of expected (patched):
> 	subs	r3, r3, r2
>
>
> The condition is extracted by ifconversion and/or combine
> and complicates the resulting code instead of simplifying.
>
> I think this happens only when the branch and the subsi/di3_compare1
> is using the same CC mode.
>
> That does not happen when the CC modes disagree, as with the
> proposed patch.  All other uses of the pattern are already using
> CC_Cmode or CC_Vmode in the branch, and these do not change.
>
> Attached is an updated version of the patch, that happens to
> improve the code generation of the usubsi4 and usubdi4 pattern,
> as a side effect.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?

I'm very sorry it has taken so long to review.
I've been ramping up on the context recently now so I'll try to move 
this along...

This patch looks mostly ok to me from reading the patterns and the 
discussion around it.
I have one concern:

  
  (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
  	  (const_int 0)
  	  (match_operand:DI 1 "register_operand" "0,r")))
     (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4647,8 +4650,12 @@
  		   (compare:CC (const_int 0) (match_dup 1)))
  	      (set (match_dup 0) (minus:SI (const_int 0)
  					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
  	     (set (match_dup 2)
  		  (minus:SI
  		   (minus:SI (const_int 0) (match_dup 3))


I was somewhat concerned with having the first operand of the COMPARE being a const_int 0 and the second being
a complex expression as the RTL canonicalization rules usually require the complex operand going first if possible.
Reading the RTL rules in rtl.texi I see it says this:
"If one of the operands is a constant, it should be placed in the
second operand and the comparison code adjusted as appropriate."
So it seems that the pre-existing pattern that puts const_int 0 as the first operand already breaks that rule.
I think we should fix that and update the use of condition code to a GEU rather than LTU as well.

Kyrill

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-04 13:55         ` Kyrill Tkachov
@ 2017-09-04 19:54           ` Bernd Edlinger
       [not found]           ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
  1 sibling, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-09-04 19:54 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

Hi Kyrill,

Thanks for your review!


On 09/04/17 15:55, Kyrill Tkachov wrote:
> Hi Bernd,
> 
> On 18/01/17 15:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>> one.
>>>>>>
>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>> before reload, a mis-compilation in libgcc function 
>>>>>> __gnu_satfractdasq
>>>>>> was discovered, see [1] for more details.
>>>>>>
>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>> up into this:
>>>>>>
>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>           (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>      (parallel [(set (reg:CC CC_REGNUM)
>>>>>>                      (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>                 (set (match_dup 2)
>>>>>>                      (minus:SI (match_dup 5)
>>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>>>>> 0))))])]
>>>>>>
>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>           (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>      (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>                 (set (reg:CC CC_REGNUM)
>>>>>>                      (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>
>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>> redundant and thus got removed, because the data values are 
>>>>>> identical.
>>>>>>
>>>>>> I think that applies to a number of similar pattern where data
>>>>>> flow is happening through the CC reg.
>>>>>>
>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>> independently from the optimization issue PR77308.
>>>>>>
>>>>>> Therefore I think the patterns need to specify the true
>>>>>> value that will be in the CC reg, in order for cse to
>>>>>> know what the instructions are really doing.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>> I agree you've found a valid problem here, but I have some issues with
>>>>> the patch itself.
>>>>>
>>>>>
>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>    [(set (reg:CC_NCV CC_REGNUM)
>>>>>      (compare:CC_NCV
>>>>>        (match_operand:DI 1 "register_operand" "r")
>>>>>        (match_operand:DI 2 "register_operand" "r")))
>>>>>     (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>      (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>    "TARGET_32BIT"
>>>>>    "#"
>>>>>    "&& reload_completed"
>>>>>    [(parallel [(set (reg:CC CC_REGNUM)
>>>>>             (compare:CC (match_dup 1) (match_dup 2)))
>>>>>            (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 
>>>>> 2)))])
>>>>>     (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>             (compare:CC_C
>>>>>               (zero_extend:DI (match_dup 4))
>>>>>               (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>            (set (match_dup 3)
>>>>>             (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>
>>>>>
>>>>> This pattern is now no-longer self consistent in that before the split
>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>> afterwards it is just CC_C.
>>>>>
>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>> reflect the result of the 64-bit comparison), but that then implies 
>>>>> that
>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and 
>>>>> should in
>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to 
>>>>> agree
>>>>> that CC_NCV is the correct mode for this operation
>>>>>
>>>>> I'm not sure if there are other consequences that will fall out from
>>>>> fixing this (it's possible that we might need a change to 
>>>>> select_cc_mode
>>>>> as well).
>>>>>
>>>> Yes, this is still a bit awkward...
>>>>
>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>> only gets the C bit correct, the expression for N and V is a different
>>>> one.
>>>>
>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>> more CC bits than the pattern does explicitly specify the value.
>>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>>> hard to write down the correct rtl expression for it.
>>>>
>>>> In theory the pattern should describe everything correctly,
>>>> maybe, like:
>>>>
>>>> set (reg:CC_C CC_REGNUM)
>>>>      (compare:CC_C
>>>>        (zero_extend:DI (match_dup 4))
>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>> set (reg:CC_NV CC_REGNUM)
>>>>      (compare:CC_NV
>>>>       (match_dup 4))
>>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 
>>>> 0)))
>>>> set (match_dup 3)
>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>
>>>>
>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>> in parallel?
>>>>
>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>> defines C from the DImode result, and NV from the SImode result,
>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>> bits it really defines?
>>>>
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>> I think maybe the right solution is to invent a new CCmode
>>> that defines C as if the comparison is done in DImode
>>> but N and V as if the comparison is done in SImode.
>>>
>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>> only N and Z are set correctly), but in a different patch of course.
>>>
>>> Attached is a new version that implements the new CCmode.
>>>
>>> How do you like this new version?
>>>
>>> It seems to be able to build a cross-compiler at least.
>>>
>>> I will start a new bootstrap with this new patch, but that can take some
>>> time until I have definitive results.
>>>
>>> Is there still a chance that it can go into gcc-7 or should it wait
>>> for the next stage1?
>>>
>>> Thanks
>>> Bernd.
>>
>> I thought I should also look at where the subdi_compare1 amd the
>> negdi2_compare patterns are used, and look if the caller is fine with
>> not having all CC bits available.
>>
>> And indeed usubv<mode>4 turns out to be questionabe, because it
>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>> CCmode.
>>
>> To correct this, the branch should use CC_Cmode which is always defined.
>>
>> So I tried to test this pattern, with the following test programs,
>> and found that the code actually improves when the branch uses CC_Cmode
>> instead of CCmode, both for SImode and DImode data, which was a bit
>> surprising.
>>
>> I used this test program to see how the usubv<mode>4 pattern works:
>>
>> cat test.c (DImode)
>> unsigned long long x, y, z;
>> int b;
>> void test()
>> {
>>     b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>>
>> unpatched code used 8 byte more stack than patched,
>> because the DImode subtraction is effectively done twice.
>>
>> cat test1.c (SImode)
>> unsigned long x, y, z;
>> int b;
>> void test()
>> {
>>     b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>> which generates (unpatched):
>>           cmp     r3, r0
>>           sub     ip, r3, r0
>>
>> instead of expected (patched):
>>     subs    r3, r3, r2
>>
>>
>> The condition is extracted by ifconversion and/or combine
>> and complicates the resulting code instead of simplifying.
>>
>> I think this happens only when the branch and the subsi/di3_compare1
>> is using the same CC mode.
>>
>> That does not happen when the CC modes disagree, as with the
>> proposed patch.  All other uses of the pattern are already using
>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>
>> Attached is an updated version of the patch, that happens to
>> improve the code generation of the usubsi4 and usubdi4 pattern,
>> as a side effect.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
> 
> I'm very sorry it has taken so long to review.
> I've been ramping up on the context recently now so I'll try to move 
> this along...
> 
> This patch looks mostly ok to me from reading the patterns and the 
> discussion around it.
> I have one concern:
> 
> 
>   (define_insn_and_split "negdi2_compare"
> -  [(set (reg:CC CC_REGNUM)
> -    (compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +    (compare:CC_NCV
>         (const_int 0)
>         (match_operand:DI 1 "register_operand" "0,r")))
>      (set (match_operand:DI 0 "register_operand" "=r,&r")
> @@ -4647,8 +4650,12 @@
>              (compare:CC (const_int 0) (match_dup 1)))
>             (set (match_dup 0) (minus:SI (const_int 0)
>                          (match_dup 1)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -           (compare:CC (const_int 0) (match_dup 3)))
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +           (compare:CC_NCV_CIC
> +             (const_int 0)
> +             (plus:DI
> +               (zero_extend:DI (match_dup 3))
> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>            (set (match_dup 2)
>             (minus:SI
>              (minus:SI (const_int 0) (match_dup 3))
> 
> 
> I was somewhat concerned with having the first operand of the COMPARE 
> being a const_int 0 and the second being
> a complex expression as the RTL canonicalization rules usually require 
> the complex operand going first if possible.
> Reading the RTL rules in rtl.texi I see it says this:
> "If one of the operands is a constant, it should be placed in the
> second operand and the comparison code adjusted as appropriate."
> So it seems that the pre-existing pattern that puts const_int 0 as the 
> first operand already breaks that rule.
> I think we should fix that and update the use of condition code to a GEU 
> rather than LTU as well.
> 


Hmmm...

I think the compare is not a commutative operation, and swapping
the arguments will imply a different value in the flags.

So if I write
(set (reg:CC_NCV CC_REGNUM)
      (compare:CC_NCV
        (const_int 0)
        (reg:DI 123)))

I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
N,V = usable for LT, GE

But if I write
(set (reg:CC_NCV CC_REGNUM)
      (compare:CC_NCV
        (reg:DI 123)
        (const_int 0)))

I have C,N,V set to the result of (r123 - 0), but the expansion stays
the same and the actual value in the flags is defined by the expansion.
Of course there exists probably no matching expansion for that.

Note that both LTU in the above hunk are in a parallel-stmt and operate
on the flags from the previous pattern, so changing these to GEU
will probably be wrong.

Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
(const_int 0) (match_dup 1)).

One use of the resulting flags (I know of) is in negvdi3 where we
have:

   emit_insn (gen_negdi2_compare (operands[0], operands[1]));
   arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);

I think only 0-x can overflow while x-0 can never overflow.

Of course the CC_NCV_CIC mode bends the definition of the RTL compare
a lot and I guess if this pattern is created by a splitter, this can
only be expanded by an exactly matching pattern, there is (hopefully)
no way how combine could mess with this pattern due to the exotic
CCmode.  So while I think it would work to swap only the notation of
all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
consuming statements, that would make it quite hard to follow for the
human reader at least.

What do you think?


Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
       [not found]           ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
@ 2017-09-06 12:44             ` Bernd Edlinger
  2017-09-06 12:52               ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 21+ messages in thread
From: Bernd Edlinger @ 2017-09-06 12:44 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

On 09/04/17 21:54, Bernd Edlinger wrote:
> Hi Kyrill,
> 
> Thanks for your review!
> 
> 
> On 09/04/17 15:55, Kyrill Tkachov wrote:
>> Hi Bernd,
>>
>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>> one.
>>>>>>>
>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>> before reload, a mis-compilation in libgcc function 
>>>>>>> __gnu_satfractdasq
>>>>>>> was discovered, see [1] for more details.
>>>>>>>
>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>> up into this:
>>>>>>>
>>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>>           (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>      (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>                      (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>                 (set (match_dup 2)
>>>>>>>                      (minus:SI (match_dup 5)
>>>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>>>> (const_int
>>>>>>> 0))))])]
>>>>>>>
>>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>>           (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>      (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>                 (set (reg:CC CC_REGNUM)
>>>>>>>                      (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>
>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>> redundant and thus got removed, because the data values are 
>>>>>>> identical.
>>>>>>>
>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>> flow is happening through the CC reg.
>>>>>>>
>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>> independently from the optimization issue PR77308.
>>>>>>>
>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>> know what the instructions are really doing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>> I agree you've found a valid problem here, but I have some issues 
>>>>>> with
>>>>>> the patch itself.
>>>>>>
>>>>>>
>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>    [(set (reg:CC_NCV CC_REGNUM)
>>>>>>      (compare:CC_NCV
>>>>>>        (match_operand:DI 1 "register_operand" "r")
>>>>>>        (match_operand:DI 2 "register_operand" "r")))
>>>>>>     (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>      (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>    "TARGET_32BIT"
>>>>>>    "#"
>>>>>>    "&& reload_completed"
>>>>>>    [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>             (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>            (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 
>>>>>> 2)))])
>>>>>>     (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>             (compare:CC_C
>>>>>>               (zero_extend:DI (match_dup 4))
>>>>>>               (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>            (set (match_dup 3)
>>>>>>             (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>
>>>>>> This pattern is now no-longer self consistent in that before the 
>>>>>> split
>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>> afterwards it is just CC_C.
>>>>>>
>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>> reflect the result of the 64-bit comparison), but that then 
>>>>>> implies that
>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and 
>>>>>> should in
>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to 
>>>>>> agree
>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>
>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>> fixing this (it's possible that we might need a change to 
>>>>>> select_cc_mode
>>>>>> as well).
>>>>>>
>>>>> Yes, this is still a bit awkward...
>>>>>
>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>> one.
>>>>>
>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>> We know the subsi3_carryin_compare also computes the NV bits, but 
>>>>> it is
>>>>> hard to write down the correct rtl expression for it.
>>>>>
>>>>> In theory the pattern should describe everything correctly,
>>>>> maybe, like:
>>>>>
>>>>> set (reg:CC_C CC_REGNUM)
>>>>>      (compare:CC_C
>>>>>        (zero_extend:DI (match_dup 4))
>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>      (compare:CC_NV
>>>>>       (match_dup 4))
>>>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>> (const_int 0)))
>>>>> set (match_dup 3)
>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>
>>>>>
>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>> in parallel?
>>>>>
>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>> bits it really defines?
>>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>> I think maybe the right solution is to invent a new CCmode
>>>> that defines C as if the comparison is done in DImode
>>>> but N and V as if the comparison is done in SImode.
>>>>
>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>> only N and Z are set correctly), but in a different patch of course.
>>>>
>>>> Attached is a new version that implements the new CCmode.
>>>>
>>>> How do you like this new version?
>>>>
>>>> It seems to be able to build a cross-compiler at least.
>>>>
>>>> I will start a new bootstrap with this new patch, but that can take 
>>>> some
>>>> time until I have definitive results.
>>>>
>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>> for the next stage1?
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>> I thought I should also look at where the subdi_compare1 amd the
>>> negdi2_compare patterns are used, and look if the caller is fine with
>>> not having all CC bits available.
>>>
>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>> CCmode.
>>>
>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>
>>> So I tried to test this pattern, with the following test programs,
>>> and found that the code actually improves when the branch uses CC_Cmode
>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>> surprising.
>>>
>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>
>>> cat test.c (DImode)
>>> unsigned long long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>     b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>>
>>> unpatched code used 8 byte more stack than patched,
>>> because the DImode subtraction is effectively done twice.
>>>
>>> cat test1.c (SImode)
>>> unsigned long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>     b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>> which generates (unpatched):
>>>           cmp     r3, r0
>>>           sub     ip, r3, r0
>>>
>>> instead of expected (patched):
>>>     subs    r3, r3, r2
>>>
>>>
>>> The condition is extracted by ifconversion and/or combine
>>> and complicates the resulting code instead of simplifying.
>>>
>>> I think this happens only when the branch and the subsi/di3_compare1
>>> is using the same CC mode.
>>>
>>> That does not happen when the CC modes disagree, as with the
>>> proposed patch.  All other uses of the pattern are already using
>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>
>>> Attached is an updated version of the patch, that happens to
>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>> as a side effect.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>
>> I'm very sorry it has taken so long to review.
>> I've been ramping up on the context recently now so I'll try to move 
>> this along...
>>
>> This patch looks mostly ok to me from reading the patterns and the 
>> discussion around it.
>> I have one concern:
>>
>>
>>   (define_insn_and_split "negdi2_compare"
>> -  [(set (reg:CC CC_REGNUM)
>> -    (compare:CC
>> +  [(set (reg:CC_NCV CC_REGNUM)
>> +    (compare:CC_NCV
>>         (const_int 0)
>>         (match_operand:DI 1 "register_operand" "0,r")))
>>      (set (match_operand:DI 0 "register_operand" "=r,&r")
>> @@ -4647,8 +4650,12 @@
>>              (compare:CC (const_int 0) (match_dup 1)))
>>             (set (match_dup 0) (minus:SI (const_int 0)
>>                          (match_dup 1)))])
>> -   (parallel [(set (reg:CC CC_REGNUM)
>> -           (compare:CC (const_int 0) (match_dup 3)))
>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>> +           (compare:CC_NCV_CIC
>> +             (const_int 0)
>> +             (plus:DI
>> +               (zero_extend:DI (match_dup 3))
>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>            (set (match_dup 2)
>>             (minus:SI
>>              (minus:SI (const_int 0) (match_dup 3))
>>
>>
>> I was somewhat concerned with having the first operand of the COMPARE 
>> being a const_int 0 and the second being
>> a complex expression as the RTL canonicalization rules usually require 
>> the complex operand going first if possible.
>> Reading the RTL rules in rtl.texi I see it says this:
>> "If one of the operands is a constant, it should be placed in the
>> second operand and the comparison code adjusted as appropriate."
>> So it seems that the pre-existing pattern that puts const_int 0 as the 
>> first operand already breaks that rule.
>> I think we should fix that and update the use of condition code to a 
>> GEU rather than LTU as well.
>>
> 

Well, the sentence before that one is even more explicit:

"Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
@code{compare} is valid only if the mode of @var{x} is in class
@code{MODE_INT} and @var{y} is a @code{const_int} or
@code{const_double} with mode @code{VOIDmode}."

So because the const_int 0 has VOIDmode the comparison is done
in y-mode not x-mode.

But unfortunately I see no way how to accomplish this,
because this assumes that the compare can be easily swapped
if the conditional instruction just uses one of GT/GE/LE/LT
or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.

And in this example we ask for "overflow", but while 0-X can
overflow X-0 simply can't.  And moreover there are non-symmetric
modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
the swapped conditions GT/LE/GTU/LEU.

I think the only solution would be to adjust the spec to
reflect the implementation:

Index: rtl.texi
===================================================================
--- rtl.texi	(revision 251752)
+++ rtl.texi	(working copy)
@@ -2252,6 +2252,13 @@
  If one of the operands is a constant, it should be placed in the
  second operand and the comparison code adjusted as appropriate.

+There may be exceptions from this rule if the mode @var{m} carries
+not enough information for the swapped comparison operator, or
+if we ask for overflow from the subtraction.  That means, while
+0-X may overfow X-0 can never overflow.  Under these conditions
+a compare may have the constant expression at the left side.
+Examples are the ARM negdi2_compare pattern and similar.
+
  A @code{compare} specifying two @code{VOIDmode} constants is not valid
  since there is no way to know in what mode the comparison is to be
  performed; the comparison must either be folded during the compilation



Please advise.

Thanks
Bernd.


> 
> Hmmm...
> 
> I think the compare is not a commutative operation, and swapping
> the arguments will imply a different value in the flags.
> 
> So if I write
> (set (reg:CC_NCV CC_REGNUM)
>       (compare:CC_NCV
>         (const_int 0)
>         (reg:DI 123)))
> 
> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
> N,V = usable for LT, GE
> 
> But if I write
> (set (reg:CC_NCV CC_REGNUM)
>       (compare:CC_NCV
>         (reg:DI 123)
>         (const_int 0)))
> 
> I have C,N,V set to the result of (r123 - 0), but the expansion stays
> the same and the actual value in the flags is defined by the expansion.
> Of course there exists probably no matching expansion for that.
> 
> Note that both LTU in the above hunk are in a parallel-stmt and operate
> on the flags from the previous pattern, so changing these to GEU
> will probably be wrong.
> 
> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
> (const_int 0) (match_dup 1)).
> 
> One use of the resulting flags (I know of) is in negvdi3 where we
> have:
> 
>    emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>    arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
> 
> I think only 0-x can overflow while x-0 can never overflow.
> 
> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
> a lot and I guess if this pattern is created by a splitter, this can
> only be expanded by an exactly matching pattern, there is (hopefully)
> no way how combine could mess with this pattern due to the exotic
> CCmode.  So while I think it would work to swap only the notation of
> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
> consuming statements, that would make it quite hard to follow for the
> human reader at least.
> 
> What do you think?
> 
> 
> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-06 12:44             ` Bernd Edlinger
@ 2017-09-06 12:52               ` Richard Earnshaw (lists)
  2017-09-06 13:00                 ` Bernd Edlinger
  2017-09-06 13:17                 ` Bernd Edlinger
  0 siblings, 2 replies; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2017-09-06 12:52 UTC (permalink / raw)
  To: Bernd Edlinger, Kyrill Tkachov, gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

On 06/09/17 13:44, Bernd Edlinger wrote:
> On 09/04/17 21:54, Bernd Edlinger wrote:
>> Hi Kyrill,
>>
>> Thanks for your review!
>>
>>
>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>> Hi Bernd,
>>>
>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>> one.
>>>>>>>>
>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>> before reload, a mis-compilation in libgcc function 
>>>>>>>> __gnu_satfractdasq
>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>
>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>> up into this:
>>>>>>>>
>>>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>>>           (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>      (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>                      (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>                 (set (match_dup 2)
>>>>>>>>                      (minus:SI (match_dup 5)
>>>>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>>>>> (const_int
>>>>>>>> 0))))])]
>>>>>>>>
>>>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>>>           (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>      (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>                 (set (reg:CC CC_REGNUM)
>>>>>>>>                      (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>
>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>> redundant and thus got removed, because the data values are 
>>>>>>>> identical.
>>>>>>>>
>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>> flow is happening through the CC reg.
>>>>>>>>
>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>
>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>> know what the instructions are really doing.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>> I agree you've found a valid problem here, but I have some issues 
>>>>>>> with
>>>>>>> the patch itself.
>>>>>>>
>>>>>>>
>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>    [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>      (compare:CC_NCV
>>>>>>>        (match_operand:DI 1 "register_operand" "r")
>>>>>>>        (match_operand:DI 2 "register_operand" "r")))
>>>>>>>     (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>      (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>    "TARGET_32BIT"
>>>>>>>    "#"
>>>>>>>    "&& reload_completed"
>>>>>>>    [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>             (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>            (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 
>>>>>>> 2)))])
>>>>>>>     (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>             (compare:CC_C
>>>>>>>               (zero_extend:DI (match_dup 4))
>>>>>>>               (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>            (set (match_dup 3)
>>>>>>>             (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>>
>>>>>>> This pattern is now no-longer self consistent in that before the 
>>>>>>> split
>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>> afterwards it is just CC_C.
>>>>>>>
>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>> reflect the result of the 64-bit comparison), but that then 
>>>>>>> implies that
>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and 
>>>>>>> should in
>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to 
>>>>>>> agree
>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>
>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>> fixing this (it's possible that we might need a change to 
>>>>>>> select_cc_mode
>>>>>>> as well).
>>>>>>>
>>>>>> Yes, this is still a bit awkward...
>>>>>>
>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>> one.
>>>>>>
>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but 
>>>>>> it is
>>>>>> hard to write down the correct rtl expression for it.
>>>>>>
>>>>>> In theory the pattern should describe everything correctly,
>>>>>> maybe, like:
>>>>>>
>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>      (compare:CC_C
>>>>>>        (zero_extend:DI (match_dup 4))
>>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>      (compare:CC_NV
>>>>>>       (match_dup 4))
>>>>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) 
>>>>>> (const_int 0)))
>>>>>> set (match_dup 3)
>>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>
>>>>>>
>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>> in parallel?
>>>>>>
>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>> bits it really defines?
>>>>>>
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>> I think maybe the right solution is to invent a new CCmode
>>>>> that defines C as if the comparison is done in DImode
>>>>> but N and V as if the comparison is done in SImode.
>>>>>
>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>
>>>>> Attached is a new version that implements the new CCmode.
>>>>>
>>>>> How do you like this new version?
>>>>>
>>>>> It seems to be able to build a cross-compiler at least.
>>>>>
>>>>> I will start a new bootstrap with this new patch, but that can take 
>>>>> some
>>>>> time until I have definitive results.
>>>>>
>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>> for the next stage1?
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>
>>>> I thought I should also look at where the subdi_compare1 amd the
>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>> not having all CC bits available.
>>>>
>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>> CCmode.
>>>>
>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>
>>>> So I tried to test this pattern, with the following test programs,
>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>> surprising.
>>>>
>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>
>>>> cat test.c (DImode)
>>>> unsigned long long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>>     b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>>
>>>> unpatched code used 8 byte more stack than patched,
>>>> because the DImode subtraction is effectively done twice.
>>>>
>>>> cat test1.c (SImode)
>>>> unsigned long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>>     b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>> which generates (unpatched):
>>>>           cmp     r3, r0
>>>>           sub     ip, r3, r0
>>>>
>>>> instead of expected (patched):
>>>>     subs    r3, r3, r2
>>>>
>>>>
>>>> The condition is extracted by ifconversion and/or combine
>>>> and complicates the resulting code instead of simplifying.
>>>>
>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>> is using the same CC mode.
>>>>
>>>> That does not happen when the CC modes disagree, as with the
>>>> proposed patch.  All other uses of the pattern are already using
>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>
>>>> Attached is an updated version of the patch, that happens to
>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>> as a side effect.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>> Is it OK for trunk?
>>>
>>> I'm very sorry it has taken so long to review.
>>> I've been ramping up on the context recently now so I'll try to move 
>>> this along...
>>>
>>> This patch looks mostly ok to me from reading the patterns and the 
>>> discussion around it.
>>> I have one concern:
>>>
>>>
>>>   (define_insn_and_split "negdi2_compare"
>>> -  [(set (reg:CC CC_REGNUM)
>>> -    (compare:CC
>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>> +    (compare:CC_NCV
>>>         (const_int 0)
>>>         (match_operand:DI 1 "register_operand" "0,r")))
>>>      (set (match_operand:DI 0 "register_operand" "=r,&r")
>>> @@ -4647,8 +4650,12 @@
>>>              (compare:CC (const_int 0) (match_dup 1)))
>>>             (set (match_dup 0) (minus:SI (const_int 0)
>>>                          (match_dup 1)))])
>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>> +           (compare:CC_NCV_CIC
>>> +             (const_int 0)
>>> +             (plus:DI
>>> +               (zero_extend:DI (match_dup 3))
>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>            (set (match_dup 2)
>>>             (minus:SI
>>>              (minus:SI (const_int 0) (match_dup 3))
>>>
>>>
>>> I was somewhat concerned with having the first operand of the COMPARE 
>>> being a const_int 0 and the second being
>>> a complex expression as the RTL canonicalization rules usually require 
>>> the complex operand going first if possible.
>>> Reading the RTL rules in rtl.texi I see it says this:
>>> "If one of the operands is a constant, it should be placed in the
>>> second operand and the comparison code adjusted as appropriate."
>>> So it seems that the pre-existing pattern that puts const_int 0 as the 
>>> first operand already breaks that rule.
>>> I think we should fix that and update the use of condition code to a 
>>> GEU rather than LTU as well.
>>>
>>
> 
> Well, the sentence before that one is even more explicit:
> 
> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
> @code{compare} is valid only if the mode of @var{x} is in class
> @code{MODE_INT} and @var{y} is a @code{const_int} or
> @code{const_double} with mode @code{VOIDmode}."
> 
> So because the const_int 0 has VOIDmode the comparison is done
> in y-mode not x-mode.
> 
> But unfortunately I see no way how to accomplish this,
> because this assumes that the compare can be easily swapped
> if the conditional instruction just uses one of GT/GE/LE/LT
> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
> 
> And in this example we ask for "overflow", but while 0-X can
> overflow X-0 simply can't.  And moreover there are non-symmetric
> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
> the swapped conditions GT/LE/GTU/LEU.
> 
> I think the only solution would be to adjust the spec to
> reflect the implementation:
> 
> Index: rtl.texi
> ===================================================================
> --- rtl.texi	(revision 251752)
> +++ rtl.texi	(working copy)
> @@ -2252,6 +2252,13 @@
>   If one of the operands is a constant, it should be placed in the
>   second operand and the comparison code adjusted as appropriate.
> 
> +There may be exceptions from this rule if the mode @var{m} carries
> +not enough information for the swapped comparison operator, or

There may be exceptions _to_ ... if mode @var{m} does not carry enough...

> +if we ask for overflow from the subtraction.  

Aren't we really trying to 'detect overflow' rather than 'ask' for it?

> That means, while
> +0-X may overfow X-0 can never overflow.  Under these conditions
> +a compare may have the constant expression at the left side.

In these circumstances the constant will be in the first operand .

(left and right don't really make sense for RTL).
> +Examples are the ARM negdi2_compare pattern and similar.
> +
>   A @code{compare} specifying two @code{VOIDmode} constants is not valid
>   since there is no way to know in what mode the comparison is to be
>   performed; the comparison must either be folded during the compilation
> 
> 
> 
> Please advise.
> 
> Thanks
> Bernd.
> 
> 
>>
>> Hmmm...
>>
>> I think the compare is not a commutative operation, and swapping
>> the arguments will imply a different value in the flags.
>>
>> So if I write
>> (set (reg:CC_NCV CC_REGNUM)
>>       (compare:CC_NCV
>>         (const_int 0)
>>         (reg:DI 123)))
>>
>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>> N,V = usable for LT, GE
>>
>> But if I write
>> (set (reg:CC_NCV CC_REGNUM)
>>       (compare:CC_NCV
>>         (reg:DI 123)
>>         (const_int 0)))
>>
>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>> the same and the actual value in the flags is defined by the expansion.
>> Of course there exists probably no matching expansion for that.
>>
>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>> on the flags from the previous pattern, so changing these to GEU
>> will probably be wrong.
>>
>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>> (const_int 0) (match_dup 1)).
>>
>> One use of the resulting flags (I know of) is in negvdi3 where we
>> have:
>>
>>    emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>    arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>
>> I think only 0-x can overflow while x-0 can never overflow.
>>
>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>> a lot and I guess if this pattern is created by a splitter, this can
>> only be expanded by an exactly matching pattern, there is (hopefully)
>> no way how combine could mess with this pattern due to the exotic
>> CCmode.  So while I think it would work to swap only the notation of
>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>> consuming statements, that would make it quite hard to follow for the
>> human reader at least.
>>
>> What do you think?
>>
>>
>> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-06 12:52               ` Richard Earnshaw (lists)
@ 2017-09-06 13:00                 ` Bernd Edlinger
  2017-09-06 13:17                 ` Bernd Edlinger
  1 sibling, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-09-06 13:00 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Kyrill Tkachov, gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
> On 06/09/17 13:44, Bernd Edlinger wrote:
>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>> Hi Kyrill,
>>>
>>> Thanks for your review!
>>>
>>>
>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>> Hi Bernd,
>>>>
>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>>> one.
>>>>>>>>>
>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>> __gnu_satfractdasq
>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>
>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>> up into this:
>>>>>>>>>
>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>       (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                       (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>                  (set (match_dup 2)
>>>>>>>>>                       (minus:SI (match_dup 5)
>>>>>>>>>                                (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>> (const_int
>>>>>>>>> 0))))])]
>>>>>>>>>
>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>       (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>                  (set (reg:CC CC_REGNUM)
>>>>>>>>>                       (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>
>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>> identical.
>>>>>>>>>
>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>
>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>
>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>> with
>>>>>>>> the patch itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>     [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>       (compare:CC_NCV
>>>>>>>>         (match_operand:DI 1 "register_operand" "r")
>>>>>>>>         (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>      (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>       (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>     "TARGET_32BIT"
>>>>>>>>     "#"
>>>>>>>>     "&& reload_completed"
>>>>>>>>     [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>              (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>             (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>> 2)))])
>>>>>>>>      (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>              (compare:CC_C
>>>>>>>>                (zero_extend:DI (match_dup 4))
>>>>>>>>                (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>             (set (match_dup 3)
>>>>>>>>              (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                    (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>
>>>>>>>>
>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>> split
>>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>>> afterwards it is just CC_C.
>>>>>>>>
>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>> implies that
>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>> should in
>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>> agree
>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>
>>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>> select_cc_mode
>>>>>>>> as well).
>>>>>>>>
>>>>>>> Yes, this is still a bit awkward...
>>>>>>>
>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>>> one.
>>>>>>>
>>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>> it is
>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>
>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>> maybe, like:
>>>>>>>
>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>       (compare:CC_C
>>>>>>>         (zero_extend:DI (match_dup 4))
>>>>>>>         (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>       (compare:CC_NV
>>>>>>>        (match_dup 4))
>>>>>>>        (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>> (const_int 0)))
>>>>>>> set (match_dup 3)
>>>>>>>       (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>
>>>>>>>
>>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>>> in parallel?
>>>>>>>
>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>> bits it really defines?
>>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>> that defines C as if the comparison is done in DImode
>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>
>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>
>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>
>>>>>> How do you like this new version?
>>>>>>
>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>
>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>> some
>>>>>> time until I have definitive results.
>>>>>>
>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>> for the next stage1?
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>
>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>> not having all CC bits available.
>>>>>
>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>> CCmode.
>>>>>
>>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>>
>>>>> So I tried to test this pattern, with the following test programs,
>>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>> surprising.
>>>>>
>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>
>>>>> cat test.c (DImode)
>>>>> unsigned long long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>>
>>>>> unpatched code used 8 byte more stack than patched,
>>>>> because the DImode subtraction is effectively done twice.
>>>>>
>>>>> cat test1.c (SImode)
>>>>> unsigned long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>> which generates (unpatched):
>>>>>            cmp     r3, r0
>>>>>            sub     ip, r3, r0
>>>>>
>>>>> instead of expected (patched):
>>>>>      subs    r3, r3, r2
>>>>>
>>>>>
>>>>> The condition is extracted by ifconversion and/or combine
>>>>> and complicates the resulting code instead of simplifying.
>>>>>
>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>> is using the same CC mode.
>>>>>
>>>>> That does not happen when the CC modes disagree, as with the
>>>>> proposed patch.  All other uses of the pattern are already using
>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>
>>>>> Attached is an updated version of the patch, that happens to
>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>> as a side effect.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>
>>>> I'm very sorry it has taken so long to review.
>>>> I've been ramping up on the context recently now so I'll try to move
>>>> this along...
>>>>
>>>> This patch looks mostly ok to me from reading the patterns and the
>>>> discussion around it.
>>>> I have one concern:
>>>>
>>>>
>>>>    (define_insn_and_split "negdi2_compare"
>>>> -  [(set (reg:CC CC_REGNUM)
>>>> -    (compare:CC
>>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>>> +    (compare:CC_NCV
>>>>          (const_int 0)
>>>>          (match_operand:DI 1 "register_operand" "0,r")))
>>>>       (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>> @@ -4647,8 +4650,12 @@
>>>>               (compare:CC (const_int 0) (match_dup 1)))
>>>>              (set (match_dup 0) (minus:SI (const_int 0)
>>>>                           (match_dup 1)))])
>>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>> +           (compare:CC_NCV_CIC
>>>> +             (const_int 0)
>>>> +             (plus:DI
>>>> +               (zero_extend:DI (match_dup 3))
>>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>             (set (match_dup 2)
>>>>              (minus:SI
>>>>               (minus:SI (const_int 0) (match_dup 3))
>>>>
>>>>
>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>> being a const_int 0 and the second being
>>>> a complex expression as the RTL canonicalization rules usually require
>>>> the complex operand going first if possible.
>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>> "If one of the operands is a constant, it should be placed in the
>>>> second operand and the comparison code adjusted as appropriate."
>>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>>> first operand already breaks that rule.
>>>> I think we should fix that and update the use of condition code to a
>>>> GEU rather than LTU as well.
>>>>
>>>
>>
>> Well, the sentence before that one is even more explicit:
>>
>> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
>> @code{compare} is valid only if the mode of @var{x} is in class
>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>> @code{const_double} with mode @code{VOIDmode}."
>>
>> So because the const_int 0 has VOIDmode the comparison is done
>> in y-mode not x-mode.
>>
>> But unfortunately I see no way how to accomplish this,
>> because this assumes that the compare can be easily swapped
>> if the conditional instruction just uses one of GT/GE/LE/LT
>> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>>
>> And in this example we ask for "overflow", but while 0-X can
>> overflow X-0 simply can't.  And moreover there are non-symmetric
>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>> the swapped conditions GT/LE/GTU/LEU.
>>
>> I think the only solution would be to adjust the spec to
>> reflect the implementation:
>>
>> Index: rtl.texi
>> ===================================================================
>> --- rtl.texi	(revision 251752)
>> +++ rtl.texi	(working copy)
>> @@ -2252,6 +2252,13 @@
>>    If one of the operands is a constant, it should be placed in the
>>    second operand and the comparison code adjusted as appropriate.
>>
>> +There may be exceptions from this rule if the mode @var{m} carries
>> +not enough information for the swapped comparison operator, or
> 
> There may be exceptions _to_ ... if mode @var{m} does not carry enough...
> 
>> +if we ask for overflow from the subtraction.
> 
> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
> 

Yes :-), of course: it is used this way in a pattern that does
a negation and a conditional branch if a overflow is detected:

    emit_insn (gen_negdi2_compare (operands[0], operands[1]));
    arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);



>> That means, while
>> +0-X may overfow X-0 can never overflow.  Under these conditions
>> +a compare may have the constant expression at the left side.
> 
> In these circumstances the constant will be in the first operand .
> 
> (left and right don't really make sense for RTL).

Yes, thanks.  Corrected paragraph below:

Index: rtl.texi
===================================================================
--- rtl.texi	(Revision 251752)
+++ rtl.texi	(Arbeitskopie)
@@ -2252,6 +2252,13 @@
  If one of the operands is a constant, it should be placed in the
  second operand and the comparison code adjusted as appropriate.

+There may be exceptions from this rule if the mode @var{m} carries
+not enough information for the swapped comparison operator, or
+if we try to detect overflow from the subtraction.  That means, while
+0-X may overfow X-0 can never overflow.  Under these conditions
+a compare may have the constant expression at the first operand.
+Examples are the ARM negdi2_compare pattern and similar.
+
  A @code{compare} specifying two @code{VOIDmode} constants is not valid
  since there is no way to know in what mode the comparison is to be
  performed; the comparison must either be folded during the compilation


>> +Examples are the ARM negdi2_compare pattern and similar.
>> +
>>    A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>    since there is no way to know in what mode the comparison is to be
>>    performed; the comparison must either be folded during the compilation
>>
>>
>>
>> Please advise.
>>
>> Thanks
>> Bernd.
>>
>>
>>>
>>> Hmmm...
>>>
>>> I think the compare is not a commutative operation, and swapping
>>> the arguments will imply a different value in the flags.
>>>
>>> So if I write
>>> (set (reg:CC_NCV CC_REGNUM)
>>>        (compare:CC_NCV
>>>          (const_int 0)
>>>          (reg:DI 123)))
>>>
>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>>> N,V = usable for LT, GE
>>>
>>> But if I write
>>> (set (reg:CC_NCV CC_REGNUM)
>>>        (compare:CC_NCV
>>>          (reg:DI 123)
>>>          (const_int 0)))
>>>
>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>> the same and the actual value in the flags is defined by the expansion.
>>> Of course there exists probably no matching expansion for that.
>>>
>>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>>> on the flags from the previous pattern, so changing these to GEU
>>> will probably be wrong.
>>>
>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>> (const_int 0) (match_dup 1)).
>>>
>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>> have:
>>>
>>>     emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>     arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>
>>> I think only 0-x can overflow while x-0 can never overflow.
>>>
>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>> a lot and I guess if this pattern is created by a splitter, this can
>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>> no way how combine could mess with this pattern due to the exotic
>>> CCmode.  So while I think it would work to swap only the notation of
>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>>> consuming statements, that would make it quite hard to follow for the
>>> human reader at least.
>>>
>>> What do you think?
>>>
>>>
>>> Bernd.
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-06 12:52               ` Richard Earnshaw (lists)
  2017-09-06 13:00                 ` Bernd Edlinger
@ 2017-09-06 13:17                 ` Bernd Edlinger
  2017-09-06 15:31                   ` Kyrill Tkachov
  2017-10-09 13:02                   ` Richard Earnshaw (lists)
  1 sibling, 2 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-09-06 13:17 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Kyrill Tkachov, gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

[-- Attachment #1: Type: text/plain, Size: 16538 bytes --]

On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
> On 06/09/17 13:44, Bernd Edlinger wrote:
>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>> Hi Kyrill,
>>>
>>> Thanks for your review!
>>>
>>>
>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>> Hi Bernd,
>>>>
>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>>> one.
>>>>>>>>>
>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>> __gnu_satfractdasq
>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>
>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>> up into this:
>>>>>>>>>
>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>       (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                       (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>                  (set (match_dup 2)
>>>>>>>>>                       (minus:SI (match_dup 5)
>>>>>>>>>                                (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>> (const_int
>>>>>>>>> 0))))])]
>>>>>>>>>
>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>       (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>                  (set (reg:CC CC_REGNUM)
>>>>>>>>>                       (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>
>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>> identical.
>>>>>>>>>
>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>
>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>
>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>> with
>>>>>>>> the patch itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>     [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>       (compare:CC_NCV
>>>>>>>>         (match_operand:DI 1 "register_operand" "r")
>>>>>>>>         (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>      (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>       (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>     "TARGET_32BIT"
>>>>>>>>     "#"
>>>>>>>>     "&& reload_completed"
>>>>>>>>     [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>              (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>             (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>> 2)))])
>>>>>>>>      (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>              (compare:CC_C
>>>>>>>>                (zero_extend:DI (match_dup 4))
>>>>>>>>                (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>             (set (match_dup 3)
>>>>>>>>              (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                    (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>
>>>>>>>>
>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>> split
>>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>>> afterwards it is just CC_C.
>>>>>>>>
>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>> implies that
>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>> should in
>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>> agree
>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>
>>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>> select_cc_mode
>>>>>>>> as well).
>>>>>>>>
>>>>>>> Yes, this is still a bit awkward...
>>>>>>>
>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>>> one.
>>>>>>>
>>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>> it is
>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>
>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>> maybe, like:
>>>>>>>
>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>       (compare:CC_C
>>>>>>>         (zero_extend:DI (match_dup 4))
>>>>>>>         (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>       (compare:CC_NV
>>>>>>>        (match_dup 4))
>>>>>>>        (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>> (const_int 0)))
>>>>>>> set (match_dup 3)
>>>>>>>       (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>
>>>>>>>
>>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>>> in parallel?
>>>>>>>
>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>> bits it really defines?
>>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>> that defines C as if the comparison is done in DImode
>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>
>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>
>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>
>>>>>> How do you like this new version?
>>>>>>
>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>
>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>> some
>>>>>> time until I have definitive results.
>>>>>>
>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>> for the next stage1?
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>
>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>> not having all CC bits available.
>>>>>
>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>> CCmode.
>>>>>
>>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>>
>>>>> So I tried to test this pattern, with the following test programs,
>>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>> surprising.
>>>>>
>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>
>>>>> cat test.c (DImode)
>>>>> unsigned long long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>>
>>>>> unpatched code used 8 byte more stack than patched,
>>>>> because the DImode subtraction is effectively done twice.
>>>>>
>>>>> cat test1.c (SImode)
>>>>> unsigned long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>> which generates (unpatched):
>>>>>            cmp     r3, r0
>>>>>            sub     ip, r3, r0
>>>>>
>>>>> instead of expected (patched):
>>>>>      subs    r3, r3, r2
>>>>>
>>>>>
>>>>> The condition is extracted by ifconversion and/or combine
>>>>> and complicates the resulting code instead of simplifying.
>>>>>
>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>> is using the same CC mode.
>>>>>
>>>>> That does not happen when the CC modes disagree, as with the
>>>>> proposed patch.  All other uses of the pattern are already using
>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>
>>>>> Attached is an updated version of the patch, that happens to
>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>> as a side effect.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>
>>>> I'm very sorry it has taken so long to review.
>>>> I've been ramping up on the context recently now so I'll try to move
>>>> this along...
>>>>
>>>> This patch looks mostly ok to me from reading the patterns and the
>>>> discussion around it.
>>>> I have one concern:
>>>>
>>>>
>>>>    (define_insn_and_split "negdi2_compare"
>>>> -  [(set (reg:CC CC_REGNUM)
>>>> -    (compare:CC
>>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>>> +    (compare:CC_NCV
>>>>          (const_int 0)
>>>>          (match_operand:DI 1 "register_operand" "0,r")))
>>>>       (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>> @@ -4647,8 +4650,12 @@
>>>>               (compare:CC (const_int 0) (match_dup 1)))
>>>>              (set (match_dup 0) (minus:SI (const_int 0)
>>>>                           (match_dup 1)))])
>>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>> +           (compare:CC_NCV_CIC
>>>> +             (const_int 0)
>>>> +             (plus:DI
>>>> +               (zero_extend:DI (match_dup 3))
>>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>             (set (match_dup 2)
>>>>              (minus:SI
>>>>               (minus:SI (const_int 0) (match_dup 3))
>>>>
>>>>
>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>> being a const_int 0 and the second being
>>>> a complex expression as the RTL canonicalization rules usually require
>>>> the complex operand going first if possible.
>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>> "If one of the operands is a constant, it should be placed in the
>>>> second operand and the comparison code adjusted as appropriate."
>>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>>> first operand already breaks that rule.
>>>> I think we should fix that and update the use of condition code to a
>>>> GEU rather than LTU as well.
>>>>
>>>
>>
>> Well, the sentence before that one is even more explicit:
>>
>> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
>> @code{compare} is valid only if the mode of @var{x} is in class
>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>> @code{const_double} with mode @code{VOIDmode}."
>>
>> So because the const_int 0 has VOIDmode the comparison is done
>> in y-mode not x-mode.
>>
>> But unfortunately I see no way how to accomplish this,
>> because this assumes that the compare can be easily swapped
>> if the conditional instruction just uses one of GT/GE/LE/LT
>> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>>
>> And in this example we ask for "overflow", but while 0-X can
>> overflow X-0 simply can't.  And moreover there are non-symmetric
>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>> the swapped conditions GT/LE/GTU/LEU.
>>
>> I think the only solution would be to adjust the spec to
>> reflect the implementation:
>>
>> Index: rtl.texi
>> ===================================================================
>> --- rtl.texi	(revision 251752)
>> +++ rtl.texi	(working copy)
>> @@ -2252,6 +2252,13 @@
>>    If one of the operands is a constant, it should be placed in the
>>    second operand and the comparison code adjusted as appropriate.
>>
>> +There may be exceptions from this rule if the mode @var{m} carries
>> +not enough information for the swapped comparison operator, or
> 
> There may be exceptions _to_ ... if mode @var{m} does not carry enough...
> 
>> +if we ask for overflow from the subtraction.
> 
> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
> 
>> That means, while
>> +0-X may overfow X-0 can never overflow.  Under these conditions
>> +a compare may have the constant expression at the left side.
> 
> In these circumstances the constant will be in the first operand .
> 
> (left and right don't really make sense for RTL).
>> +Examples are the ARM negdi2_compare pattern and similar.
>> +
>>    A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>    since there is no way to know in what mode the comparison is to be
>>    performed; the comparison must either be folded during the compilation
>>
>>
>>
>> Please advise.
>>
>> Thanks
>> Bernd.
>>
>>
>>>
>>> Hmmm...
>>>
>>> I think the compare is not a commutative operation, and swapping
>>> the arguments will imply a different value in the flags.
>>>
>>> So if I write
>>> (set (reg:CC_NCV CC_REGNUM)
>>>        (compare:CC_NCV
>>>          (const_int 0)
>>>          (reg:DI 123)))
>>>
>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>>> N,V = usable for LT, GE
>>>
>>> But if I write
>>> (set (reg:CC_NCV CC_REGNUM)
>>>        (compare:CC_NCV
>>>          (reg:DI 123)
>>>          (const_int 0)))
>>>
>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>> the same and the actual value in the flags is defined by the expansion.
>>> Of course there exists probably no matching expansion for that.
>>>
>>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>>> on the flags from the previous pattern, so changing these to GEU
>>> will probably be wrong.
>>>
>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>> (const_int 0) (match_dup 1)).
>>>
>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>> have:
>>>
>>>     emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>     arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>
>>> I think only 0-x can overflow while x-0 can never overflow.
>>>
>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>> a lot and I guess if this pattern is created by a splitter, this can
>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>> no way how combine could mess with this pattern due to the exotic
>>> CCmode.  So while I think it would work to swap only the notation of
>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>>> consuming statements, that would make it quite hard to follow for the
>>> human reader at least.
>>>
>>> What do you think?
>>>
>>>
>>> Bernd.
> 

Attached is the patch with an update to the rtl.texi documentation.
The code does not change, so I did no new bootstrap.


Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-pr77308-5.diff --]
[-- Type: text/x-patch; name="patch-pr77308-5.diff", Size: 23015 bytes --]

2017-09-06  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* doc/rtl.texi: Update documentation.
	* config/arm/arm-modes.def (CC_NCV_CIC): New mode.
	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
	subsi3_carryin_compare, subsi3_carryin_compare_const,
	negdi2_compare, *negsi2_carryin_compare,
	*arm_cmpdi_insn): Fix the CC reg dataflow.
	(usubv<mode>4): Use CC_Cmode for the branch.

Index: gcc/config/arm/arm-modes.def
===================================================================
--- gcc/config/arm/arm-modes.def	(revision 244439)
+++ gcc/config/arm/arm-modes.def	(working copy)
@@ -38,6 +38,8 @@
    (used for DImode unsigned comparisons).
    CC_NCVmode should be used if only the N, C, and V flags are correct
    (used for DImode signed comparisons).
+   CC_NCV_CICmode defines N and V in SImode and C in DImode
+   (used for carryin_compare patterns).
    CCmode should be used otherwise.  */
 
 CC_MODE (CC_NOOV);
@@ -44,6 +46,7 @@
 CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
+CC_MODE (CC_NCV_CIC);
 CC_MODE (CC_SWP);
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 244439)
+++ gcc/config/arm/arm.md	(working copy)
@@ -669,17 +669,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_V CC_REGNUM)
 		   (ne:CC_V
-		    (plus:DI (plus:DI
-			      (sign_extend:DI (match_dup 4))
-			      (sign_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (sign_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI (plus:SI
-					  (match_dup 4) (match_dup 5))
-					 (ltu:SI (reg:CC_C CC_REGNUM)
-						 (const_int 0))))])]
+		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
+				       (sign_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (sign_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -713,13 +711,13 @@
   [(set (reg:CC_V CC_REGNUM)
 	(ne:CC_V
 	  (plus:DI
-	   (plus:DI
-	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (sign_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+				   (ltu:SI (reg:CC_C CC_REGNUM)
+					   (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
 	(plus:SI
 	 (plus:SI (match_dup 1) (match_dup 2))
@@ -748,17 +746,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_C CC_REGNUM)
 		   (ne:CC_C
-		    (plus:DI (plus:DI
-			      (zero_extend:DI (match_dup 4))
-			      (zero_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (zero_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI
-				 (plus:SI (match_dup 4) (match_dup 5))
-				 (ltu:SI (reg:CC_C CC_REGNUM)
-					 (const_int 0))))])]
+		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
+				       (zero_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (zero_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -777,17 +773,16 @@
   [(set (reg:CC_C CC_REGNUM)
 	(ne:CC_C
 	  (plus:DI
-	   (plus:DI
-	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (zero_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (zero_extend:DI
+	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
-	(plus:SI
-	 (plus:SI (match_dup 1) (match_dup 2))
-	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "adcs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1080,14 +1075,14 @@
   "TARGET_32BIT"
 {
   emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  arm_gen_unlikely_cbranch (EQ, CC_Cmode, operands[3]);
 
   DONE;
 })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1098,10 +1093,14 @@
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (match_dup 1) (match_dup 2)))
 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 4) (match_dup 5)))
-	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 4))
+		     (plus:DI (zero_extend:DI (match_dup 5))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 3)
+		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1157,13 +1156,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1171,13 +1172,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
   [(set_attr "conds" "set")
@@ -4634,8 +4637,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (const_int 0)
 	  (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4647,8 +4650,12 @@
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0)
 					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
 	     (set (match_dup 2)
 		  (minus:SI
 		   (minus:SI (const_int 0) (match_dup 3))
@@ -4707,12 +4714,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (const_int 0)
-		    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (const_int 0)
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-	(minus:SI (minus:SI (const_int 0)
-			    (match_dup 1))
+	(minus:SI (minus:SI (const_int 0) (match_dup 1))
 		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7361,12 +7370,15 @@
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+	(compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 3))
+		     (plus:DI (zero_extend:DI (match_dup 4))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 2)
+		   (minus:SI (match_dup 5)
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
Index: gcc/config/arm/arm-modes.def
===================================================================
--- gcc/config/arm/arm-modes.def	(revision 251752)
+++ gcc/config/arm/arm-modes.def	(working copy)
@@ -38,6 +38,8 @@
    (used for DImode unsigned comparisons).
    CC_NCVmode should be used if only the N, C, and V flags are correct
    (used for DImode signed comparisons).
+   CC_NCV_CICmode defines N and V in SImode and C in DImode
+   (used for carryin_compare patterns).
    CCmode should be used otherwise.  */
 
 CC_MODE (CC_NOOV);
@@ -44,6 +46,7 @@
 CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
+CC_MODE (CC_NCV_CIC);
 CC_MODE (CC_SWP);
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 251752)
+++ gcc/config/arm/arm.md	(working copy)
@@ -664,17 +664,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_V CC_REGNUM)
 		   (ne:CC_V
-		    (plus:DI (plus:DI
-			      (sign_extend:DI (match_dup 4))
-			      (sign_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (sign_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI (plus:SI
-					  (match_dup 4) (match_dup 5))
-					 (ltu:SI (reg:CC_C CC_REGNUM)
-						 (const_int 0))))])]
+		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
+				       (sign_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (sign_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -708,13 +706,13 @@
   [(set (reg:CC_V CC_REGNUM)
 	(ne:CC_V
 	  (plus:DI
-	   (plus:DI
-	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (sign_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+				   (ltu:SI (reg:CC_C CC_REGNUM)
+					   (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
 	(plus:SI
 	 (plus:SI (match_dup 1) (match_dup 2))
@@ -743,17 +741,15 @@
 	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
    (parallel [(set (reg:CC_C CC_REGNUM)
 		   (ne:CC_C
-		    (plus:DI (plus:DI
-			      (zero_extend:DI (match_dup 4))
-			      (zero_extend:DI (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-		    (plus:DI (zero_extend:DI
-			      (plus:SI (match_dup 4) (match_dup 5)))
-			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
-	     (set (match_dup 3) (plus:SI
-				 (plus:SI (match_dup 4) (match_dup 5))
-				 (ltu:SI (reg:CC_C CC_REGNUM)
-					 (const_int 0))))])]
+		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
+				       (zero_extend:DI (match_dup 5)))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+		    (zero_extend:DI
+		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
+	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
+					  (ltu:SI (reg:CC_C CC_REGNUM)
+						  (const_int 0))))])]
   "
   {
     operands[3] = gen_highpart (SImode, operands[0]);
@@ -772,17 +768,16 @@
   [(set (reg:CC_C CC_REGNUM)
 	(ne:CC_C
 	  (plus:DI
-	   (plus:DI
-	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
-	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
-	  (plus:DI (zero_extend:DI
-		    (plus:SI (match_dup 1) (match_dup 2)))
-		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	    (plus:DI
+	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
+	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
+	  (zero_extend:DI
+	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
    (set (match_operand:SI 0 "register_operand" "=r")
-	(plus:SI
-	 (plus:SI (match_dup 1) (match_dup 2))
-	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "adcs%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1075,14 +1070,14 @@
   "TARGET_32BIT"
 {
   emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  arm_gen_unlikely_cbranch (EQ, CC_Cmode, operands[3]);
 
   DONE;
 })
 
 (define_insn_and_split "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operand:DI 2 "register_operand" "r")))
    (set (match_operand:DI 0 "register_operand" "=&r")
@@ -1093,10 +1088,14 @@
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (match_dup 1) (match_dup 2)))
 	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (match_dup 4) (match_dup 5)))
-	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
-			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 4))
+		     (plus:DI (zero_extend:DI (match_dup 5))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 3)
+		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
@@ -1152,13 +1151,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
-                    (match_operand:SI 2 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (minus:SI (match_dup 1)
-                            (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, %2"
   [(set_attr "conds" "set")
@@ -1166,13 +1167,15 @@
 )
 
 (define_insn "*subsi3_carryin_compare_const"
-  [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
-                    (match_operand:SI 2 "arm_not_operand" "K")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-        (minus:SI (plus:SI (match_dup 1)
-                           (match_dup 2))
-                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
+	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
+		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_32BIT"
   "sbcs\\t%0, %1, #%B2"
   [(set_attr "conds" "set")
@@ -4686,8 +4689,8 @@
 
 
 (define_insn_and_split "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NCV CC_REGNUM)
+	(compare:CC_NCV
 	  (const_int 0)
 	  (match_operand:DI 1 "register_operand" "0,r")))
    (set (match_operand:DI 0 "register_operand" "=r,&r")
@@ -4699,8 +4702,12 @@
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0)
 					   (match_dup 1)))])
-   (parallel [(set (reg:CC CC_REGNUM)
-		   (compare:CC (const_int 0) (match_dup 3)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (const_int 0)
+		     (plus:DI
+		       (zero_extend:DI (match_dup 3))
+		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
 	     (set (match_dup 2)
 		  (minus:SI
 		   (minus:SI (const_int 0) (match_dup 3))
@@ -4759,12 +4766,14 @@
 )
 
 (define_insn "*negsi2_carryin_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC (const_int 0)
-		    (match_operand:SI 1 "s_register_operand" "r")))
+  [(set (reg:CC_NCV_CIC CC_REGNUM)
+	(compare:CC_NCV_CIC
+	  (const_int 0)
+	  (plus:DI
+	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
+	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
    (set (match_operand:SI 0 "s_register_operand" "=r")
-	(minus:SI (minus:SI (const_int 0)
-			    (match_dup 1))
+	(minus:SI (minus:SI (const_int 0) (match_dup 1))
 		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
   "TARGET_ARM"
   "rscs\\t%0, %1, #0"
@@ -7438,12 +7447,15 @@
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
-        (compare:CC (match_dup 0) (match_dup 1)))
-   (parallel [(set (reg:CC CC_REGNUM)
-                   (compare:CC (match_dup 3) (match_dup 4)))
-              (set (match_dup 2)
-                   (minus:SI (match_dup 5)
-                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
+	(compare:CC (match_dup 0) (match_dup 1)))
+   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
+		   (compare:CC_NCV_CIC
+		     (zero_extend:DI (match_dup 3))
+		     (plus:DI (zero_extend:DI (match_dup 4))
+			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
+	      (set (match_dup 2)
+		   (minus:SI (match_dup 5)
+			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
   {
     operands[3] = gen_highpart (SImode, operands[0]);
     operands[0] = gen_lowpart (SImode, operands[0]);
Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	(revision 251752)
+++ gcc/doc/rtl.texi	(working copy)
@@ -2252,6 +2252,13 @@
 If one of the operands is a constant, it should be placed in the
 second operand and the comparison code adjusted as appropriate.
 
+There may be exceptions to this rule if the mode @var{m} does not
+carry enough information for the swapped comparison operator, or
+if we try to detect overflow from the subtraction.  That means, while
+0-X may overfow X-0 can never overflow.  Under these conditions
+a compare may have the constant expression at the first operand.
+Examples are the ARM negdi2_compare pattern and similar.
+
 A @code{compare} specifying two @code{VOIDmode} constants is not valid
 since there is no way to know in what mode the comparison is to be
 performed; the comparison must either be folded during the compilation

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-06 13:17                 ` Bernd Edlinger
@ 2017-09-06 15:31                   ` Kyrill Tkachov
  2017-09-17  8:38                     ` [PING] " Bernd Edlinger
  2017-10-09 13:02                   ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 21+ messages in thread
From: Kyrill Tkachov @ 2017-09-06 15:31 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra


On 06/09/17 14:17, Bernd Edlinger wrote:
> On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
>> On 06/09/17 13:44, Bernd Edlinger wrote:
>>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>>> Hi Kyrill,
>>>>
>>>> Thanks for your review!
>>>>
>>>>
>>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>>> Hi Bernd,
>>>>>
>>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>>>> one.
>>>>>>>>>>
>>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>>> __gnu_satfractdasq
>>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>>
>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>> up into this:
>>>>>>>>>>
>>>>>>>>>>       [(set (reg:CC CC_REGNUM)
>>>>>>>>>>             (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>        (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>                        (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>                   (set (match_dup 2)
>>>>>>>>>>                        (minus:SI (match_dup 5)
>>>>>>>>>>                                 (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>>> (const_int
>>>>>>>>>> 0))))])]
>>>>>>>>>>
>>>>>>>>>>       [(set (reg:CC CC_REGNUM)
>>>>>>>>>>             (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>        (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>                   (set (reg:CC CC_REGNUM)
>>>>>>>>>>                        (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>>
>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>>> identical.
>>>>>>>>>>
>>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>>
>>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>>
>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>
>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>> with
>>>>>>>>> the patch itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>      [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>        (compare:CC_NCV
>>>>>>>>>          (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>          (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>       (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>        (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>      "TARGET_32BIT"
>>>>>>>>>      "#"
>>>>>>>>>      "&& reload_completed"
>>>>>>>>>      [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>               (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>              (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>>> 2)))])
>>>>>>>>>       (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>               (compare:CC_C
>>>>>>>>>                 (zero_extend:DI (match_dup 4))
>>>>>>>>>                 (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>                      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>              (set (match_dup 3)
>>>>>>>>>               (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>                     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>>> split
>>>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>>>> afterwards it is just CC_C.
>>>>>>>>>
>>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>>> implies that
>>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>>> should in
>>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>>> agree
>>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>>
>>>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>>> select_cc_mode
>>>>>>>>> as well).
>>>>>>>>>
>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>
>>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>>>> one.
>>>>>>>>
>>>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>>> it is
>>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>>
>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>> maybe, like:
>>>>>>>>
>>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>>        (compare:CC_C
>>>>>>>>          (zero_extend:DI (match_dup 4))
>>>>>>>>          (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>>        (compare:CC_NV
>>>>>>>>         (match_dup 4))
>>>>>>>>         (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>> (const_int 0)))
>>>>>>>> set (match_dup 3)
>>>>>>>>        (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>
>>>>>>>>
>>>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>>>> in parallel?
>>>>>>>>
>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>>> bits it really defines?
>>>>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>
>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>>
>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>
>>>>>>> How do you like this new version?
>>>>>>>
>>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>>
>>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>>> some
>>>>>>> time until I have definitive results.
>>>>>>>
>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>> for the next stage1?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>>> not having all CC bits available.
>>>>>>
>>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>>> CCmode.
>>>>>>
>>>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>>>
>>>>>> So I tried to test this pattern, with the following test programs,
>>>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>>> surprising.
>>>>>>
>>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>>
>>>>>> cat test.c (DImode)
>>>>>> unsigned long long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>       b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>>
>>>>>> unpatched code used 8 byte more stack than patched,
>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>
>>>>>> cat test1.c (SImode)
>>>>>> unsigned long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>       b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>> which generates (unpatched):
>>>>>>             cmp     r3, r0
>>>>>>             sub     ip, r3, r0
>>>>>>
>>>>>> instead of expected (patched):
>>>>>>       subs    r3, r3, r2
>>>>>>
>>>>>>
>>>>>> The condition is extracted by ifconversion and/or combine
>>>>>> and complicates the resulting code instead of simplifying.
>>>>>>
>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>> is using the same CC mode.
>>>>>>
>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>> proposed patch.  All other uses of the pattern are already using
>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>
>>>>>> Attached is an updated version of the patch, that happens to
>>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>>> as a side effect.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>> I'm very sorry it has taken so long to review.
>>>>> I've been ramping up on the context recently now so I'll try to move
>>>>> this along...
>>>>>
>>>>> This patch looks mostly ok to me from reading the patterns and the
>>>>> discussion around it.
>>>>> I have one concern:
>>>>>
>>>>>
>>>>>     (define_insn_and_split "negdi2_compare"
>>>>> -  [(set (reg:CC CC_REGNUM)
>>>>> -    (compare:CC
>>>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>>>> +    (compare:CC_NCV
>>>>>           (const_int 0)
>>>>>           (match_operand:DI 1 "register_operand" "0,r")))
>>>>>        (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>>> @@ -4647,8 +4650,12 @@
>>>>>                (compare:CC (const_int 0) (match_dup 1)))
>>>>>               (set (match_dup 0) (minus:SI (const_int 0)
>>>>>                            (match_dup 1)))])
>>>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>>> +           (compare:CC_NCV_CIC
>>>>> +             (const_int 0)
>>>>> +             (plus:DI
>>>>> +               (zero_extend:DI (match_dup 3))
>>>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>              (set (match_dup 2)
>>>>>               (minus:SI
>>>>>                (minus:SI (const_int 0) (match_dup 3))
>>>>>
>>>>>
>>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>>> being a const_int 0 and the second being
>>>>> a complex expression as the RTL canonicalization rules usually require
>>>>> the complex operand going first if possible.
>>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>>> "If one of the operands is a constant, it should be placed in the
>>>>> second operand and the comparison code adjusted as appropriate."
>>>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>>>> first operand already breaks that rule.
>>>>> I think we should fix that and update the use of condition code to a
>>>>> GEU rather than LTU as well.
>>>>>
>>> Well, the sentence before that one is even more explicit:
>>>
>>> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
>>> @code{compare} is valid only if the mode of @var{x} is in class
>>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>>> @code{const_double} with mode @code{VOIDmode}."
>>>
>>> So because the const_int 0 has VOIDmode the comparison is done
>>> in y-mode not x-mode.
>>>
>>> But unfortunately I see no way how to accomplish this,
>>> because this assumes that the compare can be easily swapped
>>> if the conditional instruction just uses one of GT/GE/LE/LT
>>> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>>>
>>> And in this example we ask for "overflow", but while 0-X can
>>> overflow X-0 simply can't.  And moreover there are non-symmetric
>>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>>> the swapped conditions GT/LE/GTU/LEU.
>>>
>>> I think the only solution would be to adjust the spec to
>>> reflect the implementation:
>>>
>>> Index: rtl.texi
>>> ===================================================================
>>> --- rtl.texi	(revision 251752)
>>> +++ rtl.texi	(working copy)
>>> @@ -2252,6 +2252,13 @@
>>>     If one of the operands is a constant, it should be placed in the
>>>     second operand and the comparison code adjusted as appropriate.
>>>
>>> +There may be exceptions from this rule if the mode @var{m} carries
>>> +not enough information for the swapped comparison operator, or
>> There may be exceptions _to_ ... if mode @var{m} does not carry enough...
>>
>>> +if we ask for overflow from the subtraction.
>> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
>>
>>> That means, while
>>> +0-X may overfow X-0 can never overflow.  Under these conditions
>>> +a compare may have the constant expression at the left side.
>> In these circumstances the constant will be in the first operand .
>>
>> (left and right don't really make sense for RTL).
>>> +Examples are the ARM negdi2_compare pattern and similar.
>>> +
>>>     A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>>     since there is no way to know in what mode the comparison is to be
>>>     performed; the comparison must either be folded during the compilation
>>>
>>>
>>>
>>> Please advise.
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>>> Hmmm...
>>>>
>>>> I think the compare is not a commutative operation, and swapping
>>>> the arguments will imply a different value in the flags.
>>>>
>>>> So if I write
>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>         (compare:CC_NCV
>>>>           (const_int 0)
>>>>           (reg:DI 123)))
>>>>
>>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>>>> N,V = usable for LT, GE
>>>>
>>>> But if I write
>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>         (compare:CC_NCV
>>>>           (reg:DI 123)
>>>>           (const_int 0)))
>>>>
>>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>>> the same and the actual value in the flags is defined by the expansion.
>>>> Of course there exists probably no matching expansion for that.
>>>>
>>>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>>>> on the flags from the previous pattern, so changing these to GEU
>>>> will probably be wrong.
>>>>
>>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>>> (const_int 0) (match_dup 1)).
>>>>
>>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>>> have:
>>>>
>>>>      emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>>      arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>>
>>>> I think only 0-x can overflow while x-0 can never overflow.
>>>>
>>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>>> a lot and I guess if this pattern is created by a splitter, this can
>>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>>> no way how combine could mess with this pattern due to the exotic
>>>> CCmode.  So while I think it would work to swap only the notation of
>>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>>>> consuming statements, that would make it quite hard to follow for the
>>>> human reader at least.
>>>>
>>>> What do you think?

I agree on the readability point. It's already very hard to follow the 
CCmode stuff as it is :(
My concern was on whether some RTL pass (combine in particular) would 
trip up on this
nominally non-canonical RTL. As long as nothing manipulates these RTXes 
and they match
the precise single pattern they're expected to match we should be ok I 
think. One wonders
whether we would be better off using unspecs if the compare operator is 
not exactly what we need,
but I think we can get away with COMPARE for now.


>>>>
>>>> Bernd.
> Attached is the patch with an update to the rtl.texi documentation.
> The code does not change, so I did no new bootstrap.
>
>
> Is it OK for trunk?

This looks ok to me, but I'd feel more comfortable if someone more 
knowledgeable on RTL could
ok the documentation part (I guess it's outside my remit anyway) as the 
overflow case you pointed out
does indeed make the compare operator not commutative.

So the arm parts are ok assuming the documentation change is ok'd.
Richard, would you be okay with the suggested documentation change?

Thanks,
Kyrill

>
> Thanks
> Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PING] [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-06 15:31                   ` Kyrill Tkachov
@ 2017-09-17  8:38                     ` Bernd Edlinger
  0 siblings, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-09-17  8:38 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw (lists), gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

Hi,

ping... for the documentation part.

Thanks,
Bernd.

On 09/06/17 17:31, Kyrill Tkachov wrote:
> 
> On 06/09/17 14:17, Bernd Edlinger wrote:
>> On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
>>> On 06/09/17 13:44, Bernd Edlinger wrote:
>>>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>>>> Hi Kyrill,
>>>>>
>>>>> Thanks for your review!
>>>>>
>>>>>
>>>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>>>> Hi Bernd,
>>>>>>
>>>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> this is related to PR77308, the follow-up patch will depend 
>>>>>>>>>>> on this
>>>>>>>>>>> one.
>>>>>>>>>>>
>>>>>>>>>>> When trying the split the *arm_cmpdi_insn and 
>>>>>>>>>>> *arm_cmpdi_unsigned
>>>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>>>> __gnu_satfractdasq
>>>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>>>
>>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>>> up into this:
>>>>>>>>>>>
>>>>>>>>>>>       [(set (reg:CC CC_REGNUM)
>>>>>>>>>>>             (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>>        (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>>                        (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>>                   (set (match_dup 2)
>>>>>>>>>>>                        (minus:SI (match_dup 5)
>>>>>>>>>>>                                 (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>>>> (const_int
>>>>>>>>>>> 0))))])]
>>>>>>>>>>>
>>>>>>>>>>>       [(set (reg:CC CC_REGNUM)
>>>>>>>>>>>             (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>>        (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>>                   (set (reg:CC CC_REGNUM)
>>>>>>>>>>>                        (compare:CC (match_dup 0) (match_dup 
>>>>>>>>>>> 1))))]
>>>>>>>>>>>
>>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the 
>>>>>>>>>>> reg:CC
>>>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>>>> identical.
>>>>>>>>>>>
>>>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>>>
>>>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>>>
>>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>
>>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>>> with
>>>>>>>>>> the patch itself.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>>      [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>>        (compare:CC_NCV
>>>>>>>>>>          (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>>          (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>>       (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>>        (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>>      "TARGET_32BIT"
>>>>>>>>>>      "#"
>>>>>>>>>>      "&& reload_completed"
>>>>>>>>>>      [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>               (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>>              (set (match_dup 0) (minus:SI (match_dup 1) 
>>>>>>>>>> (match_dup
>>>>>>>>>> 2)))])
>>>>>>>>>>       (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>>               (compare:CC_C
>>>>>>>>>>                 (zero_extend:DI (match_dup 4))
>>>>>>>>>>                 (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>>                      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 
>>>>>>>>>> 0)))))
>>>>>>>>>>              (set (match_dup 3)
>>>>>>>>>>               (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>>                     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 
>>>>>>>>>> 0))))])]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>>>> split
>>>>>>>>>> the overall result for the condition register is in mode 
>>>>>>>>>> CC_NCV, but
>>>>>>>>>> afterwards it is just CC_C.
>>>>>>>>>>
>>>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>>>> implies that
>>>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>>>> should in
>>>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm 
>>>>>>>>>> inclined to
>>>>>>>>>> agree
>>>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>>>
>>>>>>>>>> I'm not sure if there are other consequences that will fall 
>>>>>>>>>> out from
>>>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>>>> select_cc_mode
>>>>>>>>>> as well).
>>>>>>>>>>
>>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>>
>>>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI 
>>>>>>>>> ...)
>>>>>>>>> only gets the C bit correct, the expression for N and V is a 
>>>>>>>>> different
>>>>>>>>> one.
>>>>>>>>>
>>>>>>>>> It probably works, because the subsi3_carryin_compare 
>>>>>>>>> instruction sets
>>>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>>>> it is
>>>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>>>
>>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>>> maybe, like:
>>>>>>>>>
>>>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>>>        (compare:CC_C
>>>>>>>>>          (zero_extend:DI (match_dup 4))
>>>>>>>>>          (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>                   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>>>        (compare:CC_NV
>>>>>>>>>         (match_dup 4))
>>>>>>>>>         (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>> (const_int 0)))
>>>>>>>>> set (match_dup 3)
>>>>>>>>>        (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But I doubt that will work to set CC_REGNUM with two different 
>>>>>>>>> modes
>>>>>>>>> in parallel?
>>>>>>>>>
>>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that 
>>>>>>>>> implicitly
>>>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>>>> bits it really defines?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Bernd.
>>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>>
>>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = 
>>>>>>>> Carry-In-Compare),
>>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>>>> only N and Z are set correctly), but in a different patch of 
>>>>>>>> course.
>>>>>>>>
>>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>>
>>>>>>>> How do you like this new version?
>>>>>>>>
>>>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>>>
>>>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>>>> some
>>>>>>>> time until I have definitive results.
>>>>>>>>
>>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>>> for the next stage1?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>>>> negdi2_compare patterns are used, and look if the caller is fine 
>>>>>>> with
>>>>>>> not having all CC bits available.
>>>>>>>
>>>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch 
>>>>>>> (LTU,
>>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>>>> CCmode.
>>>>>>>
>>>>>>> To correct this, the branch should use CC_Cmode which is always 
>>>>>>> defined.
>>>>>>>
>>>>>>> So I tried to test this pattern, with the following test programs,
>>>>>>> and found that the code actually improves when the branch uses 
>>>>>>> CC_Cmode
>>>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>>>> surprising.
>>>>>>>
>>>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>>>
>>>>>>> cat test.c (DImode)
>>>>>>> unsigned long long x, y, z;
>>>>>>> int b;
>>>>>>> void test()
>>>>>>> {
>>>>>>>       b = __builtin_sub_overflow (y,z, &x);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> unpatched code used 8 byte more stack than patched,
>>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>>
>>>>>>> cat test1.c (SImode)
>>>>>>> unsigned long x, y, z;
>>>>>>> int b;
>>>>>>> void test()
>>>>>>> {
>>>>>>>       b = __builtin_sub_overflow (y,z, &x);
>>>>>>> }
>>>>>>>
>>>>>>> which generates (unpatched):
>>>>>>>             cmp     r3, r0
>>>>>>>             sub     ip, r3, r0
>>>>>>>
>>>>>>> instead of expected (patched):
>>>>>>>       subs    r3, r3, r2
>>>>>>>
>>>>>>>
>>>>>>> The condition is extracted by ifconversion and/or combine
>>>>>>> and complicates the resulting code instead of simplifying.
>>>>>>>
>>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>>> is using the same CC mode.
>>>>>>>
>>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>>> proposed patch.  All other uses of the pattern are already using
>>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>>
>>>>>>> Attached is an updated version of the patch, that happens to
>>>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>>>> as a side effect.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>> I'm very sorry it has taken so long to review.
>>>>>> I've been ramping up on the context recently now so I'll try to move
>>>>>> this along...
>>>>>>
>>>>>> This patch looks mostly ok to me from reading the patterns and the
>>>>>> discussion around it.
>>>>>> I have one concern:
>>>>>>
>>>>>>
>>>>>>     (define_insn_and_split "negdi2_compare"
>>>>>> -  [(set (reg:CC CC_REGNUM)
>>>>>> -    (compare:CC
>>>>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>>>>> +    (compare:CC_NCV
>>>>>>           (const_int 0)
>>>>>>           (match_operand:DI 1 "register_operand" "0,r")))
>>>>>>        (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>>>> @@ -4647,8 +4650,12 @@
>>>>>>                (compare:CC (const_int 0) (match_dup 1)))
>>>>>>               (set (match_dup 0) (minus:SI (const_int 0)
>>>>>>                            (match_dup 1)))])
>>>>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>>>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>>>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>>>> +           (compare:CC_NCV_CIC
>>>>>> +             (const_int 0)
>>>>>> +             (plus:DI
>>>>>> +               (zero_extend:DI (match_dup 3))
>>>>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>              (set (match_dup 2)
>>>>>>               (minus:SI
>>>>>>                (minus:SI (const_int 0) (match_dup 3))
>>>>>>
>>>>>>
>>>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>>>> being a const_int 0 and the second being
>>>>>> a complex expression as the RTL canonicalization rules usually 
>>>>>> require
>>>>>> the complex operand going first if possible.
>>>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>>>> "If one of the operands is a constant, it should be placed in the
>>>>>> second operand and the comparison code adjusted as appropriate."
>>>>>> So it seems that the pre-existing pattern that puts const_int 0 as 
>>>>>> the
>>>>>> first operand already breaks that rule.
>>>>>> I think we should fix that and update the use of condition code to a
>>>>>> GEU rather than LTU as well.
>>>>>>
>>>> Well, the sentence before that one is even more explicit:
>>>>
>>>> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
>>>> @code{compare} is valid only if the mode of @var{x} is in class
>>>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>>>> @code{const_double} with mode @code{VOIDmode}."
>>>>
>>>> So because the const_int 0 has VOIDmode the comparison is done
>>>> in y-mode not x-mode.
>>>>
>>>> But unfortunately I see no way how to accomplish this,
>>>> because this assumes that the compare can be easily swapped
>>>> if the conditional instruction just uses one of GT/GE/LE/LT
>>>> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>>>>
>>>> And in this example we ask for "overflow", but while 0-X can
>>>> overflow X-0 simply can't.  And moreover there are non-symmetric
>>>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>>>> the swapped conditions GT/LE/GTU/LEU.
>>>>
>>>> I think the only solution would be to adjust the spec to
>>>> reflect the implementation:
>>>>
>>>> Index: rtl.texi
>>>> ===================================================================
>>>> --- rtl.texi    (revision 251752)
>>>> +++ rtl.texi    (working copy)
>>>> @@ -2252,6 +2252,13 @@
>>>>     If one of the operands is a constant, it should be placed in the
>>>>     second operand and the comparison code adjusted as appropriate.
>>>>
>>>> +There may be exceptions from this rule if the mode @var{m} carries
>>>> +not enough information for the swapped comparison operator, or
>>> There may be exceptions _to_ ... if mode @var{m} does not carry 
>>> enough...
>>>
>>>> +if we ask for overflow from the subtraction.
>>> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
>>>
>>>> That means, while
>>>> +0-X may overfow X-0 can never overflow.  Under these conditions
>>>> +a compare may have the constant expression at the left side.
>>> In these circumstances the constant will be in the first operand .
>>>
>>> (left and right don't really make sense for RTL).
>>>> +Examples are the ARM negdi2_compare pattern and similar.
>>>> +
>>>>     A @code{compare} specifying two @code{VOIDmode} constants is not 
>>>> valid
>>>>     since there is no way to know in what mode the comparison is to be
>>>>     performed; the comparison must either be folded during the 
>>>> compilation
>>>>
>>>>
>>>>
>>>> Please advise.
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>>> Hmmm...
>>>>>
>>>>> I think the compare is not a commutative operation, and swapping
>>>>> the arguments will imply a different value in the flags.
>>>>>
>>>>> So if I write
>>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>>         (compare:CC_NCV
>>>>>           (const_int 0)
>>>>>           (reg:DI 123)))
>>>>>
>>>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or 
>>>>> GEU,
>>>>> N,V = usable for LT, GE
>>>>>
>>>>> But if I write
>>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>>         (compare:CC_NCV
>>>>>           (reg:DI 123)
>>>>>           (const_int 0)))
>>>>>
>>>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>>>> the same and the actual value in the flags is defined by the 
>>>>> expansion.
>>>>> Of course there exists probably no matching expansion for that.
>>>>>
>>>>> Note that both LTU in the above hunk are in a parallel-stmt and 
>>>>> operate
>>>>> on the flags from the previous pattern, so changing these to GEU
>>>>> will probably be wrong.
>>>>>
>>>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>>>> (const_int 0) (match_dup 1)).
>>>>>
>>>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>>>> have:
>>>>>
>>>>>      emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>>>      arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>>>
>>>>> I think only 0-x can overflow while x-0 can never overflow.
>>>>>
>>>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>>>> a lot and I guess if this pattern is created by a splitter, this can
>>>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>>>> no way how combine could mess with this pattern due to the exotic
>>>>> CCmode.  So while I think it would work to swap only the notation of
>>>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and 
>>>>> the
>>>>> consuming statements, that would make it quite hard to follow for the
>>>>> human reader at least.
>>>>>
>>>>> What do you think?
> 
> I agree on the readability point. It's already very hard to follow the 
> CCmode stuff as it is :(
> My concern was on whether some RTL pass (combine in particular) would 
> trip up on this
> nominally non-canonical RTL. As long as nothing manipulates these RTXes 
> and they match
> the precise single pattern they're expected to match we should be ok I 
> think. One wonders
> whether we would be better off using unspecs if the compare operator is 
> not exactly what we need,
> but I think we can get away with COMPARE for now.
> 
> 
>>>>>
>>>>> Bernd.
>> Attached is the patch with an update to the rtl.texi documentation.
>> The code does not change, so I did no new bootstrap.
>>
>>
>> Is it OK for trunk?
> 
> This looks ok to me, but I'd feel more comfortable if someone more 
> knowledgeable on RTL could
> ok the documentation part (I guess it's outside my remit anyway) as the 
> overflow case you pointed out
> does indeed make the compare operator not commutative.
> 
> So the arm parts are ok assuming the documentation change is ok'd.
> Richard, would you be okay with the suggested documentation change?
> 
> Thanks,
> Kyrill
> 
>>
>> Thanks
>> Bernd.
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-09-06 13:17                 ` Bernd Edlinger
  2017-09-06 15:31                   ` Kyrill Tkachov
@ 2017-10-09 13:02                   ` Richard Earnshaw (lists)
  2017-10-10 19:11                     ` Bernd Edlinger
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Earnshaw (lists) @ 2017-10-09 13:02 UTC (permalink / raw)
  To: Bernd Edlinger, Kyrill Tkachov, gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

On 06/09/17 14:17, Bernd Edlinger wrote:
> On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
>> On 06/09/17 13:44, Bernd Edlinger wrote:
>>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>>> Hi Kyrill,
>>>>
>>>> Thanks for your review!
>>>>
>>>>
>>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>>> Hi Bernd,
>>>>>
>>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>>>> one.
>>>>>>>>>>
>>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>>> __gnu_satfractdasq
>>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>>
>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>> up into this:
>>>>>>>>>>
>>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>>            (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>       (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>                       (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>                  (set (match_dup 2)
>>>>>>>>>>                       (minus:SI (match_dup 5)
>>>>>>>>>>                                (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>>> (const_int
>>>>>>>>>> 0))))])]
>>>>>>>>>>
>>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>>            (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>       (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>                  (set (reg:CC CC_REGNUM)
>>>>>>>>>>                       (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>>
>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>>> identical.
>>>>>>>>>>
>>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>>
>>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>>
>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>
>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>> with
>>>>>>>>> the patch itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>     [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>       (compare:CC_NCV
>>>>>>>>>         (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>         (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>      (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>       (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>     "TARGET_32BIT"
>>>>>>>>>     "#"
>>>>>>>>>     "&& reload_completed"
>>>>>>>>>     [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>              (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>             (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>>> 2)))])
>>>>>>>>>      (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>              (compare:CC_C
>>>>>>>>>                (zero_extend:DI (match_dup 4))
>>>>>>>>>                (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>                     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>             (set (match_dup 3)
>>>>>>>>>              (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>                    (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>>> split
>>>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>>>> afterwards it is just CC_C.
>>>>>>>>>
>>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>>> implies that
>>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>>> should in
>>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>>> agree
>>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>>
>>>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>>> select_cc_mode
>>>>>>>>> as well).
>>>>>>>>>
>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>
>>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>>>> one.
>>>>>>>>
>>>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>>> it is
>>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>>
>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>> maybe, like:
>>>>>>>>
>>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>>       (compare:CC_C
>>>>>>>>         (zero_extend:DI (match_dup 4))
>>>>>>>>         (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>>       (compare:CC_NV
>>>>>>>>        (match_dup 4))
>>>>>>>>        (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>> (const_int 0)))
>>>>>>>> set (match_dup 3)
>>>>>>>>       (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>
>>>>>>>>
>>>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>>>> in parallel?
>>>>>>>>
>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>>> bits it really defines?
>>>>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>
>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>>
>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>
>>>>>>> How do you like this new version?
>>>>>>>
>>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>>
>>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>>> some
>>>>>>> time until I have definitive results.
>>>>>>>
>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>> for the next stage1?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>
>>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>>> not having all CC bits available.
>>>>>>
>>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>>> CCmode.
>>>>>>
>>>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>>>
>>>>>> So I tried to test this pattern, with the following test programs,
>>>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>>> surprising.
>>>>>>
>>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>>
>>>>>> cat test.c (DImode)
>>>>>> unsigned long long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>>
>>>>>> unpatched code used 8 byte more stack than patched,
>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>
>>>>>> cat test1.c (SImode)
>>>>>> unsigned long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>> which generates (unpatched):
>>>>>>            cmp     r3, r0
>>>>>>            sub     ip, r3, r0
>>>>>>
>>>>>> instead of expected (patched):
>>>>>>      subs    r3, r3, r2
>>>>>>
>>>>>>
>>>>>> The condition is extracted by ifconversion and/or combine
>>>>>> and complicates the resulting code instead of simplifying.
>>>>>>
>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>> is using the same CC mode.
>>>>>>
>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>> proposed patch.  All other uses of the pattern are already using
>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>
>>>>>> Attached is an updated version of the patch, that happens to
>>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>>> as a side effect.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>
>>>>> I'm very sorry it has taken so long to review.
>>>>> I've been ramping up on the context recently now so I'll try to move
>>>>> this along...
>>>>>
>>>>> This patch looks mostly ok to me from reading the patterns and the
>>>>> discussion around it.
>>>>> I have one concern:
>>>>>
>>>>>
>>>>>    (define_insn_and_split "negdi2_compare"
>>>>> -  [(set (reg:CC CC_REGNUM)
>>>>> -    (compare:CC
>>>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>>>> +    (compare:CC_NCV
>>>>>          (const_int 0)
>>>>>          (match_operand:DI 1 "register_operand" "0,r")))
>>>>>       (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>>> @@ -4647,8 +4650,12 @@
>>>>>               (compare:CC (const_int 0) (match_dup 1)))
>>>>>              (set (match_dup 0) (minus:SI (const_int 0)
>>>>>                           (match_dup 1)))])
>>>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>>> +           (compare:CC_NCV_CIC
>>>>> +             (const_int 0)
>>>>> +             (plus:DI
>>>>> +               (zero_extend:DI (match_dup 3))
>>>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>             (set (match_dup 2)
>>>>>              (minus:SI
>>>>>               (minus:SI (const_int 0) (match_dup 3))
>>>>>
>>>>>
>>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>>> being a const_int 0 and the second being
>>>>> a complex expression as the RTL canonicalization rules usually require
>>>>> the complex operand going first if possible.
>>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>>> "If one of the operands is a constant, it should be placed in the
>>>>> second operand and the comparison code adjusted as appropriate."
>>>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>>>> first operand already breaks that rule.
>>>>> I think we should fix that and update the use of condition code to a
>>>>> GEU rather than LTU as well.
>>>>>
>>>>
>>>
>>> Well, the sentence before that one is even more explicit:
>>>
>>> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
>>> @code{compare} is valid only if the mode of @var{x} is in class
>>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>>> @code{const_double} with mode @code{VOIDmode}."
>>>
>>> So because the const_int 0 has VOIDmode the comparison is done
>>> in y-mode not x-mode.
>>>
>>> But unfortunately I see no way how to accomplish this,
>>> because this assumes that the compare can be easily swapped
>>> if the conditional instruction just uses one of GT/GE/LE/LT
>>> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>>>
>>> And in this example we ask for "overflow", but while 0-X can
>>> overflow X-0 simply can't.  And moreover there are non-symmetric
>>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>>> the swapped conditions GT/LE/GTU/LEU.
>>>
>>> I think the only solution would be to adjust the spec to
>>> reflect the implementation:
>>>
>>> Index: rtl.texi
>>> ===================================================================
>>> --- rtl.texi	(revision 251752)
>>> +++ rtl.texi	(working copy)
>>> @@ -2252,6 +2252,13 @@
>>>    If one of the operands is a constant, it should be placed in the
>>>    second operand and the comparison code adjusted as appropriate.
>>>
>>> +There may be exceptions from this rule if the mode @var{m} carries
>>> +not enough information for the swapped comparison operator, or
>>
>> There may be exceptions _to_ ... if mode @var{m} does not carry enough...
>>
>>> +if we ask for overflow from the subtraction.
>>
>> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
>>
>>> That means, while
>>> +0-X may overfow X-0 can never overflow.  Under these conditions
>>> +a compare may have the constant expression at the left side.
>>
>> In these circumstances the constant will be in the first operand .
>>
>> (left and right don't really make sense for RTL).
>>> +Examples are the ARM negdi2_compare pattern and similar.
>>> +
>>>    A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>>    since there is no way to know in what mode the comparison is to be
>>>    performed; the comparison must either be folded during the compilation
>>>
>>>
>>>
>>> Please advise.
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>>>
>>>> Hmmm...
>>>>
>>>> I think the compare is not a commutative operation, and swapping
>>>> the arguments will imply a different value in the flags.
>>>>
>>>> So if I write
>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>        (compare:CC_NCV
>>>>          (const_int 0)
>>>>          (reg:DI 123)))
>>>>
>>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>>>> N,V = usable for LT, GE
>>>>
>>>> But if I write
>>>> (set (reg:CC_NCV CC_REGNUM)
>>>>        (compare:CC_NCV
>>>>          (reg:DI 123)
>>>>          (const_int 0)))
>>>>
>>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>>> the same and the actual value in the flags is defined by the expansion.
>>>> Of course there exists probably no matching expansion for that.
>>>>
>>>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>>>> on the flags from the previous pattern, so changing these to GEU
>>>> will probably be wrong.
>>>>
>>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>>> (const_int 0) (match_dup 1)).
>>>>
>>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>>> have:
>>>>
>>>>     emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>>     arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>>
>>>> I think only 0-x can overflow while x-0 can never overflow.
>>>>
>>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>>> a lot and I guess if this pattern is created by a splitter, this can
>>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>>> no way how combine could mess with this pattern due to the exotic
>>>> CCmode.  So while I think it would work to swap only the notation of
>>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>>>> consuming statements, that would make it quite hard to follow for the
>>>> human reader at least.
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> Bernd.
>>
> 
> Attached is the patch with an update to the rtl.texi documentation.
> The code does not change, so I did no new bootstrap.
> 
> 
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-pr77308-5.diff
> 
> 
> 2017-09-06  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	PR target/77308
> 	* doc/rtl.texi: Update documentation.
> 	* config/arm/arm-modes.def (CC_NCV_CIC): New mode.
> 	* config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
> 	adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
> 	subsi3_carryin_compare, subsi3_carryin_compare_const,
> 	negdi2_compare, *negsi2_carryin_compare,
> 	*arm_cmpdi_insn): Fix the CC reg dataflow.
> 	(usubv<mode>4): Use CC_Cmode for the branch.
> 
> Index: gcc/config/arm/arm-modes.def
> ===================================================================
> --- gcc/config/arm/arm-modes.def	(revision 244439)
> +++ gcc/config/arm/arm-modes.def	(working copy)
> @@ -38,6 +38,8 @@
>     (used for DImode unsigned comparisons).
>     CC_NCVmode should be used if only the N, C, and V flags are correct
>     (used for DImode signed comparisons).
> +   CC_NCV_CICmode defines N and V in SImode and C in DImode
> +   (used for carryin_compare patterns).
>     CCmode should be used otherwise.  */
>  
>  CC_MODE (CC_NOOV);
> @@ -44,6 +46,7 @@
>  CC_MODE (CC_Z);
>  CC_MODE (CC_CZ);
>  CC_MODE (CC_NCV);
> +CC_MODE (CC_NCV_CIC);
>  CC_MODE (CC_SWP);
>  CC_MODE (CCFP);
>  CC_MODE (CCFPE);
> Index: gcc/config/arm/arm.md
> ===================================================================
> --- gcc/config/arm/arm.md	(revision 244439)
> +++ gcc/config/arm/arm.md	(working copy)
> @@ -669,17 +669,15 @@
>  	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
>     (parallel [(set (reg:CC_V CC_REGNUM)
>  		   (ne:CC_V
> -		    (plus:DI (plus:DI
> -			      (sign_extend:DI (match_dup 4))
> -			      (sign_extend:DI (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -		    (plus:DI (sign_extend:DI
> -			      (plus:SI (match_dup 4) (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> -	     (set (match_dup 3) (plus:SI (plus:SI
> -					  (match_dup 4) (match_dup 5))
> -					 (ltu:SI (reg:CC_C CC_REGNUM)
> -						 (const_int 0))))])]
> +		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
> +				       (sign_extend:DI (match_dup 5)))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +		    (sign_extend:DI
> +		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
> +	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +					  (ltu:SI (reg:CC_C CC_REGNUM)
> +						  (const_int 0))))])]
>    "
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
> @@ -713,13 +711,13 @@
>    [(set (reg:CC_V CC_REGNUM)
>  	(ne:CC_V
>  	  (plus:DI
> -	   (plus:DI
> -	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
> -	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
> -	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -	  (plus:DI (sign_extend:DI
> -		    (plus:SI (match_dup 1) (match_dup 2)))
> -		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	    (plus:DI
> +	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +				   (ltu:SI (reg:CC_C CC_REGNUM)
> +					   (const_int 0))))))
>     (set (match_operand:SI 0 "register_operand" "=r")
>  	(plus:SI
>  	 (plus:SI (match_dup 1) (match_dup 2))
> @@ -748,17 +746,15 @@
>  	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
>     (parallel [(set (reg:CC_C CC_REGNUM)
>  		   (ne:CC_C
> -		    (plus:DI (plus:DI
> -			      (zero_extend:DI (match_dup 4))
> -			      (zero_extend:DI (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -		    (plus:DI (zero_extend:DI
> -			      (plus:SI (match_dup 4) (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> -	     (set (match_dup 3) (plus:SI
> -				 (plus:SI (match_dup 4) (match_dup 5))
> -				 (ltu:SI (reg:CC_C CC_REGNUM)
> -					 (const_int 0))))])]
> +		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
> +				       (zero_extend:DI (match_dup 5)))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +		    (zero_extend:DI
> +		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
> +	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +					  (ltu:SI (reg:CC_C CC_REGNUM)
> +						  (const_int 0))))])]
>    "
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
> @@ -777,17 +773,16 @@
>    [(set (reg:CC_C CC_REGNUM)
>  	(ne:CC_C
>  	  (plus:DI
> -	   (plus:DI
> -	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> -	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
> -	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -	  (plus:DI (zero_extend:DI
> -		    (plus:SI (match_dup 1) (match_dup 2)))
> -		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	    (plus:DI
> +	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +	  (zero_extend:DI
> +	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
>     (set (match_operand:SI 0 "register_operand" "=r")
> -	(plus:SI
> -	 (plus:SI (match_dup 1) (match_dup 2))
> -	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "adcs%?\\t%0, %1, %2"
>    [(set_attr "conds" "set")
> @@ -1080,14 +1075,14 @@
>    "TARGET_32BIT"
>  {
>    emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
> -  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
> +  arm_gen_unlikely_cbranch (EQ, CC_Cmode, operands[3]);
>  
>    DONE;
>  })
>  
>  (define_insn_and_split "subdi3_compare1"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +	(compare:CC_NCV
>  	  (match_operand:DI 1 "register_operand" "r")
>  	  (match_operand:DI 2 "register_operand" "r")))
>     (set (match_operand:DI 0 "register_operand" "=&r")
> @@ -1098,10 +1093,14 @@
>    [(parallel [(set (reg:CC CC_REGNUM)
>  		   (compare:CC (match_dup 1) (match_dup 2)))
>  	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -		   (compare:CC (match_dup 4) (match_dup 5)))
> -	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> -			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +		   (compare:CC_NCV_CIC
> +		     (zero_extend:DI (match_dup 4))
> +		     (plus:DI (zero_extend:DI (match_dup 5))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	      (set (match_dup 3)
> +		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> +			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
>      operands[0] = gen_lowpart (SImode, operands[0]);
> @@ -1157,13 +1156,15 @@
>  )
>  
>  (define_insn "*subsi3_carryin_compare"
> -  [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
> -                    (match_operand:SI 2 "s_register_operand" "r")))
> +  [(set (reg:CC_NCV_CIC CC_REGNUM)
> +	(compare:CC_NCV_CIC
> +	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -        (minus:SI (minus:SI (match_dup 1)
> -                            (match_dup 2))
> -                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
> +		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "sbcs\\t%0, %1, %2"
>    [(set_attr "conds" "set")
> @@ -1171,13 +1172,15 @@
>  )
>  
>  (define_insn "*subsi3_carryin_compare_const"
> -  [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
> -                    (match_operand:SI 2 "arm_not_operand" "K")))
> +  [(set (reg:CC_NCV_CIC CC_REGNUM)
> +	(compare:CC_NCV_CIC
> +	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -        (minus:SI (plus:SI (match_dup 1)
> -                           (match_dup 2))
> -                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "sbcs\\t%0, %1, #%B2"
>    [(set_attr "conds" "set")
> @@ -4634,8 +4637,8 @@
>  
>  
>  (define_insn_and_split "negdi2_compare"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +	(compare:CC_NCV
>  	  (const_int 0)
>  	  (match_operand:DI 1 "register_operand" "0,r")))
>     (set (match_operand:DI 0 "register_operand" "=r,&r")
> @@ -4647,8 +4650,12 @@
>  		   (compare:CC (const_int 0) (match_dup 1)))
>  	      (set (match_dup 0) (minus:SI (const_int 0)
>  					   (match_dup 1)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -		   (compare:CC (const_int 0) (match_dup 3)))
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +		   (compare:CC_NCV_CIC
> +		     (const_int 0)
> +		     (plus:DI
> +		       (zero_extend:DI (match_dup 3))
> +		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>  	     (set (match_dup 2)
>  		  (minus:SI
>  		   (minus:SI (const_int 0) (match_dup 3))
> @@ -4707,12 +4714,14 @@
>  )
>  
>  (define_insn "*negsi2_carryin_compare"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC (const_int 0)
> -		    (match_operand:SI 1 "s_register_operand" "r")))
> +  [(set (reg:CC_NCV_CIC CC_REGNUM)
> +	(compare:CC_NCV_CIC
> +	  (const_int 0)
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -	(minus:SI (minus:SI (const_int 0)
> -			    (match_dup 1))
> +	(minus:SI (minus:SI (const_int 0) (match_dup 1))
>  		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_ARM"
>    "rscs\\t%0, %1, #0"
> @@ -7361,12 +7370,15 @@
>    "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
>    "&& reload_completed"
>    [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_dup 0) (match_dup 1)))
> -   (parallel [(set (reg:CC CC_REGNUM)
> -                   (compare:CC (match_dup 3) (match_dup 4)))
> -              (set (match_dup 2)
> -                   (minus:SI (match_dup 5)
> -                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
> +	(compare:CC (match_dup 0) (match_dup 1)))
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +		   (compare:CC_NCV_CIC
> +		     (zero_extend:DI (match_dup 3))
> +		     (plus:DI (zero_extend:DI (match_dup 4))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	      (set (match_dup 2)
> +		   (minus:SI (match_dup 5)
> +			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
>      operands[0] = gen_lowpart (SImode, operands[0]);
> Index: gcc/config/arm/arm-modes.def
> ===================================================================
> --- gcc/config/arm/arm-modes.def	(revision 251752)
> +++ gcc/config/arm/arm-modes.def	(working copy)
> @@ -38,6 +38,8 @@
>     (used for DImode unsigned comparisons).
>     CC_NCVmode should be used if only the N, C, and V flags are correct
>     (used for DImode signed comparisons).
> +   CC_NCV_CICmode defines N and V in SImode and C in DImode
> +   (used for carryin_compare patterns).
>     CCmode should be used otherwise.  */
>  
>  CC_MODE (CC_NOOV);
> @@ -44,6 +46,7 @@
>  CC_MODE (CC_Z);
>  CC_MODE (CC_CZ);
>  CC_MODE (CC_NCV);
> +CC_MODE (CC_NCV_CIC);
>  CC_MODE (CC_SWP);
>  CC_MODE (CCFP);
>  CC_MODE (CCFPE);
> Index: gcc/config/arm/arm.md
> ===================================================================
> --- gcc/config/arm/arm.md	(revision 251752)
> +++ gcc/config/arm/arm.md	(working copy)
> @@ -664,17 +664,15 @@
>  	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
>     (parallel [(set (reg:CC_V CC_REGNUM)
>  		   (ne:CC_V
> -		    (plus:DI (plus:DI
> -			      (sign_extend:DI (match_dup 4))
> -			      (sign_extend:DI (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -		    (plus:DI (sign_extend:DI
> -			      (plus:SI (match_dup 4) (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> -	     (set (match_dup 3) (plus:SI (plus:SI
> -					  (match_dup 4) (match_dup 5))
> -					 (ltu:SI (reg:CC_C CC_REGNUM)
> -						 (const_int 0))))])]
> +		     (plus:DI (plus:DI (sign_extend:DI (match_dup 4))
> +				       (sign_extend:DI (match_dup 5)))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +		    (sign_extend:DI
> +		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
> +	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +					  (ltu:SI (reg:CC_C CC_REGNUM)
> +						  (const_int 0))))])]
>    "
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
> @@ -708,13 +706,13 @@
>    [(set (reg:CC_V CC_REGNUM)
>  	(ne:CC_V
>  	  (plus:DI
> -	   (plus:DI
> -	    (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
> -	    (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
> -	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -	  (plus:DI (sign_extend:DI
> -		    (plus:SI (match_dup 1) (match_dup 2)))
> -		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	    (plus:DI
> +	      (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +	      (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +	  (sign_extend:DI (plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +				   (ltu:SI (reg:CC_C CC_REGNUM)
> +					   (const_int 0))))))
>     (set (match_operand:SI 0 "register_operand" "=r")
>  	(plus:SI
>  	 (plus:SI (match_dup 1) (match_dup 2))
> @@ -743,17 +741,15 @@
>  	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
>     (parallel [(set (reg:CC_C CC_REGNUM)
>  		   (ne:CC_C
> -		    (plus:DI (plus:DI
> -			      (zero_extend:DI (match_dup 4))
> -			      (zero_extend:DI (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -		    (plus:DI (zero_extend:DI
> -			      (plus:SI (match_dup 4) (match_dup 5)))
> -			     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> -	     (set (match_dup 3) (plus:SI
> -				 (plus:SI (match_dup 4) (match_dup 5))
> -				 (ltu:SI (reg:CC_C CC_REGNUM)
> -					 (const_int 0))))])]
> +		     (plus:DI (plus:DI (zero_extend:DI (match_dup 4))
> +				       (zero_extend:DI (match_dup 5)))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +		    (zero_extend:DI
> +		      (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
> +	      (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
> +					  (ltu:SI (reg:CC_C CC_REGNUM)
> +						  (const_int 0))))])]
>    "
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
> @@ -772,17 +768,16 @@
>    [(set (reg:CC_C CC_REGNUM)
>  	(ne:CC_C
>  	  (plus:DI
> -	   (plus:DI
> -	    (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> -	    (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
> -	   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> -	  (plus:DI (zero_extend:DI
> -		    (plus:SI (match_dup 1) (match_dup 2)))
> -		   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	    (plus:DI
> +	      (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> +	      (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> +	  (zero_extend:DI
> +	    (plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))))
>     (set (match_operand:SI 0 "register_operand" "=r")
> -	(plus:SI
> -	 (plus:SI (match_dup 1) (match_dup 2))
> -	 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "adcs%?\\t%0, %1, %2"
>    [(set_attr "conds" "set")
> @@ -1075,14 +1070,14 @@
>    "TARGET_32BIT"
>  {
>    emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
> -  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
> +  arm_gen_unlikely_cbranch (EQ, CC_Cmode, operands[3]);
>  
>    DONE;
>  })
>  
>  (define_insn_and_split "subdi3_compare1"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +	(compare:CC_NCV
>  	  (match_operand:DI 1 "register_operand" "r")
>  	  (match_operand:DI 2 "register_operand" "r")))
>     (set (match_operand:DI 0 "register_operand" "=&r")
> @@ -1093,10 +1088,14 @@
>    [(parallel [(set (reg:CC CC_REGNUM)
>  		   (compare:CC (match_dup 1) (match_dup 2)))
>  	      (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -		   (compare:CC (match_dup 4) (match_dup 5)))
> -	     (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> -			       (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +		   (compare:CC_NCV_CIC
> +		     (zero_extend:DI (match_dup 4))
> +		     (plus:DI (zero_extend:DI (match_dup 5))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	      (set (match_dup 3)
> +		   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
> +			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
>      operands[0] = gen_lowpart (SImode, operands[0]);
> @@ -1152,13 +1151,15 @@
>  )
>  
>  (define_insn "*subsi3_carryin_compare"
> -  [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_operand:SI 1 "s_register_operand" "r")
> -                    (match_operand:SI 2 "s_register_operand" "r")))
> +  [(set (reg:CC_NCV_CIC CC_REGNUM)
> +	(compare:CC_NCV_CIC
> +	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -        (minus:SI (minus:SI (match_dup 1)
> -                            (match_dup 2))
> -                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
> +		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "sbcs\\t%0, %1, %2"
>    [(set_attr "conds" "set")
> @@ -1166,13 +1167,15 @@
>  )
>  
>  (define_insn "*subsi3_carryin_compare_const"
> -  [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_operand:SI 1 "reg_or_int_operand" "r")
> -                    (match_operand:SI 2 "arm_not_operand" "K")))
> +  [(set (reg:CC_NCV_CIC CC_REGNUM)
> +	(compare:CC_NCV_CIC
> +	  (zero_extend:DI (match_operand:SI 1 "reg_or_int_operand" "r"))
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 2 "arm_not_operand" "K"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -        (minus:SI (plus:SI (match_dup 1)
> -                           (match_dup 2))
> -                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
> +	(minus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_32BIT"
>    "sbcs\\t%0, %1, #%B2"
>    [(set_attr "conds" "set")
> @@ -4686,8 +4689,8 @@
>  
>  
>  (define_insn_and_split "negdi2_compare"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC
> +  [(set (reg:CC_NCV CC_REGNUM)
> +	(compare:CC_NCV
>  	  (const_int 0)
>  	  (match_operand:DI 1 "register_operand" "0,r")))
>     (set (match_operand:DI 0 "register_operand" "=r,&r")
> @@ -4699,8 +4702,12 @@
>  		   (compare:CC (const_int 0) (match_dup 1)))
>  	      (set (match_dup 0) (minus:SI (const_int 0)
>  					   (match_dup 1)))])
> -   (parallel [(set (reg:CC CC_REGNUM)
> -		   (compare:CC (const_int 0) (match_dup 3)))
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +		   (compare:CC_NCV_CIC
> +		     (const_int 0)
> +		     (plus:DI
> +		       (zero_extend:DI (match_dup 3))
> +		       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>  	     (set (match_dup 2)
>  		  (minus:SI
>  		   (minus:SI (const_int 0) (match_dup 3))
> @@ -4759,12 +4766,14 @@
>  )
>  
>  (define_insn "*negsi2_carryin_compare"
> -  [(set (reg:CC CC_REGNUM)
> -	(compare:CC (const_int 0)
> -		    (match_operand:SI 1 "s_register_operand" "r")))
> +  [(set (reg:CC_NCV_CIC CC_REGNUM)
> +	(compare:CC_NCV_CIC
> +	  (const_int 0)
> +	  (plus:DI
> +	    (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
> +	    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>     (set (match_operand:SI 0 "s_register_operand" "=r")
> -	(minus:SI (minus:SI (const_int 0)
> -			    (match_dup 1))
> +	(minus:SI (minus:SI (const_int 0) (match_dup 1))
>  		  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
>    "TARGET_ARM"
>    "rscs\\t%0, %1, #0"
> @@ -7438,12 +7447,15 @@
>    "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
>    "&& reload_completed"
>    [(set (reg:CC CC_REGNUM)
> -        (compare:CC (match_dup 0) (match_dup 1)))
> -   (parallel [(set (reg:CC CC_REGNUM)
> -                   (compare:CC (match_dup 3) (match_dup 4)))
> -              (set (match_dup 2)
> -                   (minus:SI (match_dup 5)
> -                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
> +	(compare:CC (match_dup 0) (match_dup 1)))
> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
> +		   (compare:CC_NCV_CIC
> +		     (zero_extend:DI (match_dup 3))
> +		     (plus:DI (zero_extend:DI (match_dup 4))
> +			      (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
> +	      (set (match_dup 2)
> +		   (minus:SI (match_dup 5)
> +			     (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>    {
>      operands[3] = gen_highpart (SImode, operands[0]);
>      operands[0] = gen_lowpart (SImode, operands[0]);
> Index: gcc/doc/rtl.texi
> ===================================================================
> --- gcc/doc/rtl.texi	(revision 251752)
> +++ gcc/doc/rtl.texi	(working copy)
> @@ -2252,6 +2252,13 @@
>  If one of the operands is a constant, it should be placed in the
>  second operand and the comparison code adjusted as appropriate.
>  
> +There may be exceptions to this rule if the mode @var{m} does not
> +carry enough information for the swapped comparison operator, or
> +if we try to detect overflow from the subtraction.  That means, while
> +0-X may overfow X-0 can never overflow.  Under these conditions
> +a compare may have the constant expression at the first operand.
> +Examples are the ARM negdi2_compare pattern and similar.
> +
>  A @code{compare} specifying two @code{VOIDmode} constants is not valid
>  since there is no way to know in what mode the comparison is to be
>  performed; the comparison must either be folded during the compilation
> 


Er, hold on.  Comparisons don't 'overflow'.  They compare two values and
tell you something about their relationship.  If A cmp B doesn't tell me
the same basic things as B swapped(cmp) A, then the world will fall
apart big time.  So your documentation patch can't be right.

R.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH, ARM] correctly encode the CC reg data flow
  2017-10-09 13:02                   ` Richard Earnshaw (lists)
@ 2017-10-10 19:11                     ` Bernd Edlinger
  0 siblings, 0 replies; 21+ messages in thread
From: Bernd Edlinger @ 2017-10-10 19:11 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Kyrill Tkachov, gcc-patches
  Cc: Ramana Radhakrishnan, Wilco Dijkstra

On 10/09/17 15:02, Richard Earnshaw (lists) wrote:
> On 06/09/17 14:17, Bernd Edlinger wrote:
>> Index: gcc/doc/rtl.texi
>> ===================================================================
>> --- gcc/doc/rtl.texi	(revision 251752)
>> +++ gcc/doc/rtl.texi	(working copy)
>> @@ -2252,6 +2252,13 @@
>>   If one of the operands is a constant, it should be placed in the
>>   second operand and the comparison code adjusted as appropriate.
>>   
>> +There may be exceptions to this rule if the mode @var{m} does not
>> +carry enough information for the swapped comparison operator, or
>> +if we try to detect overflow from the subtraction.  That means, while
>> +0-X may overfow X-0 can never overflow.  Under these conditions

s/overfow/overflow/

>> +a compare may have the constant expression at the first operand.
>> +Examples are the ARM negdi2_compare pattern and similar.
>> +
>>   A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>   since there is no way to know in what mode the comparison is to be
>>   performed; the comparison must either be folded during the compilation
>>
> 
> 
> Er, hold on.  Comparisons don't 'overflow'.  They compare two values and
> tell you something about their relationship.  If A cmp B doesn't tell me
> the same basic things as B swapped(cmp) A, then the world will fall
> apart big time.  So your documentation patch can't be right.
> 

Hi Richard,

I think a CC-mode like CC_NCV is inherently unsymmetrical
which means that it does in general not provide enough information for
the swapped(cmp), and I think the truth is that the N-flag
is the sign of A-B, and V-flag is 1 if A-B overflows otherwise 0.
And if you ask for Branch if less-than that is Branch if N ^ V.


But in the moment I have no idea how to proceed.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-10-10 19:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-18 13:15 [PATCH, ARM] correctly encode the CC reg data flow Bernd Edlinger
2017-01-13 13:50 ` Richard Earnshaw (lists)
2017-01-13 16:10   ` Bernd Edlinger
2017-01-13 18:29     ` Bernd Edlinger
2017-01-18 15:43       ` Bernd Edlinger
2017-04-20 19:10         ` [PING] " Bernd Edlinger
2017-04-29 17:32           ` [PING**2] " Bernd Edlinger
2017-05-12 16:50             ` [PING**3] " Bernd Edlinger
2017-06-01 16:01               ` [PING**4] " Bernd Edlinger
     [not found]               ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
2017-06-14 12:34                 ` [PING**5] " Bernd Edlinger
     [not found]                 ` <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
2017-07-05 18:11                   ` [PING**6] " Bernd Edlinger
2017-09-04 13:55         ` Kyrill Tkachov
2017-09-04 19:54           ` Bernd Edlinger
     [not found]           ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
2017-09-06 12:44             ` Bernd Edlinger
2017-09-06 12:52               ` Richard Earnshaw (lists)
2017-09-06 13:00                 ` Bernd Edlinger
2017-09-06 13:17                 ` Bernd Edlinger
2017-09-06 15:31                   ` Kyrill Tkachov
2017-09-17  8:38                     ` [PING] " Bernd Edlinger
2017-10-09 13:02                   ` Richard Earnshaw (lists)
2017-10-10 19:11                     ` Bernd Edlinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).