public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ARM] Remove DImode expansions for 1-bit shifts
@ 2017-01-17 19:48 Wilco Dijkstra
  2017-01-17 21:26 ` kugan
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2017-01-17 19:48 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Kyrill Tkachov, Richard Earnshaw

A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		   + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
 	  if (speed_p)
 	    *cost += 2 * extra_cost->alu.shift;
+	  /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+	  if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+	    *cost += 1;
 	  return true;
 	}
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
 	(ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
 	(ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
 	(lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
 	gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
 		    || REGNO (operands[0]) == REGNO (operands[1]));
 
-	if (operands[2] == CONST1_RTX (SImode))
-	  /* This clobbers CC.  */
-	  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-	else
-	  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-					 operands[2], operands[3], operands[4]);
+	arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+				       operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
 	gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
 		    || REGNO (operands[0]) == REGNO (operands[1]));
 
-	if (operands[2] == CONST1_RTX (SImode))
-	  /* This clobbers CC.  */
-	  emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-	else
-	  /* This clobbers CC (ASHIFTRT by register only).  */
-	  arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-				 	 operands[2], operands[3], operands[4]);
+	/* This clobbers CC (ASHIFTRT by register only).  */
+	arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+				       operands[2], operands[3], operands[4]);
       }
 
     DONE;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-01-17 19:48 [PATCH][ARM] Remove DImode expansions for 1-bit shifts Wilco Dijkstra
@ 2017-01-17 21:26 ` kugan
  2017-01-17 23:41   ` Wilco Dijkstra
  2017-02-02 14:43 ` Wilco Dijkstra
  2017-04-20 16:01 ` Wilco Dijkstra
  2 siblings, 1 reply; 12+ messages in thread
From: kugan @ 2017-01-17 21:26 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, Kyrill Tkachov, Richard Earnshaw

Hi Wilco,

On 18/01/17 06:23, Wilco Dijkstra wrote:
> ChangeLog:
> 2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
>         (arm_ashldi3_1bit): Remove pattern.
>         (ashrdi3): Remove shift by 1 expansion.
>         (arm_ashrdi3_1bit): Remove pattern.
>         (lshrdi3): Remove shift by 1 expansion.
>         (arm_lshrdi3_1bit): Remove pattern.
>         * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
>         cost of ashldi3 by 1.
>         * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
>         (<shift>di3_neon): Likewise.
> --
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
>  		   + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
>  	  if (speed_p)
>  	    *cost += 2 * extra_cost->alu.shift;
> +	  /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
> +	  if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
> +	    *cost += 1;
>  	  return true;
>  	}
Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is 
done only for SImode. Am I missing something?

Also, what was the motivation for this patch. Is that to improve the 
maintainability of the arm back-end?

Thanks,
Kugan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-01-17 21:26 ` kugan
@ 2017-01-17 23:41   ` Wilco Dijkstra
  0 siblings, 0 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2017-01-17 23:41 UTC (permalink / raw)
  To: kugan, GCC Patches; +Cc: nd, Kyrill Tkachov, Richard Earnshaw

kugan wrote:
> Wilco Dijkstra wrote:
> > +       /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
> > +       if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))

> Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is 
> done only for SImode. Am I missing something?

The diff doesn't show enough context, but this is inside an if that checks
for DImode shifts. Note the shift count is SImode. 

> Also, what was the motivation for this patch. Is that to improve the 
> maintainability of the arm back-end?

These particular patterns should never have existed. Optimized
expansions should be added to arm_emit_coreregs_64bit_shift.

You may have noticed a few patches have been proposed recently to 
improve the generated code of DImode operations (PR77308).
The key realization was that GCC will generate absolutely terrible code 
unless either all DImode operations are split before register allocation,
or we only use Neon instructions. There is no middle ground here, trying
to allocate DImode registers from only 5 available register pairs (if lucky)
just isn't going to work.

So the goal is to enable early splitting in all DImode patterns. Removing
no-split multi-instruction patterns helps -  these are a bad idea anyway.

Wilco

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-01-17 19:48 [PATCH][ARM] Remove DImode expansions for 1-bit shifts Wilco Dijkstra
  2017-01-17 21:26 ` kugan
@ 2017-02-02 14:43 ` Wilco Dijkstra
  2017-02-23 16:58   ` Wilco Dijkstra
  2017-04-20 16:01 ` Wilco Dijkstra
  2 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2017-02-02 14:43 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Kyrill Tkachov, Richard Earnshaw


ping



From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
                    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
           if (speed_p)
             *cost += 2 * extra_cost->alu.shift;
+         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+           *cost += 1;
           return true;
         }
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
         (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-       else
-         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-                                        operands[2], operands[3], operands[4]);
+       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-       else
-         /* This clobbers CC (ASHIFTRT by register only).  */
-         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-                                  operands[2], operands[3], operands[4]);
+       /* This clobbers CC (ASHIFTRT by register only).  */
+       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
 
     DONE;    

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-02-02 14:43 ` Wilco Dijkstra
@ 2017-02-23 16:58   ` Wilco Dijkstra
  0 siblings, 0 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2017-02-23 16:58 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd, Kyrill Tkachov, Richard Earnshaw

    

ping



From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
                    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
           if (speed_p)
             *cost += 2 * extra_cost->alu.shift;
+         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+           *cost += 1;
           return true;
         }
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
         (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-       else
-         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-                                        operands[2], operands[3], operands[4]);
+       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-       else
-         /* This clobbers CC (ASHIFTRT by register only).  */
-         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-                                  operands[2], operands[3], operands[4]);
+       /* This clobbers CC (ASHIFTRT by register only).  */
+       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
 
     DONE;        

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-01-17 19:48 [PATCH][ARM] Remove DImode expansions for 1-bit shifts Wilco Dijkstra
  2017-01-17 21:26 ` kugan
  2017-02-02 14:43 ` Wilco Dijkstra
@ 2017-04-20 16:01 ` Wilco Dijkstra
  2017-06-13 14:01   ` Wilco Dijkstra
  2 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2017-04-20 16:01 UTC (permalink / raw)
  To: GCC Patches, Kyrill Tkachov; +Cc: nd, Richard Earnshaw


ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
                    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
           if (speed_p)
             *cost += 2 * extra_cost->alu.shift;
+         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+           *cost += 1;
           return true;
         }
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
         (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-       else
-         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-                                        operands[2], operands[3], operands[4]);
+       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-       else
-         /* This clobbers CC (ASHIFTRT by register only).  */
-         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-                                  operands[2], operands[3], operands[4]);
+       /* This clobbers CC (ASHIFTRT by register only).  */
+       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
 
     DONE;    

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-04-20 16:01 ` Wilco Dijkstra
@ 2017-06-13 14:01   ` Wilco Dijkstra
  2017-06-27 15:38     ` Wilco Dijkstra
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2017-06-13 14:01 UTC (permalink / raw)
  To: GCC Patches, Kyrill Tkachov; +Cc: nd, Richard Earnshaw


ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
                    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
           if (speed_p)
             *cost += 2 * extra_cost->alu.shift;
+         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+           *cost += 1;
           return true;
         }
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
         (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-       else
-         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-                                        operands[2], operands[3], operands[4]);
+       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-       else
-         /* This clobbers CC (ASHIFTRT by register only).  */
-         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-                                  operands[2], operands[3], operands[4]);
+       /* This clobbers CC (ASHIFTRT by register only).  */
+       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
 
     DONE;        

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-06-13 14:01   ` Wilco Dijkstra
@ 2017-06-27 15:38     ` Wilco Dijkstra
  2017-10-16 11:45       ` Wilco Dijkstra
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2017-06-27 15:38 UTC (permalink / raw)
  To: GCC Patches, Kyrill Tkachov; +Cc: nd, Richard Earnshaw

    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
                    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
           if (speed_p)
             *cost += 2 * extra_cost->alu.shift;
+         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+           *cost += 1;
           return true;
         }
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
         (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-       else
-         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-                                        operands[2], operands[3], operands[4]);
+       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-       else
-         /* This clobbers CC (ASHIFTRT by register only).  */
-         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-                                  operands[2], operands[3], operands[4]);
+       /* This clobbers CC (ASHIFTRT by register only).  */
+       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
 
     DONE;            

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-06-27 15:38     ` Wilco Dijkstra
@ 2017-10-16 11:45       ` Wilco Dijkstra
  2017-10-27 16:49         ` Kyrill Tkachov
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2017-10-16 11:45 UTC (permalink / raw)
  To: GCC Patches, Kyrill Tkachov; +Cc: nd, Richard Earnshaw

    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
        (arm_ashldi3_1bit): Remove pattern.
        (ashrdi3): Remove shift by 1 expansion.
        (arm_ashrdi3_1bit): Remove pattern.
        (lshrdi3): Remove shift by 1 expansion.
        (arm_lshrdi3_1bit): Remove pattern.
        * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
        cost of ashldi3 by 1.
        * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
        (<shift>di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
                    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
           if (speed_p)
             *cost += 2 * extra_cost->alu.shift;
+         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+           *cost += 1;
           return true;
         }
       else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI            0 "s_register_operand" "")
         (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
     {
       rtx scratch1, scratch2;
 
-      if (operands[2] == CONST1_RTX (SImode))
-        {
-          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-          DONE;
-        }
-
       /* Ideally we should use iwmmxt here if we could know that operands[1]
          ends up already living in an iwmmxt register. Otherwise it's
          cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
-        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
-                     (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI              0 "s_register_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1184,12 +1184,8 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-       else
-         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-                                        operands[2], operands[3], operands[4]);
+       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
     DONE;
   }"
@@ -1288,13 +1284,9 @@
         gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
                     || REGNO (operands[0]) == REGNO (operands[1]));
 
-       if (operands[2] == CONST1_RTX (SImode))
-         /* This clobbers CC.  */
-         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
-       else
-         /* This clobbers CC (ASHIFTRT by register only).  */
-         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
-                                  operands[2], operands[3], operands[4]);
+       /* This clobbers CC (ASHIFTRT by register only).  */
+       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
+                                      operands[2], operands[3], operands[4]);
       }
 
     DONE;                

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-10-16 11:45       ` Wilco Dijkstra
@ 2017-10-27 16:49         ` Kyrill Tkachov
  2017-10-30 14:04           ` Wilco Dijkstra
  0 siblings, 1 reply; 12+ messages in thread
From: Kyrill Tkachov @ 2017-10-27 16:49 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, Richard Earnshaw

Hi Wilco,

Sorry for the delay.

On 16/10/17 12:30, Wilco Dijkstra wrote:
>      
>
> ping
>
> From: Wilco Dijkstra
> Sent: 17 January 2017 19:23
> To: GCC Patches
> Cc: nd; Kyrill Tkachov; Richard Earnshaw
> Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
>      
> A left shift of 1 can always be done using an add, so slightly adjust rtx
> cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
> and the arm_ashldi3_1bit is redundant.

I agree...

> DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
> so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
> patterns.

... but it's still used, and the patterns were put there for a reason.
Even if GCC itself doesn't use them much they may be used by other 
applications.

So I'd support removing the left shift 1-bit expansions, but not the 
right shift ones.

Thanks,
Kyrill

> Bootstrap OK on arm-linux-gnueabihf.
>
> ChangeLog:
> 2017-01-17  Wilco Dijkstra  <wdijkstr@arm.com>
>
>          * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
>          (arm_ashldi3_1bit): Remove pattern.
>          (ashrdi3): Remove shift by 1 expansion.
>          (arm_ashrdi3_1bit): Remove pattern.
>          (lshrdi3): Remove shift by 1 expansion.
>          (arm_lshrdi3_1bit): Remove pattern.
>          * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
>          cost of ashldi3 by 1.
>          * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
>          (<shift>di3_neon): Likewise.
> --
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
>                      + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
>             if (speed_p)
>               *cost += 2 * extra_cost->alu.shift;
> +         /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
> +         if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
> +           *cost += 1;
>             return true;
>           }
>         else if (mode == SImode)
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -4061,12 +4061,6 @@
>       {
>         rtx scratch1, scratch2;
>   
> -      if (operands[2] == CONST1_RTX (SImode))
> -        {
> -          emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
> -          DONE;
> -        }
> -
>         /* Ideally we should use iwmmxt here if we could know that operands[1]
>            ends up already living in an iwmmxt register. Otherwise it's
>            cheaper to have the alternate code being generated than moving
> @@ -4083,18 +4077,6 @@
>     "
>   )
>   
> -(define_insn "arm_ashldi3_1bit"
> -  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
> -        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
> -                   (const_int 1)))
> -   (clobber (reg:CC CC_REGNUM))]
> -  "TARGET_32BIT"
> -  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
> -  [(set_attr "conds" "clob")
> -   (set_attr "length" "8")
> -   (set_attr "type" "multiple")]
> -)
> -
>   (define_expand "ashlsi3"
>     [(set (match_operand:SI            0 "s_register_operand" "")
>           (ashift:SI (match_operand:SI 1 "s_register_operand" "")
> @@ -4130,12 +4112,6 @@
>       {
>         rtx scratch1, scratch2;
>   
> -      if (operands[2] == CONST1_RTX (SImode))
> -        {
> -          emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
> -          DONE;
> -        }
> -
>         /* Ideally we should use iwmmxt here if we could know that operands[1]
>            ends up already living in an iwmmxt register. Otherwise it's
>            cheaper to have the alternate code being generated than moving
> @@ -4152,18 +4128,6 @@
>     "
>   )
>   
> -(define_insn "arm_ashrdi3_1bit"
> -  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
> -        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
> -                     (const_int 1)))
> -   (clobber (reg:CC CC_REGNUM))]
> -  "TARGET_32BIT"
> -  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
> -  [(set_attr "conds" "clob")
> -   (set_attr "length" "8")
> -   (set_attr "type" "multiple")]
> -)
> -
>   (define_expand "ashrsi3"
>     [(set (match_operand:SI              0 "s_register_operand" "")
>           (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
> @@ -4196,12 +4160,6 @@
>       {
>         rtx scratch1, scratch2;
>   
> -      if (operands[2] == CONST1_RTX (SImode))
> -        {
> -          emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
> -          DONE;
> -        }
> -
>         /* Ideally we should use iwmmxt here if we could know that operands[1]
>            ends up already living in an iwmmxt register. Otherwise it's
>            cheaper to have the alternate code being generated than moving
> @@ -4218,18 +4176,6 @@
>     "
>   )
>   
> -(define_insn "arm_lshrdi3_1bit"
> -  [(set (match_operand:DI              0 "s_register_operand" "=r,&r")
> -        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
> -                     (const_int 1)))
> -   (clobber (reg:CC CC_REGNUM))]
> -  "TARGET_32BIT"
> -  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
> -  [(set_attr "conds" "clob")
> -   (set_attr "length" "8")
> -   (set_attr "type" "multiple")]
> -)
> -
>   (define_expand "lshrsi3"
>     [(set (match_operand:SI              0 "s_register_operand" "")
>           (lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index cf281df0292d0f511d7d63e828886d860a3a8201..ebac36db8db3a74a16cc4ef7f76b1edd90e28fc9 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -1184,12 +1184,8 @@
>           gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
>                       || REGNO (operands[0]) == REGNO (operands[1]));
>   
> -       if (operands[2] == CONST1_RTX (SImode))
> -         /* This clobbers CC.  */
> -         emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
> -       else
> -         arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
> -                                        operands[2], operands[3], operands[4]);
> +       arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
> +                                      operands[2], operands[3], operands[4]);
>         }
>       DONE;
>     }"
> @@ -1288,13 +1284,9 @@
>           gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1])
>                       || REGNO (operands[0]) == REGNO (operands[1]));
>   
> -       if (operands[2] == CONST1_RTX (SImode))
> -         /* This clobbers CC.  */
> -         emit_insn (gen_arm_<shift>di3_1bit (operands[0], operands[1]));
> -       else
> -         /* This clobbers CC (ASHIFTRT by register only).  */
> -         arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
> -                                  operands[2], operands[3], operands[4]);
> +       /* This clobbers CC (ASHIFTRT by register only).  */
> +       arm_emit_coreregs_64bit_shift (<CODE>, operands[0], operands[1],
> +                                      operands[2], operands[3], operands[4]);
>         }
>   
>       DONE;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-10-27 16:49         ` Kyrill Tkachov
@ 2017-10-30 14:04           ` Wilco Dijkstra
  2017-10-30 18:25             ` Kyrill Tkachov
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2017-10-30 14:04 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: nd, Richard Earnshaw

Kyrill Tkachov wrote:
> On 16/10/17 12:30, Wilco Dijkstra wrote:

> > DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
> > so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
> > patterns.
>
> ... but it's still used, and the patterns were put there for a reason.
> Even if GCC itself doesn't use them much they may be used by other 
> applications.
> 
> So I'd support removing the left shift 1-bit expansions, but not the 
> right shift ones.

The purpose of removing the shift-by-1 cases is not just to cleanup code
but also to improve code generation. These shifts cannot expand early
and thus don't benefit from optimization (like shift merging). They also
suffer from the DImode register allocation issues.

As a simple example this loop runs >20% faster with my patch on both
Cortex-A53 and Cortex-A57 when built with -mfpu=vfp:

long long loop1 (long long x, long long y, int n)
{
  int i;
   for (i = 0; i < n; i++)
   {
      x >>= 1;
      x |= y;
      x >>= 1;
      x |= y;
      x >>= 1;
      x |= y;
      x >>= 1;
      x |= y;
   }
   return x;
}

So given these shifts are bad for performance, why have them at all?

Wilco

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
  2017-10-30 14:04           ` Wilco Dijkstra
@ 2017-10-30 18:25             ` Kyrill Tkachov
  0 siblings, 0 replies; 12+ messages in thread
From: Kyrill Tkachov @ 2017-10-30 18:25 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches; +Cc: nd, Richard Earnshaw


On 30/10/17 13:54, Wilco Dijkstra wrote:
> Kyrill Tkachov wrote:
>> On 16/10/17 12:30, Wilco Dijkstra wrote:
>>> DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
>>> so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
>>> patterns.
>> ... but it's still used, and the patterns were put there for a reason.
>> Even if GCC itself doesn't use them much they may be used by other
>> applications.
>>
>> So I'd support removing the left shift 1-bit expansions, but not the
>> right shift ones.
> The purpose of removing the shift-by-1 cases is not just to cleanup code
> but also to improve code generation. These shifts cannot expand early
> and thus don't benefit from optimization (like shift merging). They also
> suffer from the DImode register allocation issues.
>
> As a simple example this loop runs >20% faster with my patch on both
> Cortex-A53 and Cortex-A57 when built with -mfpu=vfp:
>
> long long loop1 (long long x, long long y, int n)
> {
>    int i;
>     for (i = 0; i < n; i++)
>     {
>        x >>= 1;
>        x |= y;
>        x >>= 1;
>        x |= y;
>        x >>= 1;
>        x |= y;
>        x >>= 1;
>        x |= y;
>     }
>     return x;
> }
>
> So given these shifts are bad for performance, why have them at all?

Thanks, that makes sense.
Ok for trunk.

Thanks,
Kyrill

> Wilco

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-30 18:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-17 19:48 [PATCH][ARM] Remove DImode expansions for 1-bit shifts Wilco Dijkstra
2017-01-17 21:26 ` kugan
2017-01-17 23:41   ` Wilco Dijkstra
2017-02-02 14:43 ` Wilco Dijkstra
2017-02-23 16:58   ` Wilco Dijkstra
2017-04-20 16:01 ` Wilco Dijkstra
2017-06-13 14:01   ` Wilco Dijkstra
2017-06-27 15:38     ` Wilco Dijkstra
2017-10-16 11:45       ` Wilco Dijkstra
2017-10-27 16:49         ` Kyrill Tkachov
2017-10-30 14:04           ` Wilco Dijkstra
2017-10-30 18:25             ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).