[PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR
@ 2014-08-12 14:38 Alan Lawrence
  2014-08-12 14:40 ` [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers Alan Lawrence
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Alan Lawrence @ 2014-08-12 14:38 UTC (permalink / raw)
  To: gcc-patches

Following the change to make arm_neon.h's (u?)int64x1_t types into vectors, 
these types are now passed in the SIMD registers rather than general purpose 
registers, which often results in poor quality code in functions taking or 
returning these types. Often values are moved from vector registers into GPRs, 
an operation performed, and the value moved back - yet the architecture is 
capable of performing the operation directly on the SIMD registers. Hence these 
patches are small tweaks to the relevant patterns.

The third patch, allowing AND+OR directly on values in SIMD registers, is more 
complicated, as the XOR pattern there was never matched (due to action of 
simplify_rtx) - the new pattern should also start to be used in GPR registers.

Regression-tested check-gcc check-g++ on aarch64-none-elf and aarch64_be-none-elf.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
  2014-08-12 14:38 [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
@ 2014-08-12 14:40 ` Alan Lawrence
  2014-08-12 15:53   ` pinskia
  2014-08-12 14:44 ` [PATCH AArch64 2/3] Add SIMD-reg variants of logical operators and/ior/xor/not Alan Lawrence
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Alan Lawrence @ 2014-08-12 14:40 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 137 bytes --]

(It is no more expensive.)

gcc/ChangeLog:

	* config/aarch64/aarch64.md (subdi3, adddi3_aarch64): Don't penalize
	SIMD reg variant.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: dont_disparage_addsub.patch --]
[-- Type: text/x-patch; name=dont_disparage_addsub.patch, Size: 1183 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8eb305140e7b0aed006b33f1724a90939e48316..0a8ca4bcc7941f912c8d42200b33206d4188fa48 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1188,10 +1188,10 @@
 
 (define_insn "*adddi3_aarch64"
   [(set
-    (match_operand:DI 0 "register_operand" "=rk,rk,rk,!w")
+    (match_operand:DI 0 "register_operand" "=rk,rk,rk,w")
     (plus:DI
-     (match_operand:DI 1 "register_operand" "%rk,rk,rk,!w")
-     (match_operand:DI 2 "aarch64_plus_operand" "I,r,J,!w")))]
+     (match_operand:DI 1 "register_operand" "%rk,rk,rk,w")
+     (match_operand:DI 2 "aarch64_plus_operand" "I,r,J,w")))]
   ""
   "@
   add\\t%x0, %x1, %2
@@ -1662,9 +1662,9 @@
 )
 
 (define_insn "subdi3"
-  [(set (match_operand:DI 0 "register_operand" "=rk,!w")
-	(minus:DI (match_operand:DI 1 "register_operand" "r,!w")
-		   (match_operand:DI 2 "register_operand" "r,!w")))]
+  [(set (match_operand:DI 0 "register_operand" "=rk,w")
+	(minus:DI (match_operand:DI 1 "register_operand" "r,w")
+		   (match_operand:DI 2 "register_operand" "r,w")))]
   ""
   "@
    sub\\t%x0, %x1, %x2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH AArch64 2/3] Add SIMD-reg variants of logical operators and/ior/xor/not
  2014-08-12 14:38 [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
  2014-08-12 14:40 ` [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers Alan Lawrence
@ 2014-08-12 14:44 ` Alan Lawrence
  2014-09-02 15:10   ` Marcus Shawcroft
  2014-08-12 14:48 ` [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON Alan Lawrence
  2014-12-19 18:04 ` [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
  3 siblings, 1 reply; 15+ messages in thread
From: Alan Lawrence @ 2014-08-12 14:44 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 391 bytes --]

This patch adds SIMD register variants for and, ior, xor and not - similarly to 
add/sub, the H/W supports it, and it'll be more efficient if the values are 
there already, e.g. if passed as [u]int64x1_t parameters.

gcc/ChangeLog:

	* config/aarch64/aarch64.md (<optab><mode>3, one_cmpl<mode>2):
	Add SIMD-register variant.
	* config/aarch64/iterators.md (Vbtype): Add value for SI.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: simd_reg_andiorxor.patch --]
[-- Type: text/x-patch; name=simd_reg_andiorxor.patch, Size: 2247 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0a8ca4bcc7941f912c8d42200b33206d4188fa48..8eaf1be3ba6e39ca00a2ae3905e84375b354ccd8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2608,12 +2608,16 @@
 ;; -------------------------------------------------------------------
 
 (define_insn "<optab><mode>3"
-  [(set (match_operand:GPI 0 "register_operand" "=r,rk")
-	(LOGICAL:GPI (match_operand:GPI 1 "register_operand" "%r,r")
-		     (match_operand:GPI 2 "aarch64_logical_operand" "r,<lconst>")))]
+  [(set (match_operand:GPI 0 "register_operand" "=r,rk,w")
+	(LOGICAL:GPI (match_operand:GPI 1 "register_operand" "%r,r,w")
+		     (match_operand:GPI 2 "aarch64_logical_operand" "r,<lconst>,w")))]
   ""
-  "<logical>\\t%<w>0, %<w>1, %<w>2"
-  [(set_attr "type" "logic_reg,logic_imm")]
+  "@
+  <logical>\\t%<w>0, %<w>1, %<w>2
+  <logical>\\t%<w>0, %<w>1, %<w>2
+  <logical>\\t%0.<Vbtype>, %1.<Vbtype>, %2.<Vbtype>"
+  [(set_attr "type" "logic_reg,logic_imm,neon_logic")
+   (set_attr "simd" "*,*,yes")]
 )
 
 ;; zero_extend version of above
@@ -2734,11 +2738,14 @@
 )
 
 (define_insn "one_cmpl<mode>2"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-	(not:GPI (match_operand:GPI 1 "register_operand" "r")))]
+  [(set (match_operand:GPI 0 "register_operand" "=r,w")
+	(not:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
   ""
-  "mvn\\t%<w>0, %<w>1"
-  [(set_attr "type" "logic_reg")]
+  "@
+  mvn\\t%<w>0, %<w>1
+  mvn\\t%0.8b, %1.8b"
+  [(set_attr "type" "logic_reg,neon_logic")
+   (set_attr "simd" "*,yes")]
 )
 
 (define_insn "*one_cmpl_<optab><mode>2"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 3203c3da7e293d566d1ea329856cbef8fb73a825..b7f1d5709eeda0362117f7de3800b99048352225 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -401,7 +401,8 @@
 			  (V2SI "8b") (V4SI  "16b")
 			  (V2DI "16b") (V2SF  "8b")
 			  (V4SF "16b") (V2DF  "16b")
-			  (DI   "8b")  (DF    "8b")])
+			  (DI   "8b")  (DF    "8b")
+			  (SI   "8b")])
 
 ;; Define element mode for each vector mode.
 (define_mode_attr VEL [(V8QI "QI") (V16QI "QI")

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON
  2014-08-12 14:38 [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
  2014-08-12 14:40 ` [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers Alan Lawrence
  2014-08-12 14:44 ` [PATCH AArch64 2/3] Add SIMD-reg variants of logical operators and/ior/xor/not Alan Lawrence
@ 2014-08-12 14:48 ` Alan Lawrence
  2014-08-12 14:55   ` Alan Lawrence
  2014-12-19 18:04 ` [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
  3 siblings, 1 reply; 15+ messages in thread
From: Alan Lawrence @ 2014-08-12 14:48 UTC (permalink / raw)
  To: gcc-patches

[When I wrote that xor was broken on GPRs and this fixes it, I meant 
xor_one_cmpl rather than xor, sorry!]

The pattern for xor_one_cmpl never matched, due to the action of 
combine_simplify_rtx; hence, separate this pattern out from that for ORN/BIC.

ORN/BIC have equivalent SIMD-reg variants, so add those for the benefit of 
values in vector registers (e.g. passed as [u]int64x1_t parameters).

EON does not have a SIMD-reg variant; however, it seems better to split it (to 
XOR + NOT) than to move both arguments to GPRs, perform EON, and move the result 
back.

gcc/ChangeLog:

	* config/aarch64/aarch64.c (<LOGICAL:optab>_one_cmpl<mode>3):
	Reparameterize to...
	(<NLOGICAL:optab>_one_cmpl<mode>3): with extra SIMD-register variant.
	(xor_one_cmpl<mode>3): New define_insn_and_split.

	* config/aarch64/iterators.md (NLOGICAL): New define_code_iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/eon_1.c: New test.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON
  2014-08-12 14:48 ` [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON Alan Lawrence
@ 2014-08-12 14:55   ` Alan Lawrence
  2014-08-13  2:42     ` Kugan
  2014-09-02 15:14     ` Marcus Shawcroft
  0 siblings, 2 replies; 15+ messages in thread
From: Alan Lawrence @ 2014-08-12 14:55 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

...patch attached...

Alan Lawrence wrote:
> [When I wrote that xor was broken on GPRs and this fixes it, I meant 
> xor_one_cmpl rather than xor, sorry!]
> 
> The pattern for xor_one_cmpl never matched, due to the action of 
> combine_simplify_rtx; hence, separate this pattern out from that for ORN/BIC.
> 
> ORN/BIC have equivalent SIMD-reg variants, so add those for the benefit of 
> values in vector registers (e.g. passed as [u]int64x1_t parameters).
> 
> EON does not have a SIMD-reg variant; however, it seems better to split it (to 
> XOR + NOT) than to move both arguments to GPRs, perform EON, and move the result 
> back.
> 
> gcc/ChangeLog:
> 
> 	* config/aarch64/aarch64.c (<LOGICAL:optab>_one_cmpl<mode>3):
> 	Reparameterize to...
> 	(<NLOGICAL:optab>_one_cmpl<mode>3): with extra SIMD-register variant.
> 	(xor_one_cmpl<mode>3): New define_insn_and_split.
> 
> 	* config/aarch64/iterators.md (NLOGICAL): New define_code_iterator.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/aarch64/eon_1.c: New test.
> 
> 
> 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: simd_eon.patch --]
[-- Type: text/x-patch; name=simd_eon.patch, Size: 3943 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8eaf1be3ba6e39ca00a2ae3905e84375b354ccd8..2b9cc29148e699b8b6839b6e1294d0eebcad9001 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2757,14 +2757,36 @@
   [(set_attr "type" "logic_shift_imm")]
 )
 
-(define_insn "*<LOGICAL:optab>_one_cmpl<mode>3"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-	(LOGICAL:GPI (not:GPI
-		      (match_operand:GPI 1 "register_operand" "r"))
-		     (match_operand:GPI 2 "register_operand" "r")))]
+;; Binary logical operators negating one operand, i.e. (a & !b), (a | !b).
+
+(define_insn "*<NLOGICAL:optab>_one_cmpl<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r,w")
+	(NLOGICAL:GPI (not:GPI (match_operand:GPI 1 "register_operand" "r,w"))
+		     (match_operand:GPI 2 "register_operand" "r,w")))]
+  ""
+  "@
+  <NLOGICAL:nlogical>\\t%<w>0, %<w>2, %<w>1
+  <NLOGICAL:nlogical>\\t%0.<Vbtype>, %2.<Vbtype>, %1.<Vbtype>"
+  [(set_attr "type" "logic_reg,neon_logic")
+   (set_attr "simd" "*,yes")]
+)
+
+;; (xor (not a) b) is simplify_rtx-ed down to (not (xor a b)).
+;; eon does not operate on SIMD registers so the vector variant must be split.
+(define_insn_and_split "*xor_one_cmpl<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r,w")
+        (not:GPI (xor:GPI (match_operand:GPI 1 "register_operand" "r,?w")
+                          (match_operand:GPI 2 "register_operand" "r,w"))))]
+  ""
+  "eon\\t%<w>0, %<w>1, %<w>2" ;; For GPR registers (only).
+  "reload_completed && (which_alternative == 1)" ;; For SIMD registers.
+  [(set (match_operand:GPI 0 "register_operand" "=w")
+        (xor:GPI (match_operand:GPI 1 "register_operand" "w")
+                 (match_operand:GPI 2 "register_operand" "w")))
+   (set (match_dup 0) (not:GPI (match_dup 0)))]
   ""
-  "<LOGICAL:nlogical>\\t%<w>0, %<w>2, %<w>1"
-  [(set_attr "type" "logic_reg")]
+  [(set_attr "type" "logic_reg,multiple")
+   (set_attr "simd" "*,yes")]
 )
 
 (define_insn "*and_one_cmpl<mode>3_compare0"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index b7f1d5709eeda0362117f7de3800b99048352225..da8bea2ea4f9e2cc8abae5375b908a247a7edc2f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -668,6 +668,9 @@
 ;; Code iterator for logical operations
 (define_code_iterator LOGICAL [and ior xor])
 
+;; Code iterator for logical operations whose :nlogical works on SIMD registers.
+(define_code_iterator NLOGICAL [and ior])
+
 ;; Code iterator for sign/zero extension
 (define_code_iterator ANY_EXTEND [sign_extend zero_extend])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/eon_1.c b/gcc/testsuite/gcc.target/aarch64/eon_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..dcdf3b4d052e034e0475028b238bdff0105d4c44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eon_1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-not "\tf?mov\t" } } */
+
+typedef long long int64_t;
+typedef int64_t int64x1_t __attribute__ ((__vector_size__ (8)));
+
+/* { dg-final { scan-assembler-times "\\teon\\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 1 } } */
+
+int64_t
+test_eon (int64_t a, int64_t b)
+{
+  return a ^ ~b;
+}
+
+/* { dg-final { scan-assembler-times "\\tmvn\\tx\[0-9\]+, x\[0-9\]+" 1 } } */
+int64_t
+test_not (int64_t a)
+{
+  return ~a;
+}
+
+/* There is no eon for SIMD regs; we prefer eor+mvn to mov+mov+eon+mov.  */
+
+/* { dg-final { scan-assembler-times "\\teor\\tv\[0-9\]+\.8b, v\[0-9\]+\.8b, v\[0-9\]+\.8b" 1 } } */
+/* { dg-final { scan-assembler-times "\\tmvn\\tv\[0-9\]+\.8b, v\[0-9\]+\.8b" 2 } } */
+int64x1_t
+test_vec_eon (int64x1_t a, int64x1_t b)
+{
+  return a ^ ~b;
+}
+
+int64x1_t
+test_vec_not (int64x1_t a)
+{
+  return ~a;
+}
+

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
  2014-08-12 14:40 ` [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers Alan Lawrence
@ 2014-08-12 15:53   ` pinskia
  2014-08-13  8:36     ` James Greenhalgh
  2014-08-18 16:50     ` Alan Lawrence
  0 siblings, 2 replies; 15+ messages in thread
From: pinskia @ 2014-08-12 15:53 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches



> On Aug 12, 2014, at 7:40 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> 
> (It is no more expensive.)

Yes on some processors it could be. 

Thanks,
Andrew


> 
> gcc/ChangeLog:
> 
>    * config/aarch64/aarch64.md (subdi3, adddi3_aarch64): Don't penalize
>    SIMD reg variant.
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f8eb305140e7b0aed006b33f1724a90939e48316..0a8ca4bcc7941f912c8d42200b33206d4188fa48 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1188,10 +1188,10 @@ (define_insn "*adddi3_aarch64" [(set - (match_operand:DI 0 "register_operand" "=rk,rk,rk,!w") + (match_operand:DI 0 "register_operand" "=rk,rk,rk,w") (plus:DI - (match_operand:DI 1 "register_operand" "%rk,rk,rk,!w") - (match_operand:DI 2 "aarch64_plus_operand" "I,r,J,!w")))] + (match_operand:DI 1 "register_operand" "%rk,rk,rk,w") + (match_operand:DI 2 "aarch64_plus_operand" "I,r,J,w")))] "" "@ add\\t%x0, %x1, %2 @@ -1662,9 +1662,9 @@ ) (define_insn "subdi3" - [(set (match_operand:DI 0 "register_operand" "=rk,!w") -	(minus:DI (match_operand:DI 1 "register_operand" "r,!w") -	 (match_operand:DI 2 "register_operand" "r,!w")))] + [(set (match_operand:DI 0 "register_operand" "=rk,w") +	(minus:DI (match_operand:DI 1 "register_operand" "r,w") +	 (match_operand:DI 2 "register_operand" "r,w")))] "" "@ sub\\t%x0, %x1, %x2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON
  2014-08-12 14:55   ` Alan Lawrence
@ 2014-08-13  2:42     ` Kugan
  2014-08-13 10:59       ` Alan Lawrence
  2014-09-02 15:14     ` Marcus Shawcroft
  1 sibling, 1 reply; 15+ messages in thread
From: Kugan @ 2014-08-13  2:42 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches


On 13/08/14 00:55, Alan Lawrence wrote:
> ...patch attached...
> 
> Alan Lawrence wrote:
>> [When I wrote that xor was broken on GPRs and this fixes it, I meant
>> xor_one_cmpl rather than xor, sorry!]
>>
>> The pattern for xor_one_cmpl never matched, due to the action of
>> combine_simplify_rtx; hence, separate this pattern out from that for
>> ORN/BIC.
>>
>> ORN/BIC have equivalent SIMD-reg variants, so add those for the
>> benefit of values in vector registers (e.g. passed as [u]int64x1_t
>> parameters).
>>
>> EON does not have a SIMD-reg variant; however, it seems better to
>> split it (to XOR + NOT) than to move both arguments to GPRs, perform
>> EON, and move the result back.
>>


+;; (xor (not a) b) is simplify_rtx-ed down to (not (xor a b)).
+;; eon does not operate on SIMD registers so the vector variant must be
split.
+(define_insn_and_split "*xor_one_cmpl<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r,w")
+        (not:GPI (xor:GPI (match_operand:GPI 1 "register_operand" "r,?w")

Hi Alan,

Is there any specific reason for why you are disparaging slightly this
alternative  with Â‘?Â’. Your earlier patch removes '!' from subdi3.

Thanks,
Kugan


+                          (match_operand:GPI 2 "register_operand"
"r,w"))))]
+  ""
+  "eon\\t%<w>0, %<w>1, %<w>2" ;; For GPR registers (only).
+  "reload_completed && (which_alternative == 1)" ;; For SIMD registers.
+  [(set (match_operand:GPI 0 "register_operand" "=w")
+        (xor:GPI (match_operand:GPI 1 "register_operand" "w")
+                 (match_operand:GPI 2 "register_operand" "w")))
+   (set (match_dup 0) (not:GPI (match_dup 0)))]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
  2014-08-12 15:53   ` pinskia
@ 2014-08-13  8:36     ` James Greenhalgh
  2014-08-13 16:08       ` Vladimir Makarov
  2014-08-18 16:50     ` Alan Lawrence
  1 sibling, 1 reply; 15+ messages in thread
From: James Greenhalgh @ 2014-08-13  8:36 UTC (permalink / raw)
  To: pinskia
  Cc: Alan Lawrence, gcc-patches, marcus.shawcroft, richard.earnshaw,
	ramana.radhakrishnan, vmakarov

On Tue, Aug 12, 2014 at 04:53:38PM +0100, pinskia@gmail.com wrote:
> 
> 
> > On Aug 12, 2014, at 7:40 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> > 
> > (It is no more expensive.)
> 
> Yes on some processors it could be. 

Haven't we been here before [1]?

This disparaging mechanism is still not going to give what we are trying to
achieve (assigning cost at the -mcpu/-mtune level, rather than the
target level). 

Recall that '!' gives a static cost of 600 [2] and that this cost is
only applied to the alternative if any operand in that alternative needs
reloading - otherwise, LRA sees a set of matching operands and does not
bother checking costs [3]. IRA, as far as I can see, does not care about
'!', but unconditionally adds 2 to the cost of an alternative for '?' [4].

Even if LRA did try to do a more complete job of always picking the
alternative with lowest cost (rather than the current first-matching
behaviour) "600" would be far too high a cost for the operation.

If IRA took '!' into account, we would be as well to remove the alternative
entirely.

So, I still can't agree that we want these exclamation marks - and we are
now in a halfway state where some instructions have them and some don't.
We have to pick a consistent policy or we are going to see some very poor
code generation.

In an ideal world, we would have a sensible way of describing a
(per-core granularity) alternative cost, which would be considered by
the register allocators. I've played about with doing this using attributes,
but it ends up as a messy patch and I can't bring myself to add yet another
cost framework to the back end.

Do you have any ideas as to how we can make some progress? Maybe Vladimir
has some suggestions?

Thanks,
James

[1] https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01627.html
[2] regoc.c::preprocess_constraints
[3] I may have misread this, but that seems to be what the final condition
    of the main loop of lra-constraints::process_alt_operands implies.
[4] ira-costs.c::record_reg_classes

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON
  2014-08-13  2:42     ` Kugan
@ 2014-08-13 10:59       ` Alan Lawrence
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Lawrence @ 2014-08-13 10:59 UTC (permalink / raw)
  To: Kugan; +Cc: gcc-patches

So my reasoning was that the alternative is likely to be more expensive *on all 
cores*, as it is split to two instructions, whereas add/sub "could" be more 
expensive for *some* processors.

But yes I can see the argument, by my own logic and James Greenhalgh's, for 
removing the '?': it still doesn't really say what we want to say, which is that 
the instruction itself is more expensive, rather than anything to do with moving 
values into registers if/when reloading. At this point in time we don't have a 
framework that allows us to say that...

--Alan

Kugan wrote:
> On 13/08/14 00:55, Alan Lawrence wrote:
>> ...patch attached...
>>
>> Alan Lawrence wrote:
>>> [When I wrote that xor was broken on GPRs and this fixes it, I meant
>>> xor_one_cmpl rather than xor, sorry!]
>>>
>>> The pattern for xor_one_cmpl never matched, due to the action of
>>> combine_simplify_rtx; hence, separate this pattern out from that for
>>> ORN/BIC.
>>>
>>> ORN/BIC have equivalent SIMD-reg variants, so add those for the
>>> benefit of values in vector registers (e.g. passed as [u]int64x1_t
>>> parameters).
>>>
>>> EON does not have a SIMD-reg variant; however, it seems better to
>>> split it (to XOR + NOT) than to move both arguments to GPRs, perform
>>> EON, and move the result back.
>>>
> 
> 
> +;; (xor (not a) b) is simplify_rtx-ed down to (not (xor a b)).
> +;; eon does not operate on SIMD registers so the vector variant must be
> split.
> +(define_insn_and_split "*xor_one_cmpl<mode>3"
> +  [(set (match_operand:GPI 0 "register_operand" "=r,w")
> +        (not:GPI (xor:GPI (match_operand:GPI 1 "register_operand" "r,?w")
> 
> Hi Alan,
> 
> Is there any specific reason for why you are disparaging slightly this
> alternative  with ‘?’. Your earlier patch removes '!' from subdi3.
> 
> Thanks,
> Kugan
> 
> 
> +                          (match_operand:GPI 2 "register_operand"
> "r,w"))))]
> +  ""
> +  "eon\\t%<w>0, %<w>1, %<w>2" ;; For GPR registers (only).
> +  "reload_completed && (which_alternative == 1)" ;; For SIMD registers.
> +  [(set (match_operand:GPI 0 "register_operand" "=w")
> +        (xor:GPI (match_operand:GPI 1 "register_operand" "w")
> +                 (match_operand:GPI 2 "register_operand" "w")))
> +   (set (match_dup 0) (not:GPI (match_dup 0)))]
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
  2014-08-13  8:36     ` James Greenhalgh
@ 2014-08-13 16:08       ` Vladimir Makarov
  0 siblings, 0 replies; 15+ messages in thread
From: Vladimir Makarov @ 2014-08-13 16:08 UTC (permalink / raw)
  To: James Greenhalgh, pinskia
  Cc: Alan Lawrence, gcc-patches, marcus.shawcroft, richard.earnshaw,
	ramana.radhakrishnan

On 2014-08-13, 4:36 AM, James Greenhalgh wrote:
> On Tue, Aug 12, 2014 at 04:53:38PM +0100, pinskia@gmail.com wrote:
>>
>>
>>> On Aug 12, 2014, at 7:40 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>
>>> (It is no more expensive.)
>>
>> Yes on some processors it could be.
>
> Haven't we been here before [1]?
>
> This disparaging mechanism is still not going to give what we are trying to
> achieve (assigning cost at the -mcpu/-mtune level, rather than the
> target level).
>
> Recall that '!' gives a static cost of 600 [2] and that this cost is
> only applied to the alternative if any operand in that alternative needs
> reloading - otherwise, LRA sees a set of matching operands and does not
> bother checking costs [3]. IRA, as far as I can see, does not care about
> '!', but unconditionally adds 2 to the cost of an alternative for '?' [4].
>

Yes, correct.  IRA follows the old global RA and the documentation.

> Even if LRA did try to do a more complete job of always picking the
> alternative with lowest cost (rather than the current first-matching
> behaviour) "600" would be far too high a cost for the operation.
>
> If IRA took '!' into account, we would be as well to remove the alternative
> entirely.
>

Yes, that is right.

> So, I still can't agree that we want these exclamation marks - and we are
> now in a halfway state where some instructions have them and some don't.
> We have to pick a consistent policy or we are going to see some very poor
> code generation.
>
> In an ideal world, we would have a sensible way of describing a
> (per-core granularity) alternative cost, which would be considered by
> the register allocators. I've played about with doing this using attributes,
> but it ends up as a messy patch and I can't bring myself to add yet another
> cost framework to the back end.
>
> Do you have any ideas as to how we can make some progress? Maybe Vladimir
> has some suggestions?
>

Yes, I have some thoughts.  We could provide machine-dependent hooks to 
treat different costs for '!' (which will work only for LRA/reload), and 
'?', and even '*' (to ignore or not the next register constraint for 
register preference).  The hook could get insn for which we calculate 
the costs and of course to have the current default values) for 
compatibility.

It could provide a lot of flexibility for machine-description writers.

Although it would create some conflicts with insn cost attribute 
calculation by alternatives and would create more complication for less 
experienced md writer.   So we need some more consistent solution.  It 
is all about code selection which currently is done in many places 
(combiner, register preferences and finally in reload/LRA).  A lot of 
thinking should be done how to better approach to the solution.

The right approach would be reconsider all the pipeline and probably 
rewriting combiner/register preferencing (which is moved from the old 
regclass.c practically without changes and ignore the fact that the 
alternative should be the same for different operands) / partially LRA. 
  So it is complex task and a big project.  But I am going to work in 
this direction when I start to be less busy.  I am only afraid that the 
quick solutions as mentioned by me above could create a lot of 
complications for the long-term project.

> Thanks,
> James
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01627.html
> [2] regoc.c::preprocess_constraints
> [3] I may have misread this, but that seems to be what the final condition
>      of the main loop of lra-constraints::process_alt_operands implies.
> [4] ira-costs.c::record_reg_classes
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
  2014-08-12 15:53   ` pinskia
  2014-08-13  8:36     ` James Greenhalgh
@ 2014-08-18 16:50     ` Alan Lawrence
  2014-09-02 15:08       ` Marcus Shawcroft
  1 sibling, 1 reply; 15+ messages in thread
From: Alan Lawrence @ 2014-08-18 16:50 UTC (permalink / raw)
  To: pinskia; +Cc: gcc-patches

Well, you're right that it could be. So I presented the wrong justification.

Clearly we would benefit from some better cost infrastructure here, ideally that 
is expressive, taken into account at all appropriate stages of the compiler, and 
tunable per core. I imagine that steps (patches) towards such infrastructure 
would be welcomed by both AArch64 maintainers and more widely.

In the meantime, however, we must work with what we have. I'll still argue that 
we should remove the '!' (as per patch), however. As James has said, even if 
your add is more expensive in SIMD registers, the '!' still doesn't express 
that; and leaving it in affects code-generation on all cores. And it is 
inconsistent with other instructions.

--Alan

pinskia@gmail.com wrote:
> 
>> On Aug 12, 2014, at 7:40 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>
>> (It is no more expensive.)
> 
> Yes on some processors it could be. 
> 
> Thanks,
> Andrew
> 
> 
>> gcc/ChangeLog:
>>
>>    * config/aarch64/aarch64.md (subdi3, adddi3_aarch64): Don't penalize
>>    SIMD reg variant.
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f8eb305140e7b0aed006b33f1724a90939e48316..0a8ca4bcc7941f912c8d42200b33206d4188fa48 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1188,10 +1188,10 @@ (define_insn "*adddi3_aarch64" [(set - (match_operand:DI 0 "register_operand" "=rk,rk,rk,!w") + (match_operand:DI 0 "register_operand" "=rk,rk,rk,w") (plus:DI - (match_operand:DI 1 "register_operand" "%rk,rk,rk,!w") - (match_operand:DI 2 "aarch64_plus_operand" "I,r,J,!w")))] + (match_operand:DI 1 "register_operand" "%rk,rk,rk,w") + (match_operand:DI 2 "aarch64_plus_operand" "I,r,J,w")))] "" "@ add\\t%x0, %x1, %2 @@ -1662,9 +1662,9 @@ ) (define_insn "subdi3" - [(set (match_operand:DI 0 "register_operand" "=rk,!w") -	(minus:DI (match_operand:DI 1 "register_operand" "r,!w") -	 (match_operand:DI 2 "register_operand" "r,!w")))] + [(set (match_operand:DI 0 "register_operand" "=rk,w") +	(minus:DI (match_operand:DI 1 "
register_operand" "r,w") +	 (match_operand:DI 2 "register_operand" "r,w")))] "" "@ sub\\t%x0, %x1, %x2
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers
  2014-08-18 16:50     ` Alan Lawrence
@ 2014-09-02 15:08       ` Marcus Shawcroft
  0 siblings, 0 replies; 15+ messages in thread
From: Marcus Shawcroft @ 2014-09-02 15:08 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 18 August 2014 17:50, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Well, you're right that it could be. So I presented the wrong justification.
>
> Clearly we would benefit from some better cost infrastructure here, ideally
> that is expressive, taken into account at all appropriate stages of the
> compiler, and tunable per core. I imagine that steps (patches) towards such
> infrastructure would be welcomed by both AArch64 maintainers and more
> widely.
>
> In the meantime, however, we must work with what we have. I'll still argue
> that we should remove the '!' (as per patch), however. As James has said,
> even if your add is more expensive in SIMD registers, the '!' still doesn't
> express that; and leaving it in affects code-generation on all cores. And it
> is inconsistent with other instructions.

Agreed and OK. /Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 2/3] Add SIMD-reg variants of logical operators and/ior/xor/not
  2014-08-12 14:44 ` [PATCH AArch64 2/3] Add SIMD-reg variants of logical operators and/ior/xor/not Alan Lawrence
@ 2014-09-02 15:10   ` Marcus Shawcroft
  0 siblings, 0 replies; 15+ messages in thread
From: Marcus Shawcroft @ 2014-09-02 15:10 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 12 August 2014 15:43, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This patch adds SIMD register variants for and, ior, xor and not - similarly
> to add/sub, the H/W supports it, and it'll be more efficient if the values
> are there already, e.g. if passed as [u]int64x1_t parameters.
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.md (<optab><mode>3, one_cmpl<mode>2):
>         Add SIMD-register variant.
>         * config/aarch64/iterators.md (Vbtype): Add value for SI.

OK /Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON
  2014-08-12 14:55   ` Alan Lawrence
  2014-08-13  2:42     ` Kugan
@ 2014-09-02 15:14     ` Marcus Shawcroft
  1 sibling, 0 replies; 15+ messages in thread
From: Marcus Shawcroft @ 2014-09-02 15:14 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 12 August 2014 15:55, Alan Lawrence <alan.lawrence@arm.com> wrote:

>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64.c (<LOGICAL:optab>_one_cmpl<mode>3):
>>         Reparameterize to...
>>         (<NLOGICAL:optab>_one_cmpl<mode>3): with extra SIMD-register
>> variant.
>>         (xor_one_cmpl<mode>3): New define_insn_and_split.
>>
>>         * config/aarch64/iterators.md (NLOGICAL): New
>> define_code_iterator.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/aarch64/eon_1.c: New test.

OK /Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR
  2014-08-12 14:38 [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
                   ` (2 preceding siblings ...)
  2014-08-12 14:48 ` [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON Alan Lawrence
@ 2014-12-19 18:04 ` Alan Lawrence
  3 siblings, 0 replies; 15+ messages in thread
From: Alan Lawrence @ 2014-12-19 18:04 UTC (permalink / raw)
  To: gcc-patches

I've now committed all three of these patches, as r/218958 r/218960 and 
r/218961, after investigating the effect of the first two on a range of 
benchmarks (Spec2000, Geekbench, Spec2k6) and finding almost no effect on 
codegen and no significant performance difference on Cortex-A53 or Cortex-A57.

Cheers, Alan

On Tue, 12 Aug 2014 15:38:09, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Following the change to make arm_neon.h's (u?)int64x1_t types into vectors, these types are now passed in the SIMD registers rather than general purpose registers, which often results in poor quality code in functions taking or returning these types. Often values are moved from vector registers into GPRs, an operation performed, and the value moved back - yet the architecture is capable of performing the operation directly on the SIMD registers. Hence these patches are small tweaks to the relevant patterns.
> 
> 
> The third patch, allowing AND+OR directly on values in SIMD registers, is more complicated, as the XOR pattern there was never matched (due to action of simplify_rtx) - the new pattern should also start to be used in GPR registers.
> 
> 
> Regression-tested check-gcc check-g++ on aarch64-none-elf and aarch64_be-none-elf.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-12-19 18:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-12 14:38 [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence
2014-08-12 14:40 ` [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers Alan Lawrence
2014-08-12 15:53   ` pinskia
2014-08-13  8:36     ` James Greenhalgh
2014-08-13 16:08       ` Vladimir Makarov
2014-08-18 16:50     ` Alan Lawrence
2014-09-02 15:08       ` Marcus Shawcroft
2014-08-12 14:44 ` [PATCH AArch64 2/3] Add SIMD-reg variants of logical operators and/ior/xor/not Alan Lawrence
2014-09-02 15:10   ` Marcus Shawcroft
2014-08-12 14:48 ` [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON Alan Lawrence
2014-08-12 14:55   ` Alan Lawrence
2014-08-13  2:42     ` Kugan
2014-08-13 10:59       ` Alan Lawrence
2014-09-02 15:14     ` Marcus Shawcroft
2014-12-19 18:04 ` [PATCH AArch64 0/3] Optimizations for 64x1 vectors, also fixes/enables XOR Alan Lawrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).