public inbox for gcc-patches@gcc.gnu.org
* Re: [ARM] Implement division using vrecpe, vrecps
@ 2018-11-02 13:38 Wilco Dijkstra
  2018-11-05  4:55 ` Prathamesh Kulkarni
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Dijkstra @ 2018-11-02 13:38 UTC
  To: prathamesh.kulkarni; +Cc: nd, GCC Patches, Kyrylo Tkachov, Ramana Radhakrishnan

Prathamesh Kulkarni wrote:

> This is a rebased version of the patch that adds a pattern to neon.md
> implementing division as multiplication by the reciprocal, computed
> with vrecpe/vrecps, when -funsafe-math-optimizations is enabled (but
> not at -Os).
> The newly added test cases are not vectorized on the armeb target at
> -O2. I posted the analysis for that here:
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01765.html

I don't think doing this unconditionally for every CPU is a good idea. On AArch64
we don't enable this for any core since it's not really faster (newer CPUs have
significantly improved division, and the reciprocal instructions reduce the
throughput of other FMAs). On wrf, using reciprocal square root is far better than
reciprocal division, but it is only faster on some specific CPUs, so it's not
enabled by default.

Wilco

* [ARM] Implement division using vrecpe, vrecps
@ 2018-10-26  6:36 Prathamesh Kulkarni
  2018-11-02  9:39 ` Prathamesh Kulkarni
  2018-11-05 13:52 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 7+ messages in thread
From: Prathamesh Kulkarni @ 2018-10-26  6:36 UTC
  To: gcc Patches, Kyrill Tkachov, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1031 bytes --]

Hi,
This is a rebased version of the patch that adds a pattern to neon.md
implementing division as multiplication by the reciprocal, computed
with vrecpe/vrecps, when -funsafe-math-optimizations is enabled (but
not at -Os).
The newly added test cases are not vectorized on the armeb target at
-O2. I posted the analysis for that here:
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01765.html

Briefly, the difference between the little- and big-endian vectorizer lies in
arm_builtin_support_vector_misalignment(), which calls
default_builtin_support_vector_misalignment() in the big-endian case;
that returns false because movmisalign_optab has no handler for V2SF
mode. This isn't observed at -O3 because loop peeling for alignment is
enabled there.

It seems that, after r221677, the test cases in this patch are reported
as unsupported on armeb, so this patch requires no changes to
target-supports.exp to adjust for armeb (unlike last time, when that
requirement stalled the patch).

Bootstrapped and tested on arm-linux-gnueabihf.
Cross-tested on arm*-*-* variants.
OK for trunk?

Thanks,
Prathamesh

[-- Attachment #2: tcwg-319-3.txt --]
[-- Type: text/plain, Size: 3349 bytes --]

2018-10-26  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* config/arm/neon.md (div<mode>3): New pattern.

testsuite/
	* gcc.target/arm/neon-vect-div-1.c: New test.
	* gcc.target/arm/neon-vect-div-2.c: Likewise.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 5aeee4b08c1..25ed45d381a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -620,6 +620,38 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+/* Perform division using multiply-by-reciprocal.
+   The reciprocal is refined using the Newton-Raphson method.
+   Enabled with -funsafe-math-optimizations -freciprocal-math
+   and disabled for -Os since it increases code size.  */
+
+(define_expand "div<mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+        (div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
+		  (match_operand:VCVTF 2 "s_register_operand" "w")))]
+  "TARGET_NEON && !optimize_size
+   && flag_unsafe_math_optimizations && flag_reciprocal_math"
+  {
+    rtx rec = gen_reg_rtx (<MODE>mode);
+    rtx vrecps_temp = gen_reg_rtx (<MODE>mode);
+
+    /* Reciprocal estimate.  */
+    emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
+
+    /* Perform 2 iterations of the Newton-Raphson method.  */
+    for (int i = 0; i < 2; i++)
+      {
+	emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
+	emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
+      }
+
+    /* rec now holds the reciprocal; compute operands[0] = operands[1] * rec.  */
+    emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
+    DONE;
+  }
+)
+
+
 (define_insn "mul<mode>3add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
new file mode 100644
index 00000000000..50d04b4175b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
@@ -0,0 +1,16 @@
+/* Test pattern div<mode>3.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target vect_hw_misalign } */
+/* { dg-options "-O2 -ftree-vectorize -funsafe-math-optimizations -fdump-tree-vect-details" } */
+/* { dg-add-options arm_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+    p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
new file mode 100644
index 00000000000..606f54b4e0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
@@ -0,0 +1,16 @@
+/* Test pattern div<mode>3.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target vect_hw_misalign } */
+/* { dg-options "-O3 -ftree-vectorize -funsafe-math-optimizations -fdump-tree-vect-details -fno-reciprocal-math" } */
+/* { dg-add-options arm_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+    p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-not "vectorized 1 loops" "vect" } } */


Thread overview: 7+ messages
2018-11-02 13:38 [ARM] Implement division using vrecpe, vrecps Wilco Dijkstra
2018-11-05  4:55 ` Prathamesh Kulkarni
2018-11-05 13:35   ` Wilco Dijkstra
  -- strict thread matches above, loose matches on Subject: below --
2018-10-26  6:36 Prathamesh Kulkarni
2018-11-02  9:39 ` Prathamesh Kulkarni
2018-11-05 13:52 ` Ramana Radhakrishnan
2018-11-09  6:44   ` Prathamesh Kulkarni
