public inbox for gcc-patches@gcc.gnu.org
* [PATCH][committed] aarch64: Cost vector comparisons more accurately
@ 2023-05-15 11:05 Kyrylo Tkachov
From: Kyrylo Tkachov @ 2023-05-15 11:05 UTC (permalink / raw)
  To: gcc-patches

Hi all,

We are missing cases where combine could form FACGE/FACGT instructions. For the testcase in this patch we currently generate:
foo:
        fabs    v3.4s, v0.4s
        fabs    v0.4s, v1.4s
        fabs    v1.4s, v2.4s
        fcmgt   v0.4s, v3.4s, v0.4s
        fcmgt   v1.4s, v3.4s, v1.4s
        b       g
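
For reference, here is a minimal scalar sketch of what one lane of FACGT computes, assuming the usual "all-ones mask on true" result of a vector compare. The function names are hypothetical and only for illustration. Negating the 0/1 result of the C comparison gives 0 or all-ones, which is the same shape as the NEG-of-comparison RTL in the combine dump in this message.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one FACGT lane: all bits set if
   |a| > |b|, zero otherwise.  */
static uint32_t
facgt_lane (float a, float b)
{
  /* The C comparison yields 0 or 1; negating it yields 0 or -1,
     i.e. an all-zeros or all-ones 32-bit mask.  */
  return (uint32_t) -(int32_t) (fabsf (a) > fabsf (b));
}

/* A few spot checks of the model above.  */
static void
facgt_lane_examples (void)
{
  assert (facgt_lane (-3.0f, 2.0f) == UINT32_MAX);  /* |-3| > |2|  */
  assert (facgt_lane (1.0f, -2.0f) == 0);           /* |1| < |-2|  */
  assert (facgt_lane (-2.0f, 2.0f) == 0);           /* equal magnitudes  */
}
```

The fabs + fcmgt sequence above computes the same per-lane result, just with separate instructions for the absolute values.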

This is because combine matches the combined pattern but rejects it on cost grounds:
Successfully matched this instruction:
(set (reg:V4SI 106)
    (neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
            (abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36

The generic costing recurses into each arm of the RTX and costs every sub-expression separately, which inflates the total.
This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of
compare operators, with the FACGE/FACGT operations in particular having ABS on each arm. With this patch we get
the much more reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
        mov     v31.16b, v0.16b
        facgt   v0.4s, v0.4s, v1.4s
        facgt   v1.4s, v31.4s, v2.4s
        b       g
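
To illustrate the costing change, here is a toy sketch in C. It is not GCC's rtx or aarch64_rtx_costs: the toy_* types, the functions and the unit cost of 4 per node are all assumptions made up for this example, and the numbers are not meant to reproduce the dumps above. It only contrasts charging every node of NEG (LT (ABS a) (ABS b)) separately with treating the whole shape as one instruction plus the cost of its underlying operands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds and expression type; not GCC's rtx.  */
enum toy_code { TOY_REG, TOY_ABS, TOY_LT, TOY_NEG };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op0;
  const struct toy_expr *op1;	/* NULL for unary nodes and registers.  */
};

/* Assumed unit: every non-register node costs one "instruction".  */
enum { TOY_INSN_COST = 4 };

/* Naive costing: charge every node in the tree separately.  */
static int
toy_cost_naive (const struct toy_expr *x)
{
  if (x == NULL || x->code == TOY_REG)
    return 0;
  return TOY_INSN_COST + toy_cost_naive (x->op0) + toy_cost_naive (x->op1);
}

/* Pattern-aware costing: NEG of a comparison is one instruction, and
   ABS on both arms is folded into that instruction.  */
static int
toy_cost_fused (const struct toy_expr *x)
{
  if (x == NULL || x->code == TOY_REG)
    return 0;
  if (x->code == TOY_NEG && x->op0 != NULL && x->op0->code == TOY_LT)
    {
      const struct toy_expr *a = x->op0->op0;
      const struct toy_expr *b = x->op0->op1;
      if (a != NULL && b != NULL
	  && a->code == TOY_ABS && b->code == TOY_ABS)
	{
	  /* Look through the ABS wrappers, as for FACGE/FACGT.  */
	  a = a->op0;
	  b = b->op0;
	}
      return TOY_INSN_COST + toy_cost_fused (a) + toy_cost_fused (b);
    }
  return TOY_INSN_COST + toy_cost_fused (x->op0) + toy_cost_fused (x->op1);
}

/* NEG (LT (ABS r1) (ABS r2)) built from static nodes.  */
static const struct toy_expr toy_r1 = { TOY_REG, NULL, NULL };
static const struct toy_expr toy_r2 = { TOY_REG, NULL, NULL };
static const struct toy_expr toy_abs1 = { TOY_ABS, &toy_r1, NULL };
static const struct toy_expr toy_abs2 = { TOY_ABS, &toy_r2, NULL };
static const struct toy_expr toy_lt = { TOY_LT, &toy_abs1, &toy_abs2 };
static const struct toy_expr toy_facgt = { TOY_NEG, &toy_lt, NULL };

static void
toy_cost_examples (void)
{
  assert (toy_cost_naive (&toy_facgt) == 16);	/* NEG + LT + 2 * ABS  */
  assert (toy_cost_fused (&toy_facgt) == 4);	/* one fused instruction  */
}
```

In this toy model the naive walk makes the fused form look four times as expensive as it is, which is the kind of overestimate that can make combine reject a profitable replacement.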

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing
	logic for vector modes.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/facg_1.c: New test.

[-- Attachment #2: vcmpcst.patch --]
[-- Type: application/octet-stream, Size: 1781 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 101d4c91813564eb780566d12e4bb4f25f7bd2f6..6360be3887d57486427c4bc405c081558fc03c5a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14142,6 +14142,27 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
 
       if (VECTOR_MODE_P (mode))
 	{
+	  /* Many vector comparison operations are represented as NEG
+	     of a comparison.  */
+	  if (COMPARISON_P (op0))
+	    {
+	      rtx op00 = XEXP (op0, 0);
+	      rtx op01 = XEXP (op0, 1);
+	      machine_mode inner_mode = GET_MODE (op00);
+	      /* FACGE/FACGT.  */
+	      if (GET_MODE_CLASS (inner_mode) == MODE_VECTOR_FLOAT
+		  && GET_CODE (op00) == ABS
+		  && GET_CODE (op01) == ABS)
+		{
+		  op00 = XEXP (op00, 0);
+		  op01 = XEXP (op01, 0);
+		}
+	      *cost += rtx_cost (op00, inner_mode, GET_CODE (op0), 0, speed);
+	      *cost += rtx_cost (op01, inner_mode, GET_CODE (op0), 1, speed);
+	      if (speed)
+		*cost += extra_cost->vect.alu;
+	      return true;
+	    }
 	  if (speed)
 	    {
 	      /* FNEG.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/facg_1.c b/gcc/testsuite/gcc.target/aarch64/facg_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..6c17fb6f8876ca3cb74c9b40ef117ec00c97c89d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/facg_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+
+int g(uint32x4_t, uint32x4_t);
+
+int foo (float32x4_t x, float32x4_t a, float32x4_t b)
+{
+  return g(vcagtq_f32 (x, a), vcagtq_f32 (x, b));
+}
+
+/* { dg-final { scan-assembler-times {facgt\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.4s} 2 } } */
+/* { dg-final { scan-assembler-not {\tfabs\t} } } */
+
