[AArch64] Fix sve/extract_[12].c for big-endian SVE

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com,
	marcus.shawcroft@arm.com
Subject: [AArch64] Fix sve/extract_[12].c for big-endian SVE
Date: Fri, 26 Jan 2018 13:46:00 -0000	[thread overview]
Message-ID: <87d11wpx0r.fsf@linaro.org> (raw)

sve/extract_[12].c were relying on the target-independent optimisation
that removes a redundant vec_select, so that we don't end up with
things like:

    dup v0.4s, v0.4s[0]
    ...use s0...

But that optimisation rightly doesn't trigger for big-endian targets,
because GCC expects lane 0 to be in the high part of the register
rather than the low part.

SVE breaks this assumption -- see the comment at the head of
aarch64-sve.md for details -- so the optimisation is valid for
both endiannesses.  Long term, we probably need some kind of target
hook to make GCC aware of this.

But there's another problem with the current extract pattern: it doesn't
tell the register allocator how cheap an extraction of lane 0 is with
tied registers.  It seems better to split the lane 0 case out into
its own pattern and use tied operands for the FPR<-SIMD case,
so that using different registers has the cost of an extra reload.
I think we want this for both endiannesses, regardless of the hook
described above.

Also, the gen_lowpart in this pattern fails for aarch64_be due to
TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
instead.  We're only creating this rtl in order to print it, so there's
no need for anything fancier.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
	pattern.
	(*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
	Use gen_rtx_REG rather than gen_lowpart.

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-13 18:01:51.232735405 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:26:50.176756711 +0000
@@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract<mode><Vel>_0"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+	(vec_select:<VEL>
+	  (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+	  (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
+    switch (which_alternative)
+      {
+	case 0:
+	  return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
+	case 1:
+	  return "#";
+	case 2:
+	  return "st1\\t{%1.<Vetype>}[0], %0";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  "&& reload_completed
+   && REG_P (operands[1])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+    emit_note (NOTE_INSN_DELETED);
+    DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract<mode><Vel>_v128"
   [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
 	(vec_select:<VEL>
 	  (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"
   {
-    operands[1] = gen_lowpart (<V128>mode, operands[1]);
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
     switch (which_alternative)
       {
 	case 0:

next             reply	other threads:[~2018-01-26 13:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-26 13:46 Richard Sandiford [this message]
2018-01-26 14:22 ` Kyrill Tkachov
2018-01-26 15:27   ` Richard Sandiford
2018-01-29 12:37     ` James Greenhalgh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d11wpx0r.fsf@linaro.org \
    --to=richard.sandiford@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=james.greenhalgh@arm.com \
    --cc=marcus.shawcroft@arm.com \
    --cc=richard.earnshaw@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).