public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Alan Lawrence <alan.lawrence@arm.com>
To: Michael Meissner <meissner@linux.vnet.ibm.com>,
	 Segher Boessenkool <segher@kernel.crashing.org>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Ping: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
Date: Thu, 11 Dec 2014 15:59:00 -0000	[thread overview]
Message-ID: <5489BF58.9090008@arm.com> (raw)
In-Reply-To: <5463ABD0.8020004@arm.com>

[-- Attachment #1: Type: text/plain, Size: 6041 bytes --]

So I'm afraid I'm not going to get involved in a discussion about 
CANNOT_CHANGE_MODE_CLASS on RS6000, and what you might want to do there - sorry, 
but I don't think I can really contribute anything there. However, I *am* trying 
to migrate all platforms off the old reduc_xxx optabs to the new version 
producing a scalar.

Hence, can I ping the attached patch (which is just a simple combination of the 
previously-posted patch + snippet)? No regressions on gcc112.fsffrance.org.

This works in exactly the same way as the old code path, with a second insn to 
pull the scalar result out of the reduction, just as the expander would have 
done (or the bitfieldref before that), and avoiding the v2df combine pattern 
(again, as previously).

gcc/ChangeLog:

     * config/rs6000/altivec.md (reduc_splus_<mode>): Rename to...
     (reduc_plus_scal_<mode>): ...this, add rs6000_expand_vector_extract.
     (reduc_uplus_v16qi): Remove.

     * config/rs6000/vector.md (VEC_reduc_name): change "splus" to "plus"
     (reduc_<VEC_reduc_name>_v2df): Remove.
     (reduc_<VEC_reduc_name>_scal_v2df): New.
     (reduc_<VEC_reduc_name>_v4sf): Rename to...
     (reduc_<VEC_reduc_name>_scal_v4sf): ...this, wrap VEC_reduc in a
     vec_select of element 3, add scratch register.




> Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of original patch; no regressions.
> 
> 
> Alan Lawrence wrote:
> 
>     So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in 'expand' is exactly that, a vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combining the two previous insns.
> 
> 
>     However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, when the combiner tries to put those two together, I see:
> 
> 
>     Trying 30 -> 31:
>     Failed to match this instruction:
>     (set (reg:DF 179 [ stmp_s_5.7D.2196 ])
>          (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_s_5.6D.2195 ])
>                      (parallel [
>                              (const_int 1 [0x1])
>                              (const_int 0 [0])
>                          ]))
>                  (reg:V2DF 173 [ vect_s_5.6D.2195 ]))
>              (parallel [
>                      (const_int 1 [0x1])
>                  ])))
> 
>     That is, it looks like combine_simplify_rtx has transformed the (vec_concat (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn, into a single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn.
> 
> 
>     So despite the comment (in vsx.md):
> 
>     ;; Combiner patterns with the vector reduction patterns that knows we can get
>     ;; to the top element of the V2DF array without doing an extract.
> 
>     It looks like the code generation prior to my patch, considered better, was because the combiner didn't actually use the pattern?
> 
> 
>     In that case whilst you may want to dig into register allocation, cannot_change_mode_class, etc., for other reasons, I think the best fix for migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns and just emit two insns, the old pattern followed by a vec_extract. The attached snippet does this (I won't call it a patch yet, and it applies on top of the previous patch - I went the route of calling the two gen functions rather than copying their RTL sequences, but could do the latter if that were preferable???), and restores code generation to the original form on your example above; it bootstraps OK but I'm still running check-gcc on the Compile Farm...
> 
> 
>     However, again on your example above, I note that if I *remove* the reduc_plus_scal_v2df pattern altogether, I get:
> 
> 
>     .sum:
>              li 10,512        # 52   *movdi_internal64/4     [length = 4]
>              ld 9,.LC2@toc(2)         # 20   *movdi_internal64/2     [length = 4]
>              xxlxor 0,0,0     # 17   *vsx_movv2df/12 [length = 4]
>              mtctr 10         # 48   *movdi_internal64/11    [length = 4]
>              .align 4
>     .L2:
>              lxvd2x 12,0,9    # 23   *vsx_movv2df/2  [length = 4]
>              addi 9,9,16      # 25   *adddi3_internal1/2     [length = 4]
>              xvadddp 0,0,12   # 24   *vsx_addv2df3/1 [length = 4]
>              bdnz .L2         # 47   *ctrdi_internal1/1      [length = 4]
>              xxsldwi 12,0,0,2         # 30   vsx_xxsldwi_v2df        [length = 4]
>              xvadddp 1,0,12   # 31   *vsx_addv2df3/1 [length = 4]
>              nop      # 37   *vsx_extract_v2df_internal2/1   [length = 4]
>              blr      # 55   return  [length = 4]
> 
>     this is presumably using gcc's scalar reduction code, but (to my untrained eye on powerpc!) it looks even better than the first form above (the same in the loop, and in the reduction, an xxpermdi is replaced by a nop !)...
> 
> 
>     --Alan
> 
> 
>     Segher Boessenkool wrote:
> 
>         On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
> 
>             However, the double pattern is completely broken.  This cannot go in.
> 
>         [snip]
> 
>             It is unacceptable to have to do the inner loop doing a load, vector add, and
>             store in the loop.
> 
>         Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
>         the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
>         add, the latter a float add.  And it uses the same pseudoregister for the
>         accumulator throughout.  IRA decides a register is more expensive than
>         memory for this, I suppose because it wants both V2DF and DF?  It doesn't
>         seem to like the subreg very much.
> 
>         The new code does look nicer otherwise :-)
> 
> 
>         Segher

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: vec_rs6000_combined.patch --]
[-- Type: text/x-patch; name=vec_rs6000_combined.patch, Size: 4721 bytes --]

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d46ef191409211a0a9d213b57fb4e657cd5a3cb4..b79f7aa477ec700743ae7c5734672571e395b79d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2596,35 +2596,22 @@
   operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
 })
 
-(define_expand "reduc_splus_<mode>"
-  [(set (match_operand:VIshort 0 "register_operand" "=v")
+(define_expand "reduc_plus_scal_<mode>"
+  [(set (match_operand:<VI_scalar> 0 "register_operand" "=v")
         (unspec:VIshort [(match_operand:VIshort 1 "register_operand" "v")]
 			UNSPEC_REDUC_PLUS))]
   "TARGET_ALTIVEC"
 {
   rtx vzero = gen_reg_rtx (V4SImode);
   rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
+  rtx vtmp2 = gen_reg_rtx (<MODE>mode);
+  rtx dest = gen_lowpart (V4SImode, vtmp2);
+  HOST_WIDE_INT last_elem = GET_MODE_NUNITS (<MODE>mode) - 1;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4s<VI_char>s (vtmp1, operands[1], vzero));
   emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
-  DONE;
-})
-
-(define_expand "reduc_uplus_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
-		      UNSPEC_REDUC_PLUS))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
-
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-  emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
+  rs6000_expand_vector_extract (operands[0], vtmp2, last_elem);
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index e2946bd6e312e909471253fc2d75a4b25e050f82..cc01a96cb681a61b0708b34ffa4339529f3c1b9e 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -77,7 +77,7 @@
 ;; Vector reduction code iterators
 (define_code_iterator VEC_reduc [plus smin smax])
 
-(define_code_attr VEC_reduc_name [(plus "splus")
+(define_code_attr VEC_reduc_name [(plus "plus")
 				  (smin "smin")
 				  (smax "smax")])
 
@@ -990,20 +990,16 @@
 \f
 ;; Vector reduction expanders for VSX
 
-(define_expand "reduc_<VEC_reduc_name>_v2df"
-  [(parallel [(set (match_operand:V2DF 0 "vfloat_operand" "")
-		   (VEC_reduc:V2DF
-		    (vec_concat:V2DF
-		     (vec_select:DF
-		      (match_operand:V2DF 1 "vfloat_operand" "")
-		      (parallel [(const_int 1)]))
-		     (vec_select:DF
-		      (match_dup 1)
-		      (parallel [(const_int 0)])))
-		    (match_dup 1)))
-	      (clobber (match_scratch:V2DF 2 ""))])]
+(define_expand "reduc_<VEC_reduc_name>_scal_v2df"
+  [(match_operand:DF 0 "register_operand" "")
+   (VEC_reduc:V2DF (match_operand:V2DF 1 "vfloat_operand" "") (const_int 0))]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "")
+  {
+    rtx vec = gen_reg_rtx (V2DFmode);
+    emit_insn (gen_vsx_reduc_<VEC_reduc_name>_v2df (vec, operand1));
+    emit_insn (gen_vsx_extract_v2df (operand0, vec, const1_rtx));
+    DONE;
+  })
 
 ; The (VEC_reduc:V4SF
 ;	(op1)
@@ -1012,13 +1008,16 @@
 ; is to allow us to use a code iterator, but not completely list all of the
 ; vector rotates, etc. to prevent canonicalization
 
-(define_expand "reduc_<VEC_reduc_name>_v4sf"
-  [(parallel [(set (match_operand:V4SF 0 "vfloat_operand" "")
-		   (VEC_reduc:V4SF
-		    (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
-		    (match_operand:V4SF 1 "vfloat_operand" "")))
+(define_expand "reduc_<VEC_reduc_name>_scal_v4sf"
+  [(parallel [(set (match_operand:SF 0 "vfloat_operand" "")
+		   (vec_select:SF
+		    (VEC_reduc:V4SF
+		     (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
+		     (match_operand:V4SF 1 "vfloat_operand" ""))
+		    (parallel [(const_int 3)])))
 	      (clobber (match_scratch:V4SF 2 ""))
-	      (clobber (match_scratch:V4SF 3 ""))])]
+	      (clobber (match_scratch:V4SF 3 ""))
+	      (clobber (match_scratch:V4SF 4 ""))])]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "")
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 27d464e07f7b77166047dd9ba41966aef411c029..2a30039c2bf415ab92fbd0b806787a74b15d7dcb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2150,7 +2150,7 @@
 \f
 ;; Vector reduction insns and splitters
 
-(define_insn_and_split "*vsx_reduc_<VEC_reduc_name>_v2df"
+(define_insn_and_split "vsx_reduc_<VEC_reduc_name>_v2df"
   [(set (match_operand:V2DF 0 "vfloat_operand" "=&wd,&?wa,wd,?wa")
 	(VEC_reduc:V2DF
 	 (vec_concat:V2DF

  reply	other threads:[~2014-12-11 15:59 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-24 11:57 [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral Alan Lawrence
2014-10-24 11:58 ` [PATCH 7/11][ARM] Migrate to new reduc_plus_scal_optab Alan Lawrence
2014-11-03 17:32   ` Ramana Radhakrishnan
2014-10-24 12:01 ` [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral Richard Biener
2014-10-24 12:05 ` [PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab Alan Lawrence
2014-11-04 11:08   ` Ramana Radhakrishnan
2014-10-24 12:06 ` [PATCH 9/11][i386] Migrate reduction optabs to reduc_..._scal Alan Lawrence
2014-10-24 12:07 ` [PATCH 10/11][RS6000] " Alan Lawrence
2014-10-24 12:14   ` Alan Lawrence
2014-10-25  0:08   ` David Edelsohn
2014-11-03 17:51     ` Bill Schmidt
2014-11-06 16:44       ` Alan Lawrence
2014-11-06 18:57         ` Bill Schmidt
2014-11-07 10:09           ` Alan Lawrence
2014-11-10 22:39   ` Michael Meissner
2014-11-11  7:10     ` Segher Boessenkool
2014-11-12  1:54       ` Michael Meissner
2014-11-12  9:26         ` Segher Boessenkool
2014-11-12 19:20           ` Michael Meissner
2014-11-12 12:32       ` Alan Lawrence
2014-11-12 18:53         ` Alan Lawrence
2014-12-11 15:59           ` Alan Lawrence [this message]
2014-12-11 18:37             ` Ping: " Alan Lawrence
2014-10-24 12:12 ` [Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab Alan Lawrence
2014-10-24 12:50   ` Alan Lawrence
2014-10-24 15:19 ` [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral Matthew Fortune
2014-10-27 11:48   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5489BF58.9090008@arm.com \
    --to=alan.lawrence@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=meissner@linux.vnet.ibm.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).