From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-387195-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 2901 invoked by alias); 11 Dec 2014 15:59:29 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 2887 invoked by uid 89); 11 Dec 2014 15:59:28 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: service87.mimecast.com
Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 11 Dec 2014 15:59:25 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by service87.mimecast.com; Thu, 11 Dec 2014 15:59:21 +0000
Received: from [10.1.209.51] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Thu, 11 Dec 2014 15:59:20 +0000
Message-ID: <5489BF58.9090008@arm.com>
Date: Thu, 11 Dec 2014 15:59:00 -0000
From: Alan Lawrence <alan.lawrence@arm.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: Michael Meissner <meissner@linux.vnet.ibm.com>,  Segher Boessenkool <segher@kernel.crashing.org>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Ping: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
In-Reply-To: <5463ABD0.8020004@arm.com>
X-MC-Unique: 114121115592104301
Content-Type: multipart/mixed; boundary="------------020100090706070703040306"
X-IsSubscribed: yes
X-SW-Source: 2014-12/txt/msg01024.txt.bz2

This is a multi-part message in MIME format.
--------------020100090706070703040306
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-length: 6118

So I'm afraid I'm not going to get involved in a discussion about=20
CANNOT_CHANGE_MODE_CLASS on RS6000, and what you might want to do there - s=
orry,=20
but I don't think I can really contribute anything there. However, I *am* t=
rying=20
to migrate all platforms off the old reduc_xxx optabs to the new version=20
producing a scalar.

Hence, can I ping the attached patch (which is just a simple combination of=
 the=20
previously-posted patch + snippet)? No regressions on gcc112.fsffrance.org.

This works in exactly the same way as the old code path, with a second insn=
 to=20
pull the scalar result out of the reduction, just as the expander would hav=
e=20
done (or the bitfieldref before that), and avoiding the v2df combine patter=
n=20
(again, as previously).

gcc/ChangeLog:

     * config/rs6000/altivec.md (reduc_splus_<mode>): Rename to...
     (reduc_plus_scal_<mode>): ...this, add rs6000_expand_vector_extract.
     (reduc_uplus_v16qi): Remove.

     * config/rs6000/vector.md (VEC_reduc_name): change "splus" to "plus"
     (reduc_<VEC_reduc_name>_v2df): Remove.
     (reduc_<VEC_reduc_name>_scal_v2df): New.
     (reduc_<VEC_reduc_name>_v4sf): Rename to...
     (reduc_<VEC_reduc_name>_scal_v4sf): ...this, wrap VEC_reduc in a
     vec_select of element 3, add scratch register.




> Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) =
using this snippet on top of original patch; no regressions.
>=20
>=20
> Alan Lawrence wrote:
>=20
>     So I'm no expert on RS6000 here, but following on from Segher's obser=
vation about the change in pattern...so the difference in 'expand' is exact=
ly that, a vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a =
vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by comb=
ining the two previous insns.
>=20
>=20
>     However, inspecting the logs from -fdump-rtl-combine-all, *without* m=
y patch, when the combiner tries to put those two together, I see:
>=20
>=20
>     Trying 30 -> 31:
>     Failed to match this instruction:
>     (set (reg:DF 179 [ stmp_s_5.7D.2196 ])
>          (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_=
s_5.6D.2195 ])
>                      (parallel [
>                              (const_int 1 [0x1])
>                              (const_int 0 [0])
>                          ]))
>                  (reg:V2DF 173 [ vect_s_5.6D.2195 ]))
>              (parallel [
>                      (const_int 1 [0x1])
>                  ])))
>=20
>     That is, it looks like combine_simplify_rtx has transformed the (vec_=
concat (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df =
insn, into a single vec_select, which does not match the vsx_reduc_plus_v2d=
f_scalar insn.
>=20
>=20
>     So despite the comment (in vsx.md):
>=20
>     ;; Combiner patterns with the vector reduction patterns that knows we=
 can get
>     ;; to the top element of the V2DF array without doing an extract.
>=20
>     It looks like the code generation prior to my patch, considered bette=
r, was because the combiner didn't actually use the pattern?
>=20
>=20
>     In that case whilst you may want to dig into register allocation, can=
not_change_mode_class, etc., for other reasons, I think the best fix for mi=
grating to reduc_plus_scal... is simply to avoid using the "Combiner" patte=
rns and just emit two insns, the old pattern followed by a vec_extract. The=
 attached snippet does this (I won't call it a patch yet, and it applies on=
 top of the previous patch - I went the route of calling the two gen functi=
ons rather than copying their RTL sequences, but could do the latter if tha=
t were preferable???), and restores code generation to the original form on=
 your example above; it bootstraps OK but I'm still running check-gcc on th=
e Compile Farm...
>=20
>=20
>     However, again on your example above, I note that if I *remove* the r=
educ_plus_scal_v2df pattern altogether, I get:
>=20
>=20
>     .sum:
>              li 10,512        # 52   *movdi_internal64/4     [length =3D =
4]
>              ld 9,.LC2@toc(2)         # 20   *movdi_internal64/2     [len=
gth =3D 4]
>              xxlxor 0,0,0     # 17   *vsx_movv2df/12 [length =3D 4]
>              mtctr 10         # 48   *movdi_internal64/11    [length =3D =
4]
>              .align 4
>     .L2:
>              lxvd2x 12,0,9    # 23   *vsx_movv2df/2  [length =3D 4]
>              addi 9,9,16      # 25   *adddi3_internal1/2     [length =3D =
4]
>              xvadddp 0,0,12   # 24   *vsx_addv2df3/1 [length =3D 4]
>              bdnz .L2         # 47   *ctrdi_internal1/1      [length =3D =
4]
>              xxsldwi 12,0,0,2         # 30   vsx_xxsldwi_v2df        [len=
gth =3D 4]
>              xvadddp 1,0,12   # 31   *vsx_addv2df3/1 [length =3D 4]
>              nop      # 37   *vsx_extract_v2df_internal2/1   [length =3D =
4]
>              blr      # 55   return  [length =3D 4]
>=20
>     this is presumably using gcc's scalar reduction code, but (to my untr=
ained eye on powerpc!) it looks even better than the first form above (the =
same in the loop, and in the reduction, an xxpermdi is replaced by a nop !)=
...
>=20
>=20
>     --Alan
>=20
>=20
>     Segher Boessenkool wrote:
>=20
>         On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
>=20
>             However, the double pattern is completely broken.  This canno=
t go in.
>=20
>         [snip]
>=20
>             It is unacceptable to have to do the inner loop doing a load,=
 vector add, and
>             store in the loop.
>=20
>         Before the patch, the final reduction used *vsx_reduc_splus_v2df;=
 after
>         the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a =
vector
>         add, the latter a float add.  And it uses the same pseudoregister=
 for the
>         accumulator throughout.  IRA decides a register is more expensive=
 than
>         memory for this, I suppose because it wants both V2DF and DF?  It=
 doesn't
>         seem to like the subreg very much.
>=20
>         The new code does look nicer otherwise :-)
>=20
>=20
>         Segher

--------------020100090706070703040306
Content-Type: text/x-patch; name=vec_rs6000_combined.patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline;
 filename="vec_rs6000_combined.patch"
Content-length: 4661
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d46ef191409211a0a9d213b57fb4e657cd5a3cb4..b79f7aa477ec700743ae7c57346=
72571e395b79d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2596,35 +2596,22 @@
   operands[3] =3D gen_reg_rtx (GET_MODE (operands[0]));
 })
=20
-(define_expand "reduc_splus_<mode>"
-  [(set (match_operand:VIshort 0 "register_operand" "=3Dv")
+(define_expand "reduc_plus_scal_<mode>"
+  [(set (match_operand:<VI_scalar> 0 "register_operand" "=3Dv")
         (unspec:VIshort [(match_operand:VIshort 1 "register_operand" "v")]
 			UNSPEC_REDUC_PLUS))]
   "TARGET_ALTIVEC"
 {
   rtx vzero =3D gen_reg_rtx (V4SImode);
   rtx vtmp1 =3D gen_reg_rtx (V4SImode);
-  rtx dest =3D gen_lowpart (V4SImode, operands[0]);
+  rtx vtmp2 =3D gen_reg_rtx (<MODE>mode);
+  rtx dest =3D gen_lowpart (V4SImode, vtmp2);
+  HOST_WIDE_INT last_elem =3D GET_MODE_NUNITS (<MODE>mode) - 1;
=20
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4s<VI_char>s (vtmp1, operands[1], vzero));
   emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
-  DONE;
-})
-
-(define_expand "reduc_uplus_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=3Dv")
-        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
-		      UNSPEC_REDUC_PLUS))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero =3D gen_reg_rtx (V4SImode);
-  rtx vtmp1 =3D gen_reg_rtx (V4SImode);
-  rtx dest =3D gen_lowpart (V4SImode, operands[0]);
-
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-  emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
+  rs6000_expand_vector_extract (operands[0], vtmp2, last_elem);
   DONE;
 })
=20
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index e2946bd6e312e909471253fc2d75a4b25e050f82..cc01a96cb681a61b0708b34ffa4=
339529f3c1b9e 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -77,7 +77,7 @@
 ;; Vector reduction code iterators
 (define_code_iterator VEC_reduc [plus smin smax])
=20
-(define_code_attr VEC_reduc_name [(plus "splus")
+(define_code_attr VEC_reduc_name [(plus "plus")
 				  (smin "smin")
 				  (smax "smax")])
=20
@@ -990,20 +990,16 @@
 =0C
 ;; Vector reduction expanders for VSX
=20
-(define_expand "reduc_<VEC_reduc_name>_v2df"
-  [(parallel [(set (match_operand:V2DF 0 "vfloat_operand" "")
-		   (VEC_reduc:V2DF
-		    (vec_concat:V2DF
-		     (vec_select:DF
-		      (match_operand:V2DF 1 "vfloat_operand" "")
-		      (parallel [(const_int 1)]))
-		     (vec_select:DF
-		      (match_dup 1)
-		      (parallel [(const_int 0)])))
-		    (match_dup 1)))
-	      (clobber (match_scratch:V2DF 2 ""))])]
+(define_expand "reduc_<VEC_reduc_name>_scal_v2df"
+  [(match_operand:DF 0 "register_operand" "")
+   (VEC_reduc:V2DF (match_operand:V2DF 1 "vfloat_operand" "") (const_int 0=
))]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "")
+  {
+    rtx vec =3D gen_reg_rtx (V2DFmode);
+    emit_insn (gen_vsx_reduc_<VEC_reduc_name>_v2df (vec, operand1));
+    emit_insn (gen_vsx_extract_v2df (operand0, vec, const1_rtx));
+    DONE;
+  })
=20
 ; The (VEC_reduc:V4SF
 ;	(op1)
@@ -1012,13 +1008,16 @@
 ; is to allow us to use a code iterator, but not completely list all of the
 ; vector rotates, etc. to prevent canonicalization
=20
-(define_expand "reduc_<VEC_reduc_name>_v4sf"
-  [(parallel [(set (match_operand:V4SF 0 "vfloat_operand" "")
-		   (VEC_reduc:V4SF
-		    (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
-		    (match_operand:V4SF 1 "vfloat_operand" "")))
+(define_expand "reduc_<VEC_reduc_name>_scal_v4sf"
+  [(parallel [(set (match_operand:SF 0 "vfloat_operand" "")
+		   (vec_select:SF
+		    (VEC_reduc:V4SF
+		     (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
+		     (match_operand:V4SF 1 "vfloat_operand" ""))
+		    (parallel [(const_int 3)])))
 	      (clobber (match_scratch:V4SF 2 ""))
-	      (clobber (match_scratch:V4SF 3 ""))])]
+	      (clobber (match_scratch:V4SF 3 ""))
+	      (clobber (match_scratch:V4SF 4 ""))])]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "")
=20
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 27d464e07f7b77166047dd9ba41966aef411c029..2a30039c2bf415ab92fbd0b8067=
87a74b15d7dcb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2150,7 +2150,7 @@
 =0C
 ;; Vector reduction insns and splitters
=20
-(define_insn_and_split "*vsx_reduc_<VEC_reduc_name>_v2df"
+(define_insn_and_split "vsx_reduc_<VEC_reduc_name>_v2df"
   [(set (match_operand:V2DF 0 "vfloat_operand" "=3D&wd,&?wa,wd,?wa")
 	(VEC_reduc:V2DF
 	 (vec_concat:V2DF=

--------------020100090706070703040306--