public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition
@ 2017-08-15 22:08 Bill Schmidt
  2017-08-15 22:09 ` Bill Schmidt
  2017-08-17  1:39 ` Segher Boessenkool
  0 siblings, 2 replies; 3+ messages in thread
From: Bill Schmidt @ 2017-08-15 22:08 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, David Edelsohn, cel

Hi,

One of Carl Love's proposed built-in function patches exposed a bug in the Power
code that recognizes specific permute control vector patterns for a permute, and
changes the permute to a more specific and more efficient instruction.  The
patterns for p8_vmrgew_v4si and p8_vmrgow are generated regardless of endianness,
leading to problems on the little-endian port.

The normal way that would cause us to generate these patterns is via the
vec_widen_[su]mult_{even,odd}_<mode> interfaces, which are not yet instantiated
for Power; hence it appears that we've gotten lucky not to run into this before.
Carl's proposed patch instantiated these interfaces, triggering the discovery of
the problem.

This patch simply changes the handling for p8_vmrg[eo]w to match how it's done
for all of the other common pack/merge/etc. patterns.

In altivec.md, we already had a p8_vmrgew_v4sf_direct insn that does what we want.
I generalized this for both V4SF and V4SI modes.  I then added a similar
p8_vmrgow_<mode>_direct define_insn.

The use in rs6000.c of p8_vmrgew_v4sf_direct, rather than p8_vmrgew_v4si_direct,
is arbitrary.  The existing code already handles converting (for free) a V4SI
operand to a V4SF one, so there's no need to specify the mode directly; and it
would actually complicate the code to extract the mode so the "proper" pattern
would match.  I think what I have here is better, but if you disagree I can
change it.

Bootstrapped and tested on powerpc64le-linux-gnu (P8 64-bit) and on
powerpc64-linux-gnu (P7 32- and 64-bit) with no regressions.  Is this okay for
trunk?

Thanks,
Bill


2017-08-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (UNSPEC_VMRGOW_DIRECT): New constant.
	(p8_vmrgew_v4sf_direct): Generalize to p8_vmrgew_<mode>_direct.
	(p8_vmrgow_<mode>_direct): New define_insn.
	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Properly
	handle endianness for vmrgew and vmrgow permute patterns.


Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 250965)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -148,6 +148,7 @@
    UNSPEC_VMRGL_DIRECT
    UNSPEC_VSPLT_DIRECT
    UNSPEC_VMRGEW_DIRECT
+   UNSPEC_VMRGOW_DIRECT
    UNSPEC_VSUMSWS_DIRECT
    UNSPEC_VADDCUQ
    UNSPEC_VADDEUQM
@@ -1357,15 +1358,24 @@
 }
   [(set_attr "type" "vecperm")])
 
-(define_insn "p8_vmrgew_v4sf_direct"
-  [(set (match_operand:V4SF 0 "register_operand" "=v")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
-		      (match_operand:V4SF 2 "register_operand" "v")]
+(define_insn "p8_vmrgew_<mode>_direct"
+  [(set (match_operand:VSX_W 0 "register_operand" "=v")
+	(unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
+		       (match_operand:VSX_W 2 "register_operand" "v")]
 		     UNSPEC_VMRGEW_DIRECT))]
   "TARGET_P8_VECTOR"
   "vmrgew %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+(define_insn "p8_vmrgow_<mode>_direct"
+  [(set (match_operand:VSX_W 0 "register_operand" "=v")
+	(unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
+		       (match_operand:VSX_W 2 "register_operand" "v")]
+		     UNSPEC_VMRGOW_DIRECT))]
+  "TARGET_P8_VECTOR"
+  "vmrgow %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
 (define_expand "vec_widen_umult_even_v16qi"
   [(use (match_operand:V8HI 0 "register_operand" ""))
    (use (match_operand:V16QI 1 "register_operand" ""))
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 250965)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -35209,9 +35209,13 @@ altivec_expand_vec_perm_const (rtx operands[4])
       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct
        : CODE_FOR_altivec_vmrghw_direct),
       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
-    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew_v4si,
+    { OPTION_MASK_P8_VECTOR,
+      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
+       : CODE_FOR_p8_vmrgow_v4sf_direct),
       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
-    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
+    { OPTION_MASK_P8_VECTOR,
+      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
+       : CODE_FOR_p8_vmrgew_v4sf_direct),
       {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
   };
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition
  2017-08-15 22:08 [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition Bill Schmidt
@ 2017-08-15 22:09 ` Bill Schmidt
  2017-08-17  1:39 ` Segher Boessenkool
  1 sibling, 0 replies; 3+ messages in thread
From: Bill Schmidt @ 2017-08-15 22:09 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, David Edelsohn, cel

Hi,

I forgot to mention that I didn't include a test case.  Carl's upcoming patch will cause
this to be well tested with the existing test suite, so I think that's not needed.  Let me
know if you disagree.

Thanks,
Bill

> On Aug 15, 2017, at 4:14 PM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> 
> Hi,
> 
> One of Carl Love's proposed built-in function patches exposed a bug in the Power
> code that recognizes specific permute control vector patterns for a permute, and
> changes the permute to a more specific and more efficient instruction.  The
> patterns for p8_vmrgew_v4si and p8_vmrgow are generated regardless of endianness,
> leading to problems on the little-endian port.
> 
> The normal way that would cause us to generate these patterns is via the
> vec_widen_[su]mult_{even,odd}_<mode> interfaces, which are not yet instantiated
> for Power; hence it appears that we've gotten lucky not to run into this before.
> Carl's proposed patch instantiated these interfaces, triggering the discovery of
> the problem.
> 
> This patch simply changes the handling for p8_vmrg[eo]w to match how it's done
> for all of the other common pack/merge/etc. patterns.
> 
> In altivec.md, we already had a p8_vmrgew_v4sf_direct insn that does what we want.
> I generalized this for both V4SF and V4SI modes.  I then added a similar
> p8_vmrgow_<mode>_direct define_insn.
> 
> The use in rs6000.c of p8_vmrgew_v4sf_direct, rather than p8_vmrgew_v4si_direct,
> is arbitrary.  The existing code already handles converting (for free) a V4SI
> operand to a V4SF one, so there's no need to specify the mode directly; and it
> would actually complicate the code to extract the mode so the "proper" pattern
> would match.  I think what I have here is better, but if you disagree I can
> change it.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu (P8 64-bit) and on
> powerpc64-linux-gnu (P7 32- and 64-bit) with no regressions.  Is this okay for
> trunk?
> 
> Thanks,
> Bill
> 
> 
> 2017-08-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (UNSPEC_VMRGOW_DIRECT): New constant.
> 	(p8_vmrgew_v4sf_direct): Generalize to p8_vmrgew_<mode>_direct.
> 	(p8_vmrgow_<mode>_direct): New define_insn.
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Properly
> 	handle endianness for vmrgew and vmrgow permute patterns.
> 
> 
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md	(revision 250965)
> +++ gcc/config/rs6000/altivec.md	(working copy)
> @@ -148,6 +148,7 @@
>    UNSPEC_VMRGL_DIRECT
>    UNSPEC_VSPLT_DIRECT
>    UNSPEC_VMRGEW_DIRECT
> +   UNSPEC_VMRGOW_DIRECT
>    UNSPEC_VSUMSWS_DIRECT
>    UNSPEC_VADDCUQ
>    UNSPEC_VADDEUQM
> @@ -1357,15 +1358,24 @@
> }
>   [(set_attr "type" "vecperm")])
> 
> -(define_insn "p8_vmrgew_v4sf_direct"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v")
> -	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
> -		      (match_operand:V4SF 2 "register_operand" "v")]
> +(define_insn "p8_vmrgew_<mode>_direct"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=v")
> +	(unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
> +		       (match_operand:VSX_W 2 "register_operand" "v")]
> 		     UNSPEC_VMRGEW_DIRECT))]
>   "TARGET_P8_VECTOR"
>   "vmrgew %0,%1,%2"
>   [(set_attr "type" "vecperm")])
> 
> +(define_insn "p8_vmrgow_<mode>_direct"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=v")
> +	(unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
> +		       (match_operand:VSX_W 2 "register_operand" "v")]
> +		     UNSPEC_VMRGOW_DIRECT))]
> +  "TARGET_P8_VECTOR"
> +  "vmrgow %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> (define_expand "vec_widen_umult_even_v16qi"
>   [(use (match_operand:V8HI 0 "register_operand" ""))
>    (use (match_operand:V16QI 1 "register_operand" ""))
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 250965)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -35209,9 +35209,13 @@ altivec_expand_vec_perm_const (rtx operands[4])
>       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct
>        : CODE_FOR_altivec_vmrghw_direct),
>       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
> -    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew_v4si,
> +    { OPTION_MASK_P8_VECTOR,
> +      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> +       : CODE_FOR_p8_vmrgow_v4sf_direct),
>       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
> -    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
> +    { OPTION_MASK_P8_VECTOR,
> +      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
> +       : CODE_FOR_p8_vmrgew_v4sf_direct),
>       {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
>   };
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition
  2017-08-15 22:08 [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition Bill Schmidt
  2017-08-15 22:09 ` Bill Schmidt
@ 2017-08-17  1:39 ` Segher Boessenkool
  1 sibling, 0 replies; 3+ messages in thread
From: Segher Boessenkool @ 2017-08-17  1:39 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: GCC Patches, David Edelsohn, cel

Hi!

On Tue, Aug 15, 2017 at 04:14:21PM -0500, Bill Schmidt wrote:
> One of Carl Love's proposed built-in function patches exposed a bug in the Power
> code that recognizes specific permute control vector patterns for a permute, and
> changes the permute to a more specific and more efficient instruction.  The
> patterns for p8_vmrgew_v4si and p8_vmrgow are generated regardless of endianness,
> leading to problems on the little-endian port.

Ouch.

> The use in rs6000.c of p8_vmrgew_v4sf_direct, rather than p8_vmrgew_v4si_direct,
> is arbitrary.  The existing code already handles converting (for free) a V4SI
> operand to a V4SF one, so there's no need to specify the mode directly; and it
> would actually complicate the code to extract the mode so the "proper" pattern
> would match.  I think what I have here is better, but if you disagree I can
> change it.

It's fine I think.

This is okay for trunk, thanks!


Segher

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-08-16 22:57 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-15 22:08 [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition Bill Schmidt
2017-08-15 22:09 ` Bill Schmidt
2017-08-17  1:39 ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).