* [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698
@ 2022-03-01 3:21 Michael Meissner
2022-03-01 19:30 ` will schmidt
0 siblings, 1 reply; 2+ messages in thread
From: Michael Meissner @ 2022-03-01 3:21 UTC (permalink / raw)
To: gcc-patches, Michael Meissner, Segher Boessenkool,
David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt
Optimize signed DImode -> TImode on power10, PR target/104698.
On power10, GCC tries to optimize the signed conversion from DImode to
TImode by using the vextsd2q instruction. However to generate this
instruction, it would have to generate 3 direct moves (1 from the GPR
registers to the altivec registers, and 2 from the altivec registers to
the GPR register).
This patch adds code back in to use the shift right immediate instruction
to do the conversion if the target/source is GPR registers.
2022-02-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/104698
* config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
(extendditi2): Replace with code to deal with both GPR registers
and with altivec registers.
---
gcc/config/rs6000/vsx.md | 73 ++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 21 deletions(-)
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b53de103872..62464f67f4d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5023,15 +5023,58 @@ (define_expand "vsignextend_si_v2di"
DONE;
})
-;; ISA 3.1 vector sign extend
-;; Move DI value from GPR to TI mode in VSX register, word 1.
-(define_insn "mtvsrdd_diti_w1"
- [(set (match_operand:TI 0 "register_operand" "=wa")
- (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
- UNSPEC_MTVSRD_DITI_W1))]
- "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
- "mtvsrdd %x0,0,%1"
- [(set_attr "type" "vecmove")])
+;; Sign extend DI to TI. We provide both GPR targets and Altivec targets. If
+;; the register allocator prefers the GPRs, we won't have to move the value to
+;; the altivec registers, do the vextsd2q instruction and move it back. If we
+;; aren't compiling for 64-bit power10, don't provide the service and let the
+;; machine independent code handle the extension.
+(define_insn_and_split "extendditi2"
+ [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
+ (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
+ (clobber (reg:DI CA_REGNO))]
+ "TARGET_POWERPC64 && TARGET_POWER10"
+ "#"
+ "&& reload_completed"
+ [(pc)]
+{
+ rtx dest = operands[0];
+ rtx src = operands[1];
+ int dest_regno = reg_or_subregno (dest);
+
+ /* Handle conversion to GPR registers. Load up the low part and then do
+ a sign extension to the upper part. */
+ if (INT_REGNO_P (dest_regno))
+ {
+ rtx dest_hi = gen_highpart (DImode, dest);
+ rtx dest_lo = gen_lowpart (DImode, dest);
+
+ emit_move_insn (dest_lo, src);
+ emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
+ DONE;
+ }
+
+ /* For conversion to Altivec register, generate either a splat operation or
+ a load rightmost double word instruction. Both instructions gets the
+ DImode value into the lower 64 bits, and then do the vextsd2q
+ instruction. */
+ else if (ALTIVEC_REGNO_P (dest_regno))
+ {
+ if (MEM_P (src))
+ emit_insn (gen_vsx_lxvrdx (dest, src));
+ else
+ {
+ rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+ emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
+ }
+
+ emit_insn (gen_extendditi2_vector (dest, dest));
+ DONE;
+ }
+
+ else
+ gcc_unreachable ();
+}
+ [(set_attr "length" "8")])
;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
(define_insn "extendditi2_vector"
@@ -5042,18 +5085,6 @@ (define_insn "extendditi2_vector"
"vextsd2q %0,%1"
[(set_attr "type" "vecexts")])
-(define_expand "extendditi2"
- [(set (match_operand:TI 0 "gpc_reg_operand")
- (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
- "TARGET_POWER10"
- {
- /* Move 64-bit src from GPR to vector reg and sign extend to 128-bits. */
- rtx temp = gen_reg_rtx (TImode);
- emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
- emit_insn (gen_extendditi2_vector (operands[0], temp));
- DONE;
- })
-
\f
;; ISA 3.0 Binary Floating-Point Support
--
2.35.1
--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698
2022-03-01 3:21 [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698 Michael Meissner
@ 2022-03-01 19:30 ` will schmidt
0 siblings, 0 replies; 2+ messages in thread
From: will schmidt @ 2022-03-01 19:30 UTC (permalink / raw)
To: Michael Meissner, gcc-patches, Segher Boessenkool,
David Edelsohn, Bill Schmidt, Peter Bergner
On Mon, 2022-02-28 at 22:21 -0500, Michael Meissner wrote:
> Optimize signed DImode -> TImode on power10, PR target/104698.
>
Hi,
Logic seems OK to me, a few suggestions on the comments intermixed
below. As always, i defer if there are counter arguments. :-)
> On power10, GCC tries to optimize the signed conversion from DImode to
> TImode by using the vextsd2q instruction. However to generate this
> instruction, it would have to generate 3 direct moves (1 from the GPR
> registers to the altivec registers, and 2 from the altivec registers to
> the GPR register).
>
> This patch adds code back in to use the shift right immediate instruction
> to do the conversion if the target/source is GPR registers.
Perhaps drop "back in". If it's necessary to call out a previous
commit that removed the code for whatever reason, certainly do so.
It's not clear from context if that was the case.
>
> 2022-02-28 Michael Meissner <meissner@linux.ibm.com>
>
> gcc/
> PR target/104698
> * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
> (extendditi2): Replace with code to deal with both GPR registers
> and with altivec registers.
Perhaps enhance with
(extendditi2): Convert from define_expand to
define_insn_and_split. Replace with code ...
> ---
> gcc/config/rs6000/vsx.md | 73 ++++++++++++++++++++++++++++------------
> 1 file changed, 52 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index b53de103872..62464f67f4d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5023,15 +5023,58 @@ (define_expand "vsignextend_si_v2di"
> DONE;
> })
>
> -;; ISA 3.1 vector sign extend
> -;; Move DI value from GPR to TI mode in VSX register, word 1.
> -(define_insn "mtvsrdd_diti_w1"
> - [(set (match_operand:TI 0 "register_operand" "=wa")
> - (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> - UNSPEC_MTVSRD_DITI_W1))]
> - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> - "mtvsrdd %x0,0,%1"
> - [(set_attr "type" "vecmove")])
> +;; Sign extend DI to TI. We provide both GPR targets and Altivec targets. If
> +;; the register allocator prefers the GPRs, we won't have to move the value to
> +;; the altivec registers, do the vextsd2q instruction and move it back. If we
> +;; aren't compiling for 64-bit power10, don't provide the service and let the
> +;; machine independent code handle the extension.
So, the ".. we won't have to ..." applies to the altivec target path
here? Describing in a way that indicates what code doesn't do doesn't
seem right.
If so, and perhaps even if not, i suggest rearranging the
comment slightly so it can be read as an either or.
If the register
allocator prefers the GPRS, ...
Otherwise, for altivec registers we dothe vextsd2q ...
> +(define_insn_and_split "extendditi2"
> + [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> + (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> + (clobber (reg:DI CA_REGNO))]
> + "TARGET_POWERPC64 && TARGET_POWER10"
> + "#"
> + "&& reload_completed"
> + [(pc)]
> +{
> + rtx dest = operands[0];
> + rtx src = operands[1];
> + int dest_regno = reg_or_subregno (dest);
> +
> + /* Handle conversion to GPR registers. Load up the low part and then do
> + a sign extension to the upper part. */
> + if (INT_REGNO_P (dest_regno))
> + {
> + rtx dest_hi = gen_highpart (DImode, dest);
> + rtx dest_lo = gen_lowpart (DImode, dest);
> +
> + emit_move_insn (dest_lo, src);
> + emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
> + DONE;
> + }
ok
> +
> + /* For conversion to Altivec register, generate either a splat operation or
> + a load rightmost double word instruction. Both instructions gets the
> + DImode value into the lower 64 bits, and then do the vextsd2q
> + instruction. */
consider s/instruction. Both instructions gets/to get/
> + else if (ALTIVEC_REGNO_P (dest_regno))
> + {
> + if (MEM_P (src))
> + emit_insn (gen_vsx_lxvrdx (dest, src));
> + else
> + {
> + rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> + emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> + }
> +
> + emit_insn (gen_extendditi2_vector (dest, dest));
> + DONE;
> + }
ok
lgtm, thanks
-Will
> +
> + else
> + gcc_unreachable ();
> +}
> + [(set_attr "length" "8")])
>
> ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
> (define_insn "extendditi2_vector"
> @@ -5042,18 +5085,6 @@ (define_insn "extendditi2_vector"
> "vextsd2q %0,%1"
> [(set_attr "type" "vecexts")])
>
> -(define_expand "extendditi2"
> - [(set (match_operand:TI 0 "gpc_reg_operand")
> - (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> - "TARGET_POWER10"
> - {
> - /* Move 64-bit src from GPR to vector reg and sign extend to 128-bits. */
> - rtx temp = gen_reg_rtx (TImode);
> - emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
> - emit_insn (gen_extendditi2_vector (operands[0], temp));
> - DONE;
> - })
> -
>
> ;; ISA 3.0 Binary Floating-Point Support
>
> --
> 2.35.1
>
>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2022-03-01 19:30 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-01 3:21 [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698 Michael Meissner
2022-03-01 19:30 ` will schmidt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).