public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/dmf004)] Use lxvl and stxvl for small variable memcpy moves.
@ 2022-11-15 0:58 Michael Meissner
From: Michael Meissner @ 2022-11-15 0:58 UTC
To: gcc-cvs
https://gcc.gnu.org/g:a2e1c0fa8dc467907ef551d8c6e03baf2c5f99d5
commit a2e1c0fa8dc467907ef551d8c6e03baf2c5f99d5
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Nov 14 19:56:25 2022 -0500
Use lxvl and stxvl for small variable memcpy moves.
This patch adds support for generating inline code for block copies with a
variable size when the size is 16 bytes or less. If the size is more than 16
bytes, just call memcpy.
To handle variable sizes, I found we need DImode versions of the two insns for
copying memory (cpymem<mode> and movmem<mode>).
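The strategy above can be sketched in plain C++. The helpers below are hypothetical scalar stand-ins for the Power9 lxvl/stxvl instructions, which transfer only the first n (0..16) bytes of a vector register, with lxvl zeroing the untouched bytes:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-ins for lxvl/stxvl: transfer only the
// first n bytes of a 16-byte vector.
static void load_with_length(uint8_t vec[16], const void *src, size_t n) {
    std::memset(vec, 0, 16);                      // lxvl zeroes the rest
    std::memcpy(vec, src, std::min<size_t>(n, 16));
}

static void store_with_length(const uint8_t vec[16], void *dst, size_t n) {
    std::memcpy(dst, vec, std::min<size_t>(n, 16));
}

// Shape of the code the patch emits for a variable-size copy.
void copy_variable(void *dst, const void *src, size_t n) {
    if (n <= 16) {                 // likely branch to the inline path
        uint8_t vec[16];
        load_with_length(vec, src, n);
        store_with_length(vec, dst, n);
    } else {
        std::memcpy(dst, src, n);  // out-of-line library call
    }
}
```

The `copy_variable` name is illustrative only; the patch emits the equivalent RTL directly in expand_block_move.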
2022-11-14 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/rs6000-string.cc (expand_block_move): Add support for
using lxvl and stxvl to move up to 16 bytes inline without calling
memcpy.
* config/rs6000/rs6000.md (cpymem<mode>): Expand cpymemsi to also
provide cpymemdi to handle DImode sizes as well as SImode sizes.
(movmem<mode>): Expand movmemsi to also provide movmemdi to handle
DImode sizes as well as SImode sizes.
Diff:
---
gcc/config/rs6000/rs6000-string.cc | 49 ++++++++++++++++++++++++++++++++++++--
gcc/config/rs6000/rs6000.md | 12 +++++-----
2 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index cd8ee8c2f7e..596fbc634f4 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2760,9 +2760,54 @@ expand_block_move (rtx operands[], bool might_overlap)
rtx stores[MAX_MOVE_REG];
int num_reg = 0;
- /* If this is not a fixed size move, just call memcpy */
+ /* If this is not a fixed size move, see if we can use load/store vector with
+ length to handle multiple bytes. Don't do the optimization if -Os.
+ Otherwise, just call memcpy. */
if (! constp)
- return 0;
+ {
+ if (TARGET_BLOCK_OPS_UNALIGNED_VSX && TARGET_P9_VECTOR && TARGET_64BIT
+ && !optimize_size)
+ {
+ rtx join_label = gen_label_rtx ();
+ rtx inline_label = gen_label_rtx ();
+ rtx dest_addr = copy_addr_to_reg (XEXP (orig_dest, 0));
+ rtx src_addr = copy_addr_to_reg (XEXP (orig_src, 0));
+
+ /* Call memcpy if the size is too large. */
+ bytes_rtx = force_reg (Pmode, bytes_rtx);
+ rtx cr = gen_reg_rtx (CCUNSmode);
+ rtx max_size = GEN_INT (16);
+ emit_insn (gen_rtx_SET (cr,
+ gen_rtx_COMPARE (CCUNSmode, bytes_rtx,
+ max_size)));
+
+ do_ifelse (CCUNSmode, LEU, NULL_RTX, NULL_RTX, cr,
+ inline_label, profile_probability::likely ());
+
+ tree fun = builtin_decl_explicit (BUILT_IN_MEMCPY);
+ emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+ NULL_RTX, LCT_NORMAL, Pmode,
+ dest_addr, Pmode,
+ src_addr, Pmode,
+ bytes_rtx, Pmode);
+
+ rtx join_ref = gen_rtx_LABEL_REF (VOIDmode, join_label);
+ emit_jump_insn (gen_rtx_SET (pc_rtx, join_ref));
+ emit_barrier ();
+
+ emit_label (inline_label);
+
+ /* Move the final 0..16 bytes. */
+ rtx vreg = gen_reg_rtx (V16QImode);
+ emit_insn (gen_lxvl (vreg, src_addr, bytes_rtx));
+ emit_insn (gen_stxvl (vreg, dest_addr, bytes_rtx));
+
+ emit_label (join_label);
+ return 1;
+ }
+
+ return 0;
+ }
/* This must be a fixed size alignment */
gcc_assert (CONST_INT_P (align_rtx));
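A note on the comparison emitted in the hunk above: the byte count is unsigned, which is why the compare is done in CCUNSmode with the LEU condition. A minimal C++ illustration of why a signed compare would be wrong (function names are hypothetical):

```cpp
#include <cstdint>

// A signed "<= 16" check misclassifies huge byte counts as small,
// because values with the high bit set read as negative.
bool small_copy_signed(uint64_t n)   { return (int64_t)n <= 16; } // wrong
// The unsigned compare (what CCUNSmode/LEU expresses) is correct.
bool small_copy_unsigned(uint64_t n) { return n <= 16; }
```

For n = 0x8000000000000005 the signed version would take the 16-byte inline path and corrupt the copy; the unsigned version correctly falls through to memcpy.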
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index e9dfb138603..12bae0d32a7 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9880,11 +9880,11 @@
;; Argument 2 is the length
;; Argument 3 is the alignment
-(define_expand "cpymemsi"
+(define_expand "cpymem<mode>"
[(parallel [(set (match_operand:BLK 0 "")
(match_operand:BLK 1 ""))
- (use (match_operand:SI 2 ""))
- (use (match_operand:SI 3 ""))])]
+ (use (match_operand:GPR 2 ""))
+ (use (match_operand:GPR 3 ""))])]
""
{
if (expand_block_move (operands, false))
@@ -9899,11 +9899,11 @@
;; Argument 2 is the length
;; Argument 3 is the alignment
-(define_expand "movmemsi"
+(define_expand "movmem<mode>"
[(parallel [(set (match_operand:BLK 0 "")
(match_operand:BLK 1 ""))
- (use (match_operand:SI 2 ""))
- (use (match_operand:SI 3 ""))])]
+ (use (match_operand:GPR 2 ""))
+ (use (match_operand:GPR 3 ""))])]
""
{
if (expand_block_move (operands, true))
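The switch from cpymemsi/movmemsi to cpymem<mode>/movmem<mode> with the GPR iterator makes each define_expand stamp out both an SImode pattern (cpymemsi) and a DImode pattern (cpymemdi, movmemdi). A toy C++ sketch of the name substitution a mode iterator performs (not GCC's actual implementation):

```cpp
#include <string>
#include <vector>

// Toy model of a .md mode iterator: "<mode>" in the pattern name is
// replaced once per mode covered by the iterator (GPR = SI and DI
// on a 64-bit target).
std::vector<std::string> expand_iterator(const std::string &templ,
                                         const std::vector<std::string> &modes) {
    std::vector<std::string> out;
    for (const auto &m : modes) {
        std::string s = templ;
        auto pos = s.find("<mode>");
        if (pos != std::string::npos)
            s.replace(pos, 6, m);   // 6 == strlen("<mode>")
        out.push_back(s);
    }
    return out;
}
// expand_iterator("cpymem<mode>", {"si", "di"}) -> {"cpymemsi", "cpymemdi"}
```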
* [gcc(refs/users/meissner/heads/dmf004)] Use lxvl and stxvl for small variable memcpy moves.
@ 2022-11-17 21:54 Michael Meissner
From: Michael Meissner @ 2022-11-17 21:54 UTC
To: gcc-cvs
https://gcc.gnu.org/g:64eaa2bccc0a2934220b10b82426c2e11d9d16ae
commit 64eaa2bccc0a2934220b10b82426c2e11d9d16ae
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Nov 14 20:55:46 2022 -0500
Use lxvl and stxvl for small variable memcpy moves.
This patch adds support for generating inline code for block copies with a
variable size when the size is 16 bytes or less. If the size is more than 16
bytes, just call memcpy.
To handle variable sizes, I found we need DImode versions of the two insns for
copying memory (cpymem<mode> and movmem<mode>).
2022-11-14 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/rs6000-string.cc (expand_block_move): Add support for
using lxvl and stxvl to move up to 16 bytes inline without calling
memcpy.
* config/rs6000/rs6000.md (cpymem<mode>): Expand cpymemsi to also
provide cpymemdi to handle DImode sizes as well as SImode sizes.
(movmem<mode>): Expand movmemsi to also provide movmemdi to handle
DImode sizes as well as SImode sizes.
Diff:
---
gcc/config/rs6000/rs6000-string.cc | 53 ++++++++++++++++++++++++++++++++++++--
gcc/config/rs6000/rs6000.md | 12 ++++-----
2 files changed, 57 insertions(+), 8 deletions(-)
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index cd8ee8c2f7e..2468e375781 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2760,9 +2760,58 @@ expand_block_move (rtx operands[], bool might_overlap)
rtx stores[MAX_MOVE_REG];
int num_reg = 0;
- /* If this is not a fixed size move, just call memcpy */
+ /* If this is not a fixed size move, see if we can use load/store vector with
+ length to handle multiple bytes. Don't do the optimization if -Os.
+ Otherwise, just call memcpy. */
if (! constp)
- return 0;
+ {
+ if (TARGET_BLOCK_OPS_UNALIGNED_VSX && TARGET_P9_VECTOR && TARGET_64BIT
+ && !optimize_size)
+ {
+ rtx join_label = gen_label_rtx ();
+ rtx inline_label = gen_label_rtx ();
+ rtx dest_addr = copy_addr_to_reg (XEXP (orig_dest, 0));
+ rtx src_addr = copy_addr_to_reg (XEXP (orig_src, 0));
+
+ /* Check if we want to handle this with inline code. */
+ bytes_rtx = (GET_MODE (bytes_rtx) == Pmode
+ ? copy_to_reg (bytes_rtx)
+ : convert_to_mode (Pmode, bytes_rtx, true));
+
+ rtx cr = gen_reg_rtx (CCUNSmode);
+ rtx max_size = GEN_INT (16);
+ emit_insn (gen_rtx_SET (cr,
+ gen_rtx_COMPARE (CCUNSmode, bytes_rtx,
+ max_size)));
+
+ do_ifelse (CCUNSmode, LEU, NULL_RTX, NULL_RTX, cr,
+ inline_label, profile_probability::likely ());
+
+ /* Call memcpy if the size is too large. */
+ tree fun = builtin_decl_explicit (BUILT_IN_MEMCPY);
+ emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+ NULL_RTX, LCT_NORMAL, Pmode,
+ dest_addr, Pmode,
+ src_addr, Pmode,
+ bytes_rtx, Pmode);
+
+ rtx join_ref = gen_rtx_LABEL_REF (VOIDmode, join_label);
+ emit_jump_insn (gen_rtx_SET (pc_rtx, join_ref));
+ emit_barrier ();
+
+ emit_label (inline_label);
+
+ /* We want to move bytes inline. Move 0..16 bytes now. */
+ rtx vreg = gen_reg_rtx (V16QImode);
+ emit_insn (gen_lxvl (vreg, src_addr, bytes_rtx));
+ emit_insn (gen_stxvl (vreg, dest_addr, bytes_rtx));
+
+ emit_label (join_label);
+ return 1;
+ }
+
+ return 0;
+ }
/* This must be a fixed size alignment */
gcc_assert (CONST_INT_P (align_rtx));
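Compared to the earlier revision's plain force_reg, this version widens the length to Pmode explicitly, and the true passed as the last argument to convert_to_mode requests zero extension. A minimal C++ illustration of why the zero-extending (unsigned) widening is the right one for a byte count:

```cpp
#include <cstdint>

// Widening a 32-bit length to 64 bits: zero extension is what
// convert_to_mode with unsignedp == true performs.
uint64_t widen_unsigned(uint32_t n) { return (uint64_t)n; }
// Sign extension would smear the high bit across the upper word.
int64_t  widen_signed(uint32_t n)   { return (int64_t)(int32_t)n; }
```

For n = 0x80000010 sign extension yields 0xFFFFFFFF80000010, which is no longer the byte count the caller passed; zero extension preserves it.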
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index e9dfb138603..12bae0d32a7 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9880,11 +9880,11 @@
;; Argument 2 is the length
;; Argument 3 is the alignment
-(define_expand "cpymemsi"
+(define_expand "cpymem<mode>"
[(parallel [(set (match_operand:BLK 0 "")
(match_operand:BLK 1 ""))
- (use (match_operand:SI 2 ""))
- (use (match_operand:SI 3 ""))])]
+ (use (match_operand:GPR 2 ""))
+ (use (match_operand:GPR 3 ""))])]
""
{
if (expand_block_move (operands, false))
@@ -9899,11 +9899,11 @@
;; Argument 2 is the length
;; Argument 3 is the alignment
-(define_expand "movmemsi"
+(define_expand "movmem<mode>"
[(parallel [(set (match_operand:BLK 0 "")
(match_operand:BLK 1 ""))
- (use (match_operand:SI 2 ""))
- (use (match_operand:SI 3 ""))])]
+ (use (match_operand:GPR 2 ""))
+ (use (match_operand:GPR 3 ""))])]
""
{
if (expand_block_move (operands, true))
* [gcc(refs/users/meissner/heads/dmf004)] Use lxvl and stxvl for small variable memcpy moves.
@ 2022-11-17 21:54 Michael Meissner
From: Michael Meissner @ 2022-11-17 21:54 UTC
To: gcc-cvs
https://gcc.gnu.org/g:65b1ab1e15183dc06bc00e0ad0ae546731ed513b
commit 65b1ab1e15183dc06bc00e0ad0ae546731ed513b
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Nov 14 19:56:25 2022 -0500
Use lxvl and stxvl for small variable memcpy moves.
(This push carries the same commit message and a byte-identical patch as the 2022-11-15 0:58 message above; index cd8ee8c2f7e..596fbc634f4.)
* [gcc(refs/users/meissner/heads/dmf004)] Use lxvl and stxvl for small variable memcpy moves.
@ 2022-11-15 1:56 Michael Meissner
From: Michael Meissner @ 2022-11-15 1:56 UTC
To: gcc-cvs
https://gcc.gnu.org/g:8398557374d749782e51ecd4ecae776da82d209f
commit 8398557374d749782e51ecd4ecae776da82d209f
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Nov 14 20:55:46 2022 -0500
Use lxvl and stxvl for small variable memcpy moves.
(This push carries the same commit message and a byte-identical patch as the 2022-11-17 21:54 message above; index cd8ee8c2f7e..2468e375781.)