[gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move

public inbox for gcc-cvs@sourceware.org

* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-11-15 14:03 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-11-15 14:03 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:bc337d70f39ac9526a1d68f206b9f3c40cb171ce

commit bc337d70f39ac9526a1d68f206b9f3c40cb171ce
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200

    riscv: Use by-pieces to do overlapping accesses in block_move_straight
    
    The current implementation of riscv_block_move_straight() emits as many
    load-store pairs of maximum width (e.g. 8 bytes for RV64) as fit into
    the given length.
    The remainder is handed over to move_by_pieces(), which emits code based
    on target settings like slow_unaligned_access and overlap_op_by_pieces.
    
    move_by_pieces() will emit overlapping memory accesses with maximum
    width only if the given length exceeds the size of one access
    (e.g. 15 bytes for 8-byte accesses).
    
    This patch changes the implementation of riscv_block_move_straight()
    such that it preserves a remainder within the interval
    [delta..2*delta) instead of [0..delta), so that overlapping memory
    accesses may be emitted (if the requirements for them are met).
    
    gcc/ChangeLog:
    
            * config/riscv/riscv-string.cc (riscv_block_move_straight):
            Adjust range for emitted load/store pairs.
            (riscv_expand_block_move): Fix duplicated extraction of length.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/riscv/memcpy-overlapping.c: Adjust test for the
            improved expected instruction sequence.
    
    Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

Diff:
---
 gcc/config/riscv/riscv-string.cc                    |  8 ++++----
 gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c | 19 ++++++++-----------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6882f0be269..1137df475be 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -57,18 +57,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
      the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     {
       regs[i] = gen_reg_rtx (mode);
       riscv_emit_move (regs[i], adjust_address (src, mode, offset));
     }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
@@ -166,7 +166,7 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
 
       if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
 	{
-	  riscv_block_move_straight (dest, src, INTVAL (length));
+	  riscv_block_move_straight (dest, src, hwi_length);
 	  return true;
 	}
       else if (optimize && align >= BITS_PER_WORD)
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
index ffb7248bfd1..ef95bfb879b 100644
--- a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
@@ -25,26 +25,23 @@ COPY_N(15)
 /* Emits 2x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(19)
 
-/* Emits 3x ld and 3x sd.  */
+/* Emits 3x {ld,sd}.  */
 COPY_N(23)
 
 /* The by-pieces infrastructure handles up to 24 bytes.
    So the code below is emitted via cpymemsi/block_move_straight.  */
 
-/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+/* Emits 3x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(27)
 
-/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(29)
 
-/* Emits 3x {ld,sd} and 2x {lw,sw}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(31)
 
-/* { dg-final { scan-assembler-times "ld\t" 21 } } */
-/* { dg-final { scan-assembler-times "sd\t" 21 } } */
+/* { dg-final { scan-assembler-times "ld\t" 23 } } */
+/* { dg-final { scan-assembler-times "sd\t" 23 } } */
 
-/* { dg-final { scan-assembler-times "lw\t" 5 } } */
-/* { dg-final { scan-assembler-times "sw\t" 5 } } */
-
-/* { dg-final { scan-assembler-times "lbu\t" 2 } } */
-/* { dg-final { scan-assembler-times "sb\t" 2 } } */
+/* { dg-final { scan-assembler-times "lw\t" 3 } } */
+/* { dg-final { scan-assembler-times "sw\t" 3 } } */


* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-12-01 13:24 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-12-01 13:24 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:69da0a00342be91a90051df9f790874a33d22b54

commit 69da0a00342be91a90051df9f790874a33d22b54
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200


* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-11-18 20:26 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-11-18 20:26 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:c58e42423085540bcfb53998f2394e80b92096f2

commit c58e42423085540bcfb53998f2394e80b92096f2
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200


* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-11-18 20:23 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-11-18 20:23 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:5ea97a29f38094176ff2553292d56bfe7b0b1090

commit 5ea97a29f38094176ff2553292d56bfe7b0b1090
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200


* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-11-18 11:36 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-11-18 11:36 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:7481d2c5a1a24a29ba0ad161bdf733c3351aaafc

commit 7481d2c5a1a24a29ba0ad161bdf733c3351aaafc
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200


* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-11-17 22:27 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-11-17 22:27 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:f206e3f545a5ad1358fb69b25b21a5e2b5810110

commit f206e3f545a5ad1358fb69b25b21a5e2b5810110
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200


* [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight
@ 2022-11-15 15:01 Philipp Tomsich
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Tomsich @ 2022-11-15 15:01 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:a0a82c76329a7b5cf114bae219bda11b9f3d526c

commit a0a82c76329a7b5cf114bae219bda11b9f3d526c
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Oct 5 02:10:30 2022 +0200


Thread overview: 7+ messages
2022-11-15 14:03 [gcc(refs/vendors/vrull/heads/for-upstream)] riscv: Use by-pieces to do overlapping accesses in block_move_straight Philipp Tomsich
2022-11-15 15:01 Philipp Tomsich
2022-11-17 22:27 Philipp Tomsich
2022-11-18 11:36 Philipp Tomsich
2022-11-18 20:23 Philipp Tomsich
2022-11-18 20:26 Philipp Tomsich
2022-12-01 13:24 Philipp Tomsich
