* [PATCH v3 0/2] Generate offset adjusted operation for op_by_pieces operations
@ 2021-04-27 1:14 H.J. Lu
2021-04-27 1:14 ` [PATCH v3 1/2] op_by_pieces_d::run: Change a while loop to a do-while loop H.J. Lu
2021-04-27 1:14 ` [PATCH v3 2/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
0 siblings, 2 replies; 5+ messages in thread
From: H.J. Lu @ 2021-04-27 1:14 UTC (permalink / raw)
To: gcc-patches
Add an overlap_op_by_pieces_p target hook for op_by_pieces operations
between two areas of memory to generate one offset adjusted operation
in the smallest integer mode for the remaining bytes on the last piece
operation of a memory region to avoid doing more than one smaller
operations.
Pass the RTL information from the previous iteration to m_constfn in
op_by_pieces operation so that builtin_memset_[read|gen]_str can
generate the new RTL from the previous RTL.
The v3 changes:
1. Split changing a while loop in op_by_pieces_d::run to a do-while loop
into a separate patch for easier review.
2. Simplify the builtin_memset_read_str change.
3. Document that offset adjusted operation is unaligned.
The v2 changes are:
1. Added a target hook, TARGET_OVERLAP_OP_BY_PIECES_P.
2. Added a pointer argument to pieces_addr::adjust to pass the RTL
information from the previous iteraton to m_constfn.
3. Updated builtin_memset_read_str and builtin_memset_gen_str to
generate the new RTL from the previous RTL info.
H.J. Lu (2):
op_by_pieces_d::run: Change a while loop to a do-while loop
Generate offset adjusted operation for op_by_pieces operations
gcc/builtins.c | 36 ++++-
gcc/builtins.h | 6 +-
gcc/config/i386/i386.c | 3 +
gcc/doc/tm.texi | 7 +
gcc/doc/tm.texi.in | 2 +
gcc/expr.c | 171 ++++++++++++++++-----
gcc/expr.h | 10 +-
gcc/target.def | 9 ++
gcc/testsuite/g++.dg/pr90773-1.h | 14 ++
gcc/testsuite/g++.dg/pr90773-1a.C | 13 ++
gcc/testsuite/g++.dg/pr90773-1b.C | 5 +
gcc/testsuite/g++.dg/pr90773-1c.C | 5 +
gcc/testsuite/g++.dg/pr90773-1d.C | 19 +++
gcc/testsuite/gcc.target/i386/pr90773-1.c | 17 ++
gcc/testsuite/gcc.target/i386/pr90773-10.c | 13 ++
gcc/testsuite/gcc.target/i386/pr90773-11.c | 13 ++
gcc/testsuite/gcc.target/i386/pr90773-12.c | 11 ++
gcc/testsuite/gcc.target/i386/pr90773-13.c | 11 ++
gcc/testsuite/gcc.target/i386/pr90773-14.c | 13 ++
gcc/testsuite/gcc.target/i386/pr90773-2.c | 20 +++
gcc/testsuite/gcc.target/i386/pr90773-3.c | 23 +++
gcc/testsuite/gcc.target/i386/pr90773-4.c | 13 ++
gcc/testsuite/gcc.target/i386/pr90773-5.c | 13 ++
gcc/testsuite/gcc.target/i386/pr90773-6.c | 11 ++
gcc/testsuite/gcc.target/i386/pr90773-7.c | 11 ++
gcc/testsuite/gcc.target/i386/pr90773-8.c | 13 ++
gcc/testsuite/gcc.target/i386/pr90773-9.c | 13 ++
27 files changed, 446 insertions(+), 49 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c
--
2.31.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/2] op_by_pieces_d::run: Change a while loop to a do-while loop
2021-04-27 1:14 [PATCH v3 0/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
@ 2021-04-27 1:14 ` H.J. Lu
2021-04-27 13:05 ` Richard Biener
2021-04-27 1:14 ` [PATCH v3 2/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
1 sibling, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2021-04-27 1:14 UTC (permalink / raw)
To: gcc-patches
Change a while loop in op_by_pieces_d::run to a do-while loop to prepare
for offset adjusted operation for the remaining bytes on the last piece
operation of a memory region.
PR middl-end/90773
* expr.c (op_by_pieces_d::get_usable_mode): New member function.
(op_by_pieces_d::run): Cange a while loop to a do-while loop.
---
gcc/expr.c | 76 +++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 53 insertions(+), 23 deletions(-)
diff --git a/gcc/expr.c b/gcc/expr.c
index a0e19465965..07cb64427c9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1041,6 +1041,9 @@ pieces_addr::maybe_postinc (HOST_WIDE_INT size)
class op_by_pieces_d
{
+ private:
+ scalar_int_mode get_usable_mode (scalar_int_mode mode, unsigned int);
+
protected:
pieces_addr m_to, m_from;
unsigned HOST_WIDE_INT m_len;
@@ -1108,6 +1111,25 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
m_align = align;
}
+/* This function returns the largest usable integer mode for LEN bytes
+ whose size is no bigger than size of MODE. */
+
+scalar_int_mode
+op_by_pieces_d::get_usable_mode (scalar_int_mode mode, unsigned int len)
+{
+ unsigned int size;
+ do
+ {
+ size = GET_MODE_SIZE (mode);
+ if (len >= size && prepare_mode (mode, m_align))
+ break;
+ /* NB: widest_int_mode_for_size checks SIZE > 1. */
+ mode = widest_int_mode_for_size (size);
+ }
+ while (1);
+ return mode;
+}
+
/* This function contains the main loop used for expanding a block
operation. First move what we can in the largest integer mode,
then go to successively smaller modes. For every access, call
@@ -1116,42 +1138,50 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
void
op_by_pieces_d::run ()
{
- while (m_max_size > 1 && m_len > 0)
+ if (m_len == 0)
+ return;
+
+ /* NB: widest_int_mode_for_size checks M_MAX_SIZE > 1. */
+ scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
+ mode = get_usable_mode (mode, m_len);
+
+ do
{
- scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
+ unsigned int size = GET_MODE_SIZE (mode);
+ rtx to1 = NULL_RTX, from1;
- if (prepare_mode (mode, m_align))
+ while (m_len >= size)
{
- unsigned int size = GET_MODE_SIZE (mode);
- rtx to1 = NULL_RTX, from1;
+ if (m_reverse)
+ m_offset -= size;
- while (m_len >= size)
- {
- if (m_reverse)
- m_offset -= size;
+ to1 = m_to.adjust (mode, m_offset);
+ from1 = m_from.adjust (mode, m_offset);
- to1 = m_to.adjust (mode, m_offset);
- from1 = m_from.adjust (mode, m_offset);
+ m_to.maybe_predec (-(HOST_WIDE_INT)size);
+ m_from.maybe_predec (-(HOST_WIDE_INT)size);
- m_to.maybe_predec (-(HOST_WIDE_INT)size);
- m_from.maybe_predec (-(HOST_WIDE_INT)size);
+ generate (to1, from1, mode);
- generate (to1, from1, mode);
+ m_to.maybe_postinc (size);
+ m_from.maybe_postinc (size);
- m_to.maybe_postinc (size);
- m_from.maybe_postinc (size);
+ if (!m_reverse)
+ m_offset += size;
- if (!m_reverse)
- m_offset += size;
+ m_len -= size;
+ }
- m_len -= size;
- }
+ finish_mode (mode);
- finish_mode (mode);
- }
+ if (m_len == 0)
+ return;
- m_max_size = GET_MODE_SIZE (mode);
+ /* NB: widest_int_mode_for_size checks SIZE > 1. */
+ mode = widest_int_mode_for_size (size);
+ mode = get_usable_mode (mode, m_len);
}
+ while (1);
/* The code above should have handled everything. */
gcc_assert (!m_len);
--
2.31.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 2/2] Generate offset adjusted operation for op_by_pieces operations
2021-04-27 1:14 [PATCH v3 0/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
2021-04-27 1:14 ` [PATCH v3 1/2] op_by_pieces_d::run: Change a while loop to a do-while loop H.J. Lu
@ 2021-04-27 1:14 ` H.J. Lu
2021-04-29 11:08 ` Richard Biener
1 sibling, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2021-04-27 1:14 UTC (permalink / raw)
To: gcc-patches
Add an overlap_op_by_pieces_p target hook for op_by_pieces operations
between two areas of memory to generate one offset adjusted operation
in the smallest integer mode for the remaining bytes on the last piece
operation of a memory region to avoid doing more than one smaller
operations.
Pass the RTL information from the previous iteration to m_constfn in
op_by_pieces operation so that builtin_memset_[read|gen]_str can
generate the new RTL from the previous RTL.
Tested on Linux/x86-64.
gcc/
PR middl-end/90773
* builtins.c (builtin_memcpy_read_str): Add a dummy argument.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Add an argument for the previous RTL
information and generate the new RTL from the previous RTL info.
(builtin_memset_gen_str): Likewise.
* builtins.h (builtin_strncpy_read_str): Update the prototype.
(builtin_memset_read_str): Likewise.
* expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p()
returns true, round up size and alignment to the widest integer
mode for maximum size.
(pieces_addr::adjust): Add a pointer to by_pieces_prev argument
and pass it to m_constfn.
(op_by_pieces_d): Add m_push and m_overlap_op_by_pieces.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_push. Initialize m_overlap_op_by_pieces with
targetm.overlap_op_by_pieces_p ().
(op_by_pieces_d::run): Pass the previous RTL information to
pieces_addr::adjust and generate overlapping operations if
m_overlap_op_by_pieces is true.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
change.
(store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d
change.
(can_store_by_pieces): Use by_pieces_constfn on constfun.
(store_by_pieces): Use by_pieces_constfn on constfun. Updated
for op_by_pieces_d change.
(clear_by_pieces_1): Add a dummy argument.
(clear_by_pieces): Updated for op_by_pieces_d change.
(compare_by_pieces_d::compare_by_pieces_d): Likewise.
(string_cst_read_str): Add a dummy argument.
* expr.h (by_pieces_constfn): Add a dummy argument.
(by_pieces_prev): New.
* target.def (overlap_op_by_pieces_p): New target hook.
* config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New.
* doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P.
* doc/tm.texi: Regenerated.
gcc/testsuite/
PR middl-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
* gcc.target/i386/pr90773-12.c: Likewise.
* gcc.target/i386/pr90773-13.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.
---
gcc/builtins.c | 36 +++++--
gcc/builtins.h | 6 +-
gcc/config/i386/i386.c | 3 +
gcc/doc/tm.texi | 7 ++
gcc/doc/tm.texi.in | 2 +
gcc/expr.c | 105 +++++++++++++++++----
gcc/expr.h | 10 +-
gcc/target.def | 9 ++
gcc/testsuite/g++.dg/pr90773-1.h | 14 +++
gcc/testsuite/g++.dg/pr90773-1a.C | 13 +++
gcc/testsuite/g++.dg/pr90773-1b.C | 5 +
gcc/testsuite/g++.dg/pr90773-1c.C | 5 +
gcc/testsuite/g++.dg/pr90773-1d.C | 19 ++++
gcc/testsuite/gcc.target/i386/pr90773-1.c | 17 ++++
gcc/testsuite/gcc.target/i386/pr90773-10.c | 13 +++
gcc/testsuite/gcc.target/i386/pr90773-11.c | 13 +++
gcc/testsuite/gcc.target/i386/pr90773-12.c | 11 +++
gcc/testsuite/gcc.target/i386/pr90773-13.c | 11 +++
gcc/testsuite/gcc.target/i386/pr90773-14.c | 13 +++
gcc/testsuite/gcc.target/i386/pr90773-2.c | 20 ++++
gcc/testsuite/gcc.target/i386/pr90773-3.c | 23 +++++
gcc/testsuite/gcc.target/i386/pr90773-4.c | 13 +++
gcc/testsuite/gcc.target/i386/pr90773-5.c | 13 +++
gcc/testsuite/gcc.target/i386/pr90773-6.c | 11 +++
gcc/testsuite/gcc.target/i386/pr90773-7.c | 11 +++
gcc/testsuite/gcc.target/i386/pr90773-8.c | 13 +++
gcc/testsuite/gcc.target/i386/pr90773-9.c | 13 +++
27 files changed, 398 insertions(+), 31 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 8c5324bf7de..2d6bf4a65b4 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -128,7 +128,6 @@ static rtx expand_builtin_va_copy (tree);
static rtx inline_expand_builtin_bytecmp (tree, rtx);
static rtx expand_builtin_strcmp (tree, rtx);
static rtx expand_builtin_strncmp (tree, rtx, machine_mode);
-static rtx builtin_memcpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
static rtx expand_builtin_memchr (tree, rtx);
static rtx expand_builtin_memcpy (tree, rtx);
static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len,
@@ -145,7 +144,6 @@ static rtx expand_builtin_stpcpy (tree, rtx, machine_mode);
static rtx expand_builtin_stpncpy (tree, rtx);
static rtx expand_builtin_strncat (tree, rtx);
static rtx expand_builtin_strncpy (tree, rtx);
-static rtx builtin_memset_gen_str (void *, HOST_WIDE_INT, scalar_int_mode);
static rtx expand_builtin_memset (tree, rtx, machine_mode);
static rtx expand_builtin_memset_args (tree, tree, tree, rtx, machine_mode, tree);
static rtx expand_builtin_bzero (tree);
@@ -3860,7 +3858,7 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
a target constant. */
static rtx
-builtin_memcpy_read_str (void *data, HOST_WIDE_INT offset,
+builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
scalar_int_mode mode)
{
/* The REPresentation pointed to by DATA need not be a nul-terminated
@@ -6373,7 +6371,7 @@ expand_builtin_stpncpy (tree exp, rtx)
constant. */
rtx
-builtin_strncpy_read_str (void *data, HOST_WIDE_INT offset,
+builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset,
scalar_int_mode mode)
{
const char *str = (const char *) data;
@@ -6584,12 +6582,22 @@ expand_builtin_strncpy (tree exp, rtx target)
/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE)
bytes from constant string DATA + OFFSET and return it as target
- constant. */
+ constant. If PREV isn't nullptr, it has the RTL info from the
+ previous iteration. */
rtx
-builtin_memset_read_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
+builtin_memset_read_str (void *data, void *prevp,
+ HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
scalar_int_mode mode)
{
+ by_pieces_prev *prev = (by_pieces_prev *) prevp;
+ if (prev != nullptr && prev->data != nullptr)
+ {
+ /* Use the previous data in the same mode. */
+ if (prev->mode == mode)
+ return prev->data;
+ }
+
const char *c = (const char *) data;
char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
@@ -6601,16 +6609,28 @@ builtin_memset_read_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
/* Callback routine for store_by_pieces. Return the RTL of a register
containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned
char value given in the RTL register data. For example, if mode is
- 4 bytes wide, return the RTL for 0x01010101*data. */
+ 4 bytes wide, return the RTL for 0x01010101*data. If PREV isn't
+ nullptr, it has the RTL info from the previous iteration. */
static rtx
-builtin_memset_gen_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
+builtin_memset_gen_str (void *data, void *prevp,
+ HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
scalar_int_mode mode)
{
rtx target, coeff;
size_t size;
char *p;
+ by_pieces_prev *prev = (by_pieces_prev *) prevp;
+ if (prev != nullptr && prev->data != nullptr)
+ {
+ /* Use the previous data in the same mode. */
+ if (prev->mode == mode)
+ return prev->data;
+
+ return simplify_gen_subreg (mode, prev->data, prev->mode, 0);
+ }
+
size = GET_MODE_SIZE (mode);
if (size == 1)
return (rtx) data;
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 307a20fbadb..e71f40c300a 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -110,8 +110,10 @@ extern void expand_builtin_update_setjmp_buf (rtx);
extern tree mathfn_built_in (tree, enum built_in_function fn);
extern tree mathfn_built_in (tree, combined_fn);
extern tree mathfn_built_in_type (combined_fn);
-extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
-extern rtx builtin_memset_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
+extern rtx builtin_strncpy_read_str (void *, void *, HOST_WIDE_INT,
+ scalar_int_mode);
+extern rtx builtin_memset_read_str (void *, void *, HOST_WIDE_INT,
+ scalar_int_mode);
extern rtx expand_builtin_saveregs (void);
extern tree std_build_builtin_va_list (void);
extern tree std_fn_abi_va_list (tree);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index adcef1e98bf..68f33f96f5a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23538,6 +23538,9 @@ ix86_run_selftests (void)
#undef TARGET_ADDRESS_COST
#define TARGET_ADDRESS_COST ix86_address_cost
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
+
#undef TARGET_FLAGS_REGNUM
#define TARGET_FLAGS_REGNUM FLAGS_REG
#undef TARGET_FIXED_CONDITION_CODE_REGS
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 823f85ba9ab..ff88b14938c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6758,6 +6758,13 @@ in code size, for example where the number of insns emitted to perform a
move would be greater than that of a library call.
@end deftypefn
+@deftypefn {Target Hook} bool TARGET_OVERLAP_OP_BY_PIECES_P (void)
+This target hook should return true if when the @code{by_pieces}
+infrastructure is used, an offset adjusted unaligned memory operation
+in the smallest integer mode for the last piece operation of a memory
+region can be generated to avoid doing more than one smaller operations.
+@end deftypefn
+
@deftypefn {Target Hook} int TARGET_COMPARE_BY_PIECES_BRANCH_RATIO (machine_mode @var{mode})
When expanding a block comparison in MODE, gcc can try to reduce the
number of branches at the expense of more memory operations. This hook
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2321a5fc4e0..4ac3452278e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4586,6 +4586,8 @@ If you don't define this, a reasonable default is used.
@hook TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
+@hook TARGET_OVERLAP_OP_BY_PIECES_P
+
@hook TARGET_COMPARE_BY_PIECES_BRANCH_RATIO
@defmac MOVE_MAX_PIECES
diff --git a/gcc/expr.c b/gcc/expr.c
index 07cb64427c9..b5b96ea1185 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -815,12 +815,27 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,
unsigned int max_size, by_pieces_operation op)
{
unsigned HOST_WIDE_INT n_insns = 0;
+ scalar_int_mode mode;
+
+ if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES)
+ {
+ /* NB: Round up L and ALIGN to the widest integer mode for
+ MAX_SIZE. */
+ mode = widest_int_mode_for_size (max_size);
+ if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
+ {
+ unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
+ if (up > l)
+ l = up;
+ align = GET_MODE_ALIGNMENT (mode);
+ }
+ }
align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
while (max_size > 1 && l > 0)
{
- scalar_int_mode mode = widest_int_mode_for_size (max_size);
+ mode = widest_int_mode_for_size (max_size);
enum insn_code icode;
unsigned int modesize = GET_MODE_SIZE (mode);
@@ -888,7 +903,8 @@ class pieces_addr
void *m_cfndata;
public:
pieces_addr (rtx, bool, by_pieces_constfn, void *);
- rtx adjust (scalar_int_mode, HOST_WIDE_INT);
+ rtx adjust (scalar_int_mode, HOST_WIDE_INT,
+ by_pieces_prev * = nullptr);
void increment_address (HOST_WIDE_INT);
void maybe_predec (HOST_WIDE_INT);
void maybe_postinc (HOST_WIDE_INT);
@@ -990,10 +1006,12 @@ pieces_addr::decide_autoinc (machine_mode ARG_UNUSED (mode), bool reverse,
but we still modify the MEM's properties. */
rtx
-pieces_addr::adjust (scalar_int_mode mode, HOST_WIDE_INT offset)
+pieces_addr::adjust (scalar_int_mode mode, HOST_WIDE_INT offset,
+ by_pieces_prev *prev)
{
if (m_constfn)
- return m_constfn (m_cfndata, offset, mode);
+ /* Pass the previous data to m_constfn. */
+ return m_constfn (m_cfndata, prev, offset, mode);
if (m_obj == NULL_RTX)
return NULL_RTX;
if (m_auto)
@@ -1051,6 +1069,10 @@ class op_by_pieces_d
unsigned int m_align;
unsigned int m_max_size;
bool m_reverse;
+ /* True if this is a stack push. */
+ bool m_push;
+ /* True if targetm.overlap_op_by_pieces_p () returns true. */
+ bool m_overlap_op_by_pieces;
/* Virtual functions, overriden by derived classes for the specific
operation. */
@@ -1062,7 +1084,7 @@ class op_by_pieces_d
public:
op_by_pieces_d (rtx, bool, rtx, bool, by_pieces_constfn, void *,
- unsigned HOST_WIDE_INT, unsigned int);
+ unsigned HOST_WIDE_INT, unsigned int, bool);
void run ();
};
@@ -1077,10 +1099,11 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
by_pieces_constfn from_cfn,
void *from_cfn_data,
unsigned HOST_WIDE_INT len,
- unsigned int align)
+ unsigned int align, bool push)
: m_to (to, to_load, NULL, NULL),
m_from (from, from_load, from_cfn, from_cfn_data),
- m_len (len), m_max_size (MOVE_MAX_PIECES + 1)
+ m_len (len), m_max_size (MOVE_MAX_PIECES + 1),
+ m_push (push)
{
int toi = m_to.get_addr_inc ();
int fromi = m_from.get_addr_inc ();
@@ -1109,6 +1132,8 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
m_align = align;
+
+ m_overlap_op_by_pieces = targetm.overlap_op_by_pieces_p ();
}
/* This function returns the largest usable integer mode for LEN bytes
@@ -1145,6 +1170,9 @@ op_by_pieces_d::run ()
scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
mode = get_usable_mode (mode, m_len);
+ by_pieces_prev to_prev = { nullptr, mode };
+ by_pieces_prev from_prev = { nullptr, mode };
+
do
{
unsigned int size = GET_MODE_SIZE (mode);
@@ -1155,8 +1183,12 @@ op_by_pieces_d::run ()
if (m_reverse)
m_offset -= size;
- to1 = m_to.adjust (mode, m_offset);
- from1 = m_from.adjust (mode, m_offset);
+ to1 = m_to.adjust (mode, m_offset, &to_prev);
+ to_prev.data = to1;
+ to_prev.mode = mode;
+ from1 = m_from.adjust (mode, m_offset, &from_prev);
+ from_prev.data = from1;
+ from_prev.mode = mode;
m_to.maybe_predec (-(HOST_WIDE_INT)size);
m_from.maybe_predec (-(HOST_WIDE_INT)size);
@@ -1177,9 +1209,32 @@ op_by_pieces_d::run ()
if (m_len == 0)
return;
- /* NB: widest_int_mode_for_size checks SIZE > 1. */
- mode = widest_int_mode_for_size (size);
- mode = get_usable_mode (mode, m_len);
+ if (!m_push && m_overlap_op_by_pieces)
+ {
+ /* NB: Generate overlapping operations if it is not a stack
+ push since stack push must not overlap. Get the smallest
+ integer mode for M_LEN bytes. */
+ mode = smallest_int_mode_for_size (m_len * BITS_PER_UNIT);
+ mode = get_usable_mode (mode, GET_MODE_SIZE (mode));
+ int gap = GET_MODE_SIZE (mode) - m_len;
+ if (gap > 0)
+ {
+ /* If size of MODE > M_LEN, generate the last operation
+ in MODE for the remaining bytes with ovelapping memory
+ from the previois operation. */
+ if (m_reverse)
+ m_offset += gap;
+ else
+ m_offset -= gap;
+ m_len += gap;
+ }
+ }
+ else
+ {
+ /* NB: widest_int_mode_for_size checks SIZE > 1. */
+ mode = widest_int_mode_for_size (size);
+ mode = get_usable_mode (mode, m_len);
+ }
}
while (1);
@@ -1190,6 +1245,12 @@ op_by_pieces_d::run ()
/* Derived class from op_by_pieces_d, providing support for block move
operations. */
+#ifdef PUSH_ROUNDING
+#define PUSHG_P(to) ((to) == nullptr)
+#else
+#define PUSHG_P(to) false
+#endif
+
class move_by_pieces_d : public op_by_pieces_d
{
insn_gen_fn m_gen_fun;
@@ -1199,7 +1260,8 @@ class move_by_pieces_d : public op_by_pieces_d
public:
move_by_pieces_d (rtx to, rtx from, unsigned HOST_WIDE_INT len,
unsigned int align)
- : op_by_pieces_d (to, false, from, true, NULL, NULL, len, align)
+ : op_by_pieces_d (to, false, from, true, NULL, NULL, len, align,
+ PUSHG_P (to))
{
}
rtx finish_retmode (memop_ret);
@@ -1294,7 +1356,8 @@ class store_by_pieces_d : public op_by_pieces_d
public:
store_by_pieces_d (rtx to, by_pieces_constfn cfn, void *cfn_data,
unsigned HOST_WIDE_INT len, unsigned int align)
- : op_by_pieces_d (to, false, NULL_RTX, true, cfn, cfn_data, len, align)
+ : op_by_pieces_d (to, false, NULL_RTX, true, cfn, cfn_data, len,
+ align, false)
{
}
rtx finish_retmode (memop_ret);
@@ -1349,7 +1412,7 @@ store_by_pieces_d::finish_retmode (memop_ret retmode)
int
can_store_by_pieces (unsigned HOST_WIDE_INT len,
- rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode),
+ by_pieces_constfn constfun,
void *constfundata, unsigned int align, bool memsetp)
{
unsigned HOST_WIDE_INT l;
@@ -1396,7 +1459,7 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
if (reverse)
offset -= size;
- cst = (*constfun) (constfundata, offset, mode);
+ cst = (*constfun) (constfundata, nullptr, offset, mode);
if (!targetm.legitimate_constant_p (mode, cst))
return 0;
@@ -1426,7 +1489,7 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
rtx
store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
- rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode),
+ by_pieces_constfn constfun,
void *constfundata, unsigned int align, bool memsetp,
memop_ret retmode)
{
@@ -1454,7 +1517,7 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
Return const0_rtx unconditionally. */
static rtx
-clear_by_pieces_1 (void *, HOST_WIDE_INT, scalar_int_mode)
+clear_by_pieces_1 (void *, void *, HOST_WIDE_INT, scalar_int_mode)
{
return const0_rtx;
}
@@ -1490,7 +1553,8 @@ class compare_by_pieces_d : public op_by_pieces_d
compare_by_pieces_d (rtx op0, rtx op1, by_pieces_constfn op1_cfn,
void *op1_cfn_data, HOST_WIDE_INT len, int align,
rtx_code_label *fail_label)
- : op_by_pieces_d (op0, true, op1, true, op1_cfn, op1_cfn_data, len, align)
+ : op_by_pieces_d (op0, true, op1, true, op1_cfn, op1_cfn_data, len,
+ align, false)
{
m_fail_label = fail_label;
}
@@ -5676,7 +5740,8 @@ emit_storent_insn (rtx to, rtx from)
/* Helper function for store_expr storing of STRING_CST. */
static rtx
-string_cst_read_str (void *data, HOST_WIDE_INT offset, scalar_int_mode mode)
+string_cst_read_str (void *data, void *, HOST_WIDE_INT offset,
+ scalar_int_mode mode)
{
tree str = (tree) data;
diff --git a/gcc/expr.h b/gcc/expr.h
index 1f0177a4cfa..9a2736f69fa 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -107,7 +107,15 @@ enum block_op_methods
BLOCK_OP_NO_LIBCALL_RET
};
-typedef rtx (*by_pieces_constfn) (void *, HOST_WIDE_INT, scalar_int_mode);
+typedef rtx (*by_pieces_constfn) (void *, void *, HOST_WIDE_INT,
+ scalar_int_mode);
+
+/* The second pointer passed to by_pieces_constfn. */
+struct by_pieces_prev
+{
+ rtx data;
+ scalar_int_mode mode;
+};
extern rtx emit_block_move (rtx, rtx, rtx, enum block_op_methods);
extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
diff --git a/gcc/target.def b/gcc/target.def
index d7b94bd8e5d..db64101dff6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3630,6 +3630,15 @@ move would be greater than that of a library call.",
enum by_pieces_operation op, bool speed_p),
default_use_by_pieces_infrastructure_p)
+DEFHOOK
+(overlap_op_by_pieces_p,
+ "This target hook should return true if when the @code{by_pieces}\n\
+infrastructure is used, an offset adjusted unaligned memory operation\n\
+in the smallest integer mode for the last piece operation of a memory\n\
+region can be generated to avoid doing more than one smaller operations.",
+ bool, (void),
+ hook_bool_void_false)
+
DEFHOOK
(compare_by_pieces_branch_ratio,
"When expanding a block comparison in MODE, gcc can try to reduce the\n\
diff --git a/gcc/testsuite/g++.dg/pr90773-1.h b/gcc/testsuite/g++.dg/pr90773-1.h
new file mode 100644
index 00000000000..abdb78b078b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr90773-1.h
@@ -0,0 +1,14 @@
+class fixed_wide_int_storage {
+public:
+ long val[10];
+ int len;
+ fixed_wide_int_storage ()
+ {
+ len = sizeof (val) / sizeof (val[0]);
+ for (int i = 0; i < len; i++)
+ val[i] = i;
+ }
+};
+
+extern void foo (fixed_wide_int_storage);
+extern int record_increment(void);
diff --git a/gcc/testsuite/g++.dg/pr90773-1a.C b/gcc/testsuite/g++.dg/pr90773-1a.C
new file mode 100644
index 00000000000..3ab8d929f74
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr90773-1a.C
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+// { dg-additional-options "-mno-avx -msse2 -mtune=skylake" { target { i?86-*-* x86_64-*-* } } }
+
+#include "pr90773-1.h"
+
+int
+record_increment(void)
+{
+ fixed_wide_int_storage x;
+ foo (x);
+ return 0;
+}
diff --git a/gcc/testsuite/g++.dg/pr90773-1b.C b/gcc/testsuite/g++.dg/pr90773-1b.C
new file mode 100644
index 00000000000..9713b2dd612
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr90773-1b.C
@@ -0,0 +1,5 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+// { dg-additional-options "-mno-avx512f -march=skylake" { target { i?86-*-* x86_64-*-* } } }
+
+#include "pr90773-1a.C"
diff --git a/gcc/testsuite/g++.dg/pr90773-1c.C b/gcc/testsuite/g++.dg/pr90773-1c.C
new file mode 100644
index 00000000000..699357a88dc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr90773-1c.C
@@ -0,0 +1,5 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+// { dg-additional-options "-march=skylake-avx512" { target { i?86-*-* x86_64-*-* } } }
+
+#include "pr90773-1a.C"
diff --git a/gcc/testsuite/g++.dg/pr90773-1d.C b/gcc/testsuite/g++.dg/pr90773-1d.C
new file mode 100644
index 00000000000..bf9d8543c1b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr90773-1d.C
@@ -0,0 +1,19 @@
+// { dg-do run }
+// { dg-options "-O2" }
+// { dg-additional-options "-march=native" { target { i?86-*-* x86_64-*-* } } }
+// { dg-additional-sources "pr90773-1a.C" }
+
+#include "pr90773-1.h"
+
+void
+foo (fixed_wide_int_storage x)
+{
+ for (int i = 0; i < x.len; i++)
+ if (x.val[i] != i)
+ __builtin_abort ();
+}
+
+int main ()
+{
+ return record_increment ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
new file mode 100644
index 00000000000..1d9f282dc0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+ __builtin_memcpy (dst, src, 15);
+}
+
+/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-10.c b/gcc/testsuite/gcc.target/i386/pr90773-10.c
new file mode 100644
index 00000000000..9ad725e4880
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+ __builtin_memset (dst, c, 5);
+}
+
+/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-11.c b/gcc/testsuite/gcc.target/i386/pr90773-11.c
new file mode 100644
index 00000000000..1734c03a2eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-11.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+ __builtin_memset (dst, c, 6);
+}
+
+/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-12.c b/gcc/testsuite/gcc.target/i386/pr90773-12.c
new file mode 100644
index 00000000000..e45840a5b8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-12.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
+
+void
+foo (char *dst, char *src)
+{
+ __builtin_memcpy (dst, src, 255);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */
+/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-13.c b/gcc/testsuite/gcc.target/i386/pr90773-13.c
new file mode 100644
index 00000000000..4d5ae8d1086
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
+
+void
+foo (char *dst)
+{
+ __builtin_memset (dst, 0, 255);
+}
+
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */
+/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
new file mode 100644
index 00000000000..6364916ecac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+ __builtin_memset (dst, 1, 20);
+}
+
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$16843009, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-2.c b/gcc/testsuite/gcc.target/i386/pr90773-2.c
new file mode 100644
index 00000000000..64495751b46
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+/* { dg-additional-options "-mno-avx -msse2" { target { ! ia32 } } } */
+/* { dg-additional-options "-mno-sse" { target ia32 } } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+ __builtin_memcpy (dst, src, 19);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+12\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-3.c b/gcc/testsuite/gcc.target/i386/pr90773-3.c
new file mode 100644
index 00000000000..84747c94652
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-3.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+/* { dg-additional-options "-mno-avx -msse2" { target { ! ia32 } } } */
+/* { dg-additional-options "-mno-sse" { target ia32 } } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+ __builtin_memcpy (dst, src, 31);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movdqu\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+12\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+16\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+20\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+24\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+27\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
new file mode 100644
index 00000000000..ec0bc0100ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+ __builtin_memset (dst, 0, 31);
+}
+
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, 15\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-5.c b/gcc/testsuite/gcc.target/i386/pr90773-5.c
new file mode 100644
index 00000000000..49d03ef2403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+ __builtin_memset (dst, 0, 21);
+}
+
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+\\\$0+, 13\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-6.c b/gcc/testsuite/gcc.target/i386/pr90773-6.c
new file mode 100644
index 00000000000..46498f6f50c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+void
+foo (char *dst, char *src)
+{
+ __builtin_memcpy (dst, src, 255);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */
+/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-7.c b/gcc/testsuite/gcc.target/i386/pr90773-7.c
new file mode 100644
index 00000000000..4d5ae8d1086
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-7.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
+
+void
+foo (char *dst)
+{
+ __builtin_memset (dst, 0, 255);
+}
+
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */
+/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-8.c b/gcc/testsuite/gcc.target/i386/pr90773-8.c
new file mode 100644
index 00000000000..0d47845d560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+ __builtin_memset (dst, 0, 5);
+}
+
+/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-9.c b/gcc/testsuite/gcc.target/i386/pr90773-9.c
new file mode 100644
index 00000000000..ab5ea451f30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+ __builtin_memset (dst, 0, 6);
+}
+
+/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
--
2.31.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/2] op_by_pieces_d::run: Change a while loop to a do-while loop
2021-04-27 1:14 ` [PATCH v3 1/2] op_by_pieces_d::run: Change a while loop to a do-while loop H.J. Lu
@ 2021-04-27 13:05 ` Richard Biener
0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2021-04-27 13:05 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
On Tue, Apr 27, 2021 at 3:14 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Change a while loop in op_by_pieces_d::run to a do-while loop to prepare
> for offset adjusted operation for the remaining bytes on the last piece
> operation of a memory region.
OK.
Thanks,
Richard.
> PR middl-end/90773
> * expr.c (op_by_pieces_d::get_usable_mode): New member function.
> (op_by_pieces_d::run): Cange a while loop to a do-while loop.
> ---
> gcc/expr.c | 76 +++++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index a0e19465965..07cb64427c9 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -1041,6 +1041,9 @@ pieces_addr::maybe_postinc (HOST_WIDE_INT size)
>
> class op_by_pieces_d
> {
> + private:
> + scalar_int_mode get_usable_mode (scalar_int_mode mode, unsigned int);
> +
> protected:
> pieces_addr m_to, m_from;
> unsigned HOST_WIDE_INT m_len;
> @@ -1108,6 +1111,25 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
> m_align = align;
> }
>
> +/* This function returns the largest usable integer mode for LEN bytes
> + whose size is no bigger than size of MODE. */
> +
> +scalar_int_mode
> +op_by_pieces_d::get_usable_mode (scalar_int_mode mode, unsigned int len)
> +{
> + unsigned int size;
> + do
> + {
> + size = GET_MODE_SIZE (mode);
> + if (len >= size && prepare_mode (mode, m_align))
> + break;
> + /* NB: widest_int_mode_for_size checks SIZE > 1. */
> + mode = widest_int_mode_for_size (size);
> + }
> + while (1);
> + return mode;
> +}
> +
> /* This function contains the main loop used for expanding a block
> operation. First move what we can in the largest integer mode,
> then go to successively smaller modes. For every access, call
> @@ -1116,42 +1138,50 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
> void
> op_by_pieces_d::run ()
> {
> - while (m_max_size > 1 && m_len > 0)
> + if (m_len == 0)
> + return;
> +
> + /* NB: widest_int_mode_for_size checks M_MAX_SIZE > 1. */
> + scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
> + mode = get_usable_mode (mode, m_len);
> +
> + do
> {
> - scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
> + unsigned int size = GET_MODE_SIZE (mode);
> + rtx to1 = NULL_RTX, from1;
>
> - if (prepare_mode (mode, m_align))
> + while (m_len >= size)
> {
> - unsigned int size = GET_MODE_SIZE (mode);
> - rtx to1 = NULL_RTX, from1;
> + if (m_reverse)
> + m_offset -= size;
>
> - while (m_len >= size)
> - {
> - if (m_reverse)
> - m_offset -= size;
> + to1 = m_to.adjust (mode, m_offset);
> + from1 = m_from.adjust (mode, m_offset);
>
> - to1 = m_to.adjust (mode, m_offset);
> - from1 = m_from.adjust (mode, m_offset);
> + m_to.maybe_predec (-(HOST_WIDE_INT)size);
> + m_from.maybe_predec (-(HOST_WIDE_INT)size);
>
> - m_to.maybe_predec (-(HOST_WIDE_INT)size);
> - m_from.maybe_predec (-(HOST_WIDE_INT)size);
> + generate (to1, from1, mode);
>
> - generate (to1, from1, mode);
> + m_to.maybe_postinc (size);
> + m_from.maybe_postinc (size);
>
> - m_to.maybe_postinc (size);
> - m_from.maybe_postinc (size);
> + if (!m_reverse)
> + m_offset += size;
>
> - if (!m_reverse)
> - m_offset += size;
> + m_len -= size;
> + }
>
> - m_len -= size;
> - }
> + finish_mode (mode);
>
> - finish_mode (mode);
> - }
> + if (m_len == 0)
> + return;
>
> - m_max_size = GET_MODE_SIZE (mode);
> + /* NB: widest_int_mode_for_size checks SIZE > 1. */
> + mode = widest_int_mode_for_size (size);
> + mode = get_usable_mode (mode, m_len);
> }
> + while (1);
>
> /* The code above should have handled everything. */
> gcc_assert (!m_len);
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/2] Generate offset adjusted operation for op_by_pieces operations
2021-04-27 1:14 ` [PATCH v3 2/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
@ 2021-04-29 11:08 ` Richard Biener
0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2021-04-29 11:08 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
On Tue, Apr 27, 2021 at 3:14 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Add an overlap_op_by_pieces_p target hook for op_by_pieces operations
> between two areas of memory to generate one offset adjusted operation
> in the smallest integer mode for the remaining bytes on the last piece
> operation of a memory region to avoid doing more than one smaller
> operations.
>
> Pass the RTL information from the previous iteration to m_constfn in
> op_by_pieces operation so that builtin_memset_[read|gen]_str can
> generate the new RTL from the previous RTL.
>
> Tested on Linux/x86-64.
OK.
Thanks,
Richard.
> gcc/
>
> PR middl-end/90773
> * builtins.c (builtin_memcpy_read_str): Add a dummy argument.
> (builtin_strncpy_read_str): Likewise.
> (builtin_memset_read_str): Add an argument for the previous RTL
> information and generate the new RTL from the previous RTL info.
> (builtin_memset_gen_str): Likewise.
> * builtins.h (builtin_strncpy_read_str): Update the prototype.
> (builtin_memset_read_str): Likewise.
> * expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p()
> returns true, round up size and alignment to the widest integer
> mode for maximum size.
> (pieces_addr::adjust): Add a pointer to by_pieces_prev argument
> and pass it to m_constfn.
> (op_by_pieces_d): Add m_push and m_overlap_op_by_pieces.
> (op_by_pieces_d::op_by_pieces_d): Add a bool argument to
> initialize m_push. Initialize m_overlap_op_by_pieces with
> targetm.overlap_op_by_pieces_p ().
> (op_by_pieces_d::run): Pass the previous RTL information to
> pieces_addr::adjust and generate overlapping operations if
> m_overlap_op_by_pieces is true.
> (PUSHG_P): New.
> (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
> change.
> (store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d
> change.
> (can_store_by_pieces): Use by_pieces_constfn on constfun.
> (store_by_pieces): Use by_pieces_constfn on constfun. Updated
> for op_by_pieces_d change.
> (clear_by_pieces_1): Add a dummy argument.
> (clear_by_pieces): Updated for op_by_pieces_d change.
> (compare_by_pieces_d::compare_by_pieces_d): Likewise.
> (string_cst_read_str): Add a dummy argument.
> * expr.h (by_pieces_constfn): Add a dummy argument.
> (by_pieces_prev): New.
> * target.def (overlap_op_by_pieces_p): New target hook.
> * config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New.
> * doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P.
> * doc/tm.texi: Regenerated.
>
> gcc/testsuite/
>
> PR middl-end/90773
> * g++.dg/pr90773-1.h: New test.
> * g++.dg/pr90773-1a.C: Likewise.
> * g++.dg/pr90773-1b.C: Likewise.
> * g++.dg/pr90773-1c.C: Likewise.
> * g++.dg/pr90773-1d.C: Likewise.
> * gcc.target/i386/pr90773-1.c: Likewise.
> * gcc.target/i386/pr90773-2.c: Likewise.
> * gcc.target/i386/pr90773-3.c: Likewise.
> * gcc.target/i386/pr90773-4.c: Likewise.
> * gcc.target/i386/pr90773-5.c: Likewise.
> * gcc.target/i386/pr90773-6.c: Likewise.
> * gcc.target/i386/pr90773-7.c: Likewise.
> * gcc.target/i386/pr90773-8.c: Likewise.
> * gcc.target/i386/pr90773-9.c: Likewise.
> * gcc.target/i386/pr90773-10.c: Likewise.
> * gcc.target/i386/pr90773-11.c: Likewise.
> * gcc.target/i386/pr90773-12.c: Likewise.
> * gcc.target/i386/pr90773-13.c: Likewise.
> * gcc.target/i386/pr90773-14.c: Likewise.
> ---
> gcc/builtins.c | 36 +++++--
> gcc/builtins.h | 6 +-
> gcc/config/i386/i386.c | 3 +
> gcc/doc/tm.texi | 7 ++
> gcc/doc/tm.texi.in | 2 +
> gcc/expr.c | 105 +++++++++++++++++----
> gcc/expr.h | 10 +-
> gcc/target.def | 9 ++
> gcc/testsuite/g++.dg/pr90773-1.h | 14 +++
> gcc/testsuite/g++.dg/pr90773-1a.C | 13 +++
> gcc/testsuite/g++.dg/pr90773-1b.C | 5 +
> gcc/testsuite/g++.dg/pr90773-1c.C | 5 +
> gcc/testsuite/g++.dg/pr90773-1d.C | 19 ++++
> gcc/testsuite/gcc.target/i386/pr90773-1.c | 17 ++++
> gcc/testsuite/gcc.target/i386/pr90773-10.c | 13 +++
> gcc/testsuite/gcc.target/i386/pr90773-11.c | 13 +++
> gcc/testsuite/gcc.target/i386/pr90773-12.c | 11 +++
> gcc/testsuite/gcc.target/i386/pr90773-13.c | 11 +++
> gcc/testsuite/gcc.target/i386/pr90773-14.c | 13 +++
> gcc/testsuite/gcc.target/i386/pr90773-2.c | 20 ++++
> gcc/testsuite/gcc.target/i386/pr90773-3.c | 23 +++++
> gcc/testsuite/gcc.target/i386/pr90773-4.c | 13 +++
> gcc/testsuite/gcc.target/i386/pr90773-5.c | 13 +++
> gcc/testsuite/gcc.target/i386/pr90773-6.c | 11 +++
> gcc/testsuite/gcc.target/i386/pr90773-7.c | 11 +++
> gcc/testsuite/gcc.target/i386/pr90773-8.c | 13 +++
> gcc/testsuite/gcc.target/i386/pr90773-9.c | 13 +++
> 27 files changed, 398 insertions(+), 31 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
> create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
> create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
> create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
> create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-14.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 8c5324bf7de..2d6bf4a65b4 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -128,7 +128,6 @@ static rtx expand_builtin_va_copy (tree);
> static rtx inline_expand_builtin_bytecmp (tree, rtx);
> static rtx expand_builtin_strcmp (tree, rtx);
> static rtx expand_builtin_strncmp (tree, rtx, machine_mode);
> -static rtx builtin_memcpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
> static rtx expand_builtin_memchr (tree, rtx);
> static rtx expand_builtin_memcpy (tree, rtx);
> static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len,
> @@ -145,7 +144,6 @@ static rtx expand_builtin_stpcpy (tree, rtx, machine_mode);
> static rtx expand_builtin_stpncpy (tree, rtx);
> static rtx expand_builtin_strncat (tree, rtx);
> static rtx expand_builtin_strncpy (tree, rtx);
> -static rtx builtin_memset_gen_str (void *, HOST_WIDE_INT, scalar_int_mode);
> static rtx expand_builtin_memset (tree, rtx, machine_mode);
> static rtx expand_builtin_memset_args (tree, tree, tree, rtx, machine_mode, tree);
> static rtx expand_builtin_bzero (tree);
> @@ -3860,7 +3858,7 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
> a target constant. */
>
> static rtx
> -builtin_memcpy_read_str (void *data, HOST_WIDE_INT offset,
> +builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> scalar_int_mode mode)
> {
> /* The REPresentation pointed to by DATA need not be a nul-terminated
> @@ -6373,7 +6371,7 @@ expand_builtin_stpncpy (tree exp, rtx)
> constant. */
>
> rtx
> -builtin_strncpy_read_str (void *data, HOST_WIDE_INT offset,
> +builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> scalar_int_mode mode)
> {
> const char *str = (const char *) data;
> @@ -6584,12 +6582,22 @@ expand_builtin_strncpy (tree exp, rtx target)
>
> /* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE)
> bytes from constant string DATA + OFFSET and return it as target
> - constant. */
> + constant. If PREV isn't nullptr, it has the RTL info from the
> + previous iteration. */
>
> rtx
> -builtin_memset_read_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> +builtin_memset_read_str (void *data, void *prevp,
> + HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> scalar_int_mode mode)
> {
> + by_pieces_prev *prev = (by_pieces_prev *) prevp;
> + if (prev != nullptr && prev->data != nullptr)
> + {
> + /* Use the previous data in the same mode. */
> + if (prev->mode == mode)
> + return prev->data;
> + }
> +
> const char *c = (const char *) data;
> char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
>
> @@ -6601,16 +6609,28 @@ builtin_memset_read_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> /* Callback routine for store_by_pieces. Return the RTL of a register
> containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned
> char value given in the RTL register data. For example, if mode is
> - 4 bytes wide, return the RTL for 0x01010101*data. */
> + 4 bytes wide, return the RTL for 0x01010101*data. If PREV isn't
> + nullptr, it has the RTL info from the previous iteration. */
>
> static rtx
> -builtin_memset_gen_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> +builtin_memset_gen_str (void *data, void *prevp,
> + HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> scalar_int_mode mode)
> {
> rtx target, coeff;
> size_t size;
> char *p;
>
> + by_pieces_prev *prev = (by_pieces_prev *) prevp;
> + if (prev != nullptr && prev->data != nullptr)
> + {
> + /* Use the previous data in the same mode. */
> + if (prev->mode == mode)
> + return prev->data;
> +
> + return simplify_gen_subreg (mode, prev->data, prev->mode, 0);
> + }
> +
> size = GET_MODE_SIZE (mode);
> if (size == 1)
> return (rtx) data;
> diff --git a/gcc/builtins.h b/gcc/builtins.h
> index 307a20fbadb..e71f40c300a 100644
> --- a/gcc/builtins.h
> +++ b/gcc/builtins.h
> @@ -110,8 +110,10 @@ extern void expand_builtin_update_setjmp_buf (rtx);
> extern tree mathfn_built_in (tree, enum built_in_function fn);
> extern tree mathfn_built_in (tree, combined_fn);
> extern tree mathfn_built_in_type (combined_fn);
> -extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
> -extern rtx builtin_memset_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
> +extern rtx builtin_strncpy_read_str (void *, void *, HOST_WIDE_INT,
> + scalar_int_mode);
> +extern rtx builtin_memset_read_str (void *, void *, HOST_WIDE_INT,
> + scalar_int_mode);
> extern rtx expand_builtin_saveregs (void);
> extern tree std_build_builtin_va_list (void);
> extern tree std_fn_abi_va_list (tree);
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index adcef1e98bf..68f33f96f5a 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -23538,6 +23538,9 @@ ix86_run_selftests (void)
> #undef TARGET_ADDRESS_COST
> #define TARGET_ADDRESS_COST ix86_address_cost
>
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
> #undef TARGET_FLAGS_REGNUM
> #define TARGET_FLAGS_REGNUM FLAGS_REG
> #undef TARGET_FIXED_CONDITION_CODE_REGS
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 823f85ba9ab..ff88b14938c 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6758,6 +6758,13 @@ in code size, for example where the number of insns emitted to perform a
> move would be greater than that of a library call.
> @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_OVERLAP_OP_BY_PIECES_P (void)
> +This target hook should return true if when the @code{by_pieces}
> +infrastructure is used, an offset adjusted unaligned memory operation
> +in the smallest integer mode for the last piece operation of a memory
> +region can be generated to avoid doing more than one smaller operations.
> +@end deftypefn
> +
> @deftypefn {Target Hook} int TARGET_COMPARE_BY_PIECES_BRANCH_RATIO (machine_mode @var{mode})
> When expanding a block comparison in MODE, gcc can try to reduce the
> number of branches at the expense of more memory operations. This hook
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 2321a5fc4e0..4ac3452278e 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4586,6 +4586,8 @@ If you don't define this, a reasonable default is used.
>
> @hook TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>
> +@hook TARGET_OVERLAP_OP_BY_PIECES_P
> +
> @hook TARGET_COMPARE_BY_PIECES_BRANCH_RATIO
>
> @defmac MOVE_MAX_PIECES
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 07cb64427c9..b5b96ea1185 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -815,12 +815,27 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align,
> unsigned int max_size, by_pieces_operation op)
> {
> unsigned HOST_WIDE_INT n_insns = 0;
> + scalar_int_mode mode;
> +
> + if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES)
> + {
> + /* NB: Round up L and ALIGN to the widest integer mode for
> + MAX_SIZE. */
> + mode = widest_int_mode_for_size (max_size);
> + if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
> + {
> + unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
> + if (up > l)
> + l = up;
> + align = GET_MODE_ALIGNMENT (mode);
> + }
> + }
>
> align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
>
> while (max_size > 1 && l > 0)
> {
> - scalar_int_mode mode = widest_int_mode_for_size (max_size);
> + mode = widest_int_mode_for_size (max_size);
> enum insn_code icode;
>
> unsigned int modesize = GET_MODE_SIZE (mode);
> @@ -888,7 +903,8 @@ class pieces_addr
> void *m_cfndata;
> public:
> pieces_addr (rtx, bool, by_pieces_constfn, void *);
> - rtx adjust (scalar_int_mode, HOST_WIDE_INT);
> + rtx adjust (scalar_int_mode, HOST_WIDE_INT,
> + by_pieces_prev * = nullptr);
> void increment_address (HOST_WIDE_INT);
> void maybe_predec (HOST_WIDE_INT);
> void maybe_postinc (HOST_WIDE_INT);
> @@ -990,10 +1006,12 @@ pieces_addr::decide_autoinc (machine_mode ARG_UNUSED (mode), bool reverse,
> but we still modify the MEM's properties. */
>
> rtx
> -pieces_addr::adjust (scalar_int_mode mode, HOST_WIDE_INT offset)
> +pieces_addr::adjust (scalar_int_mode mode, HOST_WIDE_INT offset,
> + by_pieces_prev *prev)
> {
> if (m_constfn)
> - return m_constfn (m_cfndata, offset, mode);
> + /* Pass the previous data to m_constfn. */
> + return m_constfn (m_cfndata, prev, offset, mode);
> if (m_obj == NULL_RTX)
> return NULL_RTX;
> if (m_auto)
> @@ -1051,6 +1069,10 @@ class op_by_pieces_d
> unsigned int m_align;
> unsigned int m_max_size;
> bool m_reverse;
> + /* True if this is a stack push. */
> + bool m_push;
> + /* True if targetm.overlap_op_by_pieces_p () returns true. */
> + bool m_overlap_op_by_pieces;
>
> /* Virtual functions, overriden by derived classes for the specific
> operation. */
> @@ -1062,7 +1084,7 @@ class op_by_pieces_d
>
> public:
> op_by_pieces_d (rtx, bool, rtx, bool, by_pieces_constfn, void *,
> - unsigned HOST_WIDE_INT, unsigned int);
> + unsigned HOST_WIDE_INT, unsigned int, bool);
> void run ();
> };
>
> @@ -1077,10 +1099,11 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
> by_pieces_constfn from_cfn,
> void *from_cfn_data,
> unsigned HOST_WIDE_INT len,
> - unsigned int align)
> + unsigned int align, bool push)
> : m_to (to, to_load, NULL, NULL),
> m_from (from, from_load, from_cfn, from_cfn_data),
> - m_len (len), m_max_size (MOVE_MAX_PIECES + 1)
> + m_len (len), m_max_size (MOVE_MAX_PIECES + 1),
> + m_push (push)
> {
> int toi = m_to.get_addr_inc ();
> int fromi = m_from.get_addr_inc ();
> @@ -1109,6 +1132,8 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
>
> align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
> m_align = align;
> +
> + m_overlap_op_by_pieces = targetm.overlap_op_by_pieces_p ();
> }
>
> /* This function returns the largest usable integer mode for LEN bytes
> @@ -1145,6 +1170,9 @@ op_by_pieces_d::run ()
> scalar_int_mode mode = widest_int_mode_for_size (m_max_size);
> mode = get_usable_mode (mode, m_len);
>
> + by_pieces_prev to_prev = { nullptr, mode };
> + by_pieces_prev from_prev = { nullptr, mode };
> +
> do
> {
> unsigned int size = GET_MODE_SIZE (mode);
> @@ -1155,8 +1183,12 @@ op_by_pieces_d::run ()
> if (m_reverse)
> m_offset -= size;
>
> - to1 = m_to.adjust (mode, m_offset);
> - from1 = m_from.adjust (mode, m_offset);
> + to1 = m_to.adjust (mode, m_offset, &to_prev);
> + to_prev.data = to1;
> + to_prev.mode = mode;
> + from1 = m_from.adjust (mode, m_offset, &from_prev);
> + from_prev.data = from1;
> + from_prev.mode = mode;
>
> m_to.maybe_predec (-(HOST_WIDE_INT)size);
> m_from.maybe_predec (-(HOST_WIDE_INT)size);
> @@ -1177,9 +1209,32 @@ op_by_pieces_d::run ()
> if (m_len == 0)
> return;
>
> - /* NB: widest_int_mode_for_size checks SIZE > 1. */
> - mode = widest_int_mode_for_size (size);
> - mode = get_usable_mode (mode, m_len);
> + if (!m_push && m_overlap_op_by_pieces)
> + {
> + /* NB: Generate overlapping operations if it is not a stack
> + push since stack push must not overlap. Get the smallest
> + integer mode for M_LEN bytes. */
> + mode = smallest_int_mode_for_size (m_len * BITS_PER_UNIT);
> + mode = get_usable_mode (mode, GET_MODE_SIZE (mode));
> + int gap = GET_MODE_SIZE (mode) - m_len;
> + if (gap > 0)
> + {
> + /* If size of MODE > M_LEN, generate the last operation
> + in MODE for the remaining bytes with ovelapping memory
> + from the previois operation. */
> + if (m_reverse)
> + m_offset += gap;
> + else
> + m_offset -= gap;
> + m_len += gap;
> + }
> + }
> + else
> + {
> + /* NB: widest_int_mode_for_size checks SIZE > 1. */
> + mode = widest_int_mode_for_size (size);
> + mode = get_usable_mode (mode, m_len);
> + }
> }
> while (1);
>
> @@ -1190,6 +1245,12 @@ op_by_pieces_d::run ()
> /* Derived class from op_by_pieces_d, providing support for block move
> operations. */
>
> +#ifdef PUSH_ROUNDING
> +#define PUSHG_P(to) ((to) == nullptr)
> +#else
> +#define PUSHG_P(to) false
> +#endif
> +
> class move_by_pieces_d : public op_by_pieces_d
> {
> insn_gen_fn m_gen_fun;
> @@ -1199,7 +1260,8 @@ class move_by_pieces_d : public op_by_pieces_d
> public:
> move_by_pieces_d (rtx to, rtx from, unsigned HOST_WIDE_INT len,
> unsigned int align)
> - : op_by_pieces_d (to, false, from, true, NULL, NULL, len, align)
> + : op_by_pieces_d (to, false, from, true, NULL, NULL, len, align,
> + PUSHG_P (to))
> {
> }
> rtx finish_retmode (memop_ret);
> @@ -1294,7 +1356,8 @@ class store_by_pieces_d : public op_by_pieces_d
> public:
> store_by_pieces_d (rtx to, by_pieces_constfn cfn, void *cfn_data,
> unsigned HOST_WIDE_INT len, unsigned int align)
> - : op_by_pieces_d (to, false, NULL_RTX, true, cfn, cfn_data, len, align)
> + : op_by_pieces_d (to, false, NULL_RTX, true, cfn, cfn_data, len,
> + align, false)
> {
> }
> rtx finish_retmode (memop_ret);
> @@ -1349,7 +1412,7 @@ store_by_pieces_d::finish_retmode (memop_ret retmode)
>
> int
> can_store_by_pieces (unsigned HOST_WIDE_INT len,
> - rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode),
> + by_pieces_constfn constfun,
> void *constfundata, unsigned int align, bool memsetp)
> {
> unsigned HOST_WIDE_INT l;
> @@ -1396,7 +1459,7 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
> if (reverse)
> offset -= size;
>
> - cst = (*constfun) (constfundata, offset, mode);
> + cst = (*constfun) (constfundata, nullptr, offset, mode);
> if (!targetm.legitimate_constant_p (mode, cst))
> return 0;
>
> @@ -1426,7 +1489,7 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
>
> rtx
> store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
> - rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode),
> + by_pieces_constfn constfun,
> void *constfundata, unsigned int align, bool memsetp,
> memop_ret retmode)
> {
> @@ -1454,7 +1517,7 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
> Return const0_rtx unconditionally. */
>
> static rtx
> -clear_by_pieces_1 (void *, HOST_WIDE_INT, scalar_int_mode)
> +clear_by_pieces_1 (void *, void *, HOST_WIDE_INT, scalar_int_mode)
> {
> return const0_rtx;
> }
> @@ -1490,7 +1553,8 @@ class compare_by_pieces_d : public op_by_pieces_d
> compare_by_pieces_d (rtx op0, rtx op1, by_pieces_constfn op1_cfn,
> void *op1_cfn_data, HOST_WIDE_INT len, int align,
> rtx_code_label *fail_label)
> - : op_by_pieces_d (op0, true, op1, true, op1_cfn, op1_cfn_data, len, align)
> + : op_by_pieces_d (op0, true, op1, true, op1_cfn, op1_cfn_data, len,
> + align, false)
> {
> m_fail_label = fail_label;
> }
> @@ -5676,7 +5740,8 @@ emit_storent_insn (rtx to, rtx from)
> /* Helper function for store_expr storing of STRING_CST. */
>
> static rtx
> -string_cst_read_str (void *data, HOST_WIDE_INT offset, scalar_int_mode mode)
> +string_cst_read_str (void *data, void *, HOST_WIDE_INT offset,
> + scalar_int_mode mode)
> {
> tree str = (tree) data;
>
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 1f0177a4cfa..9a2736f69fa 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -107,7 +107,15 @@ enum block_op_methods
> BLOCK_OP_NO_LIBCALL_RET
> };
>
> -typedef rtx (*by_pieces_constfn) (void *, HOST_WIDE_INT, scalar_int_mode);
> +typedef rtx (*by_pieces_constfn) (void *, void *, HOST_WIDE_INT,
> + scalar_int_mode);
> +
> +/* The second pointer passed to by_pieces_constfn. */
> +struct by_pieces_prev
> +{
> + rtx data;
> + scalar_int_mode mode;
> +};
>
> extern rtx emit_block_move (rtx, rtx, rtx, enum block_op_methods);
> extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
> diff --git a/gcc/target.def b/gcc/target.def
> index d7b94bd8e5d..db64101dff6 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -3630,6 +3630,15 @@ move would be greater than that of a library call.",
> enum by_pieces_operation op, bool speed_p),
> default_use_by_pieces_infrastructure_p)
>
> +DEFHOOK
> +(overlap_op_by_pieces_p,
> + "This target hook should return true if when the @code{by_pieces}\n\
> +infrastructure is used, an offset adjusted unaligned memory operation\n\
> +in the smallest integer mode for the last piece operation of a memory\n\
> +region can be generated to avoid doing more than one smaller operations.",
> + bool, (void),
> + hook_bool_void_false)
> +
> DEFHOOK
> (compare_by_pieces_branch_ratio,
> "When expanding a block comparison in MODE, gcc can try to reduce the\n\
> diff --git a/gcc/testsuite/g++.dg/pr90773-1.h b/gcc/testsuite/g++.dg/pr90773-1.h
> new file mode 100644
> index 00000000000..abdb78b078b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr90773-1.h
> @@ -0,0 +1,14 @@
> +class fixed_wide_int_storage {
> +public:
> + long val[10];
> + int len;
> + fixed_wide_int_storage ()
> + {
> + len = sizeof (val) / sizeof (val[0]);
> + for (int i = 0; i < len; i++)
> + val[i] = i;
> + }
> +};
> +
> +extern void foo (fixed_wide_int_storage);
> +extern int record_increment(void);
> diff --git a/gcc/testsuite/g++.dg/pr90773-1a.C b/gcc/testsuite/g++.dg/pr90773-1a.C
> new file mode 100644
> index 00000000000..3ab8d929f74
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr90773-1a.C
> @@ -0,0 +1,13 @@
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +// { dg-additional-options "-mno-avx -msse2 -mtune=skylake" { target { i?86-*-* x86_64-*-* } } }
> +
> +#include "pr90773-1.h"
> +
> +int
> +record_increment(void)
> +{
> + fixed_wide_int_storage x;
> + foo (x);
> + return 0;
> +}
> diff --git a/gcc/testsuite/g++.dg/pr90773-1b.C b/gcc/testsuite/g++.dg/pr90773-1b.C
> new file mode 100644
> index 00000000000..9713b2dd612
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr90773-1b.C
> @@ -0,0 +1,5 @@
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +// { dg-additional-options "-mno-avx512f -march=skylake" { target { i?86-*-* x86_64-*-* } } }
> +
> +#include "pr90773-1a.C"
> diff --git a/gcc/testsuite/g++.dg/pr90773-1c.C b/gcc/testsuite/g++.dg/pr90773-1c.C
> new file mode 100644
> index 00000000000..699357a88dc
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr90773-1c.C
> @@ -0,0 +1,5 @@
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +// { dg-additional-options "-march=skylake-avx512" { target { i?86-*-* x86_64-*-* } } }
> +
> +#include "pr90773-1a.C"
> diff --git a/gcc/testsuite/g++.dg/pr90773-1d.C b/gcc/testsuite/g++.dg/pr90773-1d.C
> new file mode 100644
> index 00000000000..bf9d8543c1b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr90773-1d.C
> @@ -0,0 +1,19 @@
> +// { dg-do run }
> +// { dg-options "-O2" }
> +// { dg-additional-options "-march=native" { target { i?86-*-* x86_64-*-* } } }
> +// { dg-additional-sources "pr90773-1a.C" }
> +
> +#include "pr90773-1.h"
> +
> +void
> +foo (fixed_wide_int_storage x)
> +{
> + for (int i = 0; i < x.len; i++)
> + if (x.val[i] != i)
> + __builtin_abort ();
> +}
> +
> +int main ()
> +{
> + return record_increment ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> new file mode 100644
> index 00000000000..1d9f282dc0d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +
> +extern char *dst, *src;
> +
> +void
> +foo (void)
> +{
> + __builtin_memcpy (dst, src, 15);
> +}
> +
> +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-10.c b/gcc/testsuite/gcc.target/i386/pr90773-10.c
> new file mode 100644
> index 00000000000..9ad725e4880
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-10.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (int c)
> +{
> + __builtin_memset (dst, c, 5);
> +}
> +
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-11.c b/gcc/testsuite/gcc.target/i386/pr90773-11.c
> new file mode 100644
> index 00000000000..1734c03a2eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-11.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (int c)
> +{
> + __builtin_memset (dst, c, 6);
> +}
> +
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-12.c b/gcc/testsuite/gcc.target/i386/pr90773-12.c
> new file mode 100644
> index 00000000000..e45840a5b8d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-12.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
> +
> +void
> +foo (char *dst, char *src)
> +{
> + __builtin_memcpy (dst, src, 255);
> +}
> +
> +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */
> +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-13.c b/gcc/testsuite/gcc.target/i386/pr90773-13.c
> new file mode 100644
> index 00000000000..4d5ae8d1086
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-13.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
> +
> +void
> +foo (char *dst)
> +{
> + __builtin_memset (dst, 0, 255);
> +}
> +
> +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */
> +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> new file mode 100644
> index 00000000000..6364916ecac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (void)
> +{
> + __builtin_memset (dst, 1, 20);
> +}
> +
> +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$16843009, 16\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-2.c b/gcc/testsuite/gcc.target/i386/pr90773-2.c
> new file mode 100644
> index 00000000000..64495751b46
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +/* { dg-additional-options "-mno-avx -msse2" { target { ! ia32 } } } */
> +/* { dg-additional-options "-mno-sse" { target ia32 } } */
> +
> +extern char *dst, *src;
> +
> +void
> +foo (void)
> +{
> + __builtin_memcpy (dst, src, 19);
> +}
> +
> +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+12\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-3.c b/gcc/testsuite/gcc.target/i386/pr90773-3.c
> new file mode 100644
> index 00000000000..84747c94652
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-3.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +/* { dg-additional-options "-mno-avx -msse2" { target { ! ia32 } } } */
> +/* { dg-additional-options "-mno-sse" { target ia32 } } */
> +
> +extern char *dst, *src;
> +
> +void
> +foo (void)
> +{
> + __builtin_memcpy (dst, src, 31);
> +}
> +
> +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+12\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+16\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+20\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+24\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+27\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> new file mode 100644
> index 00000000000..ec0bc0100ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (void)
> +{
> + __builtin_memset (dst, 0, 31);
> +}
> +
> +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, 15\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-5.c b/gcc/testsuite/gcc.target/i386/pr90773-5.c
> new file mode 100644
> index 00000000000..49d03ef2403
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-5.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (void)
> +{
> + __builtin_memset (dst, 0, 21);
> +}
> +
> +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\\$0+, 13\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-6.c b/gcc/testsuite/gcc.target/i386/pr90773-6.c
> new file mode 100644
> index 00000000000..46498f6f50c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-6.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> +
> +void
> +foo (char *dst, char *src)
> +{
> + __builtin_memcpy (dst, src, 255);
> +}
> +
> +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */
> +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-7.c b/gcc/testsuite/gcc.target/i386/pr90773-7.c
> new file mode 100644
> index 00000000000..4d5ae8d1086
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-7.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
> +
> +void
> +foo (char *dst)
> +{
> + __builtin_memset (dst, 0, 255);
> +}
> +
> +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */
> +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-8.c b/gcc/testsuite/gcc.target/i386/pr90773-8.c
> new file mode 100644
> index 00000000000..0d47845d560
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-8.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (void)
> +{
> + __builtin_memset (dst, 0, 5);
> +}
> +
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-9.c b/gcc/testsuite/gcc.target/i386/pr90773-9.c
> new file mode 100644
> index 00000000000..ab5ea451f30
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-9.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic" } */
> +
> +extern char *dst;
> +
> +void
> +foo (void)
> +{
> + __builtin_memset (dst, 0, 6);
> +}
> +
> +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-04-29 11:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27 1:14 [PATCH v3 0/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
2021-04-27 1:14 ` [PATCH v3 1/2] op_by_pieces_d::run: Change a while loop to a do-while loop H.J. Lu
2021-04-27 13:05 ` Richard Biener
2021-04-27 1:14 ` [PATCH v3 2/2] Generate offset adjusted operation for op_by_pieces operations H.J. Lu
2021-04-29 11:08 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).