public inbox for gcc-patches@gcc.gnu.org
* [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations
@ 2021-07-30 21:32 H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 01/10] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX H.J. Lu
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Changes in the v6 patches:

1. No need to add TARGET_GEN_MEMSET_SCRATCH_RTX or change the memset
expanders since they have been checked into the master branch.

Changes in the v5 patches:

1. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
2. Use vec_duplicate, instead of adding TARGET_READ_MEMSET_VALUE and
TARGET_GEN_MEMSET_VALUE, to expand memset if available.

Changes in the v4 patches:

1. Define x86 MAX_MOVE_MAX to 64, which is the constant maximum number
of bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define x86 MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
bytes we can move from memory to memory in one reasonably fast instruction.
The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
must be a constant, independent of compiler options, since it is used in
reload.h to define struct target_reload, whereas MOVE_MAX can vary
depending on compiler options.
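
The constant-versus-variable distinction above can be pictured with a
small sketch.  The names below (SKETCH_MAX_MOVE_MAX, sketch_table,
sketch_table_init) are hypothetical and are not GCC's definitions; the
sketch only assumes that some table has to be sized when the compiler
is built, while the usable move size is known only once the options
are:

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_MOVE_MAX: a compile-time constant.  */
#define SKETCH_MAX_MOVE_MAX 64

/* A table whose size must be known when the compiler itself is built,
   so it can only be sized by the constant bound.  */
struct sketch_table
{
  unsigned char usable[SKETCH_MAX_MOVE_MAX + 1];
};

/* Mark sizes 1..move_max as usable, where move_max plays the role of
   MOVE_MAX for the options in effect.  Return 0 on success, -1 if
   move_max falls outside the constant bound.  */
static int
sketch_table_init (struct sketch_table *t, int move_max)
{
  if (move_max < 1 || move_max > SKETCH_MAX_MOVE_MAX)
    return -1;
  for (size_t i = 0; i <= SKETCH_MAX_MOVE_MAX; i++)
    t->usable[i] = (i >= 1 && i <= (size_t) move_max);
  return 0;
}
```

The table always has room for the largest case, and the value that
depends on the options only decides how much of it is used.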

Changes in the v3 patches:

1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes
into the generic part and the x86 part.


1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE so that a
target can use its own instructions to duplicate a QImode value into a
TImode/OImode/XImode value for memset.
2. x86: Avoid stack realignment when copying data
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only the x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.
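
As a rough illustration of item 4, the sketch below splits a
fixed-length store into power-of-two pieces and lets the last piece
overlap the one before it.  That is the shape the pr90773 tests in this
series expect, for example a 16-byte store at offset 0 plus a 4-byte
store at offset 15 for a 19-byte memset.  It is a simplified model
under those assumptions, not GCC's by-pieces code; struct piece and
split_by_pieces are hypothetical names.

```c
#include <stddef.h>

struct piece
{
  size_t offset;
  size_t size;
};

/* Largest power of two that is <= n, for n >= 1.  */
static size_t
floor_pow2 (size_t n)
{
  size_t p = 1;
  while (p * 2 <= n)
    p *= 2;
  return p;
}

/* Smallest power of two that is >= n, for n >= 1.  */
static size_t
ceil_pow2 (size_t n)
{
  size_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

/* Split LEN bytes into pieces of at most MAX_PIECE bytes each.  Write
   up to CAP pieces to OUT and return how many were written.  */
static size_t
split_by_pieces (size_t len, size_t max_piece, struct piece *out,
		 size_t cap)
{
  size_t n = 0, off = 0;
  if (len == 0 || max_piece == 0)
    return 0;

  /* Widest piece that fits both the length and the limit.  */
  size_t wide = floor_pow2 (len < max_piece ? len : max_piece);
  while (len - off >= wide && n < cap)
    {
      out[n++] = (struct piece) { off, wide };
      off += wide;
    }

  /* Cover the tail with one piece that may overlap earlier bytes.  */
  size_t rem = len - off;
  if (rem > 0 && n < cap)
    {
      size_t tail = ceil_pow2 (rem);
      out[n++] = (struct piece) { len - tail, tail };
    }
  return n;
}
```

With a 16-byte limit, 17 bytes become a 16-byte piece at offset 0 and a
1-byte piece at offset 16, and 9 bytes become an 8-byte piece at offset
0 and a 1-byte piece at offset 8.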

On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
differences with -O2 build are:

             Before         After
libc.so     1906572        1906444

Some code sequence differences in libc.so are:

<svcudp_bufcreate@GLIBC_2.2.5>:
	...
	jne    <svcudp_bufcreate@GLIBC_2.2.5+0x318>	      |		jne    <svcudp_bufcreate@GLIBC_2.2.5+0x2a8>
	test   %r15,%r15						test   %r15,%r15
	je     <svcudp_bufcreate@GLIBC_2.2.5+0x318>	      |		je     <svcudp_bufcreate@GLIBC_2.2.5+0x2a8>
	mov    %r13d,(%r14)						mov    %r13d,(%r14)
	lea    0x10(%r14),%rdi						lea    0x10(%r14),%rdi
	mov    $0x1,%ecx						mov    $0x1,%ecx
	mov    %r13d,%edx						mov    %r13d,%edx
	mov    %r15,0x40(%r12)						mov    %r15,0x40(%r12)
	mov    %r15,%rsi						mov    %r15,%rsi
	call   <xdrmem_create@GLIBC_2.2.5>				call   <xdrmem_create@GLIBC_2.2.5>
	lea    0xa2f9b(%rip),%rax        # <svcudp_op>	      |		lea    0xa2fab(%rip),%rax        # <svcudp_op>
	xor    %esi,%esi						xor    %esi,%esi
	mov    %ebp,%edi						mov    %ebp,%edi
	mov    %rax,0x8(%r12)						mov    %rax,0x8(%r12)
	movzwl 0x12(%rsp),%eax						movzwl 0x12(%rsp),%eax
	mov    $0x8,%edx				      <
	lea    0xc(%rsp),%rcx						lea    0xc(%rsp),%rcx
	mov    %r14,0x48(%r12)				      <
	add    $0x40,%r14				      <
	mov    $0x4,%r8d						mov    $0x4,%r8d
							      >		movq   $0x0,0x1d0(%r14)
							      >		mov    $0x8,%edx
	rol    $0x8,%ax							rol    $0x8,%ax
	mov    %ebp,(%r12)				      |		mov    %r14,0x48(%r12)
	movq   $0x0,0x190(%r14)				      |		add    $0x40,%r14
	mov    %ax,0x4(%r12)				      <
	mov    %r14,0x30(%r12)						mov    %r14,0x30(%r12)
							      >		mov    %ax,0x4(%r12)
							      >		mov    %ebp,(%r12)
	movl   $0x1,0xc(%rsp)						movl   $0x1,0xc(%rsp)
	call   <setsockopt>						call   <setsockopt>
	mov    %r12,%rdi						mov    %r12,%rdi
	movabs $0x101010101010101,%rdx			      <
	test   %eax,%eax						test   %eax,%eax
	mov    $0xff,%eax						mov    $0xff,%eax
	cmove  %eax,%ebx						cmove  %eax,%ebx
	movzbl %bl,%eax					      |		movd   %ebx,%xmm0
	mov    %ebx,0xc(%rsp)						mov    %ebx,0xc(%rsp)
	mov    %rax,%rsi				      |		punpcklbw %xmm0,%xmm0
	imul   %rdx,%rsi				      |		punpcklwd %xmm0,%xmm0
	mul    %rdx					      |		pshufd $0x0,%xmm0,%xmm0
	add    %rsi,%rdx				      |		movups %xmm0,0x50(%r12)
	mov    %rax,0x50(%r12)				      |		movups %xmm0,0x60(%r12)
	mov    %rdx,0x58(%r12)				      |		movups %xmm0,0x70(%r12)
	mov    %rax,0x60(%r12)				      |		movups %xmm0,0x80(%r12)
	mov    %rdx,0x68(%r12)				      |		movups %xmm0,0x90(%r12)
	mov    %rax,0x70(%r12)				      |		movups %xmm0,0xa0(%r12)
	mov    %rdx,0x78(%r12)				      |		movups %xmm0,0xb0(%r12)
	mov    %rax,0x80(%r12)				      |		movups %xmm0,0xc0(%r12)
	mov    %rdx,0x88(%r12)				      |		movups %xmm0,0xd0(%r12)
	mov    %rax,0x90(%r12)				      |		movups %xmm0,0xe0(%r12)
	mov    %rdx,0x98(%r12)				      |		movups %xmm0,0xf0(%r12)
	mov    %rax,0xa0(%r12)				      |		movups %xmm0,0x100(%r12)
	mov    %rdx,0xa8(%r12)				      |		movups %xmm0,0x110(%r12)
	mov    %rax,0xb0(%r12)				      |		movups %xmm0,0x120(%r12)
	mov    %rdx,0xb8(%r12)				      |		movups %xmm0,0x130(%r12)
	mov    %rax,0xc0(%r12)				      |		movups %xmm0,0x140(%r12)
	mov    %rdx,0xc8(%r12)				      <
	mov    %rax,0xd0(%r12)				      <
	mov    %rdx,0xd8(%r12)				      <
	mov    %rax,0xe0(%r12)				      <
	mov    %rdx,0xe8(%r12)				      <
	mov    %rax,0xf0(%r12)				      <
	mov    %rdx,0xf8(%r12)				      <
	mov    %rax,0x100(%r12)				      <
	mov    %rdx,0x108(%r12)				      <
	mov    %rax,0x110(%r12)				      <
	mov    %rdx,0x118(%r12)				      <
	mov    %rax,0x120(%r12)				      <
	mov    %rdx,0x128(%r12)				      <
	mov    %rax,0x130(%r12)				      <
	mov    %rdx,0x138(%r12)				      <
	mov    %rax,0x140(%r12)				      <
	mov    %rdx,0x148(%r12)				      <
	call   <xprt_register@GLIBC_2.2.5>				call   <xprt_register@GLIBC_2.2.5>
	add    $0x28,%rsp						add    $0x28,%rsp
	mov    %r12,%rax						mov    %r12,%rax
	pop    %rbx							pop    %rbx
	pop    %rbp							pop    %rbp
	pop    %r12							pop    %r12
	pop    %r13							pop    %r13
	pop    %r14							pop    %r14
	pop    %r15							pop    %r15
	ret    								ret    
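
In the listing above, the left-hand column replicates the fill byte
with integer multiplies by a 0x0101... constant, while the right-hand
column does the replication in an XMM register (punpcklbw, punpcklwd,
pshufd) and stores 16 bytes at a time.  A minimal C sketch of the
scalar replication follows; the function names are made up for
illustration.  The constants $202116108 and $868082074056920076 that
appear in the pr90773 tests of this series are the byte value 12
replicated in this way.

```c
#include <stdint.h>

/* Replicate byte C into every byte of a 32-bit value.  */
static uint32_t
broadcast_byte_u32 (uint8_t c)
{
  return (uint32_t) c * UINT32_C (0x01010101);
}

/* Replicate byte C into every byte of a 64-bit value.  */
static uint64_t
broadcast_byte_u64 (uint8_t c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}
```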

H.J. Lu (10):
  x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX
  x86: Avoid stack realignment when copying data
  x86: Update piecewise move and store
  x86: Add AVX2 tests for PR middle-end/90773
  x86: Add tests for piecewise move and store
  x86: Also pass -mno-avx to pr72839.c
  x86: Also pass -mno-avx to cold-attribute-1.c
  x86: Also pass -mno-avx to sw-1.c for ia32
  x86: Update gcc.target/i386/incoming-11.c
  x86: Also pass -mno-sse to vect8-ret.c

 gcc/config/i386/i386-expand.c                 |  4 +-
 gcc/config/i386/i386.c                        | 27 +++++++++++--
 gcc/config/i386/i386.h                        | 40 +++++++++++++++----
 .../gcc.target/i386/cold-attribute-1.c        |  2 +-
 gcc/testsuite/gcc.target/i386/eh_return-1.c   | 26 ++++++++++++
 gcc/testsuite/gcc.target/i386/incoming-11.c   |  2 +-
 .../gcc.target/i386/pieces-memcpy-10.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memcpy-11.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memcpy-12.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memcpy-13.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memcpy-14.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memcpy-15.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memcpy-16.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memcpy-7.c         | 15 +++++++
 .../gcc.target/i386/pieces-memcpy-8.c         | 14 +++++++
 .../gcc.target/i386/pieces-memcpy-9.c         | 14 +++++++
 .../gcc.target/i386/pieces-memset-1.c         | 16 ++++++++
 .../gcc.target/i386/pieces-memset-10.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-11.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-12.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-13.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-14.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-15.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-16.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-17.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-18.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-19.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-2.c         | 12 ++++++
 .../gcc.target/i386/pieces-memset-20.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-21.c        | 18 +++++++++
 .../gcc.target/i386/pieces-memset-22.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-23.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-24.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-25.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-26.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-27.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-28.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-29.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-3.c         | 18 +++++++++
 .../gcc.target/i386/pieces-memset-30.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-31.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-32.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-33.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-34.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-35.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-36.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-37.c        | 15 +++++++
 .../gcc.target/i386/pieces-memset-38.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-39.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-4.c         | 16 ++++++++
 .../gcc.target/i386/pieces-memset-40.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-41.c        | 16 ++++++++
 .../gcc.target/i386/pieces-memset-42.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-43.c        | 17 ++++++++
 .../gcc.target/i386/pieces-memset-44.c        | 18 +++++++++
 .../gcc.target/i386/pieces-memset-5.c         | 12 ++++++
 .../gcc.target/i386/pieces-memset-6.c         | 16 ++++++++
 .../gcc.target/i386/pieces-memset-7.c         | 16 ++++++++
 .../gcc.target/i386/pieces-memset-8.c         | 16 ++++++++
 .../gcc.target/i386/pieces-memset-9.c         | 16 ++++++++
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  6 +--
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |  8 ++--
 gcc/testsuite/gcc.target/i386/pr72839.c       |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-1.c     | 10 ++---
 gcc/testsuite/gcc.target/i386/pr90773-14.c    |  4 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c    | 14 +++++++
 gcc/testsuite/gcc.target/i386/pr90773-16.c    | 14 +++++++
 gcc/testsuite/gcc.target/i386/pr90773-17.c    | 14 +++++++
 gcc/testsuite/gcc.target/i386/pr90773-18.c    | 15 +++++++
 gcc/testsuite/gcc.target/i386/pr90773-19.c    | 14 +++++++
 gcc/testsuite/gcc.target/i386/pr90773-20.c    | 13 ++++++
 gcc/testsuite/gcc.target/i386/pr90773-21.c    | 13 ++++++
 gcc/testsuite/gcc.target/i386/pr90773-22.c    | 13 ++++++
 gcc/testsuite/gcc.target/i386/pr90773-23.c    | 13 ++++++
 gcc/testsuite/gcc.target/i386/pr90773-24.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-25.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-26.c    | 21 ++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-4.c     |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-5.c     |  2 +-
 gcc/testsuite/gcc.target/i386/sw-1.c          |  1 +
 gcc/testsuite/gcc.target/i386/vect8-ret.c     |  2 +-
 86 files changed, 1135 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-25.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-26.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-27.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-28.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-29.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-30.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-31.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-32.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-33.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-34.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-35.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-36.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-37.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-38.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-39.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-40.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-41.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-42.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-43.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-44.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-26.c

-- 
2.31.1



* [PATCH v6 01/10] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 02/10] x86: Avoid stack realignment when copying data H.J. Lu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

	PR middle-end/90773
	* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

	PR middle-end/90773
	* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
	* gcc.target/i386/pr90773-15.c: New test.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-18.c: Likewise.
	* gcc.target/i386/pr90773-19.c: Likewise.
---
 gcc/config/i386/i386.c                     |  6 +++++-
 gcc/testsuite/gcc.target/i386/pr90773-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-16.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-17.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-18.c | 15 +++++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-19.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  2 +-
 8 files changed, 78 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a0285e659ad..5d20ca2067f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23313,7 +23313,8 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
     }
 }
 
-/* Return a scratch register in MODE for vector load and store.  */
+/* Implement the TARGET_GEN_MEMSET_SCRATCH_RTX hook.  Return a scratch
+   register in MODE for vector load and store.  */
 
 rtx
 ix86_gen_scratch_sse_rtx (machine_mode mode)
@@ -24232,6 +24233,9 @@ static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
 #undef TARGET_LIBC_HAS_FAST_FUNCTION
 #define TARGET_LIBC_HAS_FAST_FUNCTION ix86_libc_has_fast_function
 
+#undef TARGET_GEN_MEMSET_SCRATCH_RTX
+#define TARGET_GEN_MEMSET_SCRATCH_RTX ix86_gen_scratch_sse_rtx
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
index 6364916ecac..e5c19f49cf5 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
@@ -10,4 +10,4 @@ foo (void)
 }
 
 /* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$16843009, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movd\[\\t \]+%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
new file mode 100644
index 00000000000..185ea60e1d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 17);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
new file mode 100644
index 00000000000..d820cc318c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 17);
+}
+
+/* { dg-final { scan-assembler-times "(?:vpcmpeqd|vpternlogd)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$-1, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
new file mode 100644
index 00000000000..f6f179e9b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 12, 19);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovd\[\\t \]+%xmm\[0-9\]+, 15\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-18.c b/gcc/testsuite/gcc.target/i386/pr90773-18.c
new file mode 100644
index 00000000000..b0687abbe01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 12, 9);
+}
+
+/* { dg-final { scan-assembler-times "movabsq\[\\t \]+\\\$868082074056920076, %r" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, \\(%\[\^,\]+\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, 4\\(%\[\^,\]+\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$12, 8\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-19.c b/gcc/testsuite/gcc.target/i386/pr90773-19.c
new file mode 100644
index 00000000000..8aa5540bacc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-19.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 12, 9);
+}
+
+/* { dg-final { scan-assembler-times "movabsq\[\\t \]+\\\$868082074056920076, %r" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, \\(%\[\^,\]+\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, 4\\(%\[\^,\]+\\)" 1 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-5.c b/gcc/testsuite/gcc.target/i386/pr90773-5.c
index 49d03ef2403..27185a236a7 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-5.c
@@ -10,4 +10,4 @@ foo (void)
 }
 
 /* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
-/* { dg-final { scan-assembler-times "movq\[\\t \]+\\\$0+, 13\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+%xmm\[0-9\]+, 13\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1



* [PATCH v6 02/10] x86: Avoid stack realignment when copying data
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 01/10] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 03/10] x86: Update piecewise move and store H.J. Lu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

	* config/i386/i386-expand.c (ix86_expand_vector_move): Call
	ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
	data from one memory location to another.

gcc/testsuite/

	* gcc.target/i386/eh_return-1.c: New test.
---
 gcc/config/i386/i386-expand.c               |  4 +++-
 gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 +++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 896bd685b1f..1d469bf7221 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -625,7 +625,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
       && !register_operand (op0, mode)
       && !register_operand (op1, mode))
     {
-      emit_move_insn (op0, force_reg (GET_MODE (op0), op1));
+      rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
+      emit_move_insn (tmp, op1);
+      emit_move_insn (op0, tmp);
       return;
     }
 
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c
new file mode 100644
index 00000000000..671ba635e88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell -mno-avx512f" } */
+
+struct _Unwind_Context
+{
+  void *ra;
+  char array[48];
+};
+
+extern long uw_install_context_1 (struct _Unwind_Context *);
+
+void
+_Unwind_RaiseException (void)
+{
+  struct _Unwind_Context this_context, cur_context;
+  long offset = uw_install_context_1 (&this_context);
+  __builtin_memcpy (&this_context, &cur_context,
+		    sizeof (struct _Unwind_Context));
+  void *handler = __builtin_frob_return_addr ((&cur_context)->ra);
+  uw_install_context_1 (&cur_context);
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
-- 
2.31.1



* [PATCH v6 03/10] x86: Update piecewise move and store
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 01/10] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 02/10] x86: Avoid stack realignment when copying data H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-08-02 11:20   ` Uros Bizjak
  2021-07-30 21:32 ` [PATCH v6 04/10] x86: Add AVX2 tests for PR middle-end/90773 H.J. Lu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

We can use TImode/OImode/XImode integers for piecewise move and store.

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
bytes we can move from memory to memory in one reasonably fast instruction.
The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
must be a constant, independent of compiler options, since it is used in
reload.h to define struct target_reload, whereas MOVE_MAX can vary
depending on compiler options.
3. When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed since no vector register spill is
required for piecewise move and store.  Since stack_realign_needed is
set to true by checking stack_alignment_estimated, which is set by pseudo
vector register usage, we also need to check stack_realign_needed to
eliminate the frame pointer.
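
The way MOVE_MAX_PIECES depends on compiler options can be sketched as
a plain function.  The boolean fields below mirror the TARGET_*
conditions used in the i386.h hunk of this patch; struct x86_tuning and
sketch_move_max_pieces are hypothetical names used only for this
illustration, and units_per_word stands in for UNITS_PER_WORD.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the option-dependent conditions.  */
struct x86_tuning
{
  bool avx512f;
  bool prefer_avx256;
  bool avx;
  bool prefer_avx128;
  bool avx256_split_unaligned_load;
  bool avx256_split_unaligned_store;
  bool sse2;
  bool sse_unaligned_load_optimal;
  bool sse_unaligned_store_optimal;
  int units_per_word;	/* 8 in 64-bit mode, 4 in 32-bit mode.  */
};

/* Pick the widest piece size, in bytes, that the options allow.  */
static int
sketch_move_max_pieces (const struct x86_tuning *t)
{
  if (t->avx512f && !t->prefer_avx256)
    return 64;		/* XImode */
  if (t->avx
      && !t->prefer_avx128
      && !t->avx256_split_unaligned_load
      && !t->avx256_split_unaligned_store)
    return 32;		/* OImode */
  if (t->sse2
      && t->sse_unaligned_load_optimal
      && t->sse_unaligned_store_optimal)
    return 16;		/* TImode */
  return t->units_per_word;
}
```

Every value this function can return for vector modes is at most 64,
which is why a constant MAX_MOVE_MAX of 64 is a safe upper bound.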

gcc/

	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
	check stack_realign_needed for stack realignment.
	(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
	than the largest integer supported by vector register.
	* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
	(MOVE_MAX_PIECES): Set to bytes of the largest integer supported
	by vector register.
	(MOVE_MAX): Defined to MOVE_MAX_PIECES.
	(STORE_MAX_PIECES): New.

gcc/testsuite/

	* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
	* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
	* gcc.target/i386/pr90773-15.c: Likewise.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-24.c: Likewise.
	* gcc.target/i386/pr90773-25.c: Likewise.
	* gcc.target/i386/pr100865-1.c: Likewise.
	* gcc.target/i386/pr100865-2.c: Likewise.
	* gcc.target/i386/pr100865-3.c: Likewise.
	* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
	XMM movd to store 4 bytes.
	* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
	YMM registers.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
	* gcc.target/i386/pr100865-10b.c: Likewise.
---
 gcc/config/i386/i386.c                       | 21 ++++++++--
 gcc/config/i386/i386.h                       | 40 ++++++++++++++++----
 gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
 gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 ++--
 gcc/testsuite/gcc.target/i386/pr90773-1.c    | 10 ++---
 gcc/testsuite/gcc.target/i386/pr90773-14.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
 gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c    |  2 +-
 17 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5d20ca2067f..842eb0e6786 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
      assumed stack realignment might be needed or -fno-omit-frame-pointer
      is used, but in the end nothing that needed the stack alignment had
      been spilled nor stack access, clear frame_pointer_needed and say we
-     don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+     don't need stack realignment.
+
+     When vector register is used for piecewise move and store, we don't
+     increase stack_alignment_needed as there is no register spill for
+     piecewise move and store.  Since stack_realign_needed is set to true
+     by checking stack_alignment_estimated which is updated by pseudo
+     vector register usage, we also need to check stack_realign_needed to
+     eliminate frame pointer.  */
+  if ((stack_realign
+       || (!flag_omit_frame_pointer && optimize)
+       || crtl->stack_realign_needed)
       && frame_pointer_needed
       && crtl->is_leaf
       && crtl->sp_is_unchanging
@@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
 	  /* FALLTHRU */
 	case E_OImode:
 	case E_XImode:
-	  if (!standard_sse_constant_p (x, mode))
+	  if (!standard_sse_constant_p (x, mode)
+	      && GET_MODE_SIZE (TARGET_AVX512F
+				? XImode
+				: (TARGET_AVX
+				   ? OImode
+				   : (TARGET_SSE2
+				      ? TImode : DImode))) < GET_MODE_SIZE (mode))
 	    return false;
 	default:
 	  break;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index d1e1c225990..50418a0cc9b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1757,9 +1757,10 @@ typedef struct ix86_args {
 /* Define this as 1 if `char' should by default be signed; else as 0.  */
 #define DEFAULT_SIGNED_CHAR 1
 
-/* Max number of bytes we can move from memory to memory
-   in one reasonably fast instruction.  */
-#define MOVE_MAX 16
+/* The constant maximum number of bytes that a single instruction can
+   move quickly between memory and registers or between two memory
+   locations.  */
+#define MAX_MOVE_MAX 64
 
 /* MOVE_MAX_PIECES is the number of bytes at a time which we can
    move efficiently, as opposed to  MOVE_MAX which is the maximum
@@ -1770,11 +1771,34 @@ typedef struct ix86_args {
    widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
    64-bit mode.  */
 #define MOVE_MAX_PIECES \
-  ((TARGET_64BIT \
-    && TARGET_SSE2 \
-    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
-    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+       && !TARGET_PREFER_AVX128 \
+       && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
+       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+      ? 32 \
+      : ((TARGET_SSE2 \
+	  && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+	  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+	 ? 16 : UNITS_PER_WORD)))
+
+/* Max number of bytes we can move from memory to memory in one
+   reasonably fast instruction.  */
+#define MOVE_MAX MOVE_MAX_PIECES
+
+/* STORE_MAX_PIECES is the number of bytes at a time that we can
+   store efficiently.  */
+#define STORE_MAX_PIECES \
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+       && !TARGET_PREFER_AVX128 \
+       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+      ? 32 \
+      : ((TARGET_SSE2 \
+	  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+	 ? 16 : UNITS_PER_WORD)))
 
 /* If a memory-to-memory move would take MOVE_RATIO or more simple
    move-instruction pairs, we will do a cpymem or libcall instead.
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
index 6c3097fb2a6..949dd5c337a 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
index 7ffc19e56a8..98b6dfb16f3 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-10a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
@@ -29,5 +29,5 @@ foo (void)
     array[i] = MK_CONST128_BROADCAST (0x1f);
 }
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
index edf52765c60..e5616d8d258 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-10b.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
@@ -3,5 +3,5 @@
 
 #include "pr100865-10a.c"
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c
index 17efe2d72a3..f3ea7753abe 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c
index 007e79f91b0..714c43e12c9 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
index f55883598f9..365487337ae 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-4a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake" } */
 
 extern char array[64];
@@ -11,6 +11,6 @@ foo (void)
     array[i] = -45;
 }
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */
 /* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
index 1e50dc842bc..8e8a7eaaaff 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-4b.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
@@ -1,9 +1,9 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 #include "pr100865-4a.c"
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, " 4 } } */
-/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
 /* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
index 1d9f282dc0d..4fd5a40d99d 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=generic" } */
+/* { dg-options "-O2 -msse2 -mtune=generic" } */
 
 extern char *dst, *src;
 
@@ -9,9 +9,5 @@ foo (void)
   __builtin_memcpy (dst, src, 15);
 }
 
-/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
index e5c19f49cf5..96ee5cb08c1 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
index 185ea60e1d2..403cdb248a2 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-15.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
@@ -9,6 +9,6 @@ foo (int c)
   __builtin_memset (dst, c, 17);
 }
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
-/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
index d820cc318c3..bb0aadbc77e 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-16.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
index f6f179e9b5b..73d5d5abaee 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-17.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
index 7b2ea66dcfc..71f1fd8c4df 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-24.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
 
 struct S
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
index 57642ea8d2d..ad19a88c883 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-25.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
 
 struct S
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
index ec0bc0100ae..ee4c04678d1 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
 
 extern char *dst;
-- 
2.31.1



* [PATCH v6 04/10] x86: Add AVX2 tests for PR middle-end/90773
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (2 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 03/10] x86: Update piecewise move and store H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 05/10] x86: Add tests for piecewise move and store H.J. Lu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

	PR middle-end/90773
	* gcc.target/i386/pr90773-20.c: New test.
	* gcc.target/i386/pr90773-21.c: Likewise.
	* gcc.target/i386/pr90773-22.c: Likewise.
	* gcc.target/i386/pr90773-23.c: Likewise.
	* gcc.target/i386/pr90773-26.c: Likewise.
---
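Note for reviewers (not part of the commit message): a rough model of
what these tests expect.  This is only a sketch under assumptions, not
the by-pieces code in the middle end.  It assumes pieces are powers of
two, that the widest piece is capped by a hypothetical max_piece
argument (32 for the -march=skylake tests here), and that a tail shorter
than one piece is covered by the smallest power-of-two access that fits,
placed so that it ends at the last byte.  The names split_pieces and
struct piece are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one memory access: byte offset and width.  */
struct piece
{
  size_t offset;
  size_t size;
};

/* Largest power of two that is <= N.  N must be nonzero.  */
static size_t
floor_pow2 (size_t n)
{
  size_t p = 1;
  while (p <= n / 2)
    p *= 2;
  return p;
}

/* Smallest power of two that is >= N.  N must be nonzero.  */
static size_t
ceil_pow2 (size_t n)
{
  size_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

/* Split LEN bytes into accesses no wider than MAX_PIECE, which is
   assumed to be a power of two.  Store at most CAP pieces in OUT and
   return the number stored.  */
static size_t
split_pieces (size_t len, size_t max_piece, struct piece *out, size_t cap)
{
  size_t n = 0, off = 0;

  assert (max_piece > 0);
  if (len == 0)
    return 0;

  /* Widest power-of-two access that fits both limits.  */
  size_t width = floor_pow2 (len < max_piece ? len : max_piece);

  /* Full-width pieces first.  */
  while (len - off >= width && n < cap)
    {
      out[n].offset = off;
      out[n].size = width;
      n++;
      off += width;
    }

  /* One access for the tail.  It is never wider than WIDTH, so backing
     up from the end cannot go below offset 0.  */
  if (off < len && n < cap)
    {
      size_t tail = ceil_pow2 (len - off);
      out[n].offset = len - tail;
      out[n].size = tail;
      n++;
    }
  return n;
}
```

Under this model a 33-byte memset with max_piece 32 becomes a 32-byte
access at offset 0 plus a 1-byte access at offset 32, and a 34-byte one
ends with a 2-byte access at offset 32, which is the shape the vmovdqu
plus movb/movw patterns in pr90773-20.c through pr90773-23.c scan for.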
 gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr90773-26.c | 21 +++++++++++++++++++++
 5 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-26.c

diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c b/gcc/testsuite/gcc.target/i386/pr90773-20.c
new file mode 100644
index 00000000000..e61e405f2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c b/gcc/testsuite/gcc.target/i386/pr90773-21.c
new file mode 100644
index 00000000000..16ad17f3cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c b/gcc/testsuite/gcc.target/i386/pr90773-22.c
new file mode 100644
index 00000000000..45a8ff65a84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c b/gcc/testsuite/gcc.target/i386/pr90773-23.c
new file mode 100644
index 00000000000..9256ce10ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-26.c b/gcc/testsuite/gcc.target/i386/pr90773-26.c
new file mode 100644
index 00000000000..b2513c3a9c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-26.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1



* [PATCH v6 05/10] x86: Add tests for piecewise move and store
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (3 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 04/10] x86: Add AVX2 tests for PR middle-end/90773 H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 06/10] x86: Also pass -mno-avx to pr72839.c H.J. Lu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

	* gcc.target/i386/pieces-memcpy-10.c: New test.
	* gcc.target/i386/pieces-memcpy-11.c: Likewise.
	* gcc.target/i386/pieces-memcpy-12.c: Likewise.
	* gcc.target/i386/pieces-memcpy-13.c: Likewise.
	* gcc.target/i386/pieces-memcpy-14.c: Likewise.
	* gcc.target/i386/pieces-memcpy-15.c: Likewise.
	* gcc.target/i386/pieces-memcpy-16.c: Likewise.
	* gcc.target/i386/pieces-memcpy-7.c: Likewise.
	* gcc.target/i386/pieces-memcpy-8.c: Likewise.
	* gcc.target/i386/pieces-memcpy-9.c: Likewise.
	* gcc.target/i386/pieces-memset-1.c: Likewise.
	* gcc.target/i386/pieces-memset-2.c: Likewise.
	* gcc.target/i386/pieces-memset-3.c: Likewise.
	* gcc.target/i386/pieces-memset-4.c: Likewise.
	* gcc.target/i386/pieces-memset-5.c: Likewise.
	* gcc.target/i386/pieces-memset-6.c: Likewise.
	* gcc.target/i386/pieces-memset-7.c: Likewise.
	* gcc.target/i386/pieces-memset-8.c: Likewise.
	* gcc.target/i386/pieces-memset-9.c: Likewise.
	* gcc.target/i386/pieces-memset-10.c: Likewise.
	* gcc.target/i386/pieces-memset-11.c: Likewise.
	* gcc.target/i386/pieces-memset-12.c: Likewise.
	* gcc.target/i386/pieces-memset-13.c: Likewise.
	* gcc.target/i386/pieces-memset-14.c: Likewise.
	* gcc.target/i386/pieces-memset-15.c: Likewise.
	* gcc.target/i386/pieces-memset-16.c: Likewise.
	* gcc.target/i386/pieces-memset-17.c: Likewise.
	* gcc.target/i386/pieces-memset-18.c: Likewise.
	* gcc.target/i386/pieces-memset-19.c: Likewise.
	* gcc.target/i386/pieces-memset-20.c: Likewise.
	* gcc.target/i386/pieces-memset-21.c: Likewise.
	* gcc.target/i386/pieces-memset-22.c: Likewise.
	* gcc.target/i386/pieces-memset-23.c: Likewise.
	* gcc.target/i386/pieces-memset-24.c: Likewise.
	* gcc.target/i386/pieces-memset-25.c: Likewise.
	* gcc.target/i386/pieces-memset-26.c: Likewise.
	* gcc.target/i386/pieces-memset-27.c: Likewise.
	* gcc.target/i386/pieces-memset-28.c: Likewise.
	* gcc.target/i386/pieces-memset-29.c: Likewise.
	* gcc.target/i386/pieces-memset-30.c: Likewise.
	* gcc.target/i386/pieces-memset-31.c: Likewise.
	* gcc.target/i386/pieces-memset-32.c: Likewise.
	* gcc.target/i386/pieces-memset-33.c: Likewise.
	* gcc.target/i386/pieces-memset-34.c: Likewise.
	* gcc.target/i386/pieces-memset-35.c: Likewise.
	* gcc.target/i386/pieces-memset-36.c: Likewise.
	* gcc.target/i386/pieces-memset-37.c: Likewise.
	* gcc.target/i386/pieces-memset-38.c: Likewise.
	* gcc.target/i386/pieces-memset-39.c: Likewise.
	* gcc.target/i386/pieces-memset-40.c: Likewise.
	* gcc.target/i386/pieces-memset-41.c: Likewise.
	* gcc.target/i386/pieces-memset-42.c: Likewise.
	* gcc.target/i386/pieces-memset-43.c: Likewise.
	* gcc.target/i386/pieces-memset-44.c: Likewise.
---
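Note for reviewers (not part of the commit message): the -m options in
these tests are meant to exercise the different widths that the
MOVE_MAX_PIECES and STORE_MAX_PIECES definitions from patch 03 can
yield.  As a reading aid, those nested conditionals can be modelled as
plain C.  This is a sketch under assumptions: struct tune and its fields
are hypothetical stand-ins for the TARGET_* conditions, and
units_per_word stands in for UNITS_PER_WORD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TARGET_* conditions tested in i386.h.  */
struct tune
{
  bool avx512f, prefer_avx256;
  bool avx, prefer_avx128;
  bool avx256_split_unaligned_load, avx256_split_unaligned_store;
  bool sse2, sse_unaligned_load_optimal, sse_unaligned_store_optimal;
  int units_per_word;
};

/* Model of MOVE_MAX_PIECES: both the load side and the store side must
   be cheap at a given width.  */
static int
move_max_pieces (const struct tune *t)
{
  assert (t->units_per_word > 0);
  if (t->avx512f && !t->prefer_avx256)
    return 64;
  if (t->avx
      && !t->prefer_avx128
      && !t->avx256_split_unaligned_load
      && !t->avx256_split_unaligned_store)
    return 32;
  if (t->sse2
      && t->sse_unaligned_load_optimal
      && t->sse_unaligned_store_optimal)
    return 16;
  return t->units_per_word;
}

/* Model of STORE_MAX_PIECES: only the store side matters.  */
static int
store_max_pieces (const struct tune *t)
{
  assert (t->units_per_word > 0);
  if (t->avx512f && !t->prefer_avx256)
    return 64;
  if (t->avx
      && !t->prefer_avx128
      && !t->avx256_split_unaligned_store)
    return 32;
  if (t->sse2 && t->sse_unaligned_store_optimal)
    return 16;
  return t->units_per_word;
}
```

In this model a configuration with AVX enabled but 256-bit unaligned
loads split gets 16 from move_max_pieces and 32 from store_max_pieces,
which is why the memcpy and memset tests for the same options can expect
different register widths.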
 .../gcc.target/i386/pieces-memcpy-10.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-11.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-12.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-13.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-14.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-15.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-16.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memcpy-7.c          | 15 +++++++++++++++
 .../gcc.target/i386/pieces-memcpy-8.c          | 14 ++++++++++++++
 .../gcc.target/i386/pieces-memcpy-9.c          | 14 ++++++++++++++
 .../gcc.target/i386/pieces-memset-1.c          | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-10.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-11.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-12.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-13.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-14.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-15.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-16.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-17.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-18.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-19.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-2.c          | 12 ++++++++++++
 .../gcc.target/i386/pieces-memset-20.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-21.c         | 18 ++++++++++++++++++
 .../gcc.target/i386/pieces-memset-22.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-23.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-24.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-25.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-26.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-27.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-28.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-29.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-3.c          | 18 ++++++++++++++++++
 .../gcc.target/i386/pieces-memset-30.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-31.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-32.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-33.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-34.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-35.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-36.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-37.c         | 15 +++++++++++++++
 .../gcc.target/i386/pieces-memset-38.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-39.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-4.c          | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-40.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-41.c         | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-42.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-43.c         | 17 +++++++++++++++++
 .../gcc.target/i386/pieces-memset-44.c         | 18 ++++++++++++++++++
 .../gcc.target/i386/pieces-memset-5.c          | 12 ++++++++++++
 .../gcc.target/i386/pieces-memset-6.c          | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-7.c          | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-8.c          | 16 ++++++++++++++++
 .../gcc.target/i386/pieces-memset-9.c          | 16 ++++++++++++++++
 54 files changed, 879 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-25.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-26.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-27.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-28.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-29.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-30.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-31.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-32.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-33.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-34.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-35.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-36.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-37.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-38.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-39.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-40.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-41.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-42.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-43.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-44.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-9.c

diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
new file mode 100644
index 00000000000..5faee21f9b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c
new file mode 100644
index 00000000000..b8917a7f917
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 64);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c
new file mode 100644
index 00000000000..f1432ebe517
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 64);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c
new file mode 100644
index 00000000000..97e6067fec9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 66);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c
new file mode 100644
index 00000000000..7addc4c0a28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 33);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c
new file mode 100644
index 00000000000..695e8c3fa67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c
new file mode 100644
index 00000000000..728eba5ea3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
new file mode 100644
index 00000000000..3d248d447ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+void
+foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src)
+{
+  __builtin_memcpy (dst, src, 17);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
new file mode 100644
index 00000000000..c13a2beb2f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+void
+foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src)
+{
+  __builtin_memcpy (dst, src, 18);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
new file mode 100644
index 00000000000..238f88b275e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mtune=generic" } */
+
+void
+foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src)
+{
+  __builtin_memcpy (dst, src, 19);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-1.c b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
new file mode 100644
index 00000000000..2b8032684b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 64);
+}
+
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-10.c b/gcc/testsuite/gcc.target/i386/pieces-memset-10.c
new file mode 100644
index 00000000000..a6390d1bd8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-10.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 64);
+}
+
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-11.c b/gcc/testsuite/gcc.target/i386/pieces-memset-11.c
new file mode 100644
index 00000000000..3fb9038b04f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-11.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 64);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-12.c b/gcc/testsuite/gcc.target/i386/pieces-memset-12.c
new file mode 100644
index 00000000000..d9a10bc038e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-12.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 66);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-13.c b/gcc/testsuite/gcc.target/i386/pieces-memset-13.c
new file mode 100644
index 00000000000..7f2cd3f58ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-13.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 33);
+}
+
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-14.c b/gcc/testsuite/gcc.target/i386/pieces-memset-14.c
new file mode 100644
index 00000000000..45ece482464
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-14.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-15.c b/gcc/testsuite/gcc.target/i386/pieces-memset-15.c
new file mode 100644
index 00000000000..2123958f836
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-15.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-16.c b/gcc/testsuite/gcc.target/i386/pieces-memset-16.c
new file mode 100644
index 00000000000..1c5d124cecc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-16.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 17);
+}
+
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-17.c b/gcc/testsuite/gcc.target/i386/pieces-memset-17.c
new file mode 100644
index 00000000000..6cdb33557c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-17.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 17);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-18.c b/gcc/testsuite/gcc.target/i386/pieces-memset-18.c
new file mode 100644
index 00000000000..02f889899d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-18.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 18);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-19.c b/gcc/testsuite/gcc.target/i386/pieces-memset-19.c
new file mode 100644
index 00000000000..7e9cf2e26d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-19.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 64);
+}
+
+/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-2.c b/gcc/testsuite/gcc.target/i386/pieces-memset-2.c
new file mode 100644
index 00000000000..649f344e8f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 64);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-20.c b/gcc/testsuite/gcc.target/i386/pieces-memset-20.c
new file mode 100644
index 00000000000..b8747e669e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-20.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 64);
+}
+
+/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-21.c b/gcc/testsuite/gcc.target/i386/pieces-memset-21.c
new file mode 100644
index 00000000000..d87d084bf2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-21.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512vl -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 66);
+}
+
+/* { dg-final { scan-assembler-times "vpxor(?:d|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu(?:64|8)\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-22.c b/gcc/testsuite/gcc.target/i386/pieces-memset-22.c
new file mode 100644
index 00000000000..5f3c454ef8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-22.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-23.c b/gcc/testsuite/gcc.target/i386/pieces-memset-23.c
new file mode 100644
index 00000000000..a3b4ffc18e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-23.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-24.c b/gcc/testsuite/gcc.target/i386/pieces-memset-24.c
new file mode 100644
index 00000000000..5243f270f16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-24.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-25.c b/gcc/testsuite/gcc.target/i386/pieces-memset-25.c
new file mode 100644
index 00000000000..195ddb635eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-25.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 17);
+}
+
+/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-26.c b/gcc/testsuite/gcc.target/i386/pieces-memset-26.c
new file mode 100644
index 00000000000..13606b2da54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-26.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 17);
+}
+
+/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-27.c b/gcc/testsuite/gcc.target/i386/pieces-memset-27.c
new file mode 100644
index 00000000000..c764f6ffbce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-27.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 17);
+}
+
+/* { dg-final { scan-assembler-times "vpxor(?:d|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu(?:64|8|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-28.c b/gcc/testsuite/gcc.target/i386/pieces-memset-28.c
new file mode 100644
index 00000000000..83c2d3f0fde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-28.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 64);
+}
+
+/* { dg-final { scan-assembler-times "pcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-29.c b/gcc/testsuite/gcc.target/i386/pieces-memset-29.c
new file mode 100644
index 00000000000..650e6fe66a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-29.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 64);
+}
+
+/* { dg-final { scan-assembler-not "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
new file mode 100644
index 00000000000..2aed6dbc68e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512bw -mno-avx512vl -mavx512f -mtune=intel" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 66);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-30.c b/gcc/testsuite/gcc.target/i386/pieces-memset-30.c
new file mode 100644
index 00000000000..dcec2c700fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-30.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 64);
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-31.c b/gcc/testsuite/gcc.target/i386/pieces-memset-31.c
new file mode 100644
index 00000000000..f7b5d5bfe1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-31.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 66);
+}
+
+/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-32.c b/gcc/testsuite/gcc.target/i386/pieces-memset-32.c
new file mode 100644
index 00000000000..c5ca0bd17ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 33);
+}
+
+/* { dg-final { scan-assembler-times "pcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-33.c b/gcc/testsuite/gcc.target/i386/pieces-memset-33.c
new file mode 100644
index 00000000000..a87d1b80ae6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-33.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 33);
+}
+
+/* { dg-final { scan-assembler-not "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-34.c b/gcc/testsuite/gcc.target/i386/pieces-memset-34.c
new file mode 100644
index 00000000000..0c2f1ee6049
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-34.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 33);
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-35.c b/gcc/testsuite/gcc.target/i386/pieces-memset-35.c
new file mode 100644
index 00000000000..2b9a4da8dac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-35.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 34);
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-36.c b/gcc/testsuite/gcc.target/i386/pieces-memset-36.c
new file mode 100644
index 00000000000..d1f1263c7b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-36.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 17);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
new file mode 100644
index 00000000000..ec59497b116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=generic" } */
+
+void
+foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
+{
+  __builtin_memset (dst, x, 66);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-38.c b/gcc/testsuite/gcc.target/i386/pieces-memset-38.c
new file mode 100644
index 00000000000..ed4a24a54fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-38.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=sandybridge" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 33);
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
new file mode 100644
index 00000000000..0ed88b274bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bw -mtune=generic" } */
+
+void
+foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
+{
+  __builtin_memset (dst, x, 66);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* { dg-final { scan-assembler-not "vinserti64x4" } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-4.c b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
new file mode 100644
index 00000000000..9256919bfdf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 33);
+}
+
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-40.c b/gcc/testsuite/gcc.target/i386/pieces-memset-40.c
new file mode 100644
index 00000000000..4eda73ead59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-40.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=sandybridge" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 66);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-41.c b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
new file mode 100644
index 00000000000..f86b6986da9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-42.c b/gcc/testsuite/gcc.target/i386/pieces-memset-42.c
new file mode 100644
index 00000000000..df0c122aae7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-42.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-43.c b/gcc/testsuite/gcc.target/i386/pieces-memset-43.c
new file mode 100644
index 00000000000..2f2179c2df9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-43.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 33);
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-44.c b/gcc/testsuite/gcc.target/i386/pieces-memset-44.c
new file mode 100644
index 00000000000..ecc31be1a34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-44.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 48);
+}
+
+/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-5.c b/gcc/testsuite/gcc.target/i386/pieces-memset-5.c
new file mode 100644
index 00000000000..3e95db5efef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=haswell" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-6.c b/gcc/testsuite/gcc.target/i386/pieces-memset-6.c
new file mode 100644
index 00000000000..d795663e1e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=intel" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-7.c b/gcc/testsuite/gcc.target/i386/pieces-memset-7.c
new file mode 100644
index 00000000000..fd159869817
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-7.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 17);
+}
+
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-8.c b/gcc/testsuite/gcc.target/i386/pieces-memset-8.c
new file mode 100644
index 00000000000..7df0019ef63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-8.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 17);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-9.c b/gcc/testsuite/gcc.target/i386/pieces-memset-9.c
new file mode 100644
index 00000000000..1ead154fe1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-9.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 17);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
-- 
2.31.1
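
The instruction counts expected by the pieces-memset tests above are
consistent with a greedy split of the length into power-of-two pieces
capped at the widest usable vector size.  The sketch below is only an
illustration of that arithmetic: count_pieces is a hypothetical helper,
not a GCC function, and it ignores refinements such as overlapping
stores for the tail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: split LEN bytes greedily into
   power-of-two pieces no larger than MAX_PIECE, and return how many of
   those pieces are exactly WANT bytes wide.  */
static unsigned
count_pieces (size_t len, size_t max_piece, size_t want)
{
  /* MAX_PIECE must be a non-zero power of two.  */
  assert (max_piece != 0 && (max_piece & (max_piece - 1)) == 0);

  unsigned count = 0;
  for (size_t piece = max_piece; piece != 0; piece /= 2)
    while (len >= piece)
      {
        if (piece == want)
          count++;
        len -= piece;
      }
  return count;
}
```

With a 16-byte cap, 33 bytes split into two 16-byte pieces plus one
byte, which lines up with the two XMM vmovdqu stores expected by
pieces-memset-43.c.  With a 32-byte cap, 48 bytes split into one
32-byte and one 16-byte piece, as in pieces-memset-44.c.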


* [PATCH v6 06/10] x86: Also pass -mno-avx to pr72839.c
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (4 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 05/10] x86: Add tests for piecewise move and store H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 07/10] x86: Also pass -mno-avx to cold-attribute-1.c H.J. Lu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

	* gcc.target/i386/pr72839.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/pr72839.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c b/gcc/testsuite/gcc.target/i386/pr72839.c
index ea724f70377..6888d9d0a55 100644
--- a/gcc/testsuite/gcc.target/i386/pr72839.c
+++ b/gcc/testsuite/gcc.target/i386/pr72839.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-O2 -mtune=lakemont" } */
+/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */
 
 extern char *strcpy (char *, const char *);
 
-- 
2.31.1


* [PATCH v6 07/10] x86: Also pass -mno-avx to cold-attribute-1.c
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (5 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 06/10] x86: Also pass -mno-avx to pr72839.c H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 08/10] x86: Also pass -mno-avx to sw-1.c for ia32 H.J. Lu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM
or ZMM registers.

	* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/cold-attribute-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
index 57666ac60b6..658eb3e25bb 100644
--- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
+++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-avx" } */
 #include <string.h>
 static inline
 __attribute__ ((cold)) void
-- 
2.31.1


* [PATCH v6 08/10] x86: Also pass -mno-avx to sw-1.c for ia32
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (6 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 07/10] x86: Also pass -mno-avx to cold-attribute-1.c H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 09/10] x86: Update gcc.target/i386/incoming-11.c H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 10/10] x86: Also pass -mno-sse to vect8-ret.c H.J. Lu
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Also pass -mno-avx to sw-1.c for ia32, since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
the stack.

	* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
---
 gcc/testsuite/gcc.target/i386/sw-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c b/gcc/testsuite/gcc.target/i386/sw-1.c
index aec095eda62..a9c89fca4ec 100644
--- a/gcc/testsuite/gcc.target/i386/sw-1.c
+++ b/gcc/testsuite/gcc.target/i386/sw-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */
+/* { dg-additional-options "-mno-avx" { target ia32 } } */
 /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */
 
 #include <string.h>
-- 
2.31.1


* [PATCH v6 09/10] x86: Update gcc.target/i386/incoming-11.c
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (7 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 08/10] x86: Also pass -mno-avx to sw-1.c for ia32 H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  2021-07-30 21:32 ` [PATCH v6 10/10] x86: Also pass -mno-sse to vect8-ret.c H.J. Lu
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Expect no stack realignment since we no longer realign the stack when
copying data.

	* gcc.target/i386/incoming-11.c: Expect no stack realignment.
---
 gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
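
For reference, the "andl $-16, %esp" that the test pattern looks for
rounds the stack pointer down to a 16-byte boundary.  A minimal sketch
of that arithmetic follows; realign_down is a hypothetical name used
only for illustration and does not appear in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: round SP down to a multiple of ALIGN, which must
   be a non-zero power of two.  For ALIGN == 16 the mask is 0xfffffff0,
   the same bit pattern as the -16 immediate in "andl $-16, %esp".  */
static uint32_t
realign_down (uint32_t sp, uint32_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return sp & ~(align - 1u);
}
```

A value that is already 16-byte aligned is left unchanged, so the
instruction is only useful when the incoming alignment cannot be
assumed.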

diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c b/gcc/testsuite/gcc.target/i386/incoming-11.c
index a830c96f7d1..4b822684b88 100644
--- a/gcc/testsuite/gcc.target/i386/incoming-11.c
+++ b/gcc/testsuite/gcc.target/i386/incoming-11.c
@@ -15,4 +15,4 @@ void f()
 	for (i = 0; i < 100; i++) q[i] = 1;
 }
 
-/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
+/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
-- 
2.31.1


* [PATCH v6 10/10] x86: Also pass -mno-sse to vect8-ret.c
  2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
                   ` (8 preceding siblings ...)
  2021-07-30 21:32 ` [PATCH v6 09/10] x86: Update gcc.target/i386/incoming-11.c H.J. Lu
@ 2021-07-30 21:32 ` H.J. Lu
  9 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 2021-07-30 21:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Also pass -mno-sse to vect8-ret.c to disable XMM load/store when running
GCC tests with "-march=x86-64 -m32".

	* gcc.target/i386/vect8-ret.c: Also pass -mno-sse.
---
 gcc/testsuite/gcc.target/i386/vect8-ret.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect8-ret.c b/gcc/testsuite/gcc.target/i386/vect8-ret.c
index 2b2b81ecf7a..6ace07e6e0c 100644
--- a/gcc/testsuite/gcc.target/i386/vect8-ret.c
+++ b/gcc/testsuite/gcc.target/i386/vect8-ret.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ia32 && { ! *-*-vxworks* } } } } */
-/* { dg-options "-mmmx -mvect8-ret-in-mem" } */
+/* { dg-options "-mmmx -mno-sse -mvect8-ret-in-mem" } */
 
 #include <mmintrin.h>
 
-- 
2.31.1


* Re: [PATCH v6 03/10] x86: Update piecewise move and store
  2021-07-30 21:32 ` [PATCH v6 03/10] x86: Update piecewise move and store H.J. Lu
@ 2021-08-02 11:20   ` Uros Bizjak
  2021-08-02 14:56     ` [PATCH v7 " H.J. Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Uros Bizjak @ 2021-08-02 11:20 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, liuhongt

On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> We can use TImode/OImode/XImode integers for piecewise move and store.
>
> 1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
> bytes that a single instruction can move quickly between memory and
> registers or between two memory locations.
> 2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
> bytes we can move from memory to memory in one reasonably fast instruction.
> The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
> must be a constant, independent of compiler options, since it is used in
> reload.h to define struct target_reload and MOVE_MAX can vary, depending
> on compiler options.
> 3. When vector register is used for piecewise move and store, we don't
> increase stack_alignment_needed since vector register spill isn't
> required for piecewise move and store.  Since stack_realign_needed is
> set to true by checking stack_alignment_estimated set by pseudo vector
> register usage, we also need to check stack_realign_needed to eliminate
> frame pointer.
>
> gcc/
>
>         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
>         check stack_realign_needed for stack realignment.
>         (ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
>         than the largest integer supported by vector register.
>         * config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
>         (MOVE_MAX_PIECES): Set to bytes of the largest integer supported
>         by vector register.
>         (MOVE_MAX): Defined to MOVE_MAX_PIECES.
>         (STORE_MAX_PIECES): New.
>
> gcc/testsuite/
>
>         * gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
>         * gcc.target/i386/pr90773-4.c: Also run for 32-bit.
>         * gcc.target/i386/pr90773-15.c: Likewise.
>         * gcc.target/i386/pr90773-16.c: Likewise.
>         * gcc.target/i386/pr90773-17.c: Likewise.
>         * gcc.target/i386/pr90773-24.c: Likewise.
>         * gcc.target/i386/pr90773-25.c: Likewise.
>         * gcc.target/i386/pr100865-1.c: Likewise.
>         * gcc.target/i386/pr100865-2.c: Likewise.
>         * gcc.target/i386/pr100865-3.c: Likewise.
>         * gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
>         XMM movd to store 4 bytes.
>         * gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
>         YMM registers.
>         * gcc.target/i386/pr100865-4b.c: Likewise.
>         * gcc.target/i386/pr100865-10a.c: Expect YMM registers.
>         * gcc.target/i386/pr100865-10b.c: Likewise.
> ---
>  gcc/config/i386/i386.c                       | 21 ++++++++--
>  gcc/config/i386/i386.h                       | 40 ++++++++++++++++----
>  gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
>  gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
>  gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
>  gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 ++--
>  gcc/testsuite/gcc.target/i386/pr90773-1.c    | 10 ++---
>  gcc/testsuite/gcc.target/i386/pr90773-14.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
>  gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-4.c    |  2 +-
>  17 files changed, 76 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 5d20ca2067f..842eb0e6786 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
>       assumed stack realignment might be needed or -fno-omit-frame-pointer
>       is used, but in the end nothing that needed the stack alignment had
>       been spilled nor stack access, clear frame_pointer_needed and say we
> -     don't need stack realignment.  */
> -  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
> +     don't need stack realignment.
> +
> +     When vector register is used for piecewise move and store, we don't
> +     increase stack_alignment_needed as there is no register spill for
> +     piecewise move and store.  Since stack_realign_needed is set to true
> +     by checking stack_alignment_estimated which is updated by pseudo
> +     vector register usage, we also need to check stack_realign_needed to
> +     eliminate frame pointer.  */
> +  if ((stack_realign
> +       || (!flag_omit_frame_pointer && optimize)
> +       || crtl->stack_realign_needed)
>        && frame_pointer_needed
>        && crtl->is_leaf
>        && crtl->sp_is_unchanging
> @@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
>           /* FALLTHRU */
>         case E_OImode:
>         case E_XImode:
> -         if (!standard_sse_constant_p (x, mode))
> +         if (!standard_sse_constant_p (x, mode)
> +             && GET_MODE_SIZE (TARGET_AVX512F
> +                               ? XImode
> +                               : (TARGET_AVX
> +                                  ? OImode
> +                                  : (TARGET_SSE2
> +                                     ? TImode : DImode))) < GET_MODE_SIZE (mode))
>             return false;
>         default:
>           break;
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index d1e1c225990..50418a0cc9b 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1757,9 +1757,10 @@ typedef struct ix86_args {
>  /* Define this as 1 if `char' should by default be signed; else as 0.  */
>  #define DEFAULT_SIGNED_CHAR 1
>
> -/* Max number of bytes we can move from memory to memory
> -   in one reasonably fast instruction.  */
> -#define MOVE_MAX 16
> +/* The constant maximum number of bytes that a single instruction can
> +   move quickly between memory and registers or between two memory
> +   locations.  */
> +#define MAX_MOVE_MAX 64
>
>  /* MOVE_MAX_PIECES is the number of bytes at a time which we can
>     move efficiently, as opposed to  MOVE_MAX which is the maximum

The comment here is now totally wrong.

> @@ -1770,11 +1771,34 @@ typedef struct ix86_args {
>     widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
>     64-bit mode.  */
>  #define MOVE_MAX_PIECES \
> -  ((TARGET_64BIT \
> -    && TARGET_SSE2 \
> -    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> -    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> -   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
> +  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> +   ? 64 \
> +   : ((TARGET_AVX \
> +       && !TARGET_PREFER_AVX128 \
> +       && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
> +       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> +      ? 32 \
> +      : ((TARGET_SSE2 \
> +         && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> +         && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> +        ? 16 : UNITS_PER_WORD)))
> +
> +/* Max number of bytes we can move from memory to memory in one
> +   reasonably fast instruction.  */
> +#define MOVE_MAX MOVE_MAX_PIECES

Isn't this a bit backward now? Rather than defining MOVE_MAX in terms of
MOVE_MAX_PIECES, we should define MOVE_MAX directly and leave
MOVE_MAX_PIECES to its default; defaults.h has:

defaults.h:#ifndef MOVE_MAX_PIECES
defaults.h:#define MOVE_MAX_PIECES   MOVE_MAX

Uros.

> +
> +/* STORE_MAX_PIECES is the number of bytes at a time that we can
> +   store efficiently.  */
> +#define STORE_MAX_PIECES \
> +  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> +   ? 64 \
> +   : ((TARGET_AVX \
> +       && !TARGET_PREFER_AVX128 \
> +       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> +      ? 32 \
> +      : ((TARGET_SSE2 \
> +         && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> +        ? 16 : UNITS_PER_WORD)))
>
>  /* If a memory-to-memory move would take MOVE_RATIO or more simple
>     move-instruction pairs, we will do a cpymem or libcall instead.
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
> index 6c3097fb2a6..949dd5c337a 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=x86-64" } */
>
>  extern char *dst;
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> index 7ffc19e56a8..98b6dfb16f3 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> @@ -29,5 +29,5 @@ foo (void)
>      array[i] = MK_CONST128_BROADCAST (0x1f);
>  }
>
> -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> index edf52765c60..e5616d8d258 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> @@ -3,5 +3,5 @@
>
>  #include "pr100865-10a.c"
>
> -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c
> index 17efe2d72a3..f3ea7753abe 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake" } */
>
>  extern char *dst;
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c
> index 007e79f91b0..714c43e12c9 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-3.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake-avx512" } */
>
>  extern char *dst;
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> index f55883598f9..365487337ae 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake" } */
>
>  extern char array[64];
> @@ -11,6 +11,6 @@ foo (void)
>      array[i] = -45;
>  }
>
> -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */
>  /* { dg-final { scan-assembler-not "vmovdqa" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> index 1e50dc842bc..8e8a7eaaaff 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> @@ -1,9 +1,9 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake-avx512" } */
>
>  #include "pr100865-4a.c"
>
> -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, " 4 } } */
> -/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */
> +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
>  /* { dg-final { scan-assembler-not "vmovdqa" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> index 1d9f282dc0d..4fd5a40d99d 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mtune=generic" } */
> +/* { dg-options "-O2 -msse2 -mtune=generic" } */
>
>  extern char *dst, *src;
>
> @@ -9,9 +9,5 @@ foo (void)
>    __builtin_memcpy (dst, src, 15);
>  }
>
> -/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> -/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> -/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> -/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> -/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> -/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */
> +/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> index e5c19f49cf5..96ee5cb08c1 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
>
>  extern char *dst;
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
> index 185ea60e1d2..403cdb248a2 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-15.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake-avx512" } */
>
>  extern char *dst;
> @@ -9,6 +9,6 @@ foo (int c)
>    __builtin_memset (dst, c, 17);
>  }
>
> -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
> -/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
> index d820cc318c3..bb0aadbc77e 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-16.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake-avx512" } */
>
>  extern char *dst;
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
> index f6f179e9b5b..73d5d5abaee 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-17.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=skylake-avx512" } */
>
>  extern char *dst;
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
> index 7b2ea66dcfc..71f1fd8c4df 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-24.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=x86-64" } */
>
>  struct S
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
> index 57642ea8d2d..ad19a88c883 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-25.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -march=x86-64" } */
>
>  struct S
> diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> index ec0bc0100ae..ee4c04678d1 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90773-4.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
>
>  extern char *dst;
> --
> 2.31.1
>
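
The distinction between a constant MAX_MOVE_MAX and an option-dependent
MOVE_MAX discussed above can be sketched in plain C.  Everything below
is an assumption made for illustration: struct tune_flags,
sketch_move_max, SKETCH_MAX_MOVE_MAX and struct sketch_table are
hypothetical stand-ins, not GCC definitions, and the cascade simply
mirrors the MOVE_MAX_PIECES conditions quoted in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the x86 target flags named in the patch.  */
struct tune_flags
{
  bool avx512f, prefer_avx256;
  bool avx, prefer_avx128;
  bool avx256_split_unaligned_load, avx256_split_unaligned_store;
  bool sse2, sse_unaligned_load_optimal, sse_unaligned_store_optimal;
  int units_per_word;		/* 4 for ia32, 8 for x86-64.  */
};

/* Compile-time upper bound, in the role of MAX_MOVE_MAX.  Because it
   never varies, it can be used as an array size.  */
#define SKETCH_MAX_MOVE_MAX 64

/* Option-dependent value, mirroring the cascade in the patch.  */
static int
sketch_move_max (const struct tune_flags *f)
{
  if (f->avx512f && !f->prefer_avx256)
    return 64;
  if (f->avx && !f->prefer_avx128
      && !f->avx256_split_unaligned_load
      && !f->avx256_split_unaligned_store)
    return 32;
  if (f->sse2 && f->sse_unaligned_load_optimal
      && f->sse_unaligned_store_optimal)
    return 16;
  return f->units_per_word;
}

/* A table sized by the constant bound can be indexed by any value the
   cascade returns, whatever the options are.  */
struct sketch_table
{
  char seen[SKETCH_MAX_MOVE_MAX + 1];
};

static void
sketch_mark (struct sketch_table *t, const struct tune_flags *f)
{
  int n = sketch_move_max (f);
  assert (n <= SKETCH_MAX_MOVE_MAX);
  t->seen[n] = 1;
}
```

Under the review suggestion, a cascade of this shape would define
MOVE_MAX itself, and MOVE_MAX_PIECES would fall back to it through the
defaults.h definition quoted in the review.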

* [PATCH v7 03/10] x86: Update piecewise move and store
  2021-08-02 11:20   ` Uros Bizjak
@ 2021-08-02 14:56     ` H.J. Lu
  2021-08-02 15:53       ` Uros Bizjak
  0 siblings, 1 reply; 14+ messages in thread
From: H.J. Lu @ 2021-08-02 14:56 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, liuhongt

[-- Attachment #1: Type: text/plain, Size: 18829 bytes --]

On Mon, Aug 2, 2021 at 4:20 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > We can use TImode/OImode/XImode integers for piecewise move and store.
> >
> > 1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
> > bytes that a single instruction can move quickly between memory and
> > registers or between two memory locations.
> > 2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
> > bytes we can move from memory to memory in one reasonably fast instruction.
> > The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
> > must be a constant, independent of compiler options, since it is used in
> > reload.h to define struct target_reload and MOVE_MAX can vary, depending
> > on compiler options.
> > 3. When vector register is used for piecewise move and store, we don't
> > increase stack_alignment_needed since vector register spill isn't
> > required for piecewise move and store.  Since stack_realign_needed is
> > set to true by checking stack_alignment_estimated set by pseudo vector
> > register usage, we also need to check stack_realign_needed to eliminate
> > frame pointer.
> >
> > gcc/
> >
> >         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
> >         check stack_realign_needed for stack realignment.
> >         (ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
> >         than the largest integer supported by vector register.
> >         * config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
> >         (MOVE_MAX_PIECES): Set to bytes of the largest integer supported
> >         by vector register.
> >         (MOVE_MAX): Defined to MOVE_MAX_PIECES.
> >         (STORE_MAX_PIECES): New.
> >
> > gcc/testsuite/
> >
> >         * gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
> >         * gcc.target/i386/pr90773-4.c: Also run for 32-bit.
> >         * gcc.target/i386/pr90773-15.c: Likewise.
> >         * gcc.target/i386/pr90773-16.c: Likewise.
> >         * gcc.target/i386/pr90773-17.c: Likewise.
> >         * gcc.target/i386/pr90773-24.c: Likewise.
> >         * gcc.target/i386/pr90773-25.c: Likewise.
> >         * gcc.target/i386/pr100865-1.c: Likewise.
> >         * gcc.target/i386/pr100865-2.c: Likewise.
> >         * gcc.target/i386/pr100865-3.c: Likewise.
> >         * gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
> >         XMM movd to store 4 bytes.
> >         * gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
> >         YMM registers.
> >         * gcc.target/i386/pr100865-4b.c: Likewise.
> >         * gcc.target/i386/pr100865-10a.c: Expect YMM registers.
> >         * gcc.target/i386/pr100865-10b.c: Likewise.
> > ---
> >  gcc/config/i386/i386.c                       | 21 ++++++++--
> >  gcc/config/i386/i386.h                       | 40 ++++++++++++++++----
> >  gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
> >  gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 ++--
> >  gcc/testsuite/gcc.target/i386/pr90773-1.c    | 10 ++---
> >  gcc/testsuite/gcc.target/i386/pr90773-14.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
> >  gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr90773-4.c    |  2 +-
> >  17 files changed, 76 insertions(+), 41 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 5d20ca2067f..842eb0e6786 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
> >       assumed stack realignment might be needed or -fno-omit-frame-pointer
> >       is used, but in the end nothing that needed the stack alignment had
> >       been spilled nor stack access, clear frame_pointer_needed and say we
> > -     don't need stack realignment.  */
> > -  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
> > +     don't need stack realignment.
> > +
> > +     When vector register is used for piecewise move and store, we don't
> > +     increase stack_alignment_needed as there is no register spill for
> > +     piecewise move and store.  Since stack_realign_needed is set to true
> > +     by checking stack_alignment_estimated which is updated by pseudo
> > +     vector register usage, we also need to check stack_realign_needed to
> > +     eliminate frame pointer.  */
> > +  if ((stack_realign
> > +       || (!flag_omit_frame_pointer && optimize)
> > +       || crtl->stack_realign_needed)
> >        && frame_pointer_needed
> >        && crtl->is_leaf
> >        && crtl->sp_is_unchanging
> > @@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
> >           /* FALLTHRU */
> >         case E_OImode:
> >         case E_XImode:
> > -         if (!standard_sse_constant_p (x, mode))
> > +         if (!standard_sse_constant_p (x, mode)
> > +             && GET_MODE_SIZE (TARGET_AVX512F
> > +                               ? XImode
> > +                               : (TARGET_AVX
> > +                                  ? OImode
> > +                                  : (TARGET_SSE2
> > +                                     ? TImode : DImode))) < GET_MODE_SIZE (mode))
> >             return false;
> >         default:
> >           break;
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index d1e1c225990..50418a0cc9b 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -1757,9 +1757,10 @@ typedef struct ix86_args {
> >  /* Define this as 1 if `char' should by default be signed; else as 0.  */
> >  #define DEFAULT_SIGNED_CHAR 1
> >
> > -/* Max number of bytes we can move from memory to memory
> > -   in one reasonably fast instruction.  */
> > -#define MOVE_MAX 16
> > +/* The constant maximum number of bytes that a single instruction can
> > +   move quickly between memory and registers or between two memory
> > +   locations.  */
> > +#define MAX_MOVE_MAX 64
> >
> >  /* MOVE_MAX_PIECES is the number of bytes at a time which we can
> >     move efficiently, as opposed to  MOVE_MAX which is the maximum
>
> The comment here is now totally wrong.

Fixed.

> > @@ -1770,11 +1771,34 @@ typedef struct ix86_args {
> >     widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
> >     64-bit mode.  */
> >  #define MOVE_MAX_PIECES \
> > -  ((TARGET_64BIT \
> > -    && TARGET_SSE2 \
> > -    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> > -    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> > -   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
> > +  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> > +   ? 64 \
> > +   : ((TARGET_AVX \
> > +       && !TARGET_PREFER_AVX128 \
> > +       && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
> > +       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> > +      ? 32 \
> > +      : ((TARGET_SSE2 \
> > +         && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> > +         && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> > +        ? 16 : UNITS_PER_WORD)))
> > +
> > +/* Max number of bytes we can move from memory to memory in one
> > +   reasonably fast instruction.  */
> > +#define MOVE_MAX MOVE_MAX_PIECES
>
> Isn't this a bit backward now? Instead of the above define, we should
> define MOVE_MAX instead of MOVE_MAX_PIECES, defaults.h has:

Here is the v7 patch, which changes this to:

/* Max number of bytes we can move from memory to memory in one
   reasonably fast instruction, as opposed to MOVE_MAX_PIECES which
   is the number of bytes at a time which we can move efficiently.
   MOVE_MAX_PIECES defaults to MOVE_MAX.  */

#define MOVE_MAX \
  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
   ? 64 \
   : ((TARGET_AVX \
       && !TARGET_PREFER_AVX128 \
       && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
      ? 32 \
      : ((TARGET_SSE2 \
          && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
          && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
         ? 16 : UNITS_PER_WORD)))

OK for master?  Thanks.
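
To make the selection order in the macro above concrete, here is a
minimal sketch in plain C.  The struct tuning fields and the two function
names are hypothetical stand-ins for the TARGET_* macros (they are not
GCC interfaces); only the order of the tests follows the macro.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TARGET_* tuning macros.  In GCC these
   are derived from -march/-mtune, not read from a struct.  */
struct tuning
{
  bool avx512f, prefer_avx256;
  bool avx, prefer_avx128;
  bool avx256_split_unaligned_load, avx256_split_unaligned_store;
  bool sse2, sse_unaligned_load_optimal, sse_unaligned_store_optimal;
  int units_per_word;	/* 8 in 64-bit mode, 4 in 32-bit mode.  */
};

/* Same cascade as the MOVE_MAX macro: try the widest vector width
   first and fall back to the word size.  */
static int
move_max (const struct tuning *t)
{
  if (t->avx512f && !t->prefer_avx256)
    return 64;
  if (t->avx
      && !t->prefer_avx128
      && !t->avx256_split_unaligned_load
      && !t->avx256_split_unaligned_store)
    return 32;
  if (t->sse2
      && t->sse_unaligned_load_optimal
      && t->sse_unaligned_store_optimal)
    return 16;
  return t->units_per_word;
}

/* As with the defaults.h fallback, MOVE_MAX_PIECES simply follows
   MOVE_MAX when the target does not define it.  */
static int
move_max_pieces (const struct tuning *t)
{
  return move_max (t);
}
```

With only the three SSE2 fields set, move_max returns 16; additionally
setting avx gives 32, unless one of the split-unaligned fields is also
set, in which case it falls back to 16.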

> defaults.h:#ifndef MOVE_MAX_PIECES
> defaults.h:#define MOVE_MAX_PIECES   MOVE_MAX
>
> Uros.
>
> > +
> > +/* STORE_MAX_PIECES is the number of bytes at a time that we can
> > +   store efficiently.  */
> > +#define STORE_MAX_PIECES \
> > +  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> > +   ? 64 \
> > +   : ((TARGET_AVX \
> > +       && !TARGET_PREFER_AVX128 \
> > +       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> > +      ? 32 \
> > +      : ((TARGET_SSE2 \
> > +         && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> > +        ? 16 : UNITS_PER_WORD)))
> >
> >  /* If a memory-to-memory move would take MOVE_RATIO or more simple
> >     move-instruction pairs, we will do a cpymem or libcall instead.
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
> > index 6c3097fb2a6..949dd5c337a 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=x86-64" } */
> >
> >  extern char *dst;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> > index 7ffc19e56a8..98b6dfb16f3 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> > @@ -29,5 +29,5 @@ foo (void)
> >      array[i] = MK_CONST128_BROADCAST (0x1f);
> >  }
> >
> > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> > -/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> > index edf52765c60..e5616d8d258 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> > @@ -3,5 +3,5 @@
> >
> >  #include "pr100865-10a.c"
> >
> > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> > -/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c
> > index 17efe2d72a3..f3ea7753abe 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-2.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake" } */
> >
> >  extern char *dst;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c
> > index 007e79f91b0..714c43e12c9 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-3.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake-avx512" } */
> >
> >  extern char *dst;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> > index f55883598f9..365487337ae 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake" } */
> >
> >  extern char array[64];
> > @@ -11,6 +11,6 @@ foo (void)
> >      array[i] = -45;
> >  }
> >
> > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
> > -/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
> > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */
> >  /* { dg-final { scan-assembler-not "vmovdqa" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> > index 1e50dc842bc..8e8a7eaaaff 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> > @@ -1,9 +1,9 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake-avx512" } */
> >
> >  #include "pr100865-4a.c"
> >
> > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> > -/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, " 4 } } */
> > -/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
> > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */
> > +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
> >  /* { dg-final { scan-assembler-not "vmovdqa" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> > index 1d9f282dc0d..4fd5a40d99d 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mtune=generic" } */
> > +/* { dg-options "-O2 -msse2 -mtune=generic" } */
> >
> >  extern char *dst, *src;
> >
> > @@ -9,9 +9,5 @@ foo (void)
> >    __builtin_memcpy (dst, src, 15);
> >  }
> >
> > -/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> > -/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> > -/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > -/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > -/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > -/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */
> > +/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> > index e5c19f49cf5..96ee5cb08c1 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> >
> >  extern char *dst;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
> > index 185ea60e1d2..403cdb248a2 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-15.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake-avx512" } */
> >
> >  extern char *dst;
> > @@ -9,6 +9,6 @@ foo (int c)
> >    __builtin_memset (dst, c, 17);
> >  }
> >
> > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */
> >  /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
> > -/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
> > +/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
> > index d820cc318c3..bb0aadbc77e 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-16.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake-avx512" } */
> >
> >  extern char *dst;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
> > index f6f179e9b5b..73d5d5abaee 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-17.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=skylake-avx512" } */
> >
> >  extern char *dst;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
> > index 7b2ea66dcfc..71f1fd8c4df 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-24.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=x86-64" } */
> >
> >  struct S
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
> > index 57642ea8d2d..ad19a88c883 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-25.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -march=x86-64" } */
> >
> >  struct S
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> > index ec0bc0100ae..ee4c04678d1 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90773-4.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-do compile } */
> >  /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> >
> >  extern char *dst;
> > --
> > 2.31.1
> >



-- 
H.J.

[-- Attachment #2: v7-0001-x86-Update-piecewise-move-and-store.patch --]
[-- Type: text/x-patch, Size: 16367 bytes --]

From ea40e16bfc6c2eca5f861d802360e9d015f3630c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 5 Mar 2016 07:17:09 -0800
Subject: [PATCH v7 03/10] x86: Update piecewise move and store

We can use TImode/OImode/XImode integers for piecewise move and store.

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to the maximum number of bytes we can move from memory
to memory in one reasonably fast instruction.  The difference between
MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant,
independent of compiler options, since it is used in reload.h to define
struct target_reload, whereas MOVE_MAX can vary with compiler options.
3. When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed, since piecewise move and store don't
require a vector register spill.  Since stack_realign_needed is set to
true by checking stack_alignment_estimated, which is set by pseudo vector
register usage, we also need to check stack_realign_needed to eliminate
the frame pointer.

gcc/

	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
	check stack_realign_needed for stack realignment.
	(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
	than the largest integer supported by vector register.
	* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
	(MOVE_MAX): Set to bytes of the largest integer supported by
	vector register.
	(STORE_MAX_PIECES): New.

gcc/testsuite/

	* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
	* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
	* gcc.target/i386/pr90773-15.c: Likewise.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-24.c: Likewise.
	* gcc.target/i386/pr90773-25.c: Likewise.
	* gcc.target/i386/pr100865-1.c: Likewise.
	* gcc.target/i386/pr100865-2.c: Likewise.
	* gcc.target/i386/pr100865-3.c: Likewise.
	* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
	XMM movd to store 4 bytes.
	* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
	YMM registers.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
	* gcc.target/i386/pr100865-10b.c: Likewise.

Fix x86: Update piecewise move and store

MOVE_MAX_PIECES -> MOVE_MAX.
---
 gcc/config/i386/i386.c                       | 21 ++++++--
 gcc/config/i386/i386.h                       | 53 +++++++++++++-------
 gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
 gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 +--
 gcc/testsuite/gcc.target/i386/pr90773-1.c    | 10 ++--
 gcc/testsuite/gcc.target/i386/pr90773-14.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
 gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c    |  2 +-
 17 files changed, 79 insertions(+), 51 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5d20ca2067f..842eb0e6786 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
      assumed stack realignment might be needed or -fno-omit-frame-pointer
      is used, but in the end nothing that needed the stack alignment had
      been spilled nor stack access, clear frame_pointer_needed and say we
-     don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+     don't need stack realignment.
+
+     When vector register is used for piecewise move and store, we don't
+     increase stack_alignment_needed as there is no register spill for
+     piecewise move and store.  Since stack_realign_needed is set to true
+     by checking stack_alignment_estimated which is updated by pseudo
+     vector register usage, we also need to check stack_realign_needed to
+     eliminate frame pointer.  */
+  if ((stack_realign
+       || (!flag_omit_frame_pointer && optimize)
+       || crtl->stack_realign_needed)
       && frame_pointer_needed
       && crtl->is_leaf
       && crtl->sp_is_unchanging
@@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
 	  /* FALLTHRU */
 	case E_OImode:
 	case E_XImode:
-	  if (!standard_sse_constant_p (x, mode))
+	  if (!standard_sse_constant_p (x, mode)
+	      && GET_MODE_SIZE (TARGET_AVX512F
+				? XImode
+				: (TARGET_AVX
+				   ? OImode
+				   : (TARGET_SSE2
+				      ? TImode : DImode))) < GET_MODE_SIZE (mode))
 	    return false;
 	default:
 	  break;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index d1e1c225990..bed9cd9da18 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1757,24 +1757,41 @@ typedef struct ix86_args {
 /* Define this as 1 if `char' should by default be signed; else as 0.  */
 #define DEFAULT_SIGNED_CHAR 1
 
-/* Max number of bytes we can move from memory to memory
-   in one reasonably fast instruction.  */
-#define MOVE_MAX 16
-
-/* MOVE_MAX_PIECES is the number of bytes at a time which we can
-   move efficiently, as opposed to  MOVE_MAX which is the maximum
-   number of bytes we can move with a single instruction.
-
-   ??? We should use TImode in 32-bit mode and use OImode or XImode
-   if they are available.  But since by_pieces_ninsns determines the
-   widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
-   64-bit mode.  */
-#define MOVE_MAX_PIECES \
-  ((TARGET_64BIT \
-    && TARGET_SSE2 \
-    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
-    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
+/* The constant maximum number of bytes that a single instruction can
+   move quickly between memory and registers or between two memory
+   locations.  */
+#define MAX_MOVE_MAX 64
+
+/* Max number of bytes we can move from memory to memory in one
+   reasonably fast instruction, as opposed to MOVE_MAX_PIECES which
+   is the number of bytes at a time which we can move efficiently.
+   MOVE_MAX_PIECES defaults to MOVE_MAX.  */
+
+#define MOVE_MAX \
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+       && !TARGET_PREFER_AVX128 \
+       && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
+       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+      ? 32 \
+      : ((TARGET_SSE2 \
+	  && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+	  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+	 ? 16 : UNITS_PER_WORD)))
+
+/* STORE_MAX_PIECES is the number of bytes at a time that we can
+   store efficiently.  */
+#define STORE_MAX_PIECES \
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+       && !TARGET_PREFER_AVX128 \
+       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+      ? 32 \
+      : ((TARGET_SSE2 \
+	  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+	 ? 16 : UNITS_PER_WORD)))
 
 /* If a memory-to-memory move would take MOVE_RATIO or more simple
    move-instruction pairs, we will do a cpymem or libcall instead.
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
index 6c3097fb2a6..949dd5c337a 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
index 7ffc19e56a8..98b6dfb16f3 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-10a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
@@ -29,5 +29,5 @@ foo (void)
     array[i] = MK_CONST128_BROADCAST (0x1f);
 }
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
index edf52765c60..e5616d8d258 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-10b.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
@@ -3,5 +3,5 @@
 
 #include "pr100865-10a.c"
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c
index 17efe2d72a3..f3ea7753abe 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c
index 007e79f91b0..714c43e12c9 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
index f55883598f9..365487337ae 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-4a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake" } */
 
 extern char array[64];
@@ -11,6 +11,6 @@ foo (void)
     array[i] = -45;
 }
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */
 /* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
index 1e50dc842bc..8e8a7eaaaff 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-4b.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
@@ -1,9 +1,9 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 #include "pr100865-4a.c"
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, " 4 } } */
-/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
 /* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
index 1d9f282dc0d..4fd5a40d99d 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=generic" } */
+/* { dg-options "-O2 -msse2 -mtune=generic" } */
 
 extern char *dst, *src;
 
@@ -9,9 +9,5 @@ foo (void)
   __builtin_memcpy (dst, src, 15);
 }
 
-/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */
+/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
index e5c19f49cf5..96ee5cb08c1 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
index 185ea60e1d2..403cdb248a2 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-15.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
@@ -9,6 +9,6 @@ foo (int c)
   __builtin_memset (dst, c, 17);
 }
 
-/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
-/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
index d820cc318c3..bb0aadbc77e 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-16.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
index f6f179e9b5b..73d5d5abaee 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-17.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=skylake-avx512" } */
 
 extern char *dst;
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
index 7b2ea66dcfc..71f1fd8c4df 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-24.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
 
 struct S
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
index 57642ea8d2d..ad19a88c883 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-25.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
 
 struct S
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
index ec0bc0100ae..ee4c04678d1 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
 
 extern char *dst;
-- 
2.31.1



* Re: [PATCH v7 03/10] x86: Update piecewise move and store
  2021-08-02 14:56     ` [PATCH v7 " H.J. Lu
@ 2021-08-02 15:53       ` Uros Bizjak
  0 siblings, 0 replies; 14+ messages in thread
From: Uros Bizjak @ 2021-08-02 15:53 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, liuhongt

On Mon, Aug 2, 2021 at 4:57 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Aug 2, 2021 at 4:20 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > We can use TImode/OImode/XImode integers for piecewise move and store.
> > >
> > > 1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
> > > bytes that a single instruction can move quickly between memory and
> > > registers or between two memory locations.
> > > 2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
> > > bytes we can move from memory to memory in one reasonably fast instruction.
> > > The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
> > > must be a constant, independent of compiler options, since it is used in
> > > reload.h to define struct target_reload and MOVE_MAX can vary, depending
> > > on compiler options.
> > > 3. When vector register is used for piecewise move and store, we don't
> > > increase stack_alignment_needed since vector register spill isn't
> > > required for piecewise move and store.  Since stack_realign_needed is
> > > set to true by checking stack_alignment_estimated set by pseudo vector
> > > register usage, we also need to check stack_realign_needed to eliminate
> > > frame pointer.
> > >
> > > gcc/
> > >
> > >         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
> > >         check stack_realign_needed for stack realignment.
> > >         (ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
> > >         than the largest integer supported by vector register.
> > >         * config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
> > >         (MOVE_MAX_PIECES): Set to bytes of the largest integer supported
> > >         by vector register.
> > >         (MOVE_MAX): Defined to MOVE_MAX_PIECES.
> > >         (STORE_MAX_PIECES): New.
> > >
> > > gcc/testsuite/
> > >
> > >         * gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
> > >         * gcc.target/i386/pr90773-4.c: Also run for 32-bit.
> > >         * gcc.target/i386/pr90773-15.c: Likewise.
> > >         * gcc.target/i386/pr90773-16.c: Likewise.
> > >         * gcc.target/i386/pr90773-17.c: Likewise.
> > >         * gcc.target/i386/pr90773-24.c: Likewise.
> > >         * gcc.target/i386/pr90773-25.c: Likewise.
> > >         * gcc.target/i386/pr100865-1.c: Likewise.
> > >         * gcc.target/i386/pr100865-2.c: Likewise.
> > >         * gcc.target/i386/pr100865-3.c: Likewise.
> > >         * gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
> > >         XMM movd to store 4 bytes.
> > >         * gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
> > >         YMM registers.
> > >         * gcc.target/i386/pr100865-4b.c: Likewise.
> > >         * gcc.target/i386/pr100865-10a.c: Expect YMM registers.
> > >         * gcc.target/i386/pr100865-10b.c: Likewise.
> > > ---
> > >  gcc/config/i386/i386.c                       | 21 ++++++++--
> > >  gcc/config/i386/i386.h                       | 40 ++++++++++++++++----
> > >  gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
> > >  gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
> > >  gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
> > >  gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 ++--
> > >  gcc/testsuite/gcc.target/i386/pr90773-1.c    | 10 ++---
> > >  gcc/testsuite/gcc.target/i386/pr90773-14.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
> > >  gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr90773-4.c    |  2 +-
> > >  17 files changed, 76 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 5d20ca2067f..842eb0e6786 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
> > >       assumed stack realignment might be needed or -fno-omit-frame-pointer
> > >       is used, but in the end nothing that needed the stack alignment had
> > >       been spilled nor stack access, clear frame_pointer_needed and say we
> > > -     don't need stack realignment.  */
> > > -  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
> > > +     don't need stack realignment.
> > > +
> > > +     When vector register is used for piecewise move and store, we don't
> > > +     increase stack_alignment_needed as there is no register spill for
> > > +     piecewise move and store.  Since stack_realign_needed is set to true
> > > +     by checking stack_alignment_estimated which is updated by pseudo
> > > +     vector register usage, we also need to check stack_realign_needed to
> > > +     eliminate frame pointer.  */
> > > +  if ((stack_realign
> > > +       || (!flag_omit_frame_pointer && optimize)
> > > +       || crtl->stack_realign_needed)
> > >        && frame_pointer_needed
> > >        && crtl->is_leaf
> > >        && crtl->sp_is_unchanging
> > > @@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
> > >           /* FALLTHRU */
> > >         case E_OImode:
> > >         case E_XImode:
> > > -         if (!standard_sse_constant_p (x, mode))
> > > +         if (!standard_sse_constant_p (x, mode)
> > > +             && GET_MODE_SIZE (TARGET_AVX512F
> > > +                               ? XImode
> > > +                               : (TARGET_AVX
> > > +                                  ? OImode
> > > +                                  : (TARGET_SSE2
> > > +                                     ? TImode : DImode))) < GET_MODE_SIZE (mode))
> > >             return false;
> > >         default:
> > >           break;
> > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > > index d1e1c225990..50418a0cc9b 100644
> > > --- a/gcc/config/i386/i386.h
> > > +++ b/gcc/config/i386/i386.h
> > > @@ -1757,9 +1757,10 @@ typedef struct ix86_args {
> > >  /* Define this as 1 if `char' should by default be signed; else as 0.  */
> > >  #define DEFAULT_SIGNED_CHAR 1
> > >
> > > -/* Max number of bytes we can move from memory to memory
> > > -   in one reasonably fast instruction.  */
> > > -#define MOVE_MAX 16
> > > +/* The constant maximum number of bytes that a single instruction can
> > > +   move quickly between memory and registers or between two memory
> > > +   locations.  */
> > > +#define MAX_MOVE_MAX 64
> > >
> > >  /* MOVE_MAX_PIECES is the number of bytes at a time which we can
> > >     move efficiently, as opposed to  MOVE_MAX which is the maximum
> >
> > The comment here is now totally wrong.
>
> Fixed.
>
> > > @@ -1770,11 +1771,34 @@ typedef struct ix86_args {
> > >     widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
> > >     64-bit mode.  */
> > >  #define MOVE_MAX_PIECES \
> > > -  ((TARGET_64BIT \
> > > -    && TARGET_SSE2 \
> > > -    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> > > -    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> > > -   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
> > > +  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> > > +   ? 64 \
> > > +   : ((TARGET_AVX \
> > > +       && !TARGET_PREFER_AVX128 \
> > > +       && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
> > > +       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> > > +      ? 32 \
> > > +      : ((TARGET_SSE2 \
> > > +         && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> > > +         && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> > > +        ? 16 : UNITS_PER_WORD)))
> > > +
> > > +/* Max number of bytes we can move from memory to memory in one
> > > +   reasonably fast instruction.  */
> > > +#define MOVE_MAX MOVE_MAX_PIECES
> >
> > Isn't this a bit backward now? Instead of the above define, we should
> > define MOVE_MAX instead of MOVE_MAX_PIECES, defaults.h has:
>
> Here is the v7 patch which is changed to
>
> /* Max number of bytes we can move from memory to memory in one
>    reasonably fast instruction, as opposed to MOVE_MAX_PIECES which
>    is the number of bytes at a time which we can move efficiently.
>    MOVE_MAX_PIECES defaults to MOVE_MAX.  */
>
> #define MOVE_MAX \
>   ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
>    ? 64 \
>    : ((TARGET_AVX \
>        && !TARGET_PREFER_AVX128 \
>        && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
>        && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
>       ? 32 \
>       : ((TARGET_SSE2 \
>           && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
>           && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
>          ? 16 : UNITS_PER_WORD)))
>
> OK for master?  Thanks.

OK.

Thanks,
Uros.

>
> > defaults.h:#ifndef MOVE_MAX_PIECES
> > defaults.h:#define MOVE_MAX_PIECES   MOVE_MAX
> >
> > Uros.
> >
> > > +
> > > +/* STORE_MAX_PIECES is the number of bytes at a time that we can
> > > +   store efficiently.  */
> > > +#define STORE_MAX_PIECES \
> > > +  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> > > +   ? 64 \
> > > +   : ((TARGET_AVX \
> > > +       && !TARGET_PREFER_AVX128 \
> > > +       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> > > +      ? 32 \
> > > +      : ((TARGET_SSE2 \
> > > +         && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> > > +        ? 16 : UNITS_PER_WORD)))
> > >
> > >  /* If a memory-to-memory move would take MOVE_RATIO or more simple
> > >     move-instruction pairs, we will do a cpymem or libcall instead.
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
> > > index 6c3097fb2a6..949dd5c337a 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=x86-64" } */
> > >
> > >  extern char *dst;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> > > index 7ffc19e56a8..98b6dfb16f3 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
> > > @@ -29,5 +29,5 @@ foo (void)
> > >      array[i] = MK_CONST128_BROADCAST (0x1f);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> > > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> > > index edf52765c60..e5616d8d258 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
> > > @@ -3,5 +3,5 @@
> > >
> > >  #include "pr100865-10a.c"
> > >
> > > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> > > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c
> > > index 17efe2d72a3..f3ea7753abe 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-2.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake" } */
> > >
> > >  extern char *dst;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c
> > > index 007e79f91b0..714c43e12c9 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-3.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake-avx512" } */
> > >
> > >  extern char *dst;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> > > index f55883598f9..365487337ae 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake" } */
> > >
> > >  extern char array[64];
> > > @@ -11,6 +11,6 @@ foo (void)
> > >      array[i] = -45;
> > >  }
> > >
> > > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
> > > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */
> > >  /* { dg-final { scan-assembler-not "vmovdqa" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> > > index 1e50dc842bc..8e8a7eaaaff 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
> > > @@ -1,9 +1,9 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake-avx512" } */
> > >
> > >  #include "pr100865-4a.c"
> > >
> > > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, " 4 } } */
> > > -/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
> > > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */
> > > +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
> > >  /* { dg-final { scan-assembler-not "vmovdqa" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> > > index 1d9f282dc0d..4fd5a40d99d 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-1.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O2 -mtune=generic" } */
> > > +/* { dg-options "-O2 -msse2 -mtune=generic" } */
> > >
> > >  extern char *dst, *src;
> > >
> > > @@ -9,9 +9,5 @@ foo (void)
> > >    __builtin_memcpy (dst, src, 15);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> > > -/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */
> > > -/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > > -/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > > -/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > > -/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */
> > > +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */
> > > +/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> > > index e5c19f49cf5..96ee5cb08c1 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> > >
> > >  extern char *dst;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
> > > index 185ea60e1d2..403cdb248a2 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-15.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake-avx512" } */
> > >
> > >  extern char *dst;
> > > @@ -9,6 +9,6 @@ foo (int c)
> > >    __builtin_memset (dst, c, 17);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
> > > index d820cc318c3..bb0aadbc77e 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-16.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake-avx512" } */
> > >
> > >  extern char *dst;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
> > > index f6f179e9b5b..73d5d5abaee 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-17.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=skylake-avx512" } */
> > >
> > >  extern char *dst;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
> > > index 7b2ea66dcfc..71f1fd8c4df 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-24.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=x86-64" } */
> > >
> > >  struct S
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
> > > index 57642ea8d2d..ad19a88c883 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-25.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -march=x86-64" } */
> > >
> > >  struct S
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> > > index ec0bc0100ae..ee4c04678d1 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr90773-4.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c
> > > @@ -1,4 +1,4 @@
> > > -/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-do compile } */
> > >  /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> > >
> > >  extern char *dst;
> > > --
> > > 2.31.1
> > >
>
>
>
> --
> H.J.


Thread overview: 14+ messages
2021-07-30 21:32 [PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations H.J. Lu
2021-07-30 21:32 ` [PATCH v6 01/10] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX H.J. Lu
2021-07-30 21:32 ` [PATCH v6 02/10] x86: Avoid stack realignment when copying data H.J. Lu
2021-07-30 21:32 ` [PATCH v6 03/10] x86: Update piecewise move and store H.J. Lu
2021-08-02 11:20   ` Uros Bizjak
2021-08-02 14:56     ` [PATCH v7 " H.J. Lu
2021-08-02 15:53       ` Uros Bizjak
2021-07-30 21:32 ` [PATCH v6 04/10] x86: Add AVX2 tests for PR middle-end/90773 H.J. Lu
2021-07-30 21:32 ` [PATCH v6 05/10] x86: Add tests for piecewise move and store H.J. Lu
2021-07-30 21:32 ` [PATCH v6 06/10] x86: Also pass -mno-avx to pr72839.c H.J. Lu
2021-07-30 21:32 ` [PATCH v6 07/10] x86: Also pass -mno-avx to cold-attribute-1.c H.J. Lu
2021-07-30 21:32 ` [PATCH v6 08/10] x86: Also pass -mno-avx to sw-1.c for ia32 H.J. Lu
2021-07-30 21:32 ` [PATCH v6 09/10] x86: Update gcc.target/i386/incoming-11.c H.J. Lu
2021-07-30 21:32 ` [PATCH v6 10/10] x86: Also pass -mno-sse to vect8-ret.c H.J. Lu
