From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by sourceware.org (Postfix) with ESMTPS id 2FADD383D834 for ; Thu, 1 Jul 2021 15:22:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2FADD383D834 Received: by mail-pf1-x429.google.com with SMTP id 21so6209649pfp.3 for ; Thu, 01 Jul 2021 08:22:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=BPERXZ7W6OAWlEoV6aiR4cRh1N6zopGRBByiPr+8gsc=; b=JBwQ9CtLs/sZoIiZvIGmGcRcM7btbsNZKLKhjj+xOT8vhE7LGYK18FoY2+QLNGkwTd wN+NemgXkBWs6n6DQ/6CFEtuU30x0cDqZ49j5drju/vLNlH5fx89Aym/CuwLDjOGcBU1 tFs3DSrhe4ox5RyCivm94lL5beRkosTI2VEbLsgLQIszIQ5EvIa87eYPlFHwg/1Sa+uv vOZAHtI0mCdVqV6RnUMk0CyXBGTufiDD7bJwSACCMt+0BFLtcqawBUduG/yGgQ3pfnZT NMsKfRCQZTnsh104OfB0XhaS5zsx5c1yvG7uABQ2nYpvEy+9PRi2N4ILu76QZ+uE/8Qp hMCw== X-Gm-Message-State: AOAM533O4BHEgvGJv9Qt7HJKbrfuNhSrQ6fTTy8yQw71s6lQfObepkrS Z18VJZRLSotgPv1oumLiCxc= X-Google-Smtp-Source: ABdhPJywweyuwOVXMV+O4iKn7hfxXz1WRj+aDawYZTyEo7LC43C6LyUfGT432LyQwZAax3YIf/82yQ== X-Received: by 2002:a62:79c5:0:b029:312:b213:d9d with SMTP id u188-20020a6279c50000b0290312b2130d9dmr274916pfc.8.1625152967318; Thu, 01 Jul 2021 08:22:47 -0700 (PDT) Received: from gnu-cfl-2.localdomain ([172.56.39.115]) by smtp.gmail.com with ESMTPSA id w21sm195336pge.30.2021.07.01.08.22.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Jul 2021 08:22:45 -0700 (PDT) Received: from gnu-cfl-2.. (localhost [IPv6:::1]) by gnu-cfl-2.localdomain (Postfix) with ESMTP id 5A9AFC0094; Thu, 1 Jul 2021 08:22:44 -0700 (PDT) From: "H.J. Lu" To: gcc-patches@gcc.gnu.org Cc: Richard Biener , Richard Sandiford , Uros Bizjak , Bernd Edlinger Subject: [PATCH v5 00/11] Allow TImode/OImode/XImode in op_by_pieces operations Date: Thu, 1 Jul 2021 08:22:33 -0700 Message-Id: <20210701152244.331984-1-hjl.tools@gmail.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3023.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP, URIBL_BLACK autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jul 2021 15:22:50 -0000 Changes in the v5 patches: 1. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard scratch register to avoid stack realignment when expanding memset. 2. Use vec_duplicate, instead of adding TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE, to expand memset if available. Changes in the v4 patches: 1. Define x86 MAX_MOVE_MAX to 64, which is the constant maximum number of bytes that a single instruction can move quickly between memory and registers or between two memory locations. 2. Define x86 MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of bytes we can move from memory to memory in one reasonably fast instruction. The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant, independent of compiler options, since it is used in reload.h to define struct target_reload and MOVE_MAX can vary, depending on compiler options. Changes in the v3 patches: 1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes into the generic part and the x86 part. 1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support target instructions to duplicate QImode value to TImode/OImode/XImode value for memmset. 2. x86: Avoid stack realignment when copying data 3. x86: Remov MAX_BITSIZE_MODE_ANY_INT. Only x86 backend defines it. 4. x86: Use TImode/OImode/XImode integers for piecewise move and store. 5. x86: Add tests for TImode/OImode/XImode for piecewise move and store. 6. x86: Adjust existing tests. On x86-64, SPEC CPU 2017 performance impact is neutral. Glibc code size differences with -O2 build are: Before After libc.so 1906572 1906444 Some code sequence differences in libc.so are: : ... jne | jne test %r15,%r15 test %r15,%r15 je | je mov %r13d,(%r14) mov %r13d,(%r14) lea 0x10(%r14),%rdi lea 0x10(%r14),%rdi mov $0x1,%ecx mov $0x1,%ecx mov %r13d,%edx mov %r13d,%edx mov %r15,0x40(%r12) mov %r15,0x40(%r12) mov %r15,%rsi mov %r15,%rsi call call lea 0xa2f9b(%rip),%rax # | lea 0xa2fab(%rip),%rax # xor %esi,%esi xor %esi,%esi mov %ebp,%edi mov %ebp,%edi mov %rax,0x8(%r12) mov %rax,0x8(%r12) movzwl 0x12(%rsp),%eax movzwl 0x12(%rsp),%eax mov $0x8,%edx < lea 0xc(%rsp),%rcx lea 0xc(%rsp),%rcx mov %r14,0x48(%r12) < add $0x40,%r14 < mov $0x4,%r8d mov $0x4,%r8d > movq $0x0,0x1d0(%r14) > mov $0x8,%edx rol $0x8,%ax rol $0x8,%ax mov %ebp,(%r12) | mov %r14,0x48(%r12) movq $0x0,0x190(%r14) | add $0x40,%r14 mov %ax,0x4(%r12) < mov %r14,0x30(%r12) mov %r14,0x30(%r12) > mov %ax,0x4(%r12) > mov %ebp,(%r12) movl $0x1,0xc(%rsp) movl $0x1,0xc(%rsp) call call mov %r12,%rdi mov %r12,%rdi movabs $0x101010101010101,%rdx < test %eax,%eax test %eax,%eax mov $0xff,%eax mov $0xff,%eax cmove %eax,%ebx cmove %eax,%ebx movzbl %bl,%eax | movd %ebx,%xmm0 mov %ebx,0xc(%rsp) mov %ebx,0xc(%rsp) mov %rax,%rsi | punpcklbw %xmm0,%xmm0 imul %rdx,%rsi | punpcklwd %xmm0,%xmm0 mul %rdx | pshufd $0x0,%xmm0,%xmm0 add %rsi,%rdx | movups %xmm0,0x50(%r12) mov %rax,0x50(%r12) | movups %xmm0,0x60(%r12) mov %rdx,0x58(%r12) | movups %xmm0,0x70(%r12) mov %rax,0x60(%r12) | movups %xmm0,0x80(%r12) mov %rdx,0x68(%r12) | movups %xmm0,0x90(%r12) mov %rax,0x70(%r12) | movups %xmm0,0xa0(%r12) mov %rdx,0x78(%r12) | movups %xmm0,0xb0(%r12) mov %rax,0x80(%r12) | movups %xmm0,0xc0(%r12) mov %rdx,0x88(%r12) | movups %xmm0,0xd0(%r12) mov %rax,0x90(%r12) | movups %xmm0,0xe0(%r12) mov %rdx,0x98(%r12) | movups %xmm0,0xf0(%r12) mov %rax,0xa0(%r12) | movups %xmm0,0x100(%r12) mov %rdx,0xa8(%r12) | movups %xmm0,0x110(%r12) mov %rax,0xb0(%r12) | movups %xmm0,0x120(%r12) mov %rdx,0xb8(%r12) | movups %xmm0,0x130(%r12) mov %rax,0xc0(%r12) | movups %xmm0,0x140(%r12) mov %rdx,0xc8(%r12) < mov %rax,0xd0(%r12) < mov %rdx,0xd8(%r12) < mov %rax,0xe0(%r12) < mov %rdx,0xe8(%r12) < mov %rax,0xf0(%r12) < mov %rdx,0xf8(%r12) < mov %rax,0x100(%r12) < mov %rdx,0x108(%r12) < mov %rax,0x110(%r12) < mov %rdx,0x118(%r12) < mov %rax,0x120(%r12) < mov %rdx,0x128(%r12) < mov %rax,0x130(%r12) < mov %rdx,0x138(%r12) < mov %rax,0x140(%r12) < mov %rdx,0x148(%r12) < call call add $0x28,%rsp add $0x28,%rsp mov %r12,%rax mov %r12,%rax pop %rbx pop %rbx pop %rbp pop %rbp pop %r12 pop %r12 pop %r13 pop %r13 pop %r14 pop %r14 pop %r15 pop %r15 ret ret *** BLURB HERE *** H.J. Lu (11): Rewrite memset with TARGET_GEN_MEMSET_SCRATCH_RTX x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX x86: Avoid stack realignment when copying data x86: Update piecewise move and store x86: Add AVX2 tests for PR middle-end/90773 x86: Add tests for piecewise move and store x86: Also pass -mno-avx to pr72839.c x86: Also pass -mno-avx to cold-attribute-1.c x86: Also pass -mno-avx to sw-1.c for ia32 x86: Update gcc.target/i386/incoming-11.c x86: Also pass -mno-sse to vect8-ret.c gcc/builtins.c | 123 +++++++++++++++--- gcc/config/i386/i386-expand.c | 4 +- gcc/config/i386/i386.c | 27 +++- gcc/config/i386/i386.h | 40 ++++-- gcc/doc/tm.texi | 5 + gcc/doc/tm.texi.in | 2 + gcc/target.def | 7 + .../gcc.target/i386/cold-attribute-1.c | 2 +- gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 ++++ gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +- .../gcc.target/i386/pieces-memcpy-10.c | 16 +++ .../gcc.target/i386/pieces-memcpy-11.c | 17 +++ .../gcc.target/i386/pieces-memcpy-12.c | 16 +++ .../gcc.target/i386/pieces-memcpy-13.c | 16 +++ .../gcc.target/i386/pieces-memcpy-14.c | 17 +++ .../gcc.target/i386/pieces-memcpy-15.c | 16 +++ .../gcc.target/i386/pieces-memcpy-16.c | 16 +++ .../gcc.target/i386/pieces-memcpy-7.c | 15 +++ .../gcc.target/i386/pieces-memcpy-8.c | 14 ++ .../gcc.target/i386/pieces-memcpy-9.c | 14 ++ .../gcc.target/i386/pieces-memset-1.c | 16 +++ .../gcc.target/i386/pieces-memset-10.c | 16 +++ .../gcc.target/i386/pieces-memset-11.c | 16 +++ .../gcc.target/i386/pieces-memset-12.c | 16 +++ .../gcc.target/i386/pieces-memset-13.c | 16 +++ .../gcc.target/i386/pieces-memset-14.c | 16 +++ .../gcc.target/i386/pieces-memset-15.c | 16 +++ .../gcc.target/i386/pieces-memset-16.c | 16 +++ .../gcc.target/i386/pieces-memset-17.c | 16 +++ .../gcc.target/i386/pieces-memset-18.c | 16 +++ .../gcc.target/i386/pieces-memset-19.c | 17 +++ .../gcc.target/i386/pieces-memset-2.c | 12 ++ .../gcc.target/i386/pieces-memset-20.c | 17 +++ .../gcc.target/i386/pieces-memset-21.c | 17 +++ .../gcc.target/i386/pieces-memset-22.c | 17 +++ .../gcc.target/i386/pieces-memset-23.c | 17 +++ .../gcc.target/i386/pieces-memset-24.c | 17 +++ .../gcc.target/i386/pieces-memset-25.c | 17 +++ .../gcc.target/i386/pieces-memset-26.c | 17 +++ .../gcc.target/i386/pieces-memset-27.c | 17 +++ .../gcc.target/i386/pieces-memset-28.c | 17 +++ .../gcc.target/i386/pieces-memset-29.c | 17 +++ .../gcc.target/i386/pieces-memset-3.c | 18 +++ .../gcc.target/i386/pieces-memset-30.c | 17 +++ .../gcc.target/i386/pieces-memset-31.c | 17 +++ .../gcc.target/i386/pieces-memset-32.c | 17 +++ .../gcc.target/i386/pieces-memset-33.c | 17 +++ .../gcc.target/i386/pieces-memset-34.c | 17 +++ .../gcc.target/i386/pieces-memset-35.c | 17 +++ .../gcc.target/i386/pieces-memset-36.c | 17 +++ .../gcc.target/i386/pieces-memset-37.c | 15 +++ .../gcc.target/i386/pieces-memset-38.c | 17 +++ .../gcc.target/i386/pieces-memset-39.c | 16 +++ .../gcc.target/i386/pieces-memset-4.c | 16 +++ .../gcc.target/i386/pieces-memset-40.c | 17 +++ .../gcc.target/i386/pieces-memset-41.c | 16 +++ .../gcc.target/i386/pieces-memset-42.c | 17 +++ .../gcc.target/i386/pieces-memset-43.c | 17 +++ .../gcc.target/i386/pieces-memset-5.c | 12 ++ .../gcc.target/i386/pieces-memset-6.c | 16 +++ .../gcc.target/i386/pieces-memset-7.c | 16 +++ .../gcc.target/i386/pieces-memset-8.c | 16 +++ .../gcc.target/i386/pieces-memset-9.c | 16 +++ gcc/testsuite/gcc.target/i386/pr100865-1.c | 2 +- gcc/testsuite/gcc.target/i386/pr100865-10a.c | 4 +- gcc/testsuite/gcc.target/i386/pr100865-10b.c | 4 +- gcc/testsuite/gcc.target/i386/pr100865-2.c | 2 +- gcc/testsuite/gcc.target/i386/pr100865-3.c | 2 +- gcc/testsuite/gcc.target/i386/pr100865-4a.c | 6 +- gcc/testsuite/gcc.target/i386/pr100865-4b.c | 8 +- gcc/testsuite/gcc.target/i386/pr72839.c | 2 +- gcc/testsuite/gcc.target/i386/pr90773-1.c | 10 +- gcc/testsuite/gcc.target/i386/pr90773-14.c | 4 +- gcc/testsuite/gcc.target/i386/pr90773-15.c | 14 ++ gcc/testsuite/gcc.target/i386/pr90773-16.c | 14 ++ gcc/testsuite/gcc.target/i386/pr90773-17.c | 14 ++ gcc/testsuite/gcc.target/i386/pr90773-18.c | 15 +++ gcc/testsuite/gcc.target/i386/pr90773-19.c | 14 ++ gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-24.c | 2 +- gcc/testsuite/gcc.target/i386/pr90773-25.c | 2 +- gcc/testsuite/gcc.target/i386/pr90773-26.c | 21 +++ gcc/testsuite/gcc.target/i386/pr90773-4.c | 2 +- gcc/testsuite/gcc.target/i386/sw-1.c | 1 + gcc/testsuite/gcc.target/i386/vect8-ret.c | 2 +- 88 files changed, 1231 insertions(+), 64 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-10.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-11.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-12.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-13.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-14.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-15.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-16.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-17.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-18.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-19.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-20.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-21.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-22.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-23.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-24.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-25.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-26.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-27.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-28.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-29.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-30.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-31.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-32.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-33.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-34.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-35.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-36.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-37.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-38.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-39.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-40.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-41.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-42.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-43.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-8.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-9.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-26.c -- 2.31.1