From mboxrd@z Thu Jan  1 00:00:00 1970
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: Richard Biener, Richard Sandiford, Uros Bizjak
Subject: [PATCH v2 00/11] Allow TImode/OImode/XImode in op_by_pieces operations
Date: Tue, 11 May 2021 16:35:24 -0700
Message-Id: <20210511233535.4448-1-hjl.tools@gmail.com>
List-Id: Gcc-patches mailing list

1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE so that a
   target can use its own instructions to duplicate a QImode value into
   a TImode/OImode/XImode value for memset.
2. x86: Avoid stack realignment when copying data.
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only the x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.

On x86-64, the SPEC CPU 2017 performance impact is neutral.  The glibc
code size differences with an -O2 build are:

            Before     After
libc.so     1906572    1906444

Some code sequence differences in libc.so are:

: ...
jne                             | jne
test %r15,%r15                    test %r15,%r15
je                              | je
mov %r13d,(%r14)                  mov %r13d,(%r14)
lea 0x10(%r14),%rdi               lea 0x10(%r14),%rdi
mov $0x1,%ecx                     mov $0x1,%ecx
mov %r13d,%edx                    mov %r13d,%edx
mov %r15,0x40(%r12)               mov %r15,0x40(%r12)
mov %r15,%rsi                     mov %r15,%rsi
call                              call
lea 0xa2f9b(%rip),%rax #        | lea 0xa2fab(%rip),%rax #
xor %esi,%esi                     xor %esi,%esi
mov %ebp,%edi                     mov %ebp,%edi
mov %rax,0x8(%r12)                mov %rax,0x8(%r12)
movzwl 0x12(%rsp),%eax            movzwl 0x12(%rsp),%eax
mov $0x8,%edx                   <
lea 0xc(%rsp),%rcx                lea 0xc(%rsp),%rcx
mov %r14,0x48(%r12)             <
add $0x40,%r14                  <
mov $0x4,%r8d                     mov $0x4,%r8d
                                > movq $0x0,0x1d0(%r14)
                                > mov $0x8,%edx
rol $0x8,%ax                      rol $0x8,%ax
mov %ebp,(%r12)                 | mov %r14,0x48(%r12)
movq $0x0,0x190(%r14)           | add $0x40,%r14
mov %ax,0x4(%r12)               <
mov %r14,0x30(%r12)               mov %r14,0x30(%r12)
                                > mov %ax,0x4(%r12)
                                > mov %ebp,(%r12)
movl $0x1,0xc(%rsp)               movl $0x1,0xc(%rsp)
call                              call
mov %r12,%rdi                     mov %r12,%rdi
movabs $0x101010101010101,%rdx  <
test %eax,%eax                    test %eax,%eax
mov $0xff,%eax                    mov $0xff,%eax
cmove %eax,%ebx                   cmove %eax,%ebx
movzbl %bl,%eax                 | movd %ebx,%xmm0
mov %ebx,0xc(%rsp)                mov %ebx,0xc(%rsp)
mov %rax,%rsi                   | punpcklbw %xmm0,%xmm0
imul %rdx,%rsi                  | punpcklwd %xmm0,%xmm0
mul %rdx                        | pshufd $0x0,%xmm0,%xmm0
add %rsi,%rdx                   | movups %xmm0,0x50(%r12)
mov %rax,0x50(%r12)             | movups %xmm0,0x60(%r12)
mov %rdx,0x58(%r12)             | movups %xmm0,0x70(%r12)
mov %rax,0x60(%r12)             | movups %xmm0,0x80(%r12)
mov %rdx,0x68(%r12)             | movups %xmm0,0x90(%r12)
mov %rax,0x70(%r12)             | movups %xmm0,0xa0(%r12)
mov %rdx,0x78(%r12)             | movups %xmm0,0xb0(%r12)
mov %rax,0x80(%r12)             | movups %xmm0,0xc0(%r12)
mov %rdx,0x88(%r12)             | movups %xmm0,0xd0(%r12)
mov %rax,0x90(%r12)             | movups %xmm0,0xe0(%r12)
mov %rdx,0x98(%r12)             | movups %xmm0,0xf0(%r12)
mov %rax,0xa0(%r12)             | movups %xmm0,0x100(%r12)
mov %rdx,0xa8(%r12)             | movups %xmm0,0x110(%r12)
mov %rax,0xb0(%r12)             | movups %xmm0,0x120(%r12)
mov %rdx,0xb8(%r12)             | movups %xmm0,0x130(%r12)
mov %rax,0xc0(%r12)             | movups %xmm0,0x140(%r12)
mov %rdx,0xc8(%r12)             <
mov %rax,0xd0(%r12)             <
mov %rdx,0xd8(%r12)             <
mov %rax,0xe0(%r12)             <
mov %rdx,0xe8(%r12)             <
mov %rax,0xf0(%r12)             <
mov %rdx,0xf8(%r12)             <
mov %rax,0x100(%r12)            <
mov %rdx,0x108(%r12)            <
mov %rax,0x110(%r12)            <
mov %rdx,0x118(%r12)            <
mov %rax,0x120(%r12)            <
mov %rdx,0x128(%r12)            <
mov %rax,0x130(%r12)            <
mov %rdx,0x138(%r12)            <
mov %rax,0x140(%r12)            <
mov %rdx,0x148(%r12)            <
call                              call
add $0x28,%rsp                    add $0x28,%rsp
mov %r12,%rax                     mov %r12,%rax
pop %rbx                          pop %rbx
pop %rbp                          pop %rbp
pop %r12                          pop %r12
pop %r13                          pop %r13
pop %r14                          pop %r14
pop %r15                          pop %r15
ret                               ret

H.J. Lu (11):
  Add TARGET_READ_MEMSET_VALUE/TARGET_GEN_MEMSET_VALUE
  x86: Avoid stack realignment when copying data
  Remove MAX_BITSIZE_MODE_ANY_INT
  x86: Update piecewise move and store
  x86: Add AVX2 tests for PR middle-end/90773
  x86: Add tests for piecewise move and store
  x86: Also pass -mno-avx to pr72839.c
  x86: Also pass -mno-avx to cold-attribute-1.c
  x86: Also pass -mno-avx to sw-1.c for ia32
  x86: Update gcc.target/i386/incoming-11.c
  constructor: Check if it is faster to load constant from memory

 gcc/builtins.c                              |  47 +--
 gcc/config/i386/i386-expand.c               |  18 +-
 gcc/config/i386/i386-modes.def              |  15 +-
 gcc/config/i386/i386-protos.h               |   5 +
 gcc/config/i386/i386.c                      | 289 +++++++++++++++++-
 gcc/config/i386/i386.h                      |  35 ++-
 gcc/doc/tm.texi                             |  16 +
 gcc/doc/tm.texi.in                          |   4 +
 gcc/expr.c                                  |  11 +-
 gcc/target.def                              |  20 ++
 gcc/targhooks.c                             |  56 ++++
 gcc/targhooks.h                             |   4 +
 .../gcc.target/i386/cold-attribute-1.c      |   2 +-
 gcc/testsuite/gcc.target/i386/eh_return-1.c |  26 ++
 gcc/testsuite/gcc.target/i386/incoming-11.c |   2 +-
 .../gcc.target/i386/pieces-memcpy-10.c      |  16 +
 .../gcc.target/i386/pieces-memcpy-11.c      |  17 ++
 .../gcc.target/i386/pieces-memcpy-12.c      |  16 +
 .../gcc.target/i386/pieces-memcpy-13.c      |  16 +
 .../gcc.target/i386/pieces-memcpy-14.c      |  17 ++
 .../gcc.target/i386/pieces-memcpy-15.c      |  16 +
 .../gcc.target/i386/pieces-memcpy-16.c      |  16 +
 .../gcc.target/i386/pieces-memcpy-7.c       |  15 +
 .../gcc.target/i386/pieces-memcpy-8.c       |  14 +
 .../gcc.target/i386/pieces-memcpy-9.c       |  14 +
 .../gcc.target/i386/pieces-memset-1.c       |  16 +
 .../gcc.target/i386/pieces-memset-10.c      |  16 +
 .../gcc.target/i386/pieces-memset-11.c      |  16 +
 .../gcc.target/i386/pieces-memset-12.c      |  16 +
 .../gcc.target/i386/pieces-memset-13.c      |  16 +
 .../gcc.target/i386/pieces-memset-14.c      |  16 +
 .../gcc.target/i386/pieces-memset-15.c      |  16 +
 .../gcc.target/i386/pieces-memset-16.c      |  16 +
 .../gcc.target/i386/pieces-memset-17.c      |  16 +
 .../gcc.target/i386/pieces-memset-18.c      |  16 +
 .../gcc.target/i386/pieces-memset-19.c      |  17 ++
 .../gcc.target/i386/pieces-memset-2.c       |  12 +
 .../gcc.target/i386/pieces-memset-20.c      |  17 ++
 .../gcc.target/i386/pieces-memset-21.c      |  17 ++
 .../gcc.target/i386/pieces-memset-22.c      |  17 ++
 .../gcc.target/i386/pieces-memset-23.c      |  17 ++
 .../gcc.target/i386/pieces-memset-24.c      |  17 ++
 .../gcc.target/i386/pieces-memset-25.c      |  17 ++
 .../gcc.target/i386/pieces-memset-26.c      |  17 ++
 .../gcc.target/i386/pieces-memset-27.c      |  17 ++
 .../gcc.target/i386/pieces-memset-28.c      |  17 ++
 .../gcc.target/i386/pieces-memset-29.c      |  17 ++
 .../gcc.target/i386/pieces-memset-3.c       |  18 ++
 .../gcc.target/i386/pieces-memset-30.c      |  17 ++
 .../gcc.target/i386/pieces-memset-31.c      |  17 ++
 .../gcc.target/i386/pieces-memset-32.c      |  17 ++
 .../gcc.target/i386/pieces-memset-33.c      |  17 ++
 .../gcc.target/i386/pieces-memset-34.c      |  17 ++
 .../gcc.target/i386/pieces-memset-35.c      |  17 ++
 .../gcc.target/i386/pieces-memset-36.c      |  17 ++
 .../gcc.target/i386/pieces-memset-37.c      |  15 +
 .../gcc.target/i386/pieces-memset-38.c      |  17 ++
 .../gcc.target/i386/pieces-memset-39.c      |  16 +
 .../gcc.target/i386/pieces-memset-4.c       |  16 +
 .../gcc.target/i386/pieces-memset-40.c      |  17 ++
 .../gcc.target/i386/pieces-memset-41.c      |  16 +
 .../gcc.target/i386/pieces-memset-42.c      |  17 ++
 .../gcc.target/i386/pieces-memset-43.c      |  17 ++
 .../gcc.target/i386/pieces-memset-5.c       |  12 +
 .../gcc.target/i386/pieces-memset-6.c       |  16 +
 .../gcc.target/i386/pieces-memset-7.c       |  16 +
 .../gcc.target/i386/pieces-memset-8.c       |  16 +
 .../gcc.target/i386/pieces-memset-9.c       |  16 +
 gcc/testsuite/gcc.target/i386/pr72839.c     |   2 +-
 gcc/testsuite/gcc.target/i386/pr90773-1.c   |  10 +-
 gcc/testsuite/gcc.target/i386/pr90773-14.c  |   2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr90773-16.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr90773-17.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr90773-18.c  |  15 +
 gcc/testsuite/gcc.target/i386/pr90773-19.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr90773-20.c  |  13 +
 gcc/testsuite/gcc.target/i386/pr90773-21.c  |  13 +
 gcc/testsuite/gcc.target/i386/pr90773-22.c  |  13 +
 gcc/testsuite/gcc.target/i386/pr90773-23.c  |  13 +
 gcc/testsuite/gcc.target/i386/pr90773-24.c  |  22 ++
 gcc/testsuite/gcc.target/i386/pr90773-25.c  |  20 ++
 gcc/testsuite/gcc.target/i386/pr90773-4.c   |   2 +-
 gcc/testsuite/gcc.target/i386/sw-1.c        |   1 +
 84 files changed, 1509 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-25.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-26.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-27.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-28.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-29.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-30.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-31.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-32.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-33.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-34.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-35.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-36.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-37.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-38.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-39.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-40.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-41.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-42.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-43.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c

-- 
2.31.1
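
Illustration of the memset sequences in the libc.so comparison.  The
left-hand ("before") column loads the constant 0x101010101010101 and
multiplies the zero-extended byte by it, which replicates the byte into
every lane of an integer register; the value is then written with
8-byte stores covering offsets 0x50 through 0x14f (256 bytes).  The
right-hand column instead broadcasts the byte into %xmm0 and uses 16
16-byte stores for the same range.  The portable C below is only a
sketch of the scalar technique, not code from this series; the function
names broadcast_byte_u64 and fill_by_pieces_u64 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit word.
   Multiplying by 0x0101010101010101 copies C into every byte lane;
   the product always fits in 64 bits because C is at most 0xff.  */
uint64_t
broadcast_byte_u64 (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}

/* Hypothetical helper: fill LEN bytes at DST with C using 8-byte
   stores.  LEN must be a multiple of 8; a real expansion would handle
   the tail with narrower stores.  */
void
fill_by_pieces_u64 (void *dst, unsigned char c, size_t len)
{
  assert (len % sizeof (uint64_t) == 0);

  uint64_t word = broadcast_byte_u64 (c);
  unsigned char *p = dst;

  /* memcpy of a fixed 8-byte size is the portable way to express an
     unaligned 8-byte store; compilers typically lower it to one mov.  */
  for (size_t off = 0; off < len; off += sizeof word)
    memcpy (p + off, &word, sizeof word);
}
```

Calling fill_by_pieces_u64 (buf, 0x5a, 256) on a 256-byte buffer leaves
it identical to memset (buf, 0x5a, 256).  With 16-byte (TImode) pieces
the same fill needs half as many stores, which matches the shorter
right-hand sequence in the comparison.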