From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x42c.google.com (mail-pf1-x42c.google.com [IPv6:2607:f8b0:4864:20::42c]) by sourceware.org (Postfix) with ESMTPS id 6BD6E3959E5F for ; Tue, 18 May 2021 19:16:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6BD6E3959E5F Received: by mail-pf1-x42c.google.com with SMTP id d78so7279772pfd.10 for ; Tue, 18 May 2021 12:16:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=edbuY3WbzlWtSg9TN7hfYLShDqcmLncwwM2XjsFrY4s=; b=nqLhVa9CfXBP8ry4p2MvNwtgu2Orw0oPuLlYKhlt38+EWdY3xkb/CtOhT21Ugw2grH 79LlrduVLEsfDmY8e/wghZpBVQohK92V+wuHp0Yg1UCi0Zl31bVpf2bt/E9JC/bIhbIU LsUUJS8OhjmGce9i+6HNEg1SVJxxDjKkQU2YKhLbIvqm/h1blX2rw1QPKzzQabfgJQeX Yq6t/vaWbyAeQsKsimA8ycZsOlLf8+v7lf0+eK4aVtJYpVXHObQjSwU+/X51Pyod/aV5 kaxe9A+EWo7oTpvAp1UZ8wSoBRH5efV8HXXvxh20I/FpvMIJ6EE4ny7OZKEwHj0prb/y SEDQ== X-Gm-Message-State: AOAM533PfQhOwEy2Q9E6vB7++Atc57a6EivNCeMSPu8RxeZFB304gJf4 8MynKf9EkbhgbrOmeIxRFWs= X-Google-Smtp-Source: ABdhPJwM+qRhzAyKevW4bg/ILgttycri8k2iFcckVbpDjBSCyriVpfLLVLpsi2h8kAsr1RIAztbO8g== X-Received: by 2002:a63:1906:: with SMTP id z6mr6642527pgl.289.1621365413461; Tue, 18 May 2021 12:16:53 -0700 (PDT) Received: from gnu-cfl-2.localdomain ([172.56.39.231]) by smtp.gmail.com with ESMTPSA id gm21sm2447728pjb.31.2021.05.18.12.16.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 May 2021 12:16:50 -0700 (PDT) Received: from gnu-cfl-2.. (localhost [IPv6:::1]) by gnu-cfl-2.localdomain (Postfix) with ESMTP id 8615EC04D6; Tue, 18 May 2021 12:16:46 -0700 (PDT) From: "H.J. Lu" To: gcc-patches@gcc.gnu.org Cc: Richard Biener , Richard Sandiford , Uros Bizjak , Bernd Edlinger Subject: [PATCH v4 05/12] x86: Update piecewise move and store Date: Tue, 18 May 2021 12:16:39 -0700 Message-Id: <20210518191646.3071005-6-hjl.tools@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210518191646.3071005-1-hjl.tools@gmail.com> References: <20210518191646.3071005-1-hjl.tools@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3035.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 May 2021 19:16:56 -0000 We can use TImode/OImode/XImode integers for piecewise move and store. 1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of bytes that a single instruction can move quickly between memory and registers or between two memory locations. 2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of bytes we can move from memory to memory in one reasonably fast instruction. The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant, independent of compiler options, since it is used in reload.h to define struct target_reload and MOVE_MAX can vary, depending on compiler options. 3. When vector register is used for piecewise move and store, we don't increase stack_alignment_needed since vector register spill isn't required for piecewise move and store. Since stack_realign_needed is set to true by checking stack_alignment_estimated set by pseudo vector register usage, we also need to check stack_realign_needed to eliminate frame pointer. gcc/ * config/i386/i386.c (ix86_finalize_stack_frame_flags): Also check stack_realign_needed for stack realignment. (ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller than the largest integer supported by vector register. * config/i386/i386.h (MAX_MOVE_MAX): New. Set to 64. (MOVE_MAX_PIECES): Set to bytes of the largest integer supported by vector register. (MOVE_MAX): Defined to MOVE_MAX_PIECES. (STORE_MAX_PIECES): New. gcc/testsuite/ * gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit. * gcc.target/i386/pr90773-4.c: Also run for 32-bit. * gcc.target/i386/pr90773-14.c: Likewise. * gcc.target/i386/pr90773-15.c: Likewise. * gcc.target/i386/pr90773-16.c: Likewise. * gcc.target/i386/pr90773-17.c: Likewise. --- gcc/config/i386/i386.c | 21 ++++++++++-- gcc/config/i386/i386.h | 40 +++++++++++++++++----- gcc/testsuite/gcc.target/i386/pr90773-1.c | 10 ++---- gcc/testsuite/gcc.target/i386/pr90773-14.c | 2 +- gcc/testsuite/gcc.target/i386/pr90773-15.c | 6 ++-- gcc/testsuite/gcc.target/i386/pr90773-16.c | 2 +- gcc/testsuite/gcc.target/i386/pr90773-17.c | 2 +- gcc/testsuite/gcc.target/i386/pr90773-4.c | 2 +- 8 files changed, 60 insertions(+), 25 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 6a981a01668..bdf773a312f 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -7942,8 +7942,17 @@ ix86_finalize_stack_frame_flags (void) assumed stack realignment might be needed or -fno-omit-frame-pointer is used, but in the end nothing that needed the stack alignment had been spilled nor stack access, clear frame_pointer_needed and say we - don't need stack realignment. */ - if ((stack_realign || (!flag_omit_frame_pointer && optimize)) + don't need stack realignment. + + When vector register is used for piecewise move and store, we don't + increase stack_alignment_needed as there is no register spill for + piecewise move and store. Since stack_realign_needed is set to true + by checking stack_alignment_estimated which is updated by pseudo + vector register usage, we also need to check stack_realign_needed to + eliminate frame pointer. */ + if ((stack_realign + || (!flag_omit_frame_pointer && optimize) + || crtl->stack_realign_needed) && frame_pointer_needed && crtl->is_leaf && crtl->sp_is_unchanging @@ -10402,7 +10411,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x) /* FALLTHRU */ case E_OImode: case E_XImode: - if (!standard_sse_constant_p (x, mode)) + if (!standard_sse_constant_p (x, mode) + && GET_MODE_SIZE (TARGET_AVX512F + ? XImode + : (TARGET_AVX + ? OImode + : (TARGET_SSE2 + ? TImode : DImode))) < GET_MODE_SIZE (mode)) return false; default: break; diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 45d86802c51..5250b035a85 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1752,9 +1752,10 @@ typedef struct ix86_args { /* Define this as 1 if `char' should by default be signed; else as 0. */ #define DEFAULT_SIGNED_CHAR 1 -/* Max number of bytes we can move from memory to memory - in one reasonably fast instruction. */ -#define MOVE_MAX 16 +/* The constant maximum number of bytes that a single instruction can + move quickly between memory and registers or between two memory + locations. */ +#define MAX_MOVE_MAX 64 /* MOVE_MAX_PIECES is the number of bytes at a time which we can move efficiently, as opposed to MOVE_MAX which is the maximum @@ -1765,11 +1766,34 @@ typedef struct ix86_args { widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in 64-bit mode. */ #define MOVE_MAX_PIECES \ - ((TARGET_64BIT \ - && TARGET_SSE2 \ - && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ - && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ - ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD) + ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \ + ? 64 \ + : ((TARGET_AVX \ + && !TARGET_PREFER_AVX128 \ + && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \ + && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \ + ? 32 \ + : ((TARGET_SSE2 \ + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ + ? 16 : UNITS_PER_WORD))) + +/* Max number of bytes we can move from memory to memory in one + reasonably fast instruction. */ +#define MOVE_MAX MOVE_MAX_PIECES + +/* STORE_MAX_PIECES is the number of bytes at a time that we can + store efficiently. */ +#define STORE_MAX_PIECES \ + ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \ + ? 64 \ + : ((TARGET_AVX \ + && !TARGET_PREFER_AVX128 \ + && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \ + ? 32 \ + : ((TARGET_SSE2 \ + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ + ? 16 : UNITS_PER_WORD))) /* If a memory-to-memory move would take MOVE_RATIO or more simple move-instruction pairs, we will do a cpymem or libcall instead. diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c index 1d9f282dc0d..4fd5a40d99d 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-1.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mtune=generic" } */ +/* { dg-options "-O2 -msse2 -mtune=generic" } */ extern char *dst, *src; @@ -9,9 +9,5 @@ foo (void) __builtin_memcpy (dst, src, 15); } -/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */ -/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */ -/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ -/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ -/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ -/* { dg-final { scan-assembler-times "movl\[\\t \]+11\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */ +/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c index 6364916ecac..74ba5055960 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-14.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ extern char *dst; diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c index c0a96fed892..880f71d1567 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-15.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -march=skylake-avx512" } */ extern char *dst; @@ -9,6 +9,6 @@ foo (int c) __builtin_memset (dst, c, 17); } -/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ -/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c index d2d1ec6141c..32a976b10df 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-16.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -march=skylake-avx512" } */ extern char *dst; diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c index 6c8da7d24ef..2d6fbf22a8b 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-17.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -march=skylake-avx512" } */ extern char *dst; diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c index ec0bc0100ae..ee4c04678d1 100644 --- a/gcc/testsuite/gcc.target/i386/pr90773-4.c +++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ extern char *dst; -- 2.31.1