From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1251) id 4038B3858D1E; Sat, 5 Mar 2022 08:52:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4038B3858D1E MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" From: Roger Sayle To: gcc-cvs@gcc.gnu.org Subject: [gcc r12-7502] PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32. X-Act-Checkin: gcc X-Git-Author: Roger Sayle X-Git-Refname: refs/heads/master X-Git-Oldrev: 1301d7f647c7ac40da7f910aa6e790205e34bb8b X-Git-Newrev: 8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48 Message-Id: <20220305085215.4038B3858D1E@sourceware.org> Date: Sat, 5 Mar 2022 08:52:15 +0000 (GMT) X-BeenThere: gcc-cvs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-cvs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Mar 2022 08:52:15 -0000 https://gcc.gnu.org/g:8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48 commit r12-7502-g8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48 Author: Roger Sayle Date: Sat Mar 5 08:50:45 2022 +0000 PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32. This clean-up patch resolves PR testsuite/104732, the failure of the recent test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86. Rather than just tweak the testcase, the proposed approach is to fix the underlying problem by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode logical operation expanders and pre-reload splitters in i386.md, which as I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT whenever -mno-stv (such as Solaris) or -msse (but not -msse2). First a little bit of history. In the beginning, DImode operations on i386 weren't defined by the machine description, and lowered during RTL expansion to SI mode operations. The with PR 65105 in 2015, -mstv was added, together with a SWIM1248x mode iterator (later renamed to SWIM1248x) together with several *di3_doubleword post-reload splitters that made use of register allocation to perform some double word operations in 64-but XMM registers. A short while later in 2016, PR 70322 added similar support for one_cmpldi2. All of this logic was dependent upon "!TARGET_64BIT && TARGET_STV && TARGET_SSE2". With the passing of time, these conditions became irrelevant when in 2019, it was decided to split these double-word patterns before reload. https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html Hence the current situation, where on most modern CPU architectures (where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI mode operations, that are then split into two SI mode instructions before reload, except on Solaris and other odd cases, where the splitting is to two SI mode instructions is done during RTL expansion. By the time compilation reaches register allocation both paths in theory produce identical or similar code, so the vestigial legacy/logic would appear to be harmless. Unfortunately, there is one place where this arbitrary choice of how to lower DI mode doubleword operations is visible to the middle-end, it controls whether the backend appears to have a suitable optab, and the presence (or not) of DImode optabs can influence vectorization cost models and veclower decisions. The issue (and code quality regression) can be seen in this test case: typedef long long v2di __attribute__((vector_size (16))); v2di x; void foo (long long a) { v2di t = {a, a}; x = ~t; } which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces: foo: subl $28, %esp movl %ebx, 16(%esp) movl 32(%esp), %eax movl %esi, 20(%esp) movl 36(%esp), %edx movl %edi, 24(%esp) movl %eax, %esi movl %eax, %edi movl %edx, %ebx movl %edx, %ecx notl %esi notl %ebx movl %esi, (%esp) notl %edi notl %ecx movl %ebx, 4(%esp) movl 20(%esp), %esi movl %edi, 8(%esp) movl 16(%esp), %ebx movl %ecx, 12(%esp) movl 24(%esp), %edi movss 8(%esp), %xmm1 movss 12(%esp), %xmm2 movss (%esp), %xmm0 movss 4(%esp), %xmm3 unpcklps %xmm2, %xmm1 unpcklps %xmm3, %xmm0 movlhps %xmm1, %xmm0 movaps %xmm0, x addl $28, %esp ret Importantly notice the four "notl" instructions. With this patch: foo: subl $28, %esp movl 32(%esp), %edx movl 36(%esp), %eax notl %edx movl %edx, (%esp) notl %eax movl %eax, 4(%esp) movl %edx, 8(%esp) movl %eax, 12(%esp) movaps (%esp), %xmm1 movaps %xmm1, x addl $28, %esp ret Notice only two "notl" instructions. Checking with godbolt.org, GCC generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x, and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies this clean-up as suitable for stage 4]. Most significantly, this patch allows pr100711-1.c to pass with -mno-stv, allowing pandn to be used with V2DImode on Solaris/x86. Fingers-crossed this should reduce the number of discrepancies encountered supporting Solaris/x86. 2022-03-05 Roger Sayle Uroš Bizjak gcc/ChangeLog PR testsuite/104732 * config/i386/i386.md (SWIM1248x): Renamed from SWIM1248s. Include DI mode unconditionally. (*anddi3_doubleword): Remove && TARGET_STV && TARGET_SSE2 condition, i.e. always split on !TARGET_64BIT. (*di3_doubleword): Likewise. (*one_cmpldi2_doubleword): Likewise. (and3 expander): Update to use SWIM1248x from SWIM1248s. (3 expander): Likewise. (one_cmpl2 expander): Likewise. gcc/testsuite/ChangeLog PR testsuite/104732 * gcc.target/i386/pr104732.c: New test case. Diff: --- gcc/config/i386/i386.md | 28 ++++++++++++++-------------- gcc/testsuite/gcc.target/i386/pr104732.c | 14 ++++++++++++++ 2 files changed, 28 insertions(+), 14 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 5e0a980e7ae..d15170e8acf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1079,11 +1079,11 @@ (HI "TARGET_HIMODE_MATH") SI]) -;; Math-dependant integer modes with DImode (enabled for 32bit with STV). -(define_mode_iterator SWIM1248s +;; Math-dependant integer modes with DImode. +(define_mode_iterator SWIM1248x [(QI "TARGET_QIMODE_MATH") (HI "TARGET_HIMODE_MATH") - SI (DI "TARGET_64BIT || (TARGET_STV && TARGET_SSE2)")]) + SI DI]) ;; Math-dependant single word integer modes without QImode. (define_mode_iterator SWIM248 [(HI "TARGET_HIMODE_MATH") @@ -9693,9 +9693,9 @@ ;; it should be done with splitters. (define_expand "and3" - [(set (match_operand:SWIM1248s 0 "nonimmediate_operand") - (and:SWIM1248s (match_operand:SWIM1248s 1 "nonimmediate_operand") - (match_operand:SWIM1248s 2 "")))] + [(set (match_operand:SWIM1248x 0 "nonimmediate_operand") + (and:SWIM1248x (match_operand:SWIM1248x 1 "nonimmediate_operand") + (match_operand:SWIM1248x 2 "")))] "" { machine_mode mode = mode; @@ -9733,7 +9733,7 @@ (match_operand:DI 1 "nonimmediate_operand") (match_operand:DI 2 "x86_64_szext_general_operand"))) (clobber (reg:CC FLAGS_REG))] - "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 + "!TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands) && ix86_pre_reload_split ()" "#" @@ -10337,9 +10337,9 @@ ;; If this is considered useful, it should be done with splitters. (define_expand "3" - [(set (match_operand:SWIM1248s 0 "nonimmediate_operand") - (any_or:SWIM1248s (match_operand:SWIM1248s 1 "nonimmediate_operand") - (match_operand:SWIM1248s 2 "")))] + [(set (match_operand:SWIM1248x 0 "nonimmediate_operand") + (any_or:SWIM1248x (match_operand:SWIM1248x 1 "nonimmediate_operand") + (match_operand:SWIM1248x 2 "")))] "" "ix86_expand_binary_operator (, mode, operands); DONE;") @@ -10349,7 +10349,7 @@ (match_operand:DI 1 "nonimmediate_operand") (match_operand:DI 2 "x86_64_szext_general_operand"))) (clobber (reg:CC FLAGS_REG))] - "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 + "!TARGET_64BIT && ix86_binary_operator_ok (, DImode, operands) && ix86_pre_reload_split ()" "#" @@ -11427,15 +11427,15 @@ ;; One complement instructions (define_expand "one_cmpl2" - [(set (match_operand:SWIM1248s 0 "nonimmediate_operand") - (not:SWIM1248s (match_operand:SWIM1248s 1 "nonimmediate_operand")))] + [(set (match_operand:SWIM1248x 0 "nonimmediate_operand") + (not:SWIM1248x (match_operand:SWIM1248x 1 "nonimmediate_operand")))] "" "ix86_expand_unary_operator (NOT, mode, operands); DONE;") (define_insn_and_split "*one_cmpldi2_doubleword" [(set (match_operand:DI 0 "nonimmediate_operand") (not:DI (match_operand:DI 1 "nonimmediate_operand")))] - "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 + "!TARGET_64BIT && ix86_unary_operator_ok (NOT, DImode, operands) && ix86_pre_reload_split ()" "#" diff --git a/gcc/testsuite/gcc.target/i386/pr104732.c b/gcc/testsuite/gcc.target/i386/pr104732.c new file mode 100644 index 00000000000..c8954366c6d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104732.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2 -m32 -msse -march=pentiumpro" } */ + +typedef long long v2di __attribute__((vector_size (16))); + +v2di x; + +void foo (long long a) +{ + v2di t = {a, a}; + x = ~t; +} + +/* { dg-final { scan-assembler-times "notl" 2 } } */