public inbox for gcc-bugs@sourceware.org
* [Bug target/65105] New: [i386] XMM registers are not used for 64bit computations on 32bit target
@ 2015-02-18 12:11 enkovich.gnu at gmail dot com
  2015-02-18 12:45 ` [Bug target/65105] " jakub at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: enkovich.gnu at gmail dot com @ 2015-02-18 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

            Bug ID: 65105
           Summary: [i386] XMM registers are not used for 64bit
                    computations on 32bit target
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: enkovich.gnu at gmail dot com

XMM registers could be used for 64-bit operations on a 32-bit target.  That
should make the code faster and free up some GPRs.

Here is an example test for which GCC doesn't use XMM registers, followed by
possible code that does use them:

>cat test.c
long long
test1 (long long x, long long y, long long z)
{
  return ((x | z ) + (y & z) - z);
}
>cat test_xmm.s
        .file "test.c"
        .text
        .globl test1
test1:
        movq      4(%esp), %xmm2
        movq      20(%esp), %xmm1
        movq      12(%esp), %xmm0
        por       %xmm1, %xmm2
        pand      %xmm1, %xmm0
        paddq     %xmm0, %xmm2
        psubq     %xmm1, %xmm2
        movd      %xmm2, %eax
        psrlq     $32, %xmm2
        movd      %xmm2, %edx
        ret
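
For readers who prefer C to assembly, the XMM sequence above can be
approximated with SSE2 intrinsics.  This is only a sketch: the function names
are hypothetical, unsigned types are used so that wraparound is well defined,
and the result is stored to memory rather than extracted with movd/psrlq/movd
as in the listing.  When SSE2 is not available it falls back to the scalar form.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: the expression from test.c, on unsigned 64-bit values. */
static uint64_t test1_scalar(uint64_t x, uint64_t y, uint64_t z)
{
  return ((x | z) + (y & z) - z);
}

/* Hypothetical XMM variant: each 64-bit operand lives in the low half
   of one XMM register, so no GPR pairs are needed for the arithmetic. */
static uint64_t test1_xmm(uint64_t x, uint64_t y, uint64_t z)
{
#if defined(__SSE2__)
  __m128i vx = _mm_loadl_epi64((const __m128i *)&x);  /* movq mem, xmm */
  __m128i vy = _mm_loadl_epi64((const __m128i *)&y);
  __m128i vz = _mm_loadl_epi64((const __m128i *)&z);
  __m128i r  = _mm_or_si128(vx, vz);                  /* por   */
  __m128i t  = _mm_and_si128(vy, vz);                 /* pand  */
  uint64_t out;

  r = _mm_add_epi64(r, t);                            /* paddq */
  r = _mm_sub_epi64(r, vz);                           /* psubq */
  _mm_storel_epi64((__m128i *)&out, r);               /* movq xmm, mem */
  return out;
#else
  /* No SSE2 at compile time: nothing to illustrate, use the scalar form. */
  return test1_scalar(x, y, z);
#endif
}

/* Small self-check that both forms agree, including a carry into bit 32. */
static void check_test1(void)
{
  assert(test1_xmm(1, 2, 3) == test1_scalar(1, 2, 3));
  assert(test1_xmm(0xFFFFFFFFu, 0xFFFFFFFFu, 1) == 0xFFFFFFFFu);
}
```

Building this with -m32 -msse2 should show whether the compiler keeps the
values in XMM registers or moves them back to GPR pairs.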



* [Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target
From: jakub at gcc dot gnu.org @ 2015-02-18 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The thing is that we already lower the DImode insns during the subreg1 pass,
and the decision whether XMM regs can be used for them would best be made
during RA; but if using XMM regs is not useful, RA certainly prefers to see
the patterns already split.



* [Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target
From: enkovich.gnu at gmail dot com @ 2015-02-18 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

--- Comment #2 from Ilya Enkovich <enkovich.gnu at gmail dot com> ---
For this test I see that the 'plus' and 'minus' ops keep DI mode until RA and
are then assigned GPR pairs:

(insn 12 35 13 2 (parallel [
            (set (reg:DI 0 ax [orig:98 D.1945 ] [98])
                (plus:DI (reg:DI 0 ax [orig:97 D.1945 ] [97])
                    (reg:DI 2 cx [orig:96 D.1945 ] [96])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 215 {*adddi3_doubleword}
     (nil))
(insn 13 12 18 2 (parallel [
            (set (reg:DI 0 ax [orig:95 D.1945 ] [95])
                (minus:DI (reg:DI 0 ax [orig:98 D.1945 ] [98])
                    (reg/v:DI 4 si [orig:94 z ] [94])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 259 {*subdi3_doubleword}
     (nil))

'ior' and 'and' use SI mode and subregs starting from expand.
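
The difference between the two groups can be sketched in plain C.  The code
below is an illustration, not GCC's implementation; the type and function
names are hypothetical.  It models a DImode value as two 32-bit halves (as in
a GPR pair): the logical ops act on each half independently, while 'plus' and
'minus' need a carry or borrow passed from the low half to the high half,
which is what the add/adc and sub/sbb pairs provide.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, like a GPR pair on ia32. */
typedef struct { uint32_t lo, hi; } di_pair;

static di_pair di_split(uint64_t v)
{
  di_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
  return p;
}

static uint64_t di_join(di_pair p)
{
  return ((uint64_t)p.hi << 32) | p.lo;
}

/* Logical ops: the halves never interact, so two SImode ops suffice. */
static di_pair di_ior(di_pair a, di_pair b)
{
  di_pair r = { a.lo | b.lo, a.hi | b.hi };
  return r;
}

static di_pair di_and(di_pair a, di_pair b)
{
  di_pair r = { a.lo & b.lo, a.hi & b.hi };
  return r;
}

/* plus: the low add produces a carry that the high add must consume. */
static di_pair di_add(di_pair a, di_pair b)
{
  di_pair r;
  uint32_t carry;

  r.lo = a.lo + b.lo;
  carry = r.lo < a.lo;            /* wrapped around => carry out */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* minus: the low subtract produces a borrow for the high subtract. */
static di_pair di_sub(di_pair a, di_pair b)
{
  di_pair r;
  uint32_t borrow = a.lo < b.lo;

  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - borrow;
  return r;
}

/* The test1 expression evaluated purely on 32-bit halves. */
static uint64_t test1_pairs(uint64_t x, uint64_t y, uint64_t z)
{
  di_pair px = di_split(x), py = di_split(y), pz = di_split(z);
  return di_join(di_sub(di_add(di_ior(px, pz), di_and(py, pz)), pz));
}

/* Self-check against native 64-bit arithmetic, including a carry case. */
static void check_lowering(void)
{
  uint64_t x = 0xFFFFFFFFu, y = 0xFFFFFFFFu, z = 1;
  assert(test1_pairs(x, y, z) == ((x | z) + (y & z) - z));
}
```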



* [Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target
From: ienkovich at gcc dot gnu.org @ 2015-09-29  9:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

--- Comment #3 from Ilya Enkovich <ienkovich at gcc dot gnu.org> ---
Author: ienkovich
Date: Tue Sep 29 09:32:40 2015
New Revision: 228231

URL: https://gcc.gnu.org/viewcvs?rev=228231&root=gcc&view=rev
Log:
gcc/

        PR target/65105
        * config/i386/i386.c: Include dbgcnt.h.
        (has_non_address_hard_reg): New.
        (convertible_comparison_p): New.
        (scalar_to_vector_candidate_p): New.
        (remove_non_convertible_regs): New.
        (scalar_chain): New.
        (scalar_chain::scalar_chain): New.
        (scalar_chain::~scalar_chain): New.
        (scalar_chain::add_to_queue): New.
        (scalar_chain::mark_dual_mode_def): New.
        (scalar_chain::analyze_register_chain): New.
        (scalar_chain::add_insn): New.
        (scalar_chain::build): New.
        (scalar_chain::compute_convert_gain): New.
        (scalar_chain::replace_with_subreg): New.
        (scalar_chain::replace_with_subreg_in_insn): New.
        (scalar_chain::emit_conversion_insns): New.
        (scalar_chain::make_vector_copies): New.
        (scalar_chain::convert_reg): New.
        (scalar_chain::convert_op): New.
        (scalar_chain::convert_insn): New.
        (scalar_chain::convert): New.
        (convert_scalars_to_vector): New.
        (pass_data_stv): New.
        (pass_stv): New.
        (make_pass_stv): New.
        (ix86_option_override): Create and register stv pass.
        (flag_opts): Add -mstv.
        (ix86_option_override_internal): Likewise.
        * config/i386/i386.md (SWIM1248x): New.
        (*movdi_internal): Add xmm to mem alternative for TARGET_STV.
        (and<mode>3): Use SWIM1248x iterator instead of SWIM.
        (*anddi3_doubleword): New.
        (*zext<mode>_doubleword): New.
        (*zextsi_doubleword): New.
        (<code><mode>3): Use SWIM1248x iterator instead of SWIM.
        (*<code>di3_doubleword): New.
        * config/i386/i386.opt (mstv): New.
        * dbgcnt.def (stv_conversion): New.

gcc/testsuite/

        PR target/65105
        * gcc.target/i386/pr65105-1.c: New.
        * gcc.target/i386/pr65105-2.c: New.
        * gcc.target/i386/pr65105-3.c: New.
        * gcc.target/i386/pr65105-4.C: New.
        * gcc.dg/lower-subreg-1.c: Add -mno-stv options for ia32.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr65105-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr65105-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr65105-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr65105-4.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/i386.opt
    trunk/gcc/dbgcnt.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/lower-subreg-1.c



* [Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target
From: ienkovich at gcc dot gnu.org @ 2015-09-29  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

Ilya Enkovich <ienkovich at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |ienkovich at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #4 from Ilya Enkovich <ienkovich at gcc dot gnu.org> ---
The new pass_stv handles this by transforming scalar computations into vector
ones.



* [Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target
From: cvs-commit at gcc dot gnu.org @ 2022-03-05  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48

commit r12-7502-g8ea4a34bd0b0a46277b5e077c89cbd86dfb09c48
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sat Mar 5 08:50:45 2022 +0000

    PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32.

    This clean-up patch resolves PR testsuite/104732, the failure of the recent
    test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86.  Rather than just
    tweak the testcase, the proposed approach is to fix the underlying problem
    by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode
    logical operation expanders and pre-reload splitters in i386.md, which as
    I'll show generate inferior code (even a GCC 12 regression) on
    !TARGET_64BIT whenever -mno-stv (such as Solaris) or -msse (but not
    -msse2).

    First a little bit of history.  In the beginning, DImode operations on
    i386 weren't defined by the machine description, and lowered during RTL
    expansion to SI mode operations.  Then with PR 65105 in 2015, -mstv was
    added, together with a SWIM1248x mode iterator (later renamed to SWIM1248s)
    together with several *<code>di3_doubleword post-reload splitters that
    made use of register allocation to perform some double word operations
    in 64-bit XMM registers.  A short while later in 2016, PR 70322 added
    similar support for one_cmpldi2.  All of this logic was dependent upon
    "!TARGET_64BIT && TARGET_STV && TARGET_SSE2".  With the passing of time,
    these conditions became irrelevant when in 2019, it was decided to split
    these double-word patterns before reload.
    https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html
    https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html
    Hence the current situation, where on most modern CPU architectures
    (where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI
    mode operations, that are then split into two SI mode instructions
    before reload, except on Solaris and other odd cases, where the splitting
    to two SI mode instructions is done during RTL expansion.  By the
    time compilation reaches register allocation both paths in theory
    produce identical or similar code, so the vestigial legacy/logic would
    appear to be harmless.

    Unfortunately, there is one place where this arbitrary choice of how
    to lower DI mode doubleword operations is visible to the middle-end:
    it controls whether the backend appears to have a suitable optab, and
    the presence (or not) of DImode optabs can influence vectorization
    cost models and veclower decisions.

    The issue (and code quality regression) can be seen in this test case:

    typedef long long v2di __attribute__((vector_size (16)));
    v2di x;
    void foo (long long a)
    {
        v2di t = {a, a};
        x = ~t;
    }

    which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces:

    foo:    subl    $28, %esp
            movl    %ebx, 16(%esp)
            movl    32(%esp), %eax
            movl    %esi, 20(%esp)
            movl    36(%esp), %edx
            movl    %edi, 24(%esp)
            movl    %eax, %esi
            movl    %eax, %edi
            movl    %edx, %ebx
            movl    %edx, %ecx
            notl    %esi
            notl    %ebx
            movl    %esi, (%esp)
            notl    %edi
            notl    %ecx
            movl    %ebx, 4(%esp)
            movl    20(%esp), %esi
            movl    %edi, 8(%esp)
            movl    16(%esp), %ebx
            movl    %ecx, 12(%esp)
            movl    24(%esp), %edi
            movss   8(%esp), %xmm1
            movss   12(%esp), %xmm2
            movss   (%esp), %xmm0
            movss   4(%esp), %xmm3
            unpcklps        %xmm2, %xmm1
            unpcklps        %xmm3, %xmm0
            movlhps %xmm1, %xmm0
            movaps  %xmm0, x
            addl    $28, %esp
            ret

    Importantly notice the four "notl" instructions.  With this patch:

    foo:    subl    $28, %esp
            movl    32(%esp), %edx
            movl    36(%esp), %eax
            notl    %edx
            movl    %edx, (%esp)
            notl    %eax
            movl    %eax, 4(%esp)
            movl    %edx, 8(%esp)
            movl    %eax, 12(%esp)
            movaps  (%esp), %xmm1
            movaps  %xmm1, x
            addl    $28, %esp
            ret

    Notice only two "notl" instructions.  Checking with godbolt.org, GCC
    generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x,
    and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies
    this clean-up as suitable for stage 4].
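
The two instruction counts correspond to two orders of evaluation that give
the same vector.  The following C sketch is only an illustration with
hypothetical function names, assuming the little-endian lane layout of x86
(lanes 0 and 1 hold the low and high halves of the first 64-bit element).
It either broadcasts first and inverts all four 32-bit lanes, or inverts the
two halves of the scalar once and then broadcasts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four-NOT shape: build {a, a} first, then invert every 32-bit lane. */
static void splat_then_not(uint64_t a, uint32_t out[4])
{
  uint32_t lo = (uint32_t)a, hi = (uint32_t)(a >> 32);

  out[0] = ~lo;   /* NOT 1 */
  out[1] = ~hi;   /* NOT 2 */
  out[2] = ~lo;   /* NOT 3 */
  out[3] = ~hi;   /* NOT 4 */
}

/* Two-NOT shape: invert the scalar's halves once, then copy them. */
static void not_then_splat(uint64_t a, uint32_t out[4])
{
  uint32_t lo = ~(uint32_t)a;          /* NOT 1 */
  uint32_t hi = ~(uint32_t)(a >> 32);  /* NOT 2 */

  out[0] = lo;
  out[1] = hi;
  out[2] = lo;
  out[3] = hi;
}

/* Both shapes must produce identical lanes for any input. */
static int same_lanes(uint64_t a)
{
  uint32_t p[4], q[4];

  splat_then_not(a, p);
  not_then_splat(a, q);
  return memcmp(p, q, sizeof p) == 0;
}

static void check_not_shapes(void)
{
  assert(same_lanes(0));
  assert(same_lanes(0x0123456789abcdefULL));
}
```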

    Most significantly, this patch allows pr100711-1.c to pass with
    -mno-stv, allowing pandn to be used with V2DImode on Solaris/x86.
    Fingers-crossed this should reduce the number of discrepancies
    encountered supporting Solaris/x86.

    2022-03-05  Roger Sayle  <roger@nextmovesoftware.com>
                Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog
            PR testsuite/104732
            * config/i386/i386.md (SWIM1248x): Renamed from SWIM1248s.
            Include DI mode unconditionally.
            (*anddi3_doubleword): Remove && TARGET_STV && TARGET_SSE2
            condition, i.e. always split on !TARGET_64BIT.
            (*<any_or>di3_doubleword): Likewise.
            (*one_cmpldi2_doubleword): Likewise.
            (and<mode>3 expander): Update to use SWIM1248x from SWIM1248s.
            (<any_or><mode>3 expander): Likewise.
            (one_cmpl<mode>2 expander): Likewise.

    gcc/testsuite/ChangeLog
            PR testsuite/104732
            * gcc.target/i386/pr104732.c: New test case.

