public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack
@ 2012-08-22 14:30 neleai at seznam dot cz
  2012-08-22 14:58 ` [Bug target/54349] " jakub at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: neleai at seznam dot cz @ 2012-08-22 14:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

             Bug #: 54349
           Summary: _mm_cvtsi128_si64 unnecessary stores value at stack
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: neleai@seznam.cz


Consider following program:

#include <emmintrin.h>
#include <stdint.h>
int64_t foo(__m128i a){return  _mm_cvtsi128_si64(a);}
int64_t bar(double b){ return *((int64_t*)&b);}

A _mm_cvtsi128_si64 does unnecesary store into stack instead moving to register
directly. Same problem is present when interpreting double as integer.

  .file "e.c"
  .text
  .p2align 4,,15
  .globl  foo
  .type foo, @function
foo:
.LFB517:
  .cfi_startproc
  movq  %xmm0, -16(%rsp)
  movq  -16(%rsp), %rax
  ret
  .cfi_endproc
.LFE517:
  .size foo, .-foo
  .p2align 4,,15
  .globl  bar
  .type bar, @function
bar:
.LFB518:
  .cfi_startproc
  movsd %xmm0, -8(%rsp)
  movq  -8(%rsp), %rax
  ret
  .cfi_endproc
.LFE518:
  .size bar, .-bar
  .ident  "GCC: (Debian 20120820-1) 4.8.0 20120820 (experimental) [trunk
revision 190537]"
  .section  .note.GNU-stack,"",@progbits


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
@ 2012-08-22 14:58 ` jakub at gcc dot gnu.org
  2012-08-23 15:46 ` neleai at seznam dot cz
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-08-22 14:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|                            |INVALID

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-08-22 14:57:54 UTC ---
Not a bug.  You need to tune for a CPU where inter-unit moves are desirable. 
The default is generic tuning, which is a compromise between Intel CPUs (where
they are desirable) and AMD CPUs (where they are undesirable).  In this
particular case the generic tuning doesn't do inter-unit moves as part of the
compromise.  If you -mtune=corei7 or similar, you'll get an inter-unit move in
both cases.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
  2012-08-22 14:58 ` [Bug target/54349] " jakub at gcc dot gnu.org
@ 2012-08-23 15:46 ` neleai at seznam dot cz
  2013-04-03 14:05 ` hubicka at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: neleai at seznam dot cz @ 2012-08-23 15:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-08-23 15:45:54 UTC ---
(In reply to comment #1)
> Not a bug.  You need to tune for a CPU where inter-unit moves are desirable. 
> The default is generic tuning, which is a compromise between Intel CPUs (where
> they are desirable) and AMD CPUs (where they are undesirable).  In this
> particular case the generic tuning doesn't do inter-unit moves as part of the
> compromise.  If you -mtune=corei7 or similar, you'll get an inter-unit move in
> both cases.

What amd procesors?

Compile following two files with march=core2 and march=amdfam10. Amd version
was always at least 5% slower.
Tested on AMD Athlon(tm) 64 Processor 3200+,AMD Opteron(tm) Processor 6134
AMD FX(tm)-8150 Eight-Core Processor, AMD Phenom(tm) II X6 1090T Processor


#include <emmintrin.h>
#include <stdint.h>

int64_t foo(int64_t a,int64_t c){__m128i b= 
_mm_cvtsi64_si128(a),d=_mm_cvtsi64_si128(c);
  return _mm_cvtsi128_si64(_mm_add_epi8(b,d));
}
/*need split otherwise simplified to identical code*/
#include <emmintrin.h>
#include <stdint.h>

int main(){
  int i;
  int64_t x=0;
  for (i=0;i<100000000;i++) x=foo(x,1);
  return x;
}


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
  2012-08-22 14:58 ` [Bug target/54349] " jakub at gcc dot gnu.org
  2012-08-23 15:46 ` neleai at seznam dot cz
@ 2013-04-03 14:05 ` hubicka at gcc dot gnu.org
  2013-04-27  1:06 ` neleai at seznam dot cz
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-04-03 14:05 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-04-03
                 CC|                            |hubicka at gcc dot gnu.org
         AssignedTo|unassigned at gcc dot       |hubicka at gcc dot gnu.org
                   |gnu.org                     |
     Ever Confirmed|0                           |1

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> 2013-04-03 14:05:04 UTC ---
I suppose we can re-benchmark this (last time this flag was tested was at
original K8)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
                   ` (2 preceding siblings ...)
  2013-04-03 14:05 ` hubicka at gcc dot gnu.org
@ 2013-04-27  1:06 ` neleai at seznam dot cz
  2013-04-29  6:36 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: neleai at seznam dot cz @ 2013-04-27  1:06 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> 2013-04-27 01:06:45 UTC ---
I found that  AMD Bulldozer optimization guide states that moves from xmm to
GPR register should be done directly:"

10.4 Moving Data Between General-Purpose and XMM/YMM Registers


When moving data from a GPR to an XMM register, use separate store and load
instructions to move
the data first from the source register to a temporary location in memory and
then from memory into
the destination register, taking the memory latency into account when
scheduling both stages of the
load-store sequence.
When moving data from an XMM register to a general-purpose register, use the
VMOVD instruction.
Whenever possible, use loads and stores of the same data length. (See 6.3,
`Store-to-Load Forwarding
Restrictions" on page 98 for more information.)
"


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
                   ` (3 preceding siblings ...)
  2013-04-27  1:06 ` neleai at seznam dot cz
@ 2013-04-29  6:36 ` ubizjak at gmail dot com
  2013-04-29  8:47 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29  6:36 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Ganesh.Gopalasubramanian at
                   |                            |amd dot com,
                   |                            |venkataramanan.kumar at amd
                   |                            |dot com

--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 06:36:09 UTC ---
Adding CCs.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
                   ` (4 preceding siblings ...)
  2013-04-29  6:36 ` ubizjak at gmail dot com
@ 2013-04-29  8:47 ` ubizjak at gmail dot com
  2013-04-29 11:10 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29  8:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|hubicka at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #6 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 08:47:20 UTC ---
I have a patch in testing that splits TARGET_INTER_UNIT_MOVES to moves to and
from vector registers.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
                   ` (5 preceding siblings ...)
  2013-04-29  8:47 ` ubizjak at gmail dot com
@ 2013-04-29 11:10 ` ubizjak at gmail dot com
  2013-04-29 11:11 ` ubizjak at gmail dot com
  2013-04-29 11:12 ` ubizjak at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 11:10 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86
                URL|                            |http://gcc.gnu.org/ml/gcc-p
                   |                            |atches/2013-04/msg01723.htm
                   |                            |l

--- Comment #7 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 11:10:17 UTC ---
Patch at [1].

[1] http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01723.html


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
                   ` (6 preceding siblings ...)
  2013-04-29 11:10 ` ubizjak at gmail dot com
@ 2013-04-29 11:11 ` ubizjak at gmail dot com
  2013-04-29 11:12 ` ubizjak at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 11:11 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0

--- Comment #8 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 11:11:28 UTC ---
Author: uros
Date: Mon Apr 29 11:00:10 2013
New Revision: 198401

URL: http://gcc.gnu.org/viewcvs?rev=198401&root=gcc&view=rev
Log:
    PR target/54349
    * config/i386/i386.h (enum ix86_tune_indices)
    <X86_TUNE_INTER_UNIT_MOVES_TO_VEC, X86_TUNE_INTER_UNIT_MOVES_FROM_VEC>:
    New, split from X86_TUNE_INTER_UNIT_MOVES.
    <X86_TUNE_INTER_UNIT_MOVES>: Remove.
    (TARGET_INTER_UNIT_MOVES_TO_VEC): New define.
    (TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto.
    (TARGET_INTER_UNIT_MOVES): Remove.
    * config/i386/i386.c (initial_ix86_tune_features): Update.
    Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only.
    (ix86_expand_convert_uns_didf_sse): Use
    TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
    (ix86_expand_vector_init_one_nonzero): Ditto.
    (ix86_expand_vector_init_interleave): Ditto.
    (inline_secondary_memory_needed): Return true for moves from SSE class
    registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves
    to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets.
    * config/i386/constraints.md (Yi, Ym): Depend on
    TARGET_INTER_UNIT_MOVES_TO_VEC.
    (Yj, Yn): New constraints.
    * config/i386/i386.md (*movdi_internal): Change constraints of
    operand 1 from Yi to Yj and from Ym to Yn.
    (*movsi_internal): Ditto.
    (*movdf_internal): Ditto.
    (*movsf_internal): Ditto.
    (*float<SWI48x:mode><X87MODEF:mode>2_1): Use
    TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
    (*float<SWI48x:mode><X87MODEF:mode>2_1 splitters): Ditto.
    (floatdi<X87MODEF:mode>2_i387_with_xmm): Ditto.
    (floatdi<X87MODEF:mode>2_i387_with_xmm splitters): Ditto.
    * config/i386/sse.md (movdi_to_sse): Ditto.
    (sse2_stored): Change constraint of operand 1 from Yi to Yj.
    Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of
    TARGET_INTER_UNIT_MOVES.
    (sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj.
    (sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC
    instead of TARGET_INTER_UNIT_MOVES.
    * config/i386/mmx.md (*mov<mode>_internal): Change constraint of
    operand 1 from Yi to Yj and from Ym to Yn.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/constraints.md
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/mmx.md
    trunk/gcc/config/i386/sse.md


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
  2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
                   ` (7 preceding siblings ...)
  2013-04-29 11:11 ` ubizjak at gmail dot com
@ 2013-04-29 11:12 ` ubizjak at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 11:12 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #9 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 11:12:31 UTC ---
Fixed for 4.9.0.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-04-29 11:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
2012-08-22 14:58 ` [Bug target/54349] " jakub at gcc dot gnu.org
2012-08-23 15:46 ` neleai at seznam dot cz
2013-04-03 14:05 ` hubicka at gcc dot gnu.org
2013-04-27  1:06 ` neleai at seznam dot cz
2013-04-29  6:36 ` ubizjak at gmail dot com
2013-04-29  8:47 ` ubizjak at gmail dot com
2013-04-29 11:10 ` ubizjak at gmail dot com
2013-04-29 11:11 ` ubizjak at gmail dot com
2013-04-29 11:12 ` ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).