* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
@ 2012-08-22 14:58 ` jakub at gcc dot gnu.org
2012-08-23 15:46 ` neleai at seznam dot cz
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-08-22 14:58 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
CC| |jakub at gcc dot gnu.org
Resolution| |INVALID
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-08-22 14:57:54 UTC ---
Not a bug. You need to tune for a CPU where inter-unit moves are desirable.
The default is generic tuning, which is a compromise between Intel CPUs (where
they are desirable) and AMD CPUs (where they are undesirable). In this
particular case the generic tuning doesn't do inter-unit moves as part of the
compromise. If you -mtune=corei7 or similar, you'll get an inter-unit move in
both cases.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
2012-08-22 14:58 ` [Bug target/54349] " jakub at gcc dot gnu.org
@ 2012-08-23 15:46 ` neleai at seznam dot cz
2013-04-03 14:05 ` hubicka at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: neleai at seznam dot cz @ 2012-08-23 15:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Ondrej Bilka <neleai at seznam dot cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|INVALID |
--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-08-23 15:45:54 UTC ---
(In reply to comment #1)
> Not a bug. You need to tune for a CPU where inter-unit moves are desirable.
> The default is generic tuning, which is a compromise between Intel CPUs (where
> they are desirable) and AMD CPUs (where they are undesirable). In this
> particular case the generic tuning doesn't do inter-unit moves as part of the
> compromise. If you -mtune=corei7 or similar, you'll get an inter-unit move in
> both cases.
What amd procesors?
Compile following two files with march=core2 and march=amdfam10. Amd version
was always at least 5% slower.
Tested on AMD Athlon(tm) 64 Processor 3200+,AMD Opteron(tm) Processor 6134
AMD FX(tm)-8150 Eight-Core Processor, AMD Phenom(tm) II X6 1090T Processor
#include <emmintrin.h>
#include <stdint.h>
int64_t foo(int64_t a,int64_t c){__m128i b=
_mm_cvtsi64_si128(a),d=_mm_cvtsi64_si128(c);
return _mm_cvtsi128_si64(_mm_add_epi8(b,d));
}
/*need split otherwise simplified to identical code*/
#include <emmintrin.h>
#include <stdint.h>
int main(){
int i;
int64_t x=0;
for (i=0;i<100000000;i++) x=foo(x,1);
return x;
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
2012-08-22 14:58 ` [Bug target/54349] " jakub at gcc dot gnu.org
2012-08-23 15:46 ` neleai at seznam dot cz
@ 2013-04-03 14:05 ` hubicka at gcc dot gnu.org
2013-04-27 1:06 ` neleai at seznam dot cz
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: hubicka at gcc dot gnu.org @ 2013-04-03 14:05 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed| |2013-04-03
CC| |hubicka at gcc dot gnu.org
AssignedTo|unassigned at gcc dot |hubicka at gcc dot gnu.org
|gnu.org |
Ever Confirmed|0 |1
--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> 2013-04-03 14:05:04 UTC ---
I suppose we can re-benchmark this (last time this flag was tested was at
original K8)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
` (2 preceding siblings ...)
2013-04-03 14:05 ` hubicka at gcc dot gnu.org
@ 2013-04-27 1:06 ` neleai at seznam dot cz
2013-04-29 6:36 ` ubizjak at gmail dot com
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: neleai at seznam dot cz @ 2013-04-27 1:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
--- Comment #4 from Ondrej Bilka <neleai at seznam dot cz> 2013-04-27 01:06:45 UTC ---
I found that AMD Bulldozer optimization guide states that moves from xmm to
GPR register should be done directly:"
10.4 Moving Data Between General-Purpose and XMM/YMM Registers
When moving data from a GPR to an XMM register, use separate store and load
instructions to move
the data first from the source register to a temporary location in memory and
then from memory into
the destination register, taking the memory latency into account when
scheduling both stages of the
load-store sequence.
When moving data from an XMM register to a general-purpose register, use the
VMOVD instruction.
Whenever possible, use loads and stores of the same data length. (See 6.3,
`Store-to-Load Forwarding
Restrictions" on page 98 for more information.)
"
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
` (3 preceding siblings ...)
2013-04-27 1:06 ` neleai at seznam dot cz
@ 2013-04-29 6:36 ` ubizjak at gmail dot com
2013-04-29 8:47 ` ubizjak at gmail dot com
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 6:36 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |Ganesh.Gopalasubramanian at
| |amd dot com,
| |venkataramanan.kumar at amd
| |dot com
--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 06:36:09 UTC ---
Adding CCs.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
` (4 preceding siblings ...)
2013-04-29 6:36 ` ubizjak at gmail dot com
@ 2013-04-29 8:47 ` ubizjak at gmail dot com
2013-04-29 11:10 ` ubizjak at gmail dot com
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 8:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|hubicka at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #6 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 08:47:20 UTC ---
I have a patch in testing that splits TARGET_INTER_UNIT_MOVES to moves to and
from vector registers.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
` (5 preceding siblings ...)
2013-04-29 8:47 ` ubizjak at gmail dot com
@ 2013-04-29 11:10 ` ubizjak at gmail dot com
2013-04-29 11:11 ` ubizjak at gmail dot com
2013-04-29 11:12 ` ubizjak at gmail dot com
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 11:10 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86
URL| |http://gcc.gnu.org/ml/gcc-p
| |atches/2013-04/msg01723.htm
| |l
--- Comment #7 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 11:10:17 UTC ---
Patch at [1].
[1] http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01723.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
` (6 preceding siblings ...)
2013-04-29 11:10 ` ubizjak at gmail dot com
@ 2013-04-29 11:11 ` ubizjak at gmail dot com
2013-04-29 11:12 ` ubizjak at gmail dot com
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 11:11 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.9.0
--- Comment #8 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 11:11:28 UTC ---
Author: uros
Date: Mon Apr 29 11:00:10 2013
New Revision: 198401
URL: http://gcc.gnu.org/viewcvs?rev=198401&root=gcc&view=rev
Log:
PR target/54349
* config/i386/i386.h (enum ix86_tune_indices)
<X86_TUNE_INTER_UNIT_MOVES_TO_VEC, X86_TUNE_INTER_UNIT_MOVES_FROM_VEC>:
New, split from X86_TUNE_INTER_UNIT_MOVES.
<X86_TUNE_INTER_UNIT_MOVES>: Remove.
(TARGET_INTER_UNIT_MOVES_TO_VEC): New define.
(TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto.
(TARGET_INTER_UNIT_MOVES): Remove.
* config/i386/i386.c (initial_ix86_tune_features): Update.
Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only.
(ix86_expand_convert_uns_didf_sse): Use
TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_interleave): Ditto.
(inline_secondary_memory_needed): Return true for moves from SSE class
registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves
to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets.
* config/i386/constraints.md (Yi, Ym): Depend on
TARGET_INTER_UNIT_MOVES_TO_VEC.
(Yj, Yn): New constraints.
* config/i386/i386.md (*movdi_internal): Change constraints of
operand 1 from Yi to Yj and from Ym to Yn.
(*movsi_internal): Ditto.
(*movdf_internal): Ditto.
(*movsf_internal): Ditto.
(*float<SWI48x:mode><X87MODEF:mode>2_1): Use
TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
(*float<SWI48x:mode><X87MODEF:mode>2_1 splitters): Ditto.
(floatdi<X87MODEF:mode>2_i387_with_xmm): Ditto.
(floatdi<X87MODEF:mode>2_i387_with_xmm splitters): Ditto.
* config/i386/sse.md (movdi_to_sse): Ditto.
(sse2_stored): Change constraint of operand 1 from Yi to Yj.
Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of
TARGET_INTER_UNIT_MOVES.
(sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj.
(sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC
instead of TARGET_INTER_UNIT_MOVES.
* config/i386/mmx.md (*mov<mode>_internal): Change constraint of
operand 1 from Yi to Yj and from Ym to Yn.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/constraints.md
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.h
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/mmx.md
trunk/gcc/config/i386/sse.md
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack
2012-08-22 14:30 [Bug target/54349] New: _mm_cvtsi128_si64 unnecessary stores value at stack neleai at seznam dot cz
` (7 preceding siblings ...)
2013-04-29 11:11 ` ubizjak at gmail dot com
@ 2013-04-29 11:12 ` ubizjak at gmail dot com
8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2013-04-29 11:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #9 from Uros Bizjak <ubizjak at gmail dot com> 2013-04-29 11:12:31 UTC ---
Fixed for 4.9.0.
^ permalink raw reply [flat|nested] 10+ messages in thread