public inbox for gcc-bugs@sourceware.org
* [Bug optimization/14552] New: compiled trivial vector intrinsic code contains nearly twice as many instructions as needed
From: michaelni at gmx dot at @ 2004-03-12 13:15 UTC (permalink / raw)
To: gcc-bugs
See attached source, gcc -O3 -mtune=pentium3 -march=pentium3 -S
generates:
test:
movq w, %mm1
pushl %ebp
movl %esp, %ebp
popl %ebp
psllw $1, %mm1
movq %mm1, w
movq w, %mm0
movq %mm0, dw
ret
human generates:
movq w, %mm1
paddw %mm1,%mm1
movq %mm1, w
movq %mm1,dw
ret
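The attachment itself is not reproduced in this thread. A minimal sketch of what such a testcase might look like, assuming GCC's generic vector extensions (later comments mention that <mmintrin.h> is not in use and that a "+" is being expanded). Only the symbols w and dw come from the listings above; the typedef v4hi, the function names, and the check helper are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Assumption: four 16-bit lanes packed into 8 bytes (GCC/Clang vector extension). */
typedef short v4hi __attribute__((vector_size(8)));

/* Globals named after the symbols seen in the assembly listings. */
v4hi w, dw;

/* Hypothetical body of the reported function: double every lane of w
   in place, then copy the result to dw. */
void test_globals(void)
{
    w = w + w;
    dw = w;
}

/* Self-check helper (not part of the report): verifies the lane values. */
int check_test_globals(void)
{
    short in[4] = {1, -2, 300, 0};
    short out_w[4], out_dw[4];

    memcpy(&w, in, sizeof w);
    test_globals();
    memcpy(out_w, &w, sizeof w);
    memcpy(out_dw, &dw, sizeof dw);

    for (int i = 0; i < 4; i++)
        assert(out_w[i] == 2 * in[i] && out_dw[i] == out_w[i]);
    return 1;
}
```

Compiling a file like this with the flags from the report and -S is how one would inspect the generated instructions; the exact output depends on the compiler version and is not claimed here.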
--
Summary: compiled trivial vector intrinsic code contains nearly
twice as many instructions as needed
Product: gcc
Version: 3.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: michaelni at gmx dot at
CC: gcc-bugs at gcc dot gnu dot org
GCC host triplet: pentium3-debian-linux
GCC target triplet: pentium3-debian-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552
* [Bug optimization/14552] compiled trivial vector intrinsic code contains nearly twice as many instructions as needed
From: michaelni at gmx dot at @ 2004-03-12 13:20 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From michaelni at gmx dot at 2004-03-12 13:20 -------
Created an attachment (id=5906)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=5906&action=view)
source to generate the well optimized code
--
* [Bug optimization/14552] compiled trivial vector intrinsic code contains nearly twice as many instructions as needed
From: pinskia at gcc dot gnu dot org @ 2004-03-12 15:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-03-12 15:46 -------
The problem is that you also need -fomit-frame-pointer to get the same code as the human-generated
code:
test:
movq w, %mm1
psllw $1, %mm1
movq %mm1, w
movq w, %mm0
movq %mm0, dw
ret
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |INVALID
* [Bug optimization/14552] compiled trivial vector intrinsic code contains nearly twice as many instructions as needed
From: michaelni at gmx dot at @ 2004-03-12 16:26 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From michaelni at gmx dot at 2004-03-12 16:26 -------
Sorry, no, that's not the same code: it has one more instruction, uses a shift
instead of an addition, and writes the value to memory only to read it back
immediately afterwards. Anyway, I am not surprised that the bug report got
closed immediately.
gcc:
movq w, %mm1
psllw $1, %mm1 <-------
movq %mm1, w
movq w, %mm0 <------
movq %mm0, dw
human:
movq w, %mm1
paddw %mm1,%mm1
movq %mm1, w
movq %mm1,dw
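Both sequences compute the same values: per 16-bit lane, a left shift by one (psllw $1) and a wrapping self-addition (paddw x,x) agree, so the remaining difference is purely about instruction choice and the extra reload. A minimal scalar sketch of that equivalence, with hypothetical helper names and modelling one lane at a time rather than using MMX:

```c
#include <assert.h>
#include <stdint.h>

/* Models one 16-bit lane of "paddw x, x": addition that wraps modulo 2^16. */
static uint16_t lane_add(uint16_t x)
{
    return (uint16_t)(x + x);
}

/* Models one 16-bit lane of "psllw $1, x": logical shift left by one bit. */
static uint16_t lane_shl(uint16_t x)
{
    return (uint16_t)(x << 1);
}

/* Exhaustively compares the two models over every 16-bit input.
   Returns 1 if they agree everywhere, 0 otherwise. */
int lanes_agree(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        if (lane_add((uint16_t)v) != lane_shl((uint16_t)v))
            return 0;
    }
    return 1;
}
```

Since the results are identical, a compiler is free to pick either form; which one is cheaper is a per-CPU question.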
--
* [Bug optimization/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2004-03-12 16:30 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-03-12 16:30 -------
Okay so reopening it.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Keywords| |pessimizes-code
Resolution|INVALID |
Summary|compiled trivial vector |compiled trivial vector
|intrinsic code contains |intrinsic code is
|nearly twice as many |ineffiencent
|instructions as needed |
* [Bug optimization/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2004-03-12 16:38 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-03-12 16:38 -------
Using a temporary variable I can get it down to 5 instructions (including the return):
movq w, %mm0
psllw $1, %mm0
movq %mm0, dw
movq %mm0, w
ret
The problem is that global variables produce the pessimized code, so this is a dup of bug 12395.
*** This bug has been marked as a duplicate of 12395 ***
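The temporary-variable version was not posted. A minimal sketch of what it might look like, assuming the same kind of generic vector typedef as the testcase (v4hi, test_tmp, and the check helper are hypothetical names; w and dw are the symbols from the listings):

```c
#include <assert.h>
#include <string.h>

/* Assumption: four 16-bit lanes packed into 8 bytes (GCC/Clang vector extension). */
typedef short v4hi __attribute__((vector_size(8)));

v4hi w, dw;

/* Hypothetical variant: compute the doubled value once in a local,
   then store it to both globals, so nothing has to be re-read from w. */
void test_tmp(void)
{
    v4hi t = w;
    t = t + t;
    dw = t;
    w = t;
}

/* Self-check helper (not part of the thread): verifies both stores. */
int check_test_tmp(void)
{
    short in[4] = {5, -6, 7, 1000};
    short out_w[4], out_dw[4];

    memcpy(&w, in, sizeof w);
    test_tmp();
    memcpy(out_w, &w, sizeof w);
    memcpy(out_dw, &dw, sizeof dw);

    for (int i = 0; i < 4; i++)
        assert(out_w[i] == 2 * in[i] && out_dw[i] == 2 * in[i]);
    return 1;
}
```

The local makes the data flow explicit in the source; the store order (dw first, then w) simply mirrors the listing above.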
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |DUPLICATE
* [Bug optimization/14552] compiled trivial vector intrinsic code is ineffiencent
From: michaelni at gmx dot at @ 2004-03-12 17:11 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From michaelni at gmx dot at 2004-03-12 17:11 -------
And the addition vs. shift issue? On the P3, MMX additions can be executed
on port 0 or 1, while MMX shifts can only execute on port 1.
--
* [Bug optimization/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2004-03-12 17:15 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-03-12 17:15 -------
That is a tuning issue. The problem is that CSE selects the shift.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|DUPLICATE |
* [Bug optimization/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2004-04-07 3:00 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-04-07 03:00 -------
I already confirmed this.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Last reconfirmed|0000-00-00 00:00:00 |2004-04-07 03:00:15
date| |
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2004-05-31 3:01 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Component|rtl-optimization |target
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2005-01-12 6:26 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-12 06:26 -------
I will have to file a new bug for this, as we now produce much worse code on the mainline; that is
because we expand the + into four separate scalar additions instead of using the SSE/MMX unit, which
is just plainly wrong.
--
What |Removed |Added
----------------------------------------------------------------------------
Severity|enhancement |minor
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2005-01-12 6:32 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |19391
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2005-01-12 15:31 UTC (permalink / raw)
To: gcc-bugs
--
Bug 14552 depends on bug 19391, which changed state.
Bug 19391 Summary: [4.0 Regression] missed optimization with size of 8 vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19391
What |Old Value |New Value
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |WONTFIX
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: rth at gcc dot gnu dot org @ 2005-01-18 11:34 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-18 11:34 -------
No, Andrew, mainline is not plainly wrong. We are correctly not using the
MMX unit when <mmintrin.h> is not in use. The instruction selection thing
can still be seen with the SSE unit though, if you widen the vectors to 16
bytes.
The problem is that ix86_rtx_costs has no idea about the cost of vector
operations. For what little it's worth, K8 thinks paddw and psllw are
equivalent -- both can be issued to fadd or fmul pipelines.
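A minimal sketch of the widened variant mentioned above, assuming the same generic vector extension with eight 16-bit lanes in 16 bytes so that the SSE unit can be used. The names v8hi, w16, dw16, test16, and the check helper are hypothetical; whether a given compiler then emits psllw or paddw depends on its version and tuning and is not claimed here:

```c
#include <assert.h>
#include <string.h>

/* Assumption: eight 16-bit lanes packed into 16 bytes (GCC/Clang vector extension). */
typedef short v8hi __attribute__((vector_size(16)));

v8hi w16, dw16;

/* Same shape as the 8-byte testcase, only with wider vectors. */
void test16(void)
{
    w16 = w16 + w16;
    dw16 = w16;
}

/* Self-check helper (not part of the thread): verifies all eight lanes. */
int check_test16(void)
{
    short in[8] = {1, -2, 300, 0, 7, -8, 1000, -1000};
    short out_w[8], out_dw[8];

    memcpy(&w16, in, sizeof w16);
    test16();
    memcpy(out_w, &w16, sizeof w16);
    memcpy(out_dw, &dw16, sizeof dw16);

    for (int i = 0; i < 8; i++)
        assert(out_w[i] == 2 * in[i] && out_dw[i] == out_w[i]);
    return 1;
}
```

Building this with SSE2 enabled and inspecting the -S output is one way to observe which instruction the cost model ends up preferring.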
--
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.1.0
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2005-04-05 1:52 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.1.0 |---
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: uros at kss-loka dot si @ 2005-06-22 10:14 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-06-22 10:14 -------
Just for fun, I have compiled the testcase with the MMX/x87 mode switching patch
included, to check MMX vector extensions. This little patch is needed to enable
MMX vector extensions (only the MMX vector add expander is shown):
diff -upr /export/home/uros/gcc-back/gcc/config/i386/i386.h i386/i386.h
--- /export/home/uros/gcc-back/gcc/config/i386/i386.h 2005-06-08 07:05:22.000000000 +0200
+++ i386/i386.h 2005-06-22 10:41:31.000000000 +0200
@@ -843,7 +845,8 @@ do { \
/* ??? No autovectorization into MMX or 3DNOW until we can reliably
place emms and femms instructions. */
-#define UNITS_PER_SIMD_WORD (TARGET_SSE ? 16 : UNITS_PER_WORD)
+#define UNITS_PER_SIMD_WORD \
+ (TARGET_SSE ? 16 : TARGET_MMX ? 8 : UNITS_PER_WORD)
#define VALID_FP_MODE_P(MODE) \
((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode \
diff -upr /export/home/uros/gcc-back/gcc/config/i386/mmx.md i386/mmx.md
--- /export/home/uros/gcc-back/gcc/config/i386/mmx.md 2005-04-20 21:56:15.000000000 +0200
+++ i386/mmx.md 2005-06-22 11:00:35.000000000 +0200
@@ -553,6 +553,13 @@
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(define_expand "add<mode>3"
+ [(set (match_operand:MMXMODEI 0 "register_operand" "")
+ (plus:MMXMODEI (match_operand:MMXMODEI 1 "nonimmediate_operand" "")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "")))]
+ "TARGET_MMX"
+ "ix86_fixup_binary_operands_no_copy (PLUS, <MODE>mode, operands);")
+
(define_insn "mmx_add<mode>3"
[(set (match_operand:MMXMODEI 0 "register_operand" "=y")
(plus:MMXMODEI
After that, the testcase from the description is compiled to (with -fomit-frame-pointer):
test:
movq w, %mm0
paddw %mm0, %mm0
movq %mm0, w
movq %mm0, dw
emms
ret
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |uros at kss-loka dot si
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: uros at kss-loka dot si @ 2005-07-21 8:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-07-21 08:42 -------
You can patch the mainline 4.1 compiler with the patch at
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html. The patch (which is
currently awaiting review) makes gcc produce optimal code:
'gcc -O2 -mmmx -fomit-frame-pointer'
test:
movq w, %mm0
paddw %mm0, %mm0
movq %mm0, w
movq %mm0, dw
emms
ret
--
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |uros at kss-loka dot si
|dot org |
URL| |http://gcc.gnu.org/ml/gcc-
| |patches/2005-
| |07/msg01128.html
Status|NEW |ASSIGNED
Keywords| |patch
Last reconfirmed|2004-11-22 03:37:09 |2005-07-21 08:42:15
date| |
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: fjahanian at apple dot com @ 2005-09-13 21:09 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From fjahanian at apple dot com 2005-09-13 21:09 -------
Hello,
What is the status of Uros's patches in:
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html
Looks like they did not make it to FSF mainline? Are there remaining issues with them?
--
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pinskia at gcc dot gnu dot org @ 2005-09-13 21:13 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-13 21:13 -------
(In reply to comment #13)
> Are there remaining issues with them?
Yes, it does not work when configuring gcc with --with-cpu=pentium4; see PR 19161.
--
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |19161
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: uros at kss-loka dot si @ 2005-09-15 11:39 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-09-15 11:39 -------
(In reply to comment #14)
> Yes, it does not work when configuring gcc with --with-cpu=pentium4 see PR 19161.
No, the patch works OK for pentium4. The remaining problem is in the
optimize_mode_switching() function. For a certain loop layout, o_m_s could
insert emms and efpu insns in such a way that both register sets are blocked.
Because emms/efpu insertion depends heavily on o_m_s functionality, this
infrastructure should be upgraded as explained in PR 19161.
(BTW: One of the design goals was to ICE instead of generating wrong code. It
looks like this goal was achieved :)
--
* [Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
From: pluto at agmk dot net @ 2005-11-21 11:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from pluto at agmk dot net 2005-11-21 11:29 -------
Without Uros' MMX patch, gcc-4.1.0-20051113 generates amazing code:
(gcc -O3 -march=pentium3 -S -fomit-frame-pointer pr14552.c)
test: subl $20, %esp
movl w, %eax
movl w+4, %edx
movl %ebx, 8(%esp)
movl %esi, 12(%esp)
movl %eax, (%esp)
movl %edx, 4(%esp)
movswl (%esp),%esi
movl %edi, 16(%esp)
movswl 4(%esp),%ecx
movswl 2(%esp),%edi
movswl 6(%esp),%ebx
addl %esi, %esi
addl %ecx, %ecx
movzwl %si, %esi
sall $17, %edi
movzwl %cx, %ecx
sall $17, %ebx
movl %edi, %eax
movl 16(%esp), %edi
movl %ebx, %edx
orl %esi, %eax
movl 8(%esp), %ebx
orl %ecx, %edx
movl 12(%esp), %esi
movl %eax, w
movl %edx, w+4
movl w, %eax
movl w+4, %edx
movl %eax, dw
movl %edx, dw+4
addl $20, %esp
ret
.size test, .-test
.comm dw,8,8
.comm w,8,8
.ident "GCC: (GNU) 4.1.0 20051113 (experimental)"
.section .note.GNU-stack,"",@progbits
--