public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/19240] New: runtime performance regression in floating point heavy code, x86/SSE
@ 2005-01-03 13:13 tbptbp at gmail dot com
2005-01-03 13:14 ` [Bug rtl-optimization/19240] " tbptbp at gmail dot com
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: tbptbp at gmail dot com @ 2005-01-03 13:13 UTC (permalink / raw)
To: gcc-bugs
I'm seeing a significant runtime performance regression (>15%) with snapshots
following gcc-4.0-20041205. As far as I can see, there are issues when
register pressure builds up: later versions involve the x87 FPU where the former
version didn't.
The >15% figure comes from a larger application (a raytracer). Branch prediction
also changed (but I've fixed that), so I'm reasonably sure the problem is what's
demonstrated in the attached testcase.
Switches: -march=k8 -mfpmath=sse -O3 -ffast-math -fomit-frame-pointer
with gcc-4.0-20041205:
[snip]
4010f4: movss (%ecx,%esi,4),%xmm0
4010f9: movss (%eax,%ebx,4),%xmm5
4010fe: movss (%eax,%esi,4),%xmm7
401103: mulss %xmm5,%xmm1
401107: movss (%ecx,%ebx,4),%xmm4
40110c: movss %xmm0,(%esp)
401111: mulss %xmm4,%xmm2
401115: movaps %xmm3,%xmm0
401118: subss (%ecx,%edx,4),%xmm6
40111d: addss (%eax,%edx,4),%xmm1
401122: mulss (%esp),%xmm3
401127: mulss %xmm7,%xmm0
40112b: subss %xmm2,%xmm6
40112f: xorps %xmm2,%xmm2
401132: addss %xmm0,%xmm1
401136: subss %xmm3,%xmm6
40113a: divss %xmm1,%xmm6
40113e: mulss %xmm6,%xmm7
401142: comiss 0x0(%ebp),%xmm6
401146: mulss %xmm6,%xmm5
40114a: addss (%esp),%xmm7
with gcc-4.0-20050102:
[snip]
4010ff: movss (%ecx,%esi,4),%xmm0
401104: movss (%eax,%ebx,4),%xmm5
401109: movss (%eax,%esi,4),%xmm7
40110e: mulss %xmm5,%xmm1
401112: movss (%ecx,%ebx,4),%xmm4
401117: movss %xmm0,0x4(%esp)
40111d: mulss %xmm4,%xmm2
401121: movaps %xmm3,%xmm0
401124: flds (%ecx,%edx,4)
401127: addss (%eax,%edx,4),%xmm1
40112c: mulss 0x4(%esp),%xmm3
401132: fsubrs 0xc(%edi)
401135: mulss %xmm7,%xmm0
401139: addss %xmm0,%xmm1
40113d: fstps (%esp)
401140: movss (%esp),%xmm6
401145: subss %xmm2,%xmm6
401149: xorps %xmm2,%xmm2
40114c: subss %xmm3,%xmm6
401150: divss %xmm1,%xmm6
401154: mulss %xmm6,%xmm7
401158: comiss 0x0(%ebp),%xmm6
40115c: mulss %xmm6,%xmm5
401160: addss 0x4(%esp),%xmm7
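For illustration only: the attached testcase is not reproduced in this message, so the following is a minimal C sketch of the kind of expression the two listings suggest, not the reporter's code. In the newer listing, `flds (%ecx,%edx,4)` followed by `fsubrs 0xc(%edi)` computes "memory at offset 12 minus an indexed array element" on the x87 stack, where the older listing used a single `subss`. The struct layout and the names `plane`, `d`, `origin` and `plane_offset` are assumptions made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical layout: 'd' sits at byte offset 12, like the 0xc(%edi) operand. */
struct plane {
    float nx, ny, nz, d;
};

/* Both operands of the subtraction start out in memory. Under register
 * pressure the compiler may combine the load of p->d into the subtraction,
 * giving a "memory minus register" form, which is the shape discussed in
 * this report. Whether a given snapshot does so is not guaranteed. */
float plane_offset(const struct plane *p, const float *origin, size_t axis)
{
    return p->d - origin[axis];
}
```

Compiling a function like this with the switches listed above and inspecting the output of `-S` is one way to check whether `flds`/`fsubrs`/`fstps` appear next to the SSE instructions; a function this small may well not reproduce the problem without the surrounding register pressure.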
--
Summary: runtime performance regression in floating point heavy
code, x86/SSE
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tbptbp at gmail dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC host triplet: cygwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
* [Bug rtl-optimization/19240] runtime performance regression in floating point heavy code, x86/SSE
From: tbptbp at gmail dot com @ 2005-01-03 13:14 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From tbptbp at gmail dot com 2005-01-03 13:14 -------
Created an attachment (id=7863)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7863&action=view)
One place with described symptoms
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: pinskia at gcc dot gnu dot org @ 2005-01-03 15:06 UTC (permalink / raw)
To: gcc-bugs
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target
           Keywords|                            |missed-optimization
            Summary|runtime performance         |[4.0 Regression] runtime
                   |regression in floating point|performance regression in
                   |heavy code, x86/SSE         |floating point heavy code,
                   |                            |x86/SSE
   Target Milestone|---                         |4.0.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: uros at kss-loka dot si @ 2005-01-03 16:27 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-03 16:27 -------
Ah, I see the problem. The combine pass is producing reversed div/sub patterns,
where the first operand is a memory_operand and the second is a register.
Unfortunately, the SSE patterns don't provide reversed forms:
(define_insn "*fop_sf_1_sse"
  [(set (match_operand:SF 0 "register_operand" "=x")
        (match_operator:SF 3 "binary_fp_operator"
          [(match_operand:SF 1 "register_operand" "0")
           (match_operand:SF 2 "nonimmediate_operand" "xm")]))]
...
(define_insn "*fop_sf_1_i387"
  [(set (match_operand:SF 0 "register_operand" "=f,f")
        (match_operator:SF 3 "binary_fp_operator"
          [(match_operand:SF 1 "nonimmediate_operand" "0,fm")
           (match_operand:SF 2 "nonimmediate_operand" "fm,0")]))]
The SSE pattern isn't matched when the first operand is a memory operand, so it
doesn't shadow the 387 pattern. I think that fop_{s,d}f_1_i387 need an additional
condition to hide them for TARGET_SSE. Perhaps:
  && !(TARGET_SSE && GET_CODE (operands[1]) == MEM)
and similar for DFmode.
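The shadowing argument above can be sketched as a toy model in C. This is not GCC code: `operand_kind`, `insn_choice` and `pick_pattern` are hypothetical names, and the model only looks at the two operand predicates quoted above plus the proposed extra condition. The real insn conditions (elided as `...` in the quote) and the register constraints are deliberately ignored.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kind of RTL operand being matched. */
enum operand_kind { OP_REG, OP_MEM };
enum insn_choice { MATCH_NONE, MATCH_SSE, MATCH_I387 };

/* register_operand accepts registers only. */
static bool register_operand_p(enum operand_kind k)
{
    return k == OP_REG;
}

/* nonimmediate_operand accepts registers and memory. */
static bool nonimmediate_operand_p(enum operand_kind k)
{
    return k == OP_REG || k == OP_MEM;
}

/* Try the SSE pattern first, then the 387 pattern, as in the quoted order.
 * with_fix models the proposed extra condition on the 387 pattern. */
enum insn_choice pick_pattern(enum operand_kind op1, enum operand_kind op2,
                              bool target_sse, bool with_fix)
{
    /* *fop_sf_1_sse: operand 1 must be a register. */
    if (target_sse && register_operand_p(op1) && nonimmediate_operand_p(op2))
        return MATCH_SSE;

    /* *fop_sf_1_i387: operand 1 may be memory. */
    bool i387_ok = nonimmediate_operand_p(op1) && nonimmediate_operand_p(op2);

    /* Proposed: && !(TARGET_SSE && GET_CODE (operands[1]) == MEM) */
    if (with_fix && target_sse && op1 == OP_MEM)
        i387_ok = false;

    return i387_ok ? MATCH_I387 : MATCH_NONE;
}
```

In this model, `pick_pattern(OP_MEM, OP_REG, true, false)` falls through to the 387 pattern, which corresponds to the `flds`/`fsubrs` sequence in the report. With `with_fix` set, the same operands match nothing, so the combination would be rejected and the load would stay a separate instruction.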
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-01-03 16:27:34
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: rth at gcc dot gnu dot org @ 2005-01-03 20:39 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-03 20:39 -------
Yep. When we do this kind of stepwise filtering of patterns, all of them have to
have the same predicates, even if the constraints are stricter.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: cvs-commit at gcc dot gnu dot org @ 2005-01-04 10:41 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From cvs-commit at gcc dot gnu dot org 2005-01-04 10:41 -------
Subject: Bug 19240
CVSROOT: /cvs/gcc
Module name: gcc
Changes by: uros@gcc.gnu.org 2005-01-04 10:40:58
Modified files:
gcc : ChangeLog
gcc/config/i386: i386.md
Log message:
PR target/19240
* config/i386/i386.md (*fop_sf_1_i387): Disable for TARGET_SSE_MATH.
(*fop_df_1_i387): Disable for (TARGET_SSE2 && TARGET_SSE_MATH).
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7016&r2=2.7017
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.598&r2=1.599
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: pinskia at gcc dot gnu dot org @ 2005-01-04 15:43 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-04 15:43 -------
Fixed.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240