public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/19240] New: runtime performance regression in floating point heavy code, x86/SSE
From: tbptbp at gmail dot com @ 2005-01-03 13:13 UTC
  To: gcc-bugs

I'm seeing a significant runtime performance regression (>15%) with snapshots
following gcc-4.0-20041205. As far as I can see, there are some issues when
register pressure builds up: later versions involve the x87 FPU where the
earlier version didn't.

The >15% figure comes from a larger application (a raytracer). Branch
prediction also changed (but I've fixed that), so I'm reasonably sure the
problem is what's demonstrated in the attached testcase.

Switches: -march=k8 -mfpmath=sse -O3 -ffast-math -fomit-frame-pointer

with gcc-4.0-20041205:
[snip]
  4010f4:       movss  (%ecx,%esi,4),%xmm0
  4010f9:       movss  (%eax,%ebx,4),%xmm5
  4010fe:       movss  (%eax,%esi,4),%xmm7
  401103:       mulss  %xmm5,%xmm1
  401107:       movss  (%ecx,%ebx,4),%xmm4
  40110c:       movss  %xmm0,(%esp)
  401111:       mulss  %xmm4,%xmm2
  401115:       movaps %xmm3,%xmm0
  401118:       subss  (%ecx,%edx,4),%xmm6
  40111d:       addss  (%eax,%edx,4),%xmm1
  401122:       mulss  (%esp),%xmm3
  401127:       mulss  %xmm7,%xmm0
  40112b:       subss  %xmm2,%xmm6
  40112f:       xorps  %xmm2,%xmm2
  401132:       addss  %xmm0,%xmm1
  401136:       subss  %xmm3,%xmm6
  40113a:       divss  %xmm1,%xmm6
  40113e:       mulss  %xmm6,%xmm7
  401142:       comiss 0x0(%ebp),%xmm6
  401146:       mulss  %xmm6,%xmm5
  40114a:       addss  (%esp),%xmm7

with gcc-4.0-20050102:
[snip]
  4010ff:       movss  (%ecx,%esi,4),%xmm0
  401104:       movss  (%eax,%ebx,4),%xmm5
  401109:       movss  (%eax,%esi,4),%xmm7
  40110e:       mulss  %xmm5,%xmm1
  401112:       movss  (%ecx,%ebx,4),%xmm4
  401117:       movss  %xmm0,0x4(%esp)
  40111d:       mulss  %xmm4,%xmm2
  401121:       movaps %xmm3,%xmm0
  401124:       flds   (%ecx,%edx,4)
  401127:       addss  (%eax,%edx,4),%xmm1
  40112c:       mulss  0x4(%esp),%xmm3
  401132:       fsubrs 0xc(%edi)
  401135:       mulss  %xmm7,%xmm0
  401139:       addss  %xmm0,%xmm1
  40113d:       fstps  (%esp)
  401140:       movss  (%esp),%xmm6
  401145:       subss  %xmm2,%xmm6
  401149:       xorps  %xmm2,%xmm2
  40114c:       subss  %xmm3,%xmm6
  401150:       divss  %xmm1,%xmm6
  401154:       mulss  %xmm6,%xmm7
  401158:       comiss 0x0(%ebp),%xmm6
  40115c:       mulss  %xmm6,%xmm5
  401160:       addss  0x4(%esp),%xmm7

-- 
           Summary: runtime performance regression in floating point heavy
                    code, x86/SSE
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbptbp at gmail dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240



* [Bug rtl-optimization/19240] runtime performance regression in floating point heavy code, x86/SSE
From: tbptbp at gmail dot com @ 2005-01-03 13:14 UTC
  To: gcc-bugs


------- Additional Comments From tbptbp at gmail dot com  2005-01-03 13:14 -------
Created an attachment (id=7863)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7863&action=view)
One place with the described symptoms



* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: pinskia at gcc dot gnu dot org @ 2005-01-03 15:06 UTC
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target
           Keywords|                            |missed-optimization
            Summary|runtime performance         |[4.0 Regression] runtime
                   |regression in floating point|performance regression in
                   |heavy code, x86/SSE         |floating point heavy code,
                   |                            |x86/SSE
   Target Milestone|---                         |4.0.0



* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: uros at kss-loka dot si @ 2005-01-03 16:27 UTC
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-01-03 16:27 -------
Ah, I see the problem. The combine pass is producing reversed div/sub patterns,
where the first operand is a memory_operand and the second is a register.
Unfortunately, the SSE patterns don't provide reversed variants:

(define_insn "*fop_sf_1_sse"
  [(set (match_operand:SF 0 "register_operand" "=x")
	(match_operator:SF 3 "binary_fp_operator"
			[(match_operand:SF 1 "register_operand" "0")
			 (match_operand:SF 2 "nonimmediate_operand" "xm")]))]
...

(define_insn "*fop_sf_1_i387"
  [(set (match_operand:SF 0 "register_operand" "=f,f")
	(match_operator:SF 3 "binary_fp_operator"
			[(match_operand:SF 1 "nonimmediate_operand" "0,fm")
			 (match_operand:SF 2 "nonimmediate_operand" "fm,0")]))]


The SSE pattern isn't matched when the first operand is a memory operand, so it
doesn't shadow the 387 pattern. I think the fop_{s,d}f_1_i387 patterns need an
additional insn condition to hide them for TARGET_SSE. Perhaps:

&& !(TARGET_SSE && GET_CODE (operands[1]) == MEM)

and similar for DFmode.
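To make the shadowing concrete, here is a toy C model of the situation described above. It is a sketch under loose assumptions, not GCC code: `op_kind`, `pattern_id`, `recog_fop_sf` and the `hide_i387` flag are hypothetical names, the two predicates mirror `register_operand` and `nonimmediate_operand` only in the sense of "register" versus "register or memory", and the recognizer is reduced to "the first pattern whose predicates and condition accept the operands wins".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC source: patterns are tried in order and the
   first one whose predicates and insn condition accept the operands wins. */

typedef enum { OP_REG, OP_MEM } op_kind;

typedef enum {
  MATCH_NONE,          /* no pattern matches; combine would drop the combination */
  MATCH_FOP_SF_1_SSE,  /* stands in for *fop_sf_1_sse */
  MATCH_FOP_SF_1_I387  /* stands in for *fop_sf_1_i387 */
} pattern_id;

static bool register_operand(op_kind k) { return k == OP_REG; }
static bool nonimmediate_operand(op_kind k) { return k == OP_REG || k == OP_MEM; }

/* op1/op2: the two source operands of an SFmode binary_fp_operator.
   target_sse: stands in for TARGET_SSE.
   hide_i387: apply the proposed !(TARGET_SSE && MEM operand 1) condition. */
static pattern_id recog_fop_sf(op_kind op1, op_kind op2,
                               bool target_sse, bool hide_i387) {
  assert((op1 == OP_REG || op1 == OP_MEM) && (op2 == OP_REG || op2 == OP_MEM));

  /* *fop_sf_1_sse: the operand 1 predicate only accepts a register. */
  if (target_sse && register_operand(op1) && nonimmediate_operand(op2))
    return MATCH_FOP_SF_1_SSE;

  /* *fop_sf_1_i387: both predicates accept memory, so a reversed (mem, reg)
     form falls through to here unless the extra condition hides it. */
  if (nonimmediate_operand(op1) && nonimmediate_operand(op2)
      && !(hide_i387 && target_sse && op1 == OP_MEM))
    return MATCH_FOP_SF_1_I387;

  return MATCH_NONE;
}
```

With `hide_i387` false, `recog_fop_sf(OP_MEM, OP_REG, true, false)` yields `MATCH_FOP_SF_1_I387`, the x87 fallback; with `hide_i387` true the same call yields `MATCH_NONE`, and in the real compiler combine would then abandon that combination rather than emit x87 code.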

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-01-03 16:27:34
               date|                            |



* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: rth at gcc dot gnu dot org @ 2005-01-03 20:39 UTC
  To: gcc-bugs


------- Additional Comments From rth at gcc dot gnu dot org  2005-01-03 20:39 -------
Yep.  When we do this kind of stepwise filtering of patterns, all of them have
to have the same predicates, even if the constraints are stricter.
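One way to read this rule, sketched as another toy C model under the same caveats (hypothetical names, no real GCC internals): the pattern that is supposed to win gives operand 1 the same `nonimmediate_operand`-style predicate as the pattern it shadows, and expresses its stricter requirement only as a constraint (operand 1 tied to the output register, `"0"`), which a reload-like step then satisfies with an extra load. `recog_then_constrain`, `match_result` and `reload_loads` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC source: a pattern meant to shadow another uses
   the same predicates and is stricter only in its constraints, which a later
   reload-like step satisfies by copying operands into registers. */

typedef enum { OP_REG, OP_MEM } op_kind;
typedef enum { MATCH_NONE, MATCH_SSE, MATCH_I387 } pattern_id;

typedef struct {
  pattern_id pattern;  /* pattern chosen from predicates alone */
  int reload_loads;    /* memory operands that must first be loaded into a register */
} match_result;

static bool nonimmediate_operand(op_kind k) { return k == OP_REG || k == OP_MEM; }

static match_result recog_then_constrain(op_kind op1, op_kind op2, bool sse_math) {
  assert((op1 == OP_REG || op1 == OP_MEM) && (op2 == OP_REG || op2 == OP_MEM));
  match_result r = { MATCH_NONE, 0 };

  /* Recognition: both patterns give operand 1 the same predicate, so the SSE
     pattern, tried first, catches every form whenever SSE math is enabled. */
  if (!nonimmediate_operand(op1) || !nonimmediate_operand(op2))
    return r;
  r.pattern = sse_math ? MATCH_SSE : MATCH_I387;

  /* Constraints: the SSE pattern ties operand 1 to the output register ("0"),
     so a memory operand 1 costs one load into an SSE register instead of a
     switch to the x87 pattern. */
  if (r.pattern == MATCH_SSE && op1 == OP_MEM)
    r.reload_loads = 1;
  return r;
}
```

Here `recog_then_constrain(OP_MEM, OP_REG, true)` selects `MATCH_SSE` with `reload_loads == 1`: the reversed form costs one load into a register rather than a detour through the x87 stack.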


* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: cvs-commit at gcc dot gnu dot org @ 2005-01-04 10:41 UTC
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2005-01-04 10:41 -------
Subject: Bug 19240

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	uros@gcc.gnu.org	2005-01-04 10:40:58

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.md 

Log message:
	PR target/19240
	* config/i386/i386.md (*fop_sf_1_i387): Disable for TARGET_SSE_MATH.
	(*fop_df_1_i387): Disable for (TARGET_SSE2 && TARGET_SSE_MATH).

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7016&r2=2.7017
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.598&r2=1.599




* [Bug target/19240] [4.0 Regression] runtime performance regression in floating point heavy code, x86/SSE
From: pinskia at gcc dot gnu dot org @ 2005-01-04 15:43 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-01-04 15:43 -------
Fixed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

