public inbox for gcc-patches@gcc.gnu.org
* Fix SPEC2000 GCC regression
@ 2008-09-05 11:55 Jan Hubicka
  2008-09-10 15:02 ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2008-09-05 11:55 UTC (permalink / raw)
  To: gcc-patches

Hi,
we have a regression in SPEC2000 GCC.  The problem is schedule_loop, which
contains memset calls on large blocks.  The profile code identifies those
calls as cold, which is not wrong, but since the blocks are large we still
spend a significant amount of time in them.

This patch makes the hot/cold heuristics more conservative for string
operations of large or unknown size.  We now optimize such operations for
size only when the whole function is cold, not based on the BB profile.
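
The rule described above can be read as a standalone predicate.  The
following is only a sketch, not the code in i386.c: the function name and
its two boolean parameters are hypothetical stand-ins for
optimize_function_for_size_p (cfun) and optimize_insn_for_size_p (), and the
256-byte threshold is the one used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the test added to decide_alg.
   function_is_cold models optimize_function_for_size_p (cfun),
   insn_is_cold models optimize_insn_for_size_p (), and
   expected_size is -1 when the block size is unknown.  */
bool
stringop_wants_speed (bool function_is_cold, bool insn_is_cold,
                      long expected_size)
{
  assert (expected_size >= -1);

  /* A cold function is always optimized for size.  */
  if (function_is_cold)
    return false;

  /* A cold block in a non-cold function gets the size cost table only
     when the block is known to be small.  */
  if (insn_is_cold && expected_size != -1 && expected_size < 256)
    return false;

  return true;
}
```

Under this reading, a cold memset of unknown size in a non-cold function is
expanded for speed, while the same call with a known size of 40 bytes is
still expanded for size.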

Bootstrapped/regtested i686-linux.

	* i386.c (decide_alg): Be more conservative about optimizing for size.

	* gcc.target/i386/cold-attribute-1.c: Update testcase.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 139985)
+++ config/i386/i386.c	(working copy)
@@ -16994,6 +16994,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WI
 	    int *dynamic_check)
 {
   const struct stringop_algs * algs;
+  bool optimize_for_speed;
   /* Algorithms using the rep prefix want at least edi and ecx;
      additionally, memset wants eax and memcpy wants esi.  Don't
      consider such algorithms if the user has appropriated those
@@ -17008,7 +17009,16 @@ decide_alg (HOST_WIDE_INT count, HOST_WI
 			       && alg != rep_prefix_8_byte))
   const struct processor_costs *cost;
   
-  cost = optimize_insn_for_size_p () ? &ix86_size_cost : ix86_cost;
+  /* Even if the string operation call is cold, we still might spend a lot
+     of time processing large blocks.  */
+  if (optimize_function_for_size_p (cfun)
+      || (optimize_insn_for_size_p ()
+          && expected_size != -1 && expected_size < 256))
+    optimize_for_speed = false;
+  else
+    optimize_for_speed = true;
+
+  cost = optimize_for_speed ? ix86_cost : &ix86_size_cost;
 
   *dynamic_check = -1;
   if (memset)
@@ -17018,7 +17028,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WI
   if (stringop_alg != no_stringop && ALG_USABLE_P (stringop_alg))
     return stringop_alg;
   /* rep; movq or rep; movl is the smallest variant.  */
-  else if (optimize_insn_for_size_p ())
+  else if (!optimize_for_speed)
     {
       if (!count || (count & 3))
 	return rep_prefix_usable ? rep_prefix_1_byte : loop_1_byte;
Index: testsuite/gcc.target/i386/cold-attribute-1.c
===================================================================
--- testsuite/gcc.target/i386/cold-attribute-1.c	(revision 139985)
+++ testsuite/gcc.target/i386/cold-attribute-1.c	(working copy)
@@ -10,7 +10,7 @@ my_cold_memset (void *a, int b,int c)
 t(void *a,int b,int c)
 {
   if (a)
-    my_cold_memset (a,b,c);
+    my_cold_memset (a,b,40);
 }
 
 /* The IF conditional should be predicted as cold and my_cold_memset inlined
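
The testcase change above presumably keeps the memset inside the
"known small size" case, so that the cold call is still expanded for size.
A minimal sketch of the shape of such a test follows.  It assumes
my_cold_memset is a cold-attributed inline wrapper around memset; the
truncated hunk does not show its definition, so that part is a guess and not
a copy of cold-attribute-1.c.

```c
#include <assert.h>
#include <string.h>

/* Assumed shape of the helper: a cold inline wrapper around memset.  */
static inline __attribute__ ((cold)) void
my_cold_memset (void *a, int b, int c)
{
  assert (c >= 0);
  memset (a, b, (size_t) c);
}

/* The caller passes a constant 40, so after inlining the size of the
   memset is known and below the 256-byte threshold of the patch.  */
void
t (void *a, int b, int c)
{
  (void) c;			/* No longer forwarded to the helper.  */
  if (a)
    my_cold_memset (a, b, 40);
}
```

With a variable size the expected size would be unknown, and under the new
rule the call would be expanded for speed even though it is cold.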


* Re: Fix SPEC2000 GCC regression
  2008-09-05 11:55 Fix SPEC2000 GCC regression Jan Hubicka
@ 2008-09-10 15:02 ` H.J. Lu
  2008-09-10 15:05   ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2008-09-10 15:02 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Fri, Sep 5, 2008 at 4:29 AM, Jan Hubicka <jh@suse.cz> wrote:
> Hi,
> we have a regression in SPEC2000 GCC.  The problem is schedule_loop, which
> contains memset calls on large blocks.  The profile code identifies those
> calls as cold, which is not wrong, but since the blocks are large we still
> spend a significant amount of time in them.
>
> This patch makes the hot/cold heuristics more conservative for string
> operations of large or unknown size.  We now optimize such operations for
> size only when the whole function is cold, not based on the BB profile.
>
> Bootstrapped/regtested i686-linux.
>
>        * i386.c (decide_alg): Be more conservative about optimizing for size.
>
>        * gcc.target/i386/cold-attribute-1.c: Update testcase.

Hi Honza,

This patch caused 10% performance drop on 176.gcc in SPEC CPU 20006
at -O2 -ffast-math on Intel Core 2.



-- 
H.J.


* Re: Fix SPEC2000 GCC regression
  2008-09-10 15:02 ` H.J. Lu
@ 2008-09-10 15:05   ` Jan Hubicka
  2008-09-10 15:30     ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2008-09-10 15:05 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jan Hubicka, gcc-patches

> Hi Honza,
> 
> This patch caused 10% performance drop on 176.gcc in SPEC CPU 20006
> at -O2 -ffast-math on Intel Core 2.

This most likely means that your glibc has a slow memset implementation.
I saw a similar drop on SPEC2000 GCC on the Debian machines used by the
compile farm.  I was told that FSF glibc was finally updated with
new GCC so hopefully this problem is solved now.

Our (Opteron based) SPEC2006 tester seems fine
http://gcc.opensuse.org/SPEC/CINT/sb-balakirew-head-64-2006/403_gcc_big.png
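
One way to check whether the library memset is the bottleneck on a given
machine is to time it against a plain byte loop on a large block.  The
harness below is only a sketch under stated assumptions: the function names
are hypothetical, the byte loop is a rough stand-in for a size-optimized
inline expansion, and the empty asm barrier is a GCC extension used to keep
the repeated fills from being optimized away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Byte-at-a-time fill.  The volatile pointer keeps the compiler from
   turning the loop back into a memset call.  */
void
fill_bytewise (unsigned char *dst, int value, size_t n)
{
  volatile unsigned char *p = dst;
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) value;
}

/* Return the CPU seconds spent on REPS fills of N bytes, using the C
   library memset when USE_LIBC is nonzero and the byte loop otherwise.  */
double
time_fill (unsigned char *buf, size_t n, int reps, int use_libc)
{
  assert (buf != NULL && reps > 0);

  clock_t start = clock ();
  for (int r = 0; r < reps; r++)
    {
      if (use_libc)
	memset (buf, r & 0xff, n);
      else
	fill_bytewise (buf, r & 0xff, n);
      /* Compiler barrier so that each fill is treated as observable.  */
      __asm__ volatile ("" : : "r" (buf) : "memory");
    }
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Comparing time_fill (buf, n, reps, 1) with time_fill (buf, n, reps, 0) for
block sizes similar to those in the benchmark gives a rough idea of how the
installed memset behaves; it does not replace a real SPEC run.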

Honza


* Re: Fix SPEC2000 GCC regression
  2008-09-10 15:05   ` Jan Hubicka
@ 2008-09-10 15:30     ` H.J. Lu
  2008-09-10 15:57       ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2008-09-10 15:30 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, gcc-patches

On Wed, Sep 10, 2008 at 7:41 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi Honza,
>>
>> This patch caused 10% performance drop on 176.gcc in SPEC CPU 20006
>> at -O2 -ffast-math on Intel Core 2.
>
> This most likely means that your glibc has a slow memset implementation.
> I saw a similar drop on SPEC2000 GCC on the Debian machines used by the
> compile farm.  I was told that FSF glibc was finally updated with
> new GCC so hopefully this problem is solved now.
>
> Our (Opteron based) SPEC2006 tester seems fine
> http://gcc.opensuse.org/SPEC/CINT/sb-balakirew-head-64-2006/403_gcc_big.png
>

I will verify it on Fedora 9, which has glibc 2.8.

Thanks.

-- 
H.J.


* Re: Fix SPEC2000 GCC regression
  2008-09-10 15:30     ` H.J. Lu
@ 2008-09-10 15:57       ` Jan Hubicka
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Hubicka @ 2008-09-10 15:57 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jan Hubicka, Jan Hubicka, gcc-patches

> On Wed, Sep 10, 2008 at 7:41 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Hi Honza,
> >>
> >> This patch caused 10% performance drop on 176.gcc in SPEC CPU 20006
> >> at -O2 -ffast-math on Intel Core 2.
> >
> > This most likely means that your glibc has a slow memset implementation.
> > I saw a similar drop on SPEC2000 GCC on the Debian machines used by the
> > compile farm.  I was told that FSF glibc was finally updated with
> > new GCC so hopefully this problem is solved now.
	^^^ stringops :)
> >
> > Our (Opteron based) SPEC2006 tester seems fine
> > http://gcc.opensuse.org/SPEC/CINT/sb-balakirew-head-64-2006/403_gcc_big.png
> >
> 
> I will verify it on Fedora 9, which has glibc 2.8.

Great, thanks!

Honza

