public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* missed optimization, would be very helpful
@ 1998-12-31 12:32 Zack Weinberg
  1998-12-31 16:27 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Zack Weinberg @ 1998-12-31 12:32 UTC (permalink / raw)
  To: egcs

In a loop of the form

while (condition) (*int_ptr)++;

gcc generates a read-mod-write cycle to the memory location *int_ptr at each
iteration of the loop.  Modern chips really don't like that; I got a ~30%
speedup on one piece of code by rewriting it to bump a register and write
the memory location at the end.  The pointer isn't volatile, so I believe
there is nothing stopping gcc from doing this optimization.

This is egcs-1.1.1; it might've been fixed more recently but I can't test
that since the Dec 24 CVS is unreliable on my system.

(If anyone wants an example, look at cpplib.c:adjust_position.)

zw

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
@ 1998-12-31 16:27 ` Richard Henderson
  1998-12-31 18:05 ` Horst von Brand
  1999-01-31 23:58 ` Andreas Schwab
  2 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 1998-12-31 16:27 UTC (permalink / raw)
  To: Zack Weinberg, egcs

On Thu, Dec 31, 1998 at 03:32:06PM -0500, Zack Weinberg wrote:
> In a loop of the form

void foo(int *int_ptr, int x, int y)
{
  while (x < y) (*int_ptr)++, x++;
}

> This is egcs-1.1.1; it might've been fixed more recently but I can't test
> that since the Dec 24 CVS is unreliable on my system.

It has been, thanks to Mark Mitchell.  We now get this:

        ldl $2,0($16)
        .align 4
$L5:
        addq $2,1,$3
        addl $17,1,$17
        bis $3,$3,$2
        cmplt $17,$18,$1
        bne $1,$L5
        stl $3,0($16)

Which, now that I look at it, isn't quite ideal, but the bits that
normally fix up this kind of thing -- cse or cprop -- run before
this gets created.

Still, it's a lot better than a load/store pair in the loop.


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
  1998-12-31 16:27 ` Richard Henderson
@ 1998-12-31 18:05 ` Horst von Brand
  1999-01-31 23:58   ` Zack Weinberg
  1999-01-31 23:58   ` 043633786-0001
  1999-01-31 23:58 ` Andreas Schwab
  2 siblings, 2 replies; 13+ messages in thread
From: Horst von Brand @ 1998-12-31 18:05 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]

Zack Weinberg <zack@rabi.columbia.edu> said:
> In a loop of the form
> 
> while (condition) (*int_ptr)++;
> 
> gcc generates a read-mod-write cycle to the memory location *int_ptr at each
> iteration of the loop.  Modern chips really don't like that; I got a ~30%
> speedup on one piece of code by rewriting it to bump a register and write
> the memory location at the end.  The pointer isn't volatile, so I believe
> there is nothing stopping gcc from doing this optimization.
> 
> This is egcs-1.1.1; it might've been fixed more recently but I can't test
> that since the Dec 24 CVS is unreliable on my system.

It's still there with egcs-19981226. On i586:

f(char *p, int *ip)
{ 
   while(*p++)
     (*ip)++;
}

gives:

	.file	"tst2.c"
	.version	"01.01"
gcc2_compiled.:
.text
	.align 4
.globl f
	.type	 f,@function
f:
	pushl %ebp
	movl %esp,%ebp
	movl 8(%ebp),%edx
	movl 12(%ebp),%ecx
	jmp .L7
	.p2align 4,,7
.L5:
	incl (%ecx)
.L7:
	movb (%edx),%al
	incl %edx
	testb %al,%al
	jne .L5
	movl %ebp,%esp
	popl %ebp
	ret
.Lfe1:
	.size	 f,.Lfe1-f
	.ident	"GCC: (GNU) egcs-2.92.33 19981226 (gcc2 ss-980609 experimental)"

Happy hacking in 1999!
-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile                               +56 32 672616

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58     ` Zack Weinberg
@ 1999-01-31 23:58       ` Horst von Brand
  1999-01-31 23:58       ` Joern Rennecke
  1 sibling, 0 replies; 13+ messages in thread
From: Horst von Brand @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Philipp Rumpf, egcs

Zack Weinberg <zack@rabi.columbia.edu> said:
> On Fri, 1 Jan 1999 04:24:36 +0100, 043633786-0001@t-online.de wrote:
> >On Thu, Dec 31, 1998 at 11:03:25PM -0400, Horst von Brand wrote:
> >> f(char *p, int *ip)
> >> { 
> >>    while(*p++)
> >>      (*ip)++;
> >> }
> >
> >Do you want this to become something like
> >
> >	movl (%ecx),%esi (well, take anything for %esi)
> >.L5:
> >	incl %esi
> >	movb (%edx),%al
> >	incl %edx
> >	testb %al,%al
> >	jne .L5
> >	movl %esi,(%ecx)
> >
> >?

> That's the right general idea.

And for a similar case (a global whose address I put into a local int *ip)
it does something like the above, after somebody poined out that the
acceses through parameter ip could cause side effects. Sorry, my machine +
testcase are at home.
-- 
Dr. Horst H. von Brand                       mailto:vonbrand@inf.utfsm.cl
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58 ` Andreas Schwab
@ 1999-01-31 23:58   ` Zack Weinberg
  1999-01-31 23:58     ` Richard Henderson
  1999-01-31 23:58     ` Andreas Schwab
  0 siblings, 2 replies; 13+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: egcs

On 06 Jan 1999 10:56:04 +0100, Andreas Schwab wrote:
>Zack Weinberg <zack@rabi.columbia.edu> writes:
>
>|> In a loop of the form
>|> 
>|> while (condition) (*int_ptr)++;
>|> 
>|> gcc generates a read-mod-write cycle to the memory location *int_ptr at
>|> each iteration of the loop.
[...]
>|> (If anyone wants an example, look at cpplib.c:adjust_position.)
>
>This is a bad example, because linep and colp may alias each other, and a
>char * can alias everything.

I forgot about that - but even this:

void fun (int *__restrict ctr, char *__restrict str)
{
    while (*str++) (*ctr)++;
}

compiles to an incl of memory, on x86.

zw

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58   ` Zack Weinberg
@ 1999-01-31 23:58     ` Jeffrey A Law
  0 siblings, 0 replies; 13+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

  In message < 199901011852.NAA07301@rabi.phys.columbia.edu >you write:
  > On Thu, 31 Dec 1998 23:03:25 -0400, Horst von Brand wrote:
  > >Zack Weinberg <zack@rabi.columbia.edu> said:
  > >> In a loop of the form
  > >> 
  > >> while (condition) (*int_ptr)++;
  > >> 
  > >> gcc generates a read-mod-write cycle to the memory location *int_ptr at 
  > each
  > >> iteration of the loop.
  > 
  > >
  > >It's still there with egcs-19981226. On i586:
  > >
  > 
  > This is what I see too.  rth doesn't get that on Alpha, so it's an i386.md
  > problem at this point.  A guess at a kludge: if the incdi pattern were
  > changed to not allow memory operands, that might force the optimization. 
  > But that's overkill.
I think this is another example of how we want to do post-reload instruction
splitting for the x86.

If we can find a scratch register after reload, then we can turn the
read-mod-write cycle into a read, increment, write as 3 instructions with
better scheduling opportunities.

That way we get the benefit of better scheduling/pairing without adding
register pressure to the port and causing spills.

We've got a lot of the infrastructure in the compiler now -- most importantly
the post-reload life analysis pass.  It's just waiting for someone to take
up the task.

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58   ` 043633786-0001
@ 1999-01-31 23:58     ` Zack Weinberg
  1999-01-31 23:58       ` Horst von Brand
  1999-01-31 23:58       ` Joern Rennecke
  0 siblings, 2 replies; 13+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Philipp Rumpf; +Cc: egcs

On Fri, 1 Jan 1999 04:24:36 +0100, 043633786-0001@t-online.de wrote:
>On Thu, Dec 31, 1998 at 11:03:25PM -0400, Horst von Brand wrote:
>> f(char *p, int *ip)
>> { 
>>    while(*p++)
>>      (*ip)++;
>> }
>
>Do you want this to become something like
>
>	movl (%ecx),%esi (well, take anything for %esi)
>.L5:
>	incl %esi
>	movb (%edx),%al
>	incl %edx
>	testb %al,%al
>	jne .L5
>	movl %esi,(%ecx)
>
>?

That's the right general idea.

>What if you call your function like this

>	f((char *) a, a[17]);

The compiler is allowed to assume you didn't do that.  char * and int * have
different base types, therefore aren't allowed to be "aliased"
(standardese).  Optimizing based on that assumption is enabled by
-fstrict-aliasing, but I get identical code for Horst's sample function on
x86 with and without -fstrict-aliasing.  (1.1.1 or Dec 24 CVS, makes no
difference).

I suspect that the actual problem is that the x86 machine description
doesn't consider memory to be "worse" than a register for use in an incl
instruction.  Maybe someone who's been in there recently can comment?

zw

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1998-12-31 18:05 ` Horst von Brand
@ 1999-01-31 23:58   ` Zack Weinberg
  1999-01-31 23:58     ` Jeffrey A Law
  1999-01-31 23:58   ` 043633786-0001
  1 sibling, 1 reply; 13+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
  To: egcs

On Thu, 31 Dec 1998 23:03:25 -0400, Horst von Brand wrote:
>Zack Weinberg <zack@rabi.columbia.edu> said:
>> In a loop of the form
>> 
>> while (condition) (*int_ptr)++;
>> 
>> gcc generates a read-mod-write cycle to the memory location *int_ptr at each
>> iteration of the loop.

>
>It's still there with egcs-19981226. On i586:
>

This is what I see too.  rth doesn't get that on Alpha, so it's an i386.md
problem at this point.  A guess at a kludge: if the incdi pattern were
changed to not allow memory operands, that might force the optimization. 
But that's overkill.

zw

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58   ` Zack Weinberg
@ 1999-01-31 23:58     ` Richard Henderson
  1999-01-31 23:58     ` Andreas Schwab
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg, Andreas Schwab; +Cc: egcs

On Wed, Jan 06, 1999 at 02:52:29PM -0500, Zack Weinberg wrote:
> void fun (int *__restrict ctr, char *__restrict str)
> {
>     while (*str++) (*ctr)++;
> }
> 
> compiles to an incl of memory, on x86.

This, I think, is simply a weakness in either the alias or loop
code.  Perhaps both.  The example I posted does work on x86 --

        movl (%ebx),%edx
        subl %eax,%ecx
        movl %ecx,%eax
        .p2align 4,,7
.L5:
        incl %edx
        decl %eax
        jnz .L5
        movl %edx,(%ebx)


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58   ` Zack Weinberg
  1999-01-31 23:58     ` Richard Henderson
@ 1999-01-31 23:58     ` Andreas Schwab
  1 sibling, 0 replies; 13+ messages in thread
From: Andreas Schwab @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

Zack Weinberg <zack@rabi.columbia.edu> writes:

|> On 06 Jan 1999 10:56:04 +0100, Andreas Schwab wrote:
|> >Zack Weinberg <zack@rabi.columbia.edu> writes:
|> >
|> >|> In a loop of the form
|> >|> 
|> >|> while (condition) (*int_ptr)++;
|> >|> 
|> >|> gcc generates a read-mod-write cycle to the memory location *int_ptr at
|> >|> each iteration of the loop.
|> [...]
|> >|> (If anyone wants an example, look at cpplib.c:adjust_position.)
|> >
|> >This is a bad example, because linep and colp may alias each other, and a
|> >char * can alias everything.
|> 
|> I forgot about that - but even this:
|> 
|> void fun (int *__restrict ctr, char *__restrict str)
|> {
|>     while (*str++) (*ctr)++;
|> }
|> 
|> compiles to an incl of memory, on x86.

Does egcs already implement restrict to its full extent?

Andreas.

-- 
Andreas Schwab                                      "And now for something
schwab@issan.cs.uni-dortmund.de                      completely different"
schwab@gnu.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
  1998-12-31 16:27 ` Richard Henderson
  1998-12-31 18:05 ` Horst von Brand
@ 1999-01-31 23:58 ` Andreas Schwab
  1999-01-31 23:58   ` Zack Weinberg
  2 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: egcs

Zack Weinberg <zack@rabi.columbia.edu> writes:

|> In a loop of the form
|> 
|> while (condition) (*int_ptr)++;
|> 
|> gcc generates a read-mod-write cycle to the memory location *int_ptr at each
|> iteration of the loop.  Modern chips really don't like that; I got a ~30%
|> speedup on one piece of code by rewriting it to bump a register and write
|> the memory location at the end.  The pointer isn't volatile, so I believe
|> there is nothing stopping gcc from doing this optimization.
|> 
|> This is egcs-1.1.1; it might've been fixed more recently but I can't test
|> that since the Dec 24 CVS is unreliable on my system.
|> 
|> (If anyone wants an example, look at cpplib.c:adjust_position.)

This is a bad example, because linep and colp may alias each other, and a
char * can alias everything.

-- 
Andreas Schwab                                      "And now for something
schwab@issan.cs.uni-dortmund.de                      completely different"
schwab@gnu.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1998-12-31 18:05 ` Horst von Brand
  1999-01-31 23:58   ` Zack Weinberg
@ 1999-01-31 23:58   ` 043633786-0001
  1999-01-31 23:58     ` Zack Weinberg
  1 sibling, 1 reply; 13+ messages in thread
From: 043633786-0001 @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Horst von Brand

On Thu, Dec 31, 1998 at 11:03:25PM -0400, Horst von Brand wrote:
> f(char *p, int *ip)
> { 
>    while(*p++)
>      (*ip)++;
> }

> .L5:
> 	incl (%ecx)
> .L7:
> 	movb (%edx),%al
> 	incl %edx
> 	testb %al,%al
> 	jne .L5

Do you want this to become something like

	movl (%ecx),%esi (well, take anything for %esi)
.L5:
	incl %esi
	movb (%edx),%al
	incl %edx
	testb %al,%al
	jne .L5
	movl %esi,(%ecx)

?

What if you call your function like this

int main(void)
{
	int a[1024];
	
	a[17] = -68; /* perhaps I am off by one. You get the idea */
	f((char *) a, a[17];

	return 19990101;
}

This (or something strikingly similar) would produce different results
with your optimization (or what I think you wanted) and without it.
What you want (if I understood you correctly) is restrict or __builtin_assert.

f(restrict char *a, restrict int *b);

or 

f(char *a, int *b)
{
	__builtin_assert(a != b);
	while(*a++) {
		__builtin_assert(a != b);
		(*b)++;
	}
}

Perhaps __builtin_assert((a - b) > sizeof(int) || (a - b) > sizeof(int)) would
get it better, but no programmer would ever use something like this. Perhaps
__builtin_nocross(a,b) ?

> Happy hacking in 1999!

Happy hacking in 2.[23]. Have a nice year.

	Philipp Rumpf

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: missed optimization, would be very helpful
  1999-01-31 23:58     ` Zack Weinberg
  1999-01-31 23:58       ` Horst von Brand
@ 1999-01-31 23:58       ` Joern Rennecke
  1 sibling, 0 replies; 13+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: prumpf, egcs

> >What if you call your function like this
> 
> >	f((char *) a, a[17]);
> 
> The compiler is allowed to assume you didn't do that.  char * and int * have
> different base types, therefore aren't allowed to be "aliased"
> (standardese).  Optimizing based on that assumption is enabled by
> -fstrict-aliasing, but I get identical code for Horst's sample function on
> x86 with and without -fstrict-aliasing.  (1.1.1 or Dec 24 CVS, makes no
> difference).

But char * is special in that it is allowed to alias anything but functions.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~1999-01-31 23:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
1998-12-31 16:27 ` Richard Henderson
1998-12-31 18:05 ` Horst von Brand
1999-01-31 23:58   ` Zack Weinberg
1999-01-31 23:58     ` Jeffrey A Law
1999-01-31 23:58   ` 043633786-0001
1999-01-31 23:58     ` Zack Weinberg
1999-01-31 23:58       ` Horst von Brand
1999-01-31 23:58       ` Joern Rennecke
1999-01-31 23:58 ` Andreas Schwab
1999-01-31 23:58   ` Zack Weinberg
1999-01-31 23:58     ` Richard Henderson
1999-01-31 23:58     ` Andreas Schwab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).