* missed optimization, would be very helpful
@ 1998-12-31 12:32 Zack Weinberg
1998-12-31 16:27 ` Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Zack Weinberg @ 1998-12-31 12:32 UTC (permalink / raw)
To: egcs
In a loop of the form
while (condition) (*int_ptr)++;
gcc generates a read-mod-write cycle to the memory location *int_ptr at each
iteration of the loop. Modern chips really don't like that; I got a ~30%
speedup on one piece of code by rewriting it to bump a register and write
the memory location at the end. The pointer isn't volatile, so I believe
there is nothing stopping gcc from doing this optimization.
This is egcs-1.1.1; it might've been fixed more recently but I can't test
that since the Dec 24 CVS is unreliable on my system.
(If anyone wants an example, look at cpplib.c:adjust_position.)
zw
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
@ 1998-12-31 16:27 ` Richard Henderson
1998-12-31 18:05 ` Horst von Brand
1999-01-31 23:58 ` Andreas Schwab
2 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 1998-12-31 16:27 UTC (permalink / raw)
To: Zack Weinberg, egcs
On Thu, Dec 31, 1998 at 03:32:06PM -0500, Zack Weinberg wrote:
> In a loop of the form
void foo(int *int_ptr, int x, int y)
{
while (x < y) (*int_ptr)++, x++;
}
> This is egcs-1.1.1; it might've been fixed more recently but I can't test
> that since the Dec 24 CVS is unreliable on my system.
It has been, thanks to Mark Mitchell. We now get this:
ldl $2,0($16)
.align 4
$L5:
addq $2,1,$3
addl $17,1,$17
bis $3,$3,$2
cmplt $17,$18,$1
bne $1,$L5
stl $3,0($16)
Which, now that I look at it, isn't quite ideal, but the bits that
normally fix up this kind of thing -- cse or cprop -- run before
this gets created.
Still, it's a lot better than a load/store pair in the loop.
r~
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
1998-12-31 16:27 ` Richard Henderson
@ 1998-12-31 18:05 ` Horst von Brand
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` 043633786-0001
1999-01-31 23:58 ` Andreas Schwab
2 siblings, 2 replies; 13+ messages in thread
From: Horst von Brand @ 1998-12-31 18:05 UTC (permalink / raw)
To: Zack Weinberg; +Cc: egcs
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]
Zack Weinberg <zack@rabi.columbia.edu> said:
> In a loop of the form
>
> while (condition) (*int_ptr)++;
>
> gcc generates a read-mod-write cycle to the memory location *int_ptr at each
> iteration of the loop. Modern chips really don't like that; I got a ~30%
> speedup on one piece of code by rewriting it to bump a register and write
> the memory location at the end. The pointer isn't volatile, so I believe
> there is nothing stopping gcc from doing this optimization.
>
> This is egcs-1.1.1; it might've been fixed more recently but I can't test
> that since the Dec 24 CVS is unreliable on my system.
It's still there with egcs-19981226. On i586:
f(char *p, int *ip)
{
while(*p++)
(*ip)++;
}
gives:
.file "tst2.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl f
.type f,@function
f:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%edx
movl 12(%ebp),%ecx
jmp .L7
.p2align 4,,7
.L5:
incl (%ecx)
.L7:
movb (%edx),%al
incl %edx
testb %al,%al
jne .L5
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size f,.Lfe1-f
.ident "GCC: (GNU) egcs-2.92.33 19981226 (gcc2 ss-980609 experimental)"
Happy hacking in 1999!
--
Horst von Brand vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile +56 32 672616
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` Zack Weinberg
@ 1999-01-31 23:58 ` Horst von Brand
1999-01-31 23:58 ` Joern Rennecke
1 sibling, 0 replies; 13+ messages in thread
From: Horst von Brand @ 1999-01-31 23:58 UTC (permalink / raw)
To: Zack Weinberg; +Cc: Philipp Rumpf, egcs
Zack Weinberg <zack@rabi.columbia.edu> said:
> On Fri, 1 Jan 1999 04:24:36 +0100, 043633786-0001@t-online.de wrote:
> >On Thu, Dec 31, 1998 at 11:03:25PM -0400, Horst von Brand wrote:
> >> f(char *p, int *ip)
> >> {
> >> while(*p++)
> >> (*ip)++;
> >> }
> >
> >Do you want this to become something like
> >
> > movl (%ecx),%esi (well, take anything for %esi)
> >.L5:
> > incl %esi
> > movb (%edx),%al
> > incl %edx
> > testb %al,%al
> > jne .L5
> > movl %esi,(%ecx)
> >
> >?
> That's the right general idea.
And for a similar case (a global whose address I put into a local int *ip)
it does something like the above, after somebody poined out that the
acceses through parameter ip could cause side effects. Sorry, my machine +
testcase are at home.
--
Dr. Horst H. von Brand mailto:vonbrand@inf.utfsm.cl
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` Andreas Schwab
@ 1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Richard Henderson
1999-01-31 23:58 ` Andreas Schwab
0 siblings, 2 replies; 13+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
To: Andreas Schwab; +Cc: egcs
On 06 Jan 1999 10:56:04 +0100, Andreas Schwab wrote:
>Zack Weinberg <zack@rabi.columbia.edu> writes:
>
>|> In a loop of the form
>|>
>|> while (condition) (*int_ptr)++;
>|>
>|> gcc generates a read-mod-write cycle to the memory location *int_ptr at
>|> each iteration of the loop.
[...]
>|> (If anyone wants an example, look at cpplib.c:adjust_position.)
>
>This is a bad example, because linep and colp may alias each other, and a
>char * can alias everything.
I forgot about that - but even this:
void fun (int *__restrict ctr, char *__restrict str)
{
while (*str++) (*ctr)++;
}
compiles to an incl of memory, on x86.
zw
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` Zack Weinberg
@ 1999-01-31 23:58 ` Jeffrey A Law
0 siblings, 0 replies; 13+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
To: Zack Weinberg; +Cc: egcs
In message < 199901011852.NAA07301@rabi.phys.columbia.edu >you write:
> On Thu, 31 Dec 1998 23:03:25 -0400, Horst von Brand wrote:
> >Zack Weinberg <zack@rabi.columbia.edu> said:
> >> In a loop of the form
> >>
> >> while (condition) (*int_ptr)++;
> >>
> >> gcc generates a read-mod-write cycle to the memory location *int_ptr at
> each
> >> iteration of the loop.
>
> >
> >It's still there with egcs-19981226. On i586:
> >
>
> This is what I see too. rth doesn't get that on Alpha, so it's an i386.md
> problem at this point. A guess at a kludge: if the incdi pattern were
> changed to not allow memory operands, that might force the optimization.
> But that's overkill.
I think this is another example of how we want to do post-reload instruction
splitting for the x86.
If we can find a scratch register after reload, then we can turn the
read-mod-write cycle into a read, increment, write as 3 instructions with
better scheduling opportunities.
That way we get the benefit of better scheduling/pairing without adding
register pressure to the port and causing spills.
We've got a lot of the infrastructure in the compiler now -- most importantly
the post-reload life analysis pass. It's just waiting for someone to take
up the task.
jeff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` 043633786-0001
@ 1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Horst von Brand
1999-01-31 23:58 ` Joern Rennecke
0 siblings, 2 replies; 13+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: egcs
On Fri, 1 Jan 1999 04:24:36 +0100, 043633786-0001@t-online.de wrote:
>On Thu, Dec 31, 1998 at 11:03:25PM -0400, Horst von Brand wrote:
>> f(char *p, int *ip)
>> {
>> while(*p++)
>> (*ip)++;
>> }
>
>Do you want this to become something like
>
> movl (%ecx),%esi (well, take anything for %esi)
>.L5:
> incl %esi
> movb (%edx),%al
> incl %edx
> testb %al,%al
> jne .L5
> movl %esi,(%ecx)
>
>?
That's the right general idea.
>What if you call your function like this
> f((char *) a, a[17]);
The compiler is allowed to assume you didn't do that. char * and int * have
different base types, therefore aren't allowed to be "aliased"
(standardese). Optimizing based on that assumption is enabled by
-fstrict-aliasing, but I get identical code for Horst's sample function on
x86 with and without -fstrict-aliasing. (1.1.1 or Dec 24 CVS, makes no
difference).
I suspect that the actual problem is that the x86 machine description
doesn't consider memory to be "worse" than a register for use in an incl
instruction. Maybe someone who's been in there recently can comment?
zw
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1998-12-31 18:05 ` Horst von Brand
@ 1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Jeffrey A Law
1999-01-31 23:58 ` 043633786-0001
1 sibling, 1 reply; 13+ messages in thread
From: Zack Weinberg @ 1999-01-31 23:58 UTC (permalink / raw)
To: egcs
On Thu, 31 Dec 1998 23:03:25 -0400, Horst von Brand wrote:
>Zack Weinberg <zack@rabi.columbia.edu> said:
>> In a loop of the form
>>
>> while (condition) (*int_ptr)++;
>>
>> gcc generates a read-mod-write cycle to the memory location *int_ptr at each
>> iteration of the loop.
>
>It's still there with egcs-19981226. On i586:
>
This is what I see too. rth doesn't get that on Alpha, so it's an i386.md
problem at this point. A guess at a kludge: if the incdi pattern were
changed to not allow memory operands, that might force the optimization.
But that's overkill.
zw
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` Zack Weinberg
@ 1999-01-31 23:58 ` Richard Henderson
1999-01-31 23:58 ` Andreas Schwab
1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 1999-01-31 23:58 UTC (permalink / raw)
To: Zack Weinberg, Andreas Schwab; +Cc: egcs
On Wed, Jan 06, 1999 at 02:52:29PM -0500, Zack Weinberg wrote:
> void fun (int *__restrict ctr, char *__restrict str)
> {
> while (*str++) (*ctr)++;
> }
>
> compiles to an incl of memory, on x86.
This, I think, is simply a weakness in either the alias or loop
code. Perhaps both. The example I posted does work on x86 --
movl (%ebx),%edx
subl %eax,%ecx
movl %ecx,%eax
.p2align 4,,7
.L5:
incl %edx
decl %eax
jnz .L5
movl %edx,(%ebx)
r~
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Richard Henderson
@ 1999-01-31 23:58 ` Andreas Schwab
1 sibling, 0 replies; 13+ messages in thread
From: Andreas Schwab @ 1999-01-31 23:58 UTC (permalink / raw)
To: Zack Weinberg; +Cc: egcs
Zack Weinberg <zack@rabi.columbia.edu> writes:
|> On 06 Jan 1999 10:56:04 +0100, Andreas Schwab wrote:
|> >Zack Weinberg <zack@rabi.columbia.edu> writes:
|> >
|> >|> In a loop of the form
|> >|>
|> >|> while (condition) (*int_ptr)++;
|> >|>
|> >|> gcc generates a read-mod-write cycle to the memory location *int_ptr at
|> >|> each iteration of the loop.
|> [...]
|> >|> (If anyone wants an example, look at cpplib.c:adjust_position.)
|> >
|> >This is a bad example, because linep and colp may alias each other, and a
|> >char * can alias everything.
|>
|> I forgot about that - but even this:
|>
|> void fun (int *__restrict ctr, char *__restrict str)
|> {
|> while (*str++) (*ctr)++;
|> }
|>
|> compiles to an incl of memory, on x86.
Does egcs already implement restrict to its full extent?
Andreas.
--
Andreas Schwab "And now for something
schwab@issan.cs.uni-dortmund.de completely different"
schwab@gnu.org
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
1998-12-31 16:27 ` Richard Henderson
1998-12-31 18:05 ` Horst von Brand
@ 1999-01-31 23:58 ` Andreas Schwab
1999-01-31 23:58 ` Zack Weinberg
2 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 1999-01-31 23:58 UTC (permalink / raw)
To: Zack Weinberg; +Cc: egcs
Zack Weinberg <zack@rabi.columbia.edu> writes:
|> In a loop of the form
|>
|> while (condition) (*int_ptr)++;
|>
|> gcc generates a read-mod-write cycle to the memory location *int_ptr at each
|> iteration of the loop. Modern chips really don't like that; I got a ~30%
|> speedup on one piece of code by rewriting it to bump a register and write
|> the memory location at the end. The pointer isn't volatile, so I believe
|> there is nothing stopping gcc from doing this optimization.
|>
|> This is egcs-1.1.1; it might've been fixed more recently but I can't test
|> that since the Dec 24 CVS is unreliable on my system.
|>
|> (If anyone wants an example, look at cpplib.c:adjust_position.)
This is a bad example, because linep and colp may alias each other, and a
char * can alias everything.
--
Andreas Schwab "And now for something
schwab@issan.cs.uni-dortmund.de completely different"
schwab@gnu.org
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1998-12-31 18:05 ` Horst von Brand
1999-01-31 23:58 ` Zack Weinberg
@ 1999-01-31 23:58 ` 043633786-0001
1999-01-31 23:58 ` Zack Weinberg
1 sibling, 1 reply; 13+ messages in thread
From: 043633786-0001 @ 1999-01-31 23:58 UTC (permalink / raw)
To: Horst von Brand
On Thu, Dec 31, 1998 at 11:03:25PM -0400, Horst von Brand wrote:
> f(char *p, int *ip)
> {
> while(*p++)
> (*ip)++;
> }
> .L5:
> incl (%ecx)
> .L7:
> movb (%edx),%al
> incl %edx
> testb %al,%al
> jne .L5
Do you want this to become something like
movl (%ecx),%esi (well, take anything for %esi)
.L5:
incl %esi
movb (%edx),%al
incl %edx
testb %al,%al
jne .L5
movl %esi,(%ecx)
?
What if you call your function like this
int main(void)
{
int a[1024];
a[17] = -68; /* perhaps I am off by one. You get the idea */
f((char *) a, a[17];
return 19990101;
}
This (or something strikingly similar) would produce different results
with your optimization (or what I think you wanted) and without it.
What you want (if I understood you correctly) is restrict or __builtin_assert.
f(restrict char *a, restrict int *b);
or
f(char *a, int *b)
{
__builtin_assert(a != b);
while(*a++) {
__builtin_assert(a != b);
(*b)++;
}
}
Perhaps __builtin_assert((a - b) > sizeof(int) || (a - b) > sizeof(int)) would
get it better, but no programmer would ever use something like this. Perhaps
__builtin_nocross(a,b) ?
> Happy hacking in 1999!
Happy hacking in 2.[23]. Have a nice year.
Philipp Rumpf
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: missed optimization, would be very helpful
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Horst von Brand
@ 1999-01-31 23:58 ` Joern Rennecke
1 sibling, 0 replies; 13+ messages in thread
From: Joern Rennecke @ 1999-01-31 23:58 UTC (permalink / raw)
To: Zack Weinberg; +Cc: prumpf, egcs
> >What if you call your function like this
>
> > f((char *) a, a[17]);
>
> The compiler is allowed to assume you didn't do that. char * and int * have
> different base types, therefore aren't allowed to be "aliased"
> (standardese). Optimizing based on that assumption is enabled by
> -fstrict-aliasing, but I get identical code for Horst's sample function on
> x86 with and without -fstrict-aliasing. (1.1.1 or Dec 24 CVS, makes no
> difference).
But char * is special in that it is allowed to alias anything but functions.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~1999-01-31 23:58 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-12-31 12:32 missed optimization, would be very helpful Zack Weinberg
1998-12-31 16:27 ` Richard Henderson
1998-12-31 18:05 ` Horst von Brand
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Jeffrey A Law
1999-01-31 23:58 ` 043633786-0001
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Horst von Brand
1999-01-31 23:58 ` Joern Rennecke
1999-01-31 23:58 ` Andreas Schwab
1999-01-31 23:58 ` Zack Weinberg
1999-01-31 23:58 ` Richard Henderson
1999-01-31 23:58 ` Andreas Schwab
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).