public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores
[not found] <bug-17264-581@http.gcc.gnu.org/bugzilla/>
@ 2006-09-24 19:52 ` falk at debian dot org
2006-09-24 22:15 ` dave at hiauly1 dot hia dot nrc dot ca
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: falk at debian dot org @ 2006-09-24 19:52 UTC
To: gcc-bugs
------- Comment #1 from falk at debian dot org 2006-09-24 19:52 -------
For this test case:
void f(double *pds, double *pdd, unsigned long len) {
    while (len >= 8*sizeof(double)) {
        register double r1,r2,r3,r4;
        r1 = *pds++;
        r2 = *pds++;
        r3 = *pds++;
        r4 = *pds++;
        *pdd++ = r1;
        *pdd++ = r2;
        *pdd++ = r3;
        *pdd++ = r4;
    }
}
gcc starting from 4.0 produces this:
.L3:
        fldds -16(%r26),%fr22
        fldds -8(%r26),%fr23
        fldds 0(%r26),%fr24
        fldds 8(%r26),%fr25
        ldo 32(%r26),%r26
        fstds %fr22,-16(%r25)
        fstds %fr23,-8(%r25)
        fstds %fr24,0(%r25)
        fstds %fr25,8(%r25)
        b .L3
which I suspect is actually better, since it avoids dependencies between the
loads. But I'm not familiar with hppa, can anybody comment?
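For illustration only, here is a minimal C sketch of the two addressing styles being compared. copy4_postinc keeps the post-increments of the test case, while copy4_offsets spells out the base-plus-displacement form that the 4.x assembly resembles (fixed offsets from one base, one pointer update per iteration). The function names, the 4-double chunk size, the `len` update, and the forms_agree helper are assumptions added so the loops terminate and can be checked; they are not part of the test case above.

```c
#include <stddef.h>

/* Post-increment form: every access also bumps the pointer, so each
 * address depends on the previous pointer update. */
static void copy4_postinc(const double *pds, double *pdd, size_t len)
{
    while (len >= 4 * sizeof(double)) {
        double r1 = *pds++;
        double r2 = *pds++;
        double r3 = *pds++;
        double r4 = *pds++;
        *pdd++ = r1;
        *pdd++ = r2;
        *pdd++ = r3;
        *pdd++ = r4;
        len -= 4 * sizeof(double);   /* assumption: added so the loop ends */
    }
}

/* Displacement form: all four accesses use fixed offsets from one base,
 * and the base is advanced once per iteration. */
static void copy4_offsets(const double *pds, double *pdd, size_t len)
{
    while (len >= 4 * sizeof(double)) {
        double r1 = pds[0];
        double r2 = pds[1];
        double r3 = pds[2];
        double r4 = pds[3];
        pds += 4;
        pdd[0] = r1;
        pdd[1] = r2;
        pdd[2] = r3;
        pdd[3] = r4;
        pdd += 4;
        len -= 4 * sizeof(double);   /* assumption: added so the loop ends */
    }
}

/* Hypothetical helper: returns 1 if both forms copy the same data. */
static int forms_agree(void)
{
    double src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double a[8] = {0};
    double b[8] = {0};
    size_t i;

    copy4_postinc(src, a, sizeof src);
    copy4_offsets(src, b, sizeof src);
    for (i = 0; i < 8; i++)
        if (a[i] != src[i] || b[i] != src[i])
            return 0;
    return 1;
}
```

Both functions are semantically identical; which one a given compiler turns into faster code for a given target is exactly the question raised here, and the sketch makes no claim about that.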
--
falk at debian dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail| |3.4.2 4.1.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264
* [Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores
[not found] <bug-17264-581@http.gcc.gnu.org/bugzilla/>
2006-09-24 19:52 ` [Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores falk at debian dot org
@ 2006-09-24 22:15 ` dave at hiauly1 dot hia dot nrc dot ca
2006-09-24 23:48 ` randolph at tausq dot org
2006-09-24 23:49 ` tausq at debian dot org
3 siblings, 0 replies; 5+ messages in thread
From: dave at hiauly1 dot hia dot nrc dot ca @ 2006-09-24 22:15 UTC
To: gcc-bugs
------- Comment #2 from dave at hiauly1 dot hia dot nrc dot ca 2006-09-24 22:15 -------
Subject: Re: [hppa] Missing address increment optimization for fp load/stores
> For this test case:
>
> void f(double *pds, double *pdd, unsigned long len) {
> while (len >= 8*sizeof(double)) {
> register double r1,r2,r3,r4;
> r1 = *pds++;
> r2 = *pds++;
> r3 = *pds++;
> r4 = *pds++;
> *pdd++ = r1;
> *pdd++ = r2;
> *pdd++ = r3;
> *pdd++ = r4;
> }
> }
>
> gcc starting from 4.0 produces this:
>
> .L3:
> fldds -16(%r26),%fr22
> fldds -8(%r26),%fr23
> fldds 0(%r26),%fr24
> fldds 8(%r26),%fr25
> ldo 32(%r26),%r26
> fstds %fr22,-16(%r25)
> fstds %fr23,-8(%r25)
> fstds %fr24,0(%r25)
> fstds %fr25,8(%r25)
> b .L3
>
> which I suspect is actually better, since it avoids dependencies between the
> loads. But I'm not familiar with hppa, can anybody comment?
It looks close to optimal to me. The code is better than that generated
by 3.4.x or HP cc. Using the auto-increment forms would allow elimination
of the two ldo instructions to increment r25 and r26.
Dave
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264
* [Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores
[not found] <bug-17264-581@http.gcc.gnu.org/bugzilla/>
2006-09-24 19:52 ` [Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores falk at debian dot org
2006-09-24 22:15 ` dave at hiauly1 dot hia dot nrc dot ca
@ 2006-09-24 23:48 ` randolph at tausq dot org
2006-09-24 23:49 ` tausq at debian dot org
3 siblings, 0 replies; 5+ messages in thread
From: randolph at tausq dot org @ 2006-09-24 23:48 UTC
To: gcc-bugs
------- Comment #3 from randolph at tausq dot org 2006-09-24 23:48 -------
Subject: Re: [hppa] Missing address increment optimization for fp load/stores
>> gcc starting from 4.0 produces this:
>>
>> .L3:
>> fldds -16(%r26),%fr22
>> fldds -8(%r26),%fr23
>> fldds 0(%r26),%fr24
>> fldds 8(%r26),%fr25
>> ldo 32(%r26),%r26
>> fstds %fr22,-16(%r25)
>> fstds %fr23,-8(%r25)
>> fstds %fr24,0(%r25)
>> fstds %fr25,8(%r25)
>> b .L3
>>
>> which I suspect is actually better, since it avoids dependencies between the
>> loads. But I'm not familiar with hppa, can anybody comment?
>
> It looks close to optimal to me. The code is better than that generated
> by 3.4.x or HP cc. Using the auto-increment forms would allow elimination
> of the two ldo instructions to increment r25 and r26.
Yeah, this looks pretty good. I've been told that not using the
autoincrement forms might be even better as it avoids interlocks between
successive instructions. The ldo insn just gets pipelined so it doesn't
necessarily slow things down.
I'll mark this bug as resolved.
thanks
randolph
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264
* [Bug rtl-optimization/17264] [hppa] Missing address increment optimization for fp load/stores
[not found] <bug-17264-581@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2006-09-24 23:48 ` randolph at tausq dot org
@ 2006-09-24 23:49 ` tausq at debian dot org
3 siblings, 0 replies; 5+ messages in thread
From: tausq at debian dot org @ 2006-09-24 23:49 UTC
To: gcc-bugs
------- Comment #4 from tausq at debian dot org 2006-09-24 23:49 -------
Fixed in gcc-4.x
--
tausq at debian dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264
* [Bug rtl-optimization/17264] New: [hppa] Missing address increment optimization for fp load/stores
@ 2004-09-01 18:22 tausq at debian dot org
2004-09-01 18:53 ` [Bug rtl-optimization/17264] " danglin at gcc dot gnu dot org
0 siblings, 1 reply; 5+ messages in thread
From: tausq at debian dot org @ 2004-09-01 18:22 UTC
To: gcc-bugs
I have a bit of loop code that looks like this:
pds = (double *)pcs;
pdd = (double *)pcd;
while (len >= 8*sizeof(double)) {
    register double r1,r2,r3,r4,r5,r6,r7,r8;
    prefetch((const void *)(pds+8));
    r1 = *pds++;
    r2 = *pds++;
    r3 = *pds++;
    r4 = *pds++;
    *pdd++ = r1;
    *pdd++ = r2;
    *pdd++ = r3;
    *pdd++ = r4;
    /* ... */
}
gcc translates this to:
2c: 2e 80 10 16 fldd 0(,r20),fr22
30: 37 18 3f 81 ldo -40(r24),r24
34: 36 94 00 10 ldo 8(r20),r20
38: 2e 80 10 17 fldd 0(,r20),fr23
3c: 36 94 00 10 ldo 8(r20),r20
40: 2e 80 10 18 fldd 0(,r20),fr24
44: 36 94 00 10 ldo 8(r20),r20
48: 2e 80 10 19 fldd 0(,r20),fr25
4c: 36 94 00 10 ldo 8(r20),r20
50: 2f 40 12 16 fstd fr22,0(,r26)
54: 37 5a 00 10 ldo 8(r26),r26
58: 2f 40 12 17 fstd fr23,0(,r26)
5c: 37 5a 00 10 ldo 8(r26),r26
60: 2f 40 12 18 fstd fr24,0(,r26)
64: 37 5a 00 10 ldo 8(r26),r26
68: 2f 40 12 19 fstd fr25,0(,r26)
6c: 37 5a 00 10 ldo 8(r26),r26
It is probably better to emit fldd,ma and fstd,ma instructions in this case.
(This already works for ldw/stw insns.)
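For anyone wanting to reproduce this, a self-contained sketch of such a loop might look like the following. It is not the reporter's code: the name copy8_prefetch, the use of GCC's `__builtin_prefetch` in place of the unspecified prefetch() helper, the full 8-register body, the `len` update, and the return value are all assumptions made so that the fragment compiles on its own and terminates.

```c
#include <stddef.h>

/* Hypothetical reproducer: copies whole 8-double chunks from pds to pdd
 * and returns the number of doubles copied. Any tail shorter than
 * 8 doubles is left to the caller. */
static size_t copy8_prefetch(const double *pds, double *pdd, size_t len)
{
    size_t copied = 0;

    while (len >= 8 * sizeof(double)) {
        double r1, r2, r3, r4, r5, r6, r7, r8;

        /* Assumption: GCC builtin standing in for prefetch(). */
        __builtin_prefetch((const void *)(pds + 8));

        r1 = *pds++;
        r2 = *pds++;
        r3 = *pds++;
        r4 = *pds++;
        r5 = *pds++;
        r6 = *pds++;
        r7 = *pds++;
        r8 = *pds++;
        *pdd++ = r1;
        *pdd++ = r2;
        *pdd++ = r3;
        *pdd++ = r4;
        *pdd++ = r5;
        *pdd++ = r6;
        *pdd++ = r7;
        *pdd++ = r8;

        len -= 8 * sizeof(double);   /* assumption: added so the loop ends */
        copied += 8;
    }
    return copied;
}
```

Compiling a file containing this function with optimization enabled and inspecting the generated assembly should show whether the post-increments become base-register-modifying loads and stores or separate ldo instructions on a given compiler version.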
Dave Anglin writes:
I think we need to add combiner patterns for floating point loads
and stores with base register modification. These need to be similar
to those for ldw and stw. See pa.md (~ line 2465 in 3.4).
I haven't done a complete scan but I think we need to add SFmode
patterns using ldw and stw, DFmode patterns using ldd and std,
SImode and SFmode using fldw and fstw, DImode and DFmode using
fldd and fstd. The half word and byte patterns need to be reviewed
to see that they are complete.
--
Summary: [hppa] Missing address increment optimization for fp load/stores
Product: gcc
Version: 3.4.2
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tausq at debian dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: hppa-linux
GCC host triplet: hppa-linux
GCC target triplet: hppa-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17264