public inbox for gcc-bugs@sourceware.org
* [Bug target/28919] IV selection is messed up
From: pinskia at gcc dot gnu.org @ 2021-07-26 2:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28919
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2006-09-17 22:48:12 |2021-7-25
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Still happens.
__builtin_prefetch causes the issue.
* [Bug target/28919] IV selection is messed up
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2006-09-18 8:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rakdver at atrey dot karlin dot mff dot cuni dot cz 2006-09-18 08:44 -------
Subject: Re: IV selection is messed up
> On 17 Sep 2006 22:48:12 -0000, rakdver at gcc dot gnu dot org
> <gcc-bugzilla@gcc.gnu.org> wrote:
> > Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
> > gcc badly overestimates the size of the loop (it guesses 300 insns). I will
> > check what I can do with that.
> Provided I understand what you meant, it's the other way around; with
> -fprefetch-loop-arrays gcc's prefetch distance is much too short.
Which is caused by the overestimation of the loop size. The heuristic
for the prefetch distance is constant / size of the loop, which is the
best approximation we can achieve at the moment of the well-known formula
memory latency / time to execute the loop body. If the estimate were more
precise (say, about 40 insns for the testcase), gcc would prefetch
5 iterations ahead with the default values of the constants, which would
be slightly better (although still not quite enough).
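To make the arithmetic concrete, here is a minimal sketch of that heuristic in C. The constant 200 is an assumption, chosen only because it reproduces the "40 insns gives 5 iterations" figure above; the function name and the rounding-up are likewise illustrative, not taken from the GCC sources.

```c
/* Assumed constant: picked so that a 40-insn loop yields 5 iterations,
   as stated in the message above. It is not read from GCC itself. */
#define ASSUMED_PREFETCH_LATENCY 200u

/* Iterations to prefetch ahead: latency constant divided by the
   estimated loop size, rounded up (the rounding is an assumption). */
static unsigned prefetch_ahead(unsigned latency, unsigned loop_insns)
{
    return (latency + loop_insns - 1u) / loop_insns;
}
```

Under these assumptions an estimate of 300 insns gives a distance of a single iteration, while 40 insns gives 5, which is why the overestimate translates directly into a too-short prefetch distance.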
* [Bug target/28919] IV selection is messed up
From: tbptbp at gmail dot com @ 2006-09-18 5:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from tbptbp at gmail dot com 2006-09-18 05:52 -------
Subject: Re: IV selection is messed up
On 17 Sep 2006 22:48:12 -0000, rakdver at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
> gcc badly overestimates the size of the loop (it guesses 300 insns). I will
> check what I can do with that.
Provided I understand what you meant, it's the other way around; with
-fprefetch-loop-arrays gcc's prefetch distance is much too short.
If I remember correctly, that testcase takes a bunch of cycles per
iteration on my K8 (Opteron 252) and you have to prefetch at the very
least 256 bytes away to make that profitable; it's less than 128 with
gcc-4.2-20060908.
That testcase is pretty silly anyway.
Here's what I get with the real code and -fprefetch-loop-arrays:
4011c2: movdqa (%ecx),%xmm2
4011c6: lea 0x10(%ecx),%eax
4011c9: movdqa %xmm6,%xmm4
4011cd: dec %edx
4011ce: movdqa %xmm2,%xmm0
4011d2: mov %eax,%ecx
4011d4: prefetcht0 (%eax)
4011d7: movdqa %xmm6,%xmm1
4011db: punpckldq %xmm2,%xmm0
4011df: punpckhdq %xmm2,%xmm2
4011e3: movdqa %xmm0,%xmm3
4011e7: punpcklqdq %xmm0,%xmm3
4011eb: punpckhqdq %xmm0,%xmm0
4011ef: pcmpgtd %xmm3,%xmm4
4011f3: pcmpgtd %xmm0,%xmm1
4011f7: paddd 0x10(%esp),%xmm4
4011fd: paddd %xmm1,%xmm4
401201: movdqa %xmm5,%xmm1
401205: pcmpgtd %xmm3,%xmm1
401209: movdqa %xmm1,%xmm3
40120d: movdqa %xmm5,%xmm1
401211: paddd %xmm7,%xmm3
401215: pcmpgtd %xmm0,%xmm1
401219: movdqa %xmm6,%xmm0
40121d: paddd %xmm1,%xmm3
401221: movdqa %xmm2,%xmm1
401225: punpcklqdq %xmm2,%xmm1
401229: punpckhqdq %xmm2,%xmm2
40122d: pcmpgtd %xmm1,%xmm0
401231: paddd %xmm0,%xmm4
401235: movdqa %xmm6,%xmm0
401239: pcmpgtd %xmm2,%xmm0
40123d: paddd %xmm0,%xmm4
401241: movdqa %xmm5,%xmm0
401245: movaps %xmm4,0x10(%esp)
40124a: pcmpgtd %xmm1,%xmm0
40124e: paddd %xmm0,%xmm3
401252: movdqa %xmm5,%xmm0
401256: pcmpgtd %xmm2,%xmm0
40125a: paddd %xmm0,%xmm3
40125e: movdqa %xmm3,%xmm7
401262: jne 4011c2
<kdlib::AEBH::streaming_sampling(kdlib::AEBH::streaming_node_t const&,
kdlib::AEBH::sampler3D_t const&)+0x52>
Each iteration takes about 8 cycles when not starved, and prefetching
isn't a win unless done at least 4 or 8 cachelines away, so this one
is nothing but a hindrance.
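For comparison, a hand-written prefetch at the distance described above might look like the following sketch. The 64-byte line size, the choice of 8 lines, and the function itself are assumptions for illustration; this is not the real kdlib code.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    CACHE_LINE_BYTES     = 64,  /* assumed cache line size */
    PREFETCH_LINES_AHEAD = 8,   /* upper figure quoted above; 4 would give 256 bytes */
    PREFETCH_BYTES_AHEAD = CACHE_LINE_BYTES * PREFETCH_LINES_AHEAD
};

/* Sums n 32-bit values while hinting the cache a fixed byte distance ahead. */
static int64_t sum_with_prefetch(const int32_t *data, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Build the hint address through uintptr_t so that no out-of-bounds
           pointer arithmetic is performed; the prefetch itself does not
           fault on an invalid address. Locality 3 asks for all cache levels. */
        uintptr_t ahead = (uintptr_t)&data[i] + PREFETCH_BYTES_AHEAD;
        __builtin_prefetch((const void *)ahead, 0, 3);
        total += data[i];
    }
    return total;
}
```

The prefetch in the listing above targets (%eax), which is only 0x10 bytes past the current load, so under the 64-byte assumption it usually lands in the line already being read.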
* [Bug target/28919] IV selection is messed up
From: steven at gcc dot gnu dot org @ 2006-09-18 4:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from steven at gcc dot gnu dot org 2006-09-18 04:15 -------
To cut down the estimate for the loop size, you need to treat CALL_EXPRs to
machine-specific builtins specially (and probably some of the normal builtins
too). See estimate_num_insns_1, the case for CALL_EXPR. You probably need to
add a target hook to make this work. This has been discussed before on the
mailing lists.
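A toy model of the idea, to show why the estimate balloons. None of this is GCC code: the statement representation, the cost constants and the hook type are all hypothetical stand-ins for estimate_num_insns_1 and a target hook.

```c
#include <stddef.h>

enum stmt_kind { STMT_ARITH, STMT_CALL };

struct stmt {
    enum stmt_kind kind;
    int is_target_builtin;  /* nonzero for a call to a machine-specific builtin */
};

/* Hypothetical target hook: returns the insn cost of one builtin call. */
typedef int (*builtin_cost_hook)(const struct stmt *);

enum { ARITH_COST = 1, GENERIC_CALL_COST = 10 };  /* made-up costs */

/* Estimates loop body size. Without a hook every call is charged the
   generic call cost; with one, builtin calls are charged what the
   target says they cost. */
static int estimate_loop_insns(const struct stmt *body, size_t n,
                               builtin_cost_hook hook)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        const struct stmt *s = &body[i];
        if (s->kind == STMT_ARITH)
            total += ARITH_COST;
        else if (hook != NULL && s->is_target_builtin)
            total += hook(s);
        else
            total += GENERIC_CALL_COST;
    }
    return total;
}

/* Example hook: treat each builtin as a single instruction. */
static int one_insn_builtin_cost(const struct stmt *s)
{
    (void)s;
    return 1;
}
```

With a body made mostly of SSE builtins, passing NULL for the hook charges every one of them as a full call, while passing one_insn_builtin_cost brings the estimate close to the real instruction count.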
* [Bug target/28919] IV selection is messed up
From: rakdver at gcc dot gnu dot org @ 2006-09-17 22:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rakdver at gcc dot gnu dot org 2006-09-17 22:48 -------
(In reply to comment #4)
> Actually this is just a problem of IV selection: the IV selection chooses
> 1024+(const char *)&base[quad] as the IV instead of just &base[quad],
> which causes the bigger encoding.
My guess is that there are two reasons for this:
1) at the moment, ivopts do not know that _mm_prefetch (or __builtin_prefetch
or any other prefetching builtin) is special and that the address it takes may
be expressed using an addressing mode
2) the cost function for addresses pretends that more complicated addressing
modes are cheaper
Both problems can be solved, but most likely not in stage3.
Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
gcc badly overestimates the size of the loop (it guesses 300 insns). I will
check what I can do with that.
--
rakdver at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2006-09-17 22:48:12
date| |
* [Bug target/28919] IV selection is messed up
From: tbptbp at gmail dot com @ 2006-09-01 2:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from tbptbp at gmail dot com 2006-09-01 02:53 -------
(In reply to comment #4)
> Actually this is just a problem of IV selection: the IV selection chooses
> 1024+(const char *)&base[quad] as the IV instead of just &base[quad],
> which causes the bigger encoding.
Ok. I've tried many things but I totally fail to get IV selection to
pick the other one; I'd appreciate a clue :)
* [Bug target/28919] IV selection is messed up
From: pinskia at gcc dot gnu dot org @ 2006-09-01 2:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from pinskia at gcc dot gnu dot org 2006-09-01 02:46 -------
Actually this is just a problem of IV selection: the IV selection chooses
1024+(const char *)&base[quad] as the IV instead of just &base[quad],
which causes the bigger encoding.
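For readers without the original testcase (it is not part of this thread), the loop shape in question can be sketched roughly as below. The element type, function names and loop body are hypothetical; only the two address expressions come from the comment above.

```c
#include <stddef.h>

typedef struct { float v[4]; } quad_t;  /* hypothetical 16-byte element */

/* Caller must provide at least 1024 bytes of storage past the last
   element processed, so that forming the prefetch address stays
   inside the buffer. */
static float sum_quads(const quad_t *base, size_t count)
{
    float total = 0.0f;
    for (size_t quad = 0; quad < count; ++quad) {
        /* Address A: used once per iteration, by the prefetch only. */
        __builtin_prefetch(1024 + (const char *)&base[quad], 0, 3);
        /* Address B: used by every load in the body. */
        const quad_t *q = &base[quad];
        total += q->v[0] + q->v[1] + q->v[2] + q->v[3];
    }
    return total;
}

/* An x86 memory operand can use a one-byte displacement only for
   offsets in the range -128..127; anything else needs four bytes. */
static int fits_disp8(long disp)
{
    return disp >= -128 && disp <= 127;
}
```

If A becomes the induction variable, every load has to be addressed as -1024(%reg) and neighbouring offsets, none of which fit a one-byte displacement. If B is the induction variable, the loads use small offsets and only the single prefetch pays for the 1024 displacement.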
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|overzealous pointer |IV selection is messed up
|coalescence leading to poor |
|encoding |