[Bug target/28919] IV selection is messed up

public inbox for gcc-bugs@sourceware.org

* [Bug target/28919] IV selection is messed up
       [not found] <bug-28919-4@http.gcc.gnu.org/bugzilla/>
@ 2021-07-26  2:49 ` pinskia at gcc dot gnu.org
  0 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-26  2:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28919

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2006-09-17 22:48:12         |2021-7-25

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Still happens.
__builtin_prefetch causes the issue.


* [Bug target/28919] IV selection is messed up
  2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
                   ` (4 preceding siblings ...)
  2006-09-18  5:52 ` tbptbp at gmail dot com
@ 2006-09-18  8:44 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  5 siblings, 0 replies; 7+ messages in thread
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2006-09-18  8:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rakdver at atrey dot karlin dot mff dot cuni dot cz  2006-09-18 08:44 -------
Subject: Re:  IV selection is messed up

> On 17 Sep 2006 22:48:12 -0000, rakdver at gcc dot gnu dot org
> <gcc-bugzilla@gcc.gnu.org> wrote:
> > Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
> > gcc badly overestimates the size of the loop (it guesses 300 insns).  I will
> > check what I can do with that.
> Provided I understand what you meant, it's the other way around: with
> -fprefetch-loop-arrays, gcc's prefetch distance is much too short.

Which is caused by the overestimation of the loop size.  The heuristic
that determines the prefetch distance is constant / size of the loop
(the best approximation we can achieve at the moment of the well-known
formula memory latency / time to execute the loop body).  If the
estimate were more precise (say, about 40 insns for the testcase
below), gcc would prefetch 5 iterations ahead with the default values
of the constants, which would be slightly better (although still not
quite enough).
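
As a rough sketch of the heuristic described above (not the actual GCC
source): the distance in iterations is a latency constant divided by the
estimated loop size. The constant 200, the round-up, and all names below
are assumptions, chosen because 200 / 40 reproduces the 5 iterations
mentioned; under the same assumptions the 300-insn estimate yields a
distance of a single iteration.

```c
#include <assert.h>

/* Assumed latency constant, measured in insns. Hypothetical value picked
   to match the "5 iterations for 40 insns" figure in the text. */
enum { ASSUMED_PREFETCH_LATENCY = 200 };

/* Number of iterations to prefetch ahead: latency / loop size, rounded up
   (the rounding direction is an assumption of this sketch). */
unsigned prefetch_ahead(unsigned latency, unsigned loop_insns)
{
    assert(loop_insns > 0);
    return (latency + loop_insns - 1) / loop_insns;
}
```

Calling prefetch_ahead(ASSUMED_PREFETCH_LATENCY, 40) models the accurate
estimate, and prefetch_ahead(ASSUMED_PREFETCH_LATENCY, 300) models the
overestimate.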

> If I remember correctly, that testcase takes a bunch of cycles per
> iteration on my K8 (Opteron 252), and you have to prefetch at the very
> least 256 bytes away to make that profitable; it's less than 128 with
> gcc-4.2-20060908.
> 
> That testcase is pretty silly anyway.
> Here's what I get with the real code and -fprefetch-loop-arrays:
> 
>   4011c2:       movdqa (%ecx),%xmm2
>   4011c6:       lea    0x10(%ecx),%eax
>   4011c9:       movdqa %xmm6,%xmm4
>   4011cd:       dec    %edx
>   4011ce:       movdqa %xmm2,%xmm0
>   4011d2:       mov    %eax,%ecx
>   4011d4:       prefetcht0 (%eax)
>   4011d7:       movdqa %xmm6,%xmm1
>   4011db:       punpckldq %xmm2,%xmm0
>   4011df:       punpckhdq %xmm2,%xmm2
>   4011e3:       movdqa %xmm0,%xmm3
>   4011e7:       punpcklqdq %xmm0,%xmm3
>   4011eb:       punpckhqdq %xmm0,%xmm0
>   4011ef:       pcmpgtd %xmm3,%xmm4
>   4011f3:       pcmpgtd %xmm0,%xmm1
>   4011f7:       paddd  0x10(%esp),%xmm4
>   4011fd:       paddd  %xmm1,%xmm4
>   401201:       movdqa %xmm5,%xmm1
>   401205:       pcmpgtd %xmm3,%xmm1
>   401209:       movdqa %xmm1,%xmm3
>   40120d:       movdqa %xmm5,%xmm1
>   401211:       paddd  %xmm7,%xmm3
>   401215:       pcmpgtd %xmm0,%xmm1
>   401219:       movdqa %xmm6,%xmm0
>   40121d:       paddd  %xmm1,%xmm3
>   401221:       movdqa %xmm2,%xmm1
>   401225:       punpcklqdq %xmm2,%xmm1
>   401229:       punpckhqdq %xmm2,%xmm2
>   40122d:       pcmpgtd %xmm1,%xmm0
>   401231:       paddd  %xmm0,%xmm4
>   401235:       movdqa %xmm6,%xmm0
>   401239:       pcmpgtd %xmm2,%xmm0
>   40123d:       paddd  %xmm0,%xmm4
>   401241:       movdqa %xmm5,%xmm0
>   401245:       movaps %xmm4,0x10(%esp)
>   40124a:       pcmpgtd %xmm1,%xmm0
>   40124e:       paddd  %xmm0,%xmm3
>   401252:       movdqa %xmm5,%xmm0
>   401256:       pcmpgtd %xmm2,%xmm0
>   40125a:       paddd  %xmm0,%xmm3
>   40125e:       movdqa %xmm3,%xmm7
>   401262:       jne    4011c2
> <kdlib::AEBH::streaming_sampling(kdlib::AEBH::streaming_node_t const&,
> kdlib::AEBH::sampler3D_t const&)+0x52>
> 
> Each iteration takes about 8 cycles when not starved, and prefetching
> isn't a win unless done at least 4 or 8 cache lines away, so this one
> is nothing but a hindrance.



* [Bug target/28919] IV selection is messed up
  2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
                   ` (3 preceding siblings ...)
  2006-09-18  4:15 ` steven at gcc dot gnu dot org
@ 2006-09-18  5:52 ` tbptbp at gmail dot com
  2006-09-18  8:44 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  5 siblings, 0 replies; 7+ messages in thread
From: tbptbp at gmail dot com @ 2006-09-18  5:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from tbptbp at gmail dot com  2006-09-18 05:52 -------
Subject: Re:  IV selection is messed up

On 17 Sep 2006 22:48:12 -0000, rakdver at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
> Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
> gcc badly overestimates the size of the loop (it guesses 300 insns).  I will
> check what I can do with that.
Provided I understand what you meant, it's the other way around: with
-fprefetch-loop-arrays, gcc's prefetch distance is much too short.
If I remember correctly, that testcase takes a bunch of cycles per
iteration on my K8 (Opteron 252), and you have to prefetch at the very
least 256 bytes away to make that profitable; it's less than 128 with
gcc-4.2-20060908.

That testcase is pretty silly anyway.
Here's what I get with the real code and -fprefetch-loop-arrays:

  4011c2:       movdqa (%ecx),%xmm2
  4011c6:       lea    0x10(%ecx),%eax
  4011c9:       movdqa %xmm6,%xmm4
  4011cd:       dec    %edx
  4011ce:       movdqa %xmm2,%xmm0
  4011d2:       mov    %eax,%ecx
  4011d4:       prefetcht0 (%eax)
  4011d7:       movdqa %xmm6,%xmm1
  4011db:       punpckldq %xmm2,%xmm0
  4011df:       punpckhdq %xmm2,%xmm2
  4011e3:       movdqa %xmm0,%xmm3
  4011e7:       punpcklqdq %xmm0,%xmm3
  4011eb:       punpckhqdq %xmm0,%xmm0
  4011ef:       pcmpgtd %xmm3,%xmm4
  4011f3:       pcmpgtd %xmm0,%xmm1
  4011f7:       paddd  0x10(%esp),%xmm4
  4011fd:       paddd  %xmm1,%xmm4
  401201:       movdqa %xmm5,%xmm1
  401205:       pcmpgtd %xmm3,%xmm1
  401209:       movdqa %xmm1,%xmm3
  40120d:       movdqa %xmm5,%xmm1
  401211:       paddd  %xmm7,%xmm3
  401215:       pcmpgtd %xmm0,%xmm1
  401219:       movdqa %xmm6,%xmm0
  40121d:       paddd  %xmm1,%xmm3
  401221:       movdqa %xmm2,%xmm1
  401225:       punpcklqdq %xmm2,%xmm1
  401229:       punpckhqdq %xmm2,%xmm2
  40122d:       pcmpgtd %xmm1,%xmm0
  401231:       paddd  %xmm0,%xmm4
  401235:       movdqa %xmm6,%xmm0
  401239:       pcmpgtd %xmm2,%xmm0
  40123d:       paddd  %xmm0,%xmm4
  401241:       movdqa %xmm5,%xmm0
  401245:       movaps %xmm4,0x10(%esp)
  40124a:       pcmpgtd %xmm1,%xmm0
  40124e:       paddd  %xmm0,%xmm3
  401252:       movdqa %xmm5,%xmm0
  401256:       pcmpgtd %xmm2,%xmm0
  40125a:       paddd  %xmm0,%xmm3
  40125e:       movdqa %xmm3,%xmm7
  401262:       jne    4011c2
<kdlib::AEBH::streaming_sampling(kdlib::AEBH::streaming_node_t const&,
kdlib::AEBH::sampler3D_t const&)+0x52>

Each iteration takes about 8 cycles when not starved, and prefetching
isn't a win unless done at least 4 or 8 cache lines away, so this one
is nothing but a hindrance.
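
To put numbers on that: in the listing the pointer advances by 0x10
(16 bytes) per iteration and the prefetch targets the address that the
very next iteration loads, so the lead is a single iteration. The small
sketch below converts a lead expressed in cache lines into iterations
and cycles. The 64-byte line size and the function names are
assumptions of the sketch; the 8 cycles and 16 bytes per iteration come
from the text and the listing.

```c
#include <assert.h>

/* Assumed cache line size in bytes. */
enum { CACHE_LINE_BYTES = 64 };

/* Iterations of lead needed to prefetch `lines_ahead` cache lines away
   when each iteration consumes `bytes_per_iter` bytes. */
unsigned lead_iterations(unsigned lines_ahead, unsigned bytes_per_iter)
{
    assert(bytes_per_iter > 0);
    return lines_ahead * CACHE_LINE_BYTES / bytes_per_iter;
}

/* The same lead expressed in cycles of loop execution. */
unsigned lead_cycles(unsigned lines_ahead, unsigned bytes_per_iter,
                     unsigned cycles_per_iter)
{
    return lead_iterations(lines_ahead, bytes_per_iter) * cycles_per_iter;
}
```

Under these assumptions, 4 cache lines correspond to 16 iterations
(about 128 cycles) and 8 cache lines to 32 iterations (about 256
cycles), against the one iteration (about 8 cycles) of lead in the
generated code.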



* [Bug target/28919] IV selection is messed up
  2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
                   ` (2 preceding siblings ...)
  2006-09-17 22:48 ` rakdver at gcc dot gnu dot org
@ 2006-09-18  4:15 ` steven at gcc dot gnu dot org
  2006-09-18  5:52 ` tbptbp at gmail dot com
  2006-09-18  8:44 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
  5 siblings, 0 replies; 7+ messages in thread
From: steven at gcc dot gnu dot org @ 2006-09-18  4:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from steven at gcc dot gnu dot org  2006-09-18 04:15 -------
To cut down the estimate for the loop size, you need to treat CALL_EXPRs to
machine-specific builtins specially (and probably some of the normal builtins
too).  See estimate_num_insns_1, the case for CALL_EXPR.  You probably need to
add a target hook to make this work.  This has been discussed before on the
mailing lists.
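
A toy model of the idea, for illustration only: it is not the code of
estimate_num_insns_1, and the statement kinds, the flat call cost of 10,
and the `builtins_are_cheap` switch (standing in for a target hook) are
all hypothetical. It shows how charging every intrinsic call as a full
function call inflates the size estimate of a loop body made mostly of
intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds for the toy model. */
enum stmt_kind { STMT_SIMPLE, STMT_CALL, STMT_TARGET_BUILTIN_CALL };

/* Assumed costs in insns; the call cost is an illustrative figure. */
enum { COST_SIMPLE = 1, COST_CALL = 10 };

/* Sum the estimated size of a loop body. When `builtins_are_cheap` is
   nonzero, calls to target builtins are charged like a single insn
   instead of like a real call. */
unsigned estimate_size(const enum stmt_kind *stmts, size_t n,
                       int builtins_are_cheap)
{
    unsigned total = 0;
    assert(stmts != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        switch (stmts[i]) {
        case STMT_SIMPLE:
            total += COST_SIMPLE;
            break;
        case STMT_CALL:
            total += COST_CALL;
            break;
        case STMT_TARGET_BUILTIN_CALL:
            total += builtins_are_cheap ? COST_SIMPLE : COST_CALL;
            break;
        }
    }
    return total;
}
```

For a hypothetical body of 30 intrinsic calls the model gives 300 with
the flat call cost and 30 once builtins are treated specially.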



* [Bug target/28919] IV selection is messed up
  2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
  2006-09-01  2:46 ` [Bug target/28919] IV selection is messed up pinskia at gcc dot gnu dot org
  2006-09-01  2:53 ` tbptbp at gmail dot com
@ 2006-09-17 22:48 ` rakdver at gcc dot gnu dot org
  2006-09-18  4:15 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-17 22:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rakdver at gcc dot gnu dot org  2006-09-17 22:48 -------
(In reply to comment #4)
> Actually this is just a problem of IV selection: IV selection chooses
> 1024+(const char *)&base[quad] as the IV instead of just &base[quad],
> which causes the bigger encoding.

My guess is that there are two reasons for this:
1) at the moment, ivopts does not know that _mm_prefetch (or __builtin_prefetch
or any other prefetching builtin) is special and that the address it takes may
be expressed using an addressing mode
2) the cost function for addresses pretends that more complicated addressing
modes are cheaper

Both problems can be solved, but most likely not in stage3.
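
For readers without the original testcase at hand, a loop of roughly
this shape exercises the pattern under discussion. It is a hypothetical
reconstruction, not the reporter's code: the `quad_t` type and the
`sum_quads` function are invented here, and only the prefetch
expression 1024+(const char *)&base[quad] is taken from comment #4.
Ideally &base[quad] stays the IV and the prefetch address is folded
into a 1024(%reg) displacement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 16-byte element, standing in for one SSE-sized quad. */
typedef struct { int v[4]; } quad_t;

/* Sum all lanes of all quads while prefetching 1024 bytes ahead of the
   element being read. */
long sum_quads(const quad_t *base, size_t count)
{
    long total = 0;
    assert(base != NULL || count == 0);
    for (size_t quad = 0; quad < count; ++quad) {
        /* The prefetch only hints at an address and never dereferences
           it, so running past the end of the array near the last
           iterations is harmless in practice. */
        __builtin_prefetch(1024 + (const char *)&base[quad]);
        for (int lane = 0; lane < 4; ++lane)
            total += base[quad].v[lane];
    }
    return total;
}
```

Comparing the assembly with and without the __builtin_prefetch line is
one way to see which address ends up as the induction variable.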

Regarding the "-fprefetch-loop-arrays's heuristic is way off the mark" part,
gcc badly overestimates the size of the loop (it guesses 300 insns).  I will
check what I can do with that.


-- 

rakdver at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-09-17 22:48:12
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28919



* [Bug target/28919] IV selection is messed up
  2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
  2006-09-01  2:46 ` [Bug target/28919] IV selection is messed up pinskia at gcc dot gnu dot org
@ 2006-09-01  2:53 ` tbptbp at gmail dot com
  2006-09-17 22:48 ` rakdver at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tbptbp at gmail dot com @ 2006-09-01  2:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from tbptbp at gmail dot com  2006-09-01 02:53 -------
(In reply to comment #4)
> Actually this is just a problem of IV selection: IV selection chooses
> 1024+(const char *)&base[quad] as the IV instead of just &base[quad],
> which causes the bigger encoding.
OK. I've tried many things, but I can't get IV selection to pick the
other candidate; I'd appreciate a clue :)



* [Bug target/28919] IV selection is messed up
  2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
@ 2006-09-01  2:46 ` pinskia at gcc dot gnu dot org
  2006-09-01  2:53 ` tbptbp at gmail dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-01  2:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2006-09-01 02:46 -------
Actually this is just a problem of IV selection: IV selection chooses
1024+(const char *)&base[quad] as the IV instead of just &base[quad],
which causes the bigger encoding.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|overzealous pointer         |IV selection is messed up
                   |coalescence leading to poor |
                   |encoding                    |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28919



Thread overview: 7+ messages
     [not found] <bug-28919-4@http.gcc.gnu.org/bugzilla/>
2021-07-26  2:49 ` [Bug target/28919] IV selection is messed up pinskia at gcc dot gnu.org
2006-09-01  1:54 [Bug target/28919] New: overzealous pointer coalescence leading to poor encoding tbptbp at gmail dot com
2006-09-01  2:46 ` [Bug target/28919] IV selection is messed up pinskia at gcc dot gnu dot org
2006-09-01  2:53 ` tbptbp at gmail dot com
2006-09-17 22:48 ` rakdver at gcc dot gnu dot org
2006-09-18  4:15 ` steven at gcc dot gnu dot org
2006-09-18  5:52 ` tbptbp at gmail dot com
2006-09-18  8:44 ` rakdver at atrey dot karlin dot mff dot cuni dot cz
