* _mm_store_pd/s translated to movntpd/s
From: Matthias Kretz @ 2011-03-21 14:23 UTC
To: gcc-help
Hi,
I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now I
tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
translated _mm_store_pd/s calls in the code to streaming stores in the
resulting binary.
Where does this "optimization" come from and how can I disable it? This
doesn't make much sense on a working set that fits into the cache...
Is this intended behavior or a bug?
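For illustration, a minimal sketch of the kind of loop the question is about. The function name scale_pd and its parameters are hypothetical and not taken from the original code; the only assumption is that the affected code stores SSE2 vectors with _mm_store_pd inside a loop whose trip count the compiler cannot see.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_mul_pd, _mm_store_pd */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: dst[i] = src[i] * factor, two doubles per step.
   n must be even and both pointers must be 16-byte aligned. */
void scale_pd(double *dst, const double *src, double factor, size_t n)
{
    assert(n % 2 == 0);
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    const __m128d f = _mm_set1_pd(factor);
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(src + i);
        /* Written as a regular aligned store; the report is that the
           binary contains movntpd at this point instead. */
        _mm_store_pd(dst + i, _mm_mul_pd(v, f));
    }
}
```

Compiling such a file with -S and searching the assembly for movntpd is one way to check which store instruction was emitted.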
Cheers,
Matthias
--
Dipl.-Phys. Matthias Kretz
http://compeng.uni-frankfurt.de/?mkretz
* Re: _mm_store_pd/s translated to movntpd/s
From: Matthias Kretz @ 2011-03-21 14:49 UTC
To: gcc-help
Hi,
On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now I
> tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
> translated _mm_store_pd/s calls in the code to streaming stores in the
> resulting binary.
>
> Where does this "optimization" come from and how can I disable it? This
> doesn't make much sense on a working set that fits into the cache...
>
> Is this intended behavior or a bug?
Additional info: If I add -fno-prefetch-loop-arrays I get normal stores as
expected. I don't consider this a solution, though.
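One possible way to narrow the workaround, sketched under assumptions: GCC's optimize function attribute accepts -f options without the -f prefix, so the flag could in principle be limited to a single function instead of the whole build. Whether this suppresses the streaming stores in GCC 4.6.0 is not confirmed here, and the GCC manual describes the attribute as intended for debugging rather than production use. The macro NO_LOOP_PREFETCH and the function negate_pd are hypothetical names.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_sub_pd, _mm_store_pd */
#include <stddef.h>

/* Assumption: only GCC understands this attribute, so it expands to
   nothing on other compilers (including Clang). */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_PREFETCH __attribute__((optimize("no-prefetch-loop-arrays")))
#else
#define NO_LOOP_PREFETCH
#endif

/* Hypothetical kernel: dst[i] = -src[i], two doubles per step.
   n must be even and both pointers must be 16-byte aligned. */
NO_LOOP_PREFETCH
void negate_pd(double *dst, const double *src, size_t n)
{
    assert(n % 2 == 0);

    const __m128d zero = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2)
        _mm_store_pd(dst + i, _mm_sub_pd(zero, _mm_load_pd(src + i)));
}
```

The rest of the translation unit keeps the default setting, so loops that do benefit from prefetching would be unaffected.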
Regards,
Matthias
* Re: _mm_store_pd/s translated to movntpd/s
From: Brian Budge @ 2011-03-21 17:00 UTC
To: Matthias Kretz; +Cc: gcc-help
On Mon, Mar 21, 2011 at 7:49 AM, Matthias Kretz
<kretz@compeng.uni-frankfurt.de> wrote:
> Hi,
>
> On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
>> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now I
>> tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
>> translated _mm_store_pd/s calls in the code to streaming stores in the
>> resulting binary.
>>
>> Where does this "optimization" come from and how can I disable it? This
>> doesn't make much sense on a working set that fits into the cache...
>>
>> Is this intended behavior or a bug?
>
> Additional info: If I add -fno-prefetch-loop-arrays I get normal stores as
> expected. I don't consider this a solution, though.
>
> Regards,
> Matthias
Do you mean _mm_stream_pd/s? I think store will still take your
values to cache...
Brian
* Re: _mm_store_pd/s translated to movntpd/s
From: Matthias Kretz @ 2011-03-21 17:56 UTC
To: gcc-help
Hi,
On Monday 21 March 2011 18:00:38 Brian Budge wrote:
> On Mon, Mar 21, 2011 at 7:49 AM, Matthias Kretz wrote:
> > On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
> >> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now
> >> I tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
> >> translated _mm_store_pd/s calls in the code to streaming stores in the
> >> resulting binary.
> >>
> >> Where does this "optimization" come from and how can I disable it? This
> >> doesn't make much sense on a working set that fits into the cache...
> >>
> >> Is this intended behavior or a bug?
> >
> > Additional info: If I add -fno-prefetch-loop-arrays I get normal stores
> > as expected. I don't consider this a solution, though.
>
> Do you mean _mm_stream_pd/s? I think store will still take your
> values to cache...
I mean that I wrote _mm_store_pd/s in my code, but the binary contains the
instructions that _mm_stream_pd/s would produce (movntpd/s) instead. Only when
I compile with -fno-prefetch-loop-arrays do I actually get non-streaming stores.
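To make the distinction between the two intrinsics concrete, here is a small sketch with both variants side by side. The function names copy_cached and copy_streaming are hypothetical. _mm_store_pd is documented as an aligned store (movapd), while _mm_stream_pd is documented as a non-temporal store (movntpd) that hints the data should bypass the cache.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_store_pd, _mm_stream_pd */
#include <stddef.h>

/* Regular aligned stores: the written lines are expected to stay in cache. */
void copy_cached(double *dst, const double *src, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2)
        _mm_store_pd(dst + i, _mm_load_pd(src + i));
}

/* Explicit non-temporal stores: what the source would look like if
   streaming had actually been requested. */
void copy_streaming(double *dst, const double *src, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2)
        _mm_stream_pd(dst + i, _mm_load_pd(src + i));
    /* Streaming stores are weakly ordered; fence before later stores. */
    _mm_sfence();
}
```

Both functions produce the same memory contents; they differ only in the cache behaviour of the stores, which is why the substitution matters for a working set that is read again soon afterwards.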
Regards,
Matthias
* Re: _mm_store_pd/s translated to movntpd/s
From: Ian Lance Taylor @ 2011-03-21 20:43 UTC
To: Matthias Kretz; +Cc: gcc-help
Matthias Kretz <kretz@compeng.uni-frankfurt.de> writes:
> On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
>> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now I
>> tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
>> translated _mm_store_pd/s calls in the code to streaming stores in the
>> resulting binary.
>>
>> Where does this "optimization" come from and how can I disable it? This
>> doesn't make much sense on a working set that fits into the cache...
>>
>> Is this intended behavior or a bug?
>
> Additional info: If I add -fno-prefetch-loop-arrays I get normal stores as
> expected. I don't consider this a solution, though.
That is precisely where this optimization is coming from. The
vectorizer pretty much assumes that the working set doesn't fit in the
cache. I think it would be reasonable to have an option to control
this. Please consider filing a bug report as described at
http://gcc.gnu.org/bugs/ , ideally with a test case.
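A test case for such a report might look roughly like the sketch below. All names (N, in_buf, out_buf, add_one_ps, run_small_working_set) are hypothetical, and it is an assumption, not something verified here, that this exact loop triggers the conversion with GCC 4.6.0 and -march=barcelona. The idea is to sweep repeatedly over buffers small enough to stay in cache while hiding the trip count from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_add_ps, _mm_store_ps */

enum { N = 1024 };  /* 1024 floats = 4 KiB per array, assumed cache-resident */

static float in_buf[N]  __attribute__((aligned(16)));
static float out_buf[N] __attribute__((aligned(16)));

/* noinline keeps the trip count unknown inside the kernel, so the
   compiler cannot tell from this function that the data set is small. */
__attribute__((noinline))
void add_one_ps(float *dst, const float *src, size_t n)
{
    assert(n % 4 == 0);

    const __m128 one = _mm_set1_ps(1.0f);
    for (size_t i = 0; i < n; i += 4)
        _mm_store_ps(dst + i, _mm_add_ps(_mm_load_ps(src + i), one));
}

/* Runs the kernel `passes` times over the same small buffers and
   returns the last output element. */
float run_small_working_set(int passes)
{
    for (size_t i = 0; i < (size_t)N; ++i)
        in_buf[i] = (float)i;
    for (int p = 0; p < passes; ++p)
        add_one_ps(out_buf, in_buf, N);
    return out_buf[N - 1];
}
```

Calling run_small_working_set from main and comparing the generated assembly with and without -fno-prefetch-loop-arrays would show whether movntps appears in add_one_ps.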
Ian