* mmintrin slower than inline asm or even plain C
From: Jack Andrews @ 2007-04-22 4:18 UTC
To: hjl, mark, rth, richard.guenther; +Cc: gcc-help
hi guys,
i'm writing to you directly because i can't find the relevant mailing list
for help with the mmintrin functions. there's a thread at:
http://gcc.gnu.org/ml/gcc-help/2007-04/msg00201.html
that details my problems.
i want to sum an array of longs using mmx. i use the functions:
_mm_set_pi32 and _m_paddd
but the resultant binary contains significantly less efficient code
than inline asm or even plain C ( for(i=0;i<n;i++)total+=a[i]; ).
here's the relevant function:
simd_mmintrin(n, is)
  I *is;
{ __m64 q,r;
  I i;
  _m_empty();
  q=_m_from_int(0);
  for (i=0; i < n; i+=W) {
    r=_mm_set_pi32(is[i],is[i+1]);
    q=_m_paddd(q,r);
  }
  union {long a[2];__m64 m;}u;
  u.m=q;
  return u.a[0]+u.a[1];
}
and the rest of the code and a shell script to run it is in the thread above.
thank you,
jack
* Re: mmintrin slower than inline asm or even plain C
From: Richard Guenther @ 2007-04-22 10:31 UTC
To: Jack Andrews; +Cc: hjl, mark, rth, gcc-help
On 4/22/07, Jack Andrews <effbiae@gmail.com> wrote:
> hi guys,
>
> i'm writing to you directly because i can't find the relevant mailing list
> for help with the mmintrin functions. there's a thread at:
>
> http://gcc.gnu.org/ml/gcc-help/2007-04/msg00201.html
>
> that details my problems.
>
> i want to sum an array of longs using mmx. i use the functions:
> _mm_set_pi32 and _m_paddd
> but the resultant binary contains significantly less efficient code
> than inline asm or even plain C ( for(i=0;i<n;i++)total+=a[i]; ).
> here's the relevant function:
>
> simd_mmintrin(n, is)
>   I *is;
> { __m64 q,r;
>   I i;
>   _m_empty();
>   q=_m_from_int(0);
>   for (i=0; i < n; i+=W) {
>     r=_mm_set_pi32(is[i],is[i+1]);
>     q=_m_paddd(q,r);
>   }
>   union {long a[2];__m64 m;}u;
>   u.m=q;
>   return u.a[0]+u.a[1];
> }
>
> and the rest of the code and a shell script to run it is in the thread above.
You should file a bug report. I suspect that we cannot combine
_mm_set_pi32(is[i],is[i+1]) into a single movq, as you do in the asm,
and that our register allocation is non-optimal.
Richard.
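The point about _mm_set_pi32 can be illustrated with a minimal sketch that is not code from this thread. Instead of building each __m64 from two separate 32-bit loads, it copies 8 contiguous bytes into the vector, which gives the compiler the chance to emit one 64-bit load; whether it actually does depends on the compiler version and flags. The names sum_plain and sum_mmx are hypothetical, int32_t stands in for the thread's I type (an assumption, since its definition is not shown here), and the code falls back to the plain loop when MMX is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Plain C baseline. Unsigned arithmetic makes the sum wrap modulo 2^32,
   which matches the behaviour of paddd. */
static int32_t sum_plain(const int32_t *a, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint32_t)a[i];
    return (int32_t)total;
}

/* Hypothetical intrinsics variant: one 8-byte copy per iteration instead
   of _mm_set_pi32(is[i], is[i+1]). */
static int32_t sum_mmx(const int32_t *a, size_t n)
{
#if defined(__MMX__)
    __m64 acc = _mm_setzero_si64();
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        __m64 pair;
        memcpy(&pair, a + i, sizeof pair);  /* load two ints at once */
        acc = _mm_add_pi32(acc, pair);      /* two 32-bit adds in parallel */
    }

    /* Extract the low and high 32-bit lanes of the accumulator. */
    uint32_t lo = (uint32_t)_mm_cvtsi64_si32(acc);
    uint32_t hi = (uint32_t)_mm_cvtsi64_si32(_mm_srli_si64(acc, 32));
    _mm_empty();  /* leave MMX state only after the last MMX operation */

    uint32_t total = lo + hi;
    for (; i < n; i++)  /* leftover element when n is odd */
        total += (uint32_t)a[i];
    return (int32_t)total;
#else
    return sum_plain(a, n);  /* no MMX on this target */
#endif
}
```

Comparing the output of gcc -O2 -S for both functions should show whether the load is combined on a given compiler. Note that _mm_empty() is called after the MMX work here, not before it.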