Re: gcc 3.2 altivec options and glibc

public inbox for gcc@gcc.gnu.org

* Re: gcc 3.2 altivec options and glibc
@ 2002-08-25  9:29 Jack Howarth
  2002-08-26  4:05 ` Daniel Egger
  0 siblings, 1 reply; 8+ messages in thread
From: Jack Howarth @ 2002-08-25  9:29 UTC
  To: gcc, drow

Daniel,
   Okay. I had been looking over glibc/sysdeps/powerpc/fpu and
those routines, at first glance, seemed not to be specifically
using assembly for the fpu so I was curious if one could recompile
them to run on the altivec instead. Guess not...
                          Jack


* Re: gcc 3.2 altivec options and glibc
  2002-08-25  9:29 gcc 3.2 altivec options and glibc Jack Howarth
@ 2002-08-26  4:05 ` Daniel Egger
  2002-08-27  2:24   ` Gabriel Paubert
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Egger @ 2002-08-26  4:05 UTC
  To: Jack Howarth; +Cc: GCC Developer Mailinglist, drow

On Sun, 2002-08-25 at 18:28, Jack Howarth wrote:

>    Okay. I had been looking over glibc/sysdeps/powerpc/fpu and
> those routines, at first glance, seemed not to be specifically
> using assembly for the fpu so I was curious if one could recompile
> them to run on the altivec instead. Guess not...

No, you can't. And I doubt one could gain much by adding AltiVec
versions of these routines: vectorized code is only fast for large(r)
datasets and parallelizable algorithms, of which you'll find very few
in glibc. The idea doesn't even suit parallelizable functions like
memcpy, because the alignment restrictions can hardly be enforced on
the source and destination parameters. That means either more special
cases or realigning the data in general, neither of which would leave
any speed improvement.



* Re: gcc 3.2 altivec options and glibc
  2002-08-26  4:05 ` Daniel Egger
@ 2002-08-27  2:24   ` Gabriel Paubert
  2002-08-27  5:47     ` Daniel Egger
  0 siblings, 1 reply; 8+ messages in thread
From: Gabriel Paubert @ 2002-08-27  2:24 UTC
  To: Daniel Egger; +Cc: Jack Howarth, GCC Developer Mailinglist, drow

Daniel Egger wrote:
> On Sun, 2002-08-25 at 18:28, Jack Howarth wrote:
>
>> Okay. I had been looking over glibc/sysdeps/powerpc/fpu and those
>> routines, at first glance, seemed not to be specifically using assembly
>> for the fpu so I was curious if one could recompile them to run on
>> the altivec instead. Guess not...
>
> No, you can't. And I doubt one could gain much by adding AltiVec
> versions of these routines: vectorized code is only fast for large(r)
> datasets and parallelizable algorithms, of which you'll find very few
> in glibc. The idea doesn't even suit parallelizable functions like
> memcpy, because the alignment restrictions can hardly be enforced on
> the source and destination parameters. That means either more special
> cases or realigning the data in general, neither of which would leave
> any speed improvement.

I know it's off-topic, but you are ignoring AltiVec's merge instruction,
which lets you write a compact memcpy loop (a 9-instruction loop that
copies 32 bytes: 2 loads, 2 stores, 2 merges, 2 address bumps and one
decrement-and-branch), taking care of the alignment by shuffling bytes
around in registers. Of course it's only worthwhile for fairly large
copies, especially since the head and tail of the copy are likely to
have a non-negligible icache footprint. With a suitable shuffle register
parameter, AltiVec's merge instruction can be used for many other
things, like endian conversion, but that's beside the point.
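
To make the byte shuffling concrete, here is a scalar C model of such a
loop. It is only a sketch under assumptions: it assumes the instruction
meant is vperm (vec_perm in C) driven by a control vector of the kind
lvsl produces, it models a vector register as a 16-byte struct, and
every name in it (vec16, load16, perm, shift_mask, copy_shifted) is
made up for illustration. It is not AltiVec code and says nothing about
speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC 16

/* Scalar stand-in for one 16-byte vector register. */
typedef struct { unsigned char b[VEC]; } vec16;

/* Model of an aligned 16-byte load. */
static vec16 load16(const unsigned char *p)
{
    vec16 v;
    memcpy(v.b, p, VEC);
    return v;
}

/* Model of a permute: result byte i is byte ctrl[i] (0..31) of the
   32-byte concatenation a || b. */
static vec16 perm(vec16 a, vec16 b, vec16 ctrl)
{
    vec16 r;
    for (int i = 0; i < VEC; i++) {
        unsigned idx = ctrl.b[i] & 31u;
        r.b[i] = idx < VEC ? a.b[idx] : b.b[idx - VEC];
    }
    return r;
}

/* Control vector {shift, shift+1, ..., shift+15}. */
static vec16 shift_mask(unsigned shift)
{
    vec16 m;
    for (int i = 0; i < VEC; i++)
        m.b[i] = (unsigned char)(shift + (unsigned)i);
    return m;
}

/* Copy nblocks*16 bytes starting at src_blocks + shift into dst, using
   only whole-block loads from src_blocks.  The source must be readable
   for (nblocks + 1) * 16 bytes. */
void copy_shifted(unsigned char *dst, const unsigned char *src_blocks,
                  unsigned shift, size_t nblocks)
{
    assert(shift < VEC);
    const vec16 ctrl = shift_mask(shift);
    vec16 lo = load16(src_blocks);

    for (size_t i = 0; i < nblocks; i++) {
        /* One new load per output block; the previous one is reused. */
        vec16 hi = load16(src_blocks + (i + 1) * VEC);
        vec16 out = perm(lo, hi, ctrl);
        memcpy(dst + i * VEC, out.b, VEC);
        lo = hi;
    }
}
```

Each output block costs one load, one permute and one store, which is
consistent with the 2 loads, 2 merges and 2 stores per 32 bytes counted
above. The model always reads one block past the last one it needs,
which is why the source has to be readable that far.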

Besides that, all vector instruction sets can at least be used for memset.



* Re: gcc 3.2 altivec options and glibc
  2002-08-27  2:24   ` Gabriel Paubert
@ 2002-08-27  5:47     ` Daniel Egger
  2002-08-27 10:17       ` Geoff Keating
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Egger @ 2002-08-27  5:47 UTC
  To: Gabriel Paubert; +Cc: Jack Howarth, GCC Developer Mailinglist, drow

On Tue, 2002-08-27 at 11:24, Gabriel Paubert wrote:

> I know it's off-topic, but you are ignoring AltiVec's merge instruction,

merge? You probably mean permute....

> which lets you write a compact memcpy loop (a 9-instruction loop that
> copies 32 bytes: 2 loads, 2 stores, 2 merges, 2 address bumps and one
> decrement-and-branch), taking care of the alignment by shuffling bytes
> around in registers.

Generic code that constantly permutes vectors is always relatively slow
compared to code that can assume alignment: a single unaligned read
needs two 16-byte reads, and an unaligned write needs 2 reads + 2 writes
plus several helper instructions. Memory access on PowerPC is
traditionally quite slow because of the way the front-side bus works.

> Of course it's only worthwhile for fairly large copies, especially
> since the head and tail of the copy are likely to have a non-negligible
> icache footprint.

Exactly my point.

> With a suitable shuffle register parameter, AltiVec's
> merge instruction can be used for many other things, like endian
> conversion, but that's beside the point.

> Besides that, all vector instruction sets can at least be used for memset.

You can use it for anything but performance will likely suck for a large
amount of unaligned memory accesses which is what you'll most likely get
with standard functions. 

even:

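#include <altivec.h>

/* Zero six blocks of 64 coefficients each.  DCTELEM is assumed to be a
   16-bit signed type, and blocks must be 16-byte aligned because
   vec_st ignores the low four address bits. */
void clear_blocks_altivec (DCTELEM *blocks)
{
  /* Build the zero vector directly instead of XOR-ing an
     uninitialized variable with itself. */
  const vector signed short zero = vec_splat_s16 (0);
  unsigned int offset;

  for (offset = 0; offset < sizeof (DCTELEM) * 6 * 64; offset += 16)
    vec_st (zero, offset, blocks);
}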

even with unrolling and several other tricks, is no real burner despite
its guaranteed alignment. For larger mem* operations you'll probably
want to use the cache-block instructions anyway, because they give the
maximum throughput.



* Re: gcc 3.2 altivec options and glibc
  2002-08-27  5:47     ` Daniel Egger
@ 2002-08-27 10:17       ` Geoff Keating
  2002-08-27 11:46         ` Daniel Egger
  0 siblings, 1 reply; 8+ messages in thread
From: Geoff Keating @ 2002-08-27 10:17 UTC
  To: Daniel Egger; +Cc: gcc

Daniel Egger <degger@fhm.edu> writes:

> > Besides that, all vector instruction sets can at least be used for memset.
> 
> You can use it for anything but performance will likely suck for a large
> amount of unaligned memory accesses which is what you'll most likely get
> with standard functions. 

That's not quite how it works.  glibc's current memset for powerpc
(which doesn't rely on the presence of altivec) is actually very
similar to what you'd write using a vector instruction set, except
that it deals with data in 32-bit chunks instead of 128-bit.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>


* Re: gcc 3.2 altivec options and glibc
  2002-08-27 10:17       ` Geoff Keating
@ 2002-08-27 11:46         ` Daniel Egger
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Egger @ 2002-08-27 11:46 UTC
  To: Geoff Keating; +Cc: GCC Developer Mailinglist

On Tue, 2002-08-27 at 19:17, Geoff Keating wrote:

> > You can use it for anything but performance will likely suck for a large
> > amount of unaligned memory accesses which is what you'll most likely get
> > with standard functions. 

> That's not quite how it works.  glibc's current memset for powerpc
> (which doesn't rely on the presence of altivec) is actually very
> similar to what you'd write using a vector instruction set, except
> that it deals with data in 32-bit chunks instead of 128-bit.

I know how memset works, but what is your point? :) All the mem functions
basically break every request up into the largest possible aligned chunks
and then work on those to avoid unaligned accesses.
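
For readers who have not looked at such a routine, the split can be
sketched in portable C as below. This is an illustration only, not
glibc's powerpc memset: the name chunked_memset is made up, and the
4-byte chunk size is taken from the 32-bit chunks mentioned in the
quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a sketch of the head/body/tail split. */
void *chunked_memset(void *dst, int value, size_t n)
{
    assert(dst != NULL || n == 0);
    unsigned char *p = dst;
    const unsigned char byte = (unsigned char)value;
    /* The fill byte replicated into all four bytes of a 32-bit word. */
    const uint32_t word = byte * UINT32_C(0x01010101);

    /* Head: byte stores until p is 4-byte aligned or nothing is left. */
    while (n > 0 && ((uintptr_t)p % 4) != 0) {
        *p++ = byte;
        n--;
    }
    /* Body: aligned 32-bit stores.  Going through memcpy avoids
       aliasing problems; compilers normally emit a single store. */
    while (n >= 4) {
        memcpy(p, &word, 4);
        p += 4;
        n -= 4;
    }
    /* Tail: the remaining 0..3 bytes. */
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

A vector version would have the same shape with a 16-byte body, which
makes the head and tail correspondingly longer (up to 15 bytes each).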

That works great for one, two or three different chunk sizes, but the
benefit is quickly lost if the functions get too bloated by offering
more choices. AltiVec is even more of a special case here, because one
cannot assume its existence on all PPC processors. We'd therefore need
either two different sets of mem functions, one with and one without
AltiVec support, that can be switched somehow, or another conditional
in the function. Also, since the functions are called in the same
process context, some %vr registers have to be saved, which again
costs time.
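
The "switched somehow" variant could look roughly like the sketch below,
which picks an implementation once through a function pointer. All names
are hypothetical, cpu_has_altivec is a stub that always answers no (real
detection is platform specific and not shown), and copy_vector is only a
stand-in for an lvx/stvx routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical probe; a real one would ask the OS or the CPU. */
static int cpu_has_altivec(void)
{
    return 0;
}

/* Version for processors without AltiVec. */
static void *copy_scalar(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for a version built on lvx/stvx. */
static void *copy_vector(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Choose between the two sets of routines. */
copy_fn select_copy(int have_altivec)
{
    return have_altivec ? copy_vector : copy_scalar;
}

static copy_fn active_copy = NULL;

/* Entry point: resolves the implementation on the first call, then
   costs one indirect call per copy. */
void *dispatch_copy(void *dst, const void *src, size_t n)
{
    if (active_copy == NULL)
        active_copy = select_copy(cpu_has_altivec());
    assert(active_copy != NULL);
    return active_copy(dst, src, n);
}
```

The alternative mentioned above, a conditional inside the function,
would trade the indirect call for a test on every call.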

But feel free to try it. I benchmarked a handwritten memcpy using
lvx/stvx, copy loops using double/int/short/char, and the library memcpy
on various sizes of 16-byte-aligned memory on a 500 MHz 7410. memcpy won
by a wide margin in almost all cases, followed by the int loop, the
lvx/stvx loop and then the others. A wild speculation is that the LSU
(load/store unit) is optimised for 32-bit accesses, which is what makes
32-bit copies so fast, even though accesses through larger types have
the advantage of reducing code size, so that the code fits better into
the caches and puts less load on the scheduler.
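
Anyone who wants to repeat the comparison could start from a harness
like the following. It is a rough sketch, not the benchmark described
above: the names are made up, only a char loop, a 32-bit loop and the
library memcpy are included (the lvx/stvx loop is left out so that it
builds anywhere), and clock() is a coarse timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*copy_loop)(void *dst, const void *src, size_t n);

/* One byte per iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* One 32-bit word per iteration; n must be a multiple of 4. */
void copy_words(void *dst, const void *src, size_t n)
{
    assert(n % 4 == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, 4);   /* 32-bit load  */
        memcpy(d + i, &w, 4);   /* 32-bit store */
    }
}

/* The library routine, wrapped to match copy_loop. */
void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Average CPU seconds for one call of fn over reps repetitions. */
double seconds_per_copy(copy_loop fn, void *dst, const void *src,
                        size_t n, unsigned reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC / reps;
}
```

Calling seconds_per_copy with each loop over a range of buffer sizes
gives the raw numbers. An optimising compiler may rewrite the simple
loops, or drop repeated copies whose result is never read, so the
generated code is worth checking before trusting any timing.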

My conclusion from that experience is that AltiVec is only worth the
hassle if you really have something to calculate and the job is
actually parallelizable.

--
Servus,
       Daniel


* Re: gcc 3.2 altivec options and glibc
  2002-08-25  9:16 Jack Howarth
@ 2002-08-25  9:18 ` Daniel Jacobowitz
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Jacobowitz @ 2002-08-25  9:18 UTC
  To: Jack Howarth; +Cc: gcc

On Sun, Aug 25, 2002 at 12:16:34PM -0400, Jack Howarth wrote:
> Hi,
>     Does anyone intimate with the gcc 3.2 altivec code
> generation options have a good feel for what would happen
> if glibc were recompiled with altivec code generation
> enabled in gcc 3.2? In particular, would the libm created
> by such a compile have its floating point operations 
> redirected to the altivec? I am asking because we already have
> some code, hp-timing, that doesn't run on the 601, and it
> might be interesting at this point to explore an altivec
> savvy version of glibc. It seems to me that simply enabling
> its use through compiler flags would be a good starting 
> point, no? Thanks in advance for any suggestions.

GCC does not automatically use Altivec instructions for anything.  It
only enables the user to hand-code them without having to use assembly.
So far at least.
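
To show what hand-coding without assembly means in practice, here is a
minimal sketch. The function name add4 is made up; the intrinsics
vec_ld, vec_add and vec_st come from <altivec.h> and are only compiled
when the compiler defines __ALTIVEC__ (with GCC, when -maltivec is in
effect). Otherwise a plain C loop is used, so the same source builds
everywhere.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for four floats.
   All three pointers must be 16-byte aligned. */
void add4(const float *a, const float *b, float *out)
{
    assert((((uintptr_t)a | (uintptr_t)b | (uintptr_t)out) % 16) == 0);
#ifdef __ALTIVEC__
    /* Hand-written vector code: one add handles all four floats. */
    __vector float va = vec_ld(0, a);
    __vector float vb = vec_ld(0, b);
    vec_st(vec_add(va, vb), 0, out);
#else
    /* Scalar fallback when AltiVec is not enabled. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Nothing here happens automatically: without the explicit vec_* calls
the compiler described above would emit ordinary scalar code even with
AltiVec enabled.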

In addition, I believe support in 3.2 is a bit immature; the trunk is
better.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


* gcc 3.2 altivec options and glibc
@ 2002-08-25  9:16 Jack Howarth
  2002-08-25  9:18 ` Daniel Jacobowitz
  0 siblings, 1 reply; 8+ messages in thread
From: Jack Howarth @ 2002-08-25  9:16 UTC
  To: gcc

Hi,
    Does anyone intimate with the gcc 3.2 altivec code
generation options have a good feel for what would happen
if glibc were recompiled with altivec code generation
enabled in gcc 3.2? In particular, would the libm created
by such a compile have its floating point operations 
redirected to the altivec? I am asking because we already have
some code, hp-timing, that doesn't run on the 601, and it
might be interesting at this point to explore an altivec
savvy version of glibc. It seems to me that simply enabling
its use through compiler flags would be a good starting 
point, no? Thanks in advance for any suggestions.
                            Jack


