public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
@ 2011-01-24 22:34 Michael Meissner
  2011-01-24 22:46 ` Mark Mitchell
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Meissner @ 2011-01-24 22:34 UTC (permalink / raw)
  To: gcc-patches, dje.gcc, rth, rguenther, jakub, berner, geoffk,
	mark, joseph, pinskia, dominiq

[-- Attachment #1: Type: text/plain, Size: 4321 bytes --]

This patch fixes bug PR target/47272, but I'm sending it out to a wider
audience to solicit feedback from other developers on how to resolve a sticky
situation with PowerPC code generation.

For those of you who don't know the Power architecture, particularly the
VSX extensions: the first major vector extension was the Altivec (VMX) vector
support.  The VSX vector support adds more floating point vector registers and
overlaps with the Altivec support.  In particular, the vector loads and stores
of Altivec ignore the bottom 4 bits of the address, while the vector loads and
stores of the VSX instruction set do not, and will do unaligned loads and
stores.  Obviously, if the address is completely aligned, an Altivec and a
VSX memory instruction behave the same.  If on the other hand the address is
unaligned, you will get different bytes loaded/stored.
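
As a minimal sketch of the difference (the function name is mine; only
the standard altivec.h API is used):

#include <altivec.h>

/* Under the Altivec (LVX) semantics, vec_ld masks the low 4 bits of
   the effective address, so this loads the 16 aligned bytes that
   contain P.  A VSX LXVW4X-style load would instead use the address
   exactly as given and load 16 possibly unaligned bytes.  */
vector unsigned char
load_16_aligned_down (const unsigned char *p)
{
  return vec_ld (0, p);
}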

The PowerPC compiler has a full set of overloaded vector intrinsic builtin
functions, including builtins for doing loads and stores.  When I added the VSX
support, I changed the compiler to do VSX loads/stores if the user used -mvsx
or -mcpu=power7, including changing the builtin load/store functions to use the
VSX instructions.  However, as I said, you get different results for unaligned
addresses.

Richard Henderson's change to libcpp/lex.c on August 21st, 2010 added code to
use the Altivec instruction set, if the compiler supports it, to speed up the
preprocessor:

2010-08-21  Richard Henderson  <rth@redhat.com>
	    Andi Kleen <ak@linux.intel.com>
	    David S. Miller  <davem@davemloft.net>

	* configure.ac (AC_C_BIGENDIAN, AC_TYPE_UINTPTR_T): New tests.
	(ssize_t): Check via AC_TYPE_SSIZE_T instead of AC_CHECK_TYPE.
	(ptrdiff_t): Check via AC_CHECK_TYPE.
	* config.in, configure: Rebuild.
	* system.h: Include stdint.h, if available.
	* lex.c (WORDS_BIGENDIAN): Provide default.
	(acc_char_mask_misalign, acc_char_replicate, acc_char_cmp,
	acc_char_index, search_line_acc_char, repl_chars, search_line_mmx,
	search_line_sse2, search_line_sse42, init_vectorized_lexer,
	search_line_fast): New.
	(_cpp_clean_line): Use search_line_fast.  Restructure the fast
	loop to make it clear when we're leaving the loop.  Stay in the
	fast loop for non-trigraph '?'.

Recently we started to look at building internal versions of the GCC 4.6
compiler with the --with-cpu=power7 support, and it exposed the difference
between the two loads.

So after some debate within IBM, we've come to the conclusion that I should not
have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that
we should go back to using the Altivec form for these instructions.  However,
in doing so, anybody who has written new code explicitly for power7 since
GCC 4.5 came out might now be surprised.  Unfortunately the bug exists in
GCC 4.5 as well as in the Red Hat RHEL6 and SUSE SLES 11 SP1 compilers.

I realize that we are in stage 4 of the release process, but if we are going to
change the builtins back to the 4.4 semantics, we should do it as soon as
possible.

David suggested I poll the release managers and other interested parties
on which path we should take (make the builtins adhere to the 4.4 semantics,
or just keep the current situation).

I'm enclosing patches to make the load/store builtins go back to the Altivec
semantics, and I added vector double support to them.  In addition, I added
patches for libcpp/lex.c so that it will work with 4.5 compilers as well as 4.4
and future 4.6 compilers.  Whether or not we decide to change the builtin
semantics back, I feel the lex.c patch should go in.

Right now, I did not add an #ifdef or -m switch to toggle to the 4.5
behavior.  I can do this if desired (it probably is a good idea to allow code
written for 4.5 to continue to be used).  I don't know how many people directly
write code using the Altivec semantics.

I should mention that Darwin users and people using the host processor in the
PS3 who might have written Altivec-specific code will not be affected, since
those machines do not have the VSX instruction set.  It is only the newer
machines being shipped by IBM that currently have the problem.

Sorry about all this.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

[-- Attachment #2: gcc-power7.patch205 --]
[-- Type: text/plain, Size: 5249 bytes --]

[gcc]
2011-01-24  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
	Document using vector double with the load/store builtins, and
	that the load/store builtins always use Altivec instructions.

	* config/rs6000/vector.md (vector_altivec_load_<mode>): New insns
	to use altivec memory instructions, even on VSX.
	(vector_altivec_store_<mode>): Ditto.

	* config/rs6000/rs6000-protos.h (rs6000_address_for_altivec): New
	function.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	V2DF, V2DI support to load/store overloaded builtins.

	* config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_*): Add
	altivec load/store builtins for V2DF/V2DI types.

	* config/rs6000/rs6000.c (altivec_expand_ld_builtin): Add V2DF,
	V2DI support, use vector_altivec_load/vector_altivec_store
	builtins.
	(altivec_expand_st_builtin): Ditto.
	(altivec_expand_builtin): Update altivec lvx/stvx builtin name.
	(altivec_init_builtins): Add support for V2DF/V2DI altivec
	load/store builtins.
	(rs6000_address_for_altivec): Ensure the memory address is
	appropriate for Altivec.

	* config/rs6000/altivec.md (UNSPEC_LVX): New UNSPEC.
	(altivec_lvx_<mode>): Make altivec_lvx use a mode iterator.
	(altivec_stvx_<mode>): Make altivec_stvx use a mode iterator.

[libcpp]
2011-01-24  Peter Bergner  <bergner@vnet.ibm.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* lex.c (search_line_fast): Work with compilers that generate
	either LXVW4X or LVX for vec_ld.

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 169112)
+++ gcc/doc/extend.texi	(working copy)
@@ -12354,6 +12354,12 @@ vector bool long vec_cmplt (vector doubl
 vector float vec_div (vector float, vector float);
 vector double vec_div (vector double, vector double);
 vector double vec_floor (vector double);
+vector double vec_ld (int, const vector double *);
+vector double vec_ld (int, const double *);
+vector double vec_ldl (int, const vector double *);
+vector double vec_ldl (int, const double *);
+vector unsigned char vec_lvsl (int, const volatile double *);
+vector unsigned char vec_lvsr (int, const volatile double *);
 vector double vec_madd (vector double, vector double, vector double);
 vector double vec_max (vector double, vector double);
 vector double vec_min (vector double, vector double);
@@ -12382,6 +12388,8 @@ vector double vec_sel (vector double, ve
 vector double vec_sub (vector double, vector double);
 vector float vec_sqrt (vector float);
 vector double vec_sqrt (vector double);
+void vec_st (vector double, int, vector double *);
+void vec_st (vector double, int, double *);
 vector double vec_trunc (vector double);
 vector double vec_xor (vector double, vector double);
 vector double vec_xor (vector double, vector bool long);
@@ -12412,6 +12420,10 @@ int vec_any_nlt (vector double, vector d
 int vec_any_numeric (vector double);
 @end smallexample
 
+Note that the @samp{vec_ld} and @samp{vec_st} builtins will always
+generate the Altivec @samp{LVX} and @samp{STVX} instructions even
+if the VSX instruction set is available.
+
 GCC provides a few other builtins on Powerpc to access certain instructions:
 @smallexample
 float __builtin_recipdivf (float, float);
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 169112)
+++ libcpp/lex.c	(working copy)
@@ -547,6 +547,11 @@ search_line_fast (const uchar *s, const 
   const vc zero = { 0 };
 
   vc data, mask, t;
+  const uchar *unaligned_s = s;
+
+  /* While altivec loads mask addresses, we still need to align S so
+     that the offset we compute at the end is correct.  */
+  s = (const uchar *)((uintptr_t)s & -16);
 
   /* Altivec loads automatically mask addresses with -16.  This lets us
      issue the first load as early as possible.  */
@@ -555,15 +560,20 @@ search_line_fast (const uchar *s, const 
   /* Discard bytes before the beginning of the buffer.  Do this by
      beginning with all ones and shifting in zeros according to the
      mis-alignment.  The LVSR instruction pulls the exact shift we
-     want from the address.  */
-  mask = __builtin_vec_lvsr(0, s);
+     want from the address.
+
+     Originally, we used S in the lvsr and did the alignment afterwards, which
+     works on a system that supports just the Altivec instruction set, where
+     the load is LVX.  With the introduction of the VSX instructions in
+     GCC 4.5, the load became LXVW4X.  LVX ignores the bottom 4 bits of the
+     address, and LXVW4X does not.  While GCC 4.6 reverts vec_ld/vec_st to
+     producing only Altivec instructions, the possibility exists that the
+     stage1 compiler was built with a compiler that generated LXVW4X.  This
+     code works on either system.  */
+  mask = __builtin_vec_lvsr(0, unaligned_s);
   mask = __builtin_vec_perm(zero, ones, mask);
   data &= mask;
 
-  /* While altivec loads mask addresses, we still need to align S so
-     that the offset we compute at the end is correct.  */
-  s = (const uchar *)((uintptr_t)s & -16);
-
   /* Main loop processing 16 bytes at a time.  */
   goto start;
   do


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-24 22:34 [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st Michael Meissner
@ 2011-01-24 22:46 ` Mark Mitchell
  2011-01-24 23:44   ` Michael Meissner
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mark Mitchell @ 2011-01-24 22:46 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, rth, rguenther, jakub,
	berner, geoffk, joseph, pinskia, dominiq

On 1/24/2011 1:31 PM, Michael Meissner wrote:

> So after some debate within IBM, we've come to the conclusion that I should not
> have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that
> we should go back to using the Altivec form for these instructions

Can you explain why that's desirable?  I think that the first thing to
do is to convince ourselves that's technically desirable; if we can't do
that, then there's no need to think about whether to do it now or later.

My gut instinct is that having released 4.5, we should just live with
the semantics we now have; we've broken compatibility with some Altivec
code when compiled for Power 7, but breaking compatibility again seems
like it will just confuse things worse.

Thank you,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-24 22:46 ` Mark Mitchell
@ 2011-01-24 23:44   ` Michael Meissner
  2011-01-25  4:08     ` Mark Mitchell
  2011-01-31 21:00     ` Michael Meissner
  2011-01-25 10:52   ` Richard Guenther
  2011-01-26  1:19   ` Joseph S. Myers
  2 siblings, 2 replies; 14+ messages in thread
From: Michael Meissner @ 2011-01-24 23:44 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Michael Meissner, gcc-patches, dje.gcc, rth, rguenther, jakub,
	berner, geoffk, joseph, pinskia, dominiq

On Mon, Jan 24, 2011 at 01:34:50PM -0800, Mark Mitchell wrote:
> On 1/24/2011 1:31 PM, Michael Meissner wrote:
> 
> > So after some debate within IBM, we've come to the conclusion that I should not
> > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that
> > we should go back to using the Altivec form for these instructions
> 
> Can you explain why that's desirable?  I think that the first thing to
> do is to convince ourselves that's technically desirable; if we can't do
> that, then there's no need to think about whether to do it now or later.
> 
> My gut instinct is that having released 4.5, we should just live with
> the semantics we now have; we've broken compatibility with some Altivec
> code when compiled for Power 7, but breaking compatibility again seems
> like it will just confuse things worse.

Sure, if you have a program that deals with unaligned data, and you know the
machine ANDs out the bottom bits, you would write the code to handle the
initial bytes using something like what lex.c has.  At this point, I can
really see both sides (do we cater to the 4.4 users or the 4.5 users).

I suspect that if we wanted to do it automatically, we would need to see if
LVSR or a similar instruction was used.  Or just provide an #ifdef, and make
the default the 4.5 behavior.
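
A rough sketch of what such an opt-in could look like from the user's
side (the macro names are purely hypothetical, and vec_vsx_ld is the
VSX-semantics builtin proposed later in this thread):

#include <altivec.h>

/* Hypothetical: pick the GCC 4.5 (VSX, true unaligned) load if the
   user asks for it, otherwise the traditional Altivec vec_ld that
   masks the low 4 bits of the address.  */
#ifdef WANT_GCC45_VSX_SEMANTICS
# define VEC_LD(offset, addr) vec_vsx_ld (offset, addr)
#else
# define VEC_LD(offset, addr) vec_ld (offset, addr)
#endif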

Here is the code in lex.c that knows about this alignment quirk:

static const uchar *
search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
{
  typedef __attribute__((altivec(vector))) unsigned char vc;

  /* ... */

  const vc ones = {
    -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
  };
  const vc zero = { 0 };

  vc data, mask, t;

  /* Altivec loads automatically mask addresses with -16.  This lets us
     issue the first load as early as possible.  */
  data = __builtin_vec_ld(0, (const vc *)s);

  /* Discard bytes before the beginning of the buffer.  Do this by
     beginning with all ones and shifting in zeros according to the
     mis-alignment.  The LVSR instruction pulls the exact shift we
     want from the address.  */
  mask = __builtin_vec_lvsr(0, s);
  mask = __builtin_vec_perm(zero, ones, mask);
  data &= mask;

  /* While altivec loads mask addresses, we still need to align S so
     that the offset we compute at the end is correct.  */
  s = (const uchar *)((uintptr_t)s & -16);

  /* Main loop processing 16 bytes at a time.  */
  goto start;
  do
    {
      vc m_nl, m_cr, m_bs, m_qm;

      s += 16;
      data = __builtin_vec_ld(0, (const vc *)s);

    /* ... */
    }
  while (!__builtin_vec_vcmpeq_p(/*__CR6_LT_REV*/3, t, zero));


-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-24 23:44   ` Michael Meissner
@ 2011-01-25  4:08     ` Mark Mitchell
  2011-01-31 21:00     ` Michael Meissner
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Mitchell @ 2011-01-25  4:08 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, rth, rguenther, jakub,
	berner, geoffk, joseph, pinskia, dominiq

On 1/24/2011 1:52 PM, Michael Meissner wrote:

>> My gut instinct is that having released 4.5, we should just live with
>> the semantics we now have; we've broken compatibility with some Altivec
>> code when compiled for Power 7, but breaking compatibility again seems
>> like it will just confuse things worse.

> Sure, if you have a program that is dealing with unaligned data, and you know
> the machine ANDs out the bottom bits, you would write the code to handle the
> initial bits using something like lex.c has.  At this point, I really can see
> both sides (do we cater to the 4.4 users or the 4.5 users).

OK, right, I see that.

We keep learning how important backwards compatibility is to people.  We
really need to take that incredibly seriously going forward; we (and by
"we" I mean "I") have historically underestimated how important it is.

In any case, here we are.  I think that having come this far, we might
as well just leave things as they are.  I don't think that we really
make things better by flip-flopping on semantics from release to
release; by now, some people have probably figured out that we changed,
and have some #ifdef somewhere to deal with it, and when we change back,
their #ifdef will break, and we'll lose again.

My two cents,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-24 22:46 ` Mark Mitchell
  2011-01-24 23:44   ` Michael Meissner
@ 2011-01-25 10:52   ` Richard Guenther
  2011-01-25 11:26     ` Jakub Jelinek
  2011-01-25 19:15     ` Mark Mitchell
  2011-01-26  1:19   ` Joseph S. Myers
  2 siblings, 2 replies; 14+ messages in thread
From: Richard Guenther @ 2011-01-25 10:52 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Michael Meissner, gcc-patches, dje.gcc, rth, jakub, berner,
	geoffk, joseph, pinskia, dominiq

On Mon, 24 Jan 2011, Mark Mitchell wrote:

> On 1/24/2011 1:31 PM, Michael Meissner wrote:
> 
> > So after some debate within IBM, we've come to the conclusion that I should not
> > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that
> > we should go back to using the Altivec form for these instructions
> 
> Can you explain why that's desirable?  I think that the first thing to
> do is to convince ourselves that's technically desirable; if we can't do
> that, then there's no need to think about whether to do it now or later.
> 
> My gut instinct is that having released 4.5, we should just live with
> the semantics we now have; we've broken compatibility with some Altivec
> code when compiled for Power 7, but breaking compatibility again seems
> like it will just confuse things worse.

I think we should revert to the pre-4.5 behavior and also fix 4.5.
Especially so if there are other compilers that follow the pre-4.5
behavior - are there?

Richard.

--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-25 10:52   ` Richard Guenther
@ 2011-01-25 11:26     ` Jakub Jelinek
  2011-01-25 19:15     ` Mark Mitchell
  1 sibling, 0 replies; 14+ messages in thread
From: Jakub Jelinek @ 2011-01-25 11:26 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mark Mitchell, Michael Meissner, gcc-patches, dje.gcc, rth,
	jakub, berner, geoffk, joseph, pinskia, dominiq

On Tue, Jan 25, 2011 at 11:15:25AM +0100, Richard Guenther wrote:
> On Mon, 24 Jan 2011, Mark Mitchell wrote:
> > > So after some debate within IBM, we've come to the conclusion that I should not
> > > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that
> > > we should go back to using the Altivec form for these instructions
> > 
> > Can you explain why that's desirable?  I think that the first thing to
> > do is to convince ourselves that's technically desirable; if we can't do
> > that, then there's no need to think about whether to do it now or later.
> > 
> > My gut instinct is that having released 4.5, we should just live with
> > the semantics we now have; we've broken compatibility with some Altivec
> > code when compiled for Power 7, but breaking compatibility again seems
> > like it will just confuse things worse.
> 
> I think we should revert to the pre-4.5 behavior and also fix 4.5.

Yeah, I agree.  Especially when it is compatible not just with pre-4.5,
but also with 4.5 with -mno-vsx.  Changing the behavior of an intrinsic
depending on what CPU variant it is targeting is bad; for the VSX
load/store behavior there should be a different intrinsic instead.

	Jakub


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-25 10:52   ` Richard Guenther
  2011-01-25 11:26     ` Jakub Jelinek
@ 2011-01-25 19:15     ` Mark Mitchell
  2011-01-26 10:52       ` Richard Guenther
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Mitchell @ 2011-01-25 19:15 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Michael Meissner, gcc-patches, dje.gcc, rth, jakub, berner,
	geoffk, joseph, pinskia, dominiq

On 1/25/2011 2:15 AM, Richard Guenther wrote:

> I think we should revert to the pre-4.5 behavior and also fix 4.5.
> Especially so if there are other compilers that follow the pre-4.5
> behavior - are there?

You're suggesting changing 4.5 so that 4.5.N is incompatible with
previous 4.5 releases?  That seems really unfortunate to me; we're now
introducing compatibility problems in what's supposed to be a bug-fix
release.  To justify that, we really have to decide that this was a
blatant bug.  But, it wasn't; it was a reasonable choice given the
hardware.  In retrospect, not the best choice, but a reasonable choice.

I'm certainly willing to bow to a consensus opinion here.  If the Power
maintainers have consensus that this is the right way to go, then I
would respect that.

Thank you,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-24 22:46 ` Mark Mitchell
  2011-01-24 23:44   ` Michael Meissner
  2011-01-25 10:52   ` Richard Guenther
@ 2011-01-26  1:19   ` Joseph S. Myers
  2 siblings, 0 replies; 14+ messages in thread
From: Joseph S. Myers @ 2011-01-26  1:19 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Michael Meissner, gcc-patches, dje.gcc, rth, rguenther, jakub,
	berner, geoffk, pinskia, dominiq

On Mon, 24 Jan 2011, Mark Mitchell wrote:

> On 1/24/2011 1:31 PM, Michael Meissner wrote:
> 
> > So after some debate within IBM, we've come to the conclusion that I should not
> > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that
> > we should go back to using the Altivec form for these instructions
> 
> Can you explain why that's desirable?  I think that the first thing to
> do is to convince ourselves that's technically desirable; if we can't do
> that, then there's no need to think about whether to do it now or later.
> 
> My gut instinct is that having released 4.5, we should just live with
> the semantics we now have; we've broken compatibility with some Altivec
> code when compiled for Power 7, but breaking compatibility again seems
> like it will just confuse things worse.

First, the semantics of vec_ld and vec_st (the altivec.h intrinsics, as 
opposed to the built-in functions) are documented in the AltiVec PIM.  
Both my copies (I have one PDF with a Motorola logo and one with a 
Freescale logo) explicitly say for vec_ld:

    Each operation performs a 16-byte load at a 16-byte aligned address. 
    The a is taken to be an integer value, while b is a pointer. 
    BoundAlign(a+b,16) is the largest value less than or equal to a + b 
    that is a multiple of 16.

with similar wording for vec_st.  Thus, I would say the semantics of these 
intrinsics for unaligned operations are well-defined, without reference to 
particular instructions, and breaking them for a particular -mcpu should 
be treated much the same as if we (for example) broke integer division for 
some -mcpu where it had previously followed C99 semantics: it is a 
wrong-code regression and should be fixed as such.
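
As a worked example of that definition (a sketch only; the helper name is
mine):

#include <stdint.h>

/* BoundAlign (a + b, 16): the largest multiple of 16 less than or
   equal to a + b.  For example, a == 0 and b == 0x1009 yields
   0x1000, i.e. the load is taken from the aligned-down address.  */
static inline uintptr_t
bound_align_16 (int a, const void *b)
{
  return ((uintptr_t) b + a) & -(uintptr_t) 16;
}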

As for the built-in functions: we do of course document that they should 
not be used directly.  I'm not sure there's a particular reason this code 
uses them, and indeed PR 45381 has a patch to use altivec.h instead, 
though it's reported as not working to fix the underlying problem there 
(AltiVec build not working with Apple 4.0 compiler).  I don't think 
there's any good reason for them to deviate from the AltiVec PIM semantics 
even if they weren't expressly documented to keep to those semantics, and 
making vec_ld and __builtin_vec_ld different would certainly be nasty for 
the altivec.h implementation.

Unlike __sync_fetch_and_nand and __sync_nand_and_fetch, where there was a 
long history of consistently implementing semantics different from those 
intended, and where we added a warning with -Wsync-nand (enabled by 
default), I do not think a warning is needed here, where the broken 
semantics were only for a limited period with particular options.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-25 19:15     ` Mark Mitchell
@ 2011-01-26 10:52       ` Richard Guenther
  2011-01-26 15:45         ` David Edelsohn
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Guenther @ 2011-01-26 10:52 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Michael Meissner, gcc-patches, dje.gcc, rth, jakub, berner,
	geoffk, joseph, pinskia, dominiq

On Tue, 25 Jan 2011, Mark Mitchell wrote:

> On 1/25/2011 2:15 AM, Richard Guenther wrote:
> 
> > I think we should revert to the pre-4.5 behavior and also fix 4.5.
> > Especially so if there are other compilers that follow the pre-4.5
> > behavior - are there?
> 
> You're suggesting changing 4.5 so that 4.5.N is incompatible with
> previous 4.5 releases?  That seems really unfortunate to me; we're now
> introducing compatibility problems in what's supposed to be a bug-fix
> release.  To justify that, we really have to decide that this was a
> blatant bug.  But, it wasn't; it was a reasonable choice given the
> hardware.  In retrospect, not the best choice, but a reasonable choice.

I see it as a blatant bug as it apparently breaks code that was
perfectly working with previous GCC releases.

> I'm certainly willing to bow to a consensus opinion here.  If the Power
> maintainers have consensus that this is the right way to go, then I
> would respect that.

Of course - it's up to the Power maintainers to decide what to do.

Richard.


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-26 10:52       ` Richard Guenther
@ 2011-01-26 15:45         ` David Edelsohn
  2011-01-26 16:36           ` Mark Mitchell
  0 siblings, 1 reply; 14+ messages in thread
From: David Edelsohn @ 2011-01-26 15:45 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mark Mitchell, Michael Meissner, gcc-patches, rth, jakub, berner,
	geoffk, joseph, pinskia, dominiq

On Wed, Jan 26, 2011 at 5:21 AM, Richard Guenther <rguenther@suse.de> wrote:
> On Tue, 25 Jan 2011, Mark Mitchell wrote:
>
>> On 1/25/2011 2:15 AM, Richard Guenther wrote:
>>
>> > I think we should revert to the pre-4.5 behavior and also fix 4.5.
>> > Especially so if there are other compilers that follow the pre-4.5
>> > behavior - are there?
>>
>> You're suggesting changing 4.5 so that 4.5.N is incompatible with
>> previous 4.5 releases?  That seems really unfortunate to me; we're now
>> introducing compatibility problems in what's supposed to be a bug-fix
>> release.  To justify that, we really have to decide that this was a
>> blatant bug.  But, it wasn't; it was a reasonable choice given the
>> hardware.  In retrospect, not the best choice, but a reasonable choice.
>
> I see it as a blatant bug as it apparently breaks code that was
> perfectly working with previous GCC releases.
>
>> I'm certainly willing to bow to a consensus opinion here.  If the Power
>> maintainers have consensus that this is the right way to go, then I
>> would respect that.
>
> Of course - it's up to the Power maintainers to decide what to do.

I agree that the POWER port must revert to the Altivec/VMX
meaning of vec_ld/vec_st and backport the fix to the GCC 4.5 branch.

- David


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-26 15:45         ` David Edelsohn
@ 2011-01-26 16:36           ` Mark Mitchell
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Mitchell @ 2011-01-26 16:36 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Richard Guenther, Michael Meissner, gcc-patches, rth, jakub,
	berner, geoffk, joseph, pinskia, dominiq

On 1/26/2011 7:27 AM, David Edelsohn wrote:

>> Of course - it's up to the Power maintainers to decide what to do.
> 
> I agree that the POWER port must revert back to the Altivec/VMX
> meaning of vec_ld/vec_st and backport the fix to the GCC 4.5 branch.

OK, I think we've reached pretty overwhelming consensus.  For avoidance
of doubt, I'm fine with that conclusion.

Thank you,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-24 23:44   ` Michael Meissner
  2011-01-25  4:08     ` Mark Mitchell
@ 2011-01-31 21:00     ` Michael Meissner
  2011-02-02 21:08       ` David Edelsohn
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Meissner @ 2011-01-31 21:00 UTC (permalink / raw)
  To: Michael Meissner, Mark Mitchell, gcc-patches, dje.gcc, rth,
	rguenther, jakub, berner, geoffk, joseph, pinskia, dominiq

[-- Attachment #1: Type: text/plain, Size: 4231 bytes --]

Here are my latest patches to fix the problem.  Via the new vec_vsx_ld and
vec_vsx_st functions in altivec.h, they give the user access to the VSX
instructions they had with GCC 4.5, in case there is new code that used
vec_ld/vec_st relying on the VSX behavior.  At present, I have not added an
#ifdef so the user could switch to the GCC 4.5 behavior, but I could do
that if desired.
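
As a sketch of the resulting contrast, using the vector double overloads
documented in the extend.texi changes below:

#include <altivec.h>

vector double
load_altivec (const double *p)
{
  /* LVX semantics: the low 4 bits of the address are ignored.  */
  return vec_ld (0, p);
}

vector double
load_vsx (const double *p)
{
  /* LXVD2X semantics: a true unaligned load from P.  */
  return vec_vsx_ld (0, p);
}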

I noticed that the new vector types weren't supported by vec_ld/vec_st, so I
added them.

I'm including the libcpp/lex.c patch that allows the compiler to be built with
GCC 4.5 using CFLAGS='-mcpu=power7 -O2 -g'.

I also included 3 test suite fixes in this patch.

I just did a side by side build of unpatched GCC 4.6 without special options,
and one with these patches and adding --with-cpu=power7.  I'm seeing the
following regressions:

gcc.dg/pr41551.c			(64-bit only, unrecognized insn)
gcc.dg/pr42461.c			(Peter Bergner has a fix)
gcc.dg/pr46909.c
gcc.dg/sms-3.c				(64-bit only, both fail on 32-bit)
gcc.dg/stack-usage-1.c			(32-bit only)
gcc.c-torture/execute/20050121-1.c	(32-bit only, unrecognized insn)

I'm seeing passes in:

gcc.dg/torture/va-arg-25.c
gcc.dg/torture/vector-1.c
gcc.dg/torture/vector-2.c
c-c++-common/dfp/pr35620.c
gcc.target/powerpc/ppc64-abi-dfp-1.c

Are these patches ok to install?

[gcc]
2011-01-28  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
	Document using vector double with the load/store builtins, and
	that the load/store builtins always use Altivec instructions.

	* config/rs6000/vector.md (vector_altivec_load_<mode>): New insns
	to use altivec memory instructions, even on VSX.
	(vector_altivec_store_<mode>): Ditto.

	* config/rs6000/rs6000-protos.h (rs6000_address_for_altivec): New
	function.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	V2DF, V2DI support to load/store overloaded builtins.

	* config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_*): Add
	altivec load/store builtins for V2DF/V2DI types.

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
	avoid indexed addresses on power6 if -maltivec is used.
	(altivec_expand_ld_builtin): Add V2DF, V2DI support, use
	vector_altivec_load/vector_altivec_store builtins.
	(altivec_expand_st_builtin): Ditto.
	(altivec_expand_builtin): Add VSX memory builtins.
	(rs6000_init_builtins): Add V2DI types to internal types.
	(altivec_init_builtins): Add support for V2DF/V2DI altivec
	load/store builtins.
	(rs6000_address_for_altivec): Ensure the memory address is
	appropriate for Altivec.

	* config/rs6000/vsx.md (vsx_load_<mode>): New expanders for
	vec_vsx_ld and vec_vsx_st.
	(vsx_store_<mode>): Ditto.

	* config/rs6000/rs6000.h (RS6000_BTI_long_long): New type
	variables to hold long long types for VSX vector memory builtins.
	(RS6000_BTI_unsigned_long_long): Ditto.
	(long_long_integer_type_internal_node): Ditto.
	(long_long_unsigned_type_internal_node): Ditto.

	* config/rs6000/altivec.md (UNSPEC_LVX): New UNSPEC.
	(altivec_lvx_<mode>): Make altivec_lvx use a mode iterator.
	(altivec_stvx_<mode>): Make altivec_stvx use a mode iterator.

	* config/rs6000/altivec.h (vec_vsx_ld): Define VSX memory builtin
	shortcuts.
	(vec_vsx_st): Ditto.

[libcpp]
2011-01-24  Peter Bergner  <bergner@vnet.ibm.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* lex.c (search_line_fast): Work with compilers that generate
	either LXVW4X or LVX for vec_ld.

[gcc/testsuite]
2011-01-28  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* gcc.target/powerpc/vsx-builtin-8.c: New file, test vec_vsx_ld
	and vec_vsx_st.

	* gcc.target/powerpc/avoid-indexed-addresses.c: Disable altivec
	and vsx so a default --with-cpu=power7 doesn't give an error
	when -mavoid-indexed-addresses is used.

	* gcc.target/powerpc/ppc32-abi-dfp-1.c: Rewrite to use an asm
	wrapper function to save the arguments and then jump to the real
	function, rather than depending on the compiler not to move stuff
	before an asm.
	* gcc.target/powerpc/ppc64-abi-dfp-2.c: Ditto.


-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

[-- Attachment #2: gcc-power7.patch208b --]
[-- Type: text/plain, Size: 78161 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 169441)
+++ gcc/doc/extend.texi	(working copy)
@@ -12359,6 +12359,12 @@ vector bool long vec_cmplt (vector doubl
 vector float vec_div (vector float, vector float);
 vector double vec_div (vector double, vector double);
 vector double vec_floor (vector double);
+vector double vec_ld (int, const vector double *);
+vector double vec_ld (int, const double *);
+vector double vec_ldl (int, const vector double *);
+vector double vec_ldl (int, const double *);
+vector unsigned char vec_lvsl (int, const volatile double *);
+vector unsigned char vec_lvsr (int, const volatile double *);
 vector double vec_madd (vector double, vector double, vector double);
 vector double vec_max (vector double, vector double);
 vector double vec_min (vector double, vector double);
@@ -12387,6 +12393,8 @@ vector double vec_sel (vector double, ve
 vector double vec_sub (vector double, vector double);
 vector float vec_sqrt (vector float);
 vector double vec_sqrt (vector double);
+void vec_st (vector double, int, vector double *);
+void vec_st (vector double, int, double *);
 vector double vec_trunc (vector double);
 vector double vec_xor (vector double, vector double);
 vector double vec_xor (vector double, vector bool long);
@@ -12415,7 +12423,65 @@ int vec_any_ngt (vector double, vector d
 int vec_any_nle (vector double, vector double);
 int vec_any_nlt (vector double, vector double);
 int vec_any_numeric (vector double);
-@end smallexample
+
+vector double vec_vsx_ld (int, const vector double *);
+vector double vec_vsx_ld (int, const double *);
+vector float vec_vsx_ld (int, const vector float *);
+vector float vec_vsx_ld (int, const float *);
+vector bool int vec_vsx_ld (int, const vector bool int *);
+vector signed int vec_vsx_ld (int, const vector signed int *);
+vector signed int vec_vsx_ld (int, const int *);
+vector signed int vec_vsx_ld (int, const long *);
+vector unsigned int vec_vsx_ld (int, const vector unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned long *);
+vector bool short vec_vsx_ld (int, const vector bool short *);
+vector pixel vec_vsx_ld (int, const vector pixel *);
+vector signed short vec_vsx_ld (int, const vector signed short *);
+vector signed short vec_vsx_ld (int, const short *);
+vector unsigned short vec_vsx_ld (int, const vector unsigned short *);
+vector unsigned short vec_vsx_ld (int, const unsigned short *);
+vector bool char vec_vsx_ld (int, const vector bool char *);
+vector signed char vec_vsx_ld (int, const vector signed char *);
+vector signed char vec_vsx_ld (int, const signed char *);
+vector unsigned char vec_vsx_ld (int, const vector unsigned char *);
+vector unsigned char vec_vsx_ld (int, const unsigned char *);
+
+void vec_vsx_st (vector double, int, vector double *);
+void vec_vsx_st (vector double, int, double *);
+void vec_vsx_st (vector float, int, vector float *);
+void vec_vsx_st (vector float, int, float *);
+void vec_vsx_st (vector signed int, int, vector signed int *);
+void vec_vsx_st (vector signed int, int, int *);
+void vec_vsx_st (vector unsigned int, int, vector unsigned int *);
+void vec_vsx_st (vector unsigned int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, vector bool int *);
+void vec_vsx_st (vector bool int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, int *);
+void vec_vsx_st (vector signed short, int, vector signed short *);
+void vec_vsx_st (vector signed short, int, short *);
+void vec_vsx_st (vector unsigned short, int, vector unsigned short *);
+void vec_vsx_st (vector unsigned short, int, unsigned short *);
+void vec_vsx_st (vector bool short, int, vector bool short *);
+void vec_vsx_st (vector bool short, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, vector pixel *);
+void vec_vsx_st (vector pixel, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, short *);
+void vec_vsx_st (vector bool short, int, short *);
+void vec_vsx_st (vector signed char, int, vector signed char *);
+void vec_vsx_st (vector signed char, int, signed char *);
+void vec_vsx_st (vector unsigned char, int, vector unsigned char *);
+void vec_vsx_st (vector unsigned char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, vector bool char *);
+void vec_vsx_st (vector bool char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, signed char *);
+@end smallexample
+
+Note that the @samp{vec_ld} and @samp{vec_st} builtins will always
+generate the Altivec @samp{LVX} and @samp{STVX} instructions even
+if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
+@samp{vec_vsx_st} builtins will always generate the VSX @samp{LXVD2X},
+@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
 GCC provides a few other builtins on Powerpc to access certain instructions:
 @smallexample
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c	(revision 0)
@@ -0,0 +1,97 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -mcpu=power7" } */
+
+/* Test the various load/store variants.  */
+
+#include <altivec.h>
+
+#define TEST_COPY(NAME, TYPE)						\
+void NAME ## _copy_native (vector TYPE *a, vector TYPE *b)		\
+{									\
+  *a = *b;								\
+}									\
+									\
+void NAME ## _copy_vec (vector TYPE *a, vector TYPE *b)			\
+{									\
+  vector TYPE x = vec_ld (0, b);					\
+  vec_st (x, 0, a);							\
+}									\
+
+#define TEST_COPYL(NAME, TYPE)						\
+void NAME ## _lvxl (vector TYPE *a, vector TYPE *b)			\
+{									\
+  vector TYPE x = vec_ldl (0, b);					\
+  vec_stl (x, 0, a);							\
+}									\
+
+#define TEST_VSX_COPY(NAME, TYPE)					\
+void NAME ## _copy_vsx (vector TYPE *a, vector TYPE *b)			\
+{									\
+  vector TYPE x = vec_vsx_ld (0, b);					\
+  vec_vsx_st (x, 0, a);							\
+}									\
+
+#define TEST_ALIGN(NAME, TYPE)						\
+void NAME ## _align (vector unsigned char *a, TYPE *b)			\
+{									\
+  vector unsigned char x = vec_lvsl (0, b);				\
+  vector unsigned char y = vec_lvsr (0, b);				\
+  vec_st (x, 0, a);							\
+  vec_st (y, 8, a);							\
+}
+
+#ifndef NO_COPY
+TEST_COPY(uchar,  unsigned char)
+TEST_COPY(schar,  signed   char)
+TEST_COPY(bchar,  bool     char)
+TEST_COPY(ushort, unsigned short)
+TEST_COPY(sshort, signed   short)
+TEST_COPY(bshort, bool     short)
+TEST_COPY(uint,   unsigned int)
+TEST_COPY(sint,   signed   int)
+TEST_COPY(bint,   bool     int)
+TEST_COPY(float,  float)
+TEST_COPY(double, double)
+#endif	/* NO_COPY */
+
+#ifndef NO_COPYL
+TEST_COPYL(uchar,  unsigned char)
+TEST_COPYL(schar,  signed   char)
+TEST_COPYL(bchar,  bool     char)
+TEST_COPYL(ushort, unsigned short)
+TEST_COPYL(sshort, signed   short)
+TEST_COPYL(bshort, bool     short)
+TEST_COPYL(uint,   unsigned int)
+TEST_COPYL(sint,   signed   int)
+TEST_COPYL(bint,   bool     int)
+TEST_COPYL(float,  float)
+TEST_COPYL(double, double)
+#endif	/* NO_COPYL */
+
+#ifndef NO_ALIGN
+TEST_ALIGN(uchar,  unsigned char)
+TEST_ALIGN(schar,  signed   char)
+TEST_ALIGN(ushort, unsigned short)
+TEST_ALIGN(sshort, signed   short)
+TEST_ALIGN(uint,   unsigned int)
+TEST_ALIGN(sint,   signed   int)
+TEST_ALIGN(float,  float)
+TEST_ALIGN(double, double)
+#endif	/* NO_ALIGN */
+
+
+#ifndef NO_VSX_COPY
+TEST_VSX_COPY(uchar,  unsigned char)
+TEST_VSX_COPY(schar,  signed   char)
+TEST_VSX_COPY(bchar,  bool     char)
+TEST_VSX_COPY(ushort, unsigned short)
+TEST_VSX_COPY(sshort, signed   short)
+TEST_VSX_COPY(bshort, bool     short)
+TEST_VSX_COPY(uint,   unsigned int)
+TEST_VSX_COPY(sint,   signed   int)
+TEST_VSX_COPY(bint,   bool     int)
+TEST_VSX_COPY(float,  float)
+TEST_VSX_COPY(double, double)
+#endif	/* NO_VSX_COPY */
Index: gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c	(revision 169441)
+++ gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run { target { powerpc64-*-* && { lp64 && dfprt } } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-options "-std=gnu99 -O2 -fno-strict-aliasing" } */
 
 /* Testcase to check for ABI compliance of parameter passing
@@ -31,60 +32,42 @@ typedef struct
 reg_parms_t gparms;
 
 
-/* Testcase could break on future gcc's, if parameter regs
-   are changed before this asm.  */
-
-#ifndef __MACH__
-#define save_parms(lparms)				\
-    asm volatile ("ld 11,gparms@got(2)\n\t"                \
-                  "std 3,0(11)\n\t"		        \
-	          "std 4,8(11)\n\t"			\
-	          "std 5,16(11)\n\t"			\
-	          "std 6,24(11)\n\t"			\
-	          "std 7,32(11)\n\t"			\
-	          "std 8,40(11)\n\t"			\
-	          "std 9,48(11)\n\t"			\
-	          "std 10,56(11)\n\t"			\
-                  "stfd 1,64(11)\n\t"			\
-	          "stfd 2,72(11)\n\t"			\
-	          "stfd 3,80(11)\n\t"			\
-	          "stfd 4,88(11)\n\t"			\
-	          "stfd 5,96(11)\n\t"			\
-	          "stfd 6,104(11)\n\t"			\
-	          "stfd 7,112(11)\n\t"			\
-	          "stfd 8,120(11)\n\t"			\
-	          "stfd 9,128(11)\n\t"			\
-	          "stfd 10,136(11)\n\t"			\
-	          "stfd 11,144(11)\n\t"			\
-	          "stfd 12,152(11)\n\t"                 \
-	          "stfd 13,160(11)\n\t":::"11", "memory");  \
-                  lparms = gparms;
-#else
-#define save_parms(lparms)				\
-    asm volatile ("ld r11,gparms@got(r2)\n\t"           \
-                  "std r3,0(r11)\n\t"		        \
-	          "std r4,8(r11)\n\t"			\
-	          "std r5,16(r11)\n\t"			\
-	          "std r6,24(r11)\n\t"			\
-	          "std r7,32(r11)\n\t"			\
-	          "std r8,40(r11)\n\t"			\
-	          "std r9,48(r11)\n\t"			\
-	          "std r10,56(r11)\n\t"                 \
-                  "stfd f1,64(r11)\n\t"		        \
-	          "stfd f2,72(r11)\n\t"			\
-	          "stfd f3,80(r11)\n\t"			\
-	          "stfd f4,88(r11)\n\t"			\
-	          "stfd f5,96(r11)\n\t"			\
-	          "stfd f6,104(r11)\n\t"		\
-	          "stfd f7,112(r11)\n\t"		\
-	          "stfd f8,120(r11)\n\t"		\
-	          "stfd f9,128(r11)\n\t"		\
-	          "stfd f10,136(r11)\n\t"		\
-	          "stfd f11,144(r11)\n\t"		\
-	          "stfd f12,152(r11)\n\t"               \
-	          "stfd f13,160(r11)\n\t":::"r11", "memory");  \
-                  lparms = gparms;
-#endif
+/* Wrapper to save the GPRs and FPRs and then jump to the real function.  */
+#define WRAPPER(NAME)							\
+__asm__ ("\t.globl\t" #NAME "_asm\n\t"					\
+	 ".section \".opd\",\"aw\"\n\t"					\
+	 ".align 3\n"							\
+	 #NAME "_asm:\n\t"						\
+	 ".quad .L." #NAME "_asm,.TOC.@tocbase,0\n\t"			\
+	 ".text\n\t"							\
+	 ".type " #NAME "_asm, @function\n"				\
+	 ".L." #NAME "_asm:\n\t"					\
+	 "ld 11,gparms@got(2)\n\t"					\
+	 "std 3,0(11)\n\t"						\
+	 "std 4,8(11)\n\t"						\
+	 "std 5,16(11)\n\t"						\
+	 "std 6,24(11)\n\t"						\
+	 "std 7,32(11)\n\t"						\
+	 "std 8,40(11)\n\t"						\
+	 "std 9,48(11)\n\t"						\
+	 "std 10,56(11)\n\t"						\
+	 "stfd 1,64(11)\n\t"						\
+	 "stfd 2,72(11)\n\t"						\
+	 "stfd 3,80(11)\n\t"						\
+	 "stfd 4,88(11)\n\t"						\
+	 "stfd 5,96(11)\n\t"						\
+	 "stfd 6,104(11)\n\t"						\
+	 "stfd 7,112(11)\n\t"						\
+	 "stfd 8,120(11)\n\t"						\
+	 "stfd 9,128(11)\n\t"						\
+	 "stfd 10,136(11)\n\t"						\
+	 "stfd 11,144(11)\n\t"						\
+	 "stfd 12,152(11)\n\t"						\
+	 "stfd 13,160(11)\n\t"						\
+	 "b " #NAME "\n\t"						\
+	 ".long 0\n\t"							\
+	 ".byte 0,0,0,0,0,0,0,0\n\t"					\
+	 ".size " #NAME ",.-" #NAME "\n")
 
 typedef struct sf
 {
@@ -97,6 +80,13 @@ typedef struct sf
   unsigned long slot[100];
 } stack_frame_t;
 
+extern void func0_asm (double, double, double, double, double, double,
+		       double, double, double, double, double, double,
+		       double, double, 
+		       _Decimal64, _Decimal128, _Decimal64);
+
+WRAPPER(func0);
+
 /* Fill up floating point registers with double arguments, forcing
    decimal float arguments into the parameter save area.  */
 void __attribute__ ((noinline))
@@ -105,186 +95,209 @@ func0 (double a1, double a2, double a3, 
        double a13, double a14, 
        _Decimal64 a15, _Decimal128 a16, _Decimal64 a17)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
-  if (a9 != lparms.fprs[8]) FAILURE
-  if (a10 != lparms.fprs[9]) FAILURE
-  if (a11 != lparms.fprs[10]) FAILURE
-  if (a12 != lparms.fprs[11]) FAILURE
-  if (a13 != lparms.fprs[12]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
+  if (a9 != gparms.fprs[8]) FAILURE
+  if (a10 != gparms.fprs[9]) FAILURE
+  if (a11 != gparms.fprs[10]) FAILURE
+  if (a12 != gparms.fprs[11]) FAILURE
+  if (a13 != gparms.fprs[12]) FAILURE
   if (a14 != *(double *)&sp->slot[13]) FAILURE
   if (a15 != *(_Decimal64 *)&sp->slot[14]) FAILURE
   if (a16 != *(_Decimal128 *)&sp->slot[15]) FAILURE
   if (a17 != *(_Decimal64 *)&sp->slot[17]) FAILURE
 }
 
+extern void func1_asm (double, double, double, double, double, double,
+		       double, double, double, double, double, double,
+		       double, _Decimal128 );
+
+WRAPPER(func1);
+
 void __attribute__ ((noinline))
 func1 (double a1, double a2, double a3, double a4, double a5, double a6,
        double a7, double a8, double a9, double a10, double a11, double a12,
        double a13, _Decimal128 a14)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
-  if (a9 != lparms.fprs[8]) FAILURE
-  if (a10 != lparms.fprs[9]) FAILURE
-  if (a11 != lparms.fprs[10]) FAILURE
-  if (a12 != lparms.fprs[11]) FAILURE
-  if (a13 != lparms.fprs[12]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
+  if (a9 != gparms.fprs[8]) FAILURE
+  if (a10 != gparms.fprs[9]) FAILURE
+  if (a11 != gparms.fprs[10]) FAILURE
+  if (a12 != gparms.fprs[11]) FAILURE
+  if (a13 != gparms.fprs[12]) FAILURE
   if (a14 != *(_Decimal128 *)&sp->slot[13]) FAILURE
 }
 
+extern void func2_asm (double, double, double, double, double, double,
+		       double, double, double, double, double, double,
+		       _Decimal128);
+
+WRAPPER(func2);
+
 void __attribute__ ((noinline))
 func2 (double a1, double a2, double a3, double a4, double a5, double a6,
        double a7, double a8, double a9, double a10, double a11, double a12,
        _Decimal128 a13)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
-  if (a9 != lparms.fprs[8]) FAILURE
-  if (a10 != lparms.fprs[9]) FAILURE
-  if (a11 != lparms.fprs[10]) FAILURE
-  if (a12 != lparms.fprs[11]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
+  if (a9 != gparms.fprs[8]) FAILURE
+  if (a10 != gparms.fprs[9]) FAILURE
+  if (a11 != gparms.fprs[10]) FAILURE
+  if (a12 != gparms.fprs[11]) FAILURE
   if (a13 != *(_Decimal128 *)&sp->slot[12]) FAILURE
 }
 
+extern void func3_asm (_Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64, _Decimal128);
+
+WRAPPER(func3);
+
 void __attribute__ ((noinline))
 func3 (_Decimal64 a1, _Decimal128 a2, _Decimal64 a3, _Decimal128 a4,
        _Decimal64 a5, _Decimal128 a6, _Decimal64 a7, _Decimal128 a8,
        _Decimal64 a9, _Decimal128 a10)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal64 *)&lparms.fprs[0]) FAILURE	/* f1        */
-  if (a2 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3   */
-  if (a3 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4        */
-  if (a4 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7   */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8        */
-  if (a6 != *(_Decimal128 *)&lparms.fprs[9]) FAILURE	/* f10 & f11 */
-  if (a7 != *(_Decimal64 *)&lparms.fprs[11]) FAILURE	/* f12       */
+  if (a1 != *(_Decimal64 *)&gparms.fprs[0]) FAILURE	/* f1        */
+  if (a2 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3   */
+  if (a3 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4        */
+  if (a4 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7   */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8        */
+  if (a6 != *(_Decimal128 *)&gparms.fprs[9]) FAILURE	/* f10 & f11 */
+  if (a7 != *(_Decimal64 *)&gparms.fprs[11]) FAILURE	/* f12       */
   if (a8 != *(_Decimal128 *)&sp->slot[10]) FAILURE
   if (a9 != *(_Decimal64 *)&sp->slot[12]) FAILURE
   if (a10 != *(_Decimal128 *)&sp->slot[13]) FAILURE
 }
 
+extern void func4_asm (_Decimal128, _Decimal64, _Decimal128, _Decimal64,
+		       _Decimal128, _Decimal64, _Decimal128, _Decimal64);
+
+WRAPPER(func4);
+
 void __attribute__ ((noinline))
 func4 (_Decimal128 a1, _Decimal64 a2, _Decimal128 a3, _Decimal64 a4,
        _Decimal128 a5, _Decimal64 a6, _Decimal128 a7, _Decimal64 a8)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3   */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4        */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7   */
-  if (a4 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8        */
-  if (a5 != *(_Decimal128 *)&lparms.fprs[9]) FAILURE	/* f10 & f11 */
-  if (a6 != *(_Decimal64 *)&lparms.fprs[11]) FAILURE	/* f12       */
+  if (a1 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3   */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4        */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7   */
+  if (a4 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8        */
+  if (a5 != *(_Decimal128 *)&gparms.fprs[9]) FAILURE	/* f10 & f11 */
+  if (a6 != *(_Decimal64 *)&gparms.fprs[11]) FAILURE	/* f12       */
   if (a7 != *(_Decimal128 *)&sp->slot[9]) FAILURE
   if (a8 != *(_Decimal64 *)&sp->slot[11]) FAILURE
 }
 
+extern void func5_asm (_Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32);
+
+WRAPPER(func5);
+
 void __attribute__ ((noinline))
 func5 (_Decimal32 a1, _Decimal32 a2, _Decimal32 a3, _Decimal32 a4,
        _Decimal32 a5, _Decimal32 a6, _Decimal32 a7, _Decimal32 a8,
        _Decimal32 a9, _Decimal32 a10, _Decimal32 a11, _Decimal32 a12,
        _Decimal32 a13, _Decimal32 a14, _Decimal32 a15, _Decimal32 a16)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
   /* _Decimal32 is passed in the lower half of an FPR or parameter slot.  */
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1  */
-  if (a2 != ((d32parm_t *)&lparms.fprs[1])->d) FAILURE		/* f2  */
-  if (a3 != ((d32parm_t *)&lparms.fprs[2])->d) FAILURE		/* f3  */
-  if (a4 != ((d32parm_t *)&lparms.fprs[3])->d) FAILURE		/* f4  */
-  if (a5 != ((d32parm_t *)&lparms.fprs[4])->d) FAILURE		/* f5  */
-  if (a6 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6  */
-  if (a7 != ((d32parm_t *)&lparms.fprs[6])->d) FAILURE		/* f7  */
-  if (a8 != ((d32parm_t *)&lparms.fprs[7])->d) FAILURE		/* f8  */
-  if (a9 != ((d32parm_t *)&lparms.fprs[8])->d) FAILURE		/* f9  */
-  if (a10 != ((d32parm_t *)&lparms.fprs[9])->d) FAILURE		/* f10 */
-  if (a11 != ((d32parm_t *)&lparms.fprs[10])->d) FAILURE	/* f11 */
-  if (a12 != ((d32parm_t *)&lparms.fprs[11])->d) FAILURE	/* f12 */
-  if (a13 != ((d32parm_t *)&lparms.fprs[12])->d) FAILURE	/* f13 */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1  */
+  if (a2 != ((d32parm_t *)&gparms.fprs[1])->d) FAILURE		/* f2  */
+  if (a3 != ((d32parm_t *)&gparms.fprs[2])->d) FAILURE		/* f3  */
+  if (a4 != ((d32parm_t *)&gparms.fprs[3])->d) FAILURE		/* f4  */
+  if (a5 != ((d32parm_t *)&gparms.fprs[4])->d) FAILURE		/* f5  */
+  if (a6 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6  */
+  if (a7 != ((d32parm_t *)&gparms.fprs[6])->d) FAILURE		/* f7  */
+  if (a8 != ((d32parm_t *)&gparms.fprs[7])->d) FAILURE		/* f8  */
+  if (a9 != ((d32parm_t *)&gparms.fprs[8])->d) FAILURE		/* f9  */
+  if (a10 != ((d32parm_t *)&gparms.fprs[9])->d) FAILURE		/* f10 */
+  if (a11 != ((d32parm_t *)&gparms.fprs[10])->d) FAILURE	/* f11 */
+  if (a12 != ((d32parm_t *)&gparms.fprs[11])->d) FAILURE	/* f12 */
+  if (a13 != ((d32parm_t *)&gparms.fprs[12])->d) FAILURE	/* f13 */
   if (a14 != ((d32parm_t *)&sp->slot[13])->d) FAILURE
   if (a15 != ((d32parm_t *)&sp->slot[14])->d) FAILURE
   if (a16 != ((d32parm_t *)&sp->slot[15])->d) FAILURE
 }
 
+extern void func6_asm (_Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128);
+
+WRAPPER(func6);
+
 void __attribute__ ((noinline))
 func6 (_Decimal32 a1, _Decimal64 a2, _Decimal128 a3,
        _Decimal32 a4, _Decimal64 a5, _Decimal128 a6,
        _Decimal32 a7, _Decimal64 a8, _Decimal128 a9,
        _Decimal32 a10, _Decimal64 a11, _Decimal128 a12)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1        */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[1]) FAILURE		/* f2        */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[3]) FAILURE		/* f4 & f5   */
-  if (a4 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6        */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[6]) FAILURE		/* f7        */
-  if (a6 != *(_Decimal128 *)&lparms.fprs[7]) FAILURE		/* f8 & f9   */
-  if (a7 != ((d32parm_t *)&lparms.fprs[9])->d) FAILURE		/* f10       */
-  if (a8 != *(_Decimal64 *)&lparms.fprs[10]) FAILURE		/* f11       */
-  if (a9 != *(_Decimal128 *)&lparms.fprs[11]) FAILURE		/* f12 & f13 */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1        */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[1]) FAILURE		/* f2        */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[3]) FAILURE		/* f4 & f5   */
+  if (a4 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6        */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[6]) FAILURE		/* f7        */
+  if (a6 != *(_Decimal128 *)&gparms.fprs[7]) FAILURE		/* f8 & f9   */
+  if (a7 != ((d32parm_t *)&gparms.fprs[9])->d) FAILURE		/* f10       */
+  if (a8 != *(_Decimal64 *)&gparms.fprs[10]) FAILURE		/* f11       */
+  if (a9 != *(_Decimal128 *)&gparms.fprs[11]) FAILURE		/* f12 & f13 */
   if (a10 != ((d32parm_t *)&sp->slot[12])->d) FAILURE
   if (a11 != *(_Decimal64 *)&sp->slot[13]) FAILURE
 }
@@ -292,23 +305,23 @@ func6 (_Decimal32 a1, _Decimal64 a2, _De
 int
 main (void)
 {
-  func0 (1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5,
-	 14.5, 15.2dd, 16.2dl, 17.2dd);
-  func1 (101.5, 102.5, 103.5, 104.5, 105.5, 106.5, 107.5, 108.5, 109.5,
-	 110.5, 111.5, 112.5, 113.5, 114.2dd);
-  func2 (201.5, 202.5, 203.5, 204.5, 205.5, 206.5, 207.5, 208.5, 209.5,
-	 210.5, 211.5, 212.5, 213.2dd);
-  func3 (301.2dd, 302.2dl, 303.2dd, 304.2dl, 305.2dd, 306.2dl, 307.2dd,
-	 308.2dl, 309.2dd, 310.2dl);
-  func4 (401.2dl, 402.2dd, 403.2dl, 404.2dd, 405.2dl, 406.2dd, 407.2dl,
-	 408.2dd);
+  func0_asm (1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5,
+	     14.5, 15.2dd, 16.2dl, 17.2dd);
+  func1_asm (101.5, 102.5, 103.5, 104.5, 105.5, 106.5, 107.5, 108.5, 109.5,
+	     110.5, 111.5, 112.5, 113.5, 114.2dd);
+  func2_asm (201.5, 202.5, 203.5, 204.5, 205.5, 206.5, 207.5, 208.5, 209.5,
+	     210.5, 211.5, 212.5, 213.2dd);
+  func3_asm (301.2dd, 302.2dl, 303.2dd, 304.2dl, 305.2dd, 306.2dl, 307.2dd,
+	     308.2dl, 309.2dd, 310.2dl);
+  func4_asm (401.2dl, 402.2dd, 403.2dl, 404.2dd, 405.2dl, 406.2dd, 407.2dl,
+	     408.2dd);
 #if 0
   /* _Decimal32 doesn't yet follow the ABI; enable this when it does.  */
-  func5 (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
-	 508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
-	 515.2df, 516.2df);
-  func6 (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
-	 607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
+  func5_asm (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
+	     508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
+	     515.2df, 516.2df);
+  func6_asm (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
+	     607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
 #endif
 
   if (failcnt != 0)
Index: gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c	(revision 169441)
+++ gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c	(working copy)
@@ -30,31 +30,6 @@ typedef struct
 
 reg_parms_t gparms;
 
-
-/* Testcase could break on future gcc's, if parameter regs
-   are changed before this asm.  */
-
-#define save_parms(lparms)				\
-    asm volatile ("lis 11,gparms@ha\n\t"		\
-                  "la 11,gparms@l(11)\n\t"		\
-                  "st 3,0(11)\n\t"		        \
-	          "st 4,4(11)\n\t"			\
-	          "st 5,8(11)\n\t"			\
-	          "st 6,12(11)\n\t"			\
-	          "st 7,16(11)\n\t"			\
-	          "st 8,20(11)\n\t"			\
-	          "st 9,24(11)\n\t"			\
-	          "st 10,28(11)\n\t"			\
-                  "stfd 1,32(11)\n\t"			\
-	          "stfd 2,40(11)\n\t"			\
-	          "stfd 3,48(11)\n\t"			\
-	          "stfd 4,56(11)\n\t"			\
-	          "stfd 5,64(11)\n\t"			\
-	          "stfd 6,72(11)\n\t"			\
-	          "stfd 7,80(11)\n\t"			\
-	          "stfd 8,88(11)\n\t":::"11", "memory");  \
-                  lparms = gparms;
-
 typedef struct sf
 {
   struct sf *backchain;
@@ -62,115 +37,159 @@ typedef struct sf
   unsigned int slot[200];
 } stack_frame_t;
 
+/* Wrapper to save the GPRs and FPRs and then jump to the real function.  */
+#define WRAPPER(NAME)							\
+__asm__ ("\t.globl\t" #NAME "_asm\n\t"					\
+	 ".text\n\t"							\
+	 ".type " #NAME "_asm, @function\n"				\
+	 #NAME "_asm:\n\t"						\
+	 "lis 11,gparms@ha\n\t"						\
+	 "la 11,gparms@l(11)\n\t"					\
+	 "st 3,0(11)\n\t"						\
+	 "st 4,4(11)\n\t"						\
+	 "st 5,8(11)\n\t"						\
+	 "st 6,12(11)\n\t"						\
+	 "st 7,16(11)\n\t"						\
+	 "st 8,20(11)\n\t"						\
+	 "st 9,24(11)\n\t"						\
+	 "st 10,28(11)\n\t"						\
+	 "stfd 1,32(11)\n\t"						\
+	 "stfd 2,40(11)\n\t"						\
+	 "stfd 3,48(11)\n\t"						\
+	 "stfd 4,56(11)\n\t"						\
+	 "stfd 5,64(11)\n\t"						\
+	 "stfd 6,72(11)\n\t"						\
+	 "stfd 7,80(11)\n\t"						\
+	 "stfd 8,88(11)\n\t"						\
+	 "b " #NAME "\n\t"						\
+	 ".size " #NAME "_asm,.-" #NAME "_asm\n")
+
 /* Fill up floating point registers with double arguments, forcing
    decimal float arguments into the parameter save area.  */
+extern void func0_asm (double, double, double, double, double,
+		       double, double, double, _Decimal64, _Decimal128);
+
+WRAPPER(func0);
+
 void __attribute__ ((noinline))
 func0 (double a1, double a2, double a3, double a4, double a5,
        double a6, double a7, double a8, _Decimal64 a9, _Decimal128 a10)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
   if (a9 != *(_Decimal64 *)&sp->slot[0]) FAILURE
   if (a10 != *(_Decimal128 *)&sp->slot[2]) FAILURE
 }
 
 /* Alternate 64-bit and 128-bit decimal float arguments, checking that
    _Decimal128 is always passed in even/odd register pairs.  */
+extern void func1_asm (_Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64, _Decimal128, _Decimal64, _Decimal128);
+
+WRAPPER(func1);
+
 void __attribute__ ((noinline))
 func1 (_Decimal64 a1, _Decimal128 a2, _Decimal64 a3, _Decimal128 a4,
        _Decimal64 a5, _Decimal128 a6, _Decimal64 a7, _Decimal128 a8)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal64 *)&lparms.fprs[0]) FAILURE	/* f1 */
-  if (a2 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3 */
-  if (a3 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4 */
-  if (a4 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7 */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8 */
+  if (a1 != *(_Decimal64 *)&gparms.fprs[0]) FAILURE	/* f1 */
+  if (a2 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3 */
+  if (a3 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4 */
+  if (a4 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7 */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8 */
   if (a6 != *(_Decimal128 *)&sp->slot[0]) FAILURE
   if (a7 != *(_Decimal64 *)&sp->slot[4]) FAILURE
   if (a8 != *(_Decimal128 *)&sp->slot[6]) FAILURE
 }
 
+extern void func2_asm (_Decimal128, _Decimal64, _Decimal128, _Decimal64,
+		       _Decimal128, _Decimal64, _Decimal128, _Decimal64);
+
+WRAPPER(func2);
+
 void __attribute__ ((noinline))
 func2 (_Decimal128 a1, _Decimal64 a2, _Decimal128 a3, _Decimal64 a4,
        _Decimal128 a5, _Decimal64 a6, _Decimal128 a7, _Decimal64 a8)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3 */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4 */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7 */
-  if (a4 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8 */
+  if (a1 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3 */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4 */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7 */
+  if (a4 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8 */
   if (a5 != *(_Decimal128 *)&sp->slot[0]) FAILURE
   if (a6 != *(_Decimal64 *)&sp->slot[4]) FAILURE
   if (a7 != *(_Decimal128 *)&sp->slot[6]) FAILURE
   if (a8 != *(_Decimal64 *)&sp->slot[10]) FAILURE
 }
 
+extern void func3_asm (_Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64);
+
+WRAPPER(func3);
+
 void __attribute__ ((noinline))
 func3 (_Decimal64 a1, _Decimal128 a2, _Decimal64 a3, _Decimal128 a4,
        _Decimal64 a5)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal64 *)&lparms.fprs[0]) FAILURE	/* f1 */
-  if (a2 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3 */
-  if (a3 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4 */
-  if (a4 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7 */
+  if (a1 != *(_Decimal64 *)&gparms.fprs[0]) FAILURE	/* f1 */
+  if (a2 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3 */
+  if (a3 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4 */
+  if (a4 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7 */
   if (a5 != *(_Decimal128 *)&sp->slot[0]) FAILURE
 }
 
+extern void func4_asm (_Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32);
+
+WRAPPER(func4);
+
 void __attribute__ ((noinline))
 func4 (_Decimal32 a1, _Decimal32 a2, _Decimal32 a3, _Decimal32 a4,
        _Decimal32 a5, _Decimal32 a6, _Decimal32 a7, _Decimal32 a8,
        _Decimal32 a9, _Decimal32 a10, _Decimal32 a11, _Decimal32 a12,
        _Decimal32 a13, _Decimal32 a14, _Decimal32 a15, _Decimal32 a16)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
   /* _Decimal32 is passed in the lower half of an FPR, or in a parameter slot.  */
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1  */
-  if (a2 != ((d32parm_t *)&lparms.fprs[1])->d) FAILURE		/* f2  */
-  if (a3 != ((d32parm_t *)&lparms.fprs[2])->d) FAILURE		/* f3  */
-  if (a4 != ((d32parm_t *)&lparms.fprs[3])->d) FAILURE		/* f4  */
-  if (a5 != ((d32parm_t *)&lparms.fprs[4])->d) FAILURE		/* f5  */
-  if (a6 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6  */
-  if (a7 != ((d32parm_t *)&lparms.fprs[6])->d) FAILURE		/* f7  */
-  if (a8 != ((d32parm_t *)&lparms.fprs[7])->d) FAILURE		/* f8  */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1  */
+  if (a2 != ((d32parm_t *)&gparms.fprs[1])->d) FAILURE		/* f2  */
+  if (a3 != ((d32parm_t *)&gparms.fprs[2])->d) FAILURE		/* f3  */
+  if (a4 != ((d32parm_t *)&gparms.fprs[3])->d) FAILURE		/* f4  */
+  if (a5 != ((d32parm_t *)&gparms.fprs[4])->d) FAILURE		/* f5  */
+  if (a6 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6  */
+  if (a7 != ((d32parm_t *)&gparms.fprs[6])->d) FAILURE		/* f7  */
+  if (a8 != ((d32parm_t *)&gparms.fprs[7])->d) FAILURE		/* f8  */
   if (a9 != *(_Decimal32 *)&sp->slot[0]) FAILURE
   if (a10 != *(_Decimal32 *)&sp->slot[1]) FAILURE
   if (a11 != *(_Decimal32 *)&sp->slot[2]) FAILURE
@@ -181,24 +200,29 @@ func4 (_Decimal32 a1, _Decimal32 a2, _De
   if (a16 != *(_Decimal32 *)&sp->slot[7]) FAILURE
 }
 
+extern void func5_asm (_Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128);
+
+WRAPPER(func5);
+
 void __attribute__ ((noinline))
 func5 (_Decimal32 a1, _Decimal64 a2, _Decimal128 a3,
        _Decimal32 a4, _Decimal64 a5, _Decimal128 a6,
        _Decimal32 a7, _Decimal64 a8, _Decimal128 a9,
        _Decimal32 a10, _Decimal64 a11, _Decimal128 a12)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1      */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[1]) FAILURE		/* f2      */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[3]) FAILURE		/* f4 & f5 */
-  if (a4 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6      */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[6]) FAILURE		/* f7      */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1      */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[1]) FAILURE		/* f2      */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[3]) FAILURE		/* f4 & f5 */
+  if (a4 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6      */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[6]) FAILURE		/* f7      */
 
   if (a6 != *(_Decimal128 *)&sp->slot[0]) FAILURE
   if (a7 != *(_Decimal32 *)&sp->slot[4]) FAILURE
@@ -212,15 +236,15 @@ func5 (_Decimal32 a1, _Decimal64 a2, _De
 int
 main ()
 {
-  func0 (1., 2., 3., 4., 5., 6., 7., 8., 9.dd, 10.dl);
-  func1 (1.dd, 2.dl, 3.dd, 4.dl, 5.dd, 6.dl, 7.dd, 8.dl);
-  func2 (1.dl, 2.dd, 3.dl, 4.dd, 5.dl, 6.dd, 7.dl, 8.dd);
-  func3 (1.dd, 2.dl, 3.dd, 4.dl, 5.dl);
-  func4 (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
-	 508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
-	 515.2df, 516.2df);
-  func5 (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
-	 607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
+  func0_asm (1., 2., 3., 4., 5., 6., 7., 8., 9.dd, 10.dl);
+  func1_asm (1.dd, 2.dl, 3.dd, 4.dl, 5.dd, 6.dl, 7.dd, 8.dl);
+  func2_asm (1.dl, 2.dd, 3.dl, 4.dd, 5.dl, 6.dd, 7.dl, 8.dd);
+  func3_asm (1.dd, 2.dl, 3.dd, 4.dl, 5.dl);
+  func4_asm (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
+	     508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
+	     515.2df, 516.2df);
+  func5_asm (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
+	     607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
 
   if (failcnt != 0)
     abort ();
Index: gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c	(revision 169441)
+++ gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-options "-O2 -mavoid-indexed-addresses" } */
+/* { dg-options "-O2 -mavoid-indexed-addresses -mno-altivec -mno-vsx" } */
 
 /* { dg-final { scan-assembler-not "lbzx" } }
 
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 169441)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -123,6 +123,43 @@ (define_split
   DONE;
 })
 
+;; Vector floating point load/store instructions that use the Altivec
+;; instructions even if we are compiling for VSX, since the Altivec
+;; instructions silently ignore the bottom 4 bits of the address (masking it
+;; to a 16-byte boundary), and the VSX instructions do not.
+(define_expand "vector_altivec_load_<mode>"
+  [(set (match_operand:VEC_M 0 "vfloat_operand" "")
+	(match_operand:VEC_M 1 "memory_operand" ""))]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "
+{
+  gcc_assert (VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode));
+
+  if (VECTOR_MEM_VSX_P (<MODE>mode))
+    {
+      operands[1] = rs6000_address_for_altivec (operands[1]);
+      emit_insn (gen_altivec_lvx_<mode> (operands[0], operands[1]));
+      DONE;
+    }
+}")
+
+(define_expand "vector_altivec_store_<mode>"
+  [(set (match_operand:VEC_M 0 "memory_operand" "")
+	(match_operand:VEC_M 1 "vfloat_operand" ""))]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "
+{
+  gcc_assert (VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode));
+
+  if (VECTOR_MEM_VSX_P (<MODE>mode))
+    {
+      operands[0] = rs6000_address_for_altivec (operands[0]);
+      emit_insn (gen_altivec_stvx_<mode> (operands[0], operands[1]));
+      DONE;
+    }
+}")
+
+
 \f
 ;; Reload patterns for vector operations.  We may need an additional base
 ;; register to convert the reg+offset addressing to reg+reg for vector
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 169441)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -129,6 +129,7 @@ extern void rs6000_emit_parity (rtx, rtx
 extern rtx rs6000_machopic_legitimize_pic_address (rtx, enum machine_mode,
 						   rtx);
 extern rtx rs6000_address_for_fpconvert (rtx);
+extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (enum machine_mode, bool, bool);
 extern int rs6000_loop_align (rtx);
 #endif /* RTX_CODE */
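
(To make the rs6000-c.c overload additions below concrete: they extend the
vec_ld, vec_ldl, vec_st, vec_stl, vec_lvsl and vec_lvsr tables to the
V2DF/V2DI (vector double / vector long long) types.  Here is a hedged
sketch of the source-level forms the new rows are meant to accept; the
function and pointer names are hypothetical:

    #include <altivec.h>

    vector double
    load_vd (const vector double *p)
    {
      /* With the new V2DF row, vec_ld resolves to lvx for vector
         double, i.e. the Altivec (address-masking) semantics.  */
      return vec_ld (0, p);
    }

    void
    store_vd (vector double v, vector double *p)
    {
      /* Likewise, vec_st resolves to stvx for vector double.  */
      vec_st (v, 0, p);
    }

Both forms presume -mvsx (e.g. -mcpu=power7), since the vector double
type itself is only available with VSX.)
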
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 169441)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -1000,6 +1000,14 @@ const struct altivec_builtin_types altiv
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_XVDIVDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
@@ -1115,6 +1123,14 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
@@ -1133,6 +1149,16 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
@@ -1151,6 +1177,16 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVLX, ALTIVEC_BUILTIN_LVLX,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_LVLX, ALTIVEC_BUILTIN_LVLX,
@@ -2644,6 +2680,14 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VSLDOI_16QI,
     RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_NOT_OPAQUE },
   { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
@@ -2809,6 +2853,16 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
   { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
     RS6000_BTI_void, RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_STVLX, ALTIVEC_BUILTIN_STVLX,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_STVLX, ALTIVEC_BUILTIN_STVLX,
@@ -3002,6 +3056,112 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
     RS6000_BTI_NOT_OPAQUE },
 
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V16QI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
+
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DF,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V8HI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V16QI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI },
+
   /* Predicates.  */
   { ALTIVEC_BUILTIN_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTUB_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI },
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 169441)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -37,6 +37,10 @@ RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERN
 RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_16qi,	RS6000_BTC_MEM)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERNAL_4sf,		RS6000_BTC_MEM)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_4sf,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERNAL_2df,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_2df,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERNAL_2di,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_2di,		RS6000_BTC_MEM)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VADDUBM,			RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VADDUHM,			RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VADDUWM,			RS6000_BTC_CONST)
@@ -778,12 +782,20 @@ RS6000_BUILTIN(PAIRED_BUILTIN_CMPU1,			R
 
   /* VSX builtins.  */
 RS6000_BUILTIN(VSX_BUILTIN_LXSDX,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_LXVD2X,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVD2X_V2DF,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVD2X_V2DI,			RS6000_BTC_MEM)
 RS6000_BUILTIN(VSX_BUILTIN_LXVDSX,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_LXVW4X,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V4SF,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V4SI,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V8HI,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V16QI,		RS6000_BTC_MEM)
 RS6000_BUILTIN(VSX_BUILTIN_STXSDX,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_STXVD2X,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_STXVW4X,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVD2X_V2DF,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVD2X_V2DI,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V4SF,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V4SI,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V8HI,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V16QI,		RS6000_BTC_MEM)
 RS6000_BUILTIN(VSX_BUILTIN_XSABSDP,			RS6000_BTC_CONST)
 RS6000_BUILTIN(VSX_BUILTIN_XSADDDP,			RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(VSX_BUILTIN_XSCMPODP,			RS6000_BTC_FP_PURE)
@@ -983,8 +995,10 @@ RS6000_BUILTIN(VSX_BUILTIN_VEC_XXPERMDI,
 RS6000_BUILTIN(VSX_BUILTIN_VEC_XXSLDWI,			RS6000_BTC_MISC)
 RS6000_BUILTIN(VSX_BUILTIN_VEC_XXSPLTD,			RS6000_BTC_MISC)
 RS6000_BUILTIN(VSX_BUILTIN_VEC_XXSPLTW,			RS6000_BTC_MISC)
+RS6000_BUILTIN(VSX_BUILTIN_VEC_LD,			RS6000_BTC_MISC)
+RS6000_BUILTIN(VSX_BUILTIN_VEC_ST,			RS6000_BTC_MISC)
 RS6000_BUILTIN_EQUATE(VSX_BUILTIN_OVERLOADED_LAST,
-		      VSX_BUILTIN_VEC_XXSPLTW)
+		      VSX_BUILTIN_VEC_ST)
 
 /* Combined VSX/Altivec builtins.  */
 RS6000_BUILTIN(VECTOR_BUILTIN_FLOAT_V4SI_V4SF,		RS6000_BTC_FP_PURE)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 169441)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3316,9 +3316,12 @@ rs6000_option_override_internal (bool gl
   /* If not explicitly specified via option, decide whether to generate indexed
      load/store instructions.  */
   if (TARGET_AVOID_XFORM == -1)
-    /* Avoid indexed addressing when targeting Power6 in order to avoid
-     the DERAT mispredict penalty.  */
-    TARGET_AVOID_XFORM = (rs6000_cpu == PROCESSOR_POWER6 && TARGET_CMPB);
+    /* Avoid indexed addressing when targeting Power6 in order to avoid the
+     DERAT mispredict penalty.  However, the LVE and STVE altivec instructions
+     need indexed accesses and the type used is the scalar type of the element
+     being loaded or stored.  */
+    TARGET_AVOID_XFORM = (rs6000_cpu == PROCESSOR_POWER6 && TARGET_CMPB
+			  && !TARGET_ALTIVEC);
 
   /* Set the -mrecip options.  */
   if (rs6000_recip_name)
@@ -11263,16 +11266,22 @@ altivec_expand_ld_builtin (tree exp, rtx
   switch (fcode)
     {
     case ALTIVEC_BUILTIN_LD_INTERNAL_16qi:
-      icode = CODE_FOR_vector_load_v16qi;
+      icode = CODE_FOR_vector_altivec_load_v16qi;
       break;
     case ALTIVEC_BUILTIN_LD_INTERNAL_8hi:
-      icode = CODE_FOR_vector_load_v8hi;
+      icode = CODE_FOR_vector_altivec_load_v8hi;
       break;
     case ALTIVEC_BUILTIN_LD_INTERNAL_4si:
-      icode = CODE_FOR_vector_load_v4si;
+      icode = CODE_FOR_vector_altivec_load_v4si;
       break;
     case ALTIVEC_BUILTIN_LD_INTERNAL_4sf:
-      icode = CODE_FOR_vector_load_v4sf;
+      icode = CODE_FOR_vector_altivec_load_v4sf;
+      break;
+    case ALTIVEC_BUILTIN_LD_INTERNAL_2df:
+      icode = CODE_FOR_vector_altivec_load_v2df;
+      break;
+    case ALTIVEC_BUILTIN_LD_INTERNAL_2di:
+      icode = CODE_FOR_vector_altivec_load_v2di;
       break;
     default:
       *expandedp = false;
@@ -11316,16 +11325,22 @@ altivec_expand_st_builtin (tree exp, rtx
   switch (fcode)
     {
     case ALTIVEC_BUILTIN_ST_INTERNAL_16qi:
-      icode = CODE_FOR_vector_store_v16qi;
+      icode = CODE_FOR_vector_altivec_store_v16qi;
       break;
     case ALTIVEC_BUILTIN_ST_INTERNAL_8hi:
-      icode = CODE_FOR_vector_store_v8hi;
+      icode = CODE_FOR_vector_altivec_store_v8hi;
       break;
     case ALTIVEC_BUILTIN_ST_INTERNAL_4si:
-      icode = CODE_FOR_vector_store_v4si;
+      icode = CODE_FOR_vector_altivec_store_v4si;
       break;
     case ALTIVEC_BUILTIN_ST_INTERNAL_4sf:
-      icode = CODE_FOR_vector_store_v4sf;
+      icode = CODE_FOR_vector_altivec_store_v4sf;
+      break;
+    case ALTIVEC_BUILTIN_ST_INTERNAL_2df:
+      icode = CODE_FOR_vector_altivec_store_v2df;
+      break;
+    case ALTIVEC_BUILTIN_ST_INTERNAL_2di:
+      icode = CODE_FOR_vector_altivec_store_v2di;
       break;
     default:
       *expandedp = false;
@@ -11557,7 +11572,7 @@ altivec_expand_builtin (tree exp, rtx ta
   switch (fcode)
     {
     case ALTIVEC_BUILTIN_STVX:
-      return altivec_expand_stv_builtin (CODE_FOR_altivec_stvx, exp);
+      return altivec_expand_stv_builtin (CODE_FOR_altivec_stvx_v4si, exp);
     case ALTIVEC_BUILTIN_STVEBX:
       return altivec_expand_stv_builtin (CODE_FOR_altivec_stvebx, exp);
     case ALTIVEC_BUILTIN_STVEHX:
@@ -11576,6 +11591,19 @@ altivec_expand_builtin (tree exp, rtx ta
     case ALTIVEC_BUILTIN_STVRXL:
       return altivec_expand_stv_builtin (CODE_FOR_altivec_stvrxl, exp);
 
+    case VSX_BUILTIN_STXVD2X_V2DF:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v2df, exp);
+    case VSX_BUILTIN_STXVD2X_V2DI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v2di, exp);
+    case VSX_BUILTIN_STXVW4X_V4SF:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v4sf, exp);
+    case VSX_BUILTIN_STXVW4X_V4SI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v4si, exp);
+    case VSX_BUILTIN_STXVW4X_V8HI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v8hi, exp);
+    case VSX_BUILTIN_STXVW4X_V16QI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v16qi, exp);
+
     case ALTIVEC_BUILTIN_MFVSCR:
       icode = CODE_FOR_altivec_mfvscr;
       tmode = insn_data[icode].operand[0].mode;
@@ -11700,7 +11728,7 @@ altivec_expand_builtin (tree exp, rtx ta
       return altivec_expand_lv_builtin (CODE_FOR_altivec_lvxl,
 					exp, target, false);
     case ALTIVEC_BUILTIN_LVX:
-      return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx,
+      return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v4si,
 					exp, target, false);
     case ALTIVEC_BUILTIN_LVLX:
       return altivec_expand_lv_builtin (CODE_FOR_altivec_lvlx,
@@ -11714,6 +11742,25 @@ altivec_expand_builtin (tree exp, rtx ta
     case ALTIVEC_BUILTIN_LVRXL:
       return altivec_expand_lv_builtin (CODE_FOR_altivec_lvrxl,
 					exp, target, true);
+    case VSX_BUILTIN_LXVD2X_V2DF:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v2df,
+					exp, target, false);
+    case VSX_BUILTIN_LXVD2X_V2DI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v2di,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V4SF:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v4sf,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V4SI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v4si,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V8HI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v8hi,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V16QI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v16qi,
+					exp, target, false);
+      break;
     default:
       break;
       /* Fall through.  */
@@ -12331,6 +12378,8 @@ rs6000_init_builtins (void)
 
   long_integer_type_internal_node = long_integer_type_node;
   long_unsigned_type_internal_node = long_unsigned_type_node;
+  long_long_integer_type_internal_node = long_long_integer_type_node;
+  long_long_unsigned_type_internal_node = long_long_unsigned_type_node;
   intQI_type_internal_node = intQI_type_node;
   uintQI_type_internal_node = unsigned_intQI_type_node;
   intHI_type_internal_node = intHI_type_node;
@@ -12340,7 +12389,7 @@ rs6000_init_builtins (void)
   intDI_type_internal_node = intDI_type_node;
   uintDI_type_internal_node = unsigned_intDI_type_node;
   float_type_internal_node = float_type_node;
-  double_type_internal_node = float_type_node;
+  double_type_internal_node = double_type_node;
   void_type_internal_node = void_type_node;
 
   /* Initialize the modes for builtin_function_type, mapping a machine mode to
@@ -12872,19 +12921,11 @@ altivec_init_builtins (void)
   size_t i;
   tree ftype;
 
-  tree pfloat_type_node = build_pointer_type (float_type_node);
-  tree pint_type_node = build_pointer_type (integer_type_node);
-  tree pshort_type_node = build_pointer_type (short_integer_type_node);
-  tree pchar_type_node = build_pointer_type (char_type_node);
-
   tree pvoid_type_node = build_pointer_type (void_type_node);
 
-  tree pcfloat_type_node = build_pointer_type (build_qualified_type (float_type_node, TYPE_QUAL_CONST));
-  tree pcint_type_node = build_pointer_type (build_qualified_type (integer_type_node, TYPE_QUAL_CONST));
-  tree pcshort_type_node = build_pointer_type (build_qualified_type (short_integer_type_node, TYPE_QUAL_CONST));
-  tree pcchar_type_node = build_pointer_type (build_qualified_type (char_type_node, TYPE_QUAL_CONST));
-
-  tree pcvoid_type_node = build_pointer_type (build_qualified_type (void_type_node, TYPE_QUAL_CONST));
+  tree pcvoid_type_node
+    = build_pointer_type (build_qualified_type (void_type_node,
+						TYPE_QUAL_CONST));
 
   tree int_ftype_opaque
     = build_function_type_list (integer_type_node,
@@ -12907,26 +12948,6 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V4SI_type_node,
 				V4SI_type_node, NULL_TREE);
-  tree v4sf_ftype_pcfloat
-    = build_function_type_list (V4SF_type_node, pcfloat_type_node, NULL_TREE);
-  tree void_ftype_pfloat_v4sf
-    = build_function_type_list (void_type_node,
-				pfloat_type_node, V4SF_type_node, NULL_TREE);
-  tree v4si_ftype_pcint
-    = build_function_type_list (V4SI_type_node, pcint_type_node, NULL_TREE);
-  tree void_ftype_pint_v4si
-    = build_function_type_list (void_type_node,
-				pint_type_node, V4SI_type_node, NULL_TREE);
-  tree v8hi_ftype_pcshort
-    = build_function_type_list (V8HI_type_node, pcshort_type_node, NULL_TREE);
-  tree void_ftype_pshort_v8hi
-    = build_function_type_list (void_type_node,
-				pshort_type_node, V8HI_type_node, NULL_TREE);
-  tree v16qi_ftype_pcchar
-    = build_function_type_list (V16QI_type_node, pcchar_type_node, NULL_TREE);
-  tree void_ftype_pchar_v16qi
-    = build_function_type_list (void_type_node,
-				pchar_type_node, V16QI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -12948,6 +12969,15 @@ altivec_init_builtins (void)
   tree v4si_ftype_long_pcvoid
     = build_function_type_list (V4SI_type_node,
 				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+  tree v4sf_ftype_long_pcvoid
+    = build_function_type_list (V4SF_type_node,
+				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+  tree v2df_ftype_long_pcvoid
+    = build_function_type_list (V2DF_type_node,
+				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+  tree v2di_ftype_long_pcvoid
+    = build_function_type_list (V2DI_type_node,
+				long_integer_type_node, pcvoid_type_node, NULL_TREE);
 
   tree void_ftype_opaque_long_pvoid
     = build_function_type_list (void_type_node,
@@ -12965,6 +12995,18 @@ altivec_init_builtins (void)
     = build_function_type_list (void_type_node,
 				V8HI_type_node, long_integer_type_node,
 				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v4sf_long_pvoid
+    = build_function_type_list (void_type_node,
+				V4SF_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v2df_long_pvoid
+    = build_function_type_list (void_type_node,
+				V2DF_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v2di_long_pvoid
+    = build_function_type_list (void_type_node,
+				V2DI_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
   tree int_ftype_int_v8hi_v8hi
     = build_function_type_list (integer_type_node,
 				integer_type_node, V8HI_type_node,
@@ -12996,22 +13038,6 @@ altivec_init_builtins (void)
 				pcvoid_type_node, integer_type_node,
 				integer_type_node, NULL_TREE);
 
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_4sf", v4sf_ftype_pcfloat,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_4sf);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_4sf", void_ftype_pfloat_v4sf,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_4sf);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_4si", v4si_ftype_pcint,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_4si);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_4si", void_ftype_pint_v4si,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_4si);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_8hi", v8hi_ftype_pcshort,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_8hi);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_8hi", void_ftype_pshort_v8hi,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_8hi);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_16qi", v16qi_ftype_pcchar,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_16qi);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_16qi", void_ftype_pchar_v16qi,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_16qi);
   def_builtin (MASK_ALTIVEC, "__builtin_altivec_mtvscr", void_ftype_v4si, ALTIVEC_BUILTIN_MTVSCR);
   def_builtin (MASK_ALTIVEC, "__builtin_altivec_mfvscr", v8hi_ftype_void, ALTIVEC_BUILTIN_MFVSCR);
   def_builtin (MASK_ALTIVEC, "__builtin_altivec_dssall", void_ftype_void, ALTIVEC_BUILTIN_DSSALL);
@@ -13043,6 +13069,21 @@ altivec_init_builtins (void)
   def_builtin (MASK_ALTIVEC, "__builtin_vec_stvebx", void_ftype_opaque_long_pvoid, ALTIVEC_BUILTIN_VEC_STVEBX);
   def_builtin (MASK_ALTIVEC, "__builtin_vec_stvehx", void_ftype_opaque_long_pvoid, ALTIVEC_BUILTIN_VEC_STVEHX);
 
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvd2x_v2df", v2df_ftype_long_pcvoid, VSX_BUILTIN_LXVD2X_V2DF);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvd2x_v2di", v2di_ftype_long_pcvoid, VSX_BUILTIN_LXVD2X_V2DI);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v4sf", v4sf_ftype_long_pcvoid, VSX_BUILTIN_LXVW4X_V4SF);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v4si", v4si_ftype_long_pcvoid, VSX_BUILTIN_LXVW4X_V4SI);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v8hi", v8hi_ftype_long_pcvoid, VSX_BUILTIN_LXVW4X_V8HI);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v16qi", v16qi_ftype_long_pcvoid, VSX_BUILTIN_LXVW4X_V16QI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvd2x_v2df", void_ftype_v2df_long_pvoid, VSX_BUILTIN_STXVD2X_V2DF);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvd2x_v2di", void_ftype_v2di_long_pvoid, VSX_BUILTIN_STXVD2X_V2DI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v4sf", void_ftype_v4sf_long_pvoid, VSX_BUILTIN_STXVW4X_V4SF);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v4si", void_ftype_v4si_long_pvoid, VSX_BUILTIN_STXVW4X_V4SI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v8hi", void_ftype_v8hi_long_pvoid, VSX_BUILTIN_STXVW4X_V8HI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v16qi", void_ftype_v16qi_long_pvoid, VSX_BUILTIN_STXVW4X_V16QI);
+  def_builtin (MASK_VSX, "__builtin_vec_vsx_ld", opaque_ftype_long_pcvoid, VSX_BUILTIN_VEC_LD);
+  def_builtin (MASK_VSX, "__builtin_vec_vsx_st", void_ftype_opaque_long_pvoid, VSX_BUILTIN_VEC_ST);
+
   if (rs6000_cpu == PROCESSOR_CELL)
     {
       def_builtin (MASK_ALTIVEC, "__builtin_altivec_lvlx",  v16qi_ftype_long_pcvoid, ALTIVEC_BUILTIN_LVLX);
@@ -27925,4 +27966,29 @@ rs6000_address_for_fpconvert (rtx x)
   return x;
 }
 
+/* Given a memory reference, if it is not in the form for altivec memory
+   reference instructions (i.e. reg or reg+reg addressing with AND of -16),
+   convert to the altivec format.  */
+
+rtx
+rs6000_address_for_altivec (rtx x)
+{
+  gcc_assert (MEM_P (x));
+  if (!altivec_indexed_or_indirect_operand (x, GET_MODE (x)))
+    {
+      rtx addr = XEXP (x, 0);
+      int strict_p = (reload_in_progress || reload_completed);
+
+      if (!legitimate_indexed_address_p (addr, strict_p)
+	  && !legitimate_indirect_address_p (addr, strict_p))
+	addr = copy_to_mode_reg (Pmode, addr);
+
+      addr = gen_rtx_AND (Pmode, addr, GEN_INT (-16));
+      x = change_address (x, GET_MODE (x), addr);
+    }
+
+  return x;
+}
+
+
 #include "gt-rs6000.h"
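
(A rough C-level analogue of the address handling rs6000_address_for_altivec
implements; this sketches the semantics only, not the actual RTL, and the
helper name is hypothetical:

    #include <stdint.h>

    /* The effective address an Altivec lvx/stvx uses: the low 4 bits
       of the supplied address are cleared, forcing a 16-byte boundary.  */
    static inline const void *
    altivec_effective_address (const void *p)
    {
      return (const void *) ((uintptr_t) p & ~(uintptr_t) 15);
    }

The new function wraps the address in an explicit AND with -16 so that,
when the VSX expanders fall back to lvx/stvx, the RTL describes the masking
that the hardware performs implicitly.)
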
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 169441)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -308,6 +308,19 @@ (define_insn "*vsx_movti"
 }
   [(set_attr "type" "vecstore,vecload,vecsimple,*,*,*,vecsimple,*,vecstore,vecload")])
 
+;; Explicit load/store expanders for the builtin functions
+(define_expand "vsx_load_<mode>"
+  [(set (match_operand:VSX_M 0 "vsx_register_operand" "")
+	(match_operand:VSX_M 1 "memory_operand" ""))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+  "")
+
+(define_expand "vsx_store_<mode>"
+  [(set (match_operand:VEC_M 0 "memory_operand" "")
+	(match_operand:VEC_M 1 "vsx_register_operand" ""))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+  "")
+
 \f
 ;; VSX scalar and vector floating point arithmetic instructions
 (define_insn "*vsx_add<mode>3"
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 169441)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -2368,6 +2368,8 @@ enum rs6000_builtin_type_index
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
+  RS6000_BTI_long_long,	         /* long_long_integer_type_node */
+  RS6000_BTI_unsigned_long_long, /* long_long_unsigned_type_node */
   RS6000_BTI_INTQI,	         /* intQI_type_node */
   RS6000_BTI_UINTQI,		 /* unsigned_intQI_type_node */
   RS6000_BTI_INTHI,	         /* intHI_type_node */
@@ -2411,6 +2413,8 @@ enum rs6000_builtin_type_index
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
+#define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
+#define long_long_unsigned_type_internal_node (rs6000_builtin_types[RS6000_BTI_unsigned_long_long])
 #define long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long])
 #define long_unsigned_type_internal_node (rs6000_builtin_types[RS6000_BTI_unsigned_long])
 #define intQI_type_internal_node	 (rs6000_builtin_types[RS6000_BTI_INTQI])
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 169441)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -96,7 +96,7 @@ (define_constants
    (UNSPEC_STVE         203)
    (UNSPEC_SET_VSCR     213)
    (UNSPEC_GET_VRSAVE   214)
-   ;; 215 deleted
+   (UNSPEC_LVX		215)
    (UNSPEC_REDUC_PLUS   217)
    (UNSPEC_VECSH        219)
    (UNSPEC_EXTEVEN_V4SI 220)
@@ -1750,17 +1750,19 @@ (define_insn "altivec_lvxl"
   "lvxl %0,%y1"
   [(set_attr "type" "vecload")])
 
-(define_insn "altivec_lvx"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-	(match_operand:V4SI 1 "memory_operand" "Z"))]
+(define_insn "altivec_lvx_<mode>"
+  [(parallel
+    [(set (match_operand:VM2 0 "register_operand" "=v")
+	  (match_operand:VM2 1 "memory_operand" "Z"))
+     (unspec [(const_int 0)] UNSPEC_LVX)])]
   "TARGET_ALTIVEC"
   "lvx %0,%y1"
   [(set_attr "type" "vecload")])
 
-(define_insn "altivec_stvx"
+(define_insn "altivec_stvx_<mode>"
   [(parallel
-    [(set (match_operand:V4SI 0 "memory_operand" "=Z")
-	  (match_operand:V4SI 1 "register_operand" "v"))
+    [(set (match_operand:VM2 0 "memory_operand" "=Z")
+	  (match_operand:VM2 1 "register_operand" "v"))
      (unspec [(const_int 0)] UNSPEC_STVX)])]
   "TARGET_ALTIVEC"
   "stvx %1,%y0"
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 169441)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -318,6 +318,8 @@
 #define vec_nearbyint __builtin_vec_nearbyint
 #define vec_rint __builtin_vec_rint
 #define vec_sqrt __builtin_vec_sqrt
+#define vec_vsx_ld __builtin_vec_vsx_ld
+#define vec_vsx_st __builtin_vec_vsx_st
 #endif
 
 /* Predicates.
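
(A hedged usage sketch of the two new names; the buffer and indices are
made up for illustration:

    #include <altivec.h>

    float buf[8] __attribute__ ((aligned (16)));

    vector float
    load_with_vsx_semantics (void)
    {
      /* vec_vsx_ld keeps the GCC 4.5 behavior: a true unaligned VSX
         load starting at &buf[1].  */
      return vec_vsx_ld (0, &buf[1]);
    }

    vector float
    load_with_altivec_semantics (void)
    {
      /* vec_ld is back to Altivec behavior: the address is masked to a
         16-byte boundary, so this loads from &buf[0], not &buf[1].  */
      return vec_ld (0, &buf[1]);
    }

With a 16-byte aligned address the two forms load the same bytes; they
differ only for misaligned addresses.)
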
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 169441)
+++ libcpp/lex.c	(working copy)
@@ -547,6 +547,11 @@ search_line_fast (const uchar *s, const 
   const vc zero = { 0 };
 
   vc data, mask, t;
+  const uchar *unaligned_s = s;
+
+  /* While altivec loads mask addresses, we still need to align S so
+     that the offset we compute at the end is correct.  */
+  s = (const uchar *)((uintptr_t)s & -16);
 
   /* Altivec loads automatically mask addresses with -16.  This lets us
      issue the first load as early as possible.  */
@@ -555,15 +560,20 @@ search_line_fast (const uchar *s, const 
   /* Discard bytes before the beginning of the buffer.  Do this by
      beginning with all ones and shifting in zeros according to the
      mis-alignment.  The LVSR instruction pulls the exact shift we
-     want from the address.  */
-  mask = __builtin_vec_lvsr(0, s);
+     want from the address.
+
+     Originally, we used S in the lvsr and did the alignment afterwards,
+     which works on a system that supports just the Altivec instruction set,
+     where the load is LVX.  With the introduction of the VSX instruction set
+     for GCC 4.5, the load became LXVW4X.  LVX ignores the bottom 4 bits of
+     the address, and LXVW4X does not.  While GCC 4.6 reverts vec_ld/vec_st
+     to produce only Altivec instructions, the possibility exists that the
+     stage1 compiler was built with a compiler that generated LXVW4X.  This
+     code works on either system.  */
+  mask = __builtin_vec_lvsr(0, unaligned_s);
   mask = __builtin_vec_perm(zero, ones, mask);
   data &= mask;
 
-  /* While altivec loads mask addresses, we still need to align S so
-     that the offset we compute at the end is correct.  */
-  s = (const uchar *)((uintptr_t)s & -16);
-
   /* Main loop processing 16 bytes at a time.  */
   goto start;
   do

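(A hypothetical worked example for the lex.c change: suppose the buffer
starts at s = 0x1003.  The lvsr/vec_perm pair builds a mask whose first
3 bytes are zero, which discards the bytes at 0x1000..0x1002 that precede
the buffer.  The vector load itself must read from 0x1000: lvx masks the
address down on its own, but lxvw4x does not, so S is now rounded down to
0x1000 explicitly before the first load.  Because the offset returned at
the end is computed from S, doing the alignment up front also keeps that
arithmetic correct whichever instruction the stage1 compiler emitted.)
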
^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-01-31 21:00     ` Michael Meissner
@ 2011-02-02 21:08       ` David Edelsohn
  2011-02-03  5:47         ` Michael Meissner
  0 siblings, 1 reply; 14+ messages in thread
From: David Edelsohn @ 2011-02-02 21:08 UTC (permalink / raw)
  To: Michael Meissner, Mark Mitchell, gcc-patches, rth, rguenther,
	jakub, berner, geoffk, joseph, pinskia, dominiq

On Mon, Jan 31, 2011 at 3:14 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> Here are my latest patches to fix the problem.  They still give the user
> access to the VSX instructions that vec_ld/vec_st generated with GCC 4.5,
> via the new vec_vsx_ld and vec_vsx_st functions in altivec.h, in case new
> code was written that depends on that behavior.  At present, I have not
> added an #ifdef that would switch vec_ld/vec_st back to the GCC 4.5
> behavior, but I could do that if desired.
>
> I noticed that the new vector types weren't supported by vec_ld/vec_st, so I
> added them.
>
> I'm including the libcpp/lex.c patch that allows the compiler to be built with
> GCC 4.5 using CFLAGS='-mcpu=power7 -O2 -g'.
>
> I also included 3 test suite fixes in this patch.
>
> I just did a side by side build of unpatched GCC 4.6 without special options,
> and one with these patches and adding --with-cpu=power7.  I'm seeing the
> following regressions:
>
> gcc.dg/pr41551.c                        (64-bit only, unrecognized insn)
> gcc.dg/pr42461.c                        (Peter Bergner has a fix)
> gcc.dg/pr46909.c
> gcc.dg/sms-3.c                          (64-bit only, both fail on 32-bit)
> gcc.dg/stack-usage-1.c                  (32-bit only)
> gcc.c-torture/execute/20050121-1.c      (32-bit only, unrecognized insn)
>
> I'm seeing passes in:
>
> gcc.dg/torture/va-arg-25.c
> gcc.dg/torture/vector-1.c
> gcc.dg/torture/vector-2.c
> c-c++-common/dfp/pr35620.c
> gcc.target/powerpc/ppc64-abi-dfp-1.c
>
> Are these patches ok to install?
>
> [gcc]
> 2011-01-28  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        PR target/47272
>        * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
>        Document using vector double with the load/store builtins, and
>        that the load/store builtins always use Altivec instructions.
>
>        * config/rs6000/vector.md (vector_altivec_load_<mode>): New insns
>        to use altivec memory instructions, even on VSX.
>        (vector_altivec_store_<mode>): Ditto.
>
>        * config/rs6000/rs6000-protos.h (rs6000_address_for_altivec): New
>        function.
>
>        * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
>        V2DF, V2DI support to load/store overloaded builtins.
>
>        * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_*): Add
>        altivec load/store builtins for V2DF/V2DI types.
>
>        * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
>        set avoid indexed addresses on power6 if -maltivec.
>        (altivec_expand_ld_builtin): Add V2DF, V2DI support, use
>        vector_altivec_load/vector_altivec_store builtins.
>        (altivec_expand_st_builtin): Ditto.
>        (altivec_expand_builtin): Add VSX memory builtins.
>        (rs6000_init_builtins): Add V2DI types to internal types.
>        (altivec_init_builtins): Add support for V2DF/V2DI altivec
>        load/store builtins.
>        (rs6000_address_for_altivec): Ensure memory address is appropriate
>        for Altivec.
>
>        * config/rs6000/vsx.md (vsx_load_<mode>): New expanders for
>        vec_vsx_ld and vec_vsx_st.
>        (vsx_store_<mode>): Ditto.
>
>        * config/rs6000/rs6000.h (RS6000_BTI_long_long): New type
>        variables to hold long long types for VSX vector memory builtins.
>        (RS6000_BTI_unsigned_long_long): Ditto.
>        (long_long_integer_type_internal_node): Ditto.
>        (long_long_unsigned_type_internal_node): Ditto.
>
>        * config/rs6000/altivec.md (UNSPEC_LVX): New UNSPEC.
>        (altivec_lvx_<mode>): Make altivec_lvx use a mode iterator.
>        (altivec_stvx_<mode>): Make altivec_stvx use a mode iterator.
>
>        * config/rs6000/altivec.h (vec_vsx_ld): Define VSX memory builtin
>        shortcuts.
>        (vec_vsx_st): Ditto.

> [gcc/testsuite]
> 2011-01-28  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        PR target/47272
>        * gcc.target/powerpc/vsx-builtin-8.c: New file, test vec_vsx_ld
>        and vec_vsx_st.
>
>        * gcc.target/powerpc/avoid-indexed-addresses.c: Disable altivec
>        and vsx so a default --with-cpu=power7 doesn't give an error
>        when -mavoid-indexed-addresses is used.
>
>        * gcc.target/powerpc/ppc32-abi-dfp-1.c: Rewrite to use an asm
>        wrapper function to save the arguments and then jump to the real
>        function, rather than depending on the compiler not to move stuff
>        before an asm.
>        * gcc.target/powerpc/ppc64-abi-dfp-2.c: Ditto.

Okay, without the libcpp/lex.c change, as discussed offline.

Some of the XXX_type_node lines are too long (replacing lines that
were too long).

Thanks, David


* Re: [PATCH] Fix PR 47272 to restore Altivec vec_ld/vec_st
  2011-02-02 21:08       ` David Edelsohn
@ 2011-02-03  5:47         ` Michael Meissner
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Meissner @ 2011-02-03  5:47 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Michael Meissner, Mark Mitchell, gcc-patches, rth, rguenther,
	jakub, berner, geoffk, joseph, pinskia, dominiq

[-- Attachment #1: Type: text/plain, Size: 3032 bytes --]

On Wed, Feb 02, 2011 at 04:08:44PM -0500, David Edelsohn wrote:
> Okay, without the libcpp/lex.c change, as discussed offline.
> 
> Some of the XXX_type_node lines are too long (replacing lines that
> were too long).
> 
> Thanks, David

Here is the patch I committed, with the long lines fixed and the
libcpp/lex.c change dropped.

[gcc]
2011-02-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
	Document using vector double with the load/store builtins, and
	that the load/store builtins always use Altivec instructions.

	* config/rs6000/vector.md (vector_altivec_load_<mode>): New insns
	to use altivec memory instructions, even on VSX.
	(vector_altivec_store_<mode>): Ditto.

	* config/rs6000/rs6000-protos.h (rs6000_address_for_altivec): New
	function.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	V2DF, V2DI support to load/store overloaded builtins.

	* config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_*): Add
	altivec load/store builtins for V2DF/V2DI types.

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
	set avoid indexed addresses on power6 if -maltivec.
	(altivec_expand_ld_builtin): Add V2DF, V2DI support, use
	vector_altivec_load/vector_altivec_store builtins.
	(altivec_expand_st_builtin): Ditto.
	(altivec_expand_builtin): Add VSX memory builtins.
	(rs6000_init_builtins): Add V2DI types to internal types.
	(altivec_init_builtins): Add support for V2DF/V2DI altivec
	load/store builtins.
	(rs6000_address_for_altivec): Ensure memory address is appropriate
	for Altivec.

	* config/rs6000/vsx.md (vsx_load_<mode>): New expanders for
	vec_vsx_ld and vec_vsx_st.
	(vsx_store_<mode>): Ditto.

	* config/rs6000/rs6000.h (RS6000_BTI_long_long): New type
	variables to hold long long types for VSX vector memory builtins.
	(RS6000_BTI_unsigned_long_long): Ditto.
	(long_long_integer_type_internal_node): Ditto.
	(long_long_unsigned_type_internal_node): Ditto.

	* config/rs6000/altivec.md (UNSPEC_LVX): New UNSPEC.
	(altivec_lvx_<mode>): Make altivec_lvx use a mode iterator.
	(altivec_stvx_<mode>): Make altivec_stvx use a mode iterator.

	* config/rs6000/altivec.h (vec_vsx_ld): Define VSX memory builtin
	shortcuts.
	(vec_vsx_st): Ditto.

[gcc/testsuite]
2011-02-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/47272
	* gcc.target/powerpc/vsx-builtin-8.c: New file, test vec_vsx_ld
	and vec_vsx_st.

	* gcc.target/powerpc/avoid-indexed-addresses.c: Disable altivec
	and vsx so a default --with-cpu=power7 doesn't give an error
	when -mavoid-indexed-addresses is used.

	* gcc.target/powerpc/ppc32-abi-dfp-1.c: Rewrite to use an asm
	wrapper function to save the arguments and then jump to the real
	function, rather than depending on the compiler not to move stuff
	before an asm.
	* gcc.target/powerpc/ppc64-abi-dfp-2.c: Ditto.
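
As a usage note (a hypothetical before/after sketch, not taken from the
patch itself): code that intentionally relied on vec_ld generating an
unaligned VSX load in GCC 4.5 with -mvsx can keep that behavior by asking
for the VSX load explicitly:

#include <altivec.h>

vector float
load_unaligned (const float *p)
{
  /* GCC 4.5 with -mvsx compiled vec_ld (0, p) to an unaligned LXVW4X;
     with this patch vec_ld is LVX again, which masks the address down to
     a 16-byte boundary.  vec_vsx_ld requests the VSX load explicitly.  */
  return vec_vsx_ld (0, p);
}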


-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

[-- Attachment #2: gcc-power7.patch213b --]
[-- Type: text/plain, Size: 79275 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 169775)
+++ gcc/doc/extend.texi	(working copy)
@@ -12359,6 +12359,12 @@ vector bool long vec_cmplt (vector doubl
 vector float vec_div (vector float, vector float);
 vector double vec_div (vector double, vector double);
 vector double vec_floor (vector double);
+vector double vec_ld (int, const vector double *);
+vector double vec_ld (int, const double *);
+vector double vec_ldl (int, const vector double *);
+vector double vec_ldl (int, const double *);
+vector unsigned char vec_lvsl (int, const volatile double *);
+vector unsigned char vec_lvsr (int, const volatile double *);
 vector double vec_madd (vector double, vector double, vector double);
 vector double vec_max (vector double, vector double);
 vector double vec_min (vector double, vector double);
@@ -12387,6 +12393,8 @@ vector double vec_sel (vector double, ve
 vector double vec_sub (vector double, vector double);
 vector float vec_sqrt (vector float);
 vector double vec_sqrt (vector double);
+void vec_st (vector double, int, vector double *);
+void vec_st (vector double, int, double *);
 vector double vec_trunc (vector double);
 vector double vec_xor (vector double, vector double);
 vector double vec_xor (vector double, vector bool long);
@@ -12415,7 +12423,65 @@ int vec_any_ngt (vector double, vector d
 int vec_any_nle (vector double, vector double);
 int vec_any_nlt (vector double, vector double);
 int vec_any_numeric (vector double);
-@end smallexample
+
+vector double vec_vsx_ld (int, const vector double *);
+vector double vec_vsx_ld (int, const double *);
+vector float vec_vsx_ld (int, const vector float *);
+vector float vec_vsx_ld (int, const float *);
+vector bool int vec_vsx_ld (int, const vector bool int *);
+vector signed int vec_vsx_ld (int, const vector signed int *);
+vector signed int vec_vsx_ld (int, const int *);
+vector signed int vec_vsx_ld (int, const long *);
+vector unsigned int vec_vsx_ld (int, const vector unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned long *);
+vector bool short vec_vsx_ld (int, const vector bool short *);
+vector pixel vec_vsx_ld (int, const vector pixel *);
+vector signed short vec_vsx_ld (int, const vector signed short *);
+vector signed short vec_vsx_ld (int, const short *);
+vector unsigned short vec_vsx_ld (int, const vector unsigned short *);
+vector unsigned short vec_vsx_ld (int, const unsigned short *);
+vector bool char vec_vsx_ld (int, const vector bool char *);
+vector signed char vec_vsx_ld (int, const vector signed char *);
+vector signed char vec_vsx_ld (int, const signed char *);
+vector unsigned char vec_vsx_ld (int, const vector unsigned char *);
+vector unsigned char vec_vsx_ld (int, const unsigned char *);
+
+void vec_vsx_st (vector double, int, vector double *);
+void vec_vsx_st (vector double, int, double *);
+void vec_vsx_st (vector float, int, vector float *);
+void vec_vsx_st (vector float, int, float *);
+void vec_vsx_st (vector signed int, int, vector signed int *);
+void vec_vsx_st (vector signed int, int, int *);
+void vec_vsx_st (vector unsigned int, int, vector unsigned int *);
+void vec_vsx_st (vector unsigned int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, vector bool int *);
+void vec_vsx_st (vector bool int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, int *);
+void vec_vsx_st (vector signed short, int, vector signed short *);
+void vec_vsx_st (vector signed short, int, short *);
+void vec_vsx_st (vector unsigned short, int, vector unsigned short *);
+void vec_vsx_st (vector unsigned short, int, unsigned short *);
+void vec_vsx_st (vector bool short, int, vector bool short *);
+void vec_vsx_st (vector bool short, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, vector pixel *);
+void vec_vsx_st (vector pixel, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, short *);
+void vec_vsx_st (vector bool short, int, short *);
+void vec_vsx_st (vector signed char, int, vector signed char *);
+void vec_vsx_st (vector signed char, int, signed char *);
+void vec_vsx_st (vector unsigned char, int, vector unsigned char *);
+void vec_vsx_st (vector unsigned char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, vector bool char *);
+void vec_vsx_st (vector bool char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, signed char *);
+@end smallexample
+
+Note that the @samp{vec_ld} and @samp{vec_st} builtins will always
+generate the Altivec @samp{LVX} and @samp{STVX} instructions even
+if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
+@samp{vec_vsx_st} builtins will always generate the VSX @samp{LXVD2X},
+@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
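+
+@c Editorial illustration (hypothetical example, not from the original
+@c patch): vec_ld masks the address, vec_vsx_ld does not.
+For example, if @code{x} is 16-byte aligned, @code{&x[1]} is misaligned
+by 8 bytes and the two families load different elements:
+
+@smallexample
+double x[4] __attribute__ ((aligned (16)));
+vector double v1, v2;
+
+v1 = vec_ld (0, &x[1]);      /* address masked down: loads x[0] and x[1] */
+v2 = vec_vsx_ld (0, &x[1]);  /* true unaligned load: loads x[1] and x[2] */
+@end smallexample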
 
 GCC provides a few other builtins on PowerPC to access certain instructions:
 @smallexample
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c	(revision 0)
@@ -0,0 +1,97 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -mcpu=power7" } */
+
+/* Test the various load/store variants.  */
+
+#include <altivec.h>
+
+#define TEST_COPY(NAME, TYPE)						\
+void NAME ## _copy_native (vector TYPE *a, vector TYPE *b)		\
+{									\
+  *a = *b;								\
+}									\
+									\
+void NAME ## _copy_vec (vector TYPE *a, vector TYPE *b)			\
+{									\
+  vector TYPE x = vec_ld (0, b);					\
+  vec_st (x, 0, a);							\
+}									\
+
+#define TEST_COPYL(NAME, TYPE)						\
+void NAME ## _lvxl (vector TYPE *a, vector TYPE *b)			\
+{									\
+  vector TYPE x = vec_ldl (0, b);					\
+  vec_stl (x, 0, a);							\
+}									\
+
+#define TEST_VSX_COPY(NAME, TYPE)					\
+void NAME ## _copy_vsx (vector TYPE *a, vector TYPE *b)			\
+{									\
+  vector TYPE x = vec_vsx_ld (0, b);					\
+  vec_vsx_st (x, 0, a);							\
+}									\
+
+#define TEST_ALIGN(NAME, TYPE)						\
+void NAME ## _align (vector unsigned char *a, TYPE *b)			\
+{									\
+  vector unsigned char x = vec_lvsl (0, b);				\
+  vector unsigned char y = vec_lvsr (0, b);				\
+  vec_st (x, 0, a);							\
+  vec_st (y, 8, a);							\
+}
+
+#ifndef NO_COPY
+TEST_COPY(uchar,  unsigned char)
+TEST_COPY(schar,  signed   char)
+TEST_COPY(bchar,  bool     char)
+TEST_COPY(ushort, unsigned short)
+TEST_COPY(sshort, signed   short)
+TEST_COPY(bshort, bool     short)
+TEST_COPY(uint,   unsigned int)
+TEST_COPY(sint,   signed   int)
+TEST_COPY(bint,   bool     int)
+TEST_COPY(float,  float)
+TEST_COPY(double, double)
+#endif	/* NO_COPY */
+
+#ifndef NO_COPYL
+TEST_COPYL(uchar,  unsigned char)
+TEST_COPYL(schar,  signed   char)
+TEST_COPYL(bchar,  bool     char)
+TEST_COPYL(ushort, unsigned short)
+TEST_COPYL(sshort, signed   short)
+TEST_COPYL(bshort, bool     short)
+TEST_COPYL(uint,   unsigned int)
+TEST_COPYL(sint,   signed   int)
+TEST_COPYL(bint,   bool     int)
+TEST_COPYL(float,  float)
+TEST_COPYL(double, double)
+#endif	/* NO_COPYL */
+
+#ifndef NO_ALIGN
+TEST_ALIGN(uchar,  unsigned char)
+TEST_ALIGN(schar,  signed   char)
+TEST_ALIGN(ushort, unsigned short)
+TEST_ALIGN(sshort, signed   short)
+TEST_ALIGN(uint,   unsigned int)
+TEST_ALIGN(sint,   signed   int)
+TEST_ALIGN(float,  float)
+TEST_ALIGN(double, double)
+#endif	/* NO_ALIGN */
+
+
+#ifndef NO_VSX_COPY
+TEST_VSX_COPY(uchar,  unsigned char)
+TEST_VSX_COPY(schar,  signed   char)
+TEST_VSX_COPY(bchar,  bool     char)
+TEST_VSX_COPY(ushort, unsigned short)
+TEST_VSX_COPY(sshort, signed   short)
+TEST_VSX_COPY(bshort, bool     short)
+TEST_VSX_COPY(uint,   unsigned int)
+TEST_VSX_COPY(sint,   signed   int)
+TEST_VSX_COPY(bint,   bool     int)
+TEST_VSX_COPY(float,  float)
+TEST_VSX_COPY(double, double)
+#endif	/* NO_VSX_COPY */
Index: gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c	(revision 169775)
+++ gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run { target { powerpc64-*-* && { lp64 && dfprt } } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-options "-std=gnu99 -O2 -fno-strict-aliasing" } */
 
 /* Testcase to check for ABI compliance of parameter passing
@@ -31,60 +32,42 @@ typedef struct
 reg_parms_t gparms;
 
 
-/* Testcase could break on future gcc's, if parameter regs
-   are changed before this asm.  */
-
-#ifndef __MACH__
-#define save_parms(lparms)				\
-    asm volatile ("ld 11,gparms@got(2)\n\t"                \
-                  "std 3,0(11)\n\t"		        \
-	          "std 4,8(11)\n\t"			\
-	          "std 5,16(11)\n\t"			\
-	          "std 6,24(11)\n\t"			\
-	          "std 7,32(11)\n\t"			\
-	          "std 8,40(11)\n\t"			\
-	          "std 9,48(11)\n\t"			\
-	          "std 10,56(11)\n\t"			\
-                  "stfd 1,64(11)\n\t"			\
-	          "stfd 2,72(11)\n\t"			\
-	          "stfd 3,80(11)\n\t"			\
-	          "stfd 4,88(11)\n\t"			\
-	          "stfd 5,96(11)\n\t"			\
-	          "stfd 6,104(11)\n\t"			\
-	          "stfd 7,112(11)\n\t"			\
-	          "stfd 8,120(11)\n\t"			\
-	          "stfd 9,128(11)\n\t"			\
-	          "stfd 10,136(11)\n\t"			\
-	          "stfd 11,144(11)\n\t"			\
-	          "stfd 12,152(11)\n\t"                 \
-	          "stfd 13,160(11)\n\t":::"11", "memory");  \
-                  lparms = gparms;
-#else
-#define save_parms(lparms)				\
-    asm volatile ("ld r11,gparms@got(r2)\n\t"           \
-                  "std r3,0(r11)\n\t"		        \
-	          "std r4,8(r11)\n\t"			\
-	          "std r5,16(r11)\n\t"			\
-	          "std r6,24(r11)\n\t"			\
-	          "std r7,32(r11)\n\t"			\
-	          "std r8,40(r11)\n\t"			\
-	          "std r9,48(r11)\n\t"			\
-	          "std r10,56(r11)\n\t"                 \
-                  "stfd f1,64(r11)\n\t"		        \
-	          "stfd f2,72(r11)\n\t"			\
-	          "stfd f3,80(r11)\n\t"			\
-	          "stfd f4,88(r11)\n\t"			\
-	          "stfd f5,96(r11)\n\t"			\
-	          "stfd f6,104(r11)\n\t"		\
-	          "stfd f7,112(r11)\n\t"		\
-	          "stfd f8,120(r11)\n\t"		\
-	          "stfd f9,128(r11)\n\t"		\
-	          "stfd f10,136(r11)\n\t"		\
-	          "stfd f11,144(r11)\n\t"		\
-	          "stfd f12,152(r11)\n\t"               \
-	          "stfd f13,160(r11)\n\t":::"r11", "memory");  \
-                  lparms = gparms;
-#endif
+/* Wrapper to save the GPRs and FPRs and then jump to the real function.  */
+#define WRAPPER(NAME)							\
+__asm__ ("\t.globl\t" #NAME "_asm\n\t"					\
+	 ".section \".opd\",\"aw\"\n\t"					\
+	 ".align 3\n"							\
+	 #NAME "_asm:\n\t"						\
+	 ".quad .L." #NAME "_asm,.TOC.@tocbase,0\n\t"			\
+	 ".text\n\t"							\
+	 ".type " #NAME "_asm, @function\n"				\
+	 ".L." #NAME "_asm:\n\t"					\
+	 "ld 11,gparms@got(2)\n\t"					\
+	 "std 3,0(11)\n\t"						\
+	 "std 4,8(11)\n\t"						\
+	 "std 5,16(11)\n\t"						\
+	 "std 6,24(11)\n\t"						\
+	 "std 7,32(11)\n\t"						\
+	 "std 8,40(11)\n\t"						\
+	 "std 9,48(11)\n\t"						\
+	 "std 10,56(11)\n\t"						\
+	 "stfd 1,64(11)\n\t"						\
+	 "stfd 2,72(11)\n\t"						\
+	 "stfd 3,80(11)\n\t"						\
+	 "stfd 4,88(11)\n\t"						\
+	 "stfd 5,96(11)\n\t"						\
+	 "stfd 6,104(11)\n\t"						\
+	 "stfd 7,112(11)\n\t"						\
+	 "stfd 8,120(11)\n\t"						\
+	 "stfd 9,128(11)\n\t"						\
+	 "stfd 10,136(11)\n\t"						\
+	 "stfd 11,144(11)\n\t"						\
+	 "stfd 12,152(11)\n\t"						\
+	 "stfd 13,160(11)\n\t"						\
+	 "b " #NAME "\n\t"						\
+	 ".long 0\n\t"							\
+	 ".byte 0,0,0,0,0,0,0,0\n\t"					\
+	 ".size " #NAME ",.-" #NAME "\n")
 
 typedef struct sf
 {
@@ -97,6 +80,13 @@ typedef struct sf
   unsigned long slot[100];
 } stack_frame_t;
 
+extern void func0_asm (double, double, double, double, double, double,
+		       double, double, double, double, double, double,
+		       double, double, 
+		       _Decimal64, _Decimal128, _Decimal64);
+
+WRAPPER(func0);
+
 /* Fill up floating point registers with double arguments, forcing
    decimal float arguments into the parameter save area.  */
 void __attribute__ ((noinline))
@@ -105,186 +95,209 @@ func0 (double a1, double a2, double a3, 
        double a13, double a14, 
        _Decimal64 a15, _Decimal128 a16, _Decimal64 a17)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
-  if (a9 != lparms.fprs[8]) FAILURE
-  if (a10 != lparms.fprs[9]) FAILURE
-  if (a11 != lparms.fprs[10]) FAILURE
-  if (a12 != lparms.fprs[11]) FAILURE
-  if (a13 != lparms.fprs[12]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
+  if (a9 != gparms.fprs[8]) FAILURE
+  if (a10 != gparms.fprs[9]) FAILURE
+  if (a11 != gparms.fprs[10]) FAILURE
+  if (a12 != gparms.fprs[11]) FAILURE
+  if (a13 != gparms.fprs[12]) FAILURE
   if (a14 != *(double *)&sp->slot[13]) FAILURE
   if (a15 != *(_Decimal64 *)&sp->slot[14]) FAILURE
   if (a16 != *(_Decimal128 *)&sp->slot[15]) FAILURE
   if (a17 != *(_Decimal64 *)&sp->slot[17]) FAILURE
 }
 
+extern void func1_asm (double, double, double, double, double, double,
+		       double, double, double, double, double, double,
+		       double, _Decimal128 );
+
+WRAPPER(func1);
+
 void __attribute__ ((noinline))
 func1 (double a1, double a2, double a3, double a4, double a5, double a6,
        double a7, double a8, double a9, double a10, double a11, double a12,
        double a13, _Decimal128 a14)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
-  if (a9 != lparms.fprs[8]) FAILURE
-  if (a10 != lparms.fprs[9]) FAILURE
-  if (a11 != lparms.fprs[10]) FAILURE
-  if (a12 != lparms.fprs[11]) FAILURE
-  if (a13 != lparms.fprs[12]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
+  if (a9 != gparms.fprs[8]) FAILURE
+  if (a10 != gparms.fprs[9]) FAILURE
+  if (a11 != gparms.fprs[10]) FAILURE
+  if (a12 != gparms.fprs[11]) FAILURE
+  if (a13 != gparms.fprs[12]) FAILURE
   if (a14 != *(_Decimal128 *)&sp->slot[13]) FAILURE
 }
 
+extern void func2_asm (double, double, double, double, double, double,
+		       double, double, double, double, double, double,
+		       _Decimal128);
+
+WRAPPER(func2);
+
 void __attribute__ ((noinline))
 func2 (double a1, double a2, double a3, double a4, double a5, double a6,
        double a7, double a8, double a9, double a10, double a11, double a12,
        _Decimal128 a13)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
-  if (a9 != lparms.fprs[8]) FAILURE
-  if (a10 != lparms.fprs[9]) FAILURE
-  if (a11 != lparms.fprs[10]) FAILURE
-  if (a12 != lparms.fprs[11]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
+  if (a9 != gparms.fprs[8]) FAILURE
+  if (a10 != gparms.fprs[9]) FAILURE
+  if (a11 != gparms.fprs[10]) FAILURE
+  if (a12 != gparms.fprs[11]) FAILURE
   if (a13 != *(_Decimal128 *)&sp->slot[12]) FAILURE
 }
 
+extern void func3_asm (_Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64, _Decimal128);
+
+WRAPPER(func3);
+
 void __attribute__ ((noinline))
 func3 (_Decimal64 a1, _Decimal128 a2, _Decimal64 a3, _Decimal128 a4,
        _Decimal64 a5, _Decimal128 a6, _Decimal64 a7, _Decimal128 a8,
        _Decimal64 a9, _Decimal128 a10)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal64 *)&lparms.fprs[0]) FAILURE	/* f1        */
-  if (a2 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3   */
-  if (a3 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4        */
-  if (a4 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7   */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8        */
-  if (a6 != *(_Decimal128 *)&lparms.fprs[9]) FAILURE	/* f10 & f11 */
-  if (a7 != *(_Decimal64 *)&lparms.fprs[11]) FAILURE	/* f12       */
+  if (a1 != *(_Decimal64 *)&gparms.fprs[0]) FAILURE	/* f1        */
+  if (a2 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3   */
+  if (a3 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4        */
+  if (a4 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7   */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8        */
+  if (a6 != *(_Decimal128 *)&gparms.fprs[9]) FAILURE	/* f10 & f11 */
+  if (a7 != *(_Decimal64 *)&gparms.fprs[11]) FAILURE	/* f12       */
   if (a8 != *(_Decimal128 *)&sp->slot[10]) FAILURE
   if (a9 != *(_Decimal64 *)&sp->slot[12]) FAILURE
   if (a10 != *(_Decimal128 *)&sp->slot[13]) FAILURE
 }
 
+extern void func4_asm (_Decimal128, _Decimal64, _Decimal128, _Decimal64,
+		       _Decimal128, _Decimal64, _Decimal128, _Decimal64);
+
+WRAPPER(func4);
+
 void __attribute__ ((noinline))
 func4 (_Decimal128 a1, _Decimal64 a2, _Decimal128 a3, _Decimal64 a4,
        _Decimal128 a5, _Decimal64 a6, _Decimal128 a7, _Decimal64 a8)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3   */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4        */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7   */
-  if (a4 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8        */
-  if (a5 != *(_Decimal128 *)&lparms.fprs[9]) FAILURE	/* f10 & f11 */
-  if (a6 != *(_Decimal64 *)&lparms.fprs[11]) FAILURE	/* f12       */
+  if (a1 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3   */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4        */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7   */
+  if (a4 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8        */
+  if (a5 != *(_Decimal128 *)&gparms.fprs[9]) FAILURE	/* f10 & f11 */
+  if (a6 != *(_Decimal64 *)&gparms.fprs[11]) FAILURE	/* f12       */
   if (a7 != *(_Decimal128 *)&sp->slot[9]) FAILURE
   if (a8 != *(_Decimal64 *)&sp->slot[11]) FAILURE
 }
 
+extern void func5_asm (_Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32);
+
+WRAPPER(func5);
+
 void __attribute__ ((noinline))
 func5 (_Decimal32 a1, _Decimal32 a2, _Decimal32 a3, _Decimal32 a4,
        _Decimal32 a5, _Decimal32 a6, _Decimal32 a7, _Decimal32 a8,
        _Decimal32 a9, _Decimal32 a10, _Decimal32 a11, _Decimal32 a12,
        _Decimal32 a13, _Decimal32 a14, _Decimal32 a15, _Decimal32 a16)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
   /* _Decimal32 is passed in the lower half of an FPR or parameter slot.  */
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1  */
-  if (a2 != ((d32parm_t *)&lparms.fprs[1])->d) FAILURE		/* f2  */
-  if (a3 != ((d32parm_t *)&lparms.fprs[2])->d) FAILURE		/* f3  */
-  if (a4 != ((d32parm_t *)&lparms.fprs[3])->d) FAILURE		/* f4  */
-  if (a5 != ((d32parm_t *)&lparms.fprs[4])->d) FAILURE		/* f5  */
-  if (a6 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6  */
-  if (a7 != ((d32parm_t *)&lparms.fprs[6])->d) FAILURE		/* f7  */
-  if (a8 != ((d32parm_t *)&lparms.fprs[7])->d) FAILURE		/* f8  */
-  if (a9 != ((d32parm_t *)&lparms.fprs[8])->d) FAILURE		/* f9  */
-  if (a10 != ((d32parm_t *)&lparms.fprs[9])->d) FAILURE		/* f10 */
-  if (a11 != ((d32parm_t *)&lparms.fprs[10])->d) FAILURE	/* f11 */
-  if (a12 != ((d32parm_t *)&lparms.fprs[11])->d) FAILURE	/* f12 */
-  if (a13 != ((d32parm_t *)&lparms.fprs[12])->d) FAILURE	/* f13 */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1  */
+  if (a2 != ((d32parm_t *)&gparms.fprs[1])->d) FAILURE		/* f2  */
+  if (a3 != ((d32parm_t *)&gparms.fprs[2])->d) FAILURE		/* f3  */
+  if (a4 != ((d32parm_t *)&gparms.fprs[3])->d) FAILURE		/* f4  */
+  if (a5 != ((d32parm_t *)&gparms.fprs[4])->d) FAILURE		/* f5  */
+  if (a6 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6  */
+  if (a7 != ((d32parm_t *)&gparms.fprs[6])->d) FAILURE		/* f7  */
+  if (a8 != ((d32parm_t *)&gparms.fprs[7])->d) FAILURE		/* f8  */
+  if (a9 != ((d32parm_t *)&gparms.fprs[8])->d) FAILURE		/* f9  */
+  if (a10 != ((d32parm_t *)&gparms.fprs[9])->d) FAILURE		/* f10 */
+  if (a11 != ((d32parm_t *)&gparms.fprs[10])->d) FAILURE	/* f11 */
+  if (a12 != ((d32parm_t *)&gparms.fprs[11])->d) FAILURE	/* f12 */
+  if (a13 != ((d32parm_t *)&gparms.fprs[12])->d) FAILURE	/* f13 */
   if (a14 != ((d32parm_t *)&sp->slot[13])->d) FAILURE
   if (a15 != ((d32parm_t *)&sp->slot[14])->d) FAILURE
   if (a16 != ((d32parm_t *)&sp->slot[15])->d) FAILURE
 }
 
+extern void func6_asm (_Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128);
+
+WRAPPER(func6);
+
 void __attribute__ ((noinline))
 func6 (_Decimal32 a1, _Decimal64 a2, _Decimal128 a3,
        _Decimal32 a4, _Decimal64 a5, _Decimal128 a6,
        _Decimal32 a7, _Decimal64 a8, _Decimal128 a9,
        _Decimal32 a10, _Decimal64 a11, _Decimal128 a12)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1        */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[1]) FAILURE		/* f2        */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[3]) FAILURE		/* f4 & f5   */
-  if (a4 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6        */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[6]) FAILURE		/* f7        */
-  if (a6 != *(_Decimal128 *)&lparms.fprs[7]) FAILURE		/* f8 & f9   */
-  if (a7 != ((d32parm_t *)&lparms.fprs[9])->d) FAILURE		/* f10       */
-  if (a8 != *(_Decimal64 *)&lparms.fprs[10]) FAILURE		/* f11       */
-  if (a9 != *(_Decimal128 *)&lparms.fprs[11]) FAILURE		/* f12 & f13 */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1        */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[1]) FAILURE		/* f2        */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[3]) FAILURE		/* f4 & f5   */
+  if (a4 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6        */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[6]) FAILURE		/* f7        */
+  if (a6 != *(_Decimal128 *)&gparms.fprs[7]) FAILURE		/* f8 & f9   */
+  if (a7 != ((d32parm_t *)&gparms.fprs[9])->d) FAILURE		/* f10       */
+  if (a8 != *(_Decimal64 *)&gparms.fprs[10]) FAILURE		/* f11       */
+  if (a9 != *(_Decimal128 *)&gparms.fprs[11]) FAILURE		/* f12 & f13 */
   if (a10 != ((d32parm_t *)&sp->slot[12])->d) FAILURE
   if (a11 != *(_Decimal64 *)&sp->slot[13]) FAILURE
 }
@@ -292,23 +305,23 @@ func6 (_Decimal32 a1, _Decimal64 a2, _De
 int
 main (void)
 {
-  func0 (1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5,
-	 14.5, 15.2dd, 16.2dl, 17.2dd);
-  func1 (101.5, 102.5, 103.5, 104.5, 105.5, 106.5, 107.5, 108.5, 109.5,
-	 110.5, 111.5, 112.5, 113.5, 114.2dd);
-  func2 (201.5, 202.5, 203.5, 204.5, 205.5, 206.5, 207.5, 208.5, 209.5,
-	 210.5, 211.5, 212.5, 213.2dd);
-  func3 (301.2dd, 302.2dl, 303.2dd, 304.2dl, 305.2dd, 306.2dl, 307.2dd,
-	 308.2dl, 309.2dd, 310.2dl);
-  func4 (401.2dl, 402.2dd, 403.2dl, 404.2dd, 405.2dl, 406.2dd, 407.2dl,
-	 408.2dd);
+  func0_asm (1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5,
+	     14.5, 15.2dd, 16.2dl, 17.2dd);
+  func1_asm (101.5, 102.5, 103.5, 104.5, 105.5, 106.5, 107.5, 108.5, 109.5,
+	     110.5, 111.5, 112.5, 113.5, 114.2dd);
+  func2_asm (201.5, 202.5, 203.5, 204.5, 205.5, 206.5, 207.5, 208.5, 209.5,
+	     210.5, 211.5, 212.5, 213.2dd);
+  func3_asm (301.2dd, 302.2dl, 303.2dd, 304.2dl, 305.2dd, 306.2dl, 307.2dd,
+	     308.2dl, 309.2dd, 310.2dl);
+  func4_asm (401.2dl, 402.2dd, 403.2dl, 404.2dd, 405.2dl, 406.2dd, 407.2dl,
+	     408.2dd);
 #if 0
   /* _Decimal32 doesn't yet follow the ABI; enable this when it does.  */
-  func5 (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
-	 508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
-	 515.2df, 516.2df);
-  func6 (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
-	 607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
+  func5_asm (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
+	     508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
+	     515.2df, 516.2df);
+  func6_asm (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
+	     607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
 #endif
 
   if (failcnt != 0)
Index: gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c	(revision 169775)
+++ gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c	(working copy)
@@ -30,31 +30,6 @@ typedef struct
 
 reg_parms_t gparms;
 
-
-/* Testcase could break on future gcc's, if parameter regs
-   are changed before this asm.  */
-
-#define save_parms(lparms)				\
-    asm volatile ("lis 11,gparms@ha\n\t"		\
-                  "la 11,gparms@l(11)\n\t"		\
-                  "st 3,0(11)\n\t"		        \
-	          "st 4,4(11)\n\t"			\
-	          "st 5,8(11)\n\t"			\
-	          "st 6,12(11)\n\t"			\
-	          "st 7,16(11)\n\t"			\
-	          "st 8,20(11)\n\t"			\
-	          "st 9,24(11)\n\t"			\
-	          "st 10,28(11)\n\t"			\
-                  "stfd 1,32(11)\n\t"			\
-	          "stfd 2,40(11)\n\t"			\
-	          "stfd 3,48(11)\n\t"			\
-	          "stfd 4,56(11)\n\t"			\
-	          "stfd 5,64(11)\n\t"			\
-	          "stfd 6,72(11)\n\t"			\
-	          "stfd 7,80(11)\n\t"			\
-	          "stfd 8,88(11)\n\t":::"11", "memory");  \
-                  lparms = gparms;
-
 typedef struct sf
 {
   struct sf *backchain;
@@ -62,115 +37,159 @@ typedef struct sf
   unsigned int slot[200];
 } stack_frame_t;
 
+/* Wrapper to save the GPRs and FPRs and then jump to the real function.  */
+#define WRAPPER(NAME)							\
+__asm__ ("\t.globl\t" #NAME "_asm\n\t"					\
+	 ".text\n\t"							\
+	 ".type " #NAME "_asm, @function\n"				\
+	 #NAME "_asm:\n\t"						\
+	 "lis 11,gparms@ha\n\t"						\
+	 "la 11,gparms@l(11)\n\t"					\
+	 "st 3,0(11)\n\t"						\
+	 "st 4,4(11)\n\t"						\
+	 "st 5,8(11)\n\t"						\
+	 "st 6,12(11)\n\t"						\
+	 "st 7,16(11)\n\t"						\
+	 "st 8,20(11)\n\t"						\
+	 "st 9,24(11)\n\t"						\
+	 "st 10,28(11)\n\t"						\
+	 "stfd 1,32(11)\n\t"						\
+	 "stfd 2,40(11)\n\t"						\
+	 "stfd 3,48(11)\n\t"						\
+	 "stfd 4,56(11)\n\t"						\
+	 "stfd 5,64(11)\n\t"						\
+	 "stfd 6,72(11)\n\t"						\
+	 "stfd 7,80(11)\n\t"						\
+	 "stfd 8,88(11)\n\t"						\
+	 "b " #NAME "\n\t"						\
+	 ".size " #NAME ",.-" #NAME "\n")
+
 /* Fill up floating point registers with double arguments, forcing
    decimal float arguments into the parameter save area.  */
+extern void func0_asm (double, double, double, double, double,
+		       double, double, double, _Decimal64, _Decimal128);
+
+WRAPPER(func0);
+
 void __attribute__ ((noinline))
 func0 (double a1, double a2, double a3, double a4, double a5,
        double a6, double a7, double a8, _Decimal64 a9, _Decimal128 a10)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != lparms.fprs[0]) FAILURE
-  if (a2 != lparms.fprs[1]) FAILURE
-  if (a3 != lparms.fprs[2]) FAILURE
-  if (a4 != lparms.fprs[3]) FAILURE
-  if (a5 != lparms.fprs[4]) FAILURE
-  if (a6 != lparms.fprs[5]) FAILURE
-  if (a7 != lparms.fprs[6]) FAILURE
-  if (a8 != lparms.fprs[7]) FAILURE
+  if (a1 != gparms.fprs[0]) FAILURE
+  if (a2 != gparms.fprs[1]) FAILURE
+  if (a3 != gparms.fprs[2]) FAILURE
+  if (a4 != gparms.fprs[3]) FAILURE
+  if (a5 != gparms.fprs[4]) FAILURE
+  if (a6 != gparms.fprs[5]) FAILURE
+  if (a7 != gparms.fprs[6]) FAILURE
+  if (a8 != gparms.fprs[7]) FAILURE
   if (a9 != *(_Decimal64 *)&sp->slot[0]) FAILURE
   if (a10 != *(_Decimal128 *)&sp->slot[2]) FAILURE
 }
 
 /* Alternate 64-bit and 128-bit decimal float arguments, checking that
    _Decimal128 is always passed in even/odd register pairs.  */
+extern void func1_asm (_Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64, _Decimal128, _Decimal64, _Decimal128);
+
+WRAPPER(func1);
+
 void __attribute__ ((noinline))
 func1 (_Decimal64 a1, _Decimal128 a2, _Decimal64 a3, _Decimal128 a4,
        _Decimal64 a5, _Decimal128 a6, _Decimal64 a7, _Decimal128 a8)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal64 *)&lparms.fprs[0]) FAILURE	/* f1 */
-  if (a2 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3 */
-  if (a3 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4 */
-  if (a4 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7 */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8 */
+  if (a1 != *(_Decimal64 *)&gparms.fprs[0]) FAILURE	/* f1 */
+  if (a2 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3 */
+  if (a3 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4 */
+  if (a4 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7 */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8 */
   if (a6 != *(_Decimal128 *)&sp->slot[0]) FAILURE
   if (a7 != *(_Decimal64 *)&sp->slot[4]) FAILURE
   if (a8 != *(_Decimal128 *)&sp->slot[6]) FAILURE
 }
 
+extern void func2_asm (_Decimal128, _Decimal64, _Decimal128, _Decimal64,
+		       _Decimal128, _Decimal64, _Decimal128, _Decimal64);
+
+WRAPPER(func2);
+
 void __attribute__ ((noinline))
 func2 (_Decimal128 a1, _Decimal64 a2, _Decimal128 a3, _Decimal64 a4,
        _Decimal128 a5, _Decimal64 a6, _Decimal128 a7, _Decimal64 a8)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3 */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4 */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7 */
-  if (a4 != *(_Decimal64 *)&lparms.fprs[7]) FAILURE	/* f8 */
+  if (a1 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3 */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4 */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7 */
+  if (a4 != *(_Decimal64 *)&gparms.fprs[7]) FAILURE	/* f8 */
   if (a5 != *(_Decimal128 *)&sp->slot[0]) FAILURE
   if (a6 != *(_Decimal64 *)&sp->slot[4]) FAILURE
   if (a7 != *(_Decimal128 *)&sp->slot[6]) FAILURE
   if (a8 != *(_Decimal64 *)&sp->slot[10]) FAILURE
 }
 
+extern void func3_asm (_Decimal64, _Decimal128, _Decimal64, _Decimal128,
+		       _Decimal64);
+
+WRAPPER(func3);
+
 void __attribute__ ((noinline))
 func3 (_Decimal64 a1, _Decimal128 a2, _Decimal64 a3, _Decimal128 a4,
        _Decimal64 a5)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != *(_Decimal64 *)&lparms.fprs[0]) FAILURE	/* f1 */
-  if (a2 != *(_Decimal128 *)&lparms.fprs[1]) FAILURE	/* f2 & f3 */
-  if (a3 != *(_Decimal64 *)&lparms.fprs[3]) FAILURE	/* f4 */
-  if (a4 != *(_Decimal128 *)&lparms.fprs[5]) FAILURE	/* f6 & f7 */
+  if (a1 != *(_Decimal64 *)&gparms.fprs[0]) FAILURE	/* f1 */
+  if (a2 != *(_Decimal128 *)&gparms.fprs[1]) FAILURE	/* f2 & f3 */
+  if (a3 != *(_Decimal64 *)&gparms.fprs[3]) FAILURE	/* f4 */
+  if (a4 != *(_Decimal128 *)&gparms.fprs[5]) FAILURE	/* f6 & f7 */
   if (a5 != *(_Decimal128 *)&sp->slot[0]) FAILURE
 }
 
+extern void func4_asm (_Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32,
+		       _Decimal32, _Decimal32, _Decimal32, _Decimal32);
+
+WRAPPER(func4);
+
 void __attribute__ ((noinline))
 func4 (_Decimal32 a1, _Decimal32 a2, _Decimal32 a3, _Decimal32 a4,
        _Decimal32 a5, _Decimal32 a6, _Decimal32 a7, _Decimal32 a8,
        _Decimal32 a9, _Decimal32 a10, _Decimal32 a11, _Decimal32 a12,
        _Decimal32 a13, _Decimal32 a14, _Decimal32 a15, _Decimal32 a16)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
   /* _Decimal32 is passed in the lower half of an FPR, or in a parameter slot.  */
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1  */
-  if (a2 != ((d32parm_t *)&lparms.fprs[1])->d) FAILURE		/* f2  */
-  if (a3 != ((d32parm_t *)&lparms.fprs[2])->d) FAILURE		/* f3  */
-  if (a4 != ((d32parm_t *)&lparms.fprs[3])->d) FAILURE		/* f4  */
-  if (a5 != ((d32parm_t *)&lparms.fprs[4])->d) FAILURE		/* f5  */
-  if (a6 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6  */
-  if (a7 != ((d32parm_t *)&lparms.fprs[6])->d) FAILURE		/* f7  */
-  if (a8 != ((d32parm_t *)&lparms.fprs[7])->d) FAILURE		/* f8  */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1  */
+  if (a2 != ((d32parm_t *)&gparms.fprs[1])->d) FAILURE		/* f2  */
+  if (a3 != ((d32parm_t *)&gparms.fprs[2])->d) FAILURE		/* f3  */
+  if (a4 != ((d32parm_t *)&gparms.fprs[3])->d) FAILURE		/* f4  */
+  if (a5 != ((d32parm_t *)&gparms.fprs[4])->d) FAILURE		/* f5  */
+  if (a6 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6  */
+  if (a7 != ((d32parm_t *)&gparms.fprs[6])->d) FAILURE		/* f7  */
+  if (a8 != ((d32parm_t *)&gparms.fprs[7])->d) FAILURE		/* f8  */
   if (a9 != *(_Decimal32 *)&sp->slot[0]) FAILURE
   if (a10 != *(_Decimal32 *)&sp->slot[1]) FAILURE
   if (a11 != *(_Decimal32 *)&sp->slot[2]) FAILURE
@@ -181,24 +200,29 @@ func4 (_Decimal32 a1, _Decimal32 a2, _De
   if (a16 != *(_Decimal32 *)&sp->slot[7]) FAILURE
 }
 
+extern void func5_asm (_Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128,
+		       _Decimal32, _Decimal64, _Decimal128);
+
+WRAPPER(func5);
+
 void __attribute__ ((noinline))
 func5 (_Decimal32 a1, _Decimal64 a2, _Decimal128 a3,
        _Decimal32 a4, _Decimal64 a5, _Decimal128 a6,
        _Decimal32 a7, _Decimal64 a8, _Decimal128 a9,
        _Decimal32 a10, _Decimal64 a11, _Decimal128 a12)
 {
-  reg_parms_t lparms;
   stack_frame_t *sp;
 
-  save_parms (lparms);
   sp = __builtin_frame_address (0);
   sp = sp->backchain;
 
-  if (a1 != ((d32parm_t *)&lparms.fprs[0])->d) FAILURE		/* f1      */
-  if (a2 != *(_Decimal64 *)&lparms.fprs[1]) FAILURE		/* f2      */
-  if (a3 != *(_Decimal128 *)&lparms.fprs[3]) FAILURE		/* f4 & f5 */
-  if (a4 != ((d32parm_t *)&lparms.fprs[5])->d) FAILURE		/* f6      */
-  if (a5 != *(_Decimal64 *)&lparms.fprs[6]) FAILURE		/* f7      */
+  if (a1 != ((d32parm_t *)&gparms.fprs[0])->d) FAILURE		/* f1      */
+  if (a2 != *(_Decimal64 *)&gparms.fprs[1]) FAILURE		/* f2      */
+  if (a3 != *(_Decimal128 *)&gparms.fprs[3]) FAILURE		/* f4 & f5 */
+  if (a4 != ((d32parm_t *)&gparms.fprs[5])->d) FAILURE		/* f6      */
+  if (a5 != *(_Decimal64 *)&gparms.fprs[6]) FAILURE		/* f7      */
 
   if (a6 != *(_Decimal128 *)&sp->slot[0]) FAILURE
   if (a7 != *(_Decimal32 *)&sp->slot[4]) FAILURE
@@ -212,15 +236,15 @@ func5 (_Decimal32 a1, _Decimal64 a2, _De
 int
 main ()
 {
-  func0 (1., 2., 3., 4., 5., 6., 7., 8., 9.dd, 10.dl);
-  func1 (1.dd, 2.dl, 3.dd, 4.dl, 5.dd, 6.dl, 7.dd, 8.dl);
-  func2 (1.dl, 2.dd, 3.dl, 4.dd, 5.dl, 6.dd, 7.dl, 8.dd);
-  func3 (1.dd, 2.dl, 3.dd, 4.dl, 5.dl);
-  func4 (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
-	 508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
-	 515.2df, 516.2df);
-  func5 (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
-	 607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
+  func0_asm (1., 2., 3., 4., 5., 6., 7., 8., 9.dd, 10.dl);
+  func1_asm (1.dd, 2.dl, 3.dd, 4.dl, 5.dd, 6.dl, 7.dd, 8.dl);
+  func2_asm (1.dl, 2.dd, 3.dl, 4.dd, 5.dl, 6.dd, 7.dl, 8.dd);
+  func3_asm (1.dd, 2.dl, 3.dd, 4.dl, 5.dl);
+  func4_asm (501.2df, 502.2df, 503.2df, 504.2df, 505.2df, 506.2df, 507.2df,
+	     508.2df, 509.2df, 510.2df, 511.2df, 512.2df, 513.2df, 514.2df,
+	     515.2df, 516.2df);
+  func5_asm (601.2df, 602.2dd, 603.2dl, 604.2df, 605.2dd, 606.2dl,
+	     607.2df, 608.2dd, 609.2dl, 610.2df, 611.2dd, 612.2dl);
 
   if (failcnt != 0)
     abort ();
Index: gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c	(revision 169775)
+++ gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-options "-O2 -mavoid-indexed-addresses" } */
+/* { dg-options "-O2 -mavoid-indexed-addresses -mno-altivec -mno-vsx" } */
 
 /* { dg-final { scan-assembler-not "lbzx" } }
 
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 169775)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -3,7 +3,7 @@
 ;; expander, and the actual vector instructions will be in altivec.md and
 ;; vsx.md
 
-;; Copyright (C) 2009, 2010
+;; Copyright (C) 2009, 2010, 2011
 ;; Free Software Foundation, Inc.
 ;; Contributed by Michael Meissner <meissner@linux.vnet.ibm.com>
 
@@ -123,6 +123,43 @@ (define_split
   DONE;
 })
 
+;; Vector floating point load/store instructions that use the Altivec
+;; instructions even if we are compiling for VSX, since the Altivec
+;; instructions silently ignore the bottom 4 bits of the address, and VSX
+;; does not.
+(define_expand "vector_altivec_load_<mode>"
+  [(set (match_operand:VEC_M 0 "vfloat_operand" "")
+	(match_operand:VEC_M 1 "memory_operand" ""))]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "
+{
+  gcc_assert (VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode));
+
+  if (VECTOR_MEM_VSX_P (<MODE>mode))
+    {
+      operands[1] = rs6000_address_for_altivec (operands[1]);
+      emit_insn (gen_altivec_lvx_<mode> (operands[0], operands[1]));
+      DONE;
+    }
+}")
+
+(define_expand "vector_altivec_store_<mode>"
+  [(set (match_operand:VEC_M 0 "memory_operand" "")
+	(match_operand:VEC_M 1 "vfloat_operand" ""))]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "
+{
+  gcc_assert (VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode));
+
+  if (VECTOR_MEM_VSX_P (<MODE>mode))
+    {
+      operands[0] = rs6000_address_for_altivec (operands[0]);
+      emit_insn (gen_altivec_stvx_<mode> (operands[0], operands[1]));
+      DONE;
+    }
+}")
+
+
 \f
 ;; Reload patterns for vector operations.  We may need an additional base
 ;; register to convert the reg+offset addressing to reg+reg for vector
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 169775)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -1,5 +1,6 @@
 /* Definitions of target machine for GNU compiler, for IBM RS/6000.
-   Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
+   2010, 2011
    Free Software Foundation, Inc.
    Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu)
 
@@ -129,6 +130,7 @@ extern void rs6000_emit_parity (rtx, rtx
 extern rtx rs6000_machopic_legitimize_pic_address (rtx, enum machine_mode,
 						   rtx);
 extern rtx rs6000_address_for_fpconvert (rtx);
+extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (enum machine_mode, bool, bool);
 extern int rs6000_loop_align (rtx);
 #endif /* RTX_CODE */
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 169775)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1,5 +1,5 @@
 /* Builtin functions for rs6000/powerpc.
-   Copyright (C) 2009, 2010
+   Copyright (C) 2009, 2010, 2011
    Free Software Foundation, Inc.
    Contributed by Michael Meissner (meissner@linux.vnet.ibm.com)
 
@@ -37,6 +37,10 @@ RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERN
 RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_16qi,	RS6000_BTC_MEM)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERNAL_4sf,		RS6000_BTC_MEM)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_4sf,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERNAL_2df,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_2df,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_ST_INTERNAL_2di,		RS6000_BTC_MEM)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_LD_INTERNAL_2di,		RS6000_BTC_MEM)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VADDUBM,			RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VADDUHM,			RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VADDUWM,			RS6000_BTC_CONST)
@@ -778,12 +782,20 @@ RS6000_BUILTIN(PAIRED_BUILTIN_CMPU1,			R
 
   /* VSX builtins.  */
 RS6000_BUILTIN(VSX_BUILTIN_LXSDX,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_LXVD2X,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVD2X_V2DF,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVD2X_V2DI,			RS6000_BTC_MEM)
 RS6000_BUILTIN(VSX_BUILTIN_LXVDSX,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_LXVW4X,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V4SF,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V4SI,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V8HI,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_LXVW4X_V16QI,		RS6000_BTC_MEM)
 RS6000_BUILTIN(VSX_BUILTIN_STXSDX,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_STXVD2X,			RS6000_BTC_MEM)
-RS6000_BUILTIN(VSX_BUILTIN_STXVW4X,			RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVD2X_V2DF,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVD2X_V2DI,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V4SF,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V4SI,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V8HI,		RS6000_BTC_MEM)
+RS6000_BUILTIN(VSX_BUILTIN_STXVW4X_V16QI,		RS6000_BTC_MEM)
 RS6000_BUILTIN(VSX_BUILTIN_XSABSDP,			RS6000_BTC_CONST)
 RS6000_BUILTIN(VSX_BUILTIN_XSADDDP,			RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(VSX_BUILTIN_XSCMPODP,			RS6000_BTC_FP_PURE)
@@ -983,8 +995,10 @@ RS6000_BUILTIN(VSX_BUILTIN_VEC_XXPERMDI,
 RS6000_BUILTIN(VSX_BUILTIN_VEC_XXSLDWI,			RS6000_BTC_MISC)
 RS6000_BUILTIN(VSX_BUILTIN_VEC_XXSPLTD,			RS6000_BTC_MISC)
 RS6000_BUILTIN(VSX_BUILTIN_VEC_XXSPLTW,			RS6000_BTC_MISC)
+RS6000_BUILTIN(VSX_BUILTIN_VEC_LD,			RS6000_BTC_MISC)
+RS6000_BUILTIN(VSX_BUILTIN_VEC_ST,			RS6000_BTC_MISC)
 RS6000_BUILTIN_EQUATE(VSX_BUILTIN_OVERLOADED_LAST,
-		      VSX_BUILTIN_VEC_XXSPLTW)
+		      VSX_BUILTIN_VEC_ST)
 
 /* Combined VSX/Altivec builtins.  */
 RS6000_BUILTIN(VECTOR_BUILTIN_FLOAT_V4SI_V4SF,		RS6000_BTC_FP_PURE)
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 169775)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -1000,6 +1000,15 @@ const struct altivec_builtin_types altiv
   { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_XVDIVDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
@@ -1112,9 +1121,19 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
     RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
-    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI, 0 },
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
@@ -1133,6 +1152,17 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSL, ALTIVEC_BUILTIN_LVSL,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_long_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
@@ -1151,6 +1181,17 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTDI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { ALTIVEC_BUILTIN_VEC_LVSR, ALTIVEC_BUILTIN_LVSR,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_long_long, 0 },
   { ALTIVEC_BUILTIN_VEC_LVLX, ALTIVEC_BUILTIN_LVLX,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_LVLX, ALTIVEC_BUILTIN_LVLX,
@@ -2644,6 +2685,16 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VSLDOI_16QI,
     RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_NOT_OPAQUE },
   { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_ST, ALTIVEC_BUILTIN_STVX,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
@@ -2809,6 +2860,18 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
   { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
     RS6000_BTI_void, RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_STL, ALTIVEC_BUILTIN_STVXL,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_STVLX, ALTIVEC_BUILTIN_STVLX,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_STVLX, ALTIVEC_BUILTIN_STVLX,
@@ -3002,6 +3065,135 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
     RS6000_BTI_NOT_OPAQUE },
 
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V4SI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V8HI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V16QI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V16QI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
+
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DF,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V2DI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V4SI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V4SI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V8HI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V8HI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V16QI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V16QI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_pixel_V8HI },
+
   /* Predicates.  */
   { ALTIVEC_BUILTIN_VCMPGT_P, ALTIVEC_BUILTIN_VCMPGTUB_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI },
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 169775)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3316,9 +3316,12 @@ rs6000_option_override_internal (bool gl
   /* If not explicitly specified via option, decide whether to generate indexed
      load/store instructions.  */
   if (TARGET_AVOID_XFORM == -1)
-    /* Avoid indexed addressing when targeting Power6 in order to avoid
-     the DERAT mispredict penalty.  */
-    TARGET_AVOID_XFORM = (rs6000_cpu == PROCESSOR_POWER6 && TARGET_CMPB);
+    /* Avoid indexed addressing when targeting Power6 in order to avoid the
+     DERAT mispredict penalty.  However, the LVE and STVE Altivec instructions
+     need indexed accesses, and the type used is the scalar type of the element
+     being loaded or stored.  */
+    TARGET_AVOID_XFORM = (rs6000_cpu == PROCESSOR_POWER6 && TARGET_CMPB
+			  && !TARGET_ALTIVEC);
 
   /* Set the -mrecip options.  */
   if (rs6000_recip_name)
@@ -11263,16 +11266,22 @@ altivec_expand_ld_builtin (tree exp, rtx
   switch (fcode)
     {
     case ALTIVEC_BUILTIN_LD_INTERNAL_16qi:
-      icode = CODE_FOR_vector_load_v16qi;
+      icode = CODE_FOR_vector_altivec_load_v16qi;
       break;
     case ALTIVEC_BUILTIN_LD_INTERNAL_8hi:
-      icode = CODE_FOR_vector_load_v8hi;
+      icode = CODE_FOR_vector_altivec_load_v8hi;
       break;
     case ALTIVEC_BUILTIN_LD_INTERNAL_4si:
-      icode = CODE_FOR_vector_load_v4si;
+      icode = CODE_FOR_vector_altivec_load_v4si;
       break;
     case ALTIVEC_BUILTIN_LD_INTERNAL_4sf:
-      icode = CODE_FOR_vector_load_v4sf;
+      icode = CODE_FOR_vector_altivec_load_v4sf;
+      break;
+    case ALTIVEC_BUILTIN_LD_INTERNAL_2df:
+      icode = CODE_FOR_vector_altivec_load_v2df;
+      break;
+    case ALTIVEC_BUILTIN_LD_INTERNAL_2di:
+      icode = CODE_FOR_vector_altivec_load_v2di;
       break;
     default:
       *expandedp = false;
@@ -11316,16 +11325,22 @@ altivec_expand_st_builtin (tree exp, rtx
   switch (fcode)
     {
     case ALTIVEC_BUILTIN_ST_INTERNAL_16qi:
-      icode = CODE_FOR_vector_store_v16qi;
+      icode = CODE_FOR_vector_altivec_store_v16qi;
       break;
     case ALTIVEC_BUILTIN_ST_INTERNAL_8hi:
-      icode = CODE_FOR_vector_store_v8hi;
+      icode = CODE_FOR_vector_altivec_store_v8hi;
       break;
     case ALTIVEC_BUILTIN_ST_INTERNAL_4si:
-      icode = CODE_FOR_vector_store_v4si;
+      icode = CODE_FOR_vector_altivec_store_v4si;
       break;
     case ALTIVEC_BUILTIN_ST_INTERNAL_4sf:
-      icode = CODE_FOR_vector_store_v4sf;
+      icode = CODE_FOR_vector_altivec_store_v4sf;
+      break;
+    case ALTIVEC_BUILTIN_ST_INTERNAL_2df:
+      icode = CODE_FOR_vector_altivec_store_v2df;
+      break;
+    case ALTIVEC_BUILTIN_ST_INTERNAL_2di:
+      icode = CODE_FOR_vector_altivec_store_v2di;
       break;
     default:
       *expandedp = false;
@@ -11557,7 +11572,7 @@ altivec_expand_builtin (tree exp, rtx ta
   switch (fcode)
     {
     case ALTIVEC_BUILTIN_STVX:
-      return altivec_expand_stv_builtin (CODE_FOR_altivec_stvx, exp);
+      return altivec_expand_stv_builtin (CODE_FOR_altivec_stvx_v4si, exp);
     case ALTIVEC_BUILTIN_STVEBX:
       return altivec_expand_stv_builtin (CODE_FOR_altivec_stvebx, exp);
     case ALTIVEC_BUILTIN_STVEHX:
@@ -11576,6 +11591,19 @@ altivec_expand_builtin (tree exp, rtx ta
     case ALTIVEC_BUILTIN_STVRXL:
       return altivec_expand_stv_builtin (CODE_FOR_altivec_stvrxl, exp);
 
+    case VSX_BUILTIN_STXVD2X_V2DF:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v2df, exp);
+    case VSX_BUILTIN_STXVD2X_V2DI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v2di, exp);
+    case VSX_BUILTIN_STXVW4X_V4SF:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v4sf, exp);
+    case VSX_BUILTIN_STXVW4X_V4SI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v4si, exp);
+    case VSX_BUILTIN_STXVW4X_V8HI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v8hi, exp);
+    case VSX_BUILTIN_STXVW4X_V16QI:
+      return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v16qi, exp);
+
     case ALTIVEC_BUILTIN_MFVSCR:
       icode = CODE_FOR_altivec_mfvscr;
       tmode = insn_data[icode].operand[0].mode;
@@ -11700,7 +11728,7 @@ altivec_expand_builtin (tree exp, rtx ta
       return altivec_expand_lv_builtin (CODE_FOR_altivec_lvxl,
 					exp, target, false);
     case ALTIVEC_BUILTIN_LVX:
-      return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx,
+      return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v4si,
 					exp, target, false);
     case ALTIVEC_BUILTIN_LVLX:
       return altivec_expand_lv_builtin (CODE_FOR_altivec_lvlx,
@@ -11714,6 +11742,25 @@ altivec_expand_builtin (tree exp, rtx ta
     case ALTIVEC_BUILTIN_LVRXL:
       return altivec_expand_lv_builtin (CODE_FOR_altivec_lvrxl,
 					exp, target, true);
+    case VSX_BUILTIN_LXVD2X_V2DF:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v2df,
+					exp, target, false);
+    case VSX_BUILTIN_LXVD2X_V2DI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v2di,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V4SF:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v4sf,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V4SI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v4si,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V8HI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v8hi,
+					exp, target, false);
+    case VSX_BUILTIN_LXVW4X_V16QI:
+      return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v16qi,
+					exp, target, false);
+      break;
     default:
       break;
       /* Fall through.  */
@@ -12331,6 +12378,8 @@ rs6000_init_builtins (void)
 
   long_integer_type_internal_node = long_integer_type_node;
   long_unsigned_type_internal_node = long_unsigned_type_node;
+  long_long_integer_type_internal_node = long_long_integer_type_node;
+  long_long_unsigned_type_internal_node = long_long_unsigned_type_node;
   intQI_type_internal_node = intQI_type_node;
   uintQI_type_internal_node = unsigned_intQI_type_node;
   intHI_type_internal_node = intHI_type_node;
@@ -12340,7 +12389,7 @@ rs6000_init_builtins (void)
   intDI_type_internal_node = intDI_type_node;
   uintDI_type_internal_node = unsigned_intDI_type_node;
   float_type_internal_node = float_type_node;
-  double_type_internal_node = float_type_node;
+  double_type_internal_node = double_type_node;
   void_type_internal_node = void_type_node;
 
   /* Initialize the modes for builtin_function_type, mapping a machine mode to
@@ -12872,19 +12921,11 @@ altivec_init_builtins (void)
   size_t i;
   tree ftype;
 
-  tree pfloat_type_node = build_pointer_type (float_type_node);
-  tree pint_type_node = build_pointer_type (integer_type_node);
-  tree pshort_type_node = build_pointer_type (short_integer_type_node);
-  tree pchar_type_node = build_pointer_type (char_type_node);
-
   tree pvoid_type_node = build_pointer_type (void_type_node);
 
-  tree pcfloat_type_node = build_pointer_type (build_qualified_type (float_type_node, TYPE_QUAL_CONST));
-  tree pcint_type_node = build_pointer_type (build_qualified_type (integer_type_node, TYPE_QUAL_CONST));
-  tree pcshort_type_node = build_pointer_type (build_qualified_type (short_integer_type_node, TYPE_QUAL_CONST));
-  tree pcchar_type_node = build_pointer_type (build_qualified_type (char_type_node, TYPE_QUAL_CONST));
-
-  tree pcvoid_type_node = build_pointer_type (build_qualified_type (void_type_node, TYPE_QUAL_CONST));
+  tree pcvoid_type_node
+    = build_pointer_type (build_qualified_type (void_type_node,
+						TYPE_QUAL_CONST));
 
   tree int_ftype_opaque
     = build_function_type_list (integer_type_node,
@@ -12907,26 +12948,6 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V4SI_type_node,
 				V4SI_type_node, NULL_TREE);
-  tree v4sf_ftype_pcfloat
-    = build_function_type_list (V4SF_type_node, pcfloat_type_node, NULL_TREE);
-  tree void_ftype_pfloat_v4sf
-    = build_function_type_list (void_type_node,
-				pfloat_type_node, V4SF_type_node, NULL_TREE);
-  tree v4si_ftype_pcint
-    = build_function_type_list (V4SI_type_node, pcint_type_node, NULL_TREE);
-  tree void_ftype_pint_v4si
-    = build_function_type_list (void_type_node,
-				pint_type_node, V4SI_type_node, NULL_TREE);
-  tree v8hi_ftype_pcshort
-    = build_function_type_list (V8HI_type_node, pcshort_type_node, NULL_TREE);
-  tree void_ftype_pshort_v8hi
-    = build_function_type_list (void_type_node,
-				pshort_type_node, V8HI_type_node, NULL_TREE);
-  tree v16qi_ftype_pcchar
-    = build_function_type_list (V16QI_type_node, pcchar_type_node, NULL_TREE);
-  tree void_ftype_pchar_v16qi
-    = build_function_type_list (void_type_node,
-				pchar_type_node, V16QI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -12938,16 +12959,32 @@ altivec_init_builtins (void)
 
   tree opaque_ftype_long_pcvoid
     = build_function_type_list (opaque_V4SI_type_node,
-				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
   tree v16qi_ftype_long_pcvoid
     = build_function_type_list (V16QI_type_node,
-				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
   tree v8hi_ftype_long_pcvoid
     = build_function_type_list (V8HI_type_node,
-				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
   tree v4si_ftype_long_pcvoid
     = build_function_type_list (V4SI_type_node,
-				long_integer_type_node, pcvoid_type_node, NULL_TREE);
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
+  tree v4sf_ftype_long_pcvoid
+    = build_function_type_list (V4SF_type_node,
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
+  tree v2df_ftype_long_pcvoid
+    = build_function_type_list (V2DF_type_node,
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
+  tree v2di_ftype_long_pcvoid
+    = build_function_type_list (V2DI_type_node,
+				long_integer_type_node, pcvoid_type_node,
+				NULL_TREE);
 
   tree void_ftype_opaque_long_pvoid
     = build_function_type_list (void_type_node,
@@ -12965,6 +13002,18 @@ altivec_init_builtins (void)
     = build_function_type_list (void_type_node,
 				V8HI_type_node, long_integer_type_node,
 				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v4sf_long_pvoid
+    = build_function_type_list (void_type_node,
+				V4SF_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v2df_long_pvoid
+    = build_function_type_list (void_type_node,
+				V2DF_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v2di_long_pvoid
+    = build_function_type_list (void_type_node,
+				V2DI_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
   tree int_ftype_int_v8hi_v8hi
     = build_function_type_list (integer_type_node,
 				integer_type_node, V8HI_type_node,
@@ -12996,22 +13045,6 @@ altivec_init_builtins (void)
 				pcvoid_type_node, integer_type_node,
 				integer_type_node, NULL_TREE);
 
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_4sf", v4sf_ftype_pcfloat,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_4sf);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_4sf", void_ftype_pfloat_v4sf,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_4sf);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_4si", v4si_ftype_pcint,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_4si);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_4si", void_ftype_pint_v4si,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_4si);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_8hi", v8hi_ftype_pcshort,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_8hi);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_8hi", void_ftype_pshort_v8hi,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_8hi);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_ld_internal_16qi", v16qi_ftype_pcchar,
-	       ALTIVEC_BUILTIN_LD_INTERNAL_16qi);
-  def_builtin (MASK_ALTIVEC, "__builtin_altivec_st_internal_16qi", void_ftype_pchar_v16qi,
-	       ALTIVEC_BUILTIN_ST_INTERNAL_16qi);
   def_builtin (MASK_ALTIVEC, "__builtin_altivec_mtvscr", void_ftype_v4si, ALTIVEC_BUILTIN_MTVSCR);
   def_builtin (MASK_ALTIVEC, "__builtin_altivec_mfvscr", v8hi_ftype_void, ALTIVEC_BUILTIN_MFVSCR);
   def_builtin (MASK_ALTIVEC, "__builtin_altivec_dssall", void_ftype_void, ALTIVEC_BUILTIN_DSSALL);
@@ -13043,6 +13076,35 @@ altivec_init_builtins (void)
   def_builtin (MASK_ALTIVEC, "__builtin_vec_stvebx", void_ftype_opaque_long_pvoid, ALTIVEC_BUILTIN_VEC_STVEBX);
   def_builtin (MASK_ALTIVEC, "__builtin_vec_stvehx", void_ftype_opaque_long_pvoid, ALTIVEC_BUILTIN_VEC_STVEHX);
 
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvd2x_v2df", v2df_ftype_long_pcvoid,
+	       VSX_BUILTIN_LXVD2X_V2DF);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvd2x_v2di", v2di_ftype_long_pcvoid,
+	       VSX_BUILTIN_LXVD2X_V2DI);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v4sf", v4sf_ftype_long_pcvoid,
+	       VSX_BUILTIN_LXVW4X_V4SF);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v4si", v4si_ftype_long_pcvoid,
+	       VSX_BUILTIN_LXVW4X_V4SI);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v8hi",
+	       v8hi_ftype_long_pcvoid, VSX_BUILTIN_LXVW4X_V8HI);
+  def_builtin (MASK_VSX, "__builtin_vsx_lxvw4x_v16qi",
+	       v16qi_ftype_long_pcvoid, VSX_BUILTIN_LXVW4X_V16QI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvd2x_v2df",
+	       void_ftype_v2df_long_pvoid, VSX_BUILTIN_STXVD2X_V2DF);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvd2x_v2di",
+	       void_ftype_v2di_long_pvoid, VSX_BUILTIN_STXVD2X_V2DI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v4sf",
+	       void_ftype_v4sf_long_pvoid, VSX_BUILTIN_STXVW4X_V4SF);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v4si",
+	       void_ftype_v4si_long_pvoid, VSX_BUILTIN_STXVW4X_V4SI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v8hi",
+	       void_ftype_v8hi_long_pvoid, VSX_BUILTIN_STXVW4X_V8HI);
+  def_builtin (MASK_VSX, "__builtin_vsx_stxvw4x_v16qi",
+	       void_ftype_v16qi_long_pvoid, VSX_BUILTIN_STXVW4X_V16QI);
+  def_builtin (MASK_VSX, "__builtin_vec_vsx_ld", opaque_ftype_long_pcvoid,
+	       VSX_BUILTIN_VEC_LD);
+  def_builtin (MASK_VSX, "__builtin_vec_vsx_st", void_ftype_opaque_long_pvoid,
+	       VSX_BUILTIN_VEC_ST);
+
   if (rs6000_cpu == PROCESSOR_CELL)
     {
       def_builtin (MASK_ALTIVEC, "__builtin_altivec_lvlx",  v16qi_ftype_long_pcvoid, ALTIVEC_BUILTIN_LVLX);
@@ -27925,4 +27987,29 @@ rs6000_address_for_fpconvert (rtx x)
   return x;
 }
 
+/* Given a memory reference, if it is not in the form for altivec memory
+   reference instructions (i.e. reg or reg+reg addressing with AND of -16),
+   convert to the altivec format.  */
+
+rtx
+rs6000_address_for_altivec (rtx x)
+{
+  gcc_assert (MEM_P (x));
+  if (!altivec_indexed_or_indirect_operand (x, GET_MODE (x)))
+    {
+      rtx addr = XEXP (x, 0);
+      int strict_p = (reload_in_progress || reload_completed);
+
+      if (!legitimate_indexed_address_p (addr, strict_p)
+	  && !legitimate_indirect_address_p (addr, strict_p))
+	addr = copy_to_mode_reg (Pmode, addr);
+
+      addr = gen_rtx_AND (Pmode, addr, GEN_INT (-16));
+      x = change_address (x, GET_MODE (x), addr);
+    }
+
+  return x;
+}
+
+
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 169776)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -308,6 +308,19 @@ (define_insn "*vsx_movti"
 }
   [(set_attr "type" "vecstore,vecload,vecsimple,*,*,*,vecsimple,*,vecstore,vecload")])
 
+;; Explicit load/store expanders for the builtin functions
+(define_expand "vsx_load_<mode>"
+  [(set (match_operand:VSX_M 0 "vsx_register_operand" "")
+	(match_operand:VSX_M 1 "memory_operand" ""))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+  "")
+
+(define_expand "vsx_store_<mode>"
+  [(set (match_operand:VEC_M 0 "memory_operand" "")
+	(match_operand:VEC_M 1 "vsx_register_operand" ""))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+  "")
+
 \f
 ;; VSX scalar and vector floating point arithmetic instructions
 (define_insn "*vsx_add<mode>3"
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 169775)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1,7 +1,7 @@
 /* Definitions of target machine for GNU compiler, for IBM RS/6000.
    Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
    2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
-   2010
+   2010, 2011
    Free Software Foundation, Inc.
    Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu)
 
@@ -2368,6 +2368,8 @@ enum rs6000_builtin_type_index
   RS6000_BTI_pixel_V8HI,         /* __vector __pixel */
   RS6000_BTI_long,	         /* long_integer_type_node */
   RS6000_BTI_unsigned_long,      /* long_unsigned_type_node */
+  RS6000_BTI_long_long,	         /* long_long_integer_type_node */
+  RS6000_BTI_unsigned_long_long, /* long_long_unsigned_type_node */
   RS6000_BTI_INTQI,	         /* intQI_type_node */
   RS6000_BTI_UINTQI,		 /* unsigned_intQI_type_node */
   RS6000_BTI_INTHI,	         /* intHI_type_node */
@@ -2411,6 +2413,8 @@ enum rs6000_builtin_type_index
 #define bool_V2DI_type_node	      (rs6000_builtin_types[RS6000_BTI_bool_V2DI])
 #define pixel_V8HI_type_node	      (rs6000_builtin_types[RS6000_BTI_pixel_V8HI])
 
+#define long_long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long_long])
+#define long_long_unsigned_type_internal_node (rs6000_builtin_types[RS6000_BTI_unsigned_long_long])
 #define long_integer_type_internal_node  (rs6000_builtin_types[RS6000_BTI_long])
 #define long_unsigned_type_internal_node (rs6000_builtin_types[RS6000_BTI_unsigned_long])
 #define intQI_type_internal_node	 (rs6000_builtin_types[RS6000_BTI_INTQI])
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 169775)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1,5 +1,5 @@
 ;; AltiVec patterns.
-;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
 ;; Free Software Foundation, Inc.
 ;; Contributed by Aldy Hernandez (aldy@quesejoda.com)
 
@@ -96,7 +96,7 @@ (define_constants
    (UNSPEC_STVE         203)
    (UNSPEC_SET_VSCR     213)
    (UNSPEC_GET_VRSAVE   214)
-   ;; 215 deleted
+   (UNSPEC_LVX		215)
    (UNSPEC_REDUC_PLUS   217)
    (UNSPEC_VECSH        219)
    (UNSPEC_EXTEVEN_V4SI 220)
@@ -1750,17 +1750,19 @@ (define_insn "altivec_lvxl"
   "lvxl %0,%y1"
   [(set_attr "type" "vecload")])
 
-(define_insn "altivec_lvx"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-	(match_operand:V4SI 1 "memory_operand" "Z"))]
+(define_insn "altivec_lvx_<mode>"
+  [(parallel
+    [(set (match_operand:VM2 0 "register_operand" "=v")
+	  (match_operand:VM2 1 "memory_operand" "Z"))
+     (unspec [(const_int 0)] UNSPEC_LVX)])]
   "TARGET_ALTIVEC"
   "lvx %0,%y1"
   [(set_attr "type" "vecload")])
 
-(define_insn "altivec_stvx"
+(define_insn "altivec_stvx_<mode>"
   [(parallel
-    [(set (match_operand:V4SI 0 "memory_operand" "=Z")
-	  (match_operand:V4SI 1 "register_operand" "v"))
+    [(set (match_operand:VM2 0 "memory_operand" "=Z")
+	  (match_operand:VM2 1 "register_operand" "v"))
      (unspec [(const_int 0)] UNSPEC_STVX)])]
   "TARGET_ALTIVEC"
   "stvx %1,%y0"
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 169775)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -1,5 +1,5 @@
 /* PowerPC AltiVec include file.
-   Copyright (C) 2002, 2003, 2004, 2005, 2008, 2009, 2010
+   Copyright (C) 2002, 2003, 2004, 2005, 2008, 2009, 2010, 2011
    Free Software Foundation, Inc.
    Contributed by Aldy Hernandez (aldyh@redhat.com).
    Rewritten by Paolo Bonzini (bonzini@gnu.org).
@@ -318,6 +318,8 @@
 #define vec_nearbyint __builtin_vec_nearbyint
 #define vec_rint __builtin_vec_rint
 #define vec_sqrt __builtin_vec_sqrt
+#define vec_vsx_ld __builtin_vec_vsx_ld
+#define vec_vsx_st __builtin_vec_vsx_st
 #endif
 
 /* Predicates.

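For reference, here is a minimal sketch (not part of the patch itself)
showing the difference the patch preserves between the Altivec and VSX
load builtins.  It assumes the patch is applied and the file is compiled
with -mcpu=power7 or -mvsx; the one-word offset into the buffer makes
the address misaligned:

#include <altivec.h>

/* Illustrative sketch only.  The array is 16-byte aligned, so &buf[1]
   is misaligned by 4 bytes.  */
static unsigned int buf[8] __attribute__ ((aligned (16)));

/* vec_ld expands to lvx again after this patch; the low-order address
   bits are masked off (the AND with -16 in rs6000_address_for_altivec
   above), so this loads the 16 bytes starting at buf, not &buf[1].  */
vector unsigned int
load_altivec (void)
{
  return vec_ld (0, &buf[1]);
}

/* vec_vsx_ld, added by this patch, expands to lxvw4x and uses the full
   effective address, so this loads the 16 bytes starting at &buf[1].  */
vector unsigned int
load_vsx (void)
{
  return vec_vsx_ld (0, &buf[1]);
}

Here load_altivec returns the four words starting at buf, while
load_vsx returns the four words starting at &buf[1].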