On Fri, Jun 16, 2017 at 04:30:48PM -0500, Segher Boessenkool wrote:
> On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > > +  "&& reload_completed"
> > >
> > > I still don't think it is such a good idea to do all of this not until
> > > after reload.  It does of course allow you to play tricks with changing
> > > register mode at will, like you do ;-)
> >
> > The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
> > together (i.e. use gen_lowpart to change the mode and use a SUBREG).  So after
> > reload, we can just use gen_rtx_REG (...) to change the register type, but
> > before reload, by creating the SUBREG, it can lead to various aborts if rtl
> > checking is turned on.
>
> That sounds like a problem elsewhere?  Hrm.
>
> > > All these unspecs are a similar problem: the RTL optimisers cannot do
> > > much at all with it.
> >
> > I don't think there is a good way to represent a vec_insert.  And vec_extract
> > can't represent a variable extract either.
>
> Yeah.  But especially for all this lane shuffling etc. the generic
> optimisers could do a good job, if only they knew how.  Maybe we need
> some new RTL codes.
>
> > > > +  [(set_attr "type" "vecperm")
> > >
> > > Is that a good type for this?  I think the convert is more expensive
> > > than the permutes?  If so, that would be better (of course it only
> > > matters for sched1, not super important).
> >
> > I generally use the type of the last insn.  I am open to other suggestions.
>
> It should describe the resulting insns as a whole.  Picking the type of
> the most expensive insn is often a reasonable approximation; for integer
> insns "two" or "three" can be okay.
>
> I don't think we can do much better currently.

Here is the latest patch that restricts the optimization to 64-bit (due to
needing VSX small integers).  I've done a full bootstrap/make check on a little
endian power8 system, and a build without bootstrap and make check on a little
endian power9 system.
Neither the power8 nor the power9 system had any regressions.  I'm also
running a test on a big endian power7 system for completeness.  Assuming the
power7 test finishes without any regressions, can I check this patch into the
trunk and, later, the GCC 7 branch?

The main change was to restrict the optimization to 64-bit PowerPC systems that
have VSX small integer support turned on (the default for 64-bit).  I did
shorten the one line in the testsuite that you mentioned.

[gcc]
2017-06-16  Michael Meissner

	PR target/79799
	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
	for doing vector set of SFmode on ISA 3.0.
	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
	(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
	element.
	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
	SFmode value into a V4SF variable that was extracted from another
	V4SF variable without converting the element to double precision
	and back to single precision vector format.
	(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-16  Michael Meissner

	PR target/79799
	* gcc.target/powerpc/pr79799-1.c: New test.
	* gcc.target/powerpc/pr79799-2.c: Likewise.
	* gcc.target/powerpc/pr79799-3.c: Likewise.
	* gcc.target/powerpc/pr79799-4.c: Likewise.
	* gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797