The missed patch is attached in plain text.

I have a copyright assignment on file with the FSF covering work on GCC.

Load/store groups of length 3 are the most frequent non-power-of-2 case.
They are used in RGB image processing (like the test case in PR52252).
We can certainly extend the patch to length 5 and more. However, this
could affect performance on some other architectures and requires more
extensive testing. So length 3 is just a first step.

The algorithm in the patch could be modified for the general case in
several steps.

I understand that the patch should wait for stage 1; however, since it's
ready, we can discuss it right now and make some changes (like a general
group size).

Thanks,
Evgeny

On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener wrote:
>
> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>
> > Hi,
> >
> > The patch gives an expected 3 times gain for the test case in the PR52252
> > (and even 6 times for AVX2).
> > It passes make check and bootstrap on x86.
> > spec2000/spec2006 got no regressions/gains on x86.
> >
> > Is this patch ok?
>
> I've worked on generalizing the permutation support in the light
> of the availability of the generic shuffle support in the IL
> but hit some road-blocks in the way code-generation works for
> group loads with permutations (I don't remember if I posted all patches).
>
> This patch seems to be to a slightly different place but it again
> special-cases a specific permutation.  Why's that?  Why can't we
> support groups of size 7 for example?  So - can this be generalized
> to support arbitrary non-power-of-two load/store groups?
>
> Other than that the patch has to wait for stage1 to open again,
> of course.  And it misses a testcase.
>
> Btw, do you have a copyright assignment on file with the FSF covering
> work on GCC?
>
> Thanks,
> Richard.
>
> > ChangeLog:
> >
> > 2014-02-11  Evgeny Stupachenko
> >
> >         * target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
> >         * tree-vect-data-refs.c (vect_grouped_store_supported): New
> >         check for stores group of length 3.
> >         (vect_permute_store_chain): New permutations for stores group of
> >         length 3.
> >         (vect_grouped_load_supported): New check for loads group of
> >         length 3.
> >         (vect_permute_load_chain): New permutations for loads group of
> >         length 3.
> >         * tree-vect-stmts.c (vect_model_store_cost): New cost
> >         vec_perm_shuffle for the new permutations.
> >         (vect_model_load_cost): Ditto.
> >         * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
> >         vec_perm_shuffle cost as equivalent of vec_perm cost.
> >         * config/arm/arm.c: Ditto.
> >         * config/rs6000/rs6000.c: Ditto.
> >         * config/spu/spu.c: Ditto.
> >         * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
> >         byte shuffle on some x86 architectures.
> >         * config/i386/i386.h (processor_costs): Defining pshuffb cost.
> >         * config/i386/i386.c (processor_costs): Adding pshuffb cost.
> >         (ix86_builtin_vectorization_cost): Adding cost for the new
> >         permutations.
> >         Fixing cost for other permutations.
> >         (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
> >         slow (TARGET_SLOW_PHUFFB).
> >         (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
> >         Adding new shuffle cost only when byte shuffle is expected.
> >         Fixing cost model for Silvermont.
> >
> > Thanks,
> > Evgeny
> >
>
> --
> Richard Biener
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
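
For illustration, a loop with load and store groups of length 3 might
look like the sketch below. It is a hypothetical example, not the test
case from PR52252; the function name rgb_to_bgr and its parameters are
assumptions made only for this sketch. Each iteration reads three
adjacent bytes and writes three adjacent bytes, which is the kind of
interleaved access the length-3 permutations are aimed at.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not taken from the patch or from PR52252):
   swap the R and B channels of interleaved 8-bit RGB pixels.
   The three loads from src[3*i .. 3*i+2] form a load group of
   length 3; the three stores to dst[3*i .. 3*i+2] form a store
   group of length 3.  */
void
rgb_to_bgr (const uint8_t *restrict src, uint8_t *restrict dst,
            size_t n_pixels)
{
  for (size_t i = 0; i < n_pixels; i++)
    {
      uint8_t r = src[3 * i + 0];
      uint8_t g = src[3 * i + 1];
      uint8_t b = src[3 * i + 2];

      dst[3 * i + 0] = b;
      dst[3 * i + 1] = g;
      dst[3 * i + 2] = r;
    }
}
```

To vectorize such a loop, the R, G and B elements have to be separated
out of consecutive vector loads (and interleaved again for the stores),
and a stride of 3 does not fit the power-of-2 even/odd scheme.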