From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
Date: Mon, 19 Mar 2012 17:19:00 -0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek 2012-03-19 17:13:53 UTC ---
I'm not very keen on having too many different routines; the more generic
they are, the better.  So IMHO e.g. the two-insn sequence, vperm2[if]128 +
some one-insn shuffle, could look like:

/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to expand
   a vector permutation using two instructions, vperm2f128 resp.
   vperm2i128 followed by any single in-lane permutation.  */

static bool
expand_vec_perm_vperm2f128 (struct expand_vec_perm_d *d)
{
  struct expand_vec_perm_d dfirst, dsecond;
  unsigned i, j, nelt = d->nelt, nelt2 = nelt / 2, perm;
  bool ok;

  if (!TARGET_AVX
      || GET_MODE_SIZE (d->vmode) != 32
      || (d->vmode != V8SFmode && d->vmode != V4DFmode && !TARGET_AVX2))
    return false;

  dsecond = *d;
  if (d->op0 == d->op1)
    dsecond.op1 = gen_reg_rtx (d->vmode);
  dsecond.testing_p = true;

  /* ((perm << 2)|perm) & 0x33 is the vperm2[fi]128
     immediate.  For perm < 16 the second permutation uses
     d->op0 as first operand, for perm >= 16 it uses d->op1
     as first operand.  The second operand is the result of
     vperm2[fi]128.  */
  for (perm = 0; perm < 32; perm++)
    {
      /* Ignore permutations which do not move anything cross-lane.  */
      if (perm < 16)
        {
          if ((perm & 0xc) == (1 << 2))
            continue;
          if ((perm & 3) == 0)
            continue;
          if ((perm & 0xf) == ((3 << 2) | 2))
            continue;
        }
      else
        {
          if ((perm & 0xc) == (3 << 2))
            continue;
          if ((perm & 3) == 2)
            continue;
          if ((perm & 0xf) == (1 << 2))
            continue;
        }

      for (i = 0; i < nelt; i++)
        {
          j = d->perm[i] / nelt2;
          if (j == ((perm >> (2 * (i >= nelt2))) & 3))
            dsecond.perm[i] = nelt + (i & nelt2) + (d->perm[i] & (nelt2 - 1));
          else if (j == (unsigned) (i >= nelt2) + 2 * (perm >= 16))
            dsecond.perm[i] = d->perm[i] & (nelt - 1);
          else
            break;
        }

      if (i == nelt)
        {
          start_sequence ();
          ok = expand_vec_perm_1 (&dsecond);
          end_sequence ();
        }
      else
        ok = false;

      if (ok)
        {
          if (d->testing_p)
            return true;

          dsecond.testing_p = false;
          dfirst = *d;
          if (d->op0 == d->op1)
            dfirst.target = dsecond.op1;
          else
            dfirst.target = gen_reg_rtx (d->vmode);
          for (i = 0; i < nelt; i++)
            dfirst.perm[i] = (i & (nelt2 - 1))
                             + ((perm >> (2 * (i >= nelt2))) & 3) * nelt2;

          ok = expand_vec_perm_1 (&dfirst);
          gcc_assert (ok);

          dsecond.op1 = dfirst.target;
          if (perm >= 16)
            dsecond.op0 = dfirst.op1;

          ok = expand_vec_perm_1 (&dsecond);
          gcc_assert (ok);

          return true;
        }

      /* For d->op0 == d->op1 the only useful vperm2f128 permutation
         is 0x01 (perm == 1, i.e. a lane swap).  */
      if (d->op0 == d->op1)
        return false;
    }

  return false;
}

This will handle e.g. vperm2f128 + {vshufpd,vblendpd,vunpcklpd,vunpckhpd}
etc.  But with the current expand_vselect implementation it might be too
costly, at least memory-wise.
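
To make the index arithmetic easier to follow, here is a minimal standalone
sketch in plain C that mirrors only the selector computations of the routine
above.  It is not GCC code: struct perm_model and the functions vperm2_imm,
skipped, split, find_split and roundtrip_ok are hypothetical names made up
for illustration.  It never calls expand_vec_perm_1, so it cannot tell
whether the second shuffle really is a single instruction; it simply accepts
the first candidate whose indices split, and it does not model the early
exit for d->op0 == d->op1.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NELT 32

/* Hypothetical stand-in for the nelt/perm fields of expand_vec_perm_d.
   Indices 0..nelt-1 name op0 elements, nelt..2*nelt-1 name op1 elements.  */
struct perm_model
{
  unsigned nelt;
  unsigned perm[MAX_NELT];
};

/* vperm2[fi]128 immediate for a candidate; bit 4 of PERM is masked off.  */
unsigned
vperm2_imm (unsigned perm)
{
  return ((perm << 2) | perm) & 0x33;
}

/* Same filter as in the routine: candidates that move nothing cross-lane.  */
bool
skipped (unsigned perm)
{
  if (perm < 16)
    return (perm & 0xc) == (1 << 2)
           || (perm & 3) == 0
           || (perm & 0xf) == ((3 << 2) | 2);
  return (perm & 0xc) == (3 << 2)
         || (perm & 3) == 2
         || (perm & 0xf) == (1 << 2);
}

/* Split D for one candidate into FIRST (the lane shuffle) and SECOND
   (the in-lane shuffle).  Returns false if some element fits neither.  */
bool
split (const struct perm_model *d, unsigned perm,
       unsigned *first, unsigned *second)
{
  unsigned nelt = d->nelt, nelt2 = nelt / 2, i;

  assert (nelt <= MAX_NELT && nelt >= 2 && (nelt & (nelt - 1)) == 0);
  for (i = 0; i < nelt; i++)
    {
      unsigned j = d->perm[i] / nelt2;                  /* source lane */
      unsigned sel = (perm >> (2 * (i >= nelt2))) & 3;  /* lane picked here */

      if (j == sel)
        second[i] = nelt + (i & nelt2) + (d->perm[i] & (nelt2 - 1));
      else if (j == (unsigned) (i >= nelt2) + 2 * (perm >= 16))
        second[i] = d->perm[i] & (nelt - 1);
      else
        return false;
      first[i] = (i & (nelt2 - 1)) + sel * nelt2;
    }
  return true;
}

/* Return the first non-skipped candidate that splits, or -1 if none does.  */
int
find_split (const struct perm_model *d, unsigned *first, unsigned *second)
{
  unsigned perm;

  for (perm = 0; perm < 32; perm++)
    if (!skipped (perm) && split (d, perm, first, second))
      return (int) perm;
  return -1;
}

/* Check that applying FIRST and then SECOND reproduces D's selector.  */
bool
roundtrip_ok (const struct perm_model *d, unsigned perm,
              const unsigned *first, const unsigned *second)
{
  unsigned nelt = d->nelt, base = perm >= 16 ? nelt : 0, i;

  for (i = 0; i < nelt; i++)
    {
      unsigned src = second[i] < nelt ? base + second[i]
                                      : first[second[i] - nelt];
      if (src != d->perm[i])
        return false;
    }
  return true;
}
```

For a two-operand V4DF selector {0,6,2,4}, find_split returns candidate 11,
i.e. immediate 0x23, with the first shuffle {6,7,4,5} (op1 with its lanes
swapped) and the second shuffle {0,4,2,6}, which is the vunpcklpd index
pattern.  Whether a given second shuffle is really a single insn is still
something only expand_vec_perm_1 can decide.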