From: Uros Bizjak
Date: Fri, 23 Dec 2022 18:18:20 +0100
Subject: Re: [x86 PATCH] Use movss/movsd to implement V4SI/V2DI VEC_PERM.
To: Roger Sayle
Cc: GCC Patches
In-Reply-To: <00e501d916ee$0ef7d210$2ce77630$@nextmovesoftware.com>

On Fri, Dec 23, 2022 at 5:46 PM Roger Sayle wrote:
>
>
> This patch tweaks the x86 backend to use the movss and movsd instructions
> to perform some vector permutations on integer vectors (V4SI and V2DI) in
> the same way they are used for floating point vectors (V4SF and V2DF).
>
> As a motivating example, consider:
>
> typedef unsigned int v4si __attribute__((vector_size(16)));
> typedef float v4sf __attribute__((vector_size(16)));
> v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
> v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
>
> which is currently compiled with -O2 to:
>
> foo:    movdqa  %xmm0, %xmm2
>         shufps  $80, %xmm0, %xmm1
>         movdqa  %xmm1, %xmm0
>         shufps  $232, %xmm2, %xmm0
>         ret
>
> bar:    movss   %xmm1, %xmm0
>         ret
>
> with this patch both functions compile to the same form.
> Likewise for the V2DI case:
>
> typedef unsigned long v2di __attribute__((vector_size(16)));
> typedef double v2df __attribute__((vector_size(16)));
>
> v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
> v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
>
> which currently generates:
>
> foo:    shufpd  $2, %xmm0, %xmm1
>         movdqa  %xmm1, %xmm0
>         ret
>
> bar:    movsd   %xmm1, %xmm0
>         ret
>
> There are two possible approaches to adding integer vector forms of the
> sse_movss and sse2_movsd instructions.  One is to use a mode iterator
> (VI4F_128 or VI8F_128) on the existing define_insn patterns, but this
> requires renaming the patterns to sse_movss_<mode> which then requires
> changes to i386-builtins.def and throughout the backend to reflect the
> new naming of gen_sse_movss_v4sf.  The alternate approach (taken here)
> is to simply clone and specialize the existing patterns.  Uros, if you'd
> prefer the first approach, I'm happy to make/test/commit those changes.

I would really prefer the variant with VI4F_128/VI8F_128; these two
iterators were introduced specifically for this case (see e.g.
sse_shufps_<mode> and sse2_shufpd_<mode>). The internal name of the
pattern is fairly irrelevant, and a trivial search and replace operation
can replace the grand total of 6 occurrences.

Also, changing sse2_movsd to use the VI8F_128 mode iterator would enable
more alternatives besides movsd, so we give the combine pass some more
opportunities with memory operands.

So, the patch with those two iterators is pre-approved.

Uros.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
> 2022-12-23  Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (expand_vec_perm_movs): Also allow
>         V4SImode with TARGET_SSE and V2DImode with TARGET_SSE2.
>         * config/i386/sse.md (sse_movss_v4si): New define_insn, a V4SI
>         specialization of sse_movss.
>         (sse2_movsd_v2di): Likewise, a V2DI specialization of sse2_movsd.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse-movss-4.c: New test case.
>         * gcc.target/i386/sse2-movsd-3.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
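
For reference, the lane selection discussed in this thread (element 0 taken from y, all remaining elements from x) can be written as a small self-contained sketch using GCC's vector extensions. The names blend_v4si, blend_v2di and blend_self_check are illustrative only and are not part of the patch or its test cases; v2di is declared with unsigned long long here, an assumption made so that the elements stay 64 bits wide on 32-bit targets as well.

```c
#include <assert.h>

typedef unsigned int v4si __attribute__((vector_size(16)));
/* Assumption: unsigned long long instead of the mail's unsigned long,
   so that each element is 64 bits wide on 32-bit targets too.  */
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Element 0 comes from y, the rest from x.  This is the lane pattern
   that movss (four 32-bit lanes) and movsd (two 64-bit lanes) provide.  */
v4si blend_v4si(v4si x, v4si y) { return (v4si){y[0], x[1], x[2], x[3]}; }
v2di blend_v2di(v2di x, v2di y) { return (v2di){y[0], x[1]}; }

/* Hypothetical self-check: returns 1 if both blends pick the expected lanes.  */
int blend_self_check(void)
{
    v4si a = {1, 2, 3, 4}, b = {10, 20, 30, 40};
    v4si r = blend_v4si(a, b);
    v2di c = {5, 6}, d = {50, 60};
    v2di s = blend_v2di(c, d);

    assert(r[0] == 10 && r[1] == 2 && r[2] == 3 && r[3] == 4);
    assert(s[0] == 50 && s[1] == 6);
    return 1;
}
```

Compiling this with -O2 -S and reading the assembly for blend_v4si and blend_v2di is one way to see which instruction sequence a given compiler version picks; the result depends on the GCC version and target flags in use.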