Re: [PATCH] x86_64: Add insn patterns for V1TI mode logic operations.

From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] x86_64: Add insn patterns for V1TI mode logic operations.
Date: Fri, 22 Oct 2021 16:53:24 +0200
Message-ID: <CAFULd4Zc05xpL-4v7=_+amuo6wb7gJZEr9+HRGiJYx3AM_NEdA@mail.gmail.com>
In-Reply-To: <002f01d7c715$1cc96400$565c2c00$@nextmovesoftware.com>

On Fri, Oct 22, 2021 at 9:19 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE
> register (where regular TI mode uses a pair of 64-bit general purpose
> scalar registers).  This patch improves the implementation of AND, IOR,
> XOR and NOT on these values.
>
> The benefit is demonstrated by the following simple test program:
>
> typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));
> v1ti and(v1ti x, v1ti y) { return x & y; }
> v1ti ior(v1ti x, v1ti y) { return x | y; }
> v1ti xor(v1ti x, v1ti y) { return x ^ y; }
> v1ti not(v1ti x) { return ~x; }
>
> For which GCC currently generates the rather large:
>
> and:    movdqa  %xmm0, %xmm2
>         movq    %xmm1, %rdx
>         movq    %xmm0, %rax
>         andq    %rdx, %rax
>         movhlps %xmm2, %xmm3
>         movhlps %xmm1, %xmm4
>         movq    %rax, %xmm0
>         movq    %xmm4, %rdx
>         movq    %xmm3, %rax
>         andq    %rdx, %rax
>         movq    %rax, %xmm5
>         punpcklqdq      %xmm5, %xmm0
>         ret
>
> ior:    movdqa  %xmm0, %xmm2
>         movq    %xmm1, %rdx
>         movq    %xmm0, %rax
>         orq     %rdx, %rax
>         movhlps %xmm2, %xmm3
>         movhlps %xmm1, %xmm4
>         movq    %rax, %xmm0
>         movq    %xmm4, %rdx
>         movq    %xmm3, %rax
>         orq     %rdx, %rax
>         movq    %rax, %xmm5
>         punpcklqdq      %xmm5, %xmm0
>         ret
>
> xor:    movdqa  %xmm0, %xmm2
>         movq    %xmm1, %rdx
>         movq    %xmm0, %rax
>         xorq    %rdx, %rax
>         movhlps %xmm2, %xmm3
>         movhlps %xmm1, %xmm4
>         movq    %rax, %xmm0
>         movq    %xmm4, %rdx
>         movq    %xmm3, %rax
>         xorq    %rdx, %rax
>         movq    %rax, %xmm5
>         punpcklqdq      %xmm5, %xmm0
>         ret
>
> not:    movdqa  %xmm0, %xmm1
>         movq    %xmm0, %rax
>         notq    %rax
>         movhlps %xmm1, %xmm2
>         movq    %rax, %xmm0
>         movq    %xmm2, %rax
>         notq    %rax
>         movq    %rax, %xmm3
>         punpcklqdq      %xmm3, %xmm0
>         ret
>
>
> With this patch we now generate the much more efficient:
>
> and:    pand    %xmm1, %xmm0
>         ret
>
> ior:    por     %xmm1, %xmm0
>         ret
>
> xor:    pxor    %xmm1, %xmm0
>         ret
>
> not:    pcmpeqd %xmm1, %xmm1
>         pxor    %xmm1, %xmm0
>         ret
>
>
> For my first few attempts at this patch I tried adding V1TI to the
> existing VI and VI12_AVX_512F mode iterators, but these then have
> dependencies on other iterators (and attributes), and so on until
> everything ties itself into a knot, as V1TI mode isn't really a
> first-class vector mode on x86_64.  Hence I ultimately opted to use
> simple stand-alone patterns (as used by the existing TF mode support).
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  Ok for mainline?
>
>
> 2021-10-22  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/sse.md (<any_logic>v1ti3): New define_insn to
>         implement V1TImode AND, IOR and XOR on TARGET_SSE2 (and above).
>         (one_cmplv1ti2): New define_expand.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse2-v1ti-logic.c: New test case.
>         * gcc.target/i386/sse2-v1ti-logic-2.c: New test case.
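
For readers following the quoted "not" sequence: pcmpeqd of a register
with itself sets every bit, so the following pxor computes x ^ ~0, which
equals ~x.  A minimal sketch of that identity using the GNU C vector
extension from the quoted test program; make_v1ti and not_via_xor are
hypothetical names used only for illustration and are not part of the
patch:

```c
#include <stdint.h>

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

/* Hypothetical helper: build a V1TI value from two 64-bit halves.  */
static v1ti
make_v1ti (uint64_t hi, uint64_t lo)
{
  v1ti v = { ((unsigned __int128) hi << 64) | lo };
  return v;
}

/* NOT expressed as XOR with an all-ones constant.  The all-ones value
   plays the role of the register that pcmpeqd fills in the quoted
   assembly; the XOR corresponds to the pxor.  */
static v1ti
not_via_xor (v1ti x)
{
  v1ti ones = { ~(unsigned __int128) 0 };
  return x ^ ones;
}
```

Whether a compiler actually emits pcmpeqd/pxor for not_via_xor depends
on the compiler version and options; the sketch only shows why the two
forms are equivalent.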

There is no need for

/* { dg-require-effective-target sse2 } */

for compile tests. The compilation does not reach the assembler.

OK with the above change.
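
As an illustration of the point about compile tests, a compile-only
test along these lines stops after generating assembly, so it does not
need an sse2 effective-target check.  This is a sketch, not the
contents of the submitted test files: the options and the
scan-assembler patterns below are assumptions.

```c
/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* Assumed options; note there is no dg-require-effective-target sse2,
   because a compile test only produces assembly.  */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti and(v1ti x, v1ti y) { return x & y; }
v1ti ior(v1ti x, v1ti y) { return x | y; }
v1ti xor(v1ti x, v1ti y) { return x ^ y; }
v1ti not(v1ti x) { return ~x; }

/* Assumed checks: each logic operation should use the SSE2 insn.  */
/* { dg-final { scan-assembler "pand" } } */
/* { dg-final { scan-assembler "por" } } */
/* { dg-final { scan-assembler "pxor" } } */
```

A test that uses dg-do run would be different, since it has to be
assembled, linked and executed on hardware with SSE2.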

BTW: You can add testcases to the main patch with "git add <filename>"
and then create the patch with "git diff HEAD".

Thanks,
Uros.