public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/2] Allow vec_duplicate_optab to fail
@ 2021-06-05 15:18 H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
From: H.J. Lu @ 2021-06-05 15:18 UTC
  To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

We'd like to add vec_duplicate_optab to the x86 backend.  There are 3
ways to broadcast an integer constant:

1. Load the full size from constant pool directly.
2. Use AVX2/AVX512 broadcast instruction.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.
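
For readers less familiar with these three ways, here is a minimal sketch
in C of what each one computes.  It uses the GCC/Clang generic vector
extension so that it builds without target-specific flags.  The type v4si
and all function names are hypothetical and are not taken from the patches
or the benchmark; which instructions the compiler actually emits depends on
the target and the options used.

```c
#include <assert.h>

/* Hypothetical 128-bit vector of four 32-bit ints (GCC/Clang vector
   extension); the patches themselves work on RTL, not on C source.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Analogue of way 1: the whole vector is one literal, which a compiler
   may keep in the constant pool and load at full width.  */
static const v4si full_constant
  = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };

v4si
load_full_constant (void)
{
  return full_constant;
}

/* Analogue of way 2: replicate one scalar into every lane, the
   operation a single broadcast instruction performs.  */
v4si
splat_scalar (int x)
{
  v4si v = { x, x, x, x };
  return v;
}

/* Analogue of way 3: build the same result in steps, loosely mirroring
   an unpack followed by a shuffle.  */
v4si
splat_by_steps (int x)
{
  v4si v = { 0, 0, 0, 0 };
  v[0] = x;        /* Move the scalar into lane 0.  */
  v[1] = v[0];     /* Duplicate it within the low half.  */
  v[2] = v[0];     /* Copy the low half ...  */
  v[3] = v[1];     /* ... into the high half.  */
  return v;
}

/* Return 1 if every lane of V equals X, 0 otherwise.  */
int
all_lanes_equal (v4si v, int x)
{
  for (int i = 0; i < 4; i++)
    if (v[i] != x)
      return 0;
  return 1;
}

/* Self-check: all three routes must yield the same vector.  */
void
check_broadcast_sketch (void)
{
  assert (all_lanes_equal (load_full_constant (), 0x12345678));
  assert (all_lanes_equal (splat_scalar (0x12345678), 0x12345678));
  assert (all_lanes_equal (splat_by_steps (0x12345678), 0x12345678));
}
```

Calling check_broadcast_sketch () from a test driver confirms that the
three routes agree; they differ only in the code the compiler has to
generate to produce the value.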

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is the fastest of the three on an Intel Core i7-8559U
(lower is better):

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text	   data	    bss	    dec	    hex	filename
    132	      0	      0	    132	     84	memory.o
    122	      0	      0	    122	     7a	broadcast.o
$

The preferred choices, in order, are:

1. Use AVX2/AVX512 broadcast instruction.
2. Load the full size from constant pool directly.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

The first patch updates the users of vec_duplicate_optab to allow it to
fail, so that the x86 backend can opt out of SSE2 broadcast emulation from
an integer constant.
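
The preference order and the opt-out described above can be summarized as
a small decision function.  This is only a sketch: the enum, the function
name and both parameters are hypothetical, and the assumption that a
non-constant value without a broadcast instruction still uses the SSE2
sequence is mine, not something stated in this cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, listed in the preferred order given above.  */
enum broadcast_strategy
{
  USE_BROADCAST_INSN,   /* AVX2/AVX512 broadcast instruction.  */
  USE_CONST_POOL_LOAD,  /* Full-size load from the constant pool.  */
  USE_SSE2_EMULATION    /* SSE2 unpack and shuffle sequence.  */
};

static_assert (USE_BROADCAST_INSN < USE_CONST_POOL_LOAD
               && USE_CONST_POOL_LOAD < USE_SSE2_EMULATION,
               "enumerators are ordered by preference");

/* Pick a strategy for duplicating a scalar into a vector.
   HAVE_BROADCAST_INSN says whether an AVX2/AVX512 broadcast is usable;
   VALUE_IS_CONSTANT says whether the scalar is an integer constant.  */
enum broadcast_strategy
pick_broadcast_strategy (bool have_broadcast_insn, bool value_is_constant)
{
  if (have_broadcast_insn)
    return USE_BROADCAST_INSN;

  /* No broadcast instruction: for a constant, decline the emulation
     (the "allowed to fail" case) and load the full vector instead.  */
  if (value_is_constant)
    return USE_CONST_POOL_LOAD;

  /* Assumption: a run-time value has no constant-pool entry, so the
     SSE2 sequence remains the fallback.  */
  return USE_SSE2_EMULATION;
}
```

For example, pick_broadcast_strategy (false, true) yields
USE_CONST_POOL_LOAD, which is the case the first patch makes possible.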

The second patch adds a vec_duplicate<mode> expander and updates the move
expanders to convert CONST_WIDE_INT and CONST_VECTOR operands to a vector
broadcast from an integer with AVX2.

H.J. Lu (2):
  Allow vec_duplicate_optab to fail
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 gcc/doc/md.texi                               |   2 -
 gcc/expr.c                                    |  10 +-
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 28 files changed, 534 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c

-- 
2.31.1


Thread overview: 10+ messages
2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
2021-06-07  7:12   ` Richard Sandiford
2021-06-07 14:18     ` H.J. Lu
2021-06-07 17:59       ` Richard Biener
2021-06-07 18:10         ` Richard Biener
2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
2021-06-09 21:03             ` Jeff Law
2021-06-09 21:31               ` H.J. Lu
2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
