From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com
Subject: Re: [PATCH 2/2] [i386] Adjust costing of emulated vectorized gather/scatter
Date: Fri, 24 Mar 2023 14:12:25 +0100
Message-ID: <ZB2hue6Yxbk+mHri@kam.mff.cuni.cz>
In-Reply-To: <20230324130404.2C4ED138ED@imap2.suse-dmz.suse.de>
> Emulated gather/scatter behave similarly to strided elementwise
> accesses in that they need to decompose the offset vector
> and construct or decompose the data vector, so handle them
> the same way, pessimizing the cases with many elements.
>
> For pr88531-2c.c instead of
>
> .L4:
> leaq (%r15,%rcx), %rdx
> incl %edi
> movl 16(%rdx), %r13d
> movl 24(%rdx), %r14d
> movl (%rdx), %r10d
> movl 4(%rdx), %r9d
> movl 8(%rdx), %ebx
> movl 12(%rdx), %r11d
> movl 20(%rdx), %r12d
> vmovss (%rax,%r14,4), %xmm2
> movl 28(%rdx), %edx
> vmovss (%rax,%r13,4), %xmm1
> vmovss (%rax,%r10,4), %xmm0
> vinsertps $0x10, (%rax,%rdx,4), %xmm2, %xmm2
> vinsertps $0x10, (%rax,%r12,4), %xmm1, %xmm1
> vinsertps $0x10, (%rax,%r9,4), %xmm0, %xmm0
> vmovlhps %xmm2, %xmm1, %xmm1
> vmovss (%rax,%rbx,4), %xmm2
> vinsertps $0x10, (%rax,%r11,4), %xmm2, %xmm2
> vmovlhps %xmm2, %xmm0, %xmm0
> vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
> vmulps %ymm3, %ymm0, %ymm0
> vmovups %ymm0, (%r8,%rcx)
> addq $32, %rcx
> cmpl %esi, %edi
> jb .L4
>
> we now prefer
>
> .L4:
> leaq 0(%rbp,%rdx,8), %rcx
> movl (%rcx), %r10d
> movl 4(%rcx), %ecx
> vmovss (%rsi,%r10,4), %xmm0
> vinsertps $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
> vmulps %xmm1, %xmm0, %xmm0
> vmovlps %xmm0, (%rbx,%rdx,8)
> incq %rdx
> cmpl %edi, %edx
> jb .L4
>
> which vectorizes with SSE instead of AVX2, which looks like an
> improvement.
>
> When testing this on SPEC CPU 2017 with -Ofast -flto -march=znver4
> there are quite a few cases where we now prefer SSE vectorization
> over AVX512 + AVX2 epilogue and some cases where we now reject
> vectorization. At runtime the changes are within the noise, with the
> off-noise candidates being better after the patch.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for stage1?
>
> Thanks,
> Richard.
>
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> Tame down element extracts and scalar loads for gather/scatter
> similar to elementwise strided accesses.
>
> * gcc.target/i386/pr89618-2.c: New testcase.
> * gcc.target/i386/pr88531-2b.c: Adjust.
> * gcc.target/i386/pr88531-2c.c: Likewise.
OK.
Honza
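
For readers following the thread, here is a minimal C sketch of the kind of loop under discussion. It is an illustration only, not the contents of pr88531-2c.c and not GCC's implementation: the function names, the unsigned 32-bit index type, and the two-lane width are assumptions chosen to mirror the shorter SSE loop quoted above. The first function is the scalar form with an indexed load; the second spells out by hand what an emulated gather does, loading each index separately (decomposing the offset vector) and then building the data vector one element at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference loop: data[idx[i]] is an indexed load, which the
   vectorizer has to treat as a gather.  All names are hypothetical. */
void scale_gather_scalar(float *out, const float *data,
                         const unsigned *idx, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = data[idx[i]] * k;
}

/* Hand-written two-lane emulation (assumption: n is even).  The comments
   name the instructions of the quoted SSE loop that play the same role. */
void scale_gather_emulated2(float *out, const float *data,
                            const unsigned *idx, float k, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        /* Decompose the offset vector: one scalar load per lane (movl). */
        unsigned i0 = idx[i];
        unsigned i1 = idx[i + 1];

        /* Construct the data vector element by element
           (vmovss for lane 0, vinsertps for lane 1). */
        float lane[2];
        lane[0] = data[i0];
        lane[1] = data[i1];

        /* Only the multiply and the store are real vector work
           (vmulps, vmovlps). */
        out[i]     = lane[0] * k;
        out[i + 1] = lane[1] * k;
    }
}
```

Each extra lane adds one more index load and one more element insert, so the scalar overhead grows with the vector width while the vector multiply stays a single instruction. That is the effect the patch description refers to when it pessimizes the cases with many elements.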