public inbox for gcc-bugs@sourceware.org
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.
Date: Thu, 25 Jan 2024 09:05:44 +0000
Message-ID: <bug-113583-4-GoRIwfy2W9@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113583-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 25 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
>
> --- Comment #5 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> Both ICC and Clang X86 can vectorize SPEC 2017 lbm:
>
> https://godbolt.org/z/MjbTbYf1G
>
> But I am not sure whether X86 ICC or X86 Clang is better.
Gather/scatter are possibly slow (and gather now has that Intel
security issue, GDS/"Downfall", whose microcode mitigation slows
gathers down). The reason we don't vectorize is a "cost" one:
t.c:47:21: note: ==> examining statement: _4 = *_3;
t.c:47:21: missed: no array mode for V8DF[20]
t.c:47:21: missed: no array mode for V8DF[20]
t.c:47:21: missed: the size of the group of accesses is not a power of 2 or not equal to 3
t.c:47:21: missed: not falling back to elementwise accesses
t.c:58:15: missed: not vectorized: relevant stmt not supported: _4 = *_3;
t.c:47:21: missed: bad operation or unsupported loop bound.
where we don't consider using gather because we have a known constant
stride (20). Since the stores are really scatters, we don't attempt
SLP either.
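For readers without the testcase at hand, a minimal, hypothetical C sketch of the access pattern the diagnostics describe may help. The names `CELL_ENTRIES`, `group_size_supported`, and `stream_collide`, and the field offsets, are illustrative assumptions, not the actual 519.lbm source or t.c. Each iteration touches only a few fields of a 20-double record, so every access is part of a group with constant stride 20. The helper merely paraphrases the condition quoted in the "missed" note (group size a power of 2 or exactly 3); it is not GCC's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cell size: 20 doubles per record, i.e. the constant
   stride (20) mentioned above. */
enum { CELL_ENTRIES = 20 };

/* Paraphrase of the condition quoted in the "missed" note: a grouped
   access is only handled as a whole if the group size is a power of 2
   or exactly 3. Illustration only, not GCC's implementation. */
static bool group_size_supported(size_t n)
{
    bool pow2 = n != 0 && (n & (n - 1)) == 0;
    return pow2 || n == 3;
}

/* Hypothetical loop with the same shape: each iteration touches only a
   few fields of one 20-double cell, so every load and store is strided
   by CELL_ENTRIES doubles. */
static void stream_collide(const double *src, double *dst, size_t ncells,
                           double omega)
{
    for (size_t i = 0; i < ncells; i++) {
        const double *c = src + i * CELL_ENTRIES;
        double *d = dst + i * CELL_ENTRIES;
        double rho = c[0] + c[1];            /* two strided loads */
        d[0] = c[0] - omega * (c[0] - rho);  /* one strided store */
    }
}
```

With `CELL_ENTRIES` equal to 20, `group_size_supported(20)` is false, which corresponds to the "not a power of 2 or not equal to 3" note above.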
With the above heuristic disabled, we get this vectorized as well,
avoiding gather/scatter by emulating them with element-wise loads and
stores, and using a quite high VF of 8 (with -mprefer-vector-width=256
you get VF 4 and likely faster code in the end). But yes, I doubt that
either the ICC or the clang vectorized code is faster anywhere (though
without specifying a uarch you get generic cost modelling applied).
Maybe SPR (Sapphire Rapids) is not affected by the gather bug and has
reasonable gather and scatter performance (scatter on zen4 is very
slow). The resulting loop:
.L3:
vmovsd 952(%rax), %xmm0
vmovsd -8(%rax), %xmm2
addq $1280, %rsi
addq $1280, %rax
vmovhpd -168(%rax), %xmm0, %xmm1
vmovhpd -1128(%rax), %xmm2, %xmm2
vmovsd -648(%rax), %xmm0
vmovhpd -488(%rax), %xmm0, %xmm0
vinsertf32x4 $0x1, %xmm1, %ymm0, %ymm0
vmovsd -968(%rax), %xmm1
vmovhpd -808(%rax), %xmm1, %xmm1
vinsertf32x4 $0x1, %xmm1, %ymm2, %ymm2
vinsertf64x4 $0x1, %ymm0, %zmm2, %zmm2
vmovsd -320(%rax), %xmm0
vmovhpd -160(%rax), %xmm0, %xmm1
vmovsd -640(%rax), %xmm0
vmovhpd -480(%rax), %xmm0, %xmm0
vinsertf32x4 $0x1, %xmm1, %ymm0, %ymm1
vmovsd -960(%rax), %xmm0
vmovhpd -800(%rax), %xmm0, %xmm8
vmovsd -1280(%rax), %xmm0
vmovhpd -1120(%rax), %xmm0, %xmm0
vinsertf32x4 $0x1, %xmm8, %ymm0, %ymm0
vinsertf64x4 $0x1, %ymm1, %zmm0, %zmm0
vmovsd -312(%rax), %xmm1
vmovhpd -152(%rax), %xmm1, %xmm8
vmovsd -632(%rax), %xmm1
vmovhpd -472(%rax), %xmm1, %xmm1
vinsertf32x4 $0x1, %xmm8, %ymm1, %ymm8
vmovsd -952(%rax), %xmm1
vmovhpd -792(%rax), %xmm1, %xmm9
vmovsd -1272(%rax), %xmm1
vmovhpd -1112(%rax), %xmm1, %xmm1
vinsertf32x4 $0x1, %xmm9, %ymm1, %ymm1
vinsertf64x4 $0x1, %ymm8, %zmm1, %zmm1
vaddpd %zmm1, %zmm0, %zmm0
vaddpd %zmm7, %zmm2, %zmm1
vfnmadd132pd %zmm3, %zmm2, %zmm1
vfmadd132pd %zmm6, %zmm5, %zmm0
valignq $3, %ymm1, %ymm1, %ymm2
vmovlpd %xmm1, -1280(%rsi)
vextractf64x2 $1, %ymm1, %xmm8
vmovhpd %xmm1, -1120(%rsi)
vextractf64x4 $0x1, %zmm1, %ymm1
vmovlpd %xmm1, -640(%rsi)
vmovhpd %xmm1, -480(%rsi)
vmovsd %xmm2, -800(%rsi)
vextractf64x2 $1, %ymm1, %xmm2
vmovsd %xmm8, -960(%rsi)
valignq $3, %ymm1, %ymm1, %ymm1
vmovsd %xmm2, -320(%rsi)
vmovsd %xmm1, -160(%rsi)
vmovsd -320(%rax), %xmm1
vmovhpd -160(%rax), %xmm1, %xmm2
vmovsd -640(%rax), %xmm1
vmovhpd -480(%rax), %xmm1, %xmm1
vinsertf32x4 $0x1, %xmm2, %ymm1, %ymm2
vmovsd -960(%rax), %xmm1
vmovhpd -800(%rax), %xmm1, %xmm8
vmovsd -1280(%rax), %xmm1
vmovhpd -1120(%rax), %xmm1, %xmm1
vinsertf32x4 $0x1, %xmm8, %ymm1, %ymm1
vinsertf64x4 $0x1, %ymm2, %zmm1, %zmm1
vfnmadd132pd %zmm3, %zmm1, %zmm0
vaddpd %zmm4, %zmm0, %zmm0
valignq $3, %ymm0, %ymm0, %ymm1
vmovlpd %xmm0, 14728(%rsi)
vextractf64x2 $1, %ymm0, %xmm2
vmovhpd %xmm0, 14888(%rsi)
vextractf64x4 $0x1, %zmm0, %ymm0
vmovlpd %xmm0, 15368(%rsi)
vmovhpd %xmm0, 15528(%rsi)
vmovsd %xmm1, 15208(%rsi)
vextractf64x2 $1, %ymm0, %xmm1
vmovsd %xmm2, 15048(%rsi)
valignq $3, %ymm0, %ymm0, %ymm0
vmovsd %xmm1, 15688(%rsi)
vmovsd %xmm0, 15848(%rsi)
cmpq %rdx, %rsi
jne .L3
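The loop above builds each 8-lane vector from scalar loads (`vmovsd`/`vmovhpd` pairs combined with `vinsertf32x4`/`vinsertf64x4`) and writes results back lane by lane, instead of using hardware gather/scatter. A minimal C sketch of that emulation, assuming GCC's `vector_size` extension, follows; `v8df`, `STRIDE`, `VF`, `gather_strided`, `scatter_strided`, and `update_8_cells` are hypothetical names. Per vector iteration the pointers advance by VF × 20 doubles × 8 bytes = 8 × 20 × 8 = 1280 bytes, which matches the `addq $1280` in the listing; with 256-bit vectors (-mprefer-vector-width=256) a vector holds 256 / 64 = 4 doubles, hence VF 4.

```c
#include <stddef.h>

/* GCC/Clang vector extension: 8 doubles = 512 bits, i.e. VF 8. */
typedef double v8df __attribute__((vector_size(64)));

/* Hypothetical constants: doubles per cell and vector lanes. */
enum { STRIDE = 20, VF = 8 };

/* Emulated gather: lane k receives base[k * STRIDE]. Compiled for
   AVX-512 this tends to become scalar loads plus lane inserts, much like
   the vmovsd/vmovhpd/vinsertf32x4 sequences in the listing. */
static void gather_strided(v8df *out, const double *base)
{
    v8df v = {0};
    for (int k = 0; k < VF; k++)
        v[k] = base[(size_t)k * STRIDE];
    *out = v;
}

/* Emulated scatter: lane k is stored to base[k * STRIDE], like the
   vextractf64x2/vmovlpd/vmovhpd/vmovsd stores in the listing. */
static void scatter_strided(double *base, const v8df *v)
{
    for (int k = 0; k < VF; k++)
        base[(size_t)k * STRIDE] = (*v)[k];
}

/* One vector iteration of a hypothetical update dst.f0 = src.f0 + src.f1
   over 8 consecutive cells. A caller would then advance both pointers by
   VF * STRIDE doubles, i.e. VF * STRIDE * sizeof(double) = 1280 bytes. */
static void update_8_cells(const double *src, double *dst)
{
    v8df f0, f1, sum;
    gather_strided(&f0, src);
    gather_strided(&f1, src + 1);
    sum = f0 + f1;
    scatter_strided(dst, &sum);
}
```

Whether the compiler really emits insert/extract sequences for these loops depends on the target flags and cost model; the sketch only shows the data movement the listing performs.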