public inbox for gcc-bugs@sourceware.org
* [Bug target/102543] New: -march=cascadelake performs odd alignment peeling
From: rguenth at gcc dot gnu.org @ 2021-09-30 10:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
Bug ID: 102543
Summary: -march=cascadelake performs odd alignment peeling
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
For gcc.dg/torture/pr65270-1.c we choose to misalign an already aligned
store + load pair in order to runtime-align a single load, because
skylake_cost has:
  {6, 6, 6, 10, 20},  /* cost of loading SSE register
                         in 32bit, 64bit, 128bit, 256bit and 512bit */
  {8, 8, 8, 12, 24},  /* cost of storing SSE register
                         in 32bit, 64bit, 128bit, 256bit and 512bit */
  {6, 6, 6, 10, 20},  /* cost of unaligned loads.  */
  {8, 8, 8, 8, 16},   /* cost of unaligned stores.  */
which means that an unaligned store is cheaper than an aligned store for
%ymm and even more so for %zmm!??
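The inversion in the store rows can be checked mechanically. The following is only a minimal sketch: the array names and the helper count_inverted are hypothetical and are not part of x86-tune-costs.h; the numbers are copied from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>

/* Widths covered by each cost row: 32, 64, 128, 256 and 512 bits.  */
enum { N_WIDTHS = 5 };

/* Values copied from the skylake_cost excerpt; the array names are
   hypothetical and do not exist in GCC.  */
static const int aligned_load[N_WIDTHS]    = {6, 6, 6, 10, 20};
static const int unaligned_load[N_WIDTHS]  = {6, 6, 6, 10, 20};
static const int aligned_store[N_WIDTHS]   = {8, 8, 8, 12, 24};
static const int unaligned_store[N_WIDTHS] = {8, 8, 8, 8, 16};

/* Hypothetical helper: count the widths at which the unaligned cost is
   strictly lower than the aligned cost.  */
static size_t
count_inverted (const int *aligned, const int *unaligned, size_t n)
{
  assert (n > 0);
  size_t inverted = 0;
  for (size_t i = 0; i < n; i++)
    if (unaligned[i] < aligned[i])
      inverted++;
  return inverted;
}
```

Calling count_inverted on the two store rows reports two inverted widths (256 and 512 bits), while the two load rows report none.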
From: rguenth at gcc dot gnu.org @ 2021-09-30 10:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
same for icelake_cost.
From: rguenth at gcc dot gnu.org @ 2021-10-06 15:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Caused by
commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
Date: Thu Feb 8 23:31:15 2018 +0100
re PR target/83008 ([performance] Is it better to avoid extra instructions
in data passing between loops?)
PR target/83008
* config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
storing integer register in SImode. Fix cost of 256 and 512
byte aligned SSE register store.
* config/i386/i386.c (ix86_multiplication_cost): Fix
multiplication cost for TARGET_AVX512DQ.
From: crazylht at gmail dot com @ 2021-10-08 9:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> Caused by
>
> commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
> Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
> Date: Thu Feb 8 23:31:15 2018 +0100
>
> re PR target/83008 ([performance] Is it better to avoid extra
> instructions in data passing between loops?)
>
> PR target/83008
> * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
> storing integer register in SImode. Fix cost of 256 and 512
> byte aligned SSE register store.
>
> * config/i386/i386.c (ix86_multiplication_cost): Fix
> multiplication cost for TARGET_AVX512DQ.
This patch looks like it adjusts the cost of the vector and scalar stores,
but it forgot to raise the unaligned SSE store cost to at least that of the
aligned ones.
From: crazylht at gmail dot com @ 2021-10-08 9:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Richard Biener from comment #2)
> > Caused by
> >
> > commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
> > Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
> > Date: Thu Feb 8 23:31:15 2018 +0100
> >
> > re PR target/83008 ([performance] Is it better to avoid extra
> > instructions in data passing between loops?)
> >
> > PR target/83008
> > * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
> > storing integer register in SImode. Fix cost of 256 and 512
> > byte aligned SSE register store.
After reverting the change in skylake_cost, pr83008.c still passes. I guess it
was fixed by some other patch?
From: rguenther at suse dot de @ 2021-10-08 10:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 8 Oct 2021, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
>
> --- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Hongtao.liu from comment #3)
> > (In reply to Richard Biener from comment #2)
> > > Caused by
> > >
> > > commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
> > > Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
> > > Date: Thu Feb 8 23:31:15 2018 +0100
> > >
> > > re PR target/83008 ([performance] Is it better to avoid extra
> > > instructions in data passing between loops?)
> > >
> > > PR target/83008
> > > * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
> > > storing integer register in SImode. Fix cost of 256 and 512
> > > byte aligned SSE register store.
> Revert change in skylake_cost, still pass pr83008.c, guess it's fixed by some
> other patch?
Yes, likely.
From: rguenth at gcc dot gnu.org @ 2021-10-08 10:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the end only benchmarking will tell what is best to do (adjust the aligned
cost or revert the unaligned cost).
From: crazylht at gmail dot com @ 2021-10-11 2:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
SPEC2017 data on CLX seems to be OK after changing the unaligned SSE store cost.
fprate:
503.bwaves_r BuildSame
507.cactuBSSN_r -0.22
508.namd_r -0.02
510.parest_r -0.28
511.povray_r -0.20
519.lbm_r BuildSame
521.wrf_r -0.58
526.blender_r -0.30
527.cam4_r 1.07
538.imagick_r 0.01
544.nab_r -0.09
549.fotonik3d_r BuildSame
554.roms_r BuildSame
intrate:
500.perlbench_r -0.25
502.gcc_r -0.15
505.mcf_r BuildSame
520.omnetpp_r 1.03
523.xalancbmk_r -0.13
525.x264_r -0.05
531.deepsjeng_r -0.27
541.leela_r -0.24
548.exchange2_r -0.06
557.xz_r -0.10
999.specrand_ir 2.69
From: rguenth at gcc dot gnu.org @ 2021-10-12 12:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
I would mostly expect less peeling for alignment to be done (and thus slightly
smaller code size) with the issue fixed.
From: crazylht at gmail dot com @ 2021-10-13 1:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
I'm curious why we need peeling for unaligned accesses at all. Unaligned access
instructions also work on aligned addresses, so can't we just mark the mem_ref
as unaligned (this would be fake, only there to make the back end generate
unaligned instructions)?
From: rguenther at suse dot de @ 2021-10-13 7:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Oct 2021, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
>
> --- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
> I'm curious why we need peeling for unaligned access, because unaligned access
> instructions should also be available for aligned addresses, can't we just mark
> mem_ref as unaligned (although this is fake, just to generate unaligned
> instructions for the back end only)
The costing is not for movaps vs. movups but for movups on aligned vs.
unaligned storage. So the costing tells us that, to make the access
fast, it has to actually be unaligned.
Anyhow, the vectorizer does not consider actively misaligning when
all accesses are known to be aligned. What happens instead is that,
if there is at least one unaligned access, it evaluates the cost of
aligning that access vs. aligning the other accesses, and the bug
makes it appear that aligning a single access is cheaper than
aligning multiple accesses (even if those are already aligned and
thus would require no peeling at all).
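The comparison described here can be sketched as a toy model. This is not the vectorizer's actual code: the struct and function names are hypothetical, prologue and epilogue costs are ignored, and it assumes that peeling to align one access leaves every other access unaligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-access costs for one vector width.  */
struct access_costs
{
  int aligned_load, unaligned_load;
  int aligned_store, unaligned_store;
};

/* Hypothetical description of one memory access in the loop.  */
struct access
{
  bool is_store;
  bool is_aligned;
};

static int
access_cost (const struct access_costs *c, bool is_store, bool is_aligned)
{
  if (is_store)
    return is_aligned ? c->aligned_store : c->unaligned_store;
  return is_aligned ? c->aligned_load : c->unaligned_load;
}

/* Cost of the loop body with every access left as it is.  */
static int
cost_without_peeling (const struct access_costs *c,
                      const struct access *a, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += access_cost (c, a[i].is_store, a[i].is_aligned);
  return total;
}

/* Cost after peeling so that access TARGET becomes aligned.
   Simplifying assumption: every other access ends up unaligned.  */
static int
cost_peeling_for (const struct access_costs *c,
                  const struct access *a, size_t n, size_t target)
{
  assert (target < n);
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += access_cost (c, a[i].is_store, i == target);
  return total;
}
```

Take an aligned store, an aligned load and one unaligned load. With the 256-bit skylake_cost values from the report (loads 10, aligned store 12, unaligned store 8), leaving the loop alone costs 32 in this model and peeling for the single load costs 28, so the peeled variant looks cheaper. With the aligned store cost at 8, both options come to 28 and the apparent gain from misaligning the store disappears.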
From: cvs-commit at gcc dot gnu.org @ 2021-11-19 1:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:d3152981f71eef16e50246a94819c39ff1489c70
commit r12-5390-gd3152981f71eef16e50246a94819c39ff1489c70
Author: liuhongt <hongtao.liu@intel.com>
Date: Sat Oct 9 09:42:10 2021 +0800
Reduce cost of aligned sse register store.
Make them be equal to cost of unaligned ones to avoid odd alignment
peeling.
Impact for SPEC2017 on CLX:
fprate:
503.bwaves_r BuildSame
507.cactuBSSN_r -0.22
508.namd_r -0.02
510.parest_r -0.28
511.povray_r -0.20
519.lbm_r BuildSame
521.wrf_r -0.58
526.blender_r -0.30
527.cam4_r 1.07
538.imagick_r 0.01
544.nab_r -0.09
549.fotonik3d_r BuildSame
554.roms_r BuildSame
intrate:
500.perlbench_r -0.25
502.gcc_r -0.15
505.mcf_r BuildSame
520.omnetpp_r 1.03
523.xalancbmk_r -0.13
525.x264_r -0.05
531.deepsjeng_r -0.27
541.leela_r -0.24
548.exchange2_r -0.06
557.xz_r -0.10
999.specrand_ir 2.69
gcc/ChangeLog:
PR target/102543
* config/i386/x86-tune-costs.h (skylake_cost): Reduce cost of
storing 256/512-bit SSE register to be equal to cost of
unaligned store to avoid odd alignment peeling.
(icelake_cost): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr102543.c: New test.
From: liuhongt at gcc dot gnu.org @ 2023-11-30 8:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
liuhongt at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
CC| |liuhongt at gcc dot gnu.org
Resolution|--- |FIXED
--- Comment #12 from liuhongt at gcc dot gnu.org ---
Fixed in GCC 12 and later.