[Bug target/102543] New: -march=cascadelake performs odd alignment peeling

public inbox for gcc-bugs@sourceware.org

* [Bug target/102543] New: -march=cascadelake performs odd alignment peeling
@ 2021-09-30 10:13 rguenth at gcc dot gnu.org
  2021-09-30 10:14 ` [Bug target/102543] " rguenth at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-30 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

            Bug ID: 102543
           Summary: -march=cascadelake performs odd alignment peeling
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

For gcc.dg/torture/pr65270-1.c we choose to misalign an already aligned
store + load combo in order to runtime-align a single load, because we have
(skylake_cost):

  {6, 6, 6, 10, 20},                    /* cost of loading SSE register
                                           in 32bit, 64bit, 128bit, 256bit
                                           and 512bit */
  {8, 8, 8, 12, 24},                    /* cost of storing SSE register
                                           in 32bit, 64bit, 128bit, 256bit
                                           and 512bit */
  {6, 6, 6, 10, 20},                    /* cost of unaligned loads.  */
  {8, 8, 8, 8, 16},                     /* cost of unaligned stores.  */

which means that an unaligned store is costed cheaper than an aligned store
for %ymm (8 vs. 12) and even more so for %zmm (16 vs. 24)!?

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
@ 2021-09-30 10:14 ` rguenth at gcc dot gnu.org
  2021-10-06 15:00 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-30 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
same for icelake_cost.

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
  2021-09-30 10:14 ` [Bug target/102543] " rguenth at gcc dot gnu.org
@ 2021-10-06 15:00 ` rguenth at gcc dot gnu.org
  2021-10-08  9:04 ` crazylht at gmail dot com
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-10-06 15:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Caused by

commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
Date:   Thu Feb 8 23:31:15 2018 +0100

    re PR target/83008 ([performance] Is it better to avoid extra instructions
in data passing between loops?)

            PR target/83008
            * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
            storing integer register in SImode.  Fix cost of 256 and 512
            byte aligned SSE register store.

            * config/i386/i386.c (ix86_multiplication_cost): Fix
            multiplication cost for TARGET_AVX512DQ.

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
  2021-09-30 10:14 ` [Bug target/102543] " rguenth at gcc dot gnu.org
  2021-10-06 15:00 ` rguenth at gcc dot gnu.org
@ 2021-10-08  9:04 ` crazylht at gmail dot com
  2021-10-08  9:49 ` crazylht at gmail dot com
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-10-08  9:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> Caused by
> 
> commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
> Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
> Date:   Thu Feb 8 23:31:15 2018 +0100
> 
>     re PR target/83008 ([performance] Is it better to avoid extra
> instructions in data passing between loops?)
>     
>             PR target/83008
>             * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
>             storing integer register in SImode.  Fix cost of 256 and 512
>             byte aligned SSE register store.
>     
>             * config/i386/i386.c (ix86_multiplication_cost): Fix
>             multiplication cost for TARGET_AVX512DQ.

This patch appears to adjust the cost of the vector and scalar stores, but it
forgot to raise the unaligned SSE store cost to at least that of the aligned
ones.

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-10-08  9:04 ` crazylht at gmail dot com
@ 2021-10-08  9:49 ` crazylht at gmail dot com
  2021-10-08 10:07 ` rguenther at suse dot de
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-10-08  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Richard Biener from comment #2)
> > Caused by
> > 
> > commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
> > Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
> > Date:   Thu Feb 8 23:31:15 2018 +0100
> > 
> >     re PR target/83008 ([performance] Is it better to avoid extra
> > instructions in data passing between loops?)
> >     
> >             PR target/83008
> >             * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
> >             storing integer register in SImode.  Fix cost of 256 and 512
> >             byte aligned SSE register store.
With the change in skylake_cost reverted, pr83008.c still passes. I guess it
was fixed by some other patch?

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-10-08  9:49 ` crazylht at gmail dot com
@ 2021-10-08 10:07 ` rguenther at suse dot de
  2021-10-08 10:43 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2021-10-08 10:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 8 Oct 2021, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
> 
> --- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Hongtao.liu from comment #3)
> > (In reply to Richard Biener from comment #2)
> > > Caused by
> > > 
> > > commit 001e73373e6d2e7c756141e0d7ac8e24ae1574ad
> > > Author: Sergey Shalnov <Sergey.Shalnov@intel.com>
> > > Date:   Thu Feb 8 23:31:15 2018 +0100
> > > 
> > >     re PR target/83008 ([performance] Is it better to avoid extra
> > > instructions in data passing between loops?)
> > >     
> > >             PR target/83008
> > >             * config/i386/x86-tune-costs.h (skylake_cost): Fix cost of
> > >             storing integer register in SImode.  Fix cost of 256 and 512
> > >             byte aligned SSE register store.
> Revert change in skylake_cost, still pass pr83008.c, guess it's fixed by some
> other patch?

Yes, likely.

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-10-08 10:07 ` rguenther at suse dot de
@ 2021-10-08 10:43 ` rguenth at gcc dot gnu.org
  2021-10-11  2:19 ` crazylht at gmail dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-10-08 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the end only benchmarking will tell what is best to do (adjust the aligned
cost or revert the unaligned cost).

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-10-08 10:43 ` rguenth at gcc dot gnu.org
@ 2021-10-11  2:19 ` crazylht at gmail dot com
  2021-10-12 12:53 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-10-11  2:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
SPEC2017 data on CLX seems to be OK after changing the unaligned SSE store cost.

fprate:
  503.bwaves_r    BuildSame
  507.cactuBSSN_r     -0.22
  508.namd_r          -0.02
  510.parest_r        -0.28
  511.povray_r        -0.20
  519.lbm_r       BuildSame
  521.wrf_r           -0.58
  526.blender_r       -0.30
  527.cam4_r           1.07
  538.imagick_r        0.01
  544.nab_r           -0.09
  549.fotonik3d_r BuildSame
  554.roms_r      BuildSame
intrate:
  500.perlbench_r     -0.25
  502.gcc_r           -0.15
  505.mcf_r       BuildSame
  520.omnetpp_r        1.03
  523.xalancbmk_r     -0.13
  525.x264_r          -0.05
  531.deepsjeng_r     -0.27
  541.leela_r         -0.24
  548.exchange2_r     -0.06
  557.xz_r            -0.10
  999.specrand_ir      2.69

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-10-11  2:19 ` crazylht at gmail dot com
@ 2021-10-12 12:53 ` rguenth at gcc dot gnu.org
  2021-10-13  1:20 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-10-12 12:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
I would mostly expect less peeling for alignment to be done (and thus slightly
smaller code size) with the issue fixed.

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-10-12 12:53 ` rguenth at gcc dot gnu.org
@ 2021-10-13  1:20 ` crazylht at gmail dot com
  2021-10-13  7:25 ` rguenther at suse dot de
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-10-13  1:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
I'm curious why we need peeling for unaligned access at all. Unaligned access
instructions also work on aligned addresses, so can't we just mark the
mem_ref as unaligned? (This would be fake, done only so that the back end
generates unaligned instructions.)

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-10-13  1:20 ` crazylht at gmail dot com
@ 2021-10-13  7:25 ` rguenther at suse dot de
  2021-11-19  1:23 ` cvs-commit at gcc dot gnu.org
  2023-11-30  8:53 ` liuhongt at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2021-10-13  7:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Oct 2021, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543
> 
> --- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
> I'm curious why we need peeling for unaligned access, because unaligned access
> instructions should also be available for aligned addresses, can't we just mark
> mem_ref as unaligned (although this is fake, just to generate unaligned
> instructions for the back end only)

The costing is not for movaps vs movups but for movups on aligned vs. 
unaligned storage.  So to make the access fast the costing tells us
that the access has to be actually unaligned.

Anyhow, the vectorizer does not consider actively misaligning accesses
when all of them are known to be aligned.  What happens instead is
that if there is at least one unaligned access, it evaluates the
cost of aligning that access vs. aligning the other accesses,
and the bug makes it appear that aligning a single access is
cheaper than aligning multiple accesses (even if those are already
aligned and thus would require no peeling at all).

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-10-13  7:25 ` rguenther at suse dot de
@ 2021-11-19  1:23 ` cvs-commit at gcc dot gnu.org
  2023-11-30  8:53 ` liuhongt at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-19  1:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:d3152981f71eef16e50246a94819c39ff1489c70

commit r12-5390-gd3152981f71eef16e50246a94819c39ff1489c70
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat Oct 9 09:42:10 2021 +0800

    Reduce cost of aligned sse register store.

    Make them be equal to cost of unaligned ones to avoid odd alignment
    peeling.

    Impact for SPEC2017 on CLX:
    fprate:
      503.bwaves_r    BuildSame
      507.cactuBSSN_r     -0.22
      508.namd_r          -0.02
      510.parest_r        -0.28
      511.povray_r        -0.20
      519.lbm_r       BuildSame
      521.wrf_r           -0.58
      526.blender_r       -0.30
      527.cam4_r           1.07
      538.imagick_r        0.01
      544.nab_r           -0.09
      549.fotonik3d_r BuildSame
      554.roms_r      BuildSame
    intrate:
      500.perlbench_r     -0.25
      502.gcc_r           -0.15
      505.mcf_r       BuildSame
      520.omnetpp_r        1.03
      523.xalancbmk_r     -0.13
      525.x264_r          -0.05
      531.deepsjeng_r     -0.27
      541.leela_r         -0.24
      548.exchange2_r     -0.06
      557.xz_r            -0.10
      999.specrand_ir      2.69

    gcc/ChangeLog:

            PR target/102543
            * config/i386/x86-tune-costs.h (skylake_cost): Reduce cost of
            storing 256/512-bit SSE register to be equal to cost of
            unaligned store to avoid odd alignment peeling.
            (icelake_cost): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr102543.c: New test.

* [Bug target/102543] -march=cascadelake performs odd alignment peeling
  2021-09-30 10:13 [Bug target/102543] New: -march=cascadelake performs odd alignment peeling rguenth at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2021-11-19  1:23 ` cvs-commit at gcc dot gnu.org
@ 2023-11-30  8:53 ` liuhongt at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2023-11-30  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543

liuhongt at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |liuhongt at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #12 from liuhongt at gcc dot gnu.org ---
Fixed in GCC 12 and above.
