[Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step
@ 2021-12-10  9:03 husseydevin at gmail dot com
  2021-12-10  9:25 ` [Bug rtl-optimization/103641] " marxin at gcc dot gnu.org
                   ` (36 more replies)
  0 siblings, 37 replies; 38+ messages in thread
From: husseydevin at gmail dot com @ 2021-12-10  9:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

            Bug ID: 103641
           Summary: [aarch64][11 regression] Severe compile time
                    regression in SLP vectorize step
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: husseydevin at gmail dot com
  Target Milestone: ---

Created attachment 51966
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51966&action=edit
aarch64-linux-gnu-gcc-11 -O3 -c xxhash.c -ftime-report -ftime-report-details

While GCC 11.2 has been noticeably better at NEON64 code, with some files it
hangs for 15-30 seconds or more on the SLP vectorization step.

I haven't narrowed this down to a specific thing yet because I don't know much
about the GCC internals, but it is *extremely* noticeable in the xxHash
library. (https://github.com/Cyan4973/xxHash).

This is a test compiling xxhash.c from Git revision
a17161efb1d2de151857277628678b0e0b486155.

This was done on a Core i5-430m with 8GB RAM and an SSD on Debian Bullseye
amd64. GCC 10 (10.2.1-6) was from the repos, GCC 11 (11.2.0) was built from the
tarball with similar flags. While this may cause bias, the two compilers get
very similar times when the SLP vectorizer is off.

$ time aarch64-linux-gnu-gcc-10 -O3 -c xxhash.c

real    0m3.596s
user    0m3.270s
sys     0m0.149s
$ time aarch64-linux-gnu-gcc-11 -O3 -c xxhash.c

real    0m31.579s
user    0m31.314s
sys     0m0.112s

When disabling the NEON intrinsics with `-DXXH_VECTOR=0`, it only takes ~21
seconds. 

Time variable                                   usr           sys          wall           GGC
 phase opt and generate             :  31.46 ( 97%)   0.24 ( 32%)  31.80 ( 96%)    54M ( 63%)
 callgraph functions expansion      :  31.01 ( 96%)   0.18 ( 24%)  31.29 ( 94%)    42M ( 49%)
 tree slp vectorization             :  28.35 ( 88%)   0.03 (  4%)  28.37 ( 85%)  9941k ( 11%)

 TOTAL                              :  32.34          0.75         33.20           86M

This is significantly worse on my Pi 4B, where an ARMv7->AArch64 build took 3
minutes, although I presume that is mostly due to being 32-bit and the CPU
being much slower.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug rtl-optimization/103641] [aarch64][11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
@ 2021-12-10  9:25 ` marxin at gcc dot gnu.org
  2021-12-10  9:37 ` [Bug tree-optimization/103641] " pinskia at gcc dot gnu.org
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-12-10  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-12-10
             Status|UNCONFIRMED                 |WAITING

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Can you please attach pre-processed source file (-E) that can be consumed by
both GCC 10 and 11? Can you confirm the time difference also for it?

* [Bug tree-optimization/103641] [aarch64][11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
  2021-12-10  9:25 ` [Bug rtl-optimization/103641] " marxin at gcc dot gnu.org
@ 2021-12-10  9:37 ` pinskia at gcc dot gnu.org
  2021-12-10  9:43 ` [Bug tree-optimization/103641] [11/12 " pinskia at gcc dot gnu.org
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=101028

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
30 seconds is not too bad really. Though we should look into it.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
  2021-12-10  9:25 ` [Bug rtl-optimization/103641] " marxin at gcc dot gnu.org
  2021-12-10  9:37 ` [Bug tree-optimization/103641] " pinskia at gcc dot gnu.org
@ 2021-12-10  9:43 ` pinskia at gcc dot gnu.org
  2021-12-10  9:46 ` pinskia at gcc dot gnu.org
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10  9:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.3
             Target|                            |aarch64-*-linux-gnu
            Summary|[aarch64][11 regression]    |[11/12 regression] Severe
                   |Severe compile time         |compile time regression in
                   |regression in SLP vectorize |SLP vectorize step
                   |step                        |
             Status|WAITING                     |NEW

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
 tree slp vectorization             :  53.92 ( 83%)   0.00 (  0%)  53.90 ( 82%)    19M ( 21%)

Still happens on the trunk.

Will provide the preprocessed source in a few minutes.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (2 preceding siblings ...)
  2021-12-10  9:43 ` [Bug tree-optimization/103641] [11/12 " pinskia at gcc dot gnu.org
@ 2021-12-10  9:46 ` pinskia at gcc dot gnu.org
  2021-12-10  9:56 ` pinskia at gcc dot gnu.org
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 51967
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51967&action=edit
preprocessed source

preprocessed source from gcc:
gcc version 12.0.0 20211118 (experimental) [master r12-5363] (GCC)

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (3 preceding siblings ...)
  2021-12-10  9:46 ` pinskia at gcc dot gnu.org
@ 2021-12-10  9:56 ` pinskia at gcc dot gnu.org
  2021-12-10 10:01 ` marxin at gcc dot gnu.org
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10  9:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
>  tree slp vectorization             :  53.92 ( 83%)   0.00 (  0%)  53.90 ( 82%)    19M ( 21%)
> 
> Still happens on the trunk.

I should say this is with checking and also on an aarch64 machine itself.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (4 preceding siblings ...)
  2021-12-10  9:56 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:01 ` marxin at gcc dot gnu.org
  2021-12-10 10:02 ` pinskia at gcc dot gnu.org
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-12-10 10:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #6 from Martin Liška <marxin at gcc dot gnu.org> ---
How does one enable NEON? What -mcpu -mtune do you use?

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (5 preceding siblings ...)
  2021-12-10 10:01 ` marxin at gcc dot gnu.org
@ 2021-12-10 10:02 ` pinskia at gcc dot gnu.org
  2021-12-10 10:03 ` pinskia at gcc dot gnu.org
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  80.80%  cc1      cc1                [.] synth_mult

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (6 preceding siblings ...)
  2021-12-10 10:02 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:03 ` pinskia at gcc dot gnu.org
  2021-12-10 10:06 ` marxin at gcc dot gnu.org
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #6)
> How does one enable NEON? What -mcpu -mtune do you use?

NEON is enabled by default :).

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (7 preceding siblings ...)
  2021-12-10 10:03 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:06 ` marxin at gcc dot gnu.org
  2021-12-10 10:08 ` pinskia at gcc dot gnu.org
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-12-10 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Martin Liška from comment #6)
> > How does one enable NEON? What -mcpu -mtune do you use?
> 
> NEON is enabled by default :).

I have a cross compiler that can't consume the provided pre-processed source
file:

$ marxin@marxinbox:~/Programming/testcases> aarch64-suse-linux-gcc-11 -v
Using built-in specs.
COLLECT_GCC=aarch64-suse-linux-gcc-11
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/aarch64-suse-linux/11/lto-wrapper
Target: aarch64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
--enable-languages=c,c++ --enable-checking=release --disable-werror
--with-gxx-include-dir=/usr/include/c++/11 --enable-ssp --disable-libssp
--disable-libvtv --enable-cet=auto --disable-libcc1 --disable-plugin
--with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux'
--with-slibdir=/usr/aarch64-suse-linux/sys-root/lib64 --with-system-zlib
--enable-libstdcxx-allocator=new --disable-libstdcxx-pch
--enable-version-specific-runtime-libs --with-gcc-major-version-only
--enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function
--program-suffix=-11 --program-prefix=aarch64-suse-linux-
--target=aarch64-suse-linux --disable-nls
--with-sysroot=/usr/aarch64-suse-linux/sys-root
--with-build-sysroot=/usr/aarch64-suse-linux/sys-root
--with-build-time-tools=/usr/aarch64-suse-linux/bin
--enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419
--disable-libsanitizer --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.1 20211124 [revision 7510c23c1ec53aa4a62705f0384079661342ff7b]
(SUSE Linux) 

$ aarch64-suse-linux-gcc-11 xxhash.i -c -fmax-errors=5
In file included from xxhash.h:2710,
                 from xxhash.c:43:
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:33:9:
error: unknown '#pragma GCC aarch64' option 'arm_neon.h'
In file included from xxhash.h:2710,
                 from xxhash.c:43:
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:
In function 'vaddl_u8':
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:401:10:
error: incompatible types when returning type 'int' but 'uint16x8_t' was
expected
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:
In function 'vaddl_u16':
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:408:10:
error: incompatible types when returning type 'int' but 'uint32x4_t' was
expected
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:
In function 'vaddl_u32':
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:415:10:
error: incompatible types when returning type 'int' but 'uint64x2_t' was
expected
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:
In function 'vaddl_high_u8':
/home/ubuntu/upstream-gcc/lib/gcc/aarch64-unknown-linux-gnu/12.0.0/include/arm_neon.h:443:10:
error: incompatible types when returning type 'int' but 'uint16x8_t' was
expected

That's why I'm asking.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (8 preceding siblings ...)
  2021-12-10 10:06 ` marxin at gcc dot gnu.org
@ 2021-12-10 10:08 ` pinskia at gcc dot gnu.org
  2021-12-10 10:09 ` pinskia at gcc dot gnu.org
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #6)
> How does one enable NEON? What -mcpu -mtune do you use?

I was just using -march=armv8-a -mtune=generic -mcpu=generic -O3 to see that
most of the time was in synth_mult.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (9 preceding siblings ...)
  2021-12-10 10:08 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:09 ` pinskia at gcc dot gnu.org
  2021-12-10 10:12 ` marxin at gcc dot gnu.org
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #9)
> (In reply to Andrew Pinski from comment #8)
> > (In reply to Martin Liška from comment #6)
> > > How does one enable NEON? What -mcpu -mtune do you use?
> > 
> > NEON is enabled by default :).
> 
> I have a cross compiler that can't consume the provided pre-processed source
> file:

because the preprocessed source is from the trunk :)

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (10 preceding siblings ...)
  2021-12-10 10:09 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:12 ` marxin at gcc dot gnu.org
  2021-12-10 10:12 ` pinskia at gcc dot gnu.org
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-12-10 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #12 from Martin Liška <marxin at gcc dot gnu.org> ---
> because the preprocessed source is from the trunk :)

So please create one that both GCC 10 and GCC 11 can happily consume.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (11 preceding siblings ...)
  2021-12-10 10:12 ` marxin at gcc dot gnu.org
@ 2021-12-10 10:12 ` pinskia at gcc dot gnu.org
  2021-12-10 10:14 ` pinskia at gcc dot gnu.org
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #11)
> because the preprocessed source is from the trunk :)

The trunk changed how arm_neon.h is handled so most everything is internal to
GCC rather than having implementation in the header file so it is easier to
make changes and such.

* [Bug tree-optimization/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (12 preceding siblings ...)
  2021-12-10 10:12 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:14 ` pinskia at gcc dot gnu.org
  2021-12-10 10:15 ` [Bug middle-end/103641] " pinskia at gcc dot gnu.org
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 51968
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51968&action=edit
GCC 10 preprocessed source

I don't have GCC 11 installed.

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (13 preceding siblings ...)
  2021-12-10 10:14 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:15 ` pinskia at gcc dot gnu.org
  2021-12-10 10:24 ` pinskia at gcc dot gnu.org
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The problem is obvious on the trunk really: something made synth_mult really,
really slow.
It was much faster in GCC 10 even:
  15.50%  cc1      cc1                [.] synth_mult

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (14 preceding siblings ...)
  2021-12-10 10:15 ` [Bug middle-end/103641] " pinskia at gcc dot gnu.org
@ 2021-12-10 10:24 ` pinskia at gcc dot gnu.org
  2021-12-10 10:28 ` pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=65951

--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
PR 65951 added the support to the vectorizer in the first place but I suspect
what changed between GCC 10 and GCC 11 were costs rather than anything else ...

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (15 preceding siblings ...)
  2021-12-10 10:24 ` pinskia at gcc dot gnu.org
@ 2021-12-10 10:28 ` pinskia at gcc dot gnu.org
  2021-12-10 13:17 ` roger at nextmovesoftware dot com
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-10 10:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
My bet is r11-7153-ga11ef53238c8ebaab9a3f exposed the issue.  That is, the
different cost model for the multiply exposed the compile-time issue inside
synth_mult (which does not shock me, given PR 87256 and all).

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (16 preceding siblings ...)
  2021-12-10 10:28 ` pinskia at gcc dot gnu.org
@ 2021-12-10 13:17 ` roger at nextmovesoftware dot com
  2021-12-10 13:19 ` husseydevin at gmail dot com
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-12-10 13:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #18 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hm.  PR 87256 was much easier to diagnose, as the cost of multiplication on
hppa64 was 20-30 ALU instructions, so a single pathological multiplier would
combinatorially challenge synth_mult.  The new costs on AArch64 have a vector
multiplication cost of 4, which is very reasonable.  But I did see somewhere in
Andre's patch an ALU cost of COSTS_N_INSNS(0), which would fool synth_mult
into thinking it's allowed/beneficial to use an infinite number of ALU
instructions to avoid a single multiplication.  Likewise, the very cheap (free)
shifts on ARM.  But this is just guesswork; a reduced testcase that exhibits
the problem would be helpful.
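
Why a zero ALU cost is dangerous can be seen in a toy model.  The sketch
below is not GCC's synth_mult (which considers more alternatives and caches
results); it is a deliberately simplified recursive shift/add/subtract
search, with all names hypothetical, that only shows how a cost budget prunes
the search and how zero-cost operations defeat that pruning.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Toy model only; this is NOT GCC's synth_mult. */

static unsigned long toy_nodes;   /* search nodes visited so far */

/* Return the cheapest shift/add/sub cost found for multiplying by t
   within `budget`, or INT_MAX if no sequence fits in the budget. */
int toy_synth(uint64_t t, int budget, int shift_cost, int add_cost)
{
    assert(t != 0);               /* the toy search only handles t >= 1 */
    toy_nodes++;
    if (t == 1)
        return 0;                 /* multiplying by 1 needs no operation */

    if ((t & 1) == 0) {
        /* Even: strip all trailing zero bits with one shift. */
        if (budget < shift_cost)
            return INT_MAX;       /* pruned: cannot afford the shift */
        while ((t & 1) == 0)
            t >>= 1;
        int sub = toy_synth(t, budget - shift_cost, shift_cost, add_cost);
        return sub == INT_MAX ? INT_MAX : sub + shift_cost;
    }

    /* Odd and > 1: try both t-1 (then add x) and t+1 (then subtract x). */
    if (budget < add_cost)
        return INT_MAX;           /* pruned: cannot afford the add/sub */
    int best = INT_MAX;
    int lo = toy_synth(t - 1, budget - add_cost, shift_cost, add_cost);
    if (lo != INT_MAX)
        best = lo + add_cost;
    if (t != UINT64_MAX) {        /* avoid wrapping t + 1 to zero */
        int hi = toy_synth(t + 1, budget - add_cost, shift_cost, add_cost);
        if (hi != INT_MAX && hi + add_cost < best)
            best = hi + add_cost;
    }
    return best;
}

/* Run one search and report how many nodes it visited. */
unsigned long toy_count_nodes(uint64_t t, int budget,
                              int shift_cost, int add_cost)
{
    toy_nodes = 0;
    (void)toy_synth(t, budget, shift_cost, add_cost);
    return toy_nodes;
}
```

With unit costs the budget shrinks at every step and deep branches are cut
off early; with zero costs the budget never shrinks, so the same call visits
every branch regardless of the limit passed in.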

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (17 preceding siblings ...)
  2021-12-10 13:17 ` roger at nextmovesoftware dot com
@ 2021-12-10 13:19 ` husseydevin at gmail dot com
  2022-01-18 14:10 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: husseydevin at gmail dot com @ 2021-12-10 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #19 from Devin Hussey <husseydevin at gmail dot com> ---
> The new costs on AArch64 have a vector multiplication cost of 4, which is very reasonable.

Would this include multv2di3 by any chance?

Because another thing I noticed is that GCC is also trying to multiply 64-bit
numbers like it's free but it just ends up scalarizing.
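
For illustration only, a hypothetical reduced shape of such code might look
like the sketch below: two adjacent 64-bit lanes multiplied by the same
constant, which is the kind of straight-line group the SLP vectorizer may try
to turn into a V2DI multiply.  AArch64 Advanced SIMD has no vector multiply
for 64-bit elements, so such a multiply has to be synthesized or left scalar.
The function name and the constant are assumptions, and this has not been
confirmed to reproduce the slowdown.

```c
#include <stdint.h>

/* Arbitrary odd 64-bit multiplier, chosen only for illustration. */
#define TOY_MULTIPLIER 0x9E3779B185EBCA87ULL

/* Multiply two adjacent 64-bit lanes by the same constant.
   Hypothetical candidate for a reduced testcase; not verified to
   trigger the compile-time regression. */
void toy_mul_pair(uint64_t out[2], const uint64_t in[2])
{
    out[0] = in[0] * TOY_MULTIPLIER;
    out[1] = in[1] * TOY_MULTIPLIER;
}
```

Compiling a file like this with -O3 -ftime-report, as in the original report,
would show whether the "tree slp vectorization" line grows.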

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (18 preceding siblings ...)
  2021-12-10 13:19 ` husseydevin at gmail dot com
@ 2022-01-18 14:10 ` rguenth at gcc dot gnu.org
  2022-01-22 14:30 ` roger at nextmovesoftware dot com
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-18 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (19 preceding siblings ...)
  2022-01-18 14:10 ` rguenth at gcc dot gnu.org
@ 2022-01-22 14:30 ` roger at nextmovesoftware dot com
  2022-01-24  8:13 ` rguenther at suse dot de
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-01-22 14:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #20 from Roger Sayle <roger at nextmovesoftware dot com> ---
IMHO, the problem is in tree-vect-patterns.cc's vect_synth_mult_by_constant.
The comment above line 3054 reads:
  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
     The vectorizer's benefit analysis will decide whether it's beneficial
     to do this.  */
  bool possible = choose_mult_variant (mode, hwval, &alg, &variant, MAX_COST);

By using MAX_COST here, synth_mult is being allowed to take an unbounded
amount of time, considering all possible permutations/implementations to
find an optimal synthetic multiply sequence.  A more pragmatic bound might
be to compare the target's vector_multiply cost, or failing that use an
arbitrary, but reasonable limit, say COSTS_N_INSNS(8) machine instructions.
In the worst case, if it takes 100 instructions to do a vector multiply,
then the loop probably shouldn't be vectorized.
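
A minimal sketch of the bound described above, assuming a hypothetical helper
that is handed the target's vector multiply cost (or a non-positive value when
none is known).  The helper name and the local TOY_COSTS_N_INSNS macro are
stand-ins, not GCC's actual code, and the scale factor is an assumption.

```c
/* Local stand-in for GCC's COSTS_N_INSNS; the scale factor is an assumption. */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Pick a finite search limit to pass instead of MAX_COST:
   use the target's vector multiply cost when it is known (> 0),
   otherwise fall back to an arbitrary limit of 8 instructions. */
int toy_mult_cost_limit(int vector_mult_cost)
{
    if (vector_mult_cost > 0)
        return vector_mult_cost;
    return TOY_COSTS_N_INSNS(8);
}
```

The point is only that the limit is always finite, so the search for a
multiply sequence cannot run without a bound.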

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (20 preceding siblings ...)
  2022-01-22 14:30 ` roger at nextmovesoftware dot com
@ 2022-01-24  8:13 ` rguenther at suse dot de
  2022-01-24 16:49 ` roger at nextmovesoftware dot com
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenther at suse dot de @ 2022-01-24  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sat, 22 Jan 2022, roger at nextmovesoftware dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641
> 
> --- Comment #20 from Roger Sayle <roger at nextmovesoftware dot com> ---
> IMHO, the problem is in tree-vect-patterns.cc's vect_synth_mult_by_constant.
> The comment above line 3054 reads:
>   /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
>      The vectorizer's benefit analysis will decide whether it's beneficial
>      to do this.  */
>   bool possible = choose_mult_variant (mode, hwval, &alg, &variant, MAX_COST);
> 
> By using MAX_COST here, synth_mult is being allowed to take an unbounded
> amount of time, considering all possible permutations/implementations to
> find an optimal synthetic multiply sequence.  A more pragmatic bound might
> be to compare the target's vector_multiply cost, or failing that use an
> arbitrary, but reasonable limit, say COSTS_N_INSNS(8) machine instructions.
> In the worst case, if it takes 100 instructions to do a vector multiply,
> then the loop probably shouldn't be vectorized.

Is there a way to switch synth_mult to number of insn based costs?
Like using -Os metrics?  And would that improve things here?

I agree that an unbounded search is bad, but as the comment explains
we want to delay costing to the vectorizer cost evaluation time ...

But sure, setting an upper bound to limit compile time still sounds
reasonable.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (21 preceding siblings ...)
  2022-01-24  8:13 ` rguenther at suse dot de
@ 2022-01-24 16:49 ` roger at nextmovesoftware dot com
  2022-01-24 17:02 ` roger at nextmovesoftware dot com
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-01-24 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #22 from Roger Sayle <roger at nextmovesoftware dot com> ---
I completely agree with Richard that the decision to vectorize or not to
vectorize should be made elsewhere, taking the whole function/loop into account.
It's quite reasonable to synthesize a slow vector multiply if there's an
overall benefit from SLP.  What I think is required is that the "baseline" cost
should be the cost of moving from the vector to a scalar mode, performing the
multiplication(s) as a scalar and moving the result back again.  i.e. we're
assuming that we're always going to multiply the value in a vector register,
we're just choosing the cheapest implementation for it.  For the xxhash.i
testcase, I'm seeing DI mode multiplications with COSTS_N_INSNS(30) [i.e. a
mult_cost of 120]. Even with slow inter-unit moves it must be possible to do
this faster on AArch64?  In fact, we'll probably vectorize more in SLP, if we
have the option to shuffle data back to the scalar multiplier if required.
Perhaps even a define_insn_and_split of mulv2di3 to fool the middle-end into
thinking we can do this "natively" via an optab.

Note that multipliers used in cryptographic hash functions are sometimes
(chosen to be) pathological to synth_mult.  Like the design of DES' sboxes,
these are coefficients designed to be slow to implement in software [and faster
in custom hardware]: 64-bit values with around 32 (random) bits set.

I/we can try to speed up the recursion in synth_mult, and/or increase the size
of the hash-table cache [which will help hppa64 and other targets with slow
multipliers] but that's perhaps just working around the deeper issue with this
PR.
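
As a rough illustration of why a cache helps the recursion, the sketch below
memoizes a toy shift/add cost function on the multiplier value.  This is an
assumption-laden model rather than GCC's code: memo_synth_cost is a
hypothetical name, each operation costs one unit, and the real cache is also
keyed on things like the mode, which this sketch ignores.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>

// Toy memoized cost of computing x * t with shifts, adds and subtracts.
// `cache` maps a multiplier to its already-computed cost so that each
// distinct subproblem is solved only once.
int memo_synth_cost (std::uint64_t t,
                     std::unordered_map<std::uint64_t, int> &cache)
{
  if (t <= 1)
    return 0;                   // x * 0 and x * 1 are treated as free

  auto hit = cache.find (t);
  if (hit != cache.end ())
    return hit->second;         // reuse an earlier result

  int cost;
  if ((t & 1) == 0)
    {
      // Even multiplier: one shift removes all trailing zero bits.
      std::uint64_t odd = t;
      while ((odd & 1) == 0)
        odd >>= 1;
      cost = 1 + memo_synth_cost (odd, cache);
    }
  else
    {
      // Odd multiplier: try x * (t - 1) + x and x * (t + 1) - x.
      cost = 1 + std::min (memo_synth_cost (t - 1, cache),
                           memo_synth_cost (t + 1, cache));
    }

  cache[t] = cost;
  return cost;
}
```

In this toy model the subproblems reachable from a 64-bit multiplier are only
values close to t shifted right by some amount, so the cache stays small even
without a cost bound.  Whether the same holds for the real search, with its
extra alternatives, is not something this sketch can show.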

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (22 preceding siblings ...)
  2022-01-24 16:49 ` roger at nextmovesoftware dot com
@ 2022-01-24 17:02 ` roger at nextmovesoftware dot com
  2022-01-25  7:23 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-01-24 17:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #23 from Roger Sayle <roger at nextmovesoftware dot com> ---
In fact I can see from my debugging logs that doing a DImode scalar
multiplication on AArch64 is never more than COSTS_N_INSNS(9) [mult_cost=36],
so doing this is a win if moving back and forth is cheaper than
COSTS_N_INSNS(21)!
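
The arithmetic behind that break-even point can be written out as a short
sketch.  The helper names are hypothetical; the factor of four is the one
implied by the numbers quoted above (COSTS_N_INSNS(30) giving a mult_cost of
120), and a single scalar multiply per vector operation is assumed for
simplicity.

```cpp
// Scaling implied by the thread: COSTS_N_INSNS(30) == 120.
constexpr int costs_n_insns (int n)
{
  return n * 4;
}

// True when moving to a scalar register, multiplying there and moving
// back is strictly cheaper than the synthesized vector sequence.
// All three arguments are in the same cost units.
constexpr bool scalar_round_trip_wins (int synth_cost,
                                       int scalar_mult_cost,
                                       int round_trip_move_cost)
{
  return scalar_mult_cost + round_trip_move_cost < synth_cost;
}

// The figures from the comments: a synthesized DImode multiply at
// COSTS_N_INSNS(30) against a scalar multiply of at most COSTS_N_INSNS(9).
static_assert (costs_n_insns (30) - costs_n_insns (9) == costs_n_insns (21),
               "the moves may cost up to COSTS_N_INSNS(21) before losing");
```

With these numbers, scalar_round_trip_wins (120, 36, moves) holds for any
round-trip move cost below 84, that is, below COSTS_N_INSNS(21).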

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (23 preceding siblings ...)
  2022-01-24 17:02 ` roger at nextmovesoftware dot com
@ 2022-01-25  7:23 ` rguenth at gcc dot gnu.org
  2022-01-25  7:52 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-25  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another note, a quick look at synth_mult shows that it should support vector
modes just fine but we are passing it the scalar mode.  We do know the
vector type that's going to be used so we should better pass down its mode
I think, given vector shifts might not even be supported (and thus they
hopefully will have prohibitive costs).

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (24 preceding siblings ...)
  2022-01-25  7:23 ` rguenth at gcc dot gnu.org
@ 2022-01-25  7:52 ` rguenth at gcc dot gnu.org
  2022-02-04  7:26 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-25  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index bea04992160..856b8bd222e 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3050,13 +3050,13 @@ vect_synth_mult_by_constant (vec_info *vinfo, tree op, tree val,
   /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
      The vectorizer's benefit analysis will decide whether it's beneficial
      to do this.  */
-  bool possible = choose_mult_variant (mode, hwval, &alg,
+  tree vectype = get_vectype_for_scalar_type (vinfo, multtype);
+
+  bool possible = choose_mult_variant (TYPE_MODE (vectype), hwval, &alg,
                                        &variant, MAX_COST);
   if (!possible)
     return NULL;

-  tree vectype = get_vectype_for_scalar_type (vinfo, multtype);
-
   if (!vectype
       || !target_supports_mult_synth_alg (&alg, variant,
                                           vectype, synth_shift_p))


This improves compile time from 29s to 12s for the testcase (with an -O0
compiled cc1 with checking enabled ...).  I didn't analyze what the actual
difference in the chosen multiplication sequence is, of course.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (25 preceding siblings ...)
  2022-01-25  7:52 ` rguenth at gcc dot gnu.org
@ 2022-02-04  7:26 ` rguenth at gcc dot gnu.org
  2022-02-04 10:30 ` cvs-commit at gcc dot gnu.org
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-04  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |kyrylo.tkachov at arm dot com

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm testing a patch along the lines of comment #25 - CCing Kyrylo, who seems to
have authored the code.  It doesn't address the issue noted by Roger that we use
MAX_COST, but as said in that comment it improves compile time quite a bit, and
it should also produce better sequences since we base the cost on the vector
mode that will be used rather than the scalar mode (assuming the vector ops are
costed in a non-random way - which is likely where this will fail).

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (26 preceding siblings ...)
  2022-02-04  7:26 ` rguenth at gcc dot gnu.org
@ 2022-02-04 10:30 ` cvs-commit at gcc dot gnu.org
  2022-02-04 10:43 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-04 10:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:876e70d4681332a600492173af0c7259e5a438c6

commit r12-7047-g876e70d4681332a600492173af0c7259e5a438c6
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 4 09:26:57 2022 +0100

    tree-optimization/103641 - improve vect_synth_mult_by_constant

    The following happens to improve compile-time of the PR103641
    testcase on aarch64 significantly.  I did not investigate the
    effect on the generated code but at least in theory
    choose_mult_variant should do a better job when we tell it
    the actual mode we are going to use for the operations it
    synthesizes.

    2022-02-04  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/103641
            * tree-vect-patterns.cc (vect_synth_mult_by_constant):
            Pass the vector mode to choose_mult_variant.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (27 preceding siblings ...)
  2022-02-04 10:30 ` cvs-commit at gcc dot gnu.org
@ 2022-02-04 10:43 ` rguenth at gcc dot gnu.org
  2022-02-04 11:08 ` tnfchris at gcc dot gnu.org
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-04 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm not removing the regression marker yet - can ARM folks please update the
trunk numbers with a fully built compiler (w/o checking)?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (28 preceding siblings ...)
  2022-02-04 10:43 ` rguenth at gcc dot gnu.org
@ 2022-02-04 11:08 ` tnfchris at gcc dot gnu.org
  2022-02-07 12:19 ` tnfchris at gcc dot gnu.org
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-02-04 11:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #29 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #28)
> I'm not removing the regression marker yet - can ARM folks please update the
> trunk numbers with a fully built compiler (w/o checking)?

Sure, I'll come back on Monday when it's gathered data in the CI.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11/12 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (29 preceding siblings ...)
  2022-02-04 11:08 ` tnfchris at gcc dot gnu.org
@ 2022-02-07 12:19 ` tnfchris at gcc dot gnu.org
  2022-02-07 15:05 ` [Bug middle-end/103641] [11 " rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-02-07 12:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #30 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
No problems during the nightlies.  No real changes in other workloads, in
compile time or in runtime.

I can confirm no perf change for xxhash, and compile time decreased from 8 to 1
sec.

tree vectorization                 :   0.28 (  3%)   0.00 (  0%)   0.28 (  3%)   135k (  0%)
tree slp vectorization             :   7.43 ( 89%)   0.00 (  0%)   7.41 ( 87%)  3450k (  8%)

into

tree vectorization                 :   0.02 (  2%)   0.00 (  0%)   0.02 (  2%)   135k (  0%)
tree slp vectorization             :   0.37 ( 35%)   0.00 (  0%)   0.39 ( 31%)  3400k (  8%)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (30 preceding siblings ...)
  2022-02-07 12:19 ` tnfchris at gcc dot gnu.org
@ 2022-02-07 15:05 ` rguenth at gcc dot gnu.org
  2022-02-08  8:08 ` tnfchris at gcc dot gnu.org
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-07 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.0
            Summary|[11/12 regression] Severe   |[11 regression] Severe
                   |compile time regression in  |compile time regression in
                   |SLP vectorize step          |SLP vectorize step
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
Marking fixed for GCC 12 then (still 35% with SLP vectorization is on the high
end).  I'm not sure about backporting this particular change, I'll definitely
wait more for that.

I suppose the slowness is still entirely within synth_mult?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (31 preceding siblings ...)
  2022-02-07 15:05 ` [Bug middle-end/103641] [11 " rguenth at gcc dot gnu.org
@ 2022-02-08  8:08 ` tnfchris at gcc dot gnu.org
  2022-02-08  8:13 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-02-08  8:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #32 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> 
> I suppose the slowness is still entirely within synth_mult?

I'm not sure... I can't seem to get the same granularity level that Andrew
got... How did you get that report Andrew?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (32 preceding siblings ...)
  2022-02-08  8:08 ` tnfchris at gcc dot gnu.org
@ 2022-02-08  8:13 ` pinskia at gcc dot gnu.org
  2022-02-08  8:15 ` tnfchris at gcc dot gnu.org
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-08  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #33 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #32)
> I'm not sure... I can't seem to get the same granularity level that Andrew
> got... How did you get that report Andrew?

I was using perf record/perf report to get that report.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (33 preceding siblings ...)
  2022-02-08  8:13 ` pinskia at gcc dot gnu.org
@ 2022-02-08  8:15 ` tnfchris at gcc dot gnu.org
  2022-03-16  8:22 ` cvs-commit at gcc dot gnu.org
  2022-03-16  8:23 ` rguenth at gcc dot gnu.org
  36 siblings, 0 replies; 38+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-02-08  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #34 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #33)
> (In reply to Tamar Christina from comment #32)
> > I'm not sure... I can't seem to get the same granularity level that Andrew
> > got... How did you get that report Andrew?
> 
> I was using perf record/perf report to get that report.

Ah! doh.. thanks I'll take a look

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (34 preceding siblings ...)
  2022-02-08  8:15 ` tnfchris at gcc dot gnu.org
@ 2022-03-16  8:22 ` cvs-commit at gcc dot gnu.org
  2022-03-16  8:23 ` rguenth at gcc dot gnu.org
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-16  8:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

--- Comment #35 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b6950623cd13c98354b105d7210cc1cf6a284f3a

commit r11-9656-gb6950623cd13c98354b105d7210cc1cf6a284f3a
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 4 09:26:57 2022 +0100

    tree-optimization/103641 - improve vect_synth_mult_by_constant

    The following happens to improve compile-time of the PR103641
    testcase on aarch64 significantly.  I did not investigate the
    effect on the generated code but at least in theory
    choose_mult_variant should do a better job when we tell it
    the actual mode we are going to use for the operations it
    synthesizes.

    2022-02-04  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/103641
            * tree-vect-patterns.c (vect_synth_mult_by_constant):
            Pass the vector mode to choose_mult_variant.

    (cherry picked from commit 876e70d4681332a600492173af0c7259e5a438c6)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/103641] [11 regression] Severe compile time regression in SLP vectorize step
  2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
                   ` (35 preceding siblings ...)
  2022-03-16  8:22 ` cvs-commit at gcc dot gnu.org
@ 2022-03-16  8:23 ` rguenth at gcc dot gnu.org
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-03-16  8:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
      Known to work|                            |11.2.1
      Known to fail|                            |11.2.0

--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-03-16  8:23 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
2021-12-10  9:03 [Bug rtl-optimization/103641] New: [aarch64][11 regression] Severe compile time regression in SLP vectorize step husseydevin at gmail dot com
2021-12-10  9:25 ` [Bug rtl-optimization/103641] " marxin at gcc dot gnu.org
2021-12-10  9:37 ` [Bug tree-optimization/103641] " pinskia at gcc dot gnu.org
2021-12-10  9:43 ` [Bug tree-optimization/103641] [11/12 " pinskia at gcc dot gnu.org
2021-12-10  9:46 ` pinskia at gcc dot gnu.org
2021-12-10  9:56 ` pinskia at gcc dot gnu.org
2021-12-10 10:01 ` marxin at gcc dot gnu.org
2021-12-10 10:02 ` pinskia at gcc dot gnu.org
2021-12-10 10:03 ` pinskia at gcc dot gnu.org
2021-12-10 10:06 ` marxin at gcc dot gnu.org
2021-12-10 10:08 ` pinskia at gcc dot gnu.org
2021-12-10 10:09 ` pinskia at gcc dot gnu.org
2021-12-10 10:12 ` marxin at gcc dot gnu.org
2021-12-10 10:12 ` pinskia at gcc dot gnu.org
2021-12-10 10:14 ` pinskia at gcc dot gnu.org
2021-12-10 10:15 ` [Bug middle-end/103641] " pinskia at gcc dot gnu.org
2021-12-10 10:24 ` pinskia at gcc dot gnu.org
2021-12-10 10:28 ` pinskia at gcc dot gnu.org
2021-12-10 13:17 ` roger at nextmovesoftware dot com
2021-12-10 13:19 ` husseydevin at gmail dot com
2022-01-18 14:10 ` rguenth at gcc dot gnu.org
2022-01-22 14:30 ` roger at nextmovesoftware dot com
2022-01-24  8:13 ` rguenther at suse dot de
2022-01-24 16:49 ` roger at nextmovesoftware dot com
2022-01-24 17:02 ` roger at nextmovesoftware dot com
2022-01-25  7:23 ` rguenth at gcc dot gnu.org
2022-01-25  7:52 ` rguenth at gcc dot gnu.org
2022-02-04  7:26 ` rguenth at gcc dot gnu.org
2022-02-04 10:30 ` cvs-commit at gcc dot gnu.org
2022-02-04 10:43 ` rguenth at gcc dot gnu.org
2022-02-04 11:08 ` tnfchris at gcc dot gnu.org
2022-02-07 12:19 ` tnfchris at gcc dot gnu.org
2022-02-07 15:05 ` [Bug middle-end/103641] [11 " rguenth at gcc dot gnu.org
2022-02-08  8:08 ` tnfchris at gcc dot gnu.org
2022-02-08  8:13 ` pinskia at gcc dot gnu.org
2022-02-08  8:15 ` tnfchris at gcc dot gnu.org
2022-03-16  8:22 ` cvs-commit at gcc dot gnu.org
2022-03-16  8:23 ` rguenth at gcc dot gnu.org
