public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/113910] New: [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: ro at gcc dot gnu.org @ 2024-02-13 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

            Bug ID: 113910
           Summary: [12/13/14 regression] Factor 15 slowdown compiling
                    AMDGPUDisassembler.cpp on SPARC
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ro at gcc dot gnu.org
  Target Milestone: ---
            Target: sparcv9-sun-solaris2.11

After GCC 11, compile time for LLVM's
lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
on 64-bit Solaris/SPARC regressed by a factor of 15:

cc1plus -fpreprocessed AMDGPUDisassembler.cpp.ii -quiet -mcpu=v9 -O -std=c++17 -ftime-report -o AMDGPUDisassembler.cpp.s

* GCC 11.4.0:

real        2:14.94
user        2:09.96
sys            4.83

* GCC 14.0.1:

real       33:03.33
user       32:57.32
sys            5.52

I'm attaching the preprocessed input and -ftime-report output for both.
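
As a quick sanity check on the headline number, the two "real" times above
can be converted to seconds and divided.  This is only an illustrative
sketch; the helper names are made up for the example and are not part of
the report.

```cpp
#include <cassert>

// Convert a "minutes:seconds" wall-clock reading into seconds.
constexpr double to_seconds(int minutes, double seconds) {
    return minutes * 60.0 + seconds;
}

// Ratio of the GCC 14.0.1 "real" time (33:03.33) to the
// GCC 11.4.0 "real" time (2:14.94), both taken from the report.
constexpr double slowdown_factor() {
    return to_seconds(33, 3.33) / to_seconds(2, 14.94);
}

// Illustrative check: the ratio is a little under 15.
inline void check_slowdown() {
    assert(slowdown_factor() > 14.0 && slowdown_factor() < 15.0);
}
```

With the figures quoted above the ratio comes out at roughly 14.7, which
matches the "Factor 15" in the summary.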

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: ro at gcc dot gnu.org @ 2024-02-13 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #1 from Rainer Orth <ro at gcc dot gnu.org> ---
Created attachment 57414
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57414&action=edit
preprocessed input

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: ro at gcc dot gnu.org @ 2024-02-13 15:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #2 from Rainer Orth <ro at gcc dot gnu.org> ---
Created attachment 57415
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57415&action=edit
GCC 11.4.0 -ftime-report output

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: ro at gcc dot gnu.org @ 2024-02-13 15:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #3 from Rainer Orth <ro at gcc dot gnu.org> ---
Created attachment 57416
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57416&action=edit
GCC 14.0.1 -ftime-report output

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: pinskia at gcc dot gnu.org @ 2024-02-13 15:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>Configure with --enable-checking=release to disable checks.


Can you try that if you are comparing compile times?
Some of the slow down is definitely related to that:
>  tree SSA verifier                  :  12.28 (  1%)   0.02 (  0%)  12.12 (  1%)     0  (  0%)
> tree STMT verifier                 :  18.62 (  1%)   0.00 (  0%)  18.79 (  1%)     0  (  0%)
>  CFG verifier                       :   9.77 (  0%)   0.01 (  0%)  10.01 (  1%)     0  (  0%)
>  verify RTL sharing                 :  12.45 (  1%)   0.01 (  0%)  12.46 (  1%)     0  (  0%)


For example.

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-02-13 16:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>>Configure with --enable-checking=release to disable checks.

I'm seeing the same slowdown with release builds of GCC 12.3.0 and
13.2.0.

> Can you try that if you are comparing compile times?
> Some of the slow down is definitely related to that:
>>  tree SSA verifier : 12.28 ( 1%) 0.02 ( 0%) 12.12 ( 1%) 0 ( 0%)
>> tree STMT verifier : 18.62 ( 1%) 0.00 ( 0%) 18.79 ( 1%) 0 ( 0%)
>>  CFG verifier : 9.77 ( 0%) 0.01 ( 0%) 10.01 ( 1%) 0 ( 0%)
>>  verify RTL sharing : 12.45 ( 1%) 0.01 ( 0%) 12.46 ( 1%) 0 ( 0%)
>
>
> For an example.

13.2.0 takes

real          19.59
user          16.05
sys            3.43

but was still in the half-hour range with the original full set of
flags.  I'll retry that and report.

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-02-14
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |needs-bisection
     Ever confirmed|0                           |1

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
 tree PTA                           :1795.76 ( 91%)

"nice".  Possibly some of the PTA speedups done have regressed this case.

Bisecting would be nice.  It seems the preprocessed source "works" on x86_64 as
well at least, for both trunk and GCC 11 (and I confirm 11 is fast).

It might be that inlining heuristic changes make a difference here.  PTA is
known to have difficulties with functions with very many calls.

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|sparcv9-sun-solaris2.11     |sparcv9-sun-solaris2.11,
                   |                            |x86_64-*-*
   Target Milestone|---                         |12.4
           Priority|P3                          |P2

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note GCC 13 seems to dislike the preprocessed source (odd, 12 and trunk are
happy...)

In file included from /usr/gcc/11/include/c++/11.4.0/memory:76,
                 from /vol/llvm/src/llvm-project/local/llvm/include/llvm/ADT/SmallVector.h:28,
                 from /vol/llvm/src/llvm-project/local/llvm/include/llvm/ADT/SmallString.h:17,
                 from /vol/llvm/src/llvm-project/local/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h:19,
                 from /vol/llvm/src/llvm-project/local/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:19:
/usr/gcc/11/include/c++/11.4.0/bits/unique_ptr.h:486:8: error: expected identifier before '__remove_cv'
/usr/gcc/11/include/c++/11.4.0/bits/unique_ptr.h:486:20: error: expected '(' before '=' token
/usr/gcc/11/include/c++/11.4.0/bits/unique_ptr.h:486:20: error: expected type-specifier before '=' token
/usr/gcc/11/include/c++/11.4.0/bits/unique_ptr.h:486:20: error: expected unqualified-id before '=' token
/usr/gcc/11/include/c++/11.4.0/bits/unique_ptr.h:492:55: error: wrong number of template arguments (1, should be 2)

that's

 using __remove_cv = typename remove_cv<_Up>::type;


      template<typename _Up>
 using __is_derived_Tp
   = __and_< is_base_of<_Tp, _Up>,
      __not_<is_same<__remove_cv<_Tp>, __remove_cv<_Up>>> >;

I think.

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
early PTA for decodeToMCInst runs on 241782 variables, and we have 751952
constraints.

A fun testcase ;)  A little bit large to work with of course.

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14 10:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
With only enabling early PTA via -fdisable-tree-alias -fdisable-tree-pre I
got the compile finished in 18 minutes and

 tree PTA                           :1044.48 ( 98%)   1.53 ( 27%)1046.29 ( 97%)  4341k (  0%)

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Cutting the switch in decodeToMCInst after case 693: (roughly halving it by the
number of source lines) gets us to

 tree PTA                           : 129.70 ( 92%)   0.51 ( 14%) 130.28 ( 90%)  2279k (  0%)
 TOTAL                              : 140.28          3.68        144.01          982M

a profile shows

Samples: 657K of event 'cycles:u', Event count (approx.): 873340708228
Overhead       Samples  Command  Shared Object       Symbol
  88.08%        578019  cc1plus  cc1plus             [.] bitmap_equal_p
   4.76%         31340  cc1plus  cc1plus             [.] equiv_class_lookup_or_a
   0.59%          4039  cc1plus  cc1plus             [.] bitmap_set_bit
   0.24%          1611  cc1plus  cc1plus             [.] bitmap_copy

the way we hash bitmaps is quite bad, we effectively hash set and a subset
of unset bits.  Adding a simple additional "hash", the number of set bits,
improves this to

Samples: 214K of event 'cycles:u', Event count (approx.): 283548833048
Overhead       Samples  Command  Shared Object     Symbol
  69.73%        148209  cc1plus  cc1plus           [.] bitmap_equal_p
   6.29%         13499  cc1plus  cc1plus           [.] equiv_class_lookup_or_add

of course we still have too many calls (or too large but almost equal bitmaps
here).  Still I have a handle on this.
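
To illustrate why the number of set bits works as a cheap second key, here
is a minimal sketch.  It assumes a bitmap is just a flat vector of 64-bit
words, which is NOT how GCC's bitmap is laid out, and the names Words,
xor_hash, count_set_bits and maybe_equal are hypothetical.  It is not the
attached patch.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap: a flat vector of 64-bit words.
using Words = std::vector<std::uint64_t>;

// Weak hash: XOR all words together.  Different bitmaps easily collide.
std::uint64_t xor_hash(const Words& w) {
    std::uint64_t h = 0;
    for (std::uint64_t x : w) h ^= x;
    return h;
}

// Second key: the total number of set bits.
unsigned count_set_bits(const Words& w) {
    unsigned n = 0;
    for (std::uint64_t x : w)
        n += static_cast<unsigned>(std::bitset<64>(x).count());
    return n;
}

// Pre-filter: only bitmaps that agree on both keys need a full,
// word-by-word comparison.
bool maybe_equal(const Words& a, const Words& b) {
    return xor_hash(a) == xor_hash(b)
        && count_set_bits(a) == count_set_bits(b);
}

// Illustrative check: {3, 1} and {2, 0} collide under XOR (both give 2)
// but have 3 and 1 set bits, so the second key tells them apart.
inline void check_prefilter() {
    Words a{3, 1};
    Words b{2, 0};
    assert(xor_hash(a) == xor_hash(b));
    assert(!maybe_equal(a, b));
}
```

Every pair rejected by the pre-filter is one expensive full comparison
saved, which is presumably where the drop in bitmap_equal_p samples comes
from.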

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 57422
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57422&action=edit
patch

I'm testing the attached which brings down compile-time to the levels of GCC 11
again (a bit faster even, 30s vs. 50s).

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14 14:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note after the proper fix we still have

(gdb) p pointer_equiv_class_table->m_searches 
$17 = 180497
(gdb) p pointer_equiv_class_table->m_collisions 
$18 = 4101085
(gdb) p pointer_equiv_class_table->m_n_elements 
$22 = 143701
(gdb) p pointer_equiv_class_table->m_size
$23 = 262139
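
For reference, the counters above can be turned into two ratios.  This is
only a sketch of the arithmetic; it assumes m_collisions counts extra
probes accumulated over all searches, and the function names are made up
for the example.

```cpp
#include <cassert>

// Counters copied from the gdb session above.
constexpr double searches   = 180497;   // m_searches
constexpr double collisions = 4101085;  // m_collisions
constexpr double elements   = 143701;   // m_n_elements
constexpr double slots      = 262139;   // m_size

// Average number of collisions per lookup.
constexpr double collisions_per_search() { return collisions / searches; }

// Fraction of table slots that are occupied.
constexpr double load_factor() { return elements / slots; }

// Illustrative check: many collisions per search at a moderate load.
inline void check_table_stats() {
    assert(collisions_per_search() > 20.0);
    assert(load_factor() < 0.6);
}
```

That is roughly 22.7 collisions per search at a load factor of about 0.55,
so the table is not overfull; the hash values themselves are clustering.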

"perfecting" the hash helps (mixing each individual bit number) but then
all the time is spent hashing ;)

Samples: 177K of event 'cycles:u', Event count (approx.): 232966151280          
Overhead       Samples  Command  Shared Object     Symbol                       
  35.77%         65423  cc1plus  cc1plus           [.] bitmap_hash
   9.64%         16589  cc1plus  cc1plus           [.] bitmap_set_bit

I think the data structure used is simply far from optimal.

Mixing each bitmap word has higher collision rates than the XOR (dropping
the XOR of the first bit number).  Mixing in ptr->indx as well gives
OK collision rates but still

  12.78%         16684  cc1plus  cc1plus             [.] bitmap_set_bit
  12.56%         19318  cc1plus  cc1plus             [.] bitmap_hash

XOR for the words on top of mixing of ptr->indx gets a little worse but still
reasonable rates with

  14.03%         16837  cc1plus  cc1plus             [.] bitmap_set_bit
   6.33%          7694  cc1plus  cc1plus             [.] insert_updated_phi_node
   4.74%          7500  cc1plus  cc1plus             [.] bitmap_hash

* [Bug tree-optimization/113910] [12/13/14 regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: cvs-commit at gcc dot gnu.org @ 2024-02-14 14:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ad7a365aaccecd23ea287c7faaab9c7bd50b944a

commit r14-8980-gad7a365aaccecd23ea287c7faaab9c7bd50b944a
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Feb 14 12:33:13 2024 +0100

    tree-optimization/113910 - huge compile time during PTA

    For the testcase in PR113910 we spend a lot of time in PTA comparing
    bitmaps for looking up equivalence class members.  This points to
    the very weak bitmap_hash function which effectively hashes set
    and a subset of not set bits.

    The major problem with it is that it simply truncates the
    BITMAP_WORD sized intermediate hash to hashval_t which is
    unsigned int, effectively not hashing half of the bits.

    This reduces the compile-time for the testcase from tens of minutes
    to 42 seconds and PTA time from 99% to 46%.

            PR tree-optimization/113910
            * bitmap.cc (bitmap_hash): Mix the full element "hash" to
            the hashval_t hash.
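
    The truncation problem can be reproduced in isolation.  The sketch
    below assumes a 64-bit BITMAP_WORD and a 32-bit hashval_t; Word, Hash,
    truncate_hash and fold_hash are hypothetical names, and the XOR fold
    is a simplification chosen for the example, not the mixing the commit
    actually uses.

```cpp
#include <cassert>
#include <cstdint>

using Word = std::uint64_t;  // stand-in for a 64-bit BITMAP_WORD
using Hash = std::uint32_t;  // stand-in for hashval_t (unsigned int)

// Truncation keeps only the low 32 bits; the high half never
// influences the result.
Hash truncate_hash(Word w) { return static_cast<Hash>(w); }

// Folding XORs the high half into the low half first, so every bit
// of the word contributes to the 32-bit result.
Hash fold_hash(Word w) { return static_cast<Hash>(w ^ (w >> 32)); }

// Illustrative check: two words that differ only in their high half
// collide under truncation but not under folding.
inline void check_truncation() {
    const Word a = 0x0000000100000005ULL;
    const Word b = 0x0000000200000005ULL;
    assert(truncate_hash(a) == truncate_hash(b));
    assert(fold_hash(a) != fold_hash(b));
}
```

    Any two bitmaps whose XOR-accumulated words differ only in the upper
    32 bits land in the same bucket under truncation, which forces a full
    bitmap comparison on every such lookup.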

* [Bug tree-optimization/113910] [12/13 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-14 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[12/13/14 regression]       |[12/13 Regression] Factor
                   |Factor 15 slowdown          |15 slowdown compiling
                   |compiling                   |AMDGPUDisassembler.cpp on
                   |AMDGPUDisassembler.cpp on   |SPARC
                   |SPARC                       |
      Known to work|                            |14.0

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
The regression should be fixed, can you check we're now no longer slower on
trunk?  (either use a release checking build or use -fno-checking which should
get you reasonably close)

* [Bug tree-optimization/113910] [12/13 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-02-14 20:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #15 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
> The regression should be fixed, can you check we're now no longer slower on
> trunk?  (either use a release checking build or use -fno-checking which should
> get you reasonably close)

I've done a --enable-checking=release build on trunk and compared compile
times for the -save-temps output against g++ 11.4.0:

$ time cc1plus -fpreprocessed AMDGPUDisassembler.cpp.ii -quiet -mcpu=v9 -O -std=c++17 -o AMDGPUDisassembler.cpp.s

* 11.4.0:

real        2:04.33
user        2:03.86
sys            0.30

* 14.0.1:

real        2:17.58
user        2:16.47
sys            0.87

* [Bug tree-optimization/113910] [12/13 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-15 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, thanks for checking.  Btw, -ftime-report for GCC 11 has different
bottle-necks meanwhile fixed:

 tree PTA                           :   1.66 (  3%)
 tree SSA incremental               :  31.86 ( 61%)
 TOTAL                              :  52.08 

but it had a bit less bloated PTA.

I now have

 tree PTA                           :  12.21 ( 35%)
 tree SSA incremental               :   3.96 ( 11%)
 TOTAL                              :  35.24 

on trunk.  I guess with bitmaps it always also depends on the memory
hierarchy of the machine, nevertheless overall it looks fine on SPARC
then.

Queued for backporting, some RFC for further improvements on bitmap_hash
is on the mailing list but I'm not going to backport that.

* [Bug tree-optimization/113910] [12/13 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: rguenth at gcc dot gnu.org @ 2024-02-16 12:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
The following still helps quite a bit on its own.

diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index 459e32c1ad1..a05ad810800 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -2695,18 +2695,21 @@ hashval_t
 bitmap_hash (const_bitmap head)
 {
   const bitmap_element *ptr;
-  BITMAP_WORD hash = 0;
+  hashval_t hash = 0;
   int ix;

   gcc_checking_assert (!head->tree_form);

   for (ptr = head->first; ptr; ptr = ptr->next)
     {
-      hash ^= ptr->indx;
+      hash = iterative_hash_hashval_t (ptr->indx, hash);
       for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
-       hash ^= ptr->bits[ix];
+       if (sizeof (BITMAP_WORD) > sizeof (hashval_t))
+         hash = iterative_hash_host_wide_int (ptr->bits[ix], hash);
+       else
+         hash = iterative_hash_hashval_t (ptr->bits[ix], hash);
     }
-  return iterative_hash (&hash, sizeof (hash), 0);
+  return hash;
 }

 ^L
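
The effect of chaining every word through a mixer, rather than XORing the
words together first, can be shown with a standalone sketch.  Everything
here is an assumption made for illustration: Element is a cut-down
stand-in for bitmap_element, and mix() is a trivial multiply-and-add step,
NOT GCC's iterative_hash_hashval_t.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cut-down bitmap element: an index plus two 64-bit words.
struct Element {
    std::uint32_t indx;
    std::uint64_t bits[2];
};

// Hypothetical mixer (seed * 31 + value); a stand-in only.
std::uint32_t mix(std::uint32_t value, std::uint32_t seed) {
    return seed * 31u + value;
}

// Old scheme: XOR the index and all words into one accumulator.
std::uint64_t xor_fold(const std::vector<Element>& elts) {
    std::uint64_t h = 0;
    for (const Element& e : elts) {
        h ^= e.indx;
        for (std::uint64_t w : e.bits) h ^= w;
    }
    return h;
}

// New scheme: feed the index and both halves of every word through the
// mixer in order, so the position of a word affects the result.
std::uint32_t chained_hash(const std::vector<Element>& elts) {
    std::uint32_t h = 0;
    for (const Element& e : elts) {
        h = mix(e.indx, h);
        for (std::uint64_t w : e.bits) {
            h = mix(static_cast<std::uint32_t>(w), h);
            h = mix(static_cast<std::uint32_t>(w >> 32), h);
        }
    }
    return h;
}

// Illustrative check: swapping the two words of an element is invisible
// to the XOR fold but changes the chained hash.
inline void check_chaining() {
    std::vector<Element> a{Element{0, {1, 2}}};
    std::vector<Element> b{Element{0, {2, 1}}};
    assert(xor_fold(a) == xor_fold(b));
    assert(chained_hash(a) != chained_hash(b));
}
```

A real mixer would scramble the bits far more thoroughly than this one;
the point is only that an order-sensitive chain separates bitmaps that a
plain XOR accumulator cannot.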

* [Bug tree-optimization/113910] [12/13 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
From: cvs-commit at gcc dot gnu.org @ 2024-03-21 11:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

--- Comment #18 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:9a19811ea1e9b3024c0f41b074d71679088bb2d7

commit r13-8478-g9a19811ea1e9b3024c0f41b074d71679088bb2d7
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Feb 14 12:33:13 2024 +0100

    tree-optimization/113910 - huge compile time during PTA

    For the testcase in PR113910 we spend a lot of time in PTA comparing
    bitmaps for looking up equivalence class members.  This points to
    the very weak bitmap_hash function which effectively hashes set
    and a subset of not set bits.

    The major problem with it is that it simply truncates the
    BITMAP_WORD sized intermediate hash to hashval_t which is
    unsigned int, effectively not hashing half of the bits.

    This reduces the compile-time for the testcase from tens of minutes
    to 42 seconds and PTA time from 99% to 46%.

            PR tree-optimization/113910
            * bitmap.cc (bitmap_hash): Mix the full element "hash" to
            the hashval_t hash.

    (cherry picked from commit ad7a365aaccecd23ea287c7faaab9c7bd50b944a)
