* Re: [PATCH] tree-optimization/113910 - huge compile time during PTA
@ 2024-02-14 13:47 Richard Biener
2024-02-15 17:05 ` Richard Sandiford
0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2024-02-14 13:47 UTC (permalink / raw)
To: gcc-patches
On Wed, 14 Feb 2024, Richard Biener wrote:
> For the testcase in PR113910 we spend a lot of time in PTA comparing
> bitmaps for looking up equivalence class members. This points to
> the very weak bitmap_hash function which effectively hashes set
> and a subset of not set bits. The following improves it by mixing
> that weak result with the population count of the bitmap, reducing
> the number of collisions significantly. It's still by no means
> a good hash function.
>
> One major problem with it was that it simply truncated the
> BITMAP_WORD sized intermediate hash to hashval_t which is
> unsigned int, effectively not hashing half of the bits. That solves
> most of the slowness. Mixing in the population count improves
> compile-time by another 30% though.
>
> This reduces the compile-time for the testcase from tens of minutes
> to 30 seconds and PTA time from 99% to 25%. bitmap_equal_p is gone
> from the profile.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, will
> push to trunk and branches.
Ha, and it breaks bootstrap because I misunderstood
bitmap_count_bits_in_word (should be word_s_). Fixing this turns
out that hashing the population count doesn't help anything
so I'm re-testing the following simpler variant, giving up on the
cheap last 25% but solving the regression as well.
Richard.
From a76aebfdc4b6247db6a061e6395fd088a5694122 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Wed, 14 Feb 2024 12:33:13 +0100
Subject: [PATCH] tree-optimization/113910 - huge compile time during PTA
To: gcc-patches@gcc.gnu.org
For the testcase in PR113910 we spend a lot of time in PTA comparing
bitmaps for looking up equivalence class members. This points to
the very weak bitmap_hash function which effectively hashes set
and a subset of not set bits.
The major problem with it is that it simply truncates the
BITMAP_WORD sized intermediate hash to hashval_t which is
unsigned int, effectively not hashing half of the bits.
This reduces the compile-time for the testcase from tens of minutes
to 42 seconds and PTA time from 99% to 46%.
PR tree-optimization/113910
* bitmap.cc (bitmap_hash): Mix the full element "hash" to
the hashval_t hash.
---
gcc/bitmap.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index 6cf326bca5a..459e32c1ad1 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -2706,7 +2706,7 @@ bitmap_hash (const_bitmap head)
for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
hash ^= ptr->bits[ix];
}
- return (hashval_t)hash;
+ return iterative_hash (&hash, sizeof (hash), 0);
}
\f
--
2.35.3
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] tree-optimization/113910 - huge compile time during PTA
2024-02-14 13:47 [PATCH] tree-optimization/113910 - huge compile time during PTA Richard Biener
@ 2024-02-15 17:05 ` Richard Sandiford
2024-02-15 17:26 ` Richard Biener
0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2024-02-15 17:05 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
Richard Biener <rguenther@suse.de> writes:
> On Wed, 14 Feb 2024, Richard Biener wrote:
>
>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>> bitmaps for looking up equivalence class members. This points to
>> the very weak bitmap_hash function which effectively hashes set
>> and a subset of not set bits. The following improves it by mixing
>> that weak result with the population count of the bitmap, reducing
>> the number of collisions significantly. It's still by no means
>> a good hash function.
>>
>> One major problem with it was that it simply truncated the
>> BITMAP_WORD sized intermediate hash to hashval_t which is
>> unsigned int, effectively not hashing half of the bits. That solves
>> most of the slowness. Mixing in the population count improves
>> compile-time by another 30% though.
>>
>> This reduces the compile-time for the testcase from tens of minutes
>> to 30 seconds and PTA time from 99% to 25%. bitmap_equal_p is gone
>> from the profile.
>>
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, will
>> push to trunk and branches.
>
> Ha, and it breaks bootstrap because I misunderstood
> bitmap_count_bits_in_word (should be word_s_). Fixing this turns
> out that hashing the population count doesn't help anything
> so I'm re-testing the following simpler variant, giving up on the
> cheap last 25% but solving the regression as well.
>
> Richard.
>
> From a76aebfdc4b6247db6a061e6395fd088a5694122 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Wed, 14 Feb 2024 12:33:13 +0100
> Subject: [PATCH] tree-optimization/113910 - huge compile time during PTA
> To: gcc-patches@gcc.gnu.org
>
> For the testcase in PR113910 we spend a lot of time in PTA comparing
> bitmaps for looking up equivalence class members. This points to
> the very weak bitmap_hash function which effectively hashes set
> and a subset of not set bits.
>
> The major problem with it is that it simply truncates the
> BITMAP_WORD sized intermediate hash to hashval_t which is
> unsigned int, effectively not hashing half of the bits.
>
> This reduces the compile-time for the testcase from tens of minutes
> to 42 seconds and PTA time from 99% to 46%.
>
> PR tree-optimization/113910
> * bitmap.cc (bitmap_hash): Mix the full element "hash" to
> the hashval_t hash.
> ---
> gcc/bitmap.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
> index 6cf326bca5a..459e32c1ad1 100644
> --- a/gcc/bitmap.cc
> +++ b/gcc/bitmap.cc
> @@ -2706,7 +2706,7 @@ bitmap_hash (const_bitmap head)
> for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
> hash ^= ptr->bits[ix];
> }
> - return (hashval_t)hash;
> + return iterative_hash (&hash, sizeof (hash), 0);
> }
>
> \f
LGTM FWIW, but just curious: does using the iterative hash routines for
each update (instead of ^) help too, or is it too slow? Or maybe do an
iterative hash for the idx part and keep ^ for the bits accumulation?
Also wonder whether using + rather than ^ for the bits accumulation
would help...
Thanks,
Richard
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] tree-optimization/113910 - huge compile time during PTA
2024-02-15 17:05 ` Richard Sandiford
@ 2024-02-15 17:26 ` Richard Biener
0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2024-02-15 17:26 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
> Am 15.02.2024 um 18:06 schrieb Richard Sandiford <richard.sandiford@arm.com>:
>
> Richard Biener <rguenther@suse.de> writes:
>>> On Wed, 14 Feb 2024, Richard Biener wrote:
>>>
>>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>>> bitmaps for looking up equivalence class members. This points to
>>> the very weak bitmap_hash function which effectively hashes set
>>> and a subset of not set bits. The following improves it by mixing
>>> that weak result with the population count of the bitmap, reducing
>>> the number of collisions significantly. It's still by no means
>>> a good hash function.
>>>
>>> One major problem with it was that it simply truncated the
>>> BITMAP_WORD sized intermediate hash to hashval_t which is
>>> unsigned int, effectively not hashing half of the bits. That solves
>>> most of the slowness. Mixing in the population count improves
>>> compile-time by another 30% though.
>>>
>>> This reduces the compile-time for the testcase from tens of minutes
>>> to 30 seconds and PTA time from 99% to 25%. bitmap_equal_p is gone
>>> from the profile.
>>>
>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, will
>>> push to trunk and branches.
>>
>> Ha, and it breaks bootstrap because I misunderstood
>> bitmap_count_bits_in_word (should be word_s_). Fixing this turns
>> out that hashing the population count doesn't help anything
>> so I'm re-testing the following simpler variant, giving up on the
>> cheap last 25% but solving the regression as well.
>>
>> Richard.
>>
>> From a76aebfdc4b6247db6a061e6395fd088a5694122 Mon Sep 17 00:00:00 2001
>> From: Richard Biener <rguenther@suse.de>
>> Date: Wed, 14 Feb 2024 12:33:13 +0100
>> Subject: [PATCH] tree-optimization/113910 - huge compile time during PTA
>> To: gcc-patches@gcc.gnu.org
>>
>> For the testcase in PR113910 we spend a lot of time in PTA comparing
>> bitmaps for looking up equivalence class members. This points to
>> the very weak bitmap_hash function which effectively hashes set
>> and a subset of not set bits.
>>
>> The major problem with it is that it simply truncates the
>> BITMAP_WORD sized intermediate hash to hashval_t which is
>> unsigned int, effectively not hashing half of the bits.
>>
>> This reduces the compile-time for the testcase from tens of minutes
>> to 42 seconds and PTA time from 99% to 46%.
>>
>> PR tree-optimization/113910
>> * bitmap.cc (bitmap_hash): Mix the full element "hash" to
>> the hashval_t hash.
>> ---
>> gcc/bitmap.cc | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
>> index 6cf326bca5a..459e32c1ad1 100644
>> --- a/gcc/bitmap.cc
>> +++ b/gcc/bitmap.cc
>> @@ -2706,7 +2706,7 @@ bitmap_hash (const_bitmap head)
>> for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
>> hash ^= ptr->bits[ix];
>> }
>> - return (hashval_t)hash;
>> + return iterative_hash (&hash, sizeof (hash), 0);
>> }
>>
>> \f
>
> LGTM FWIW, but just curious: does using the iterative hash routines for
> each update (instead of ^) help too, or is it too slow?
It helps but is too slow.
> Or maybe do an
> iterative hash for the idx part and keep ^ for the bits accumulation?
> Also wonder whether using + rather than ^ for the bits accumulation
> would help...
I have a patch picking the new BFD string hash for this which is fast. I’ll do this for stage1, trying to replace our generic hash function.
Richard
> Thanks,
> Richard
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] tree-optimization/113910 - huge compile time during PTA
@ 2024-02-14 12:02 Richard Biener
0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2024-02-14 12:02 UTC (permalink / raw)
To: gcc-patches
For the testcase in PR113910 we spend a lot of time in PTA comparing
bitmaps for looking up equivalence class members. This points to
the very weak bitmap_hash function which effectively hashes set
and a subset of not set bits. The following improves it by mixing
that weak result with the population count of the bitmap, reducing
the number of collisions significantly. It's still by no means
a good hash function.
One major problem with it was that it simply truncated the
BITMAP_WORD sized intermediate hash to hashval_t which is
unsigned int, effectively not hashing half of the bits. That solves
most of the slowness. Mixing in the population count improves
compile-time by another 30% though.
This reduces the compile-time for the testcase from tens of minutes
to 30 seconds and PTA time from 99% to 25%. bitmap_equal_p is gone
from the profile.
Bootstrap and regtest running on x86_64-unknown-linux-gnu, will
push to trunk and branches.
PR tree-optimization/113910
* bitmap.cc (bitmap_hash): Mix the full element "hash" with
the bitmap population count.
---
gcc/bitmap.cc | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index 6cf326bca5a..33aa0beb2b0 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -2696,6 +2696,7 @@ bitmap_hash (const_bitmap head)
{
const bitmap_element *ptr;
BITMAP_WORD hash = 0;
+ unsigned long count = 0;
int ix;
gcc_checking_assert (!head->tree_form);
@@ -2704,9 +2705,12 @@ bitmap_hash (const_bitmap head)
{
hash ^= ptr->indx;
for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
- hash ^= ptr->bits[ix];
+ {
+ hash ^= ptr->bits[ix];
+ count += bitmap_count_bits_in_word (&ptr->bits[ix]);
+ }
}
- return (hashval_t)hash;
+ return iterative_hash (&hash, sizeof (hash), count);
}
\f
--
2.35.3
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-02-15 17:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-14 13:47 [PATCH] tree-optimization/113910 - huge compile time during PTA Richard Biener
2024-02-15 17:05 ` Richard Sandiford
2024-02-15 17:26 ` Richard Biener
-- strict thread matches above, loose matches on Subject: below --
2024-02-14 12:02 Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).