From: Richard Sandiford
To: Richard Biener via Gcc-patches
Cc: Richard Biener
Subject: Re: [PATCH] Improve RTL CSE hash table hash usage
Date: Fri, 03 Feb 2023 15:54:09 +0000
In-Reply-To: <617912D0-503E-4F87-BFE1-CC9324ED67CA@suse.de> (Richard Biener via Gcc-patches's message of "Fri, 3 Feb 2023 16:44:12 +0100")
References: <617912D0-503E-4F87-BFE1-CC9324ED67CA@suse.de>

Richard Biener via Gcc-patches writes:
>> On 03.02.2023 at 15:20, Richard Sandiford via Gcc-patches wrote:
>>
>> Richard Biener via Gcc-patches writes:
>>> The RTL CSE hash table has a fixed number of buckets (32) each
>>> with a linked list of entries with
>>> the same hash value.  The actual hash values are computed using
>>> hash_rtx which uses adds for mixing and adds the rtx CODE as
>>> CODE << 7 (apart from some exceptions such as MEM).  The
>>> unsigned int typed hash value is then simply truncated for the
>>> actual lookup into the fixed size table which means that usually
>>> CODE is simply lost.
>>>
>>> The following improves this truncation by first mixing in more
>>> bits using xor.  It does not change the actual hash function
>>> since that's used outside of CSE as well.
>>>
>>> An alternative would be to bump the fixed number of buckets,
>>> say to 256 which would retain the LSB of CODE, or to 8192 which
>>> can capture all 6 bits required for the last CODE.
>>>
>>> As the comment in CSE says, there's invalidate_memory and
>>> flush_hash_table done possibly frequently and those at least
>>> need to walk all slots, so when the hash table is mostly empty
>>> enlarging it will be a loss.  Still there should be more
>>> regular lookups by hash, so less collisions should pay off
>>> as well.
>>
>> Going purely from this description and without having looked
>> at the code properly, would it be possible to link all current
>> values together, not just those with the same hash?  And would
>> that help?  It looks like the list is already doubly-linked,
>> and there's spare room to store a "start of new hash" marker.
>
> We already do have equivalent values linked, but I'm not sure
> that's what you are suggesting.

I was thinking of linking every active value in the table together,
but with entries for the same hash being consecutive.  That way,
things like invalidate_memory can just walk the list and ignore
the hash table.

> Those should also have the same hash value, so both lists are
> somewhat redundant and we might be able to save some storage
> here by making this a list of lists of same hash and value list?

I thought the value-equality list was to establish that (e.g.)
(reg R1) and (reg R2) are known to have the same value, despite
being different expressions with different hash values.

But I suppose if we reused an existing hash table structure (with
its own mechanism for handling collisions), it would make sense to
use the equivalent-value list to join everything together, rather
than the same-hash list.  Again, there could be a marker to
establish the start of a new equivalent-value sublist.

Thanks,
Richard

>
>>
>> Thanks,
>> Richard
>>
>>> Without enlarging the table a better hash function is unlikely
>>> going to make a big difference, simple statistics on the
>>> number of collisions at insertion time shows a reduction of
>>> around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
>>> at the expense of reducing the average table fill by 10%
>>> (all of this stats from looking just at fold-const.i at -O2).
>>> Increasing HASH_SHIFT more leaves the table even more sparse,
>>> likely showing that hash_rtx uses add for mixing which is
>>> quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
>>> collisions.
>>>
>>> Experimenting with using inchash instead of adds for the
>>> mixing does not improve things when looking at the HASH_SHIFT
>>> bumped by 2 numbers.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>>>
>>> Any opinions?
>>>
>>> 	* cse.cc (HASH): Turn into inline function and mix
>>> 	in another HASH_SHIFT bits.
>>> 	(SAFE_HASH): Likewise.
>>> ---
>>>  gcc/cse.cc | 37 +++++++++++++++++++++++--------------
>>>  1 file changed, 23 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>>> index 37afc88b439..4777e559b86 100644
>>> --- a/gcc/cse.cc
>>> +++ b/gcc/cse.cc
>>> @@ -420,20 +420,6 @@ struct table_elt
>>>  #define HASH_SIZE (1 << HASH_SHIFT)
>>>  #define HASH_MASK (HASH_SIZE - 1)
>>>
>>> -/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
>>> -   register (hard registers may require `do_not_record' to be set).  */
>>> -
>>> -#define HASH(X, M) \
>>> - ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER \
>>> -   ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X))) \
>>> -   : canon_hash (X, M)) & HASH_MASK)
>>> -
>>> -/* Like HASH, but without side-effects.  */
>>> -#define SAFE_HASH(X, M) \
>>> -  ((REG_P (X) && REGNO (X) >= FIRST_PSEUDO_REGISTER \
>>> -   ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X))) \
>>> -   : safe_hash (X, M)) & HASH_MASK)
>>> -
>>>  /* Determine whether register number N is considered a fixed register for the
>>>     purpose of approximating register costs.
>>>     It is desirable to replace other regs with fixed regs, to reduce need for
>>> @@ -586,6 +572,29 @@ static machine_mode cse_cc_succs (basic_block, basic_block, rtx, rtx,
>>>
>>>  static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
>>>
>>> +/* Compute hash code of X in mode M.  Special-case case where X is a pseudo
>>> +   register (hard registers may require `do_not_record' to be set).  */
>>> +
>>> +static inline unsigned
>>> +HASH (rtx x, machine_mode mode)
>>> +{
>>> +  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
>>> +		? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
>>> +		: canon_hash (x, mode));
>>> +  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
>>> +}
>>> +
>>> +/* Like HASH, but without side-effects.  */
>>> +
>>> +static inline unsigned
>>> +SAFE_HASH (rtx x, machine_mode mode)
>>> +{
>>> +  unsigned h = (REG_P (x) && REGNO (x) >= FIRST_PSEUDO_REGISTER
>>> +		? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x)))
>>> +		: safe_hash (x, mode));
>>> +  return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
>>> +}
>>> +
>>>  /* Nonzero if X has the form (PLUS frame-pointer integer).  */
>>>
>>>  static bool