From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id D02ED3858D20 for ; Fri, 3 Feb 2023 19:05:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D02ED3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 7A36E20696 for ; Fri, 3 Feb 2023 19:05:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1675451156; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gTvnsEZ9c0s0VNuywT2VuV+Br49XNbQQahMhC7QDptI=; b=ITxSBeXIPnY/bJMaWfJE/6JlUC1kSD7MohBXvHBWoL5hpbszhY8juNLEpPNRoJbksPZ2uU dCIcPVuoJ8Bg9/fg5b7texErMLgpgh6ttnUT/yApW59rm7yrQ2C3DlmdVxBZXdnC1cZwrO kNw1dY4buTA2v7fdJCuz3ez24kGPKeE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1675451156; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gTvnsEZ9c0s0VNuywT2VuV+Br49XNbQQahMhC7QDptI=; b=UhNUrNfUtNRb6lXAYPKS2Ytucbjr9Ns3ZWoFIboHcYFtoFaDdIviV+qPFhKOvW+vygKAx7 fBUCDydQgFPbZ5Cw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 60E431358A for ; Fri, 3 Feb 2023 19:05:56 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id YNwqFxRb3WPPFQAAMHmgww (envelope-from ) for ; Fri, 03 Feb 2023 19:05:56 +0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Richard Biener Mime-Version: 1.0 (1.0) Subject: Re: [PATCH] Improve RTL CSE hash table hash usage Date: Fri, 3 Feb 2023 20:05:45 +0100 Message-Id: <3BE21948-1ED6-48F1-B3CC-C1E80EF91966@suse.de> References: In-Reply-To: To: Richard Sandiford via Gcc-patches X-Mailer: iPhone Mail (20D47) X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,MIME_QP_LONG_LINE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > Am 03.02.2023 um 16:55 schrieb Richard Sandiford via Gcc-patches : >=20 > =EF=BB=BFRichard Biener via Gcc-patches writes: >>>> Am 03.02.2023 um 15:20 schrieb Richard Sandiford via Gcc-patches : >>>=20 >>> =EF=BB=BFRichard Biener via Gcc-patches writes= : >>>> The RTL CSE hash table has a fixed number of buckets (32) each >>>> with a linked list of entries with the same hash value. The >>>> actual hash values are computed using hash_rtx which uses adds >>>> for mixing and adds the rtx CODE as CODE << 7 (apart from some >>>> exceptions such as MEM). The unsigned int typed hash value >>>> is then simply truncated for the actual lookup into the fixed >>>> size table which means that usually CODE is simply lost. >>>>=20 >>>> The following improves this truncation by first mixing in more >>>> bits using xor. It does not change the actual hash function >>>> since that's used outside of CSE as well. >>>>=20 >>>> An alternative would be to bump the fixed number of buckets, >>>> say to 256 which would retain the LSB of CODE or to 8192 which >>>> can capture all 6 bits required for the last CODE. >>>>=20 >>>> As the comment in CSE says, there's invalidate_memory and >>>> flush_hash_table done possibly frequently and those at least >>>> need to walk all slots, so when the hash table is mostly empty >>>> enlarging it will be a loss. Still there should be more >>>> regular lookups by hash, so less collisions should pay off >>>> as well. >>>=20 >>> Going purely from this description and without having looked >>> at the code properly, would it be possible to link all current >>> values together, not just those with the same hash? And would >>> that help? It looks like the list is already doubly-linked, >>> and there's spare room to store a "start of new hash" marker. >>=20 >> We already do have equivalent values linked, but I=E2=80=99m not sure tha= t=E2=80=99s what you are suggesting. >=20 > I was thinking of linking every active value in the table together, > but with entries for the same hash being consecutive. That way, things > like invalidate_memory can just walk the list and ignore the hash table. Ah, yeah. Even better might be a generation count for memory like there=E2=80= =99s one (but only for a subset of cases?!) for pseudos. That would avoid t= he walking altogether. >> Those should also have the same hash value, so both lists are somewhat re= dundant and we might be able to save some storage here by making this a list= of lists of same hash and value list? >=20 > I thought the value-equality list was to establish that (e.g.) > (reg R1) and (reg R2) are known to have the same value, despite > being different expressions with different hash values. I=E2=80=99d have to double check, I was just cursory sweeping over the code a= fter being pointed to it from a profile of a testcase. Most of CSE.cc dates= back to the initial source revision =E2=80=A6 Richard=20 > But I suppose if we reused an existing hash table structure (with its > own mechanism for handling collisions), it would make sense to use the > equivalent-value list to join everything together, rather than the > same-hash list. Again, there could be a marker to establish the start > of a new equivalent-value sublist. >=20 > Thanks, > Richard >>=20 >>>=20 >>> Thanks, >>> Richard >>>=20 >>>> Without enlarging the table a better hash function is unlikely >>>> going to make a big difference, simple statistics on the >>>> number of collisions at insertion time shows a reduction of >>>> around 10%. Bumping HASH_SHIFT by 1 improves that to 30% >>>> at the expense of reducing the average table fill by 10% >>>> (all of this stats from looking just at fold-const.i at -O2). >>>> Increasing HASH_SHIFT more leaves the table even more sparse >>>> likely showing that hash_rtx uses add for mixing which is >>>> quite bad. Bumping HASH_SHIFT by 2 removes 90% of all >>>> collisions. >>>>=20 >>>> Experimenting with using inchash instead of adds for the >>>> mixing does not improve things when looking at the HASH_SHIFT >>>> bumped by 2 numbers. >>>>=20 >>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. >>>>=20 >>>> Any opinions? >>>>=20 >>>> * cse.cc (HASH): Turn into inline function and mix >>>> in another HASH_SHIFT bits. >>>> (SAFE_HASH): Likewise. >>>> --- >>>> gcc/cse.cc | 37 +++++++++++++++++++++++-------------- >>>> 1 file changed, 23 insertions(+), 14 deletions(-) >>>>=20 >>>> diff --git a/gcc/cse.cc b/gcc/cse.cc >>>> index 37afc88b439..4777e559b86 100644 >>>> --- a/gcc/cse.cc >>>> +++ b/gcc/cse.cc >>>> @@ -420,20 +420,6 @@ struct table_elt >>>> #define HASH_SIZE (1 << HASH_SHIFT) >>>> #define HASH_MASK (HASH_SIZE - 1) >>>>=20 >>>> -/* Compute hash code of X in mode M. Special-case case where X is a p= seudo >>>> - register (hard registers may require `do_not_record' to be set). *= / >>>> - >>>> -#define HASH(X, M) \ >>>> - ((REG_P (X) && REGNO (X) >=3D FIRST_PSEUDO_REGISTER \ >>>> - ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X))) \ >>>> - : canon_hash (X, M)) & HASH_MASK) >>>> - >>>> -/* Like HASH, but without side-effects. */ >>>> -#define SAFE_HASH(X, M) \ >>>> - ((REG_P (X) && REGNO (X) >=3D FIRST_PSEUDO_REGISTER \ >>>> - ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X))) \ >>>> - : safe_hash (X, M)) & HASH_MASK) >>>> - >>>> /* Determine whether register number N is considered a fixed register f= or the >>>> purpose of approximating register costs. >>>> It is desirable to replace other regs with fixed regs, to reduce need= for >>>> @@ -586,6 +572,29 @@ static machine_mode cse_cc_succs (basic_block, bas= ic_block, rtx, rtx, >>>>=20 >>>> static const struct rtl_hooks cse_rtl_hooks =3D RTL_HOOKS_INITIALIZER; >>>> =0C >>>> +/* Compute hash code of X in mode M. Special-case case where X is a p= seudo >>>> + register (hard registers may require `do_not_record' to be set). *= / >>>> + >>>> +static inline unsigned >>>> +HASH (rtx x, machine_mode mode) >>>> +{ >>>> + unsigned h =3D (REG_P (x) && REGNO (x) >=3D FIRST_PSEUDO_REGISTER >>>> + ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x))) >>>> + : canon_hash (x, mode)); >>>> + return (h ^ (h >> HASH_SHIFT)) & HASH_MASK; >>>> +} >>>> + >>>> +/* Like HASH, but without side-effects. */ >>>> + >>>> +static inline unsigned >>>> +SAFE_HASH (rtx x, machine_mode mode) >>>> +{ >>>> + unsigned h =3D (REG_P (x) && REGNO (x) >=3D FIRST_PSEUDO_REGISTER >>>> + ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (x))) >>>> + : safe_hash (x, mode)); >>>> + return (h ^ (h >> HASH_SHIFT)) & HASH_MASK; >>>> +} >>>> + >>>> /* Nonzero if X has the form (PLUS frame-pointer integer). */ >>>>=20 >>>> static bool