From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 9CCE53858004 for ; Fri, 1 Sep 2023 09:00:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9CCE53858004 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693558838; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qfZM/SoYet0aWTRf1WujG4XOmoxzAzz0ij26Kfl+vQA=; b=Giqe9+yhM2A52vNNotUECbUoN31flL8nIED7hndj02IAzWtWmriCJ7yaUfMCcTUBC49FbP zoNdvHxr3CeKhVO/EK7PkiRM3Fuk309Jblha8NFEhgro/unn3z23TLkXzp0ail4I3F6IES ot4ogUuWIQb8nfSxFHb/e8KvFTAYQrU= Received: from mail-lj1-f199.google.com (mail-lj1-f199.google.com [209.85.208.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-211-fdwA8JnDNqqMcVsstVx0Jw-1; Fri, 01 Sep 2023 05:00:37 -0400 X-MC-Unique: fdwA8JnDNqqMcVsstVx0Jw-1 Received: by mail-lj1-f199.google.com with SMTP id 38308e7fff4ca-2bd1687c5d2so23262781fa.0 for ; Fri, 01 Sep 2023 02:00:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693558835; x=1694163635; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qfZM/SoYet0aWTRf1WujG4XOmoxzAzz0ij26Kfl+vQA=; b=cqrYyAFnwDZMcYRoByDf9RB7QY8LKfTgg9ZNTXR/hBg27nSyerHQaHQKh0WQlsx5TI I0/C+p35b7cd3Z8Cs80QFvtZB9bhysDPdPkYi0vHmnARxUd1MrC3+SLNf1pAcAfQSIIw x+xwvF+9if08rgQXCFMCqzh2JhgcwCQ9HiPkoMARQJbKFdDKSzh1yseGDHvhC/Crryjz 8aKyrf4/VGDpGimbYcCTpIU6EK8nBqnUAllW9U944Yd4Kd4iJl8xrLWcmrwihg7NYMhO iKSC5QmMpeL5esu2yRmYBDXOOoQ76g+qT7tkMXIrt0SuZoLNCa6F1aSX/rABz8Bdz85x Q7aQ== X-Gm-Message-State: AOJu0YwoCfLMOGIggcPn11n91Um801HBtTgdaGAFVt3vEzzeBVGr+Ca7 68uKeAyJlWTjuoJssruxbOmHkKKSuLtAJiqFWR06vr8k2DblweTkXVpODVyx62zYPyVTdojl7wf fSEeeWrJEx9KyMmxsYyOpuFnwM2RJPgk= X-Received: by 2002:a2e:8097:0:b0:2bd:1615:f9f8 with SMTP id i23-20020a2e8097000000b002bd1615f9f8mr1139858ljg.45.1693558835564; Fri, 01 Sep 2023 02:00:35 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF6W2o+IjfNFtDukcOO9XKiQfeUaH+WQ5+plKY9fzfBV7lkhU2nPHATRYjClsL3NQoewQhxN9ccEDpB5wNfvxk= X-Received: by 2002:a2e:8097:0:b0:2bd:1615:f9f8 with SMTP id i23-20020a2e8097000000b002bd1615f9f8mr1139837ljg.45.1693558835185; Fri, 01 Sep 2023 02:00:35 -0700 (PDT) MIME-Version: 1.0 References: <6841b337-c7a0-55d7-a513-e655c99df01a@gmail.com> <8fe52f15-1acc-da0e-9353-35bdc9aa25ef@gmail.com> In-Reply-To: From: Jonathan Wakely Date: Fri, 1 Sep 2023 10:00:23 +0100 Message-ID: Subject: Re: [PATCH][Hashtable] Performance optimization through use of insertion hint To: =?UTF-8?Q?Fran=C3=A7ois_Dumont?= Cc: "libstdc++" , gcc-patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 1 Sept 2023 at 09:59, Jonathan Wakely wrote: > > On Tue, 29 Aug 2023 at 20:52, Fran=C3=A7ois Dumont via Libstdc++ > wrote: > > > > Hi > > > > Any feedback regarding this patch ? > > This is a fairly large patch and before we make any more changes to > unordered containers we have an ABI break to fix: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D111050 Or at least decide what to do about it... > > > > > > Fran=C3=A7ois > > > > On 24/07/2023 13:02, Fran=C3=A7ois Dumont wrote: > > > libstdc++: [_Hashtable] Make more use of insertion hint > > > > > > > > > When inserting an element into an empty bucket we currently inser= t > > > the new node > > > after the before-begin node so in first position. The drawback of > > > doing this is > > > that we are forced to update the bucket that was containing this > > > before-begin > > > node to point to the newly inserted node. To do so we need at bes= t > > > to do a modulo > > > to find this bucket and when hash code is not cached also compute= it. > > > > > > To avoid this side effect it is better to insert after the last > > > node. Adding > > > a member to keep track of this last node would be an abi breaking > > > change. Still we > > > can ask the user to maintain and provide this last node as an > > > insertion hint. > > > > > > Adapt range insertion methods to try to detect the last node and > > > then use it as > > > the insertion hint. > > > > > > libstdc++-v3/ChangeLog: > > > > > > * include/bits/hashtable_policy.h: > > > (_Insert_base::try_emplace): Adapt to use hint. > > > (_Insert_base::_M_insert_range): Try to detect container > > > last node and use it > > > as hint for insertions. > > > (_Hash_code_base::_M_hash_code(const _Hash&, const > > > _Hash_node_value<>&)): Remove. > > > (_Hash_code_base::_M_hash_code<_H2>(const _H2&, const > > > _Hash_node_value<>&)): Remove. > > > * include/bits/hashtable.h > > > (_Hashtable<>::_M_insert_bucket_begin): Add hint paramete= r > > > and use it when inserting > > > into an empty bucket if hint is the last container node. > > > (_Hashtable<>::_InsertInfo): New struct. > > > (_Hashtable<>::_M_get_insert_info): New, return latter. > > > (_Hashtable<>::_M_insert_multi_node): Add _InsertInfo > > > parameter. > > > (_Hashtable<>::_M_insert_unique_node): Add __node_ptr hin= t > > > parameter. > > > (_Hashtable<>::_M_emplace_unique(__node_ptr, _Args&&...))= : > > > New. > > > (_Hashtable<>::_M_emplace_multi(__node_ptr, _Args&&...)): > > > New. > > > (_Hashtable<>::_M_emplace()): Adapt to use latters. > > > (_Hashtable<>::_M_insert_unique): Add __node_ptr paramete= r. > > > (_Hashtable<>::_M_insert_unique_aux): Add __node_ptr > > > parameter. > > > (_Hashtable<>::_M_insert(__node_ptr, _Arg&&, const > > > _NodeGenerator&, true_type)): > > > Use latter. > > > (_Hashtable<>::_M_reinsert_node(const_iterator, > > > node_type&&)): > > > Add hint parameter, adapt to use it. > > > (_Hashtable<>::_M_reinsert_node_multi): Use hint paramete= r > > > if available to extract > > > hash code. > > > (_Hashtable<>::_M_compute_hash_code(const _Hash&, > > > __node_ptr, __node_ptr, > > > const key_type&)): New. > > > (_Hashtable<>::_M_compute_hash_code<_H2>(const _H2&, __node_ptr, > > > __node_ptr, > > > const key_type&)): New. > > > (_Hashtable<>::_M_merge_unique): Adapt to use latter. > > > Implement small size > > > optimization. > > > (_Hashtable<>::_M_get_insert_info(const _Hash&, > > > __node_ptr, __node_ptr, > > > const key_type&)): New. > > > (_Hashtable<>::_M_get_insert_info<_H2>(const _H2&, __node_ptr, > > > __node_ptr, > > > const key_type&)): New. > > > (_Hashtable<>::_M_merge_multi): Adapt to use latter. > > > * include/bits/unordered_map.h > > > (unordered_map<>::insert(node_type&&)): Pass cend as > > > hint. > > > (unordered_map<>::insert(const_iterator, node_type&&)): > > > Adapt to use hint. > > > * include/bits/unordered_set.h > > > (unordered_set<>::insert(node_type&&)): Pass cend as > > > hint. > > > (unordered_set<>::insert(const_iterator, node_type&&)): > > > Adapt to use hint. > > > * > > > testsuite/23_containers/unordered_multimap/insert/hint.cc: Adapt > > > implementation > > > specific tests. > > > > > > Tested under linux x86_64. > > > > > > Here is the performance results of this patch, before: > > > > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > insertions w/o hint 94r 94u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > insertions with perfect hint 95r 94u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > insertions with bad hint 94r 94u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > range insertions 88r 88u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 > > > insertions w/o hint 91r 91u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 > > > insertions with perfect hint 92r 93u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 > > > insertions with bad hint 93r 93u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 range > > > insertions 88r 88u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 > > > insertions w/o hint 94r 95u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 > > > insertions with perfect hint 94r 95u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 > > > insertions with bad hint 94r 94u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 range > > > insertions 90r 90u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 insertions > > > w/o hint 68r 68u 0s 223999264mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 insertions > > > with perfect hint 67r 67u 0s 223999264mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 insertions > > > with bad hint 68r 68u 0s 223999264mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 range > > > insertions 62r 62u 0s 223999264mem 0pf > > > > > > After: > > > > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > insertions w/o hint 93r 93u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > insertions with perfect hint 58r 59u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > insertions with bad hint 93r 94u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code NOT cached 2000000 > > > range insertions 52r 51u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 > > > insertions w/o hint 96r 95u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 > > > insertions with perfect hint 61r 62u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 > > > insertions with bad hint 94r 94u 0s 191999712mem 0pf > > > unordered_multiset_hint.cc-thread hash code cached 2000000 range > > > insertions 52r 52u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 > > > insertions w/o hint 96r 95u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 > > > insertions with perfect hint 58r 59u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 > > > insertions with bad hint 94r 94u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code NOT cached 2000000 range > > > insertions 52r 52u 0s 191999712mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 insertions > > > w/o hint 70r 69u 0s 223999264mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 insertions > > > with perfect hint 67r 67u 0s 223999264mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 insertions > > > with bad hint 67r 67u 0s 223999264mem 0pf > > > unordered_set_hint.cc-thread hash code cached 2000000 range > > > insertions 63r 62u 0s 223999264mem 0pf > > > > > > Ok to commit ? > > > > > > Fran=C3=A7ois > >