From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13047 invoked by alias); 31 May 2019 11:44:44 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 13001 invoked by uid 89); 31 May 2019 11:44:43 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-3.3 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=Small, 2018-10, 201810 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 31 May 2019 11:44:42 +0000 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A6CC03179B49; Fri, 31 May 2019 11:44:32 +0000 (UTC) Received: from localhost (unknown [10.33.36.135]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5662B7E5D5; Fri, 31 May 2019 11:44:32 +0000 (UTC) Date: Fri, 31 May 2019 11:56:00 -0000 From: Jonathan Wakely To: =?iso-8859-1?Q?Fran=E7ois?= Dumont Cc: "libstdc++@gcc.gnu.org" , gcc-patches Subject: Re: Hashtable Small size optimization Message-ID: <20190531114431.GY2599@redhat.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1; format=flowed Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett User-Agent: Mutt/1.11.3 (2019-02-01) X-SW-Source: 2019-05/txt/msg02122.txt.bz2 Re https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00903.html On 15/10/18 22:46 +0200, François Dumont wrote: >I started considering PR libstdc++/68303. > >First thing was to write a dedicated performance test case, it is the >unordered_small_size.cc I'd like to add with this patch. Great, more performance tests are always good. >The first runs show a major difference between tr1 and std >implementations, tr1 being much better: > >std::tr1::unordered_set without hash code cached: 1st insert    >   9r    9u    1s  14725920mem    0pf >std::tr1::unordered_set with hash code cached: 1st insert       >8r    9u    0s  14719680mem    0pf >std::unordered_set without hash code cached: 1st insert      >17r   17u    0s  16640080mem    0pf >std::unordered_set with hash code cached: 1st insert      14r   >14u    0s  16638656mem    0pf > >I had a look in gdb to find out why and the answer was quite obvious. >For 20 insertions tr1 implementation bucket count goes through [11, >23] whereas for std it is [2, 5, 11, 23], so 2 more expensive rehash. > >As unordered containers are dedicated to rather important number of >elements I propose to review the rehash policy with this patch so that >std also starts at 11 on the 1st insertion. After the patch figures >are: > >std::tr1::unordered_set without hash code cached: 1st insert    >   9r    9u    0s  14725920mem    0pf >std::tr1::unordered_set with hash code cached: 1st insert       >8r    8u    0s  14719680mem    0pf >std::unordered_set without hash code cached: 1st insert      >15r   15u    0s  16640128mem    0pf >std::unordered_set with hash code cached: 1st insert      12r   >12u    0s  16638688mem    0pf OK, that seems worthwhile then. >Moreover I noticed that performance tests are built with -O2, is that >intentional ? The std implementation uses more abstractions than tr1, >looks like building with -O3 optimizes away most of those abstractions >making tr1 and std implementation much closer: Yes, I think it's intentional that -O2 is used, because I think that's the most commonly-used optimisation level. We don't want to >std::tr1::unordered_set without hash code cached: 1st insert    >   2r    1u    1s  14725952mem    0pf >std::tr1::unordered_set with hash code cached: 1st insert       >2r    1u    0s  14719536mem    0pf >std::unordered_set without hash code cached: 1st insert       >2r    2u    0s  16640064mem    0pf >std::unordered_set with hash code cached: 1st insert       2r    >2u    0s  16638608mem    0pf Hmm, interesting. I wonder if we can determine what is not being optimized at -O2, and either tweak the code or improve the compiler. >Note that this patch also rework the alternative rehash policy based >on powers of 2 so that it also starts with a larger number of bucket >(16) and respects LWG2156. > >Last I had to wider the memory column so that alignment is preserved >even when memory diff is negative. > >Tested under Linux x86_64. > >Ok to commit ? Does this patch still apply cleanly? I think it would be good to commit it now, early in stage 1.