From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-502069-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 13047 invoked by alias); 31 May 2019 11:44:44 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 13001 invoked by uid 89); 31 May 2019 11:44:43 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-3.3 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=Small, 2018-10, 201810
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 31 May 2019 11:44:42 +0000
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id A6CC03179B49;	Fri, 31 May 2019 11:44:32 +0000 (UTC)
Received: from localhost (unknown [10.33.36.135])	by smtp.corp.redhat.com (Postfix) with ESMTP id 5662B7E5D5;	Fri, 31 May 2019 11:44:32 +0000 (UTC)
Date: Fri, 31 May 2019 11:56:00 -0000
From: Jonathan Wakely <jwakely@redhat.com>
To: =?iso-8859-1?Q?Fran=E7ois?= Dumont <frs.dumont@gmail.com>
Cc: "libstdc++@gcc.gnu.org" <libstdc++@gcc.gnu.org>,	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: Hashtable Small size optimization
Message-ID: <20190531114431.GY2599@redhat.com>
References: <efa88a3a-efc3-aade-85d6-697a7228d66c@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <efa88a3a-efc3-aade-85d6-697a7228d66c@gmail.com>
X-Clacks-Overhead: GNU Terry Pratchett
User-Agent: Mutt/1.11.3 (2019-02-01)
X-SW-Source: 2019-05/txt/msg02122.txt.bz2

Re https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00903.html


On 15/10/18 22:46 +0200, François Dumont wrote:
>I started considering PR libstdc++/68303.
>
>First thing was to write a dedicated performance test case, it is the 
>unordered_small_size.cc I'd like to add with this patch.

Great, more performance tests are always good.

>The first runs show a major difference between tr1 and std 
>implementations, tr1 being much better:
>
>std::tr1::unordered_set<int> without hash code cached: 1st insert    
>   9r    9u    1s  14725920mem    0pf
>std::tr1::unordered_set<int> with hash code cached: 1st insert       
>8r    9u    0s  14719680mem    0pf
>std::unordered_set<int> without hash code cached: 1st insert      
>17r   17u    0s  16640080mem    0pf
>std::unordered_set<int> with hash code cached: 1st insert      14r   
>14u    0s  16638656mem    0pf
>
>I had a look in gdb to find out why and the answer was quite obvious. 
>For 20 insertions tr1 implementation bucket count goes through [11, 
>23] whereas for std it is [2, 5, 11, 23], so 2 more expensive rehash.
>
>As unordered containers are dedicated to rather important number of 
>elements I propose to review the rehash policy with this patch so that 
>std also starts at 11 on the 1st insertion. After the patch figures 
>are:
>
>std::tr1::unordered_set<int> without hash code cached: 1st insert    
>   9r    9u    0s  14725920mem    0pf
>std::tr1::unordered_set<int> with hash code cached: 1st insert       
>8r    8u    0s  14719680mem    0pf
>std::unordered_set<int> without hash code cached: 1st insert      
>15r   15u    0s  16640128mem    0pf
>std::unordered_set<int> with hash code cached: 1st insert      12r   
>12u    0s  16638688mem    0pf

OK, that seems worthwhile then.

>Moreover I noticed that performance tests are built with -O2, is that 
>intentional ? The std implementation uses more abstractions than tr1, 
>looks like building with -O3 optimizes away most of those abstractions 
>making tr1 and std implementation much closer:

Yes, I think it's intentional that -O2 is used, because I think
that's the most commonly-used optimisation level. We don't want to 

>std::tr1::unordered_set<int> without hash code cached: 1st insert    
>   2r    1u    1s  14725952mem    0pf
>std::tr1::unordered_set<int> with hash code cached: 1st insert       
>2r    1u    0s  14719536mem    0pf
>std::unordered_set<int> without hash code cached: 1st insert       
>2r    2u    0s  16640064mem    0pf
>std::unordered_set<int> with hash code cached: 1st insert       2r    
>2u    0s  16638608mem    0pf

Hmm, interesting. I wonder if we can determine what is not being
optimized at -O2, and either tweak the code or improve the compiler.

>Note that this patch also rework the alternative rehash policy based 
>on powers of 2 so that it also starts with a larger number of bucket 
>(16) and respects LWG2156.
>
>Last I had to wider the memory column so that alignment is preserved 
>even when memory diff is negative.
>
>Tested under Linux x86_64.
>
>Ok to commit ?

Does this patch still apply cleanly? I think it would be good to
commit it now, early in stage 1.