public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
@ 2024-01-17  8:11 Huanghui Nie
  2024-01-17 20:04 ` François Dumont
  0 siblings, 1 reply; 13+ messages in thread
From: Huanghui Nie @ 2024-01-17  8:11 UTC (permalink / raw)
  To: libstdc++, gcc-patches, gcc
  Cc: paolo.carlini, drepper, Benjamin De Kosnik, jwakely

[-- Attachment #1: Type: text/plain, Size: 4246 bytes --]

Hi.

While implementing a hash table with reference to the C++ STL, I found that
when the STL hash table erases elements, if the first element erased is the
begin element, the before-begin node is assigned redundantly. This creates
unnecessary performance overhead.


First, let’s see the code implementation:

In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when
&_M_before_begin == _M_buckets[__bkt]; in other words, under that condition
it is _M_buckets[__bkt]->_M_nxt that gets assigned.
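
For reference, here is the current body of _M_remove_bucket_begin,
reconstructed from the hunk quoted at the end of this mail (the parameter
names __bkt, __next_n and __next_bkt are the ones used there; the enclosing
function declaration is paraphrased, not copied from hashtable.h):

// _M_remove_bucket_begin(__bkt, __next_n, __next_bkt), called by _M_erase /
// _M_extract_node when the removed node is the first node of bucket __bkt;
// __next_n is the removed node's successor, __next_bkt its bucket.
if (!__next_n || __next_bkt != __bkt)
  {
    // Bucket is now empty
    // First update next bucket if any
    if (__next_n)
      _M_buckets[__next_bkt] = _M_buckets[__bkt];

    // Second update before begin node if necessary
    if (&_M_before_begin == _M_buckets[__bkt])
      _M_before_begin._M_nxt = __next_n;
    _M_buckets[__bkt] = nullptr;
  }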

_M_remove_bucket_begin is called by _M_erase and _M_extract_node:

   1. Case _M_erase of a range: _M_remove_bucket_begin is called in a for
   loop when __is_bucket_begin is true. If __is_bucket_begin is true and
   &_M_before_begin == _M_buckets[__bkt], then __prev_n must be
   &_M_before_begin, and _M_erase always assigns __prev_n->_M_nxt. So
   whenever _M_remove_bucket_begin is called with &_M_before_begin ==
   _M_buckets[__bkt], _M_before_begin._M_nxt is assigned by _M_erase anyway,
   and there is no need to assign it again in _M_remove_bucket_begin.
   2. The other call sites: _M_remove_bucket_begin is called when __prev_n ==
   _M_buckets[__bkt], and the callers (_M_erase and _M_extract_node) always
   assign __prev_n->_M_nxt. That means _M_buckets[__bkt]->_M_nxt is always
   assigned by the caller, so there is no need to assign it in
   _M_remove_bucket_begin.

In summary, there’s no need to check &_M_before_begin == _M_buckets[__bkt]
and assign _M_before_begin._M_nxt in _M_remove_bucket_begin.


Then let’s see the responsibility of each method:

The hash table in the C++ STL is composed of hash buckets and a node list.
The _M_erase and _M_extract_node methods are responsible for updating the
node list, while the _M_remove_bucket_begin method only needs to update the
hash buckets. Updating _M_before_begin is part of updating the node list, so
_M_remove_bucket_begin doesn't need to update _M_before_begin.
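
To make this division of responsibilities concrete, here is a minimal toy
sketch. It is not the libstdc++ code: the names ToyNode, ToyTable, erase_one
and remove_bucket_begin are made up for illustration, and the branch for
erasing a non-begin node whose successor lives in another bucket is omitted.

#include <cstddef>
#include <vector>

struct ToyNode { ToyNode* nxt = nullptr; };

struct ToyTable
{
  ToyNode before_begin;           // sentinel, plays the role of _M_before_begin
  std::vector<ToyNode*> buckets;  // buckets[b] points to the node *before*
                                  // bucket b's first node, so it can be
                                  // &before_begin for the first non-empty bucket

  // Bucket-array maintenance only, like _M_remove_bucket_begin after the patch.
  void remove_bucket_begin(std::size_t bkt, ToyNode* next_n, std::size_t next_bkt)
  {
    if (!next_n || next_bkt != bkt)
      {
        if (next_n)                          // the next bucket now starts right
          buckets[next_bkt] = buckets[bkt];  // after the same "before" node
        buckets[bkt] = nullptr;              // bucket is empty; the node list
      }                                      // is left entirely to the caller
  }

  // prev is the node before n in the list, next_bkt the bucket of n->nxt.
  void erase_one(std::size_t bkt, ToyNode* prev, ToyNode* n, std::size_t next_bkt)
  {
    if (buckets[bkt] == prev)                // n is the first node of its bucket
      remove_bucket_begin(bkt, n->nxt, next_bkt);
    prev->nxt = n->nxt;                      // unlink from the node list: when
                                             // prev == &before_begin this is
                                             // exactly the assignment that makes
                                             // the removed check redundant
    // (node destruction omitted in this sketch)
  }
};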


Existing tests listed below cover this change:

23_containers/unordered_set/allocator/copy.cc

23_containers/unordered_set/allocator/copy_assign.cc

23_containers/unordered_set/allocator/move.cc

23_containers/unordered_set/allocator/move_assign.cc

23_containers/unordered_set/allocator/swap.cc

23_containers/unordered_set/erase/1.cc

23_containers/unordered_set/erase/24061-set.cc

23_containers/unordered_set/modifiers/extract.cc

23_containers/unordered_set/operations/count.cc

23_containers/unordered_set/requirements/exception/basic.cc

23_containers/unordered_map/allocator/copy.cc

23_containers/unordered_map/allocator/copy_assign.cc

23_containers/unordered_map/allocator/move.cc

23_containers/unordered_map/allocator/move_assign.cc

23_containers/unordered_map/allocator/swap.cc

23_containers/unordered_map/erase/1.cc

23_containers/unordered_map/erase/24061-map.cc

23_containers/unordered_map/modifiers/extract.cc

23_containers/unordered_map/modifiers/move_assign.cc

23_containers/unordered_map/operations/count.cc

23_containers/unordered_map/requirements/exception/basic.cc


Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?


---

ChangeLog:


libstdc++: hashtable: No need to update before begin node in
_M_remove_bucket_begin


2024-01-16  Huanghui Nie  <nnnjkk@gmail.com>


libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (_M_remove_bucket_begin): Do not
	update the before-begin node; the callers already do it.


---


diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index b48610036fa..6056639e663 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        if (!__next_n || __next_bkt != __bkt)
          {
            // Bucket is now empty
-           // First update next bucket if any
+           // Update next bucket if any
            if (__next_n)
              _M_buckets[__next_bkt] = _M_buckets[__bkt];

-           // Second update before begin node if necessary
-           if (&_M_before_begin == _M_buckets[__bkt])
-             _M_before_begin._M_nxt = __next_n;
            _M_buckets[__bkt] = nullptr;
          }
       }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-17  8:11 [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin Huanghui Nie
@ 2024-01-17 20:04 ` François Dumont
  2024-01-18  9:26   ` Huanghui Nie
  0 siblings, 1 reply; 13+ messages in thread
From: François Dumont @ 2024-01-17 20:04 UTC (permalink / raw)
  To: Huanghui Nie, libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4694 bytes --]

Hi

Looks like a great finding to me, this is indeed a useless check, thanks!

Have you any figures on the performance enhancement? It might help to get
proper approval, as gcc is currently in dev stage 4, that is to say,
normally only bug fixes.

François

On 17/01/2024 09:11, Huanghui Nie wrote:
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-17 20:04 ` François Dumont
@ 2024-01-18  9:26   ` Huanghui Nie
  2024-01-18 15:58     ` Théo Papadopoulo
  2024-01-22  6:14     ` François Dumont
  0 siblings, 2 replies; 13+ messages in thread
From: Huanghui Nie @ 2024-01-18  9:26 UTC (permalink / raw)
  To: François Dumont; +Cc: libstdc++, gcc-patches


[-- Attachment #1.1: Type: text/plain, Size: 6190 bytes --]

Yes, I have. I did a benchmark today.

The conclusion is: the time consumption can be reduced by 0.4% ~ 1.2% for
unordered_set erase(begin()), and by 1.2% ~ 2.4% for erase(begin(), end()).


My test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs

MEM: 256G

OS: CentOS-8.2

g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

Compile flags: -O3 -std=c++17


Test summary data (time taken per 100 million elements erased):

erase(begin()):

|size of unordered_set   |100     |1,000   |10,000  |100,000 |1,000,000|10,000,000|
|base time consuming (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 |3854.168  |
|test time consuming (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 |3808.137  |
|Time-consuming reduction|1.16%   |0.48%   |1.02%   |0.77%   |0.40%    |1.19%     |

erase(begin(),end()):

|size of unordered_set   |100     |1,000   |10,000  |100,000 |1,000,000|10,000,000|
|base time consuming (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 |2804.099  |
|test time consuming (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 |2739.727  |
|Time-consuming reduction|2.39%   |1.52%   |1.56%   |1.27%   |1.54%    |2.30%     |


Please see the attachment for test code and detailed test result.
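
(For reference, the attached programs take the set size and the loop count
as command-line arguments, so a single data point corresponds to an
invocation roughly like the following; the output name test1 is arbitrary,
and the size/loop pairs are the ones recorded in result.csv.)

g++ -O3 -std=c++17 test1.cc -o test1
./test1 100 1000000    # size of unordered_set = 100, loop = 1,000,000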

On Thu, 18 Jan 2024 at 4:04, François Dumont <frs.dumont@gmail.com> wrote:

> Hi
>
> Looks like a great finding to me, this is indeed a useless check, thanks!
>
> Have you any figures on the performance enhancement ? It might help to get
> proper approval as gcc is currently in dev stage 4 that is to say only bug
> fixes normally.
>
> François
> On 17/01/2024 09:11, Huanghui Nie wrote:
>
> [...]

[-- Attachment #2: test1.cc --]
[-- Type: application/octet-stream, Size: 630 bytes --]

#include <cstdio>
#include <cstdlib>
#include <unordered_set>
#include <chrono>

int main(int argc, char** argv) {
  int size = ::atoi(argv[1]);
  int loop = ::atoi(argv[2]);

  std::unordered_set<int> data;
  data.reserve(size);
  double t = 0;  // accumulated erase time, in microseconds
  for (int k = 0; k < loop; ++k) {
    for (int i = 0; i < size; ++i) {
      data.emplace(i);
    }
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < size; ++i) {
      data.erase(data.begin());
    }
    auto t2 = std::chrono::steady_clock::now();
    t += std::chrono::duration<double, std::micro>(t2 - t1).count();
  }
  printf("Cost: %lf\n", t);
  return 0;
}


[-- Attachment #3: test2.cc --]
[-- Type: application/octet-stream, Size: 597 bytes --]

#include <cstdio>
#include <cstdlib>
#include <unordered_set>
#include <chrono>

int main(int argc, char** argv) {
  int size = ::atoi(argv[1]);
  int loop = ::atoi(argv[2]);

  std::unordered_set<int> data;
  data.reserve(size);
  double t = 0;  // accumulated erase time, in microseconds
  for (int k = 0; k < loop; ++k) {
    for (int i = 0; i < size; ++i) {
      data.emplace(i);
    }
    auto t1 = std::chrono::steady_clock::now();
    data.erase(data.begin(), data.end());
    auto t2 = std::chrono::steady_clock::now();
    t += std::chrono::duration<double, std::micro>(t2 - t1).count();
  }
  printf("Cost: %lf\n", t);
  return 0;
}


[-- Attachment #4: result.csv --]
[-- Type: text/csv, Size: 6212 bytes --]

hashtable version,"test type(1=erase(begin() 2=erase(begin(), end())",size of unordered_set,loop,cost time1,cost time2,cost time3,cost time4,cost time5,cost time6,cost time7,cost time8,cost time9,cost time10,cost time11,cost time12,cost time13,cost time14,cost time15,cost time16,cost time17,cost time18,cost time19,cost time20,cost time21,cost time22,cost time23,cost time24,average costtime
base,1,100,1000000,3823.493,3822.681,3823.487,3826.009,3827.419,3815.043,3824.576,3827.587,3823.115,3822.625,3828.422,3823.058,3826.549,3873.806,3815.655,3853.773,3816.512,3826.372,3807.932,3852.067,3820.759,3824.035,3833.077,3827.616,3827.736
test,1,100,1000000,3771.571,3769.245,3784.259,3770.356,3799.412,3773.946,3811.564,3770.921,3776.988,3814.076,3782.95,3796.984,3770.042,3783.934,3822.614,3786.986,3799.942,3769.783,3771.954,3767.856,3771.71,3771.4,3787.584,3775.674,3783.406
base,2,100,1000000,2809.389,2778.206,2776.783,2806.46,2773.297,2753.868,2779.01,2782.715,2801.758,2776.475,2775.62,2786.181,2772.866,2779.171,2776.704,2779.79,2778.756,2780.544,2783.637,2754.423,2780.021,2778.152,2758.39,2779.289,2779.229
test,2,100,1000000,2695.99,2706.163,2741.074,2702.181,2711.039,2697.633,2706.972,2734.776,2705.539,2704.209,2709.499,2727.339,2737.397,2748.108,2737.865,2705.216,2709.018,2706.612,2712.392,2702.284,2707.86,2685.334,2711.586,2700.123,2712.759
base,1,1000,100000,3797.022,3801.701,3845.955,3800.277,3799.269,3804.899,3798.337,3797.996,3803.812,3797.084,3799.005,3803.165,3795.779,3886.368,3805.567,3801.1,3796.808,3799.167,3801.139,3824.756,3797.933,3824.186,3799.616,3804.458,3807.725
test,1,1000,100000,3791.256,3790.226,3787.149,3787.383,3842.439,3787.531,3789.466,3788.159,3790.282,3777.703,3788.607,3788.634,3789.289,3783.478,3785.203,3796.558,3787.165,3789.253,3739.095,3809.067,3790.257,3789.134,3788.45,3791.262,3789.46
base,2,1000,100000,2770.653,2766.165,2766.432,2771.019,2764.841,2773.394,2768.809,2771.287,2766.246,2777.166,2769.634,2765.984,2766.127,2754.672,2768.229,2772.901,2769.123,2773.159,2774.906,2764.569,2766.496,2765.35,2767.608,2770.423,2768.55
test,2,1000,100000,2716.778,2714.804,2715.655,2720.923,2719.634,2731.495,2718.936,2742.902,2699.984,2723.99,2720.647,2724.48,2738.964,2720.266,2733.468,2715.76,2718.717,2716.562,2725.198,2717.34,2825.85,2732.884,2720.978,2721.668,2726.578
base,1,10000,10000,3822.724,3822.136,3832.718,3823.444,3822.066,3824.938,3845.033,3823.653,3825.097,3821.607,3856.073,3821.575,3821.405,3824.923,3821.77,3823.008,3821.993,3839.306,3820.877,3925.253,3817.731,3821.818,3821.173,3823.716,3830.168
test,1,10000,10000,3799.764,3785.985,3788.534,3789.304,3786.5,3788.131,3825.623,3785.362,3784.366,3801.078,3784.956,3784.937,3822.219,3785.595,3785.953,3788.438,3784.868,3791.152,3786.138,3786.018,3790.52,3790.296,3786.592,3785.18,3791.146
base,2,10000,10000,2792.236,2792.593,2793.781,2794.741,2794.203,2805.638,2795.744,2791.699,2816.771,2792.745,2791.476,2797.161,2790.68,2805.948,2790.781,2790.321,2790.225,2807.69,2790.333,2788.867,2796.099,2796.358,2795.142,2797.44,2795.778
test,2,10000,10000,2746.432,2742.164,2745.393,2768.938,2770.194,2747.573,2747.745,2745.096,2743.483,2744.135,2743.616,2811.89,2749.38,2745.868,2745.868,2744.948,2760.844,2743.756,2746.124,2745.905,2746.125,2774.749,2745.862,2747.297,2752.224
base,1,100000,1000,3803.815,3801.374,3805.723,3838.825,3807.848,3804.506,3809.31,3808.968,3809.804,3803.585,3804.104,3803.752,3833.906,3801.922,3801.311,3808.528,3801.909,3809.198,3800.951,3807.704,3802.195,3808.516,3806.399,3792.792,3807.373
test,1,100000,1000,3772.25,3773.281,3775.894,3773.233,3779.474,3776.377,3772.346,3776.045,3774.53,3777.022,3771.48,3774.368,3774.361,3774.061,3776.991,3784.857,3793.553,3773.227,3773.481,3806.12,3780.02,3791.458,3776.513,3771.853,3778.033
base,2,100000,1000,2774.227,2765.314,2770.277,2767.069,2763.217,2760.879,2770.374,2768.014,2765.732,2773.391,2765.237,2764.888,2766.775,2768.556,2772.143,2766.144,2765.849,2759.546,2768.855,2766.845,2766.124,2769.056,2768.17,2770.565,2767.385
test,2,100000,1000,2762.534,2688.875,2727.942,2732.512,2731.866,2730.009,2732.819,2731.785,2726.283,2730.371,2725.973,2737.078,2731.83,2730.165,2734.018,2727.441,2730.356,2729.162,2729.638,2781.079,2729.196,2730.84,2728.868,2730.714,2732.14
base,1,1000000,100,3795.995,3794.453,3798.35,3795.408,3797.449,3793.136,3799.852,3799.073,3795.84,3794.06,3796.174,3792.995,3795.656,3808.393,3795.261,3796.692,3794.845,3794.827,3793.469,3834.264,3796.209,3792.741,3816.559,3797.411,3798.713
test,1,1000000,100,3731.346,3779.127,3787.058,3763.331,3782.874,3787.526,3783.998,3794.579,3787.743,3780.275,3780.013,3778.24,3779.863,3785.267,3782.577,3795.678,3868.659,3782.761,3781.478,3781.01,3781.462,3780.29,3781.546,3767.153,3783.494
base,2,1000000,100,2761.417,2764.684,2761.706,2749.641,2759.581,2764.313,2760.867,2754.61,2760.823,2768.231,2759.123,2759.991,2759.151,2758.963,2765.045,2760.949,2762.091,2762.359,2770.512,2761.514,2759.618,2758.706,2769.368,2763.245,2761.521
test,2,1000000,100,2713.671,2715.116,2723.414,2713.218,2713.316,2726.65,2714.696,2688.076,2789.947,2712.222,2710.506,2691.592,2713.17,2712.399,2714.737,2713.679,2713.862,2713.766,2712.854,2724.32,2713.091,2713.861,2781.472,2715.245,2718.953
base,1,10000000,10,3840.799,3822.04,3823.061,3823.658,3822.138,4065.106,3836.565,3826.96,3821.787,3819.953,3822.437,3823.611,3823.497,4050.641,3820.995,3827.454,4061.5,3820.556,3824.042,3822.966,3822.598,3828.905,3822.462,3826.311,3854.168
test,1,10000000,10,3794.528,3794.328,3795.333,3798.696,3769.481,3793.053,3873.152,3799.613,3796.982,3798.711,3795.175,3795.941,4031.049,3809.053,3793.369,3793.605,3793.4,3793.058,3795.602,3794.969,3796.661,3797.499,3794.753,3797.278,3808.137
base,2,10000000,10,2789.121,2788.516,2776.764,2787.776,2789.507,2790.068,2777.693,2792.21,2790.414,2802.279,2789.662,2789.86,2785.881,2975.997,2788.857,2959.528,2784.704,2789.618,2790.713,2788.469,2788.762,2793.355,2797.491,2791.119,2804.099
test,2,10000000,10,2736.393,2740.613,2738.415,2736.739,2746.083,2750.259,2739.898,2736.715,2756.323,2740.862,2738.368,2717.114,2750.023,2739.277,2736.757,2737.928,2734.841,2736.417,2740.543,2739.228,2741.903,2740.973,2738.099,2739.668,2739.727

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-18  9:26   ` Huanghui Nie
@ 2024-01-18 15:58     ` Théo Papadopoulo
  2024-01-22 10:07       ` Jonathan Wakely
  2024-01-22  6:14     ` François Dumont
  1 sibling, 1 reply; 13+ messages in thread
From: Théo Papadopoulo @ 2024-01-18 15:58 UTC (permalink / raw)
  To: libstdc++

[-- Attachment #1: Type: text/plain, Size: 6924 bytes --]

Wouldn't it be clearer if written as:

if (!__next_n) {
     _M_buckets[__bkt] = nullptr;
} else if (__next_bkt != __bkt) {
     _M_buckets[__next_bkt] = _M_buckets[__bkt];
     _M_buckets[__bkt] = nullptr;
}

I think this is strictly equivalent and clearer.

     Theo.


On 1/18/24 10:26, Huanghui Nie wrote:
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-18  9:26   ` Huanghui Nie
  2024-01-18 15:58     ` Théo Papadopoulo
@ 2024-01-22  6:14     ` François Dumont
  1 sibling, 0 replies; 13+ messages in thread
From: François Dumont @ 2024-01-22  6:14 UTC (permalink / raw)
  To: Huanghui Nie; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6868 bytes --]

Thanks, nice result. I'll try to run the performance benchmarks that come
with libstdc++ to see if they spot anything.

Those are the tests in the testsuite/performance folder, in case you want
to have a try yourself.

François


On 18/01/2024 10:26, Huanghui Nie wrote:
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-18 15:58     ` Théo Papadopoulo
@ 2024-01-22 10:07       ` Jonathan Wakely
  2024-01-22 10:11         ` Jonathan Wakely
  2024-01-22 13:22         ` Théo Papadopoulo
  0 siblings, 2 replies; 13+ messages in thread
From: Jonathan Wakely @ 2024-01-22 10:07 UTC (permalink / raw)
  To: Théo Papadopoulo; +Cc: libstdc++

On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
>
> Wouldn't it be clearer if written as:
>
> if (!__next_n) {
>     _M_buckets[__bkt] = nullptr;
> } else if (__next_bkt != __bkt) {
>     _M_buckets[__next_bkt] = _M_buckets[__bkt];
>     _M_buckets[__bkt] = nullptr;
> }
>
> I think this is strictly equivalent and clearer.

But it does two branches, when the second one isn't necessary. If
__next_bkt == __bkt the assignment is harmless.

I think it's much clearer the way Huanghui Nie originally suggested.

> [...]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-22 10:07       ` Jonathan Wakely
@ 2024-01-22 10:11         ` Jonathan Wakely
  2024-01-22 13:22         ` Théo Papadopoulo
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Wakely @ 2024-01-22 10:11 UTC (permalink / raw)
  To: Théo Papadopoulo; +Cc: libstdc++

On Mon, 22 Jan 2024 at 10:07, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
> >
> > Wouldn't it be clearer if written as:
> >
> > if (!__next_n) {
> >     _M_buckets[__bkt] = nullptr;
> > } else if (__next_bkt != __bkt) {
> >     _M_buckets[__next_bkt] = _M_buckets[__bkt];
> >     _M_buckets[__bkt] = nullptr;
> > }
> >
> > I think this is strictly equivalent and clearer.
>
> But it does two branches, when the second one isn't necessary. If
> __next_bkt == __bkt the assignment is harmless.
>
> I think it's much clearer the way Huanghui Nie originally suggested.

Maybe it could be:

if (__next_n && __next_bkt != __bkt)
  _M_buckets[__next_bkt] = _M_buckets[__bkt];
_M_buckets[__bkt] = nullptr;

But I don't see much benefit to the extra condition. The processor is
already going to have to load _M_buckets[__bkt] into a cacheline to
zero it anyway.

But maybe it can be benchmarked to see if it matters.


> [...]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-22 10:07       ` Jonathan Wakely
  2024-01-22 10:11         ` Jonathan Wakely
@ 2024-01-22 13:22         ` Théo Papadopoulo
  2024-01-22 13:29           ` Jonathan Wakely
  1 sibling, 1 reply; 13+ messages in thread
From: Théo Papadopoulo @ 2024-01-22 13:22 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++

On 1/22/24 11:07, Jonathan Wakely wrote:
> On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
>> Wouldn't it be clearer if written as:
>>
>> if (!__next_n) {
>>      _M_buckets[__bkt] = nullptr;
>> } else if (__next_bkt != __bkt) {
>>      _M_buckets[__next_bkt] = _M_buckets[__bkt];
>>      _M_buckets[__bkt] = nullptr;
>> }
>>
>> I think this is strictly equivalent and clearer.
> But it does two branches, when the second one isn't necessary. If
> __next_bkt == __bkt the assignment is harmless.
>
> I think it's much clearer the way Huanghui Nie originally suggested.

But there are two branches in the original code as well (the ifs are just
nested), and they are sequential in both cases (with the !next_n check on
the faster path). The conditions are just a bit simpler in the proposed new
form.

Anyway, as a (the) maintainer, your opinion is the most important and if 
you are more at ease
with one form than the other, so be it.

     Theo.



> [...]
>>> gcc/
>>>
>>> * libstdc++-v3/include/bits/hashtable.h
>>>
>>>
>>> ---
>>>
>>>
>>> diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
>>>
>>> index b48610036fa..6056639e663 100644
>>>
>>> --- a/libstdc++-v3/include/bits/hashtable.h
>>>
>>> +++ b/libstdc++-v3/include/bits/hashtable.h
>>>
>>> @@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>
>>>          if (!__next_n || __next_bkt != __bkt)
>>>
>>>            {
>>>
>>>              // Bucket is now empty
>>>
>>> -           // First update next bucket if any
>>>
>>> +           // Update next bucket if any
>>>
>>>              if (__next_n)
>>>
>>>                _M_buckets[__next_bkt] = _M_buckets[__bkt];
>>>
>>>
>>>
>>> -           // Second update before begin node if necessary
>>>
>>> -           if (&_M_before_begin == _M_buckets[__bkt])
>>>
>>> -             _M_before_begin._M_nxt = __next_n;
>>>
>>>              _M_buckets[__bkt] = nullptr;
>>>
>>>            }
>>>
>>>         }
>>>
>>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-22 13:22         ` Théo Papadopoulo
@ 2024-01-22 13:29           ` Jonathan Wakely
  2024-01-22 13:42             ` Théo Papadopoulo
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2024-01-22 13:29 UTC (permalink / raw)
  To: Théo Papadopoulo; +Cc: libstdc++

On Mon, 22 Jan 2024 at 13:22, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
>
> On 1/22/24 11:07, Jonathan Wakely wrote:
> > On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
> >> Wouldn't it be clearer if written as:
> >>
> >> if (!__next_n) {
> >>      _M_buckets[__bkt] = nullptr;
> >> } else if (__next_bkt != __bkt) {
> >>      _M_buckets[__next_bkt] = _M_buckets[__bkt];
> >>      _M_buckets[__bkt] = nullptr;
> >> }
> >>
> >> I think this is strictly equivalent and clearer.
> > But it does two branches, when the second one isn't necessary. If
> > __next_bkt == __bkt the assignment is harmless.
> >
> > I think it's much clearer the way Huanghui Nie originally suggested.
>
> But there are two branches in the original code as well (the ifs are
> just nested)...
> And they are sequential in both cases (with the !next_n on the faster path).
>   The condition are just a bit simpler in the proposed new form.


Ah, yes, so you mean also replacing the earlier condition:
        if (!__next_n || __next_bkt != __bkt)

I was only looking at the changed code, not the earlier context.
/facepalm

Yes, looking at the full context I like your suggestion.


>
> Anyway, as a (the) maintainer, your opinion is the most important and if
> you are more at ease
> with one form than the other, so be it.
>
>      Theo.
>
>
>
> >
> >>      Theo.
> >>
> >>
> >> On 1/18/24 10:26, Huanghui Nie wrote:
> >>
> >> Yes, I have. I did a benchmark today.
> >>
> >> The conclusion is: the time consumption can be reduced by 0.4% ~ 1.2% when unordered_set erase(begin()), and 1.2% ~ 2.4% when erase(begin(), end()).
> >>
> >>
> >> My test environment:
> >>
> >> CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs
> >>
> >> MEM: 256G
> >>
> >> OS: CentOS-8.2
> >>
> >> g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
> >>
> >> Compile flags: -O3 -std=c++17
> >>
> >>
> >> Test conclusion data (time taken to delete every 100 million elements):
> >>
> >> erase(begin()):
> >>
> >> |size of unordered_set   |100     |1,000   |10,000  |100,000 |1,000,000|10,000,000|
> >>
> >> |base time consuming (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 |3854.168  |
> >>
> >> |test time consuming (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 |3808.137  |
> >>
> >> |Time-consuming reduction|1.16%   |0.48%   |1.02%   |0.77%   |0.40%    |1.19%     |
> >>
> >> erase(begin(),end()):
> >>
> >> |size of unordered_set   |100     |1,000   |10,000  |100,000 |1,000,000|10,000,000|
> >>
> >> |base time consuming (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 |2804.099  |
> >>
> >> |test time consuming (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 |2739.727  |
> >>
> >> |Time-consuming reduction|2.39%   |1.52%   |1.56%   |1.27%   |1.54%    |2.30%     |
> >>
> >>
> >> Please see the attachment for test code and detailed test result.
> >>
> >>
> >> 2024年1月18日(木) 4:04 François Dumont <frs.dumont@gmail.com>:
> >>> Hi
> >>>
> >>> Looks like a great finding to me, this is indeed a useless check, thanks!
> >>>
> >>> Have you any figures on the performance enhancement ? It might help to get proper approval as gcc is currently in dev stage 4 that is to say only bug fixes normally.
> >>>
> >>> François
> >>>
> >>> On 17/01/2024 09:11, Huanghui Nie wrote:
> >>>
> >>> Hi.
> >>>
> >>> When I implemented a hash table with reference to the C++ STL, I found that when the hash table in the C++ STL deletes elements, if the first element deleted is the begin element, the before begin node is repeatedly assigned. This creates unnecessary performance overhead.
> >>>
> >>>
> >>> First, let’s see the code implementation:
> >>>
> >>> In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when &_M_before_begin == _M_buckets[__bkt]. That also means _M_buckets[__bkt]->_M_nxt is assigned under some conditions.
> >>>
> >>> _M_remove_bucket_begin is called by _M_erase and _M_extract_node:
> >>>
> >>> Case _M_erase a range: _M_remove_bucket_begin is called in a for loop when __is_bucket_begin is true. And if __is_bucket_begin is true and &_M_before_begin == _M_buckets[__bkt], __prev_n must be &_M_before_begin. __prev_n->_M_nxt is always assigned in _M_erase. That means _M_before_begin._M_nxt is always assigned, if _M_remove_bucket_begin is called and &_M_before_begin == _M_buckets[__bkt]. So there’s no need to assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
> >>> Other cases: _M_remove_bucket_begin is called when __prev_n == _M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in _M_erase and _M_before_begin. That means _M_buckets[__bkt]->_M_nxt is always assigned. So there's no need to assign _M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.
> >>>
> >>> In summary, there’s no need to check &_M_before_begin == _M_buckets[__bkt] and assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
> >>>
> >>>
> >>> Then let’s see the responsibility of each method:
> >>>
> >>> The hash table in the C++ STL is composed of hash buckets and a node list. The update of the node list is responsible for _M_erase and _M_extract_node method. _M_remove_bucket_begin method only needs to update the hash buckets. The update of _M_before_begin belongs to the update of the node list. So _M_remove_bucket_begin doesn’t need to update _M_before_begin.
> >>>
> >>>
> >>> Existing tests listed below cover this change:
> >>>
> >>> 23_containers/unordered_set/allocator/copy.cc
> >>>
> >>> 23_containers/unordered_set/allocator/copy_assign.cc
> >>>
> >>> 23_containers/unordered_set/allocator/move.cc
> >>>
> >>> 23_containers/unordered_set/allocator/move_assign.cc
> >>>
> >>> 23_containers/unordered_set/allocator/swap.cc
> >>>
> >>> 23_containers/unordered_set/erase/1.cc
> >>>
> >>> 23_containers/unordered_set/erase/24061-set.cc
> >>>
> >>> 23_containers/unordered_set/modifiers/extract.cc
> >>>
> >>> 23_containers/unordered_set/operations/count.cc
> >>>
> >>> 23_containers/unordered_set/requirements/exception/basic.cc
> >>>
> >>> 23_containers/unordered_map/allocator/copy.cc
> >>>
> >>> 23_containers/unordered_map/allocator/copy_assign.cc
> >>>
> >>> 23_containers/unordered_map/allocator/move.cc
> >>>
> >>> 23_containers/unordered_map/allocator/move_assign.cc
> >>>
> >>> 23_containers/unordered_map/allocator/swap.cc
> >>>
> >>> 23_containers/unordered_map/erase/1.cc
> >>>
> >>> 23_containers/unordered_map/erase/24061-map.cc
> >>>
> >>> 23_containers/unordered_map/modifiers/extract.cc
> >>>
> >>> 23_containers/unordered_map/modifiers/move_assign.cc
> >>>
> >>> 23_containers/unordered_map/operations/count.cc
> >>>
> >>> 23_containers/unordered_map/requirements/exception/basic.cc
> >>>
> >>>
> >>> Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?
> >>>
> >>>
> >>> ---
> >>>
> >>> ChangeLog:
> >>>
> >>>
> >>> libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
> >>>
> >>>
> >>> 2024-01-16  Huanghui Nie  <nnnjkk@gmail.com>
> >>>
> >>>
> >>> gcc/
> >>>
> >>> * libstdc++-v3/include/bits/hashtable.h
> >>>
> >>>
> >>> ---
> >>>
> >>>
> >>> diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
> >>>
> >>> index b48610036fa..6056639e663 100644
> >>>
> >>> --- a/libstdc++-v3/include/bits/hashtable.h
> >>>
> >>> +++ b/libstdc++-v3/include/bits/hashtable.h
> >>>
> >>> @@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>
> >>>          if (!__next_n || __next_bkt != __bkt)
> >>>
> >>>            {
> >>>
> >>>              // Bucket is now empty
> >>>
> >>> -           // First update next bucket if any
> >>>
> >>> +           // Update next bucket if any
> >>>
> >>>              if (__next_n)
> >>>
> >>>                _M_buckets[__next_bkt] = _M_buckets[__bkt];
> >>>
> >>>
> >>>
> >>> -           // Second update before begin node if necessary
> >>>
> >>> -           if (&_M_before_begin == _M_buckets[__bkt])
> >>>
> >>> -             _M_before_begin._M_nxt = __next_n;
> >>>
> >>>              _M_buckets[__bkt] = nullptr;
> >>>
> >>>            }
> >>>
> >>>         }
> >>>
> >>>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-22 13:29           ` Jonathan Wakely
@ 2024-01-22 13:42             ` Théo Papadopoulo
  2024-01-22 21:16               ` François Dumont
  0 siblings, 1 reply; 13+ messages in thread
From: Théo Papadopoulo @ 2024-01-22 13:42 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++

On 1/22/24 14:29, Jonathan Wakely wrote:
> On Mon, 22 Jan 2024 at 13:22, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
>> On 1/22/24 11:07, Jonathan Wakely wrote:
>>> On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo <papadopoulo@gmail.com> wrote:
>>>> Wouldn't it be clearer if written as:
>>>>
>>>> if (!__next_n) {
>>>>       _M_buckets[__bkt] = nullptr;
>>>> } else if (__next_bkt != __bkt) {
>>>>       _M_buckets[__next_bkt] = _M_buckets[__bkt];
>>>>       _M_buckets[__bkt] = nullptr;
>>>> }
>>>>
>>>> I think this is strictly equivalent and clearer.
>>> But it does two branches, when the second one isn't necessary. If
>>> __next_bkt == __bkt the assignment is harmless.
>>>
>>> I think it's much clearer the way Huanghui Nie originally suggested.
>> But there are two branches in the original code as well (the ifs are
>> just nested)...
>> And they are sequential in both cases (with the !next_n on the faster path).
>>    The condition are just a bit simpler in the proposed new form.
>
> Ah, yes, so you mean also replacing the earlier condition:
>          if (!__next_n || __next_bkt != __bkt)
>
> I was only looking at the changed code not the earlier context.
> /facepalm
>
> Yes, looking at the full context I like your suggestion.

Similarly, with your response I now understand your answer better!


>
>> Anyway, as a (the) maintainer, your opinion is the most important and if
>> you are more at ease
>> with one form than the other, so be it.
>>
>>       Theo.
>>
>>
>>
>>>>       Theo.
>>>>
>>>>
>>>> On 1/18/24 10:26, Huanghui Nie wrote:
>>>>
>>>> Yes, I have. I did a benchmark today.
>>>>
>>>> The conclusion is: the time consumption can be reduced by 0.4% ~ 1.2% when unordered_set erase(begin()), and 1.2% ~ 2.4% when erase(begin(), end()).
>>>>
>>>>
>>>> My test environment:
>>>>
>>>> CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs
>>>>
>>>> MEM: 256G
>>>>
>>>> OS: CentOS-8.2
>>>>
>>>> g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
>>>>
>>>> Compile flags: -O3 -std=c++17
>>>>
>>>>
>>>> Test conclusion data (time taken to delete every 100 million elements):
>>>>
>>>> erase(begin()):
>>>>
>>>> |size of unordered_set   |100     |1,000   |10,000  |100,000 |1,000,000|10,000,000|
>>>>
>>>> |base time consuming (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 |3854.168  |
>>>>
>>>> |test time consuming (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 |3808.137  |
>>>>
>>>> |Time-consuming reduction|1.16%   |0.48%   |1.02%   |0.77%   |0.40%    |1.19%     |
>>>>
>>>> erase(begin(),end()):
>>>>
>>>> |size of unordered_set   |100     |1,000   |10,000  |100,000 |1,000,000|10,000,000|
>>>>
>>>> |base time consuming (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 |2804.099  |
>>>>
>>>> |test time consuming (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 |2739.727  |
>>>>
>>>> |Time-consuming reduction|2.39%   |1.52%   |1.56%   |1.27%   |1.54%    |2.30%     |
>>>>
>>>>
>>>> Please see the attachment for test code and detailed test result.
>>>>
>>>>
>>>> 2024年1月18日(木) 4:04 François Dumont <frs.dumont@gmail.com>:
>>>>> Hi
>>>>>
>>>>> Looks like a great finding to me, this is indeed a useless check, thanks!
>>>>>
>>>>> Have you any figures on the performance enhancement ? It might help to get proper approval as gcc is currently in dev stage 4 that is to say only bug fixes normally.
>>>>>
>>>>> François
>>>>>
>>>>> On 17/01/2024 09:11, Huanghui Nie wrote:
>>>>>
>>>>> Hi.
>>>>>
>>>>> When I implemented a hash table with reference to the C++ STL, I found that when the hash table in the C++ STL deletes elements, if the first element deleted is the begin element, the before begin node is repeatedly assigned. This creates unnecessary performance overhead.
>>>>>
>>>>>
>>>>> First, let’s see the code implementation:
>>>>>
>>>>> In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when &_M_before_begin == _M_buckets[__bkt]. That also means _M_buckets[__bkt]->_M_nxt is assigned under some conditions.
>>>>>
>>>>> _M_remove_bucket_begin is called by _M_erase and _M_extract_node:
>>>>>
>>>>> Case _M_erase a range: _M_remove_bucket_begin is called in a for loop when __is_bucket_begin is true. And if __is_bucket_begin is true and &_M_before_begin == _M_buckets[__bkt], __prev_n must be &_M_before_begin. __prev_n->_M_nxt is always assigned in _M_erase. That means _M_before_begin._M_nxt is always assigned, if _M_remove_bucket_begin is called and &_M_before_begin == _M_buckets[__bkt]. So there’s no need to assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
>>>>> Other cases: _M_remove_bucket_begin is called when __prev_n == _M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in _M_erase and _M_before_begin. That means _M_buckets[__bkt]->_M_nxt is always assigned. So there's no need to assign _M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.
>>>>>
>>>>> In summary, there’s no need to check &_M_before_begin == _M_buckets[__bkt] and assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
>>>>>
>>>>>
>>>>> Then let’s see the responsibility of each method:
>>>>>
>>>>> The hash table in the C++ STL is composed of hash buckets and a node list. The update of the node list is responsible for _M_erase and _M_extract_node method. _M_remove_bucket_begin method only needs to update the hash buckets. The update of _M_before_begin belongs to the update of the node list. So _M_remove_bucket_begin doesn’t need to update _M_before_begin.
>>>>>
>>>>>
>>>>> Existing tests listed below cover this change:
>>>>>
>>>>> 23_containers/unordered_set/allocator/copy.cc
>>>>>
>>>>> 23_containers/unordered_set/allocator/copy_assign.cc
>>>>>
>>>>> 23_containers/unordered_set/allocator/move.cc
>>>>>
>>>>> 23_containers/unordered_set/allocator/move_assign.cc
>>>>>
>>>>> 23_containers/unordered_set/allocator/swap.cc
>>>>>
>>>>> 23_containers/unordered_set/erase/1.cc
>>>>>
>>>>> 23_containers/unordered_set/erase/24061-set.cc
>>>>>
>>>>> 23_containers/unordered_set/modifiers/extract.cc
>>>>>
>>>>> 23_containers/unordered_set/operations/count.cc
>>>>>
>>>>> 23_containers/unordered_set/requirements/exception/basic.cc
>>>>>
>>>>> 23_containers/unordered_map/allocator/copy.cc
>>>>>
>>>>> 23_containers/unordered_map/allocator/copy_assign.cc
>>>>>
>>>>> 23_containers/unordered_map/allocator/move.cc
>>>>>
>>>>> 23_containers/unordered_map/allocator/move_assign.cc
>>>>>
>>>>> 23_containers/unordered_map/allocator/swap.cc
>>>>>
>>>>> 23_containers/unordered_map/erase/1.cc
>>>>>
>>>>> 23_containers/unordered_map/erase/24061-map.cc
>>>>>
>>>>> 23_containers/unordered_map/modifiers/extract.cc
>>>>>
>>>>> 23_containers/unordered_map/modifiers/move_assign.cc
>>>>>
>>>>> 23_containers/unordered_map/operations/count.cc
>>>>>
>>>>> 23_containers/unordered_map/requirements/exception/basic.cc
>>>>>
>>>>>
>>>>> Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?
>>>>>
>>>>>
>>>>> ---
>>>>>
>>>>> ChangeLog:
>>>>>
>>>>>
>>>>> libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
>>>>>
>>>>>
>>>>> 2024-01-16  Huanghui Nie  <nnnjkk@gmail.com>
>>>>>
>>>>>
>>>>> gcc/
>>>>>
>>>>> * libstdc++-v3/include/bits/hashtable.h
>>>>>
>>>>>
>>>>> ---
>>>>>
>>>>>
>>>>> diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
>>>>>
>>>>> index b48610036fa..6056639e663 100644
>>>>>
>>>>> --- a/libstdc++-v3/include/bits/hashtable.h
>>>>>
>>>>> +++ b/libstdc++-v3/include/bits/hashtable.h
>>>>>
>>>>> @@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>>
>>>>>           if (!__next_n || __next_bkt != __bkt)
>>>>>
>>>>>             {
>>>>>
>>>>>               // Bucket is now empty
>>>>>
>>>>> -           // First update next bucket if any
>>>>>
>>>>> +           // Update next bucket if any
>>>>>
>>>>>               if (__next_n)
>>>>>
>>>>>                 _M_buckets[__next_bkt] = _M_buckets[__bkt];
>>>>>
>>>>>
>>>>>
>>>>> -           // Second update before begin node if necessary
>>>>>
>>>>> -           if (&_M_before_begin == _M_buckets[__bkt])
>>>>>
>>>>> -             _M_before_begin._M_nxt = __next_n;
>>>>>
>>>>>               _M_buckets[__bkt] = nullptr;
>>>>>
>>>>>             }
>>>>>
>>>>>          }
>>>>>
>>>>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-22 13:42             ` Théo Papadopoulo
@ 2024-01-22 21:16               ` François Dumont
  2024-01-24  0:25                 ` Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: François Dumont @ 2024-01-22 21:16 UTC (permalink / raw)
  To: Théo Papadopoulo, Jonathan Wakely; +Cc: libstdc++

[-- Attachment #1: Type: text/plain, Size: 9905 bytes --]

Hi

Here is what I compiled from your proposals with this commit message:

Author: Huanghui Nie <nnnjkk@gmail.com>
Date:   Mon Jan 22 06:45:48 2024 +0100

     libstdc++: [_Hashtable] Remove useless check for _M_before_begin

     When removing the first node of a bucket it is useless to check whether
     this bucket is the one containing the _M_before_begin node. The bucket
     before-begin node is already transferred to the next pointed-to bucket
     regardless of whether it is the container before-begin node.

     libstdc++-v3/ChangeLog:

             * include/bits/hashtable.h (_Hashtable<>::_M_remove_bucket_begin):
             Remove _M_before_begin check and clean up implementation.

     Co-authored-by: Théo Papadopoulo <papadopoulo@gmail.com>

Let me know if it is OK to commit?

François


On 22/01/2024 14:42, Théo Papadopoulo wrote:
> On 1/22/24 14:29, Jonathan Wakely wrote:
>> On Mon, 22 Jan 2024 at 13:22, Théo Papadopoulo 
>> <papadopoulo@gmail.com> wrote:
>>> On 1/22/24 11:07, Jonathan Wakely wrote:
>>>> On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo 
>>>> <papadopoulo@gmail.com> wrote:
>>>>> Wouldn't it be clearer if written as:
>>>>>
>>>>> if (!__next_n) {
>>>>>       _M_buckets[__bkt] = nullptr;
>>>>> } else if (__next_bkt != __bkt) {
>>>>>       _M_buckets[__next_bkt] = _M_buckets[__bkt];
>>>>>       _M_buckets[__bkt] = nullptr;
>>>>> }
>>>>>
>>>>> I think this is strictly equivalent and clearer.
>>>> But it does two branches, when the second one isn't necessary. If
>>>> __next_bkt == __bkt the assignment is harmless.
>>>>
>>>> I think it's much clearer the way Huanghui Nie originally suggested.
>>> But there are two branches in the original code as well (the ifs are
>>> just nested)...
>>> And they are sequential in both cases (with the !next_n on the 
>>> faster path).
>>>    The condition are just a bit simpler in the proposed new form.
>>
>> Ah, yes, so you mean also replacing the earlier condition:
>>          if (!__next_n || __next_bkt != __bkt)
>>
>> I was only looking at the changed code not the earlier context.
>> /facepalm
>>
>> Yes, looking at the full context I like your suggestion.
>
> Similarly, with your response, I too understand better your answer !!!
>
>
>>
>>> Anyway, as a (the) maintainer, your opinion is the most important 
>>> and if
>>> you are more at ease
>>> with one form than the other, so be it.
>>>
>>>       Theo.
>>>
>>>
>>>
>>>>>       Theo.
>>>>>
>>>>>
>>>>> On 1/18/24 10:26, Huanghui Nie wrote:
>>>>>
>>>>> Yes, I have. I did a benchmark today.
>>>>>
>>>>> The conclusion is: the time consumption can be reduced by 0.4% ~ 
>>>>> 1.2% when unordered_set erase(begin()), and 1.2% ~ 2.4% when 
>>>>> erase(begin(), end()).
>>>>>
>>>>>
>>>>> My test environment:
>>>>>
>>>>> CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs
>>>>>
>>>>> MEM: 256G
>>>>>
>>>>> OS: CentOS-8.2
>>>>>
>>>>> g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
>>>>>
>>>>> Compile flags: -O3 -std=c++17
>>>>>
>>>>>
>>>>> Test conclusion data (time taken to delete every 100 million 
>>>>> elements):
>>>>>
>>>>> erase(begin()):
>>>>>
>>>>> |size of unordered_set   |100     |1,000   |10,000 |100,000 
>>>>> |1,000,000|10,000,000|
>>>>>
>>>>> |base time consuming 
>>>>> (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 |3854.168  |
>>>>>
>>>>> |test time consuming 
>>>>> (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 |3808.137  |
>>>>>
>>>>> |Time-consuming reduction|1.16%   |0.48%   |1.02% |0.77%   
>>>>> |0.40%    |1.19%     |
>>>>>
>>>>> erase(begin(),end()):
>>>>>
>>>>> |size of unordered_set   |100     |1,000   |10,000 |100,000 
>>>>> |1,000,000|10,000,000|
>>>>>
>>>>> |base time consuming 
>>>>> (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 |2804.099  |
>>>>>
>>>>> |test time consuming 
>>>>> (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 |2739.727  |
>>>>>
>>>>> |Time-consuming reduction|2.39%   |1.52%   |1.56% |1.27%   
>>>>> |1.54%    |2.30%     |
>>>>>
>>>>>
>>>>> Please see the attachment for test code and detailed test result.
>>>>>
>>>>>
>>>>> 2024年1月18日(木) 4:04 François Dumont <frs.dumont@gmail.com>:
>>>>>> Hi
>>>>>>
>>>>>> Looks like a great finding to me, this is indeed a useless check, 
>>>>>> thanks!
>>>>>>
>>>>>> Have you any figures on the performance enhancement ? It might 
>>>>>> help to get proper approval as gcc is currently in dev stage 4 
>>>>>> that is to say only bug fixes normally.
>>>>>>
>>>>>> François
>>>>>>
>>>>>> On 17/01/2024 09:11, Huanghui Nie wrote:
>>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> When I implemented a hash table with reference to the C++ STL, I 
>>>>>> found that when the hash table in the C++ STL deletes elements, 
>>>>>> if the first element deleted is the begin element, the before 
>>>>>> begin node is repeatedly assigned. This creates unnecessary 
>>>>>> performance overhead.
>>>>>>
>>>>>>
>>>>>> First, let’s see the code implementation:
>>>>>>
>>>>>> In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned 
>>>>>> when &_M_before_begin == _M_buckets[__bkt]. That also means 
>>>>>> _M_buckets[__bkt]->_M_nxt is assigned under some conditions.
>>>>>>
>>>>>> _M_remove_bucket_begin is called by _M_erase and _M_extract_node:
>>>>>>
>>>>>> Case _M_erase a range: _M_remove_bucket_begin is called in a for 
>>>>>> loop when __is_bucket_begin is true. And if __is_bucket_begin is 
>>>>>> true and &_M_before_begin == _M_buckets[__bkt], __prev_n must be 
>>>>>> &_M_before_begin. __prev_n->_M_nxt is always assigned in 
>>>>>> _M_erase. That means _M_before_begin._M_nxt is always assigned, 
>>>>>> if _M_remove_bucket_begin is called and &_M_before_begin == 
>>>>>> _M_buckets[__bkt]. So there’s no need to assign 
>>>>>> _M_before_begin._M_nxt in _M_remove_bucket_begin.
>>>>>> Other cases: _M_remove_bucket_begin is called when __prev_n == 
>>>>>> _M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in 
>>>>>> _M_erase and _M_before_begin. That means 
>>>>>> _M_buckets[__bkt]->_M_nxt is always assigned. So there's no need 
>>>>>> to assign _M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.
>>>>>>
>>>>>> In summary, there’s no need to check &_M_before_begin == 
>>>>>> _M_buckets[__bkt] and assign _M_before_begin._M_nxt in 
>>>>>> _M_remove_bucket_begin.
>>>>>>
>>>>>>
>>>>>> Then let’s see the responsibility of each method:
>>>>>>
>>>>>> The hash table in the C++ STL is composed of hash buckets and a 
>>>>>> node list. The update of the node list is responsible for 
>>>>>> _M_erase and _M_extract_node method. _M_remove_bucket_begin 
>>>>>> method only needs to update the hash buckets. The update of 
>>>>>> _M_before_begin belongs to the update of the node list. So 
>>>>>> _M_remove_bucket_begin doesn’t need to update _M_before_begin.
>>>>>>
>>>>>>
>>>>>> Existing tests listed below cover this change:
>>>>>>
>>>>>> 23_containers/unordered_set/allocator/copy.cc
>>>>>>
>>>>>> 23_containers/unordered_set/allocator/copy_assign.cc
>>>>>>
>>>>>> 23_containers/unordered_set/allocator/move.cc
>>>>>>
>>>>>> 23_containers/unordered_set/allocator/move_assign.cc
>>>>>>
>>>>>> 23_containers/unordered_set/allocator/swap.cc
>>>>>>
>>>>>> 23_containers/unordered_set/erase/1.cc
>>>>>>
>>>>>> 23_containers/unordered_set/erase/24061-set.cc
>>>>>>
>>>>>> 23_containers/unordered_set/modifiers/extract.cc
>>>>>>
>>>>>> 23_containers/unordered_set/operations/count.cc
>>>>>>
>>>>>> 23_containers/unordered_set/requirements/exception/basic.cc
>>>>>>
>>>>>> 23_containers/unordered_map/allocator/copy.cc
>>>>>>
>>>>>> 23_containers/unordered_map/allocator/copy_assign.cc
>>>>>>
>>>>>> 23_containers/unordered_map/allocator/move.cc
>>>>>>
>>>>>> 23_containers/unordered_map/allocator/move_assign.cc
>>>>>>
>>>>>> 23_containers/unordered_map/allocator/swap.cc
>>>>>>
>>>>>> 23_containers/unordered_map/erase/1.cc
>>>>>>
>>>>>> 23_containers/unordered_map/erase/24061-map.cc
>>>>>>
>>>>>> 23_containers/unordered_map/modifiers/extract.cc
>>>>>>
>>>>>> 23_containers/unordered_map/modifiers/move_assign.cc
>>>>>>
>>>>>> 23_containers/unordered_map/operations/count.cc
>>>>>>
>>>>>> 23_containers/unordered_map/requirements/exception/basic.cc
>>>>>>
>>>>>>
>>>>>> Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?
>>>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>>
>>>>>> libstdc++: hashtable: No need to update before begin node in 
>>>>>> _M_remove_bucket_begin
>>>>>>
>>>>>>
>>>>>> 2024-01-16  Huanghui Nie  <nnnjkk@gmail.com>
>>>>>>
>>>>>>
>>>>>> gcc/
>>>>>>
>>>>>> * libstdc++-v3/include/bits/hashtable.h
>>>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>
>>>>>> diff --git a/libstdc++-v3/include/bits/hashtable.h 
>>>>>> b/libstdc++-v3/include/bits/hashtable.h
>>>>>>
>>>>>> index b48610036fa..6056639e663 100644
>>>>>>
>>>>>> --- a/libstdc++-v3/include/bits/hashtable.h
>>>>>>
>>>>>> +++ b/libstdc++-v3/include/bits/hashtable.h
>>>>>>
>>>>>> @@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>>>>
>>>>>>           if (!__next_n || __next_bkt != __bkt)
>>>>>>
>>>>>>             {
>>>>>>
>>>>>>               // Bucket is now empty
>>>>>>
>>>>>> -           // First update next bucket if any
>>>>>>
>>>>>> +           // Update next bucket if any
>>>>>>
>>>>>>               if (__next_n)
>>>>>>
>>>>>>                 _M_buckets[__next_bkt] = _M_buckets[__bkt];
>>>>>>
>>>>>>
>>>>>>
>>>>>> -           // Second update before begin node if necessary
>>>>>>
>>>>>> -           if (&_M_before_begin == _M_buckets[__bkt])
>>>>>>
>>>>>> -             _M_before_begin._M_nxt = __next_n;
>>>>>>
>>>>>>               _M_buckets[__bkt] = nullptr;
>>>>>>
>>>>>>             }
>>>>>>
>>>>>>          }
>>>>>>
>>>>>>
>

[-- Attachment #2: remove_bucket_begin_patch.txt --]
[-- Type: text/plain, Size: 898 bytes --]

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index b48610036fa..c3ef7a0a3d5 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -869,16 +869,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_remove_bucket_begin(size_type __bkt, __node_ptr __next_n,
 			     size_type __next_bkt)
       {
-	if (!__next_n || __next_bkt != __bkt)
+	if (!__next_n)
+	  _M_buckets[__bkt] = nullptr;
+	else if (__next_bkt != __bkt)
 	  {
-	    // Bucket is now empty
-	    // First update next bucket if any
-	    if (__next_n)
-	      _M_buckets[__next_bkt] = _M_buckets[__bkt];
-
-	    // Second update before begin node if necessary
-	    if (&_M_before_begin == _M_buckets[__bkt])
-	      _M_before_begin._M_nxt = __next_n;
+	    _M_buckets[__next_bkt] = _M_buckets[__bkt];
 	    _M_buckets[__bkt] = nullptr;
 	  }
       }
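
For completeness, a small self-contained smoke test that exercises the
affected path through the public API only. It is a sketch, not one of the
testsuite files listed earlier in the thread:

#include <cassert>
#include <unordered_set>

int main()
{
  std::unordered_set<int> s;
  for (int i = 0; i < 1000; ++i)
    s.insert(i);

  // Repeated erase(begin()): each call removes a bucket-begin node, which is
  // exactly the situation _M_remove_bucket_begin handles.
  while (s.size() > 500)
    s.erase(s.begin());
  assert(s.size() == 500);

  // Range erase removes many bucket-begin nodes in a single pass.
  s.erase(s.begin(), s.end());
  assert(s.empty());
  assert(s.begin() == s.end());
}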

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-22 21:16               ` François Dumont
@ 2024-01-24  0:25                 ` Jonathan Wakely
  2024-01-24  5:39                   ` François Dumont
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2024-01-24  0:25 UTC (permalink / raw)
  To: François Dumont; +Cc: Théo Papadopoulo, Jonathan Wakely, libstdc++

[-- Attachment #1: Type: text/plain, Size: 10589 bytes --]

On Mon, 22 Jan 2024, 21:17 François Dumont, <frs.dumont@gmail.com> wrote:

> Hi
>
> Here is what I compiled from your proposals with this commit message:
>
> Author: Huanghui Nie <nnnjkk@gmail.com>
> Date:   Mon Jan 22 06:45:48 2024 +0100
>
>      libstdc++: [_Hashtable] Remove useless check for _M_before_begin
>
>      When removing the first node of a bucket it is useless to check if
> this bucket
>      is the one containing the _M_before_begin node. The bucket
> before-begin node is
>      already transfered to the next pointed-to bucket regardeless if it
> is the container
>      before-begin node.
>
>      libstdc++-v3/ChangeLog:
>
>              * include/bits/hashtable.h
> (_Hahstable<>::_M_remove_bucket_begin): Remove
>              _M_before_begin check and cleanup implementation.
>
>      Co-authored-by: Théo Papadopoulo <papadopoulo@gmail.com>
>
> Let me know if ok to commit ?
>

OK, thanks



> François
>
>
> On 22/01/2024 14:42, Théo Papadopoulo wrote:
> > On 1/22/24 14:29, Jonathan Wakely wrote:
> >> On Mon, 22 Jan 2024 at 13:22, Théo Papadopoulo
> >> <papadopoulo@gmail.com> wrote:
> >>> On 1/22/24 11:07, Jonathan Wakely wrote:
> >>>> On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo
> >>>> <papadopoulo@gmail.com> wrote:
> >>>>> Wouldn't it be clearer if written as:
> >>>>>
> >>>>> if (!__next_n) {
> >>>>>       _M_buckets[__bkt] = nullptr;
> >>>>> } else if (__next_bkt != __bkt) {
> >>>>>       _M_buckets[__next_bkt] = _M_buckets[__bkt];
> >>>>>       _M_buckets[__bkt] = nullptr;
> >>>>> }
> >>>>>
> >>>>> I think this is strictly equivalent and clearer.
> >>>> But it does two branches, when the second one isn't necessary. If
> >>>> __next_bkt == __bkt the assignment is harmless.
> >>>>
> >>>> I think it's much clearer the way Huanghui Nie originally suggested.
> >>> But there are two branches in the original code as well (the ifs are
> >>> just nested)...
> >>> And they are sequential in both cases (with the !next_n on the
> >>> faster path).
> >>>    The condition are just a bit simpler in the proposed new form.
> >>
> >> Ah, yes, so you mean also replacing the earlier condition:
> >>          if (!__next_n || __next_bkt != __bkt)
> >>
> >> I was only looking at the changed code not the earlier context.
> >> /facepalm
> >>
> >> Yes, looking at the full context I like your suggestion.
> >
> > Similarly, with your response, I too understand better your answer !!!
> >
> >
> >>
> >>> Anyway, as a (the) maintainer, your opinion is the most important
> >>> and if
> >>> you are more at ease
> >>> with one form than the other, so be it.
> >>>
> >>>       Theo.
> >>>
> >>>
> >>>
> >>>>>       Theo.
> >>>>>
> >>>>>
> >>>>> On 1/18/24 10:26, Huanghui Nie wrote:
> >>>>>
> >>>>> Yes, I have. I did a benchmark today.
> >>>>>
> >>>>> The conclusion is: the time consumption can be reduced by 0.4% ~
> >>>>> 1.2% when unordered_set erase(begin()), and 1.2% ~ 2.4% when
> >>>>> erase(begin(), end()).
> >>>>>
> >>>>>
> >>>>> My test environment:
> >>>>>
> >>>>> CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs
> >>>>>
> >>>>> MEM: 256G
> >>>>>
> >>>>> OS: CentOS-8.2
> >>>>>
> >>>>> g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
> >>>>>
> >>>>> Compile flags: -O3 -std=c++17
> >>>>>
> >>>>>
> >>>>> Test conclusion data (time taken to delete every 100 million
> >>>>> elements):
> >>>>>
> >>>>> erase(begin()):
> >>>>>
> >>>>> |size of unordered_set   |100     |1,000   |10,000 |100,000
> >>>>> |1,000,000|10,000,000|
> >>>>>
> >>>>> |base time consuming
> >>>>> (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 |3854.168  |
> >>>>>
> >>>>> |test time consuming
> >>>>> (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 |3808.137  |
> >>>>>
> >>>>> |Time-consuming reduction|1.16%   |0.48%   |1.02% |0.77%
> >>>>> |0.40%    |1.19%     |
> >>>>>
> >>>>> erase(begin(),end()):
> >>>>>
> >>>>> |size of unordered_set   |100     |1,000   |10,000 |100,000
> >>>>> |1,000,000|10,000,000|
> >>>>>
> >>>>> |base time consuming
> >>>>> (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 |2804.099  |
> >>>>>
> >>>>> |test time consuming
> >>>>> (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 |2739.727  |
> >>>>>
> >>>>> |Time-consuming reduction|2.39%   |1.52%   |1.56% |1.27%
> >>>>> |1.54%    |2.30%     |
> >>>>>
> >>>>>
> >>>>> Please see the attachment for test code and detailed test result.
> >>>>>
> >>>>>
> >>>>> 2024年1月18日(木) 4:04 François Dumont <frs.dumont@gmail.com>:
> >>>>>> Hi
> >>>>>>
> >>>>>> Looks like a great finding to me, this is indeed a useless check,
> >>>>>> thanks!
> >>>>>>
> >>>>>> Have you any figures on the performance enhancement ? It might
> >>>>>> help to get proper approval as gcc is currently in dev stage 4
> >>>>>> that is to say only bug fixes normally.
> >>>>>>
> >>>>>> François
> >>>>>>
> >>>>>> On 17/01/2024 09:11, Huanghui Nie wrote:
> >>>>>>
> >>>>>> Hi.
> >>>>>>
> >>>>>> When I implemented a hash table with reference to the C++ STL, I
> >>>>>> found that when the hash table in the C++ STL deletes elements,
> >>>>>> if the first element deleted is the begin element, the before
> >>>>>> begin node is repeatedly assigned. This creates unnecessary
> >>>>>> performance overhead.
> >>>>>>
> >>>>>>
> >>>>>> First, let’s see the code implementation:
> >>>>>>
> >>>>>> In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned
> >>>>>> when &_M_before_begin == _M_buckets[__bkt]. That also means
> >>>>>> _M_buckets[__bkt]->_M_nxt is assigned under some conditions.
> >>>>>>
> >>>>>> _M_remove_bucket_begin is called by _M_erase and _M_extract_node:
> >>>>>>
> >>>>>> Case _M_erase a range: _M_remove_bucket_begin is called in a for
> >>>>>> loop when __is_bucket_begin is true. And if __is_bucket_begin is
> >>>>>> true and &_M_before_begin == _M_buckets[__bkt], __prev_n must be
> >>>>>> &_M_before_begin. __prev_n->_M_nxt is always assigned in
> >>>>>> _M_erase. That means _M_before_begin._M_nxt is always assigned,
> >>>>>> if _M_remove_bucket_begin is called and &_M_before_begin ==
> >>>>>> _M_buckets[__bkt]. So there’s no need to assign
> >>>>>> _M_before_begin._M_nxt in _M_remove_bucket_begin.
> >>>>>> Other cases: _M_remove_bucket_begin is called when __prev_n ==
> >>>>>> _M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in
> >>>>>> _M_erase and _M_before_begin. That means
> >>>>>> _M_buckets[__bkt]->_M_nxt is always assigned. So there's no need
> >>>>>> to assign _M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.
> >>>>>>
> >>>>>> In summary, there’s no need to check &_M_before_begin ==
> >>>>>> _M_buckets[__bkt] and assign _M_before_begin._M_nxt in
> >>>>>> _M_remove_bucket_begin.
> >>>>>>
> >>>>>>
> >>>>>> Then let’s see the responsibility of each method:
> >>>>>>
> >>>>>> The hash table in the C++ STL is composed of hash buckets and a
> >>>>>> node list. The update of the node list is responsible for
> >>>>>> _M_erase and _M_extract_node method. _M_remove_bucket_begin
> >>>>>> method only needs to update the hash buckets. The update of
> >>>>>> _M_before_begin belongs to the update of the node list. So
> >>>>>> _M_remove_bucket_begin doesn’t need to update _M_before_begin.
> >>>>>>
> >>>>>>
> >>>>>> Existing tests listed below cover this change:
> >>>>>>
> >>>>>> 23_containers/unordered_set/allocator/copy.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/allocator/copy_assign.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/allocator/move.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/allocator/move_assign.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/allocator/swap.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/erase/1.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/erase/24061-set.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/modifiers/extract.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/operations/count.cc
> >>>>>>
> >>>>>> 23_containers/unordered_set/requirements/exception/basic.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/allocator/copy.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/allocator/copy_assign.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/allocator/move.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/allocator/move_assign.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/allocator/swap.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/erase/1.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/erase/24061-map.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/modifiers/extract.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/modifiers/move_assign.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/operations/count.cc
> >>>>>>
> >>>>>> 23_containers/unordered_map/requirements/exception/basic.cc
> >>>>>>
> >>>>>>
> >>>>>> Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?
> >>>>>>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> ChangeLog:
> >>>>>>
> >>>>>>
> >>>>>> libstdc++: hashtable: No need to update before begin node in
> >>>>>> _M_remove_bucket_begin
> >>>>>>
> >>>>>>
> >>>>>> 2024-01-16  Huanghui Nie  <nnnjkk@gmail.com>
> >>>>>>
> >>>>>>
> >>>>>> gcc/
> >>>>>>
> >>>>>> * libstdc++-v3/include/bits/hashtable.h
> >>>>>>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>>
> >>>>>> diff --git a/libstdc++-v3/include/bits/hashtable.h
> >>>>>> b/libstdc++-v3/include/bits/hashtable.h
> >>>>>>
> >>>>>> index b48610036fa..6056639e663 100644
> >>>>>>
> >>>>>> --- a/libstdc++-v3/include/bits/hashtable.h
> >>>>>>
> >>>>>> +++ b/libstdc++-v3/include/bits/hashtable.h
> >>>>>>
> >>>>>> @@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>>>>
> >>>>>>           if (!__next_n || __next_bkt != __bkt)
> >>>>>>
> >>>>>>             {
> >>>>>>
> >>>>>>               // Bucket is now empty
> >>>>>>
> >>>>>> -           // First update next bucket if any
> >>>>>>
> >>>>>> +           // Update next bucket if any
> >>>>>>
> >>>>>>               if (__next_n)
> >>>>>>
> >>>>>>                 _M_buckets[__next_bkt] = _M_buckets[__bkt];
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -           // Second update before begin node if necessary
> >>>>>>
> >>>>>> -           if (&_M_before_begin == _M_buckets[__bkt])
> >>>>>>
> >>>>>> -             _M_before_begin._M_nxt = __next_n;
> >>>>>>
> >>>>>>               _M_buckets[__bkt] = nullptr;
> >>>>>>
> >>>>>>             }
> >>>>>>
> >>>>>>          }
> >>>>>>
> >>>>>>
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin
  2024-01-24  0:25                 ` Jonathan Wakely
@ 2024-01-24  5:39                   ` François Dumont
  0 siblings, 0 replies; 13+ messages in thread
From: François Dumont @ 2024-01-24  5:39 UTC (permalink / raw)
  To: Jonathan Wakely, Huanghui Nie
  Cc: Théo Papadopoulo, Jonathan Wakely, libstdc++

[-- Attachment #1: Type: text/plain, Size: 11878 bytes --]

That's now in.

Thanks again for your contributions.

On 24/01/2024 01:25, Jonathan Wakely wrote:
>
>
> On Mon, 22 Jan 2024, 21:17 François Dumont, <frs.dumont@gmail.com> wrote:
>
>     Hi
>
>     Here is what I compiled from your proposals with this commit message:
>
>     Author: Huanghui Nie <nnnjkk@gmail.com>
>     Date:   Mon Jan 22 06:45:48 2024 +0100
>
>          libstdc++: [_Hashtable] Remove useless check for _M_before_begin
>
>          When removing the first node of a bucket it is useless to
>     check if
>     this bucket
>          is the one containing the _M_before_begin node. The bucket
>     before-begin node is
>          already transfered to the next pointed-to bucket regardeless
>     if it
>     is the container
>          before-begin node.
>
>          libstdc++-v3/ChangeLog:
>
>                  * include/bits/hashtable.h
>     (_Hahstable<>::_M_remove_bucket_begin): Remove
>                  _M_before_begin check and cleanup implementation.
>
>          Co-authored-by: Théo Papadopoulo <papadopoulo@gmail.com>
>
>     Let me know if ok to commit ?
>
>
> OK, thanks
>
>
>
>     François
>
>
>     On 22/01/2024 14:42, Théo Papadopoulo wrote:
>     > On 1/22/24 14:29, Jonathan Wakely wrote:
>     >> On Mon, 22 Jan 2024 at 13:22, Théo Papadopoulo
>     >> <papadopoulo@gmail.com> wrote:
>     >>> On 1/22/24 11:07, Jonathan Wakely wrote:
>     >>>> On Thu, 18 Jan 2024 at 15:59, Théo Papadopoulo
>     >>>> <papadopoulo@gmail.com> wrote:
>     >>>>> Wouldn't it be clearer if written as:
>     >>>>>
>     >>>>> if (!__next_n) {
>     >>>>>       _M_buckets[__bkt] = nullptr;
>     >>>>> } else if (__next_bkt != __bkt) {
>     >>>>>       _M_buckets[__next_bkt] = _M_buckets[__bkt];
>     >>>>>       _M_buckets[__bkt] = nullptr;
>     >>>>> }
>     >>>>>
>     >>>>> I think this is strictly equivalent and clearer.
>     >>>> But it does two branches, when the second one isn't necessary. If
>     >>>> __next_bkt == __bkt the assignment is harmless.
>     >>>>
>     >>>> I think it's much clearer the way Huanghui Nie originally
>     suggested.
>     >>> But there are two branches in the original code as well (the
>     ifs are
>     >>> just nested)...
>     >>> And they are sequential in both cases (with the !next_n on the
>     >>> faster path).
>     >>>    The condition are just a bit simpler in the proposed new form.
>     >>
>     >> Ah, yes, so you mean also replacing the earlier condition:
>     >>          if (!__next_n || __next_bkt != __bkt)
>     >>
>     >> I was only looking at the changed code not the earlier context.
>     >> /facepalm
>     >>
>     >> Yes, looking at the full context I like your suggestion.
>     >
>     > Similarly, with your response, I too understand better your
>     answer !!!
>     >
>     >
>     >>
>     >>> Anyway, as a (the) maintainer, your opinion is the most important
>     >>> and if
>     >>> you are more at ease
>     >>> with one form than the other, so be it.
>     >>>
>     >>>       Theo.
>     >>>
>     >>>
>     >>>
>     >>>>>       Theo.
>     >>>>>
>     >>>>>
>     >>>>> On 1/18/24 10:26, Huanghui Nie wrote:
>     >>>>>
>     >>>>> Yes, I have. I did a benchmark today.
>     >>>>>
>     >>>>> The conclusion is: the time consumption can be reduced by
>     0.4% ~
>     >>>>> 1.2% when unordered_set erase(begin()), and 1.2% ~ 2.4% when
>     >>>>> erase(begin(), end()).
>     >>>>>
>     >>>>>
>     >>>>> My test environment:
>     >>>>>
>     >>>>> CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365
>     MHz, 56 CPUs
>     >>>>>
>     >>>>> MEM: 256G
>     >>>>>
>     >>>>> OS: CentOS-8.2
>     >>>>>
>     >>>>> g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
>     >>>>>
>     >>>>> Compile flags: -O3 -std=c++17
>     >>>>>
>     >>>>>
>     >>>>> Test conclusion data (time taken to delete every 100 million
>     >>>>> elements):
>     >>>>>
>     >>>>> erase(begin()):
>     >>>>>
>     >>>>> |size of unordered_set   |100 |1,000   |10,000 |100,000
>     >>>>> |1,000,000|10,000,000|
>     >>>>>
>     >>>>> |base time consuming
>     >>>>> (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 |3854.168  |
>     >>>>>
>     >>>>> |test time consuming
>     >>>>> (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 |3808.137  |
>     >>>>>
>     >>>>> |Time-consuming reduction|1.16% |0.48%   |1.02% |0.77%
>     >>>>> |0.40%    |1.19%     |
>     >>>>>
>     >>>>> erase(begin(),end()):
>     >>>>>
>     >>>>> |size of unordered_set   |100 |1,000   |10,000 |100,000
>     >>>>> |1,000,000|10,000,000|
>     >>>>>
>     >>>>> |base time consuming
>     >>>>> (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 |2804.099  |
>     >>>>>
>     >>>>> |test time consuming
>     >>>>> (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 |2739.727  |
>     >>>>>
>     >>>>> |Time-consuming reduction|2.39% |1.52%   |1.56% |1.27%
>     >>>>> |1.54%    |2.30%     |
>     >>>>>
>     >>>>>
>     >>>>> Please see the attachment for test code and detailed test
>     result.
>     >>>>>
>     >>>>>
>     >>>>> 2024年1月18日(木) 4:04 François Dumont <frs.dumont@gmail.com>:
>     >>>>>> Hi
>     >>>>>>
>     >>>>>> Looks like a great finding to me, this is indeed a useless
>     check,
>     >>>>>> thanks!
>     >>>>>>
>     >>>>>> Have you any figures on the performance enhancement ? It might
>     >>>>>> help to get proper approval as gcc is currently in dev stage 4
>     >>>>>> that is to say only bug fixes normally.
>     >>>>>>
>     >>>>>> François
>     >>>>>>
>     >>>>>> On 17/01/2024 09:11, Huanghui Nie wrote:
>     >>>>>>
>     >>>>>> Hi.
>     >>>>>>
>     >>>>>> When I implemented a hash table with reference to the C++
>     STL, I
>     >>>>>> found that when the hash table in the C++ STL deletes
>     elements,
>     >>>>>> if the first element deleted is the begin element, the before
>     >>>>>> begin node is repeatedly assigned. This creates unnecessary
>     >>>>>> performance overhead.
>     >>>>>>
>     >>>>>>
>     >>>>>> First, let’s see the code implementation:
>     >>>>>>
>     >>>>>> In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned
>     >>>>>> when &_M_before_begin == _M_buckets[__bkt]. That also means
>     >>>>>> _M_buckets[__bkt]->_M_nxt is assigned under some conditions.
>     >>>>>>
>     >>>>>> _M_remove_bucket_begin is called by _M_erase and
>     _M_extract_node:
>     >>>>>>
>     >>>>>> Case _M_erase a range: _M_remove_bucket_begin is called in
>     a for
>     >>>>>> loop when __is_bucket_begin is true. And if
>     __is_bucket_begin is
>     >>>>>> true and &_M_before_begin == _M_buckets[__bkt], __prev_n
>     must be
>     >>>>>> &_M_before_begin. __prev_n->_M_nxt is always assigned in
>     >>>>>> _M_erase. That means _M_before_begin._M_nxt is always
>     assigned,
>     >>>>>> if _M_remove_bucket_begin is called and &_M_before_begin ==
>     >>>>>> _M_buckets[__bkt]. So there’s no need to assign
>     >>>>>> _M_before_begin._M_nxt in _M_remove_bucket_begin.
>     >>>>>> Other cases: _M_remove_bucket_begin is called when __prev_n ==
>     >>>>>> _M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in
>     >>>>>> _M_erase and _M_before_begin. That means
>     >>>>>> _M_buckets[__bkt]->_M_nxt is always assigned. So there's no
>     need
>     >>>>>> to assign _M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.
>     >>>>>>
>     >>>>>> In summary, there’s no need to check &_M_before_begin ==
>     >>>>>> _M_buckets[__bkt] and assign _M_before_begin._M_nxt in
>     >>>>>> _M_remove_bucket_begin.
>     >>>>>>
>     >>>>>>
>     >>>>>> Then let’s see the responsibility of each method:
>     >>>>>>
>     >>>>>> The hash table in the C++ STL is composed of hash buckets
>     and a
>     >>>>>> node list. The update of the node list is responsible for
>     >>>>>> _M_erase and _M_extract_node method. _M_remove_bucket_begin
>     >>>>>> method only needs to update the hash buckets. The update of
>     >>>>>> _M_before_begin belongs to the update of the node list. So
>     >>>>>> _M_remove_bucket_begin doesn’t need to update _M_before_begin.
>     >>>>>>
>     >>>>>>
>     >>>>>> Existing tests listed below cover this change:
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/allocator/copy.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/allocator/copy_assign.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/allocator/move.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/allocator/move_assign.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/allocator/swap.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/erase/1.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/erase/24061-set.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/modifiers/extract.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/operations/count.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_set/requirements/exception/basic.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/allocator/copy.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/allocator/copy_assign.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/allocator/move.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/allocator/move_assign.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/allocator/swap.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/erase/1.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/erase/24061-map.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/modifiers/extract.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/modifiers/move_assign.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/operations/count.cc
>     >>>>>>
>     >>>>>> 23_containers/unordered_map/requirements/exception/basic.cc
>     >>>>>>
>     >>>>>>
>     >>>>>> Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?
>     >>>>>>
>     >>>>>>
>     >>>>>> ---
>     >>>>>>
>     >>>>>> ChangeLog:
>     >>>>>>
>     >>>>>>
>     >>>>>> libstdc++: hashtable: No need to update before begin node in
>     >>>>>> _M_remove_bucket_begin
>     >>>>>>
>     >>>>>>
>     >>>>>> 2024-01-16  Huanghui Nie  <nnnjkk@gmail.com>
>     >>>>>>
>     >>>>>>
>     >>>>>> gcc/
>     >>>>>>
>     >>>>>> * libstdc++-v3/include/bits/hashtable.h
>     >>>>>>
>     >>>>>>
>     >>>>>> ---
>     >>>>>>
>     >>>>>>
>     >>>>>> diff --git a/libstdc++-v3/include/bits/hashtable.h
>     >>>>>> b/libstdc++-v3/include/bits/hashtable.h
>     >>>>>>
>     >>>>>> index b48610036fa..6056639e663 100644
>     >>>>>>
>     >>>>>> --- a/libstdc++-v3/include/bits/hashtable.h
>     >>>>>>
>     >>>>>> +++ b/libstdc++-v3/include/bits/hashtable.h
>     >>>>>>
>     >>>>>> @@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     >>>>>>
>     >>>>>>           if (!__next_n || __next_bkt != __bkt)
>     >>>>>>
>     >>>>>>             {
>     >>>>>>
>     >>>>>>               // Bucket is now empty
>     >>>>>>
>     >>>>>> -           // First update next bucket if any
>     >>>>>>
>     >>>>>> +           // Update next bucket if any
>     >>>>>>
>     >>>>>>               if (__next_n)
>     >>>>>>
>     >>>>>> _M_buckets[__next_bkt] = _M_buckets[__bkt];
>     >>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>> -           // Second update before begin node if necessary
>     >>>>>>
>     >>>>>> -           if (&_M_before_begin == _M_buckets[__bkt])
>     >>>>>>
>     >>>>>> - _M_before_begin._M_nxt = __next_n;
>     >>>>>>
>     >>>>>>               _M_buckets[__bkt] = nullptr;
>     >>>>>>
>     >>>>>>             }
>     >>>>>>
>     >>>>>>          }
>     >>>>>>
>     >>>>>>
>     >
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-01-24  5:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-17  8:11 [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin Huanghui Nie
2024-01-17 20:04 ` François Dumont
2024-01-18  9:26   ` Huanghui Nie
2024-01-18 15:58     ` Théo Papadopoulo
2024-01-22 10:07       ` Jonathan Wakely
2024-01-22 10:11         ` Jonathan Wakely
2024-01-22 13:22         ` Théo Papadopoulo
2024-01-22 13:29           ` Jonathan Wakely
2024-01-22 13:42             ` Théo Papadopoulo
2024-01-22 21:16               ` François Dumont
2024-01-24  0:25                 ` Jonathan Wakely
2024-01-24  5:39                   ` François Dumont
2024-01-22  6:14     ` François Dumont

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).