From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-517200-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 58688 invoked by alias); 10 Jan 2020 22:22:04 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 58671 invoked by uid 89); 10 Jan 2020 22:22:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-5.2 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy=
X-HELO: us-smtp-1.mimecast.com
Received: from us-smtp-delivery-1.mimecast.com (HELO us-smtp-1.mimecast.com) (207.211.31.120) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 10 Jan 2020 22:22:02 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;	s=mimecast20190719; t=1578694921;	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:	 content-transfer-encoding:content-transfer-encoding:	 in-reply-to:in-reply-to:references:references;	bh=30zmPi4zbSPKMMCEwgij6aKaOQaZZVs80QYsRsaaShw=;	b=IlvUtF7VljsWw0Q0ml+43+2/rQaf7oQnelFWgq+4sObuOyW0Mr+r+I8w8cOzJnQ1+acFHP	mhWwl7FaRX4zJWkTn4q3OO+Ffz/oIiovSI7UJw35mgKCuPYxDX55b1f3JRmst/HgwBi76K	KAZD4xiHUsXjEZHWrRvV2mGJ1RMtB2Y=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-11-jx5IhO08Phql74tcIFUCpw-1; Fri, 10 Jan 2020 17:21:57 -0500
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))	(No client certificate requested)	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A031A1800D78;	Fri, 10 Jan 2020 22:21:56 +0000 (UTC)
Received: from localhost (unknown [10.33.36.12])	by smtp.corp.redhat.com (Postfix) with ESMTP id 15EA55C54A;	Fri, 10 Jan 2020 22:21:55 +0000 (UTC)
Date: Fri, 10 Jan 2020 22:31:00 -0000
From: Jonathan Wakely <jwakely@redhat.com>
To: =?iso-8859-1?Q?Fran=E7ois?= Dumont <frs.dumont@gmail.com>
Cc: "libstdc++@gcc.gnu.org" <libstdc++@gcc.gnu.org>,	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++/91223 Improve unordered containers == operator
Message-ID: <20200110222155.GG490107@redhat.com>
References: <25edbd87-809b-9c75-48a0-294852ea40f2@gmail.com> <20200110220128.GF490107@redhat.com>
MIME-Version: 1.0
In-Reply-To: <20200110220128.GF490107@redhat.com>
X-Clacks-Overhead: GNU Terry Pratchett
X-Mimecast-Spam-Score: 0
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
X-SW-Source: 2020-01/txt/msg00645.txt.bz2

On 10/01/20 22:01 +0000, Jonathan Wakely wrote:
>On 10/01/20 18:54 +0100, Fran=E7ois Dumont wrote:
>>Hi
>>
>>=A0=A0=A0 Here is my attempt to improve =3D=3D operator.
>>
>>=A0=A0=A0 There is a small optimization for the std::unordered_mutiXXX=20
>>containers but the main enhancement rely on some partial template=20
>>specialization of the _Equality type. I limit it to usage of=20
>>unordered containers with std::equal_to to be sure that the=20
>>container _Equal functor is like the key type =3D=3D.
>
>I think we can assume that for any _Equal, not just std::equal_to:
>http://eel.is/c++draft/unord.req#12.sentence-5
>
>However ...
>
>>=A0=A0=A0 Do I need to also consider user partial template specialization=
=20
>>of std::equal_to ? It is a well know bad practice so I hope the=20
>>Standard says that such a partial specialization leads to undefined=20
>>behavior.
>
>It's certainly not undefined to specialize equal_to, and I'm not sure
>how to make your optimisations valid in that case.
>
>Consider:
>
>struct X
>{
>  int i;
>  int rounded() const { return i - (i % 10); }
>  bool operator=3D=3D(X x) const { return i =3D=3D x.i; }
>};
>
>template<> struct std::equal_to<X>
>{
>  bool operator()(X l, X r) const
>  { return l.rounded() =3D=3D r.rounded(); }
>};
>
>template<> std::hash<X>
>{
>  bool operator()(X x) const { return hash<int>()(x.rounded()); }
>};
>
>std::unordered_multiset<X> u{ X{10}, X{11}, X{12} };
>std::unordered_multiset<X> v{ X{15}, X{16}, X{17} };
>bool b1 =3D u =3D=3D v;
>bool b2 =3D std::is_permutation(u.begin(), u.end(), v.begin());
>assert(b1 =3D=3D b2);
>
>I think the last new specialization in your patch would be used for
>this case, and because __x_count =3D=3D v.count(*u.begin()) it will say
>they're equal. But the is_permutation call says they're not.
>
>So I think the assertion would fail, but the standard says it should
>pass. Am I mistaken?

I believe the optimization would still be valid if you do not use
__other.count(*__itx) to check for equivalent keys in the other
container. Your patch does:

+	  if (__x_count !=3D __other.count(*__itx))
+	    return false;

This uses the _Equal predicate to count the equivalent elements in
__other. Instead you need to use operator=3D=3D to count the **equal**
elements.

I think there's a similar problem in the _Equality specialization for
unordered_map (i.e. key-value pairs, unique keys):

+	  const auto __ity =3D __other.find(__itx->first);
+	  if (__ity =3D=3D __other.end() || !bool(__ity->second =3D=3D __itx->sec=
ond))
+	    return false;

The call to __other.find(__itx->first) will return an element with
equivalent key, but that's not guaranteed to be equal. I think you
could fix this either by still using =3D=3D to compare the keys after
__other.find(*__itx) returns an element (which doesn't fix the PR91263
bug) or by replacing find with a similar operation that looks up the
hash code and then uses =3D=3D to test for equality (instead of using
_Equal pred to test for equivalent keys).

Basically, you can't use functions like find and count that rely on
equivalence of keys, you need to use handwritten lookups using =3D=3D.

And if you do that, then it doesn't matter whether _Equal is a
specialization of std::equal_to or not, and it doesn't matter whether
the user has defined their own specialization of std::equal_to. You
can do the optimizations for any _Equal, because you won't actually be
using it to test for equality.

Does that make sense?