public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/115285] New: std::unordered_set can have duplicate values
@ 2024-05-30  1:36 bugs at phlexo dot com
  2024-05-30  1:40 ` [Bug libstdc++/115285] " bugs at phlexo dot com
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: bugs at phlexo dot com @ 2024-05-30  1:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

            Bug ID: 115285
           Summary: std::unordered_set can have duplicate values
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugs at phlexo dot com
  Target Milestone: ---

When a derived class, which uses its base class' hasher and performs a
conversion during the base class constructor call, is used as the value_type of
a std::unordered_set object, the std::unordered_set will contain duplicates if
it's constructed from iterators of a container holding base class objects that
result in duplicate values after the conversion.

Example:

#include <string>
#include <vector>
#include <unordered_set>
#include <cassert>

class TrimmedStr : public std::string {
        static std::string trim_str(std::string const &str) {
                auto start = str.find_first_not_of(" \r\n\t");

                return start == std::string::npos
                        ? str
                        : str.substr(start, str.find_last_not_of(" \r\n\t") -
start + 1);
        }

        public:
        TrimmedStr(std::string const &arg) : std::string{trim_str(arg)} {}
        TrimmedStr(char const *arg) : TrimmedStr{std::string{arg}} {}
};

namespace std {
        template<> struct hash<TrimmedStr> : public std::hash<std::string> {};
}

int main() {
        assert(TrimmedStr{"foo "} == std::string{"foo"});
        assert(TrimmedStr{" foo"} == std::string{"foo"});
        assert(TrimmedStr{" foo "} == std::string{"foo"});

        std::unordered_set<TrimmedStr> set_from_initializer_list{"foo", "bar",
" foo ", " bar "};
        assert(set_from_initializer_list.size() == 2); //Passes.

        std::vector<std::string> args{"foo", "bar", " foo ", " bar "};
        std::unordered_set<TrimmedStr> set_from_iterators{args.begin(),
args.end()};
        assert(set_from_iterators.size() == 2); //Fails.
}

The final assertion in this code fails with libstdc++ v13 and above, but works
as expected with libstdc++, and libstdc++ < v13.

Bisecting suggests this was introduced in commit
dc9b92facf87a6f2d8b0e5d5fc404f30c3b15a74.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] std::unordered_set can have duplicate values
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
@ 2024-05-30  1:40 ` bugs at phlexo dot com
  2024-05-30  1:43 ` [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114 pinskia at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugs at phlexo dot com @ 2024-05-30  1:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

Phil <bugs at phlexo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugs at phlexo dot com

--- Comment #1 from Phil <bugs at phlexo dot com> ---
Apologies, the comment at the bottom should been it works as expected with
libc++, and libstdc++ < v13.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
  2024-05-30  1:40 ` [Bug libstdc++/115285] " bugs at phlexo dot com
@ 2024-05-30  1:43 ` pinskia at gcc dot gnu.org
  2024-05-30  1:46 ` pinskia at gcc dot gnu.org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.4
            Summary|std::unordered_set can have |[13/14/15 Regression]
                   |duplicate values            |std::unordered_set can have
                   |                            |duplicate values since
                   |                            |r13-1114

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
r13-1114-gdc9b92facf87a6

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
  2024-05-30  1:40 ` [Bug libstdc++/115285] " bugs at phlexo dot com
  2024-05-30  1:43 ` [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114 pinskia at gcc dot gnu.org
@ 2024-05-30  1:46 ` pinskia at gcc dot gnu.org
  2024-05-30  1:49 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks to be a defect:
https://cplusplus.github.io/LWG/issue2844

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (2 preceding siblings ...)
  2024-05-30  1:46 ` pinskia at gcc dot gnu.org
@ 2024-05-30  1:49 ` pinskia at gcc dot gnu.org
  2024-05-30  1:51 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Though I think the standard says only one is entered rather than both ... But
not which one.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (3 preceding siblings ...)
  2024-05-30  1:49 ` pinskia at gcc dot gnu.org
@ 2024-05-30  1:51 ` pinskia at gcc dot gnu.org
  2024-05-30  1:52 ` [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate value pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 58312
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58312&action=edit
Testcase which fails for GCC 12+ (rather than 13+)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (4 preceding siblings ...)
  2024-05-30  1:51 ` pinskia at gcc dot gnu.org
@ 2024-05-30  1:52 ` pinskia at gcc dot gnu.org
  2024-05-30  1:52 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
   Target Milestone|13.4                        |12.4
            Summary|[13/14/15 Regression]       |[13/14/15 Regression]
                   |std::unordered_set can have |std::unordered_set can have
                   |duplicate values since      |duplicate value
                   |r13-1114                    |

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> r13-1114-gdc9b92facf87a6

Looks like this just exposed the issue with .insert. See my new testcase which
fails for GCC 12+ .

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (5 preceding siblings ...)
  2024-05-30  1:52 ` [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate value pinskia at gcc dot gnu.org
@ 2024-05-30  1:52 ` pinskia at gcc dot gnu.org
  2024-05-30  1:55 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-05-30
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (6 preceding siblings ...)
  2024-05-30  1:52 ` pinskia at gcc dot gnu.org
@ 2024-05-30  1:55 ` pinskia at gcc dot gnu.org
  2024-05-30 10:09 ` [Bug libstdc++/115285] [12/13/14/15 " redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-05-30  1:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=96088

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I am suspecting it was
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2c43f5ec9db163696de8691eb529df06c4999bcc
which caused the issue.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [12/13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (7 preceding siblings ...)
  2024-05-30  1:55 ` pinskia at gcc dot gnu.org
@ 2024-05-30 10:09 ` redi at gcc dot gnu.org
  2024-05-30 10:24 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2024-05-30 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Or r12-6272-ge3ef832a9e8d6a

I think the testcase is invalid. The unordered_set<TrimmedStr> uses
std::equal_to<TrimmedStr> to decide if a key already exists in the container,
and that just uses operator== which is not defined for TrimmedStr, so it uses
the equality comparison for std::string. When that is used to compare
TrimmedStr("foo") == "foo "s it's equivalent to "foo"s == "foo "s which is
obviously false.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [12/13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (8 preceding siblings ...)
  2024-05-30 10:09 ` [Bug libstdc++/115285] [12/13/14/15 " redi at gcc dot gnu.org
@ 2024-05-30 10:24 ` redi at gcc dot gnu.org
  2024-05-30 10:40 ` redi at gcc dot gnu.org
  2024-06-20  9:15 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2024-05-30 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

--- Comment #9 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #7)
> I am suspecting it was
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;
> h=2c43f5ec9db163696de8691eb529df06c4999bcc which caused the issue.

Ah no, I think you're right about this.

The problem is not std::equal_to, but in std::hash<TrimmedStr>, because it
takes an argument of type std::string not TrimmedStr.

This means that hashing "foo "s does not convert to a TrimmedStr first, so it
hashes the un-trimmed string.

This fixes the code:

template<> struct hash<TrimmedStr> {
  bool operator()(const TrimmedStr& s) const noexcept
  { return std::hash<std::string>{}(s); }
};

I'm not sure if the code is valid or not. It certainly seems like a problem
that a plain std::string can be compared to a TrimmedStr using
std::equal_to<TrimmedStr> *and* can be hashed using std::hash<TrimmedStr>, when
a plain std::string is *not* a TrimmedStr.

Using inheritance this way is a bad idea, certainly for the std::hash
specialization. IMHO also for the TrimmedStr itself.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [12/13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (9 preceding siblings ...)
  2024-05-30 10:24 ` redi at gcc dot gnu.org
@ 2024-05-30 10:40 ` redi at gcc dot gnu.org
  2024-06-20  9:15 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: redi at gcc dot gnu.org @ 2024-05-30 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
When deciding whether a key already exists in the hash set we use these
overloads:

      template<typename _Kt>
        static __conditional_t<
          __and_<__is_nothrow_invocable<_Hash&, const key_type&>,
                 __not_<__is_nothrow_invocable<_Hash&, _Kt>>>::value,
          key_type, _Kt&&>
        _S_forward_key(_Kt&& __k)
        { return std::forward<_Kt>(__k); }

      static const key_type&
      _S_forward_key(const key_type& __k)
      { return __k; }


If the value being inserted, __k, can be hashed directly using the hash
function then we return it unchanged, and then hash it. If it can't be hashed
directly, we convert it to the container's key_type, which is TrimmedStr.

In this case, a std::string can be hash directly without constructing a
TrimmedStr (because of the questionable std::hash specialization using
inheritance). So we don't convert it to TrimmedStr and just hash it directly.
Hashing "foo"s and "foo "s give different hash values, so we do not consider
them to be equivalent keys.

While I think the code is highly questionable, the standard does say that
inserting [first, last) is equivalent to insert(*first) for each iterator, and
that would force an implicit conversion to value_type.

So I'm not sure if the optimization to avoid temporaries (PR 96088) is actually
allowed by the standard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug libstdc++/115285] [12/13/14/15 Regression] std::unordered_set can have duplicate value
  2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
                   ` (10 preceding siblings ...)
  2024-05-30 10:40 ` redi at gcc dot gnu.org
@ 2024-06-20  9:15 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-20  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115285

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.4                        |12.5

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.4 is being released, retargeting bugs to GCC 12.5.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-06-20  9:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-30  1:36 [Bug libstdc++/115285] New: std::unordered_set can have duplicate values bugs at phlexo dot com
2024-05-30  1:40 ` [Bug libstdc++/115285] " bugs at phlexo dot com
2024-05-30  1:43 ` [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate values since r13-1114 pinskia at gcc dot gnu.org
2024-05-30  1:46 ` pinskia at gcc dot gnu.org
2024-05-30  1:49 ` pinskia at gcc dot gnu.org
2024-05-30  1:51 ` pinskia at gcc dot gnu.org
2024-05-30  1:52 ` [Bug libstdc++/115285] [13/14/15 Regression] std::unordered_set can have duplicate value pinskia at gcc dot gnu.org
2024-05-30  1:52 ` pinskia at gcc dot gnu.org
2024-05-30  1:55 ` pinskia at gcc dot gnu.org
2024-05-30 10:09 ` [Bug libstdc++/115285] [12/13/14/15 " redi at gcc dot gnu.org
2024-05-30 10:24 ` redi at gcc dot gnu.org
2024-05-30 10:40 ` redi at gcc dot gnu.org
2024-06-20  9:15 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).