From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 986A33954C07; Wed,  6 May 2020 21:04:19 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 986A33954C07
From: "carlos at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug libc/25924] Very poor choice of hash function in hsearch
Date: Wed, 06 May 2020 21:04:19 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: libc
X-Bugzilla-Version: 2.30
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: carlos at redhat dot com
X-Bugzilla-Status: SUSPENDED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-25924-131-1h5rArMqO8@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-25924-131@http.sourceware.org/bugzilla/>
References: <bug-25924-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs mailing list <glibc-bugs.sourceware.org>
List-Unsubscribe: <http://sourceware.org/mailman/options/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/glibc-bugs/>
List-Help: <mailto:glibc-bugs-request@sourceware.org?subject=help>
List-Subscribe: <http://sourceware.org/mailman/listinfo/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 06 May 2020 21:04:19 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=3D25924
--- Comment #5 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Witold Baryluk from comment #3)
> (1) Proof of performance for various architectures, or work with the
> architecture maintainers to get them to run microbenchmarks for you.
>=20
> I am not sure what you refer to, but the performance (number of colissions
> in the hash table for a given set of keys and order) of this hash function
> is architecture independent.

Yes, number of collisions is an abstract measure of performance for the
algorithm.

To make performance changes to glibc we add microbenchmarks that model spec=
ific
workloads. We would need to add glibc/benchtests/* microbenchmarks that mea=
sure
the performance of the APIs in question with specific workloads.

Alternatively real performance numbers across a variety of architectures.

> Re:
> (2)
>=20
> As of the licensing, the only option is to use existing hash function
> (developing a new hash function is not easy, and usually ends up bad), wh=
ich
> implementation would definitively be already licensed and owned by somebo=
dy
> else, so copyright assignment to FSF is simply impossible.=20

It is not impossible. As you note, we have to reach out to the authors and =
ask
if they will contribute the copyright of the code to the FSF and glibc.

> So I don't know
> how would you go from there. All hash functions developed and published a=
re
> usually published under permissive non-copyleft licenses.
>=20
> AFAIK you can integrate MIT code into GPL / LGPL code base, without asking
> the copyright owners, as long as the used code is marked with the license
> text. It states so directly in the license text.

Integration of code and the licenses is not the problem.

Please see:
"Why the FSF gets copyright assignments from contributors"
https://www.gnu.org/licenses/why-assign.en.html

We have integrated some non-LGPL code into glibc, but we would have to have=
 a
very strong reason to do so again.

In this case I would need to see a more exhaustive search of hash functions=
 and
which licenses they are under, and which authors would be interested in
collaborating with glibc.

> Re:
> (3) Tests for the changes. Systems level work requires good testing.
>=20
> Are you implying glibc doesn't have tests and test infrastructure at the
> moment? Fix for this bug doesn't introduce any public APIs.

We have only 3 hsearch tests to verify the correct operation of hsearch.

We have empirical data from hours of operation that the *current*
implementation appears to be correct (but not as fast).

In order to change an implementation that hours of operation and few tests,
would require we increase the number of tests to validate the new
implementation.

Thus we would need more hsearch tests added.

Even if you can't provide a new hsearch implementation, it would be great to
have someone add more hsearch tests, and an hsearch microbenchmark. This wo=
uld
go a long way to preparing the project to drop in a new hash function for
hsearch.

--=20
You are receiving this mail because:
You are on the CC list for the bug.=