From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 986A33954C07; Wed, 6 May 2020 21:04:19 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 986A33954C07 From: "carlos at redhat dot com" To: glibc-bugs@sourceware.org Subject: [Bug libc/25924] Very poor choice of hash function in hsearch Date: Wed, 06 May 2020 21:04:19 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: glibc X-Bugzilla-Component: libc X-Bugzilla-Version: 2.30 X-Bugzilla-Keywords: X-Bugzilla-Severity: enhancement X-Bugzilla-Who: carlos at redhat dot com X-Bugzilla-Status: SUSPENDED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: glibc-bugs@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Glibc-bugs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 May 2020 21:04:19 -0000 https://sourceware.org/bugzilla/show_bug.cgi?id=3D25924 --- Comment #5 from Carlos O'Donell --- (In reply to Witold Baryluk from comment #3) > (1) Proof of performance for various architectures, or work with the > architecture maintainers to get them to run microbenchmarks for you. >=20 > I am not sure what you refer to, but the performance (number of colissions > in the hash table for a given set of keys and order) of this hash function > is architecture independent. Yes, number of collisions is an abstract measure of performance for the algorithm. To make performance changes to glibc we add microbenchmarks that model spec= ific workloads. We would need to add glibc/benchtests/* microbenchmarks that mea= sure the performance of the APIs in question with specific workloads. Alternatively real performance numbers across a variety of architectures. > Re: > (2) >=20 > As of the licensing, the only option is to use existing hash function > (developing a new hash function is not easy, and usually ends up bad), wh= ich > implementation would definitively be already licensed and owned by somebo= dy > else, so copyright assignment to FSF is simply impossible.=20 It is not impossible. As you note, we have to reach out to the authors and = ask if they will contribute the copyright of the code to the FSF and glibc. > So I don't know > how would you go from there. All hash functions developed and published a= re > usually published under permissive non-copyleft licenses. >=20 > AFAIK you can integrate MIT code into GPL / LGPL code base, without asking > the copyright owners, as long as the used code is marked with the license > text. It states so directly in the license text. Integration of code and the licenses is not the problem. Please see: "Why the FSF gets copyright assignments from contributors" https://www.gnu.org/licenses/why-assign.en.html We have integrated some non-LGPL code into glibc, but we would have to have= a very strong reason to do so again. In this case I would need to see a more exhaustive search of hash functions= and which licenses they are under, and which authors would be interested in collaborating with glibc. > Re: > (3) Tests for the changes. Systems level work requires good testing. >=20 > Are you implying glibc doesn't have tests and test infrastructure at the > moment? Fix for this bug doesn't introduce any public APIs. We have only 3 hsearch tests to verify the correct operation of hsearch. We have empirical data from hours of operation that the *current* implementation appears to be correct (but not as fast). In order to change an implementation that hours of operation and few tests, would require we increase the number of tests to validate the new implementation. Thus we would need more hsearch tests added. Even if you can't provide a new hsearch implementation, it would be great to have someone add more hsearch tests, and an hsearch microbenchmark. This wo= uld go a long way to preparing the project to drop in a new hash function for hsearch. --=20 You are receiving this mail because: You are on the CC list for the bug.=