From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by sourceware.org (Postfix) with ESMTPS id E082E385F01D for ; Thu, 19 Mar 2020 10:21:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org E082E385F01D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=seketeli.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=dodji@seketeli.org X-Originating-IP: 91.166.131.130 Received: from localhost (91-166-131-130.subs.proxad.net [91.166.131.130]) (Authenticated sender: dodji@seketeli.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id BBBD41C0010; Thu, 19 Mar 2020 10:21:00 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id E824E581890; Thu, 19 Mar 2020 11:20:59 +0100 (CET) From: Dodji Seketeli To: Giuliano Procida Cc: libabigail@sourceware.org, kernel-team@android.com Subject: Re: [PATCH v2] dwarf-reader: Use all bits of Bloom filter words. Organization: Me, myself and I References: <20200318113754.GI211970@google.com> <20200318121241.146259-1-gprocida@google.com> X-Operating-System: Fedora 33 X-URL: http://www.seketeli.net/~dodji Date: Thu, 19 Mar 2020 11:20:59 +0100 In-Reply-To: <20200318121241.146259-1-gprocida@google.com> (Giuliano Procida's message of "Wed, 18 Mar 2020 12:12:41 +0000") Message-ID: <877dzgtzlg.fsf@seketeli.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Status: No, score=-25.2 required=5.0 tests=GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libabigail mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Mar 2020 10:21:03 -0000 --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hello Giuliano, Giuliano Procida a =C3=A9crit: [...] > diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc > index 3454fcf5..5556bde5 100644 > --- a/src/abg-dwarf-reader.cc > +++ b/src/abg-dwarf-reader.cc > @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment* = env, > // filter, in bits. > int c =3D get_elf_class_size_in_bytes(elf_handle) * 8; > int n =3D (h1 / c) % ht.bf_nwords; > - unsigned char bitmask =3D (1ul << (h1 % c)) | (1ul << (h2 % c)); > + GElf_Word bitmask =3D (1ul << (h1 % c)) | (1ul << (h2 % c)); Good catch! I wonder what made me chose a byte-wide bitmask to begin with. The docmentation of the bloom filter of the GNU Hash table at https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2 says: "The filter consists of maskwords words, each of which is 32-bits (ELFCLASS32) or 64-bits (ELFCLASS64) depending on the class of object" So it's clear, the bitmask must be of type GElf_Word. Have you seen any practical speed increase in the loading of ELF binaries due to this change? [...] > Most of the bit values used for GNU hash ELF section Bloom filtering > were being ignored due to integer narrowing, reducing missing symbol > filtering efficiency considerably. > > This patch fixes this. > > * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab): > Don't narrow calculated Bloom word to 8 bits before using it > to mask the fetched Bloom word. I have amended the patch to have the ChangeLog section above come last in the commit log message. It really needs to come last because we have a script that build a ChangeLog file from all the change log parts of the commit log and that one is shipped in the released tarball, making it compliant with the GNU tarball standards. That way, users who consume the tarball can access some history without needing to have (internet) access to the full Git history. [...] > Note on testing. > > The .gnu.hash section seems to be present in all the .so but none of > the .o test files. abisym doesn't appear to find dynamic symbols (nm > -D), only normal ones, so it was a little tricky to test this. > > I found a Debian .so (libpthread) with both the .gnu.hash section and > normal symbols. abisym behaves identically with my change, looking up > lots of present and non-present (as far as it's concerned) symbols. I > just extracted a full list with nm/sed and looked up each one. > > 389 symbols looked up, 241 present, 148 absent > 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency) > 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency) Thanks. So I am applying this to master. Below is the actual patch that got applied. Cheers, --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-dwarf-reader-Use-all-bits-of-Bloom-filter-words.patch Content-Description: applied patch >From e0950e64279d59e92de5170d805be99f84cdfd8d Mon Sep 17 00:00:00 2001 From: Giuliano Procida Date: Wed, 18 Mar 2020 12:12:41 +0000 Subject: [PATCH] dwarf-reader: Use all bits of Bloom filter words. Most of the bit values used for GNU hash ELF section Bloom filtering were being ignored due to integer narrowing, reducing missing symbol filtering efficiency considerably. This patch fixes this. Note on testing. The .gnu.hash section seems to be present in all the .so but none of the .o test files. abisym doesn't appear to find dynamic symbols (nm -D), only normal ones, so it was a little tricky to test this. I found a Debian .so (libpthread) with both the .gnu.hash section and normal symbols. abisym behaves identically with my change, looking up lots of present and non-present (as far as it's concerned) symbols. I just extracted a full list with nm/sed and looked up each one. 389 symbols looked up, 241 present, 148 absent 8-bit filter: 336 maybe, 53 no (53/148 filtering efficiency) 64-bit filter: 255 maybe, 134 no (134/148 filtering efficiency) * src/abg-dwarf-reader.cc (lookup_symbol_from_gnu_hash_tab): Don't narrow calculated Bloom word to 8 bits before using it to mask the fetched Bloom word. Signed-off-by: Giuliano Procida Signed-off-by: Dodji Seketeli --- src/abg-dwarf-reader.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc index ff532cfd..5bc8a157 100644 --- a/src/abg-dwarf-reader.cc +++ b/src/abg-dwarf-reader.cc @@ -2025,7 +2025,7 @@ lookup_symbol_from_gnu_hash_tab(const environment* env, // filter, in bits. int c = get_elf_class_size_in_bytes(elf_handle) * 8; int n = (h1 / c) % ht.bf_nwords; - unsigned char bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); + GElf_Word bitmask = (1ul << (h1 % c)) | (1ul << (h2 % c)); // Test if the symbol is *NOT* present in this ELF file. if ((bloom_word_at(elf_handle, ht.bloom_filter, n) & bitmask) != bitmask) -- 2.25.0 --=-=-= Content-Type: text/plain -- Dodji --=-=-=--