From: Florian Weimer
To: Vivek Das Mohapatra
Cc: Mark Wielaard, GNU gABI gnu-gabi
Subject: Re: ABI document
Date: Mon, 21 Sep 2020 15:03:57 +0200
In-Reply-To: ("Vivek Das Mohapatra"'s message of "Tue, 8 Sep 2020 13:09:47 +0100 (BST)")
Message-ID: <87y2l31dtu.fsf@oldenburg2.str.redhat.com>

* Vivek Das Mohapatra:

> + The GNU hash of a symbol is computed as follows:
> +   - extract the NAME of the symbol
> +     - examples: 'foo@version-info' becomes 'foo'; 'bar' remains 'bar'
> +   - unsigned long h ← 5381
> +   - for each unsigned character C in NAME, starting at position 0:
> +     - h ← (h << 5) + h + C;
> +   - HASH ← h & 0xffffffff // 32 bit value

Since the version is stored separately anyway, just say that the hash
covers the name only?

Maybe write this?

  h ← 33 * h + C

And just use uint32_t to express the truncation?

> +
> + Hash Table contents:
> +
> + bitmask-bits is a power of 2.
> + It is at least 32 (on 32 bit); at least 64 on 64 bit architectures.
> + There are other restrictions, see elflink.c in the binutils-gdb/bfd source.
> +
> + The bucket in which a symbol's hash entry is found is:
> +
> +   gnu-hash( symbol-name ) % nbuckets
> +
> + The table is divided into 4 parts:
> + ----------------------------------------------------------------------------
> + Part 1 (metadata):
> +
> + - nbuckets : 4 byte native integer. Number of buckets
> +              A bucket occupies 32 bits.
> +
> + - symoffset : 4 byte native integer.
> +               Starting index of first "real" symbol in the ".dynsym"
> +               section, see below.
> +
> + - bitmask-words: 4 byte native integer.
> +                  The number of ELFCLASS words in part 2 of the table.
> +                  On 64-bit architectures: bitmask-bits / 64
> +                  And on 32-bit ones : bitmask-bits / 32
> +
> + - bloom-shift : 4 byte native integer.
> +                 The shift-count used in the bloom filter.
> +
> + symoffset:
> +   There are synthetic symbols - one for each section in the linker output.
> +   symoffset gives the number of such synthetic symbols ( which cannot be
> +   looked up via the GNU hash section described here ).
> +
> + NB: symbols that _can_ be looked up via the GNU hash must be stored in
> +   the ".dynsym" section in ascending order of bucket.
> +   That is, the ordering is determined by:
> +
> +     gnu-hash( symbol-name ) % nbuckets
> +
> + ----------------------------------------------------------------------------
> + Part 2 (the bloom filter bitmask):
> +
> + - bloom : ElfW(Addr)[ bitmask-words ]
> +
> + For each symbol [name] S the following is carried out:
> +   - C ← __ELF_NATIVE_CLASS /* ie 32 on ELF32, 64 on ELF64 */
> +   - H ← gnu-hash( S )
> +   - BWORD ← (H / C) & (bitmask-words - 1)
> +   - in bloom[ BWORD ] set:
> +     - bit H & (C - 1)
> +     - bit (H >> bloom-shift) & (C - 1)

Maybe say that the link editor does this?  The description looks
correct.

> + For each symbol [name] S:
> +
> +   - CHASH ← gnu-hash( S )
> +   - BUCKET ← CHASH % nbuckets
> +   - CINDEX ← position of the symbol _within_ its bucket
> +     0 for the first symbol, 1 for the second and so forth

I don't understand the “within its bucket” comment.  I think CINDEX
increases sequentially among the symbols where CHASH % nbuckets
collides.  Is that it?

> +   - if this is the last symbol in the bucket:
> +     - CHASH ← CHASH | 1 /* set the least bit */
> +   - else
> +     - CHASH ← CHASH & ~1 /* unset the least bit */
> +   - BYTE-OFFSET ← (bucket[BUCKET] + CINDEX - symoffset) * 4
> +   - CHAIN-ADDR ← ((char *)&bucket[nbuckets]) + BYTE-OFFSET
> +   - *(ElfW(Word) *)(CHAIN-ADDR) ← CHASH

Kind of an odd way of writing this, but I think it's correct.
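
For reference, here is a minimal C sketch of the hash and bloom-filter
steps quoted above.  It assumes ELFCLASS64 (64-bit bloom words) and a
bloom-shift below 32.  The names gnu_hash, bloom_mask, bloom_add and
bloom_maybe_contains are hypothetical, chosen for illustration only;
none of this is taken from binutils or glibc.  Since
(h << 5) + h = 32h + h, the hash update is written as a multiplication
by 33, and uint32_t provides the truncation to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define BLOOM_WORD_BITS 64u   /* assumption: ELFCLASS64 */

/* GNU hash of a symbol name (name only, without any "@version" suffix).
   uint32_t arithmetic wraps modulo 2^32, so no explicit mask is needed. */
static uint32_t gnu_hash(const char *name)
{
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        h = h * 33u + *p;     /* same value as (h << 5) + h + *p */
    return h;
}

/* The two bits a hash occupies inside its bloom word. */
static uint64_t bloom_mask(uint32_t h, uint32_t bloom_shift)
{
    return ((uint64_t)1 << (h & (BLOOM_WORD_BITS - 1)))
         | ((uint64_t)1 << ((h >> bloom_shift) & (BLOOM_WORD_BITS - 1)));
}

/* Link-editor side: record hash h in the filter.
   bitmask_words must be a power of 2, as in the quoted description. */
static void bloom_add(uint64_t *bloom, uint32_t bitmask_words,
                      uint32_t bloom_shift, uint32_t h)
{
    assert(bitmask_words != 0 && (bitmask_words & (bitmask_words - 1)) == 0);
    assert(bloom_shift < 32);
    bloom[(h / BLOOM_WORD_BITS) & (bitmask_words - 1)] |= bloom_mask(h, bloom_shift);
}

/* Lookup side: returns 0 if h is definitely absent, nonzero if it may
   be present (bloom filters allow false positives). */
static int bloom_maybe_contains(const uint64_t *bloom, uint32_t bitmask_words,
                                uint32_t bloom_shift, uint32_t h)
{
    uint64_t mask = bloom_mask(h, bloom_shift);
    return (bloom[(h / BLOOM_WORD_BITS) & (bitmask_words - 1)] & mask) == mask;
}
```

With these definitions, gnu_hash("") is 5381 and gnu_hash("exit") is
0x7c967e3f.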

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill