From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-1.mimecast.com (us-smtp-2.mimecast.com [205.139.110.61]) by sourceware.org (Postfix) with ESMTP id D70A93861038 for ; Thu, 10 Sep 2020 14:27:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org D70A93861038 Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-432-RHYQuw1_Meq0LpMSzgPAAQ-1; Thu, 10 Sep 2020 10:27:50 -0400 X-MC-Unique: RHYQuw1_Meq0LpMSzgPAAQ-1 Received: by mail-wr1-f72.google.com with SMTP id y3so2311838wrl.21 for ; Thu, 10 Sep 2020 07:27:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:organization:references:date :in-reply-to:message-id:user-agent:mime-version :content-transfer-encoding; bh=wIaTeFZylsHJefywjHcpOfIIWqOJ2arGwqvuanqJROY=; b=Aynwpk4/5VTDNm6w4ETIh7ArL48EgD2tgMeTjLcMhyPeWUFyv9MGphPu4eaQzVWiML VRnaHr949OUOqJkw+HM5PeZbcQCXkFRiHZdC59/JUOb9yFpKEj4fLVq4Jvf1cdEreLXb gthd5X3HbrK20K7AVFgKKCJnhS//4yQmP7yHd6g8nUAsfnEayJyIo/CBb/AbbhvdxRdr ItCXl9dWrJm+gVuY6a257dN1nYO/GAUoiU5IxsvkG3lpJQiLhmt4jssP3NHsfR0x9z7w wIdLd6gU2SPSyHnthLbVCyoWGnk+6+jFoAfcASvcrLrLEgE4WwIfv5LcKvkH6k7cAQ7a SQ0Q== X-Gm-Message-State: AOAM532HbkjkmG1OBQT5h/ITBhEEJrC5zdy4st+eEu9nuQqgbrT8ONd5 fiM37zav6+sSg2pc5YDltFdcbysbpIYAqG6Jk5ZixrpyY4ZyKXMHm5zx1U7ug5F8Yc9FysC2tjG VkB85NvaRD1p9KoJUq74l X-Received: by 2002:a1c:f008:: with SMTP id a8mr240911wmb.155.1599748069272; Thu, 10 Sep 2020 07:27:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzKTOp5Hy3Vn3E79mDz8zQYsrWs0Ax2MVghv7UMfnMtay+2L6Y5tzycQNV80Pv1LLr0gRyUsw== X-Received: by 2002:a1c:f008:: with SMTP id a8mr240883wmb.155.1599748068953; Thu, 10 Sep 2020 07:27:48 -0700 (PDT) Received: from localhost (91-166-131-130.subs.proxad.net. [91.166.131.130]) by smtp.gmail.com with ESMTPSA id p16sm9629910wro.71.2020.09.10.07.27.48 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 10 Sep 2020 07:27:48 -0700 (PDT) Received: by localhost (Postfix, from userid 1000) id D8C731800938; Thu, 10 Sep 2020 16:27:45 +0200 (CEST) From: Dodji Seketeli To: Giuliano Procida Cc: libabigail@sourceware.org Subject: Re: [PATCH 2/2] writer: Avoid using dynamic hashing in type maps Organization: Red Hat / France References: <877dt3tjtp.fsf@redhat.com> <87r1rbrlds.fsf@redhat.com> X-Operating-System: Red Hat Enterprise Linux Workstation 7.8 Beta X-URL: http://www.redhat.com Date: Thu, 10 Sep 2020 16:27:45 +0200 In-Reply-To: <87r1rbrlds.fsf@redhat.com> (Dodji Seketeli's message of "Wed, 09 Sep 2020 17:59:59 +0200") Message-ID: <878sdhso4e.fsf@redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0.002 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Sep 2020 14:27:57 -0000 Hello, Dodji Seketeli writes: > Hey, > > Giuliano Procida writes: > >> This probably fixes the hash/equality issue. At least, it makes no >> difference (for a simple stock kernel) on top of, or without, my change >> that switched to using plain pointers. > > Okay, that is good to know, thanks. > >> >> It does cause the XML writer to emit (more) XML elements with duplicate >> ids, perhaps where anonymous types are being identified very late in the >> process. You can see this with the following, before and after the chang= e. >> Admittedly, there are already about 400 ids that appear to be duplicated= ! >> >> find tests -name '*abi' | while read x; do sed -En "s;.* >> id=3D'([^']*)'.*;\1;p" $x | sort | uniq -c | grep -vw 1 | sort -nr | sed >> "s;^;$x: ;"; done > > > This is interesting. I went through most of the duplicated IDs reported > by your script. Thank you for putting this together. > > I have just filled an enhancement request for a potential tool =C3=A0 la > abilint (probably reusing abilint infrastructure) that would > analyze/sanitize abixml files to detect potential pathological cases for > these: https://sourceware.org/bugzilla/show_bug.cgi?id=3D26591. > > Most of the current redundant IDs in the tests/data/test-read-dwarf are > for array sub-ranges. They are are always sub-types of arrays in C and > C++. They have their own ID but we don't de-duplicate these at the > moment. So this is not an symptom of a problem per se. > > There are other cases that are mostly related to C++. I have listed > these in the enhancement request above. > > From manual inspection of the cases reported by your script, I haven't > noticed a case resulting from a type equality problem. If you do, I'd > be glad to look into. > > >> I'd still prefer less code to more code and to switch to making the XML >> writer not care about equality at all. > > I still think that the writer should try (as much as possible, but no > more) to ensure that types are emitted just once. In the past, it was > the first point where this was done. Then slowly, we built > de-duplication features throughout the stack. They are not perfect yet, > so I think we still need to ensuring that at the writer level. Much > less now, but we still do. > > Note that we do much less de-duplication at the level of the writer than > we used to do in the past. Hopefully at some point we won't need it > that anymore, and that point I'll be happy to remove that from the > writer. But I don't think that time has come yet. > > [...] > >>> --- a/src/abg-ir.cc >>> +++ b/src/abg-ir.cc >>> @@ -23085,6 +23085,45 @@ size_t >>> hash_type_or_decl(const type_or_decl_base_sptr& tod) >>> {return hash_type_or_decl(tod.get());} >>> >>> +/// Hash a type by either returning the pointer value of its canonical >>> +/// type or by returning a constant if the type doesn't have a >>> +/// canonical type. >>> +/// >>> +/// @param t the type to consider. >>> +/// >>> +/// @return the hash value. >>> +size_t >>> +hash_as_canonical_type_or_constant(const type_base *t) >>> +{ >>> + type_base *canonical_type =3D 0; >>> + >>> + if (t) >>> + canonical_type =3D t->get_naked_canonical_type(); >>> + >>> + if (!canonical_type) >>> + { >>> >> >> Is the following code definitely compatible with the definition of equal= ity >> or is there a chance you'll determine two different canonical types of >> definitions of declarations but equality would actually succeed? If you >> have any doubts, then it's safer to remove this. > > If two decl-only classes have definitions, then their definitions are > compared. > > So I do expect the following code to comply with the equality code, yes. > >> >> >>> + // If the type doesn't have a canonical type, maybe it's because >>> + // it's a declaration-only type? If that's the case, let's try >>> + // to get the canonical type of the definition of this >>> + // declaration. >>> + decl_base *decl =3D is_decl(t); >>> + if (decl >>> + && decl->get_is_declaration_only() >>> + && decl->get_naked_definition_of_declaration()) >>> + { >>> + type_base *definition =3D >>> + is_type(decl->get_naked_definition_of_declaration()); >>> + ABG_ASSERT(definition); >>> + canonical_type =3D definition->get_naked_canonical_type(); >>> + } >>> + } >>> + >>> + if (canonical_type) >>> + return reinterpret_cast(canonical_type); >>> + >>> + return 0xDEADBABE; >>> +} >>> + >>> /// Test if the pretty representation of a given @ref function_decl is >>> /// lexicographically less then the pretty representation of another >>> /// @ref function_decl. >>> diff --git a/src/abg-writer.cc b/src/abg-writer.cc >>> index 4c751c2..59060e1 100644 >>> --- a/src/abg-writer.cc >>> +++ b/src/abg-writer.cc >>> @@ -136,7 +136,7 @@ struct type_hasher >>> { >>> size_t >>> operator()(const type_base* t) const >>> - {return hash_type(t);} >>> + {return hash_as_canonical_type_or_constant(t);} >>> }; // end struct type_hasher >>> >>> /// A convenience typedef for a map that associates a pointer to type >>> diff --git a/tests/data/test-read-dwarf/test16-pr18904.so.abi >>> b/tests/data/test-read-dwarf/test16-pr18904.so.abi >>> index 0075a6e..377d322 100644 >>> >> >> Modulo my comments above, >> >> Signed-off-by: Giuliano Procida > > Thanks!=20 Applied to master at https://sourceware.org/git/?p=3Dlibabigail.git;a=3Dcom= mit;h=3D5cf2473d3ca3acdb5664313e33c4ced618cf023a. Cheers, --=20 =09=09Dodji