From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <maennich@google.com>
Received: from mail-wr1-x443.google.com (mail-wr1-x443.google.com
 [IPv6:2a00:1450:4864:20::443])
 by sourceware.org (Postfix) with ESMTPS id 5A34338708E6
 for <libabigail@sourceware.org>; Mon,  4 May 2020 13:21:57 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 5A34338708E6
Received: by mail-wr1-x443.google.com with SMTP id g13so20877647wrb.8
 for <libabigail@sourceware.org>; Mon, 04 May 2020 06:21:57 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=ImQGzxRpIvuiGVeljsaRiAvMsMRuAwal1BOVcJVFpGw=;
 b=FO30GWUBU1PQfZMyfhxwAt9ta9nTtFzc3jHyEUwprsHbK6+fE/QOYnX7M2nZo98TyI
 LoXSaaqwcFcFbmWmQTgKUVeXoPGCD83Zp1urIajFis2mZ2S4BpePZF7y2nuh5Qc++R36
 3xpb94Xsa/9V6IU4y6EQxd7wQzfocTUnkUlS6k7EWEuo2qTNGiwvVAksbPnhqEu6rm6j
 CATRb93zJsNR8TZ0alDTEWlaEFTtPKLByAHP2BQ6VAyygxNuC88De6dLFc/QzUFPCbLg
 2rD8hhQ5/lSf4K4n2sYJ9ccvhD+V4aylhoV8yweP/Gx2Ho6UgyV5KrwQuriQmaOAVZRC
 hzXg==
X-Gm-Message-State: AGi0PubbGvPxWcs64Vjq80HYO1a0WiaZWBShfjMgMnOO1GQuBu+ktaVI
 ZrjS+k1eI6TB9tGBlwEK+6af9A==
X-Google-Smtp-Source: APiQypJ3X+/sj7lQIjYSoWSAkPZQG6flKVEt56/fUlFxf3yM2B8t9us0zEgESmIY4vWvZpg5tcnNkg==
X-Received: by 2002:a05:6000:12c7:: with SMTP id
 l7mr20056648wrx.239.1588598515887; 
 Mon, 04 May 2020 06:21:55 -0700 (PDT)
Received: from google.com ([2a00:79e0:d:210:e8f7:125b:61e9:733d])
 by smtp.gmail.com with ESMTPSA id n2sm18416917wrt.33.2020.05.04.06.21.54
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Mon, 04 May 2020 06:21:55 -0700 (PDT)
Date: Mon, 4 May 2020 15:21:54 +0200
From: Matthias Maennich <maennich@google.com>
To: Giuliano Procida <gprocida@google.com>
Cc: libabigail@sourceware.org, Dodji Seketeli <dodji@seketeli.org>,
 kernel-team@android.com
Subject: Re: [RESEND PATCH] reader context: do not reuse current corpus and
 corpus_group
Message-ID: <20200504132154.GA80772@google.com>
References: <20200430215256.19135-1-maennich@google.com>
 <CAGvU0HkH5a2FwDYSSfQ9qMYUztEBgYe-4BguyKb=e76+zY7m2A@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To: <CAGvU0HkH5a2FwDYSSfQ9qMYUztEBgYe-4BguyKb=e76+zY7m2A@mail.gmail.com>
X-Spam-Status: No, score=-41.3 required=5.0 tests=BAYES_00, DKIMWL_WL_MED,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,
 GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL,
 USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: libabigail@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libabigail mailing list <libabigail.sourceware.org>
List-Unsubscribe: <http://sourceware.org/mailman/options/libabigail>,
 <mailto:libabigail-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libabigail/>
List-Help: <mailto:libabigail-request@sourceware.org?subject=help>
List-Subscribe: <http://sourceware.org/mailman/listinfo/libabigail>,
 <mailto:libabigail-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Mon, 04 May 2020 13:22:07 -0000

On Fri, May 01, 2020 at 03:23:05PM +0100, Giuliano Procida wrote:
>Hi.
>
>Does this change have any impact on incomplete (forward-declared) type
>differences?
>Does it significantly impact performance on large inputs?

This was a prerequisite change for sorting the corpus vector. But we
replace that by sorting the corpora much earlier. Still this is a useful
change for others coming here and hitting an issue with the reuse (as I
did).

>
>Otherwise, this change makes sense and
>Reviewed-by: Giuliano Procida <gprocida@google.com>
>
>More generally, do you have a good understanding of what in the
>context is generic to the tool run and what is specific to a file
>being read and can they be separated cleanly?

There is more opportunity to separate things a bit (also for potential
multithreaded execution). As it stands, the read_context is a very
central object in the dwarf reader that contains likely too much data
and carries too much responsibility. I am working on splitting out the
symtab reader part to address such a concern. (Also to make it testable
individually).

Cheers,
Matthias

>
>Giuliano.
>
>On Thu, 30 Apr 2020 at 22:53, Matthias Maennich <maennich@google.com> wrote:
>>
>> libabigail's readers (abg-reader and abg-dwarf-reader) currently spare
>> some allocations by reusing the reader context's existing current corpus
>> and current corpus group. When building a corpus_group's vector of
>> corpora, reusing the shared_ptr referring to a corpus means we are
>> modifying the corpus data of a previously read corpus. As a user of the
>> read*corpus functions, that isn't entirely transparent and when storing
>> corpare like in the vector above, we might introduce subtle bugs.
>>
>> Fix this by explicitly creating new corpus / corpus group instances when
>> reading from elf/dwarf or xml.
>>
>>         * src/abg-dwarf-reader.cc (read_debug_info_into_corpus): always
>>         instantiate a new corpus instance.
>>         * src/abg-reader.cc (read_corpus_from_input): Likewise.
>>         (read_corpus_group_from_input): always instantiate a new corpus
>>         group instance.
>>
>> Signed-off-by: Matthias Maennich <maennich@google.com>
>> ---
>>  src/abg-dwarf-reader.cc | 11 ++++-----
>>  src/abg-reader.cc       | 49 +++++++++++++++++------------------------
>>  2 files changed, 24 insertions(+), 36 deletions(-)
>>
>> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>> index 850281ad1ce5..d99d25bfb3b9 100644
>> --- a/src/abg-dwarf-reader.cc
>> +++ b/src/abg-dwarf-reader.cc
>> @@ -16068,13 +16068,10 @@ read_debug_info_into_corpus(read_context& ctxt)
>>  {
>>    ctxt.clear_per_corpus_data();
>>
>> -  if (!ctxt.current_corpus())
>> -    {
>> -      corpus_sptr corp (new corpus(ctxt.env(), ctxt.elf_path()));
>> -      ctxt.current_corpus(corp);
>> -      if (!ctxt.env())
>> -       ctxt.env(corp->get_environment());
>> -    }
>> +  corpus_sptr corp(new corpus(ctxt.env(), ctxt.elf_path()));
>> +  ctxt.current_corpus(corp);
>> +  if (!ctxt.env())
>> +    ctxt.env(corp->get_environment());
>>
>>    // First set some mundane properties of the corpus gathered from
>>    // ELF.
>> diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>> index 255a200f2a25..0be45ec59a39 100644
>> --- a/src/abg-reader.cc
>> +++ b/src/abg-reader.cc
>> @@ -25,6 +25,7 @@
>>  /// ABI Instrumentation file in libabigail native XML format.  This
>>  /// native XML format is named "abixml".
>>
>> +#include "abg-fwd.h"
>>  #include "config.h"
>>  #include <cstring>
>>  #include <cstdlib>
>> @@ -1899,17 +1900,14 @@ read_corpus_from_input(read_context& ctxt)
>>                                        BAD_CAST("abi-corpus")))
>>         return nil;
>>
>> -      if (!ctxt.get_corpus())
>> -       {
>> -         corpus_sptr c(new corpus(ctxt.get_environment(), ""));
>> -         ctxt.set_corpus(c);
>> -       }
>> +      corpus_sptr corp(new corpus(ctxt.get_environment(), ""));
>> +      ctxt.set_corpus(corp);
>>
>>        if (!ctxt.get_corpus_group())
>>         ctxt.clear_per_corpus_data();
>>
>> -      corpus& corp = *ctxt.get_corpus();
>> -      ctxt.set_exported_decls_builder(corp.get_exported_decls_builder().get());
>> +      ctxt.set_exported_decls_builder(
>> +         corp->get_exported_decls_builder().get());
>>
>>        xml::xml_char_sptr path_str = XML_READER_GET_ATTRIBUTE(reader, "path");
>>        string path;
>> @@ -1917,13 +1915,13 @@ read_corpus_from_input(read_context& ctxt)
>>        if (path_str)
>>         {
>>           path = reinterpret_cast<char*>(path_str.get());
>> -         corp.set_path(path);
>> +         corp->set_path(path);
>>         }
>>
>>        xml::xml_char_sptr architecture_str =
>>         XML_READER_GET_ATTRIBUTE(reader, "architecture");
>>        if (architecture_str)
>> -       corp.set_architecture_name
>> +       corp->set_architecture_name
>>           (reinterpret_cast<char*>(architecture_str.get()));
>>
>>        xml::xml_char_sptr soname_str =
>> @@ -1933,7 +1931,7 @@ read_corpus_from_input(read_context& ctxt)
>>        if (soname_str)
>>         {
>>           soname = reinterpret_cast<char*>(soname_str.get());
>> -         corp.set_soname(soname);
>> +         corp->set_soname(soname);
>>         }
>>
>>        // Apply suppression specifications here to honour:
>> @@ -1955,32 +1953,29 @@ read_corpus_from_input(read_context& ctxt)
>>      }
>>    else
>>      {
>> -      if (!ctxt.get_corpus())
>> -       {
>> -         corpus_sptr c(new corpus(ctxt.get_environment(), ""));
>> -         ctxt.set_corpus(c);
>> -       }
>> +      corpus_sptr corp(new corpus(ctxt.get_environment(), ""));
>> +      ctxt.set_corpus(corp);
>>
>>        if (!ctxt.get_corpus_group())
>>         ctxt.clear_per_corpus_data();
>>
>> -      corpus& corp = *ctxt.get_corpus();
>> -      ctxt.set_exported_decls_builder(corp.get_exported_decls_builder().get());
>> +      ctxt.set_exported_decls_builder(
>> +         corp->get_exported_decls_builder().get());
>>
>>        xml::xml_char_sptr path_str = XML_NODE_GET_ATTRIBUTE(node, "path");
>>        if (path_str)
>> -       corp.set_path(reinterpret_cast<char*>(path_str.get()));
>> +       corp->set_path(reinterpret_cast<char*>(path_str.get()));
>>
>>        xml::xml_char_sptr architecture_str =
>>         XML_NODE_GET_ATTRIBUTE(node, "architecture");
>>        if (architecture_str)
>> -       corp.set_architecture_name
>> +       corp->set_architecture_name
>>           (reinterpret_cast<char*>(architecture_str.get()));
>>
>>        xml::xml_char_sptr soname_str =
>>         XML_NODE_GET_ATTRIBUTE(node, "soname");
>>        if (soname_str)
>> -       corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
>> +       corp->set_soname(reinterpret_cast<char*>(soname_str.get()));
>>      }
>>
>>    if (!node->children)
>> @@ -2093,14 +2088,10 @@ read_corpus_group_from_input(read_context& ctxt)
>>                                    BAD_CAST("abi-corpus-group")))
>>      return nil;
>>
>> -  if (!ctxt.get_corpus_group())
>> -    {
>> -      corpus_group_sptr g(new corpus_group(ctxt.get_environment(),
>> -                                          ctxt.get_path()));
>> -      ctxt.set_corpus_group(g);
>> -    }
>> +  corpus_group_sptr group(
>> +      new corpus_group(ctxt.get_environment(), ctxt.get_path()));
>> +  ctxt.set_corpus_group(group);
>>
>> -  corpus_group_sptr group = ctxt.get_corpus_group();
>>    xml::xml_char_sptr path_str = XML_READER_GET_ATTRIBUTE(reader, "path");
>>    if (path_str)
>>      group->set_path(reinterpret_cast<char*>(path_str.get()));
>> @@ -2115,11 +2106,11 @@ read_corpus_group_from_input(read_context& ctxt)
>>
>>    corpus_sptr corp;
>>    while ((corp = read_corpus_from_input(ctxt)))
>> -    ctxt.get_corpus_group()->add_corpus(corp);
>> +    group->add_corpus(corp);
>>
>>    xmlTextReaderNext(reader.get());
>>
>> -  return ctxt.get_corpus_group();
>> +  return group;
>>  }
>>
>>  /// De-serialize an ABI corpus group from an input XML document which
>> --
>> 2.26.2.526.g744177e7f7-goog
>>