From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dodji@seketeli.org>
Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net
 [IPv6:2001:4b98:dc4:8::225])
 by sourceware.org (Postfix) with ESMTPS id 2046A394D82C
 for <libabigail@sourceware.org>; Tue,  3 May 2022 15:32:57 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2046A394D82C
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=seketeli.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=seketeli.org
Received: (Authenticated sender: dodji@seketeli.org)
 by mail.gandi.net (Postfix) with ESMTPSA id 2F1161C0013;
 Tue,  3 May 2022 15:32:53 +0000 (UTC)
Received: by localhost (Postfix, from userid 1000)
 id 6CB485802BD; Tue,  3 May 2022 17:32:53 +0200 (CEST)
From: Dodji Seketeli <dodji@seketeli.org>
To: "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org>
Subject: Re: [PATCH v2] ctf-reader: Add support to read CTF information from
 Linux kernel
Organization: Me, myself and I
References: <20220330153531.2769612-1-guillermo.e.martinez@oracle.com>
 <20220429141658.260012-1-guillermo.e.martinez@oracle.com>
X-Operating-System: Fedora 36
X-URL: http://www.seketeli.net/~dodji
Date: Tue, 03 May 2022 17:32:53 +0200
In-Reply-To: <20220429141658.260012-1-guillermo.e.martinez@oracle.com>
 (Guillermo E. Martinez via Libabigail's message of "Fri, 29 Apr 2022
 09:16:58 -0500")
Message-ID: <877d72fx7e.fsf@seketeli.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-10.0 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libabigail@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Mailing list of the Libabigail project <libabigail.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libabigail>,
 <mailto:libabigail-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libabigail/>
List-Help: <mailto:libabigail-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libabigail>,
 <mailto:libabigail-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2022 15:32:59 -0000

Hello Guillermo,

Thanks a lot for this patch!  I like it.

I do have some comments however.
Please find them below.

[...]
diff --git a/include/abg-ir.h b/include/abg-ir.h
index a2f4e1a7..033e3708 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -136,7 +136,16 @@ class environment
 public:
   struct priv;
   std::unique_ptr<priv> priv_;
+  /// The possible debug format types. Default is DWARF_FORMAT_TYPE
+  enum debug_format_type
+  {
+    DWARF_FORMAT_TYPE,
+#ifdef WITH_CTF
+    CTF_FORMAT_TYPE,
+#endif
+  };

I'd prefer abg-ir.h to stay agnostic from front-end considerations
such as the kind of input (DWARF, CTF etc) as much as possible.  Those
considerations are handled in abg-corpus.h with the
abigail::corpus::origin enum, which is currently defined as:

  /// This abstracts where the corpus comes from.  That is, either it
  /// has been read from the native xml format, from DWARF or built
  /// artificially using the library's API.
  enum origin
  {
    ARTIFICIAL_ORIGIN = 0,
    NATIVE_XML_ORIGIN,
    DWARF_ORIGIN,
    CTF_ORIGIN,
    LINUX_KERNEL_BINARY_ORIGIN
  };

You can modify it to make it be a bitmap defined as:

  enum origin
  {
    ARTIFICIAL_ORIGIN = 0,
    NATIVE_XML_ORIGIN = 1,
    DWARF_ORIGIN = 1 << 1,
    CTF_ORIGIN = 1 << 2,
    LINUX_KERNEL_BINARY_ORIGIN = 1 << 3
  };

That way, you can modify abg-{ctf-reader,dwarf-reader,ir}.cc so that
instead of saying:

    if (corp->get_origin() == corpus::LINUX_KERNEL_BINARY_ORIGIN)
      // blah

it would instead say:

    if (corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)
      // blah

or even:

    if (corp->get_origin()
	& corpus::LINUX_KERNEL_BINARY_ORIGIN
	& corpus:: CTF_ORIGIN)
      // blah

The different parts of the code that test for return of
corpus::get_origin() should be updated to take into account that the
value returned is now a bitmap.

Of course, in abg-ctf-reader.cc, the origin of the corpus would be set by doing:

    corp->set_origin(corpus::LINUX_KERNEL_BINARY_ORIGIN
		     | corpus::CTF_ORIGIN);

and in abg-dwarf-reader.cc, the origin of the corpus would be set by
doing: 

    ctxt.current_corpus()->set_origin(corpus::LINUX_KERNEL_BINARY_ORIGIN
				      | corpus::DWARF_ORIGIN);

[...]

+  debug_format_type debug_format_;

So, the abigail::ir::environment type should not have a debug_format_
data member.  If anything, all the data members are hidden in the
abigail::ir::environment::priv_, which is a pointer to
abigail::ir::environment::priv, defined in abg-ir-priv.h.  But I think
it's not neccessary to add anything there.  More on that below.

[...]

diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 2c6839cb..dcc65d4e 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc

[...]

+  void initialize(const string& elf_path, ir::environment *env)
   {

Please, add a doxygen comment to this function.

[...]

+std::string
+dic_type_key(ctf_dict_t *dic, ctf_id_t ctf_type)
+{

Likewise.

diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 0ef5e8b2..5eebcfa3 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -3165,7 +3165,8 @@ typedef unordered_map<interned_string,
 
 /// Default constructor of the @ref environment type.
 environment::environment()
-  :priv_(new priv)
+  :priv_(new priv),
+  debug_format_(DWARF_FORMAT_TYPE)
 {}

As said above, the debug_format_ data member should not be added to
the environment type.

[...]

diff --git a/src/abg-tools-utils.cc b/src/abg-tools-utils.cc
index 1f0f6fa8..faa7243f 100644
--- a/src/abg-tools-utils.cc
+++ b/src/abg-tools-utils.cc

[...]

@@ -2543,12 +2576,21 @@ build_corpus_group_from_kernel_dist_under(const string&	root,
   t.start();
   bool got_binary_paths =
     get_binary_paths_from_kernel_dist(root, debug_info_root, vmlinux, modules);
+#ifdef WITH_CTF
+  string vmlinux_ctfa;
+  if (got_binary_paths &&
+      env->get_debug_format_type() == environment::CTF_FORMAT_TYPE)
+    {
+      got_binary_paths = get_vmlinux_ctfa_path_from_kernel_dist(root, vmlinux_ctfa);
+      ABG_ASSERT(!vmlinux_ctfa.empty());
+    }
+#endif
+

I think, rather than calling env->get_debug_format_type()
build_corpus_group_from_kernel_dist_under can be passed an additional
'corpus_origin' parameter of type corpus::origin.  The parameter would
have corpus::DWARF_ORIGIN as default argument, for instance.

Also, I think the rest of this function that follows could be
encapsulated into two functions that would be called:

    maybe_load_vmlinux_dwarf_corpus()

This function would load the vmlinux corpus from the ELF information.
That function would contain the code that is inside the block:

+      if (env->get_debug_format_type() == environment::DWARF_FORMAT_TYPE)
+        {

The new condition inside the function would rather be something like:

    if (corpus_origin & corpus::DWARF_ORIGIN)
      {
         //blah
      }
The function wouldn't do anything if the origin of the corpus is not DWARF.


The second function would be this one:

maybe_load_vmlinux_ctf_corpus()

it's invocation would be protected by an #ifdef WITH_CTF and its body
would be inside an if-block that would read:

    if (corpus_origin & corpus::CTF_ORIGIN)
      {
        // blah

I hope this all makes sense.  If not, please do not hesitate to discuss
it.

Thanks again.

Cheers,

-- 
		Dodji