From: dodji@redhat.com
To: libabigail@sourceware.org
Cc: dodji@redhat.com
Subject: [PATCH 03/17] tools-utils,btf-reader: Take modules into account for corpus group
Date: Tue, 16 Jul 2024 16:55:14 +0200
Message-ID: <20240716145541.473065-4-dodji@redhat.com>
In-Reply-To: <20240716145541.473065-1-dodji@redhat.com>
References: <20240716145541.473065-1-dodji@redhat.com>
From: Dodji Seketeli

It turns out the BTF front-end won't take kernel modules into account
when analyzing a Linux kernel build tree. This is because it doesn't
support split and base BTF objects.

Please note that the BTF API is not yet documented in
https://docs.kernel.org/bpf/btf.html, so my understanding of how it
works comes from the source code of the bpftool program available at
https://github.com/libbpf/bpftool and https://github.com/libbpf/libbpf.

When analyzing the vmlinux binary, the btf__parse function returns a
"base BTF object" which contains the type information contained in the
.BTF section of that binary.

The memory of the BTF types describing the types of vmlinux lives in
the base BTF object. So the base BTF object must be kept around for
the entire time of the analysis of the Linux kernel tree (vmlinux +
modules).

When analyzing a kernel module, however, it's the btf__parse_split
function that is used. That function takes as a parameter the "base
BTF object" representing the type information of the corresponding
vmlinux binary and returns a "split BTF object" which contains the type
information contained in the .BTF section of the kernel module. Please
note that the split BTF object references the base BTF object as well.

The memory of the BTF types describing the types of the kernel module
lives in the split BTF object. Because some types and decls might be
exported by a kernel module and used by another one, the memory of a
given split BTF object might need to be kept alive for the entire time
of the analysis of the Linux kernel tree as well.
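The following is a minimal, self-contained C++ sketch that makes the base/split relationship concrete. It does not call libbpf. toy_btf, make_base and make_split are hypothetical names that model the ID layout only, based on an assumed reading of the libbpf sources:

- A split object's first type ID equals the type count of its base, ID 0 being void.
- Looking up a lower ID through the split object is delegated to the base.

That delegation is why the base object has to outlive every split object built on top of it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model (not libbpf code) of how type IDs are laid
// out in a base BTF object and in a split BTF object built on it.
struct toy_btf
{
  const toy_btf* base = nullptr;  // Set only for a split (module) object.
  std::uint32_t start_id = 1;     // First ID owned by this object.
  std::vector<std::string> names; // Names of the types owned here.

  // Number of IDs visible through this object: ID 0 (void), the IDs
  // of the base object if any, then the IDs owned here.
  std::uint32_t
  type_cnt() const
  {return start_id + static_cast<std::uint32_t>(names.size());}

  // Resolve a type ID.  IDs below start_id are delegated to the base,
  // so the base must stay alive as long as this object is used.
  // ID 0 (void) is not materialized in this model: it yields nullptr.
  const std::string*
  type_by_id(std::uint32_t id) const
  {
    if (id == 0 || id >= type_cnt())
      return nullptr;
    if (id < start_id)
      return base ? base->type_by_id(id) : nullptr;
    return &names[id - start_id];
  }
};

// Models the role of btf__parse(): a base object whose IDs start at 1.
toy_btf
make_base(std::vector<std::string> names)
{
  toy_btf b;
  b.names = std::move(names);
  return b;
}

// Models the role of btf__parse_split(): the module's IDs continue
// right after the last ID of the base object.
toy_btf
make_split(const toy_btf& base, std::vector<std::string> names)
{
  toy_btf s;
  s.base = &base;
  s.start_id = base.type_cnt();
  s.names = std::move(names);
  return s;
}
```

For instance, take a base holding int and task_struct (IDs 1 and 2). A module built on it gets ID 3 for its own first type, and looking up ID 2 through the module yields task_struct from the base.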
It's this base/split model of BTF that is not supported by the BTF
front-end.

This patch introduces support for that model. At any point in time,
the BTF front-end now has one handle to the base BTF object and, if we
are looking at a kernel module, one handle to the current split BTF
object. The base and split BTF objects are kept around for the entire
lifetime of the front-end.

For a given binary, each type is identified by a type ID. A type ID is
valid only for that binary: the same type ID in another binary might
designate another type. So rather than maintaining a map that
associates a type ID with a libabigail IR artifact, the front-end now
maintains a map that associates a *type* with a libabigail IR artifact.

With these changes, abidw can now analyze and emit the ABIXML for a
Linux kernel tree in 28 seconds, using BTF. It can self-compare a
kernel tree in 36 seconds.

	* src/abg-btf-reader.cc (btf_type_to_abi_artifact_map_type):
	Rename btf_type_id_to_abi_artifact_map_type into this. Make it
	associate a btf_type* to a libabigail IR, rather than a type ID to
	a libabigail IR.
	(reader::{base_btf_handle_, base_btf_file_name_, split_btf_handle_,
	split_btf_file_name_, split_btfs_to_free_}): Define new data
	members.
	(reader::{btf_handle_}): Remove.
	(reader::base_btf_handle): Renamed btf_handle into this. Create
	the BTF object using btf__parse. Store the name of the binary into
	reader::base_btf_file_name_.
	(reader::read_split_btf): Define new member function.
	(reader::btf_handle): This is new code for an existing member
	function name. Return the current split BTF object handle if
	applicable, otherwise create the base BTF object handle.
	(reader::btf_type_to_artifacts_): Renamed btf_type_id_to_artifacts_
	into this.
	(reader::associate_artifact_to_btf_type): Renamed
	associate_artifact_to_btf_type_id into this.
	(reader::schedule_type_for_canonicalization): Renamed
	schedule_type_for_canonocalization into this. Do not schedule a
	type that has already been canonicalized.
	(reader::nr_btf_types): Take a BTF* parameter.
	(reader::~reader): Free all the allocated split BTF objects.
	(reader::read_debug_info_into_corpus): If we are looking at a
	split BTF, then set the first type ID as the one right after the
	last type ID of the base BTF.
	(reader::build_ir_node_from_btf_type): Clean up the logic for
	building the void type.
	(reader::{build_ir_node_for_void_type,
	build_ir_node_for_void_pointer_type,
	build_ir_node_for_variadic_parameter_type,
	build_enum_underlying_type, build_array_type}): Schedule type for
	canonicalization at the end of the IR construction, not right
	away.
	* src/abg-tools-utils.cc (create_best_elf_based_reader): Pass
	show_all_types and linux_kernel_mode arguments to
	btf::create_reader.

Signed-off-by: Dodji Seketeli
---
 src/abg-btf-reader.cc  | 275 +++++++++++++++++++++++++++++++----------
 src/abg-tools-utils.cc |   6 +-
 2 files changed, 213 insertions(+), 68 deletions(-)

diff --git a/src/abg-btf-reader.cc b/src/abg-btf-reader.cc
index e4aca24f..ec13d267 100644
--- a/src/abg-btf-reader.cc
+++ b/src/abg-btf-reader.cc
@@ -53,33 +53,139 @@ btf_offset_to_string(const ::btf* btf, uint32_t offset)
   return btf__name_by_offset(btf, offset) ?: "(invalid string offset)";
 }

-/// A convenience typedef of a map that associates a btf type id to a
-/// libabigail ABI artifact.
-typedef std::unordered_map
-btf_type_id_to_abi_artifact_map_type;
+/// A convenience typedef of a map that associates a btf type to a
+/// libabigail ABI artifact. The type is allocated inside a given btf
+/// handle (of type ::btf*). All handles (one for each kernel binary)
+/// should be kept around until a complete corpus group is built.
+typedef std::unordered_map
+btf_type_to_abi_artifact_map_type;

 /// The BTF front-end abstraction type.
+///
+/// Note that one instance of the front-end is meant to analyze one
+/// vmlinux file and all its associated modules. For now, the
+/// front-end doesn't know how to analyze a module without having
+/// analyzed a vmlinux first.
+///
+/// The BTF information of a vmlinux is parsed with the btf__parse
+/// function. The result is called a "base BTF" handle.
+///
+/// The BTF information of a kernel module is parsed with the function
+/// btf__parse_split. The result is called a "split BTF" handle. A
+/// split BTF handle references information that is in the base BTF
+/// handle. The base BTF handle can be retrieved from a split BTF
+/// handle using btf__base_btf.
 class reader : public elf_based_reader
 {
-  ::btf* btf_handle_ = nullptr;
+  ::btf* base_btf_handle_ = nullptr;
+  // The path to the binary that contains the base BTF information
+  // held in base_btf_handle_
+  string base_btf_file_name_;
+  ::btf* split_btf_handle_ = nullptr;
+  // The path to the binary that contains the split BTF information
+  // held in split_btf_handle_
+  string split_btf_file_name_;
+  // A vector of (split) BTF objects that are to be freed once the
+  // corpus group is built for an entire kernel (vmlinux + modules).
+  vector<::btf*> split_btfs_to_free_;
   translation_unit_sptr cur_tu_;
   vector types_to_canonicalize_;
-  btf_type_id_to_abi_artifact_map_type btf_type_id_to_artifacts_;
+  btf_type_to_abi_artifact_map_type btf_type_to_artifacts_;

-  /// Getter of the handle to the BTF data as returned by libbpf.
+  /// Getter of the handle to the base BTF object of the current
+  /// binary being analyzed.
+  ///
+  /// The base BTF object ALWAYS represents the BTF information of the
+  /// vmlinux binary, even if the current binary being analyzed is a
+  /// kernel module.
   ///
-  /// @return the handle to the BTF data as returned by libbpf.
+  /// @return handle to the base BTF object of the current binary
+  /// being analyzed.
   ::btf*
-  btf_handle()
+  base_btf_handle()
   {
-    if (btf_handle_ == nullptr)
+    if (base_btf_handle_ == nullptr)
       {
-        btf_handle_ = btf__parse(corpus_path().c_str(), nullptr);
-        if (!btf_handle_)
-          std::cerr << "Could not parse BTF information from file '"
-                    << corpus_path().c_str() << "'" << std::endl;
+        base_btf_handle_ = btf__parse(corpus_path().c_str(), nullptr);
+        if (!base_btf_handle_)
+          {
+            std::cerr << "Could not parse base BTF information from file '"
+                      << corpus_path().c_str() << "'" << std::endl;
+            return nullptr;
+          }
+        base_btf_file_name_ = corpus_path();
       }
-    return btf_handle_;
+    return base_btf_handle_;
+  }
+
+  /// Read the BTF information of the current binary whose path is
+  /// @ref fe_iface::corpus_path() and return its associated object
+  /// handle. This is called the split BTF object.
+  ///
+  /// Note that this function expects the base BTF object (the one for
+  /// the vmlinux binary) to be already present, otherwise, it returns
+  /// nullptr.
+  ///
+  /// @return the split BTF object for the file designated by
+  /// fe_iface::corpus_path().
+  ::btf*
+  read_split_btf()
+  {
+    if (!base_btf_handle_)
+      {
+        std::cerr << "Base BTF information not present. "
+                  << "Not attempting to parse split BTF information"
+                  << std::endl;
+        return nullptr;
+      }
+
+    if (corpus_path().empty() || corpus_path() == base_btf_file_name_)
+      {
+        std::cerr << "BTF reader not initialized with split file name. "
+                  << "Not attempting to read split BTF information"
+                  << std::endl;
+        return nullptr;
+      }
+
+    split_btf_handle_ = btf__parse_split(corpus_path().c_str(),
+                                         base_btf_handle());
+    if (!split_btf_handle_)
+      {
+        std::cerr << "Could not read split BTF information from file "
+                  << corpus_path() << std::endl;
+        return nullptr;
+      }
+    split_btf_file_name_ = corpus_path();
+
+    return split_btf_handle_;
+  }
+
+  /// Getter of the handle to the BTF object as returned by libbpf.
+  ///
+  /// This returns the handle to the current BTF object. If the
+  /// current BTF object is for a vmlinux binary, then it's the base
+  /// BTF object that is returned. Otherwise, if the current BTF
+  /// object is for a kernel module then it's the split BTF object
+  /// that is returned.
+  ///
+  /// @return the handle to the BTF object of the current binary being
+  /// analyzed by this front-end.
+  ::btf*
+  btf_handle()
+  {
+    if (split_btf_handle_)
+      return split_btf_handle_;
+
+    if (!base_btf_handle_)
+      return base_btf_handle();
+
+    if (corpus_path() != base_btf_file_name_)
+      // The reader was re-initialized with a corpus_path that is
+      // different from the BTF base file. That means we are
+      // instructed to read split BTF information.
+      return read_split_btf();
+
+    return base_btf_handle();
   }

   /// Getter of the environment of the current front-end.
@@ -129,57 +235,60 @@ class reader : public elf_based_reader
   cur_tu(const translation_unit_sptr& tu)
   {cur_tu_ = tu;}

-  /// Getter of the map that associates a BTF type ID to an ABI
-  /// artifact.
+  /// Getter of the map that associates a BTF type to the internal
+  /// representation of an ABI artifact.
   ///
-  /// @return The map that associates a BTF type ID to an ABI
+  /// @return The map that associates a BTF type to the IR of an ABI
   /// artifact.
-  btf_type_id_to_abi_artifact_map_type&
-  btf_type_id_to_artifacts()
-  {return btf_type_id_to_artifacts_;}
+  btf_type_to_abi_artifact_map_type&
+  btf_type_to_artifacts()
+  {return btf_type_to_artifacts_;}

-  /// Getter of the map that associates a BTF type ID to an ABI
+  /// Getter of the map that associates a BTF type to the IR of an ABI
   /// artifact.
   ///
-  /// @return The map that associates a BTF type ID to an ABI
+  /// @return The map that associates a BTF type to the IR of an ABI
   /// artifact.
-  const btf_type_id_to_abi_artifact_map_type&
-  btf_type_id_to_artifacts() const
-  {return btf_type_id_to_artifacts_;}
+  const btf_type_to_abi_artifact_map_type&
+  btf_type_to_artifacts() const
+  {return btf_type_to_artifacts_;}

-  /// Get the ABI artifact that is associated to a given BTF type ID.
+  /// Get the IR of the ABI artifact that is associated to a given BTF
+  /// type.
   ///
-  /// If no ABI artifact is associated to the BTF type id, then return
-  /// nil.
+  /// If no ABI artifact is associated to the BTF type, then return
+  /// nullptr.
   ///
-  /// @return the ABI artifact that is associated to a given BTF type
-  /// id.
+  /// @return the ABI artifact that is associated to a given BTF type.
   type_or_decl_base_sptr
-  lookup_artifact_from_btf_id(int btf_id)
+  lookup_artifact_from_btf_type(const btf_type* t)
   {
-    auto i = btf_type_id_to_artifacts().find(btf_id);
-    if (i != btf_type_id_to_artifacts().end())
+    auto i = btf_type_to_artifacts().find(t);
+    if (i != btf_type_to_artifacts().end())
       return i->second;
     return type_or_decl_base_sptr();
   }

-  /// Associate an ABI artifact to a given BTF type ID.
+  /// Associate an ABI artifact to a given BTF type.
   ///
   /// @param artifact the ABI artifact to consider.
   ///
-  /// @param btf_type_id the BTF type ID to associate to @p artifact.
+  /// @param t the BTF type to associate to @p artifact.
   void
-  associate_artifact_to_btf_type_id(const type_or_decl_base_sptr& artifact,
-                                    int btf_type_id)
-  {btf_type_id_to_artifacts()[btf_type_id] = artifact;}
+  associate_artifact_to_btf_type(const type_or_decl_base_sptr& artifact,
+                                 const btf_type* t)
+  {btf_type_to_artifacts()[t] = artifact;}

   /// Schecule a type for canonicalization at the end of the debug
   /// info loading.
   ///
   /// @param t the type to schedule.
   void
-  schedule_type_for_canonocalization(const type_base_sptr& t)
-  {types_to_canonicalize_.push_back(t);}
+  schedule_type_for_canonicalization(const type_base_sptr& t)
+  {
+    if (t && !t->get_naked_canonical_type())
+      types_to_canonicalize_.push_back(t);
+  }

   /// Canonicalize all the types scheduled for canonicalization using
   /// abigail::ir::canonicalize_types() which performs some sanity
@@ -193,8 +302,13 @@ class reader : public elf_based_reader
                        {return *i;});
   }

+  /// Getter of the number of types carried by a given BTF object.
+  ///
+  /// @param handle the BTF object to consider.
+  ///
+  /// @return the number of types carried by a given BTF object.
   uint64_t
-  nr_btf_types() const
+  nr_btf_types(const ::btf* handle) const
   {
 #ifdef WITH_BTF__GET_NR_TYPES
 #define GET_NB_TYPES btf__get_nr_types
@@ -210,7 +324,7 @@ class reader : public elf_based_reader
     return 0;
 #endif

-    return GET_NB_TYPES(const_cast(this)->btf_handle());
+    return GET_NB_TYPES(handle);
   }

 protected:
@@ -238,12 +352,21 @@ protected:
              bool load_all_types,
              bool linux_kernel_mode)
   {
-    btf__free(btf_handle_);
-    btf_handle_ = nullptr;
+    if (split_btf_handle_)
+      {
+        // We need to keep this split_btf_handle_ on the side so that
+        // we can free it when we are done analyzing all the kernel
+        // modules. We cannot free it right now because the memory of
+        // all btf types lives in it.
+        split_btfs_to_free_.push_back(split_btf_handle_);
+        split_btf_handle_ = nullptr;
+      }
+
+    split_btf_file_name_.clear();
     types_to_canonicalize_.clear();
-    btf_type_id_to_artifacts_.clear();
     cur_tu_.reset();
     elf_based_reader::initialize(elf_path, debug_info_root_paths);
+    corpus_path(elf_path);
     options().load_all_types = load_all_types;
     options().load_in_linux_kernel_mode = linux_kernel_mode;
   }
@@ -308,8 +431,12 @@ public:
   /// Destructor of the btf::reader type.
   ~reader()
   {
-    btf__free(btf_handle_);
-    btf_handle_ = nullptr;
+    for (auto b : split_btfs_to_free_)
+      btf__free(b);
+    btf__free(split_btf_handle_);
+    btf__free(base_btf_handle_);
+    split_btf_handle_ = nullptr;
+    base_btf_handle_ = nullptr;
   }

   /// Read the ELF information as well as the BTF type information to
@@ -359,8 +486,24 @@ public:
     corpus()->add(artificial_tu);
     cur_tu(artificial_tu);

-    int number_of_types = nr_btf_types();
+    int number_of_types = nr_btf_types(btf_handle());
     int first_type_id = 1;
+    // Are we looking at the BTF for a kernel module?
+    const ::btf* base = btf__base_btf(btf_handle());
+    if (base)
+      {
+        // So, base is non-nil. This means we are looking at the BTF
+        // for a kernel module and base points to the BTF for the
+        // corresponding vmlinux. That base BTF should be the same as
+        // base_btf_handle().
+        ABG_ASSERT(base == base_btf_handle());
+
+        // The ID of the first type that is contained in this BTF
+        // representing a kernel module is the number of types
+        // contained in the base BTF (i.e., the BTF for the vmlinux
+        // binary).
+        first_type_id = nr_btf_types(base);
+      }

     // Let's cycle through whatever is described in the BTF section
     // and emit libabigail IR for it.
@@ -421,13 +564,10 @@ public:
     type_or_decl_base_sptr result;
     const btf_type *t = nullptr;

-    if ((result = lookup_artifact_from_btf_id(type_id)))
-      return result;
+    t = btf__type_by_id(btf_handle(), type_id);

-    if (type_id == 0)
-      result = build_ir_node_for_void_type();
-    else
-      t = btf__type_by_id(btf_handle(), type_id);
+    if ((result = lookup_artifact_from_btf_type(t)))
+      return result;

     if (!result)
       {
@@ -436,6 +576,11 @@ public:

         switch(type_kind)
           {
+          case BTF_KIND_UNKN/* Unknown: This is really for the void
+                               type.  */:
+            result = build_ir_node_for_void_type();
+            break;
+
           case BTF_KIND_INT/* Integer */:
             result = build_int_type(type_id);
             break;
@@ -500,8 +645,6 @@ public:
 #endif
           case BTF_KIND_DATASEC/* Section */:
             break;
-          case BTF_KIND_UNKN/* Unknown */:
-            break;
           default:
             ABG_ASSERT_NOT_REACHED;
             break;
@@ -511,9 +654,9 @@ public:
           add_decl_to_scope(is_decl(result), cur_tu()->get_global_scope());

         if (type_base_sptr type = is_type(result))
-          schedule_type_for_canonocalization(type);
+          schedule_type_for_canonicalization(type);

-        associate_artifact_to_btf_type_id(result, type_id);
+        associate_artifact_to_btf_type(result, t);

         if (function_decl_sptr fn = is_function_decl(result))
           add_fn_to_exported_or_undefined_decls(fn.get());
@@ -531,7 +674,7 @@ public:
   {
     type_base_sptr t = env().get_void_type();
     add_decl_to_scope(is_decl(t), cur_tu()->get_global_scope());
-    canonicalize(t);
+    schedule_type_for_canonicalization(t);
     return t;
   }

@@ -543,7 +686,7 @@ public:
   {
     type_base_sptr t = env().get_void_pointer_type();
     add_decl_to_scope(is_decl(t), cur_tu()->get_global_scope());
-    canonicalize(t);
+    schedule_type_for_canonicalization(t);
     return t;
   }

@@ -556,7 +699,7 @@ public:
     type_base_sptr t = env().get_variadic_parameter_type();
     add_decl_to_scope(is_decl(t), cur_tu()->get_global_scope());
     decl_base_sptr t_decl = get_type_declaration(t);
-    canonicalize(t);
+    schedule_type_for_canonicalization(t);
     return t;
   }

@@ -649,7 +792,7 @@ public:
     result->set_is_anonymous(is_anonymous);
     result->set_is_artificial(true);
     add_decl_to_scope(result, cur_tu()->get_global_scope());
-    canonicalize(result);
+    schedule_type_for_canonicalization(result);
     return result;
   }

@@ -806,7 +949,7 @@ public:
     subrange->is_non_finite(!arr->nelems);
     subrange->set_size_in_bits(cur_tu()->get_address_size());
     add_decl_to_scope(subrange, cur_tu()->get_global_scope());
-    canonicalize(subrange);
+    schedule_type_for_canonicalization(subrange);
     array_type_def::subranges_type subranges = {subrange};
     array_type_def_sptr result(new array_type_def(underlying_type,
                                                   subranges, location()));
@@ -900,7 +1043,7 @@ public:

     add_decl_to_scope(result, cur_tu()->get_global_scope());

-    associate_artifact_to_btf_type_id(result, type_id);
+    associate_artifact_to_btf_type(result, t);

     // For defined classes and unions, add data members to the type
     // being built.
@@ -963,7 +1106,7 @@ public:
                                                /*alignment=*/0));
     result->set_return_type(return_type);

-    associate_artifact_to_btf_type_id(result, type_id);
+    associate_artifact_to_btf_type(result, t);

     uint16_t nb_parms = btf_vlen(t);
     const struct btf_param* parm =
diff --git a/src/abg-tools-utils.cc b/src/abg-tools-utils.cc
index d24fdc61..c42073cb 100644
--- a/src/abg-tools-utils.cc
+++ b/src/abg-tools-utils.cc
@@ -3230,7 +3230,8 @@ create_best_elf_based_reader(const string& elf_file_path,
         {
 #ifdef WITH_BTF
           if (file_has_btf_debug_info(elf_file_path, debug_info_root_paths))
-            result = btf::create_reader(elf_file_path, debug_info_root_paths, env);
+            result = btf::create_reader(elf_file_path, debug_info_root_paths, env,
+                                        show_all_types, linux_kernel_mode);
 #endif
         }
       else
@@ -3249,7 +3250,8 @@ create_best_elf_based_reader(const string& elf_file_path,
               && file_has_btf_debug_info(elf_file_path, debug_info_root_paths))
             // The file has BTF debug info and no BTF, let's use the BTF
             // front-end even if it wasn't formally requested by the user.
-            result = btf::create_reader(elf_file_path, debug_info_root_paths, env);
+            result = btf::create_reader(elf_file_path, debug_info_root_paths, env,
+                                        show_all_types, linux_kernel_mode);
 #endif
         }

-- 
2.43.5
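The commit message says a type ID is meaningful only within one binary. A second hypothetical C++ sketch illustrates why. toy_btf_type, toy_ir_node and get_or_build are illustrative names, not libabigail or libbpf API. The sketch mimics the lookup-then-associate pattern used in build_ir_node_from_btf_type:

- With a cache keyed by type ID, a second module's type 5 wrongly hits the entry recorded for the first module's type 5.
- Keying by the address of the type object, as btf_type_to_abi_artifact_map_type does, keeps the two types apart.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a libbpf `struct btf_type`: all that
// matters here is that each type object has its own address.
struct toy_btf_type
{
  std::string name;
};

// Hypothetical stand-in for a libabigail IR node.
struct toy_ir_node
{
  std::string name;
};

// Cache keyed by type ID: only sound while a single binary is read.
using id_keyed_map = std::unordered_map<int, toy_ir_node>;

// Cache keyed by the address of the type object, in the spirit of
// btf_type_to_abi_artifact_map_type.
using type_keyed_map = std::unordered_map<const toy_btf_type*, toy_ir_node>;

// Return the IR node cached under `key`; on a miss, build one from `t`
// and cache it.  This mirrors lookup_artifact_from_btf_type followed
// by associate_artifact_to_btf_type.
template <typename Map, typename Key>
const toy_ir_node&
get_or_build(Map& cache, const Key& key, const toy_btf_type& t)
{
  auto i = cache.find(key);
  if (i != cache.end())
    return i->second;
  return cache.emplace(key, toy_ir_node{t.name}).first->second;
}
```

Keying by address is only sound while the type objects stay alive. If a module's split BTF object were freed, a later module could reuse its addresses. This is presumably why the patch parks split objects in split_btfs_to_free_ rather than calling btf__free on them right away.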