* [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member @ 2022-08-31 15:16 Guillermo E. Martinez 2022-08-31 15:16 ` Guillermo E. Martinez ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Guillermo E. Martinez @ 2022-08-31 15:16 UTC (permalink / raw) To: libabigail; +Cc: Guillermo E. Martinez Hello, This patch improves the ABI XML file generated by ctf reader, there are Linux symbols (EXPORT_SYMBOL*) that were missing. Comments will be grateful and appreciated!. Thanks in advanced, guillermo -- The current mechanism used by the ctf reader to looking for debug information given a specific Linux symbol, is open the dictionary (default) which the name match with the binary name being processing in the current corpus, e.g. `vmlinux' or `module-name.ko'. However there are symbol information is not located in a default dictionary, this is evident comparing the symbols in `Module.symvers' file with ABI XML file, so for example, the ctf reader is expecting to find the information for `LZ4_decompress_fast' symbol in the CTF `vmlinux' archive member, because this symbols is defined in `vmlinux' binary: 0x4c416eb9 LZ4_decompress_fast vmlinux EXPORT_SYMBOL But it figures out that it is missing. The correct location is `vmlinux#0' dictionary. CTF archive member: vmlinux: ... Function objects: ... CTF archive member: vmlinux#0: Function objects: ... LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8) ... Therefore, ctf reader is looking for debug information in the whole archive, fortunately `libctf' provides a fast lookup mechanism using cache, dictionary references, etc., so the penalty performance is ~10%. * src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function. (process_ctf_archive): Use `lookup_symbol_in_ctf_archive'. Signed-off-by: Guillermo E. 
Martinez <guillermo.e.martinez@oracle.com> --- src/abg-ctf-reader.cc | 72 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 64 insertions(+), 8 deletions(-) diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc index 71808f9a..8fa98a94 100644 --- a/src/abg-ctf-reader.cc +++ b/src/abg-ctf-reader.cc @@ -1204,6 +1204,62 @@ lookup_type(read_context *ctxt, corpus_sptr corp, return result; } +/// Given a symbol name, lookup the corresponding CTF information in +/// the default dictionary (CTF archive member provided by the caller) +/// If the search is not success, the looks for the symbol name +/// in _all_ archive members. +/// +/// @param ctfa the CTF archive. +/// @param dict the default dictionary to looks for. +/// @param sym_name the symbol name. +/// @param corp the IR corpus. +/// +/// Note that if @ref sym_name is found in other than default dictionary +/// @ref ctf_dict will be updated and it must be explicate closed by its +/// caller. +/// +/// @return a valid CTF type id, if @ref sym_name was found, -1 otherwise. 
+ +static ctf_id_t +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict, + const char *sym_name, corpus_sptr corp) +{ + int ctf_err; + ctf_dict_t *dict = *ctf_dict; + ctf_id_t ctf_type = ctf_lookup_variable(dict, sym_name); + + /* lookup CTF type for a given symbol in its default + dictionary */ + if (ctf_type == (ctf_id_t) -1 + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) + ctf_type = ctf_lookup_by_symbol_name(dict, sym_name); + + /* Not lucky, then, search in whole archive */ + if (ctf_type == (ctf_id_t) -1) + { + ctf_dict_t *fp; + ctf_next_t *i = NULL; + const char *arcname; + + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL) + { + ctf_type = ctf_lookup_variable (fp, sym_name); + if (ctf_type == (ctf_id_t) -1 + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) + ctf_type = ctf_lookup_by_symbol_name(fp, sym_name); + + if (ctf_type != (ctf_id_t) -1) + { + *ctf_dict = fp; + break; + } + ctf_dict_close(fp); + } + } + + return ctf_type; +} + /// Process a CTF archive and create libabigail IR for the types, /// variables and function declarations found in the archive, iterating /// over public symbols. The IR is added to the given corpus. 
@@ -1222,7 +1278,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp) corp->add(ir_translation_unit); int ctf_err; - ctf_dict_t *ctf_dict; + ctf_dict_t *ctf_dict, *dict_tmp; const auto symtab = ctxt->symtab; symtab_reader::symtab_filter filter = symtab->make_filter(); filter.set_public_symbols(); @@ -1248,19 +1304,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp) abort(); } + dict_tmp = ctf_dict; + for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter)) { std::string sym_name = symbol->get_name(); ctf_id_t ctf_sym_type; - ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str()); - if (ctf_sym_type == (ctf_id_t) -1 - && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) - // lookup in function objects - ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str()); - + ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict, + sym_name.c_str(), corp); if (ctf_sym_type == (ctf_id_t) -1) - continue; + continue; if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION) { @@ -1305,6 +1359,8 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp) func_declaration->set_is_in_public_symbol_table(true); ctxt->maybe_add_fn_to_exported_decls(func_declaration.get()); } + + ctf_dict = dict_tmp; } ctf_dict_close(ctf_dict); -- 2.35.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
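The two-step search described in the commit message (default dictionary first, then every archive member) can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_dict`, `toy_archive`, `toy_lookup_symbol` and `toy_example_archive` are hypothetical stand-ins built on standard containers, not libctf types or calls, and the `some_default_symbol` entry is made up. Only the `vmlinux` / `vmlinux#0` member names and the `LZ4_decompress_fast` entry come from the message above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins (hypothetical, not libctf types): a "dictionary" maps a
// symbol name to a type id, and an "archive" is an ordered list of named
// dictionaries (archive members).
typedef unsigned long toy_type_id;
typedef std::map<std::string, toy_type_id> toy_dict;
typedef std::vector<std::pair<std::string, toy_dict> > toy_archive;

const toy_type_id TOY_ERR = static_cast<toy_type_id>(-1);

// Look SYM_NAME up in the member named DEFAULT_MEMBER first; if that
// fails, walk every member of ARCHIVE.  On success, FOUND_IN receives
// the name of the member that held the symbol.
toy_type_id
toy_lookup_symbol(const toy_archive& archive,
                  const std::string& default_member,
                  const std::string& sym_name,
                  std::string& found_in)
{
  // Step 1: the default dictionary only.
  for (toy_archive::const_iterator m = archive.begin();
       m != archive.end(); ++m)
    if (m->first == default_member)
      {
        toy_dict::const_iterator t = m->second.find(sym_name);
        if (t != m->second.end())
          {
            found_in = m->first;
            return t->second;
          }
      }

  // Step 2: not lucky, so search the whole archive.
  for (toy_archive::const_iterator m = archive.begin();
       m != archive.end(); ++m)
    {
      toy_dict::const_iterator t = m->second.find(sym_name);
      if (t != m->second.end())
        {
          found_in = m->first;
          return t->second;
        }
    }

  return TOY_ERR;
}

// Build an archive shaped like the example in the commit message.
toy_archive
toy_example_archive()
{
  toy_dict vmlinux;
  vmlinux["some_default_symbol"] = 0x10UL; // made-up entry

  toy_dict vmlinux_0;
  vmlinux_0["LZ4_decompress_fast"] = 0x80037400UL;

  toy_archive archive;
  archive.push_back(std::make_pair(std::string("vmlinux"), vmlinux));
  archive.push_back(std::make_pair(std::string("vmlinux#0"), vmlinux_0));
  return archive;
}
```

In this model, asking for `LZ4_decompress_fast` with `vmlinux` as the default member misses in step 1 and is reported from `vmlinux#0` in step 2, which mirrors the case the patch is meant to fix.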
* Re: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member 2022-08-31 15:16 [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member Guillermo E. Martinez 2022-08-31 15:16 ` Guillermo E. Martinez @ 2022-09-06 12:49 ` Dodji Seketeli 2022-09-07 18:40 ` Guillermo E. Martinez 2022-09-07 23:40 ` [PATCHv v2] " Guillermo E. Martinez 2 siblings, 1 reply; 6+ messages in thread From: Dodji Seketeli @ 2022-09-06 12:49 UTC (permalink / raw) To: Guillermo E. Martinez via Libabigail; +Cc: Guillermo E. Martinez Hello Guillermo, Thanks for the patch. I have tested and it seems to pass regression testing on my system. However, there are some things that I don't understand so I have some questions below. The questions are just for my own understanding. I don't have anything major against the patch, obviously. [...] "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org> a écrit: [...] > +/// Given a symbol name, lookup the corresponding CTF information in > +/// the default dictionary (CTF archive member provided by the caller) > +/// If the search is not success, the looks for the symbol name > +/// in _all_ archive members. > +/// > +/// @param ctfa the CTF archive. > +/// @param dict the default dictionary to looks for. > +/// @param sym_name the symbol name. > +/// @param corp the IR corpus. > +/// > +/// Note that if @ref sym_name is found in other than default dictionary > +/// @ref ctf_dict will be updated and it must be explicate closed by its > +/// caller. > +/// > +/// @return a valid CTF type id, if @ref sym_name was found, -1 otherwise. 
> + > +static ctf_id_t > +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict, > + const char *sym_name, corpus_sptr corp) > +{ > + int ctf_err; > + ctf_dict_t *dict = *ctf_dict; > + ctf_id_t ctf_type = ctf_lookup_variable(dict, sym_name); So, here, we begin by looking for a variable (using ctf_lookup_variable) which ELF symbol is sym_name, is that correct? > + > + /* lookup CTF type for a given symbol in its default > + dictionary */ > + if (ctf_type == (ctf_id_t) -1 So, I guess the variable lookup failed, right? > + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) Why this condition? Why only considering cases where we are not looking at a Linux Kernel binary? I would think that we would want to consider the case where the variable lookup failed, even in the case of a Linux Kernel binary, wouldn't we? If not why? Maybe we should add a comment to explain this. > + ctf_type = ctf_lookup_by_symbol_name(dict, sym_name); So I am guessing that ctf_lookup_by_symbol_name looks up both variable and function symbols from the same dictionary, is that correct? Also, I don't understand why we don't just use ctf_lookup_by_symbol_name rather than starting with ctf_lookup_variable first. Is it a performance things? Incidentally, I haven't found documentation for the lookup functions other than by looking at the code, in say: https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=libctf/ctf-lookup.c;hb=refs/heads/master. If there is documentation for it somewhere else, maybe we can link that place in the code here in a comment somewhere, or we can just point to that link above. Both would be fine by me. 
> + > + /* Not lucky, then, search in whole archive */ > + if (ctf_type == (ctf_id_t) -1) > + { > + ctf_dict_t *fp; > + ctf_next_t *i = NULL; > + const char *arcname; > + > + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL) > + { > + ctf_type = ctf_lookup_variable (fp, sym_name); > + if (ctf_type == (ctf_id_t) -1 > + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) The same questions as above. > + ctf_type = ctf_lookup_by_symbol_name(fp, sym_name); > + > + if (ctf_type != (ctf_id_t) -1) > + { > + *ctf_dict = fp; > + break; > + } > + ctf_dict_close(fp); > + } > + } > + > + return ctf_type; > +} > + Cheers, [...] -- Dodji ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member 2022-09-06 12:49 ` Dodji Seketeli @ 2022-09-07 18:40 ` Guillermo E. Martinez 0 siblings, 0 replies; 6+ messages in thread From: Guillermo E. Martinez @ 2022-09-07 18:40 UTC (permalink / raw) To: Dodji Seketeli, Guillermo E. Martinez via Libabigail On 9/6/22 07:49, Dodji Seketeli wrote: > Hello Guillermo, Hello Dodji, > Thanks for the patch. I have tested and it seems to pass regression > testing on my system. However, there are some things that I don't > understand so I have some questions below. The questions are just for > my own understanding. I don't have anything major against the patch, > obviously. > > [...] > > "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org> a > écrit: > > > [...] > >> +/// Given a symbol name, lookup the corresponding CTF information in >> +/// the default dictionary (CTF archive member provided by the caller) >> +/// If the search is not success, the looks for the symbol name >> +/// in _all_ archive members. >> +/// >> +/// @param ctfa the CTF archive. >> +/// @param dict the default dictionary to looks for. >> +/// @param sym_name the symbol name. >> +/// @param corp the IR corpus. >> +/// >> +/// Note that if @ref sym_name is found in other than default dictionary >> +/// @ref ctf_dict will be updated and it must be explicate closed by its >> +/// caller. >> +/// >> +/// @return a valid CTF type id, if @ref sym_name was found, -1 otherwise. >> + >> +static ctf_id_t >> +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict, >> + const char *sym_name, corpus_sptr corp) >> +{ >> + int ctf_err; >> + ctf_dict_t *dict = *ctf_dict; >> + ctf_id_t ctf_type = ctf_lookup_variable(dict, sym_name); > > So, here, we begin by looking for a variable (using ctf_lookup_variable) > which ELF symbol is sym_name, is that correct? That's correct, `sym_name' is the symbol name. 
>> +
>> +  /* lookup CTF type for a given symbol in its default
>> +     dictionary */
>> +  if (ctf_type == (ctf_id_t) -1
>
> So, I guess the variable lookup failed, right?

Correct, the libctf `ctf_lookup_*' functions return CTF_ERR on failure, so I'm
going to change the check to use CTF_ERR for clarity.

>> +      && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
>
> Why this condition?  Why only considering cases where we are not looking
> at a Linux Kernel binary?  I would think that we would want to consider
> the case where the variable lookup failed, even in the case of a Linux
> Kernel binary, wouldn't we?  If not why?  Maybe we should add a comment
> to explain this.

OK. The linker (ld) in the kernel build mechanism is run with
`--ctf-variables', so it emits the symbol type definitions using just the
CTF Variable section:

$ objdump --ctf foo
  ...
  Labels:

  Data objects:

  Function objects:

  Variables:
    main -> 0x2: (kind 5) int (*) () (aligned at 0x8)
    main_func -> 0x4: (kind 5) void (*) () (aligned at 0x8)
    okkk -> 0x1: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)

Otherwise, they are split across the CTF Data, Function and Variable
sections:

$ objdump --ctf foo.o
  Data objects:
    okkk -> 0x1: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)

  Function objects:
    main -> 0x2: (kind 5) int (*) () (aligned at 0x8)
    main_func -> 0x4: (kind 5) void (*) () (aligned at 0x8)

  Variables:
    okkk -> 0x1: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)

Since vmlinux + *.ko is a *big* binary, I arranged the order of the CTF
lookup functions to call `ctf_lookup_variable' first and then, if it fails,
`ctf_lookup_by_symbol_name', for performance reasons. But I agree to remove
`!(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)' and change the
invocation order of those functions; the performance penalty was less than
10s when building the ABI representation for the kernel, which I consider
acceptable.
>> + ctf_type = ctf_lookup_by_symbol_name(dict, sym_name); > > So I am guessing that ctf_lookup_by_symbol_name looks up both variable > and function symbols from the same dictionary, is that correct? True. > Also, I don't understand why we don't just use ctf_lookup_by_symbol_name > rather than starting with ctf_lookup_variable first. Is it a > performance things? Exactly. Performance when we are processing a Linux tree directory. > Incidentally, I haven't found documentation for the lookup functions > other than by looking at the code, in say: > https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=libctf/ctf-lookup.c;hb=refs/heads/master. I'm afraid that the documentation is just in the source code. > If there is documentation for it somewhere else, maybe we can link that > place in the code here in a comment somewhere, or we can just point to > that link above. Both would be fine by me. > >> + >> + /* Not lucky, then, search in whole archive */ >> + if (ctf_type == (ctf_id_t) -1) >> + { >> + ctf_dict_t *fp; >> + ctf_next_t *i = NULL; >> + const char *arcname; >> + >> + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL) >> + { >> + ctf_type = ctf_lookup_variable (fp, sym_name); >> + if (ctf_type == (ctf_id_t) -1 >> + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) > > The same questions as above. > >> + ctf_type = ctf_lookup_by_symbol_name(fp, sym_name); >> + >> + if (ctf_type != (ctf_id_t) -1) >> + { >> + *ctf_dict = fp; >> + break; >> + } >> + ctf_dict_close(fp); >> + } >> + } >> + >> + return ctf_type; >> +} >> + > > Cheers, > > [...] > > Really thanks for your comments!, I will prepare the v2 Kind regards, guillermo ^ permalink raw reply [flat|nested] 6+ messages in thread
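The difference between the v1 lookup order and the order agreed on in this reply can be sketched with a toy model of one dictionary. This is a hedged illustration, not libctf code: `toy_sections`, `toy_by_symbol_name`, `toy_variable`, `toy_lookup_v1` and `toy_lookup_v2` are hypothetical names. `symtypetab` stands for the "Data objects" plus "Function objects" listings (what the thread says `ctf_lookup_by_symbol_name' consults) and `variables` for the "Variables" listing (what `ctf_lookup_variable' consults). The sample entries are taken from the two objdump listings quoted in this reply.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of one CTF dictionary (hypothetical, not a libctf type).
struct toy_sections
{
  std::map<std::string, long> symtypetab; // "Data objects" + "Function objects"
  std::map<std::string, long> variables;  // "Variables"
};

const long TOY_NOT_FOUND = -1;

static long
toy_find(const std::map<std::string, long>& table, const std::string& name)
{
  std::map<std::string, long>::const_iterator it = table.find(name);
  return it == table.end() ? TOY_NOT_FOUND : it->second;
}

// Stand-in for a lookup in the data/function object listings.
long
toy_by_symbol_name(const toy_sections& d, const std::string& name)
{ return toy_find(d.symtypetab, name); }

// Stand-in for a lookup in the variable listing.
long
toy_variable(const toy_sections& d, const std::string& name)
{ return toy_find(d.variables, name); }

// v1 order: variables first; the second lookup is skipped for kernel
// binaries.
long
toy_lookup_v1(const toy_sections& d, const std::string& name, bool is_kernel)
{
  long type = toy_variable(d, name);
  if (type == TOY_NOT_FOUND && !is_kernel)
    type = toy_by_symbol_name(d, name);
  return type;
}

// Order agreed on for v2: symbol name first, then variables, with no
// kernel special case.
long
toy_lookup_v2(const toy_sections& d, const std::string& name)
{
  long type = toy_by_symbol_name(d, name);
  if (type == TOY_NOT_FOUND)
    type = toy_variable(d, name);
  return type;
}

// Shaped like "objdump --ctf foo": everything sits in Variables.
toy_sections
toy_linked_with_ctf_variables()
{
  toy_sections d;
  d.variables["main"] = 0x2;
  d.variables["main_func"] = 0x4;
  d.variables["okkk"] = 0x1;
  return d;
}

// Shaped like "objdump --ctf foo.o": functions sit in Function objects.
toy_sections
toy_plain_object()
{
  toy_sections d;
  d.symtypetab["okkk"] = 0x1;
  d.symtypetab["main"] = 0x2;
  d.symtypetab["main_func"] = 0x4;
  d.variables["okkk"] = 0x1;
  return d;
}
```

In this model the v1 order with the kernel flag set cannot see `main` in the `foo.o`-shaped dictionary, while the v2 order finds it in both layouts, at the cost of one extra table probe when the first lookup misses.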
* [PATCHv v2] ctf-reader: Lookup debug info for symbols in a non default archive member 2022-08-31 15:16 [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member Guillermo E. Martinez 2022-08-31 15:16 ` Guillermo E. Martinez 2022-09-06 12:49 ` Dodji Seketeli @ 2022-09-07 23:40 ` Guillermo E. Martinez 2022-09-13 9:26 ` Dodji Seketeli 2 siblings, 1 reply; 6+ messages in thread From: Guillermo E. Martinez @ 2022-09-07 23:40 UTC (permalink / raw) To: libabigail; +Cc: Guillermo E. Martinez Hello, This patch v2 to improves the ABI XML file generated by ctf reader, there are Linux symbols (EXPORT_SYMBOL*) that were missing. Changes from v1: - Change order for `ctf_lookup_*' to at first looks symbol function types in `CTF Function section', and afterwards if is not success try in `CTF Variable section'. - Add comments describing use of `ctf_lookup_variable'. Comments will be grateful and appreciated!. Thanks in advanced, guillermo -- The current mechanism used by the ctf reader to looking for debug information given a specific Linux symbol, it opens the dictionary (default) which the name match with the binary name being processing in the current corpus, e.g. `vmlinux' or `module-name`.ko. However there are information symbols not located in a default dictionary, this is evident comparing the symbols in `Module.symvers' file with ABI XML file, so for example, the ctf reader is expecting to find the information for `LZ4_decompress_fast' symbol in the CTF `vmlinux' archive member, because this symbols is defined in `vmlinux' binary: 0x4c416eb9 LZ4_decompress_fast vmlinux EXPORT_SYMBOL But, it figures out that it is missing. The correct location is `vmlinux#0' dictionary: CTF archive member: vmlinux: ... Function objects: ... CTF archive member: vmlinux#0: Function objects: ... LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8) ... 
Therefore, ctf reader is looking for debug information in the whole archive, fortunately `libctf' provides a fast lookup mechanism using cache, dictionary references, etc., so the penalty performance is ~10%. Now, it make use of `ctf_lookup_by_symbol_name' at first instance which is in charge to locate symbol information given a symbol name on either CTF Function o Variable sections, if there isn't found it tries by using `ctf_lookup_variable' to looks in the CTF Variable section, this could happens due to `ld' operated with `--ctf-variables' option and function types information now resides in CTF Variable section. * src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function. (process_ctf_archive): Use `lookup_symbol_in_ctf_archive'. Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com> --- src/abg-ctf-reader.cc | 74 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 64 insertions(+), 10 deletions(-) diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc index 71808f9a..f5f58c7a 100644 --- a/src/abg-ctf-reader.cc +++ b/src/abg-ctf-reader.cc @@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp, return result; } +/// Given a symbol name, lookup the corresponding CTF information in +/// the default dictionary (CTF archive member provided by the caller) +/// If the search is not success, the looks for the symbol name +/// in _all_ archive members. +/// +/// @param ctfa the CTF archive. +/// @param dict the default dictionary to looks for. +/// @param sym_name the symbol name. +/// @param corp the IR corpus. +/// +/// Note that if @ref sym_name is found in other than its default dictionary +/// @ref ctf_dict will be updated and it must be explicitly closed by its +/// caller. +/// +/// @return a valid CTF type id, if @ref sym_name was found, CTF_ERR otherwise. 
+ +static ctf_id_t +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict, + const char *sym_name, corpus_sptr corp) +{ + int ctf_err; + ctf_dict_t *dict = *ctf_dict; + ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name); + + if (ctf_type != CTF_ERR) + return ctf_type; + + /* Probably --ctf-variables option was used by ld, so symbol type + definition must be found in the CTF Variable section. */ + ctf_type = ctf_lookup_variable(dict, sym_name); + + /* Not lucky, then, search in whole archive */ + if (ctf_type == CTF_ERR) + { + ctf_dict_t *fp; + ctf_next_t *i = NULL; + const char *arcname; + + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL) + { + if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR) + ctf_type = ctf_lookup_variable(fp, sym_name); + + if (ctf_type != CTF_ERR) + { + *ctf_dict = fp; + break; + } + ctf_dict_close(fp); + } + } + + return ctf_type; +} + /// Process a CTF archive and create libabigail IR for the types, /// variables and function declarations found in the archive, iterating /// over public symbols. The IR is added to the given corpus. 
@@ -1222,7 +1277,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp) corp->add(ir_translation_unit); int ctf_err; - ctf_dict_t *ctf_dict; + ctf_dict_t *ctf_dict, *dict_tmp; const auto symtab = ctxt->symtab; symtab_reader::symtab_filter filter = symtab->make_filter(); filter.set_public_symbols(); @@ -1248,19 +1303,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp) abort(); } + dict_tmp = ctf_dict; + for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter)) { std::string sym_name = symbol->get_name(); ctf_id_t ctf_sym_type; - ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str()); - if (ctf_sym_type == (ctf_id_t) -1 - && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) - // lookup in function objects - ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str()); - - if (ctf_sym_type == (ctf_id_t) -1) - continue; + ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict, + sym_name.c_str(), corp); + if (ctf_sym_type == CTF_ERR) + continue; if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION) { @@ -1298,13 +1351,14 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp) func_type, 0 /* is_inline */, location())); - func_declaration->set_symbol(symbol); add_decl_to_scope(func_declaration, ir_translation_unit->get_global_scope()); func_declaration->set_is_in_public_symbol_table(true); ctxt->maybe_add_fn_to_exported_decls(func_declaration.get()); } + + ctf_dict = dict_tmp; } ctf_dict_close(ctf_dict); -- 2.35.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
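The doc comment of `lookup_symbol_in_ctf_archive' says that a dictionary found outside the default member must be explicitly closed by the caller, and the loop in the hunks shown restores the pointer from `dict_tmp' at the end of each iteration. One hypothetical way to pair the restore with the close is a small scope guard. The sketch below is not part of the patch and does not use libctf: `toy_dict_handle`, `toy_dict_close` and `toy_dict_restorer` are made-up names, and the reference counter only exists so the effect can be observed.

```cpp
#include <cassert>

// Toy dictionary handle (hypothetical): counts outstanding "opens".
struct toy_dict_handle
{
  int open_refs;
};

// Stand-in for closing a dictionary obtained from the archive.
void
toy_dict_close(toy_dict_handle* d)
{ --d->open_refs; }

// Remembers the default dictionary on construction.  On destruction, if
// the tracked pointer was switched to another dictionary, closes that
// one and switches the pointer back to the default.
class toy_dict_restorer
{
  toy_dict_handle*&      current_;
  toy_dict_handle* const default_;

  toy_dict_restorer(const toy_dict_restorer&);            // not copyable
  toy_dict_restorer& operator=(const toy_dict_restorer&); // not assignable

public:
  explicit toy_dict_restorer(toy_dict_handle*& current)
    : current_(current), default_(current)
  {}

  ~toy_dict_restorer()
  {
    if (current_ != default_)
      {
        toy_dict_close(current_);
        current_ = default_;
      }
  }
};
```

Declared at the top of the loop body, such a guard would run on every exit from the iteration, including an early `continue`, so the default dictionary is the only one left open when the final close after the loop is reached.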
* Re: [PATCHv v2] ctf-reader: Lookup debug info for symbols in a non default archive member 2022-09-07 23:40 ` [PATCHv v2] " Guillermo E. Martinez @ 2022-09-13 9:26 ` Dodji Seketeli 0 siblings, 0 replies; 6+ messages in thread From: Dodji Seketeli @ 2022-09-13 9:26 UTC (permalink / raw) To: Guillermo E. Martinez via Libabigail; +Cc: Guillermo E. Martinez Hello Guillermo, Thank you for the explanations and the updated patch. Everything is clear for me now! Thanks again. I have applied the patch to master, but just with some slight obvious changes that I am discussing below. "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org> a écrit: [...] > --- a/src/abg-ctf-reader.cc [...] > @@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp, > return result; > } > > +/// Given a symbol name, lookup the corresponding CTF information in > +/// the default dictionary (CTF archive member provided by the caller) > +/// If the search is not success, the looks for the symbol name > +/// in _all_ archive members. > +/// > +/// @param ctfa the CTF archive. > +/// @param dict the default dictionary to looks for. > +/// @param sym_name the symbol name. > +/// @param corp the IR corpus. > +/// > +/// Note that if @ref sym_name is found in other than its default dictionary > +/// @ref ctf_dict will be updated and it must be explicitly closed by its > +/// caller. > +/// > +/// @return a valid CTF type id, if @ref sym_name was found, CTF_ERR otherwise. > + > +static ctf_id_t > +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict, > + const char *sym_name, corpus_sptr corp) It seems to me that the "corp" parameter is not used in the function, so I removed it. I have adjusted the doxygen comment to remove it as well. 
> +{ > + int ctf_err; > + ctf_dict_t *dict = *ctf_dict; > + ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name); > + > + if (ctf_type != CTF_ERR) > + return ctf_type; > + > + /* Probably --ctf-variables option was used by ld, so symbol type > + definition must be found in the CTF Variable section. */ > + ctf_type = ctf_lookup_variable(dict, sym_name); > + > + /* Not lucky, then, search in whole archive */ > + if (ctf_type == CTF_ERR) > + { > + ctf_dict_t *fp; > + ctf_next_t *i = NULL; > + const char *arcname; > + > + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL) > + { > + if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR) > + ctf_type = ctf_lookup_variable(fp, sym_name); > + > + if (ctf_type != CTF_ERR) > + { > + *ctf_dict = fp; > + break; > + } > + ctf_dict_close(fp); > + } > + } > + > + return ctf_type; > +} > + [...] > for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter)) > { > std::string sym_name = symbol->get_name(); > ctf_id_t ctf_sym_type; > > - ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str()); > - if (ctf_sym_type == (ctf_id_t) -1 > - && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)) > - // lookup in function objects > - ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str()); > - > - if (ctf_sym_type == (ctf_id_t) -1) > - continue; > + ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict, > + sym_name.c_str(), corp); I have adjusted that call to remove the "corp" argument as it's no longer needed. Oh, thanks for adjusting this code. Using lookup_symbol_in_ctf_archive here makes things a lot clearer to me at least! Below is the patch that I have applied. I have slightly amended the introductory test to correct some slight typos. From ad47854627f76c7959ae1a7ae59c9fcda38091c5 Mon Sep 17 00:00:00 2001 From: "Guillermo E. 
Martinez via Libabigail" <libabigail@sourceware.org>
Date: Wed, 7 Sep 2022 18:40:42 -0500
Subject: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member

The current mechanism used by the ctf reader to look for debug
information given a specific Linux symbol is the following: it opens
the (default) dictionary whose name matches the name of the binary
being processed in the current corpus, e.g. `vmlinux' or
`module-name.ko'.  However, some symbols and their information are
not located in the default dictionary; this is evident when comparing
the symbols in the `Module.symvers' file with the ABI XML file.  For
example, the ctf reader expects to find the information for the
`LZ4_decompress_fast' symbol in the CTF `vmlinux' archive member,
because this symbol is defined in the `vmlinux' binary:

 0x4c416eb9  LZ4_decompress_fast  vmlinux  EXPORT_SYMBOL

But it turns out that the information is missing there.  The correct
location is the `vmlinux#0' dictionary:

  CTF archive member: vmlinux:
  ...
    Function objects:
    ...

  CTF archive member: vmlinux#0:
    Function objects:
    ...
    LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8)
    ...

Therefore, the ctf reader must look for debug information in the
whole archive; fortunately `libctf' provides a fast lookup mechanism
using cache, dictionary references, etc., so the performance penalty
is ~10%.

Now, it first makes use of `ctf_lookup_by_symbol_name', which is in
charge of locating symbol information given a symbol name in either
the CTF Function or Variable sections; if the symbol isn't found, it
tries `ctf_lookup_variable' to look into the CTF Variable section.
This could happen because `ld' operated with the `--ctf-variables'
option, which makes function type information reside in the CTF
Variable section.

	* src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New
	function.
	(process_ctf_archive): Use `lookup_symbol_in_ctf_archive'.

Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-ctf-reader.cc | 74 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 10 deletions(-)

diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 71808f9a..e307fcd7 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
   return result;
 }
 
+/// Given a symbol name, lookup the corresponding CTF information in
+/// the default dictionary (CTF archive member provided by the caller)
+/// If the search is not success, the looks for the symbol name
+/// in _all_ archive members.
+///
+/// @param ctfa the CTF archive.
+/// @param dict the default dictionary to looks for.
+/// @param sym_name the symbol name.
+/// @param corp the IR corpus.
+///
+/// Note that if @ref sym_name is found in other than its default dictionary
+/// @ref ctf_dict will be updated and it must be explicitly closed by its
+/// caller.
+///
+/// @return a valid CTF type id, if @ref sym_name was found, CTF_ERR otherwise.
+
+static ctf_id_t
+lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
+                             const char *sym_name)
+{
+  int ctf_err;
+  ctf_dict_t *dict = *ctf_dict;
+  ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
+
+  if (ctf_type != CTF_ERR)
+    return ctf_type;
+
+  /* Probably --ctf-variables option was used by ld, so symbol type
+     definition must be found in the CTF Variable section. */
+  ctf_type = ctf_lookup_variable(dict, sym_name);
+
+  /* Not lucky, then, search in whole archive */
+  if (ctf_type == CTF_ERR)
+    {
+      ctf_dict_t *fp;
+      ctf_next_t *i = NULL;
+      const char *arcname;
+
+      while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
+        {
+          if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
+            ctf_type = ctf_lookup_variable(fp, sym_name);
+
+          if (ctf_type != CTF_ERR)
+            {
+              *ctf_dict = fp;
+              break;
+            }
+          ctf_dict_close(fp);
+        }
+    }
+
+  return ctf_type;
+}
+
 /// Process a CTF archive and create libabigail IR for the types,
 /// variables and function declarations found in the archive, iterating
 /// over public symbols.  The IR is added to the given corpus.
@@ -1222,7 +1277,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
   corp->add(ir_translation_unit);
 
   int ctf_err;
-  ctf_dict_t *ctf_dict;
+  ctf_dict_t *ctf_dict, *dict_tmp;
   const auto symtab = ctxt->symtab;
   symtab_reader::symtab_filter filter = symtab->make_filter();
   filter.set_public_symbols();
@@ -1248,19 +1303,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
       abort();
     }
 
+  dict_tmp = ctf_dict;
+
   for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
     {
       std::string sym_name = symbol->get_name();
       ctf_id_t ctf_sym_type;
 
-      ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
-      if (ctf_sym_type == (ctf_id_t) -1
-          && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
-        // lookup in function objects
-        ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
-
-      if (ctf_sym_type == (ctf_id_t) -1)
-        continue;
+      ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
+                                                  sym_name.c_str());
+      if (ctf_sym_type == CTF_ERR)
+        continue;
 
       if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION)
         {
@@ -1298,13 +1351,14 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
                                                    func_type,
                                                    0 /* is_inline */,
                                                    location()));
-
           func_declaration->set_symbol(symbol);
           add_decl_to_scope(func_declaration,
                             ir_translation_unit->get_global_scope());
           func_declaration->set_is_in_public_symbol_table(true);
           ctxt->maybe_add_fn_to_exported_decls(func_declaration.get());
         }
+
+      ctf_dict = dict_tmp;
     }
 
   ctf_dict_close(ctf_dict);
-- 
2.37.2

-- 
		Dodji
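
For readers without libctf at hand, the lookup order described in the
commit message (default dictionary first, then every archive member)
can be sketched in isolation.  This is only a minimal C++17 sketch
under stated assumptions: `Dict', `Archive', `Hit' and
`lookup_symbol_in_archive' are hypothetical in-memory stand-ins made
up for illustration; they are not libctf types or functions, and the
sketch does not model the two-step function/variable lookup.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT libctf types.
using TypeId = long;
using Dict = std::map<std::string, TypeId>;   // symbol name -> type id
using Archive = std::vector<std::pair<std::string, Dict>>;  // (member name, dictionary)

// Result of a successful lookup: which member held the symbol, and its type.
struct Hit
{
  std::string member;
  TypeId type;
};

// Look SYM_NAME up in the member named DEFAULT_MEMBER first; if it is
// not there, fall back to scanning all members in archive order.
std::optional<Hit>
lookup_symbol_in_archive(const Archive& archive,
                         const std::string& default_member,
                         const std::string& sym_name)
{
  // Pass 1: the default dictionary only.
  for (const auto& [name, dict] : archive)
    {
      if (name != default_member)
        continue;
      if (auto it = dict.find(sym_name); it != dict.end())
        return Hit{name, it->second};
    }

  // Pass 2: not lucky, search the whole archive.
  for (const auto& [name, dict] : archive)
    {
      if (auto it = dict.find(sym_name); it != dict.end())
        return Hit{name, it->second};
    }

  return std::nullopt;
}
```

With an archive holding a `vmlinux' member and a `vmlinux#0' member,
looking up a symbol that only lives in `vmlinux#0' while passing
`vmlinux' as the default returns a hit whose member is `vmlinux#0'.
In the real reader the fallback hands back an open dictionary handle
rather than a name, which is why the doc comment in the patch asks
the caller to close it.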