* [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member
@ 2022-08-31 15:16 Guillermo E. Martinez
2022-08-31 15:16 ` Guillermo E. Martinez
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Guillermo E. Martinez @ 2022-08-31 15:16 UTC (permalink / raw)
To: libabigail; +Cc: Guillermo E. Martinez
Hello,
This patch improves the ABI XML file generated by the ctf reader: some
Linux symbols (EXPORT_SYMBOL*) were missing from it.
Comments are welcome and appreciated!
Thanks in advance,
guillermo
--
The current mechanism used by the ctf reader to look for debug
information for a given Linux symbol is to open the (default)
dictionary whose name matches the name of the binary being processed
in the current corpus, e.g. `vmlinux' or `module-name.ko'. However,
some symbol information is not located in the default dictionary;
this is evident when comparing the symbols in the `Module.symvers'
file with the ABI XML file. For example, the ctf reader expects to
find the information for the `LZ4_decompress_fast' symbol in the CTF
`vmlinux' archive member, because this symbol is defined in the
`vmlinux' binary:
0x4c416eb9 LZ4_decompress_fast vmlinux EXPORT_SYMBOL
But it turns out that it is missing there. The correct location is
the `vmlinux#0' dictionary.
CTF archive member: vmlinux:
...
Function objects:
...
CTF archive member: vmlinux#0:
Function objects:
...
LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8)
...
Therefore, the ctf reader now looks for debug information in the whole
archive. Fortunately, `libctf' provides a fast lookup mechanism using
caches, dictionary references, etc., so the performance penalty is ~10%.
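A minimal sketch of the whole-archive fallback described above, not the
libabigail code. It models the archive as an ordered list of named
members, each a plain name-to-type-id map. The names `dictionary',
`archive', `lookup_result' and `find_symbol_in_members' are hypothetical
stand-ins for the libctf objects; only the search order (default member
first, then every member) is meant to match the description.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for libctf objects: a "dictionary" maps symbol
// names to type ids, and an "archive" is an ordered list of named members.
typedef unsigned long type_id;
typedef std::map<std::string, type_id> dictionary;
typedef std::vector<std::pair<std::string, dictionary> > archive;

struct lookup_result
{
  std::string member; // name of the archive member that held the symbol
  type_id type;       // type id recorded for the symbol in that member
};

// Look in the default member first; on a miss, scan every member of the
// archive in order and report the first hit.  Returns false when no
// member knows the symbol, in which case RESULT is left untouched.
bool
find_symbol_in_members(const archive& ar,
                       const std::string& default_member,
                       const std::string& sym_name,
                       lookup_result& result)
{
  // First pass: only the member named after the binary.
  for (archive::const_iterator m = ar.begin(); m != ar.end(); ++m)
    if (m->first == default_member)
      {
        dictionary::const_iterator it = m->second.find(sym_name);
        if (it != m->second.end())
          {
            result.member = m->first;
            result.type = it->second;
            return true;
          }
      }

  // Second pass: the whole archive.
  for (archive::const_iterator m = ar.begin(); m != ar.end(); ++m)
    {
      dictionary::const_iterator it = m->second.find(sym_name);
      if (it != m->second.end())
        {
          result.member = m->first;
          result.type = it->second;
          return true;
        }
    }

  return false;
}
```

With two members named `vmlinux' and `vmlinux#0', where only the second
one records `LZ4_decompress_fast', a lookup that starts from `vmlinux'
misses in the first pass and reports `vmlinux#0' from the second.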
* src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function.
(process_ctf_archive): Use `lookup_symbol_in_ctf_archive'.
Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com>
---
src/abg-ctf-reader.cc | 72 ++++++++++++++++++++++++++++++++++++++-----
1 file changed, 64 insertions(+), 8 deletions(-)
diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 71808f9a..8fa98a94 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -1204,6 +1204,62 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
return result;
}
+/// Given a symbol name, lookup the corresponding CTF information in
+/// the default dictionary (CTF archive member provided by the caller)
+/// If the search is not success, the looks for the symbol name
+/// in _all_ archive members.
+///
+/// @param ctfa the CTF archive.
+/// @param dict the default dictionary to looks for.
+/// @param sym_name the symbol name.
+/// @param corp the IR corpus.
+///
+/// Note that if @ref sym_name is found in other than default dictionary
+/// @ref ctf_dict will be updated and it must be explicate closed by its
+/// caller.
+///
+/// @return a valid CTF type id, if @ref sym_name was found, -1 otherwise.
+
+static ctf_id_t
+lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
+ const char *sym_name, corpus_sptr corp)
+{
+ int ctf_err;
+ ctf_dict_t *dict = *ctf_dict;
+ ctf_id_t ctf_type = ctf_lookup_variable(dict, sym_name);
+
+ /* lookup CTF type for a given symbol in its default
+ dictionary */
+ if (ctf_type == (ctf_id_t) -1
+ && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
+ ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
+
+ /* Not lucky, then, search in whole archive */
+ if (ctf_type == (ctf_id_t) -1)
+ {
+ ctf_dict_t *fp;
+ ctf_next_t *i = NULL;
+ const char *arcname;
+
+ while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
+ {
+ ctf_type = ctf_lookup_variable (fp, sym_name);
+ if (ctf_type == (ctf_id_t) -1
+ && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
+ ctf_type = ctf_lookup_by_symbol_name(fp, sym_name);
+
+ if (ctf_type != (ctf_id_t) -1)
+ {
+ *ctf_dict = fp;
+ break;
+ }
+ ctf_dict_close(fp);
+ }
+ }
+
+ return ctf_type;
+}
+
/// Process a CTF archive and create libabigail IR for the types,
/// variables and function declarations found in the archive, iterating
/// over public symbols. The IR is added to the given corpus.
@@ -1222,7 +1278,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
corp->add(ir_translation_unit);
int ctf_err;
- ctf_dict_t *ctf_dict;
+ ctf_dict_t *ctf_dict, *dict_tmp;
const auto symtab = ctxt->symtab;
symtab_reader::symtab_filter filter = symtab->make_filter();
filter.set_public_symbols();
@@ -1248,19 +1304,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
abort();
}
+ dict_tmp = ctf_dict;
+
for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
{
std::string sym_name = symbol->get_name();
ctf_id_t ctf_sym_type;
- ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
- if (ctf_sym_type == (ctf_id_t) -1
- && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
- // lookup in function objects
- ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
-
+ ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
+ sym_name.c_str(), corp);
if (ctf_sym_type == (ctf_id_t) -1)
- continue;
+ continue;
if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION)
{
@@ -1305,6 +1359,8 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
func_declaration->set_is_in_public_symbol_table(true);
ctxt->maybe_add_fn_to_exported_decls(func_declaration.get());
}
+
+ ctf_dict = dict_tmp;
}
ctf_dict_close(ctf_dict);
--
2.35.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member
2022-08-31 15:16 [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member Guillermo E. Martinez
2022-08-31 15:16 ` Guillermo E. Martinez
@ 2022-09-06 12:49 ` Dodji Seketeli
2022-09-07 18:40 ` Guillermo E. Martinez
2022-09-07 23:40 ` [PATCHv v2] " Guillermo E. Martinez
2 siblings, 1 reply; 6+ messages in thread
From: Dodji Seketeli @ 2022-09-06 12:49 UTC (permalink / raw)
To: Guillermo E. Martinez via Libabigail; +Cc: Guillermo E. Martinez
Hello Guillermo,
Thanks for the patch. I have tested and it seems to pass regression
testing on my system. However, there are some things that I don't
understand so I have some questions below. The questions are just for
my own understanding. I don't have anything major against the patch,
obviously.
[...]
"Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org> a
écrit:
[...]
> +/// Given a symbol name, lookup the corresponding CTF information in
> +/// the default dictionary (CTF archive member provided by the caller)
> +/// If the search is not success, the looks for the symbol name
> +/// in _all_ archive members.
> +///
> +/// @param ctfa the CTF archive.
> +/// @param dict the default dictionary to looks for.
> +/// @param sym_name the symbol name.
> +/// @param corp the IR corpus.
> +///
> +/// Note that if @ref sym_name is found in other than default dictionary
> +/// @ref ctf_dict will be updated and it must be explicate closed by its
> +/// caller.
> +///
> +/// @return a valid CTF type id, if @ref sym_name was found, -1 otherwise.
> +
> +static ctf_id_t
> +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
> + const char *sym_name, corpus_sptr corp)
> +{
> + int ctf_err;
> + ctf_dict_t *dict = *ctf_dict;
> + ctf_id_t ctf_type = ctf_lookup_variable(dict, sym_name);
So, here, we begin by looking for a variable (using ctf_lookup_variable)
which ELF symbol is sym_name, is that correct?
> +
> + /* lookup CTF type for a given symbol in its default
> + dictionary */
> + if (ctf_type == (ctf_id_t) -1
So, I guess the variable lookup failed, right?
> + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
Why this condition? Why only considering cases where we are not looking
at a Linux Kernel binary? I would think that we would want to consider
the case where the variable lookup failed, even in the case of a Linux
Kernel binary, wouldn't we? If not why? Maybe we should add a comment
to explain this.
> + ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
So I am guessing that ctf_lookup_by_symbol_name looks up both variable
and function symbols from the same dictionary, is that correct?
Also, I don't understand why we don't just use ctf_lookup_by_symbol_name
rather than starting with ctf_lookup_variable first. Is it a
performance things?
Incidentally, I haven't found documentation for the lookup functions
other than by looking at the code, in say:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=libctf/ctf-lookup.c;hb=refs/heads/master.
If there is documentation for it somewhere else, maybe we can link that
place in the code here in a comment somewhere, or we can just point to
that link above. Both would be fine by me.
> +
> + /* Not lucky, then, search in whole archive */
> + if (ctf_type == (ctf_id_t) -1)
> + {
> + ctf_dict_t *fp;
> + ctf_next_t *i = NULL;
> + const char *arcname;
> +
> + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
> + {
> + ctf_type = ctf_lookup_variable (fp, sym_name);
> + if (ctf_type == (ctf_id_t) -1
> + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
The same questions as above.
> + ctf_type = ctf_lookup_by_symbol_name(fp, sym_name);
> +
> + if (ctf_type != (ctf_id_t) -1)
> + {
> + *ctf_dict = fp;
> + break;
> + }
> + ctf_dict_close(fp);
> + }
> + }
> +
> + return ctf_type;
> +}
> +
Cheers,
[...]
--
Dodji
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member
2022-09-06 12:49 ` Dodji Seketeli
@ 2022-09-07 18:40 ` Guillermo E. Martinez
0 siblings, 0 replies; 6+ messages in thread
From: Guillermo E. Martinez @ 2022-09-07 18:40 UTC (permalink / raw)
To: Dodji Seketeli, Guillermo E. Martinez via Libabigail
On 9/6/22 07:49, Dodji Seketeli wrote:
> Hello Guillermo,
Hello Dodji,
> Thanks for the patch. I have tested and it seems to pass regression
> testing on my system. However, there are some things that I don't
> understand so I have some questions below. The questions are just for
> my own understanding. I don't have anything major against the patch,
> obviously.
>
> [...]
>
> "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org> a
> écrit:
>
>
> [...]
>
>> +/// Given a symbol name, lookup the corresponding CTF information in
>> +/// the default dictionary (CTF archive member provided by the caller)
>> +/// If the search is not success, the looks for the symbol name
>> +/// in _all_ archive members.
>> +///
>> +/// @param ctfa the CTF archive.
>> +/// @param dict the default dictionary to looks for.
>> +/// @param sym_name the symbol name.
>> +/// @param corp the IR corpus.
>> +///
>> +/// Note that if @ref sym_name is found in other than default dictionary
>> +/// @ref ctf_dict will be updated and it must be explicate closed by its
>> +/// caller.
>> +///
>> +/// @return a valid CTF type id, if @ref sym_name was found, -1 otherwise.
>> +
>> +static ctf_id_t
>> +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
>> + const char *sym_name, corpus_sptr corp)
>> +{
>> + int ctf_err;
>> + ctf_dict_t *dict = *ctf_dict;
>> + ctf_id_t ctf_type = ctf_lookup_variable(dict, sym_name);
>
> So, here, we begin by looking for a variable (using ctf_lookup_variable)
> which ELF symbol is sym_name, is that correct?
That's correct, `sym_name' is the symbol name.
>> +
>> + /* lookup CTF type for a given symbol in its default
>> + dictionary */
>> + if (ctf_type == (ctf_id_t) -1
>
> So, I guess the variable lookup failed, right?
Correct, libctf `ctf_lookup_*' functions return CTF_ERR when they fail,
so I'm going to change it for clarity.
>> + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
>
> Why this condition? Why only considering cases where we are not looking
> at a Linux Kernel binary? I would think that we would want to consider
> the case where the variable lookup failed, even in the case of a Linux
> Kernel binary, wouldn't we? If not why? Maybe we should add a comment
> to explain this.
OK. The linker (ld) in the kernel build mechanism uses `--ctf-variables',
so it emits the symbol type definitions using just the CTF Variable
section:
$ objdump --ctf foo
...
Labels:
Data objects:
Function objects:
Variables:
main -> 0x2: (kind 5) int (*) () (aligned at 0x8)
main_func -> 0x4: (kind 5) void (*) () (aligned at 0x8)
okkk -> 0x1: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)
Otherwise, they are split across the CTF Data, Function and Variable
sections:
$ objdump --ctf foo.o
Data objects:
okkk -> 0x1: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)
Function objects:
main -> 0x2: (kind 5) int (*) () (aligned at 0x8)
main_func -> 0x4: (kind 5) void (*) () (aligned at 0x8)
Variables:
okkk -> 0x1: (kind 1) int (format 0x1) (size 0x4) (aligned at 0x4)
Since vmlinux + *.ko is a *big* set of binaries, I arranged the order of
the CTF lookup functions to invoke `ctf_lookup_variable' first and then,
if it fails, `ctf_lookup_by_symbol_name', for performance reasons.
But I agree to remove `!(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN)'
and change the invocation order of those functions; the performance
penalty was less than 10s when building the ABI representation for the
kernel, which I consider acceptable.
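To make the reordered lookup concrete, here is a rough sketch, assuming
a toy model of one dictionary with the three sections shown in the
objdump output above. `toy_dict', `not_found', `find_in' and the three
lookup functions are hypothetical stand-ins, not the libctf API; the
point is only the order: symbol-based lookup first, Variable section as
the fallback.

```cpp
#include <map>
#include <string>

typedef unsigned long type_id;
typedef std::map<std::string, type_id> section;

// Hypothetical model of one CTF dictionary with its three sections.
struct toy_dict
{
  section data_objects;
  section function_objects;
  section variables;
};

// Stand-in for the error value returned by a failed lookup.
const type_id not_found = static_cast<type_id>(-1);

static type_id
find_in(const section& s, const std::string& name)
{
  section::const_iterator it = s.find(name);
  return it == s.end() ? not_found : it->second;
}

// Stand-in for a symbol-based lookup: Function objects, then Data objects.
type_id
lookup_by_symbol_name(const toy_dict& d, const std::string& name)
{
  type_id t = find_in(d.function_objects, name);
  return t != not_found ? t : find_in(d.data_objects, name);
}

// Stand-in for a lookup restricted to the Variable section.
type_id
lookup_variable(const toy_dict& d, const std::string& name)
{ return find_in(d.variables, name); }

// Reordered lookup: try the symbol-based lookup first and fall back to
// the Variable section, whatever the origin of the binary is.
type_id
lookup_symbol(const toy_dict& d, const std::string& name)
{
  type_id t = lookup_by_symbol_name(d, name);
  if (t == not_found)
    t = lookup_variable(d, name);
  return t;
}
```

For a dictionary shaped like `foo' above (everything in Variables), the
fallback is what finds `main_func'; for one shaped like `foo.o', the
first lookup already succeeds.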
>> + ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
>
> So I am guessing that ctf_lookup_by_symbol_name looks up both variable
> and function symbols from the same dictionary, is that correct?
True.
> Also, I don't understand why we don't just use ctf_lookup_by_symbol_name
> rather than starting with ctf_lookup_variable first. Is it a
> performance things?
Exactly. Performance when we are processing a Linux tree directory.
> Incidentally, I haven't found documentation for the lookup functions
> other than by looking at the code, in say:
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=libctf/ctf-lookup.c;hb=refs/heads/master.
I'm afraid that the documentation is just in the source code.
> If there is documentation for it somewhere else, maybe we can link that
> place in the code here in a comment somewhere, or we can just point to
> that link above. Both would be fine by me.
>
>> +
>> + /* Not lucky, then, search in whole archive */
>> + if (ctf_type == (ctf_id_t) -1)
>> + {
>> + ctf_dict_t *fp;
>> + ctf_next_t *i = NULL;
>> + const char *arcname;
>> +
>> + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
>> + {
>> + ctf_type = ctf_lookup_variable (fp, sym_name);
>> + if (ctf_type == (ctf_id_t) -1
>> + && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
>
> The same questions as above.
>
>> + ctf_type = ctf_lookup_by_symbol_name(fp, sym_name);
>> +
>> + if (ctf_type != (ctf_id_t) -1)
>> + {
>> + *ctf_dict = fp;
>> + break;
>> + }
>> + ctf_dict_close(fp);
>> + }
>> + }
>> +
>> + return ctf_type;
>> +}
>> +
>
> Cheers,
>
> [...]
>
>
Thanks a lot for your comments!
I will prepare the v2.
Kind regards,
guillermo
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCHv v2] ctf-reader: Lookup debug info for symbols in a non default archive member
2022-08-31 15:16 [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member Guillermo E. Martinez
2022-08-31 15:16 ` Guillermo E. Martinez
2022-09-06 12:49 ` Dodji Seketeli
@ 2022-09-07 23:40 ` Guillermo E. Martinez
2022-09-13 9:26 ` Dodji Seketeli
2 siblings, 1 reply; 6+ messages in thread
From: Guillermo E. Martinez @ 2022-09-07 23:40 UTC (permalink / raw)
To: libabigail; +Cc: Guillermo E. Martinez
Hello,
This is v2 of the patch to improve the ABI XML file generated by the
ctf reader: some Linux symbols (EXPORT_SYMBOL*) were missing from it.
Changes from v1:
- Change the order of the `ctf_lookup_*' calls to first look for
symbol function types in the `CTF Function section' and afterwards,
if that does not succeed, try the `CTF Variable section'.
- Add comments describing the use of `ctf_lookup_variable'.
Comments are welcome and appreciated!
Thanks in advance,
guillermo
--
The current mechanism used by the ctf reader to look for debug
information for a given Linux symbol is to open the (default)
dictionary whose name matches the name of the binary being processed
in the current corpus, e.g. `vmlinux' or `module-name.ko'. However,
some symbol information is not located in the default dictionary;
this is evident when comparing the symbols in the `Module.symvers'
file with the ABI XML file. For example, the ctf reader expects to
find the information for the `LZ4_decompress_fast' symbol in the CTF
`vmlinux' archive member, because this symbol is defined in the
`vmlinux' binary:
0x4c416eb9 LZ4_decompress_fast vmlinux EXPORT_SYMBOL
But it turns out that it is missing there. The correct location is
the `vmlinux#0' dictionary:
CTF archive member: vmlinux:
...
Function objects:
...
CTF archive member: vmlinux#0:
Function objects:
...
LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8)
...
Therefore, the ctf reader now looks for debug information in the whole
archive. Fortunately, `libctf' provides a fast lookup mechanism using
caches, dictionary references, etc., so the performance penalty is ~10%.
Now, it makes use of `ctf_lookup_by_symbol_name' in the first instance,
which is in charge of locating symbol information given a symbol name in
either the CTF Function or Variable sections; if the symbol isn't found
there, it tries `ctf_lookup_variable' to look in the CTF Variable
section. This can happen when `ld' was run with the `--ctf-variables'
option, so that function type information resides in the CTF Variable
section.
* src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function.
(process_ctf_archive): Use `lookup_symbol_in_ctf_archive'.
Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com>
---
src/abg-ctf-reader.cc | 74 +++++++++++++++++++++++++++++++++++++------
1 file changed, 64 insertions(+), 10 deletions(-)
diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 71808f9a..f5f58c7a 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
return result;
}
+/// Given a symbol name, lookup the corresponding CTF information in
+/// the default dictionary (CTF archive member provided by the caller)
+/// If the search is not success, the looks for the symbol name
+/// in _all_ archive members.
+///
+/// @param ctfa the CTF archive.
+/// @param dict the default dictionary to looks for.
+/// @param sym_name the symbol name.
+/// @param corp the IR corpus.
+///
+/// Note that if @ref sym_name is found in other than its default dictionary
+/// @ref ctf_dict will be updated and it must be explicitly closed by its
+/// caller.
+///
+/// @return a valid CTF type id, if @ref sym_name was found, CTF_ERR otherwise.
+
+static ctf_id_t
+lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
+ const char *sym_name, corpus_sptr corp)
+{
+ int ctf_err;
+ ctf_dict_t *dict = *ctf_dict;
+ ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
+
+ if (ctf_type != CTF_ERR)
+ return ctf_type;
+
+ /* Probably --ctf-variables option was used by ld, so symbol type
+ definition must be found in the CTF Variable section. */
+ ctf_type = ctf_lookup_variable(dict, sym_name);
+
+ /* Not lucky, then, search in whole archive */
+ if (ctf_type == CTF_ERR)
+ {
+ ctf_dict_t *fp;
+ ctf_next_t *i = NULL;
+ const char *arcname;
+
+ while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
+ {
+ if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
+ ctf_type = ctf_lookup_variable(fp, sym_name);
+
+ if (ctf_type != CTF_ERR)
+ {
+ *ctf_dict = fp;
+ break;
+ }
+ ctf_dict_close(fp);
+ }
+ }
+
+ return ctf_type;
+}
+
/// Process a CTF archive and create libabigail IR for the types,
/// variables and function declarations found in the archive, iterating
/// over public symbols. The IR is added to the given corpus.
@@ -1222,7 +1277,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
corp->add(ir_translation_unit);
int ctf_err;
- ctf_dict_t *ctf_dict;
+ ctf_dict_t *ctf_dict, *dict_tmp;
const auto symtab = ctxt->symtab;
symtab_reader::symtab_filter filter = symtab->make_filter();
filter.set_public_symbols();
@@ -1248,19 +1303,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
abort();
}
+ dict_tmp = ctf_dict;
+
for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
{
std::string sym_name = symbol->get_name();
ctf_id_t ctf_sym_type;
- ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
- if (ctf_sym_type == (ctf_id_t) -1
- && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
- // lookup in function objects
- ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
-
- if (ctf_sym_type == (ctf_id_t) -1)
- continue;
+ ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
+ sym_name.c_str(), corp);
+ if (ctf_sym_type == CTF_ERR)
+ continue;
if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION)
{
@@ -1298,13 +1351,14 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
func_type,
0 /* is_inline */,
location()));
-
func_declaration->set_symbol(symbol);
add_decl_to_scope(func_declaration,
ir_translation_unit->get_global_scope());
func_declaration->set_is_in_public_symbol_table(true);
ctxt->maybe_add_fn_to_exported_decls(func_declaration.get());
}
+
+ ctf_dict = dict_tmp;
}
ctf_dict_close(ctf_dict);
--
2.35.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv v2] ctf-reader: Lookup debug info for symbols in a non default archive member
2022-09-07 23:40 ` [PATCHv v2] " Guillermo E. Martinez
@ 2022-09-13 9:26 ` Dodji Seketeli
0 siblings, 0 replies; 6+ messages in thread
From: Dodji Seketeli @ 2022-09-13 9:26 UTC (permalink / raw)
To: Guillermo E. Martinez via Libabigail; +Cc: Guillermo E. Martinez
Hello Guillermo,
Thank you for the explanations and the updated patch. Everything is
clear for me now! Thanks again.
I have applied the patch to master, but just with some slight obvious
changes that I am discussing below.
"Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org> a
écrit:
[...]
> --- a/src/abg-ctf-reader.cc
[...]
> @@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
> return result;
> }
>
> +/// Given a symbol name, lookup the corresponding CTF information in
> +/// the default dictionary (CTF archive member provided by the caller)
> +/// If the search is not success, the looks for the symbol name
> +/// in _all_ archive members.
> +///
> +/// @param ctfa the CTF archive.
> +/// @param dict the default dictionary to looks for.
> +/// @param sym_name the symbol name.
> +/// @param corp the IR corpus.
> +///
> +/// Note that if @ref sym_name is found in other than its default dictionary
> +/// @ref ctf_dict will be updated and it must be explicitly closed by its
> +/// caller.
> +///
> +/// @return a valid CTF type id, if @ref sym_name was found, CTF_ERR otherwise.
> +
> +static ctf_id_t
> +lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
> + const char *sym_name, corpus_sptr corp)
It seems to me that the "corp" parameter is not used in the function, so
I removed it. I have adjusted the doxygen comment to remove it as well.
> +{
> + int ctf_err;
> + ctf_dict_t *dict = *ctf_dict;
> + ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
> +
> + if (ctf_type != CTF_ERR)
> + return ctf_type;
> +
> + /* Probably --ctf-variables option was used by ld, so symbol type
> + definition must be found in the CTF Variable section. */
> + ctf_type = ctf_lookup_variable(dict, sym_name);
> +
> + /* Not lucky, then, search in whole archive */
> + if (ctf_type == CTF_ERR)
> + {
> + ctf_dict_t *fp;
> + ctf_next_t *i = NULL;
> + const char *arcname;
> +
> + while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
> + {
> + if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
> + ctf_type = ctf_lookup_variable(fp, sym_name);
> +
> + if (ctf_type != CTF_ERR)
> + {
> + *ctf_dict = fp;
> + break;
> + }
> + ctf_dict_close(fp);
> + }
> + }
> +
> + return ctf_type;
> +}
> +
[...]
> for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
> {
> std::string sym_name = symbol->get_name();
> ctf_id_t ctf_sym_type;
>
> - ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
> - if (ctf_sym_type == (ctf_id_t) -1
> - && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
> - // lookup in function objects
> - ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
> -
> - if (ctf_sym_type == (ctf_id_t) -1)
> - continue;
> + ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
> + sym_name.c_str(), corp);
I have adjusted that call to remove the "corp" argument as it's no
longer needed.
Oh, thanks for adjusting this code. Using lookup_symbol_in_ctf_archive
here makes things a lot clearer to me at least!
Below is the patch that I have applied. I have slightly amended the
introductory text to correct some slight typos.
From ad47854627f76c7959ae1a7ae59c9fcda38091c5 Mon Sep 17 00:00:00 2001
From: "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org>
Date: Wed, 7 Sep 2022 18:40:42 -0500
Subject: [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member
The current mechanism used by the ctf reader for looking for debug
information given a specific Linux symbol is the following: it opens
the dictionary (default) whose name matches the binary name being
processed in the current corpus, e.g. `vmlinux' or
`module-name.ko'. However, there are symbols and information that are
not located in the default dictionary; this is evident when comparing
the symbols in the `Module.symvers' file with the ABI XML file. For
example, the ctf reader expects to find the information for the
`LZ4_decompress_fast' symbol in the CTF `vmlinux' archive member,
because this symbol is defined in the `vmlinux' binary:
0x4c416eb9 LZ4_decompress_fast vmlinux EXPORT_SYMBOL
But it turns out that it is missing there. The correct location is
the `vmlinux#0' dictionary:
CTF archive member: vmlinux:
...
Function objects:
...
CTF archive member: vmlinux#0:
Function objects:
...
LZ4_decompress_fast -> 0x80037400: (kind 5) int (*) (const char *, char *, int) (aligned at 0x8)
...
Therefore, the ctf reader must look for debug information in the
whole archive; fortunately, `libctf' provides a fast lookup mechanism
using caching, dictionary references, etc., so the performance penalty
is ~10%.
Now, it first uses `ctf_lookup_by_symbol_name', which locates symbol
information given a symbol name in either the CTF Function or Variable
sections; if the symbol isn't found, it falls back to
`ctf_lookup_variable' to look into the CTF Variable section; this can
happen when `ld' was run with the `--ctf-variables' option, which
makes function type information reside in the CTF Variable section.
* src/abg-ctf-reader.cc (lookup_symbol_in_ctf_archive): New function.
(process_ctf_archive): Use `lookup_symbol_in_ctf_archive'.
Signed-off-by: Guillermo E. Martinez <guillermo.e.martinez@oracle.com>
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
src/abg-ctf-reader.cc | 74 +++++++++++++++++++++++++++++++++++++------
1 file changed, 64 insertions(+), 10 deletions(-)
diff --git a/src/abg-ctf-reader.cc b/src/abg-ctf-reader.cc
index 71808f9a..e307fcd7 100644
--- a/src/abg-ctf-reader.cc
+++ b/src/abg-ctf-reader.cc
@@ -1204,6 +1204,61 @@ lookup_type(read_context *ctxt, corpus_sptr corp,
return result;
}
+/// Given a symbol name, look up the corresponding CTF information in
+/// the default dictionary (the CTF archive member provided by the
+/// caller).  If that search does not succeed, look for the symbol
+/// name in _all_ archive members.
+///
+/// @param ctfa the CTF archive.
+/// @param ctf_dict the default dictionary to look in; updated if the
+/// symbol is found in another archive member.
+/// @param sym_name the symbol name.
+///
+/// Note that if @ref sym_name is found in other than its default dictionary
+/// @ref ctf_dict will be updated and it must be explicitly closed by its
+/// caller.
+///
+/// @return a valid CTF type id, if @ref sym_name was found, CTF_ERR otherwise.
+
+static ctf_id_t
+lookup_symbol_in_ctf_archive(ctf_archive_t *ctfa, ctf_dict_t **ctf_dict,
+ const char *sym_name)
+{
+ int ctf_err;
+ ctf_dict_t *dict = *ctf_dict;
+ ctf_id_t ctf_type = ctf_lookup_by_symbol_name(dict, sym_name);
+
+ if (ctf_type != CTF_ERR)
+ return ctf_type;
+
+ /* Probably --ctf-variables option was used by ld, so symbol type
+ definition must be found in the CTF Variable section. */
+ ctf_type = ctf_lookup_variable(dict, sym_name);
+
+ /* Not lucky, then, search in whole archive */
+ if (ctf_type == CTF_ERR)
+ {
+ ctf_dict_t *fp;
+ ctf_next_t *i = NULL;
+ const char *arcname;
+
+ while ((fp = ctf_archive_next(ctfa, &i, &arcname, 1, &ctf_err)) != NULL)
+ {
+ if ((ctf_type = ctf_lookup_by_symbol_name (fp, sym_name)) == CTF_ERR)
+ ctf_type = ctf_lookup_variable(fp, sym_name);
+
+ if (ctf_type != CTF_ERR)
+ {
+ *ctf_dict = fp;
+ break;
+ }
+ ctf_dict_close(fp);
+ }
+ }
+
+ return ctf_type;
+}
+
/// Process a CTF archive and create libabigail IR for the types,
/// variables and function declarations found in the archive, iterating
/// over public symbols. The IR is added to the given corpus.
@@ -1222,7 +1277,7 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
corp->add(ir_translation_unit);
int ctf_err;
- ctf_dict_t *ctf_dict;
+ ctf_dict_t *ctf_dict, *dict_tmp;
const auto symtab = ctxt->symtab;
symtab_reader::symtab_filter filter = symtab->make_filter();
filter.set_public_symbols();
@@ -1248,19 +1303,17 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
abort();
}
+ dict_tmp = ctf_dict;
+
for (const auto& symbol : symtab_reader::filtered_symtab(*symtab, filter))
{
std::string sym_name = symbol->get_name();
ctf_id_t ctf_sym_type;
- ctf_sym_type = ctf_lookup_variable(ctf_dict, sym_name.c_str());
- if (ctf_sym_type == (ctf_id_t) -1
- && !(corp->get_origin() & corpus::LINUX_KERNEL_BINARY_ORIGIN))
- // lookup in function objects
- ctf_sym_type = ctf_lookup_by_symbol_name(ctf_dict, sym_name.c_str());
-
- if (ctf_sym_type == (ctf_id_t) -1)
- continue;
+ ctf_sym_type = lookup_symbol_in_ctf_archive(ctxt->ctfa, &ctf_dict,
+ sym_name.c_str());
+ if (ctf_sym_type == CTF_ERR)
+ continue;
if (ctf_type_kind(ctf_dict, ctf_sym_type) != CTF_K_FUNCTION)
{
@@ -1298,13 +1351,14 @@ process_ctf_archive(read_context *ctxt, corpus_sptr corp)
func_type,
0 /* is_inline */,
location()));
-
func_declaration->set_symbol(symbol);
add_decl_to_scope(func_declaration,
ir_translation_unit->get_global_scope());
func_declaration->set_is_in_public_symbol_table(true);
ctxt->maybe_add_fn_to_exported_decls(func_declaration.get());
}
+
+ ctf_dict = dict_tmp;
}
ctf_dict_close(ctf_dict);
--
2.37.2
--
Dodji
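
As a side note on the caller side: the patch saves the default dictionary in `dict_tmp' and restores it at the end of each loop iteration, and the doc comment of lookup_symbol_in_ctf_archive says that a dictionary found outside the default one has to be closed by the caller. One possible way to tie both steps together is a small scope guard, sketched below. This is not what the applied patch does; `FakeDict`, `fake_dict_close` and `DictRestorer` are hypothetical names, with `fake_dict_close` merely standing in for `ctf_dict_close`.

```cpp
// Hypothetical stand-in for ctf_dict_t; OPEN_REFS models the
// reference the caller holds on an opened dictionary.
struct FakeDict
{
  int open_refs;
};

// Hypothetical stand-in for ctf_dict_close.
static void
fake_dict_close(FakeDict* dict)
{
  if (dict)
    --dict->open_refs;
}

// At scope exit, close any dictionary that was swapped into SLOT and
// put the saved default dictionary back.
class DictRestorer
{
public:
  explicit DictRestorer(FakeDict*& slot)
    : slot_(slot), saved_(slot)
  {}

  ~DictRestorer()
  {
    if (slot_ != saved_)
      fake_dict_close(slot_);
    slot_ = saved_;
  }

  DictRestorer(const DictRestorer&) = delete;
  DictRestorer& operator=(const DictRestorer&) = delete;

private:
  FakeDict*& slot_;  // the caller's "current dictionary" pointer
  FakeDict* saved_;  // the default dictionary saved at construction
};
```

In the loop body, such a guard would be declared before the lookup, so that every `continue' or normal end of iteration goes through the same restore step.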
Thread overview (6+ messages):
2022-08-31 15:16 [PATCH] ctf-reader: Lookup debug info for symbols in a non default archive member Guillermo E. Martinez
2022-08-31 15:16 ` Guillermo E. Martinez
2022-09-06 12:49 ` Dodji Seketeli
2022-09-07 18:40 ` Guillermo E. Martinez
2022-09-07 23:40 ` [PATCHv v2] " Guillermo E. Martinez
2022-09-13 9:26 ` Dodji Seketeli