From: Yonghong Song <yhs@fb.com>
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: David Faust <david.faust@oracle.com>, gcc-patches@gcc.gnu.org
Subject: Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
Date: Fri, 24 Jun 2022 11:01:04 -0700 [thread overview]
Message-ID: <0c104e7b-1873-c141-37b9-71444f585793@fb.com> (raw)
In-Reply-To: <87edziknc1.fsf@oracle.com>
On 6/21/22 9:12 AM, Jose E. Marchesi wrote:
>
>> On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
>>> Hi Yonghong.
>>>
>>>> On 6/15/22 1:57 PM, David Faust wrote:
>>>>>
>>>>> On 6/14/22 22:53, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 6/7/22 2:43 PM, David Faust wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> This patch series adds support for:
>>>>>>>
>>>>>>> - Two new C-language-level attributes that allow to associate (to "annotate" or
>>>>>>> to "tag") particular declarations and types with arbitrary strings. As
>>>>>>> explained below, this is intended to be used to, for example, characterize
>>>>>>> certain pointer types.
>>>>>>>
>>>>>>> - The conveyance of that information in the DWARF output in the form of a new
>>>>>>> DIE: DW_TAG_GNU_annotation.
>>>>>>>
>>>>>>> - The conveyance of that information in the BTF output in the form of two new
>>>>>>> kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>>>>
>>>>>>> All of these facilities are being added to the eBPF ecosystem, and support for
>>>>>>> them exists in some form in LLVM.
>>>>>>>
>>>>>>> Purpose
>>>>>>> =======
>>>>>>>
>>>>>>> 1) Addition of C-family language constructs (attributes) to specify free-text
>>>>>>> tags on certain language elements, such as struct fields.
>>>>>>>
>>>>>>> The purpose of these annotations is to provide additional information about
>>>>>>> types, variables, and function parameters of interest to the kernel. A
>>>>>>> driving use case is to tag pointer types within the linux kernel and eBPF
>>>>>>> programs with additional semantic information, such as '__user' or '__rcu'.
>>>>>>>
>>>>>>> For example, consider the linux kernel function do_execve with the
>>>>>>> following declaration:
>>>>>>>
>>>>>>> static int do_execve(struct filename *filename,
>>>>>>> const char __user *const __user *__argv,
>>>>>>> const char __user *const __user *__envp);
>>>>>>>
>>>>>>> Here, __user could be defined with these annotations to record semantic
>>>>>>> information about the pointer parameters (e.g., they are user-provided) in
>>>>>>> DWARF and BTF information. Other kernel facilites such as the eBPF verifier
>>>>>>> can read the tags and make use of the information.
>>>>>>>
>>>>>>> 2) Conveying the tags in the generated DWARF debug info.
>>>>>>>
>>>>>>> The main motivation for emitting the tags in DWARF is that the Linux kernel
>>>>>>> generates its BTF information via pahole, using DWARF as a source:
>>>>>>>
>>>>>>> +--------+ BTF BTF +----------+
>>>>>>> | pahole |-------> vmlinux.btf ------->| verifier |
>>>>>>> +--------+ +----------+
>>>>>>> ^ ^
>>>>>>> | |
>>>>>>> DWARF | BTF |
>>>>>>> | |
>>>>>>> vmlinux +-------------+
>>>>>>> module1.ko | BPF program |
>>>>>>> module2.ko +-------------+
>>>>>>> ...
>>>>>>>
>>>>>>> This is because:
>>>>>>>
>>>>>>> a) Unlike GCC, LLVM will only generate BTF for BPF programs.
>>>>>>>
>>>>>>> b) GCC can generate BTF for whatever target with -gbtf, but there is no
>>>>>>> support for linking/deduplicating BTF in the linker.
>>>>>>>
>>>>>>> In the scenario above, the verifier needs access to the pointer tags of
>>>>>>> both the kernel types/declarations (conveyed in the DWARF and translated
>>>>>>> to BTF by pahole) and those of the BPF program (available directly in BTF).
>>>>>>>
>>>>>>> Another motivation for having the tag information in DWARF, unrelated to
>>>>>>> BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>>>>>>> to benefit from these tags in order to differentiate between different
>>>>>>> kinds of pointers in the kernel.
>>>>>>>
>>>>>>> 3) Conveying the tags in the generated BTF debug info.
>>>>>>>
>>>>>>> This is easy: the main purpose of having this info in BTF is for the
>>>>>>> compiled eBPF programs. The kernel verifier can then access the tags
>>>>>>> of pointers used by the eBPF programs.
>>>>>>>
>>>>>>>
>>>>>>> For more information about these tags and the motivation behind them, please
>>>>>>> refer to the following linux kernel discussions:
>>>>>>>
>>>>>>> https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
>>>>>>> https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
>>>>>>> https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/
>>>>>>>
>>>>>>>
>>>>>>> Implementation Overview
>>>>>>> =======================
>>>>>>>
>>>>>>> To enable these annotations, two new C language attributes are added:
>>>>>>> __attribute__((debug_annotate_decl("foo"))) and
>>>>>>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a single
>>>>>>> arbitrary string constant argument, which will be recorded in the generated
>>>>>>> DWARF and/or BTF debug information. They have no effect on code generation.
>>>>>>>
>>>>>>> Note that we are not using the same attribute names as LLVM (btf_decl_tag and
>>>>>>> btf_type_tag, respectively). While these attributes are functionally very
>>>>>>> similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
>>>>>>> in the attribute name seems misleading.
>>>>>>>
>>>>>>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
>>>>>>> declarations and types will be checked for the corresponding attributes. If
>>>>>>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
>>>>>>> the annotated type or declaration, one for each tag. These DIEs link the
>>>>>>> arbitrary tag value to the item they annotate.
>>>>>>>
>>>>>>> For example, the following variable declaration:
>>>>>>>
>>>>>>> #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
>>>>>>>
>>>>>>> #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
>>>>>>> #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))
>>>>>>>
>>>>>>> int * __typetag1 x __decltag1 __decltag2;
>>>>>>
>>>>>> Based on the above example
>>>>>> static int do_execve(struct filename *filename,
>>>>>> const char __user *const __user *__argv,
>>>>>> const char __user *const __user *__envp);
>>>>>>
>>>>>> Should the above example should be the below?
>>>>>> int __typetag1 * x __decltag1 __decltag2
>>>>>>
>>>>> This example is not related to the one above. It is just meant to
>>>>> show the behavior of both attributes. My apologies for not making
>>>>> that clear.
>>>>
>>>> Okay, it should be fine if the dwarf debug_info is shown.
>>>>
>>>>>
>>>>>>>
>>>>>>> Produces the following DWARF information:
>>>>>>>
>>>>>>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>>>>>> <1f> DW_AT_name : x
>>>>>>> <21> DW_AT_decl_file : 1
>>>>>>> <22> DW_AT_decl_line : 7
>>>>>>> <23> DW_AT_decl_column : 18
>>>>>>> <24> DW_AT_type : <0x49>
>>>>>>> <28> DW_AT_external : 1
>>>>>>> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
>>>>>>> <32> DW_AT_sibling : <0x49>
>>>>>>> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>> <37> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl
>>>>>>> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): decltag2
>>>>>>> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>> <40> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl
>>>>>>> <44> DW_AT_const_value : (indirect string, offset: 0x0): decltag1
>>>>>>> <2><48>: Abbrev Number: 0
>>>>>>> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>>>>>> <4a> DW_AT_byte_size : 8
>>>>>>> <4b> DW_AT_type : <0x5d>
>>>>>>> <4f> DW_AT_sibling : <0x5d>
>>>>>>> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>> <54> DW_AT_name : (indirect string, offset: 0x9): debug_annotate_type
>>>>>>> <58> DW_AT_const_value : (indirect string, offset: 0x1d): typetag1
>>>>>>> <2><5c>: Abbrev Number: 0
>>>>>>> <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>>>> <5e> DW_AT_byte_size : 4
>>>>>>> <5f> DW_AT_encoding : 5 (signed)
>>>>>>> <60> DW_AT_name : int
>>>>>>> <1><64>: Abbrev Number: 0
>>>>
>>>> This shows the info in .debug_abbrev. What I mean is to
>>>> show the related info in .debug_info section which seems more useful to
>>>> understand the relationships between different tags. Maybe this is due
>>>> to that I am not fully understanding what <1>/<2> means in <1><49> and
>>>> <2><53> etc.
>>> I think that dump actually shows .debug_info, with the abbrevs
>>> expanded...
>>> Anyway, it seems to us that the root of this problem is the fact the
>>> kernel sparse annotations, such as address_space(__user), are:
>>> 1) To be processed by an external kernel-specific tool (
>>> https://sparse.docs.kernel.org/en/latest/annotations.html) and not a
>>> C compiler, and therefore,
>>> 2) Not quite the same than compiler attributes (despite the way they
>>> look.) In particular, they seem to assume an ordering different than
>>> of GNU attributes: in some cases given the same written order, they
>>> refer to different things!. Which is quite unfortunate :(
>>
>> Yes, currently __user/__kernel macros (implemented with address_space
>> attribute) are processed by macros.
>>
>>> Now, if I understood properly, you plan to change the definition of
>>> __user and __kernel in the kernel sources in order to generate the tag
>>> compiler attributes, correct?
>>
>> Right. The original __user definition likes:
>> # define __user __attribute__((noderef, address_space(__user)))
>>
>> The new attribute looks like
>> # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
>> # define __user BTF_TYPE_TAG(user)
>
> Ok I see. So the kernel will stop using sparse attributes to implement
> __user and __kernel and start using compiler attributes for tags
> instead.
>
>>> Is that the reason why LLVM implements what we assume to be the
>>> sparse
>>> ordering, and not the correct GNU attributes ordering, for the tag
>>> attributes?
>>
>> Note that __user attributes apply to pointee's and not pointers.
>> Just like
>> const int *p;
>> the 'const' is not applied to pointer 'p', but the pointee of 'p'.
>>
>> What current llvm dwarf generation with
>> pointer
>> <--- btf_type_tag
>> is just ONE implementation. As I said earlier, I am okay to
>> have dwarf implementation like
>> p->btf_type_tag->const->int.
>> If you can propose an implementation like this in dwarf. I can propose
>> to change implementation in llvm.
>
> I think we are miscommunicating.
>
> Looks like there is a divergence on what attributes apply to what
> language entities between the sparse compiler and GCC/LLVM. How to
> represent that in DWARF is a different matter.
>
> For this example:
>
> int __typetag1 * __typetag2 __typetag3 * g;
>
> a) GCC associates __typetag1 with the pointer-to-pointer-to-int.
> b) LLVM associates __typetag1 to pointer-to-int.
>
> Where:
>
> a) Is the expected behavior of a compiler attributes, as documented in
> the GCC manual.
>
> b) Is presumably what the sparse compiler expects, but _not_ the
> ordering expected for a compiler GNU attribute.
>
> So, if the kernel source __user and __kernel annotations (which
> currently expand to sparse attributes) follow the sparse ordering, and
> you want to implement __user and __kernel in terms of compiler
> attributes instead (the annotation attributes) then you will have to:
>
> 1) Fix LLVM to implement the usual ordering for these attributes and
> 2) fix the kernel sources to use that ordering
>
> [Incidentally, the same applies to another "ex-sparse" attribute you
> have in the kernel and also implemented in LLVM with a weird ordering:
> the address_space attribute.]
>
> For 2), it may be possible to write a coccinnelle script to generate the
> patch...
I don't think (2) (to change kernel source for different attr ordering)
will work. So the only thing we can do is in compiler/pahole except
macro replacement in kernel.
>
> Does this make sense?
>
>>> If that is so, we have quite a problem here: I don't think we can
>>> change
>>> the way GCC handles GNU-like attributes just because the kernel sources
>>> want to hook on these __user/__kernel sparse annotations to generate the
>>> compiler tags, even if we could mayhaps get GCC to handle
>>> debug_annotate_type and debug_annotate_decl differently. Some would say
>>> doing so would perpetuate the mistake instead of fixing it...
>>> Is my understanding correct?
>>
>> Let us just say that the btf_type_tag attribute applies to pointees.
>> Does this help?
>>
>>>
>>>>>>
>>>>>> Maybe you can also show what dwarf debug_info looks like
>>>>> I am not sure what you mean. This is the .debug_info section as output
>>>>> by readelf -w. I did trim some information not relevant to the discussion
>>>>> such as the DW_TAG_compile_unit DIE, for brevity.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> In the case of BTF, the annotations are recorded in two type kinds recently
>>>>>>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>>>> The above example declaration prodcues the following BTF information:
>>>>>>>
>>>>>>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>>>>> [2] PTR '(anon)' type_id=3
>>>>>>> [3] TYPE_TAG 'typetag1' type_id=1
>>>>>>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>>>>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>>>>> [6] VAR 'x' type_id=2, linkage=global
>>>>>>> [7] DATASEC '.bss' size=0 vlen=1
>>>>>>> type_id=6 offset=0 size=8 (VAR 'x')
>>>>>>>
>>>>>>>
>>>>>> [...]
next prev parent reply other threads:[~2022-06-24 18:01 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-07 21:43 David Faust
2022-06-07 21:43 ` [PATCH 1/9] dwarf: add dw_get_die_parent function David Faust
2022-06-13 10:13 ` Richard Biener
2022-06-07 21:43 ` [PATCH 2/9] include: Add new definitions David Faust
2022-06-07 21:43 ` [PATCH 3/9] c-family: Add debug_annotate attribute handlers David Faust
2022-06-07 21:43 ` [PATCH 4/9] dwarf: generate annotation DIEs David Faust
2022-06-07 21:43 ` [PATCH 5/9] ctfc: pass through debug annotations to BTF David Faust
2022-06-07 21:43 ` [PATCH 6/9] dwarf2ctf: convert annotation DIEs to CTF types David Faust
2022-06-07 21:43 ` [PATCH 7/9] btf: output decl_tag and type_tag records David Faust
2022-06-07 21:43 ` [PATCH 8/9] doc: document new attributes David Faust
2022-06-07 21:43 ` [PATCH 9/9] testsuite: add debug annotation tests David Faust
2022-06-15 5:53 ` [PATCH 0/9] Add debug_annotate attributes Yonghong Song
2022-06-15 20:57 ` David Faust
2022-06-15 22:56 ` Yonghong Song
2022-06-17 17:18 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} WAS: " Jose E. Marchesi
2022-06-20 17:06 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-06-21 16:12 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} " Jose E. Marchesi
2022-06-24 18:01 ` Yonghong Song [this message]
2022-07-07 20:24 ` Jose E. Marchesi
2022-07-13 4:23 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-07-14 15:09 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} " Jose E. Marchesi
2022-07-15 1:20 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-07-15 14:17 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} " Jose E. Marchesi
2022-07-15 16:48 ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-11-01 22:29 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0c104e7b-1873-c141-37b9-71444f585793@fb.com \
--to=yhs@fb.com \
--cc=david.faust@oracle.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jose.marchesi@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).