public inbox for gcc-patches@gcc.gnu.org
From: Yonghong Song <yhs@fb.com>
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: David Faust <david.faust@oracle.com>, gcc-patches@gcc.gnu.org
Subject: Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
Date: Tue, 12 Jul 2022 21:23:36 -0700
Message-ID: <94288e98-b6b8-b4b7-27a4-572f6150c691@fb.com>
In-Reply-To: <87let4isc8.fsf@oracle.com>



On 7/7/22 1:24 PM, Jose E. Marchesi wrote:
> 
> Hi Yonghong.
> 
>> On 6/21/22 9:12 AM, Jose E. Marchesi wrote:
>>>
>>>> On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
>>>>> Hi Yonghong.
>>>>>
>>>>>> On 6/15/22 1:57 PM, David Faust wrote:
>>>>>>>
>>>>>>> On 6/14/22 22:53, Yonghong Song wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/7/22 2:43 PM, David Faust wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> This patch series adds support for:
>>>>>>>>>
>>>>>>>>> - Two new C-language-level attributes that allow associating (i.e. "annotating" or
>>>>>>>>>        "tagging") particular declarations and types with arbitrary strings. As
>>>>>>>>>        explained below, this is intended to be used to, for example, characterize
>>>>>>>>>        certain pointer types.
>>>>>>>>>
>>>>>>>>> - The conveyance of that information in the DWARF output in the form of a new
>>>>>>>>>        DIE: DW_TAG_GNU_annotation.
>>>>>>>>>
>>>>>>>>> - The conveyance of that information in the BTF output in the form of two new
>>>>>>>>>        kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>>>>>>
>>>>>>>>> All of these facilities are being added to the eBPF ecosystem, and support for
>>>>>>>>> them exists in some form in LLVM.
>>>>>>>>>
>>>>>>>>> Purpose
>>>>>>>>> =======
>>>>>>>>>
>>>>>>>>> 1)  Addition of C-family language constructs (attributes) to specify free-text
>>>>>>>>>          tags on certain language elements, such as struct fields.
>>>>>>>>>
>>>>>>>>>          The purpose of these annotations is to provide additional information about
>>>>>>>>>          types, variables, and function parameters of interest to the kernel. A
>>>>>>>>>          driving use case is to tag pointer types within the linux kernel and eBPF
>>>>>>>>>          programs with additional semantic information, such as '__user' or '__rcu'.
>>>>>>>>>
>>>>>>>>>          For example, consider the linux kernel function do_execve with the
>>>>>>>>>          following declaration:
>>>>>>>>>
>>>>>>>>>            static int do_execve(struct filename *filename,
>>>>>>>>>               const char __user *const __user *__argv,
>>>>>>>>>               const char __user *const __user *__envp);
>>>>>>>>>
>>>>>>>>>          Here, __user could be defined with these annotations to record semantic
>>>>>>>>>          information about the pointer parameters (e.g., they are user-provided) in
>>>>>>>>>          DWARF and BTF information. Other kernel facilities such as the eBPF verifier
>>>>>>>>>          can read the tags and make use of the information.
>>>>>>>>>
>>>>>>>>> 2)  Conveying the tags in the generated DWARF debug info.
>>>>>>>>>
>>>>>>>>>          The main motivation for emitting the tags in DWARF is that the Linux kernel
>>>>>>>>>          generates its BTF information via pahole, using DWARF as a source:
>>>>>>>>>
>>>>>>>>>              +--------+  BTF                  BTF   +----------+
>>>>>>>>>              | pahole |-------> vmlinux.btf ------->| verifier |
>>>>>>>>>              +--------+                             +----------+
>>>>>>>>>                  ^                                        ^
>>>>>>>>>                  |                                        |
>>>>>>>>>            DWARF |                                    BTF |
>>>>>>>>>                  |                                        |
>>>>>>>>>               vmlinux                              +-------------+
>>>>>>>>>               module1.ko                           | BPF program |
>>>>>>>>>               module2.ko                           +-------------+
>>>>>>>>>                 ...
>>>>>>>>>
>>>>>>>>>          This is because:
>>>>>>>>>
>>>>>>>>>          a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>>>>>>>>
>>>>>>>>>          b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>>>>>>>>>              support for linking/deduplicating BTF in the linker.
>>>>>>>>>
>>>>>>>>>          In the scenario above, the verifier needs access to the pointer tags of
>>>>>>>>>          both the kernel types/declarations (conveyed in the DWARF and translated
>>>>>>>>>          to BTF by pahole) and those of the BPF program (available directly in BTF).
>>>>>>>>>
>>>>>>>>>          Another motivation for having the tag information in DWARF, unrelated to
>>>>>>>>>          BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>>>>>>>>>          to benefit from these tags in order to differentiate between different
>>>>>>>>>          kinds of pointers in the kernel.
>>>>>>>>>
>>>>>>>>> 3)  Conveying the tags in the generated BTF debug info.
>>>>>>>>>
>>>>>>>>>          This is easy: the main purpose of having this info in BTF is for the
>>>>>>>>>          compiled eBPF programs. The kernel verifier can then access the tags
>>>>>>>>>          of pointers used by the eBPF programs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For more information about these tags and the motivation behind them, please
>>>>>>>>> refer to the following linux kernel discussions:
>>>>>>>>>
>>>>>>>>>        https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
>>>>>>>>>        https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
>>>>>>>>>        https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Implementation Overview
>>>>>>>>> =======================
>>>>>>>>>
>>>>>>>>> To enable these annotations, two new C language attributes are added:
>>>>>>>>> __attribute__((debug_annotate_decl("foo"))) and
>>>>>>>>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a single
>>>>>>>>> arbitrary string constant argument, which will be recorded in the generated
>>>>>>>>> DWARF and/or BTF debug information. They have no effect on code generation.
>>>>>>>>>
>>>>>>>>> Note that we are not using the same attribute names as LLVM (btf_decl_tag and
>>>>>>>>> btf_type_tag, respectively). While these attributes are functionally very
>>>>>>>>> similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
>>>>>>>>> in the attribute name seems misleading.
>>>>>>>>>
>>>>>>>>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
>>>>>>>>> declarations and types will be checked for the corresponding attributes. If
>>>>>>>>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
>>>>>>>>> the annotated type or declaration, one for each tag. These DIEs link the
>>>>>>>>> arbitrary tag value to the item they annotate.
>>>>>>>>>
>>>>>>>>> For example, the following variable declaration:
>>>>>>>>>
>>>>>>>>>        #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
>>>>>>>>>
>>>>>>>>>        #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
>>>>>>>>>        #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))
>>>>>>>>>
>>>>>>>>>        int * __typetag1 x __decltag1 __decltag2;
>>>>>>>>
>>>>>>>> Based on the above example
>>>>>>>>              static int do_execve(struct filename *filename,
>>>>>>>>                const char __user *const __user *__argv,
>>>>>>>>                const char __user *const __user *__envp);
>>>>>>>>
>>>>>>>> Should the above example be the one below?
>>>>>>>>          int __typetag1 * x __decltag1 __decltag2
>>>>>>>>
>>>>>>> This example is not related to the one above. It is just meant to
>>>>>>> show the behavior of both attributes. My apologies for not making
>>>>>>> that clear.
>>>>>>
>>>>>> Okay, it should be fine if the dwarf debug_info is shown.
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Produces the following DWARF information:
>>>>>>>>>
>>>>>>>>>       <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>>>>>>>>          <1f>   DW_AT_name        : x
>>>>>>>>>          <21>   DW_AT_decl_file   : 1
>>>>>>>>>          <22>   DW_AT_decl_line   : 7
>>>>>>>>>          <23>   DW_AT_decl_column : 18
>>>>>>>>>          <24>   DW_AT_type        : <0x49>
>>>>>>>>>          <28>   DW_AT_external    : 1
>>>>>>>>>          <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>>>>>>>>>          <32>   DW_AT_sibling     : <0x49>
>>>>>>>>>       <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>>>>          <37>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
>>>>>>>>>          <3b>   DW_AT_const_value : (indirect string, offset: 0xcd): decltag2
>>>>>>>>>       <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>>>>          <40>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
>>>>>>>>>          <44>   DW_AT_const_value : (indirect string, offset: 0x0): decltag1
>>>>>>>>>       <2><48>: Abbrev Number: 0
>>>>>>>>>       <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>>>>>>>>          <4a>   DW_AT_byte_size   : 8
>>>>>>>>>          <4b>   DW_AT_type        : <0x5d>
>>>>>>>>>          <4f>   DW_AT_sibling     : <0x5d>
>>>>>>>>>       <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>>>>          <54>   DW_AT_name        : (indirect string, offset: 0x9): debug_annotate_type
>>>>>>>>>          <58>   DW_AT_const_value : (indirect string, offset: 0x1d): typetag1
>>>>>>>>>       <2><5c>: Abbrev Number: 0
>>>>>>>>>       <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>>>>>>          <5e>   DW_AT_byte_size   : 4
>>>>>>>>>          <5f>   DW_AT_encoding    : 5	(signed)
>>>>>>>>>          <60>   DW_AT_name        : int
>>>>>>>>>       <1><64>: Abbrev Number: 0
>>>>>>
>>>>>> This shows the info in .debug_abbrev. What I mean is to
>>>>>> show the related info in .debug_info section which seems more useful to
>>>>>> understand the relationships between different tags. Maybe this is
>>>>>> because I do not fully understand what <1>/<2> means in <1><49> and
>>>>>> <2><53> etc.
>>>>> I think that dump actually shows .debug_info, with the abbrevs
>>>>> expanded...
>>>>> Anyway, it seems to us that the root of this problem is the fact that the
>>>>> kernel sparse annotations, such as address_space(__user), are:
>>>>> 1) To be processed by an external kernel-specific tool (
>>>>>       https://sparse.docs.kernel.org/en/latest/annotations.html) and not a
>>>>>       C compiler, and therefore,
>>>>> 2) Not quite the same as compiler attributes (despite the way they
>>>>>       look.)  In particular, they seem to assume an ordering different from
>>>>>       that of GNU attributes: in some cases given the same written order, they
>>>>>       refer to different things!.  Which is quite unfortunate :(
>>>>
>>>> Yes, currently __user/__kernel macros (implemented with address_space
>>>> attribute) are processed by macros.
>>>>
>>>>> Now, if I understood properly, you plan to change the definition of
>>>>> __user and __kernel in the kernel sources in order to generate the tag
>>>>> compiler attributes, correct?
>>>>
>>>> Right. The original __user definition looks like:
>>>>     # define __user         __attribute__((noderef, address_space(__user)))
>>>>
>>>> The new attribute looks like
>>>>     # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
>>>>     #  define __user        BTF_TYPE_TAG(user)
>>> Ok I see.  So the kernel will stop using sparse attributes to implement
>>> __user and __kernel and start using compiler attributes for tags instead.
>>>
>>>>> Is that the reason why LLVM implements what we assume to be the sparse
>>>>> ordering, and not the correct GNU attributes ordering, for the tag
>>>>> attributes?
>>>>
>>>> Note that __user attributes apply to pointees and not pointers,
>>>> just like
>>>>      const int *p;
>>>> the 'const' is not applied to pointer 'p', but the pointee of 'p'.
>>>>
>>>> What current llvm dwarf generation with
>>>>      pointer
>>>>        <--- btf_type_tag
>>>> is just ONE implementation. As I said earlier, I am okay to
>>>> have dwarf implementation like
>>>>      p->btf_type_tag->const->int.
>>>> If you can propose an implementation like this in DWARF, I can propose
>>>> changing the implementation in LLVM.
>>> I think we are miscommunicating.
>>> Looks like there is a divergence on what attributes apply to what
>>> language entities between the sparse compiler and GCC/LLVM.  How to
>>> represent that in DWARF is a different matter.
>>> For this example:
>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>> a) GCC associates __typetag1 with the pointer-to-pointer-to-int.
>>> b) LLVM associates __typetag1 to pointer-to-int.
>>> Where:
>>> a) Is the expected behavior of compiler attributes, as documented
>>>      in the GCC manual.
>>> b) Is presumably what the sparse compiler expects, but _not_ the
>>>      ordering expected for a compiler GNU attribute.
>>> So, if the kernel source __user and __kernel annotations (which
>>> currently expand to sparse attributes) follow the sparse ordering, and
>>> you want to implement __user and __kernel in terms of compiler
>>> attributes instead (the annotation attributes) then you will have to:
>>> 1) Fix LLVM to implement the usual ordering for these attributes and
>>> 2) fix the kernel sources to use that ordering
>>> [Incidentally, the same applies to another "ex-sparse" attribute you
>>>    have in the kernel and also implemented in LLVM with a weird ordering:
>>>    the address_space attribute.]
>>> For 2), it may be possible to write a Coccinelle script to generate
>>> the patch...
>>
>> I don't think (2) (changing the kernel sources to use a different attribute
>> ordering) will work. So the only changes we can make are in the
>> compiler/pahole, apart from macro replacement in the kernel.
> 
> I looked at sparse and its parser.  Wanted to be sure the ordering it
> uses to interpret sparse annotations (such as address_space, alignment,
> etc) is definitely _not_ the same ordering used by __attribute__ in C
> compilers.
> 
> It is very different indeed and the same can be said about how sparse
> interprets other modifiers like `const': in sparse both `int const *foo'
> and `int *const foo' parse to a constant pointer to int, for example.
> 
> I am not to judge how sparse handles its annotations.  It may be very
> well and pertinent for its particular purpose.
> 
> But I am not sure if it is reasonable to expect C compilers to implement
> certain type __attributes__ to parse differently, just because it
> happens these attributes are reused from sparse annotations in a
> particular program (in this case the kernel.)  The debug_annotate_decl
> and debug_annotate_type attributes are not even intended to be
> kernel-specific.
> 
> So, if changing the kernel sources is not an option (why btw, other than
> being a PITA?) at this point I really don't know what else to suggest :/
> 
> Any suggestion from the front-end people?

Just want to understand the overall picture. So gcc can still emit
BTF properly with btf_type_tag, right? The issue we are talking about
here is about DWARF, right? If this is the case, we might have
a partial solution here:
   - gcc emits BTF for vmlinux
   - gcc emits DWARF for vmlinux, ignoring btf_type_tag
   - in pahole, the vmlinux BTF is amended with some additional misc things.
Although there are some use cases for having btf_type_tag in DWARF,
they can be worked around by using BTF + DWARF, both of which are
generated by the compiler. Not elegant, but it probably works.

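For reference, here is a minimal C11 sketch of how standard C (and therefore GCC and LLVM) reads the two `const` placements discussed in this thread (`const int *p` vs. `int *const p`). It says nothing about how sparse parses them. The function names are hypothetical and exist only for illustration.

```c
/* 'const int *p' and 'int const *p' spell the same type in ISO C:
   the const belongs to the declaration specifiers, so it qualifies
   the pointee (int), not the pointer. */
_Static_assert(_Generic((int const *)0, const int *: 1, default: 0),
               "int const * is pointer to const int");

/* The pointer itself is not const, so it may be re-pointed;
   writing '*p = 0;' here would be rejected by the compiler. */
static int read_after_repoint(const int *p, const int *other)
{
    p = other;
    return *p;
}

/* 'int *const q': the const follows the '*', so it qualifies the
   pointer. The pointee may be written, but 'q = target;' after the
   initialization would be rejected by the compiler. */
static int write_through_const_pointer(int *target, int value)
{
    int *const q = target;
    *q = value;
    return *target;
}
```

In both cases the qualifier attaches to whatever sits immediately to its left in the declarator (or to the base type when it appears among the declaration specifiers). That is the positional reading the GNU attribute ordering is compared against above.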
> 
>>> Does this make sense?
>>>
>>>>> If that is so, we have quite a problem here: I don't think we can change
>>>>> the way GCC handles GNU-like attributes just because the kernel sources
>>>>> want to hook on these __user/__kernel sparse annotations to generate the
>>>>> compiler tags, even if we could mayhaps get GCC to handle
>>>>> debug_annotate_type and debug_annotate_decl differently.  Some would say
>>>>> doing so would perpetuate the mistake instead of fixing it...
>>>>> Is my understanding correct?
>>>>
>>>> Let us just say that the btf_type_tag attribute applies to pointees.
>>>> Does this help?
>>>>
>>>>>
>>>>>>>>
>>>>>>>> Maybe you can also show what dwarf debug_info looks like
>>>>>>> I am not sure what you mean. This is the .debug_info section as output
>>>>>>> by readelf -w. I did trim some information not relevant to the discussion
>>>>>>> such as the DW_TAG_compile_unit DIE, for brevity.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the case of BTF, the annotations are recorded in two type kinds recently
>>>>>>>>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>>>>>> The above example declaration produces the following BTF information:
>>>>>>>>>
>>>>>>>>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>>>>>>> [2] PTR '(anon)' type_id=3
>>>>>>>>> [3] TYPE_TAG 'typetag1' type_id=1
>>>>>>>>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>>>>>>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>>>>>>> [6] VAR 'x' type_id=2, linkage=global
>>>>>>>>> [7] DATASEC '.bss' size=0 vlen=1
>>>>>>>>> 	type_id=6 offset=0 size=8 (VAR 'x')
>>>>>>>>>
>>>>>>>>>
>>>>>>>> [...]
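The BTF listing quoted above forms a small type graph. VAR 'x' [6] refers to PTR [2], whose target is TYPE_TAG 'typetag1' [3], which wraps INT [1]. Both DECL_TAG records [4] and [5] refer back to VAR [6]. The sketch below models that graph with hypothetical, simplified structs to show that the type tag sits between the pointer and its pointee. Real BTF is a packed binary encoding, and the names here are illustrative only.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the BTF records quoted above.
   Real BTF uses a packed binary layout (struct btf_type in the
   kernel's uapi btf.h); only the fields needed to follow the type
   graph are kept here. */
enum model_kind { M_VOID, M_INT, M_PTR, M_TYPE_TAG, M_DECL_TAG, M_VAR };

struct model_type {
    enum model_kind kind;
    const char *name;   /* NULL for anonymous types such as PTR */
    unsigned type_id;   /* referenced type id; 0 (void) if none */
};

/* Array index == BTF type id; id 0 is the implicit void type. */
static const struct model_type model_btf[] = {
    [0] = { M_VOID,     NULL,       0 },
    [1] = { M_INT,      "int",      0 },
    [2] = { M_PTR,      NULL,       3 },
    [3] = { M_TYPE_TAG, "typetag1", 1 },
    [4] = { M_DECL_TAG, "decltag1", 6 },  /* component_idx=-1: tags 'x' itself */
    [5] = { M_DECL_TAG, "decltag2", 6 },
    [6] = { M_VAR,      "x",        2 },
};

#define MODEL_NTYPES (sizeof model_btf / sizeof model_btf[0])

/* Strip any TYPE_TAG wrappers to reach the underlying type. */
static unsigned skip_type_tags(unsigned id)
{
    while (model_btf[id].kind == M_TYPE_TAG)
        id = model_btf[id].type_id;
    return id;
}

/* For a VAR of pointer type, return the type tag on the pointee (the
   record the PTR refers to), or NULL if the pointee is untagged. */
static const char *pointee_type_tag(unsigned var_id)
{
    unsigned ptr_id = model_btf[var_id].type_id;
    if (model_btf[ptr_id].kind != M_PTR)
        return NULL;
    const struct model_type *pointee = &model_btf[model_btf[ptr_id].type_id];
    return pointee->kind == M_TYPE_TAG ? pointee->name : NULL;
}

/* Count the DECL_TAG records attached to a given declaration. */
static size_t decl_tag_count(unsigned decl_id)
{
    size_t n = 0;
    for (size_t i = 0; i < MODEL_NTYPES; i++)
        if (model_btf[i].kind == M_DECL_TAG && model_btf[i].type_id == decl_id)
            n++;
    return n;
}
```

With this model, pointee_type_tag(6) yields "typetag1", skip_type_tags(3) reaches INT [1], and decl_tag_count(6) is 2. That matches the listing.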

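The sketch below shows how a header might choose between emitting the tag attribute and expanding to nothing. It is loosely modeled on the macro quoted earlier in this thread (`# define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))`). The `__has_attribute` fallback and the `count_args` helper are assumptions for illustration, not the kernel's actual definitions. The tag is meant to affect debug information only.

```c
#include <stddef.h>

/* Hypothetical guard: use the btf_type_tag attribute only when the
   compiler advertises it, and expand to nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(btf_type_tag)
#    define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
#  endif
#endif
#ifndef BTF_TYPE_TAG
#  define BTF_TYPE_TAG(value) /* no tag support: annotate nothing */
#endif

#define __user BTF_TYPE_TAG(user)

/* Hypothetical helper with a do_execve-style parameter: counts the
   entries of a NULL-terminated argument vector. The tags do not
   change what this code does. */
static size_t count_args(const char __user *const __user *argv)
{
    size_t n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Because the macro degrades to an empty expansion, the same source builds with compilers that lack the attribute. The tags are simply absent from the debug info in that case.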

Thread overview: 25+ messages
2022-06-07 21:43 David Faust
2022-06-07 21:43 ` [PATCH 1/9] dwarf: add dw_get_die_parent function David Faust
2022-06-13 10:13   ` Richard Biener
2022-06-07 21:43 ` [PATCH 2/9] include: Add new definitions David Faust
2022-06-07 21:43 ` [PATCH 3/9] c-family: Add debug_annotate attribute handlers David Faust
2022-06-07 21:43 ` [PATCH 4/9] dwarf: generate annotation DIEs David Faust
2022-06-07 21:43 ` [PATCH 5/9] ctfc: pass through debug annotations to BTF David Faust
2022-06-07 21:43 ` [PATCH 6/9] dwarf2ctf: convert annotation DIEs to CTF types David Faust
2022-06-07 21:43 ` [PATCH 7/9] btf: output decl_tag and type_tag records David Faust
2022-06-07 21:43 ` [PATCH 8/9] doc: document new attributes David Faust
2022-06-07 21:43 ` [PATCH 9/9] testsuite: add debug annotation tests David Faust
2022-06-15  5:53 ` [PATCH 0/9] Add debug_annotate attributes Yonghong Song
2022-06-15 20:57   ` David Faust
2022-06-15 22:56     ` Yonghong Song
2022-06-17 17:18       ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} WAS: " Jose E. Marchesi
2022-06-20 17:06         ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-06-21 16:12           ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} " Jose E. Marchesi
2022-06-24 18:01             ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-07-07 20:24               ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} " Jose E. Marchesi
2022-07-13  4:23                 ` Yonghong Song [this message]
2022-07-14 15:09                   ` Jose E. Marchesi
2022-07-15  1:20                     ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-07-15 14:17                       ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} " Jose E. Marchesi
2022-07-15 16:48                         ` kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} " Yonghong Song
2022-11-01 22:29       ` Yonghong Song
