public inbox for gcc-patches@gcc.gnu.org
From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: David Faust <david.faust@oracle.com>,
	gcc-patches@gcc.gnu.org, yhs@meta.com,
	Eduard Zingerman <eddyz87@gmail.com>
Subject: Re: [PATCH 0/9] Add btf_decl_tag C attribute
Date: Wed, 12 Jul 2023 14:43:47 +0200
Message-ID: <87y1jlz4e4.fsf@oracle.com>
In-Reply-To: <CAFiYyc2gp-HUdc5ZQRGr0ATiOF3AzpeC2+Sy=chFe744qN-DSg@mail.gmail.com> (Richard Biener's message of "Wed, 12 Jul 2023 09:38:22 +0200")


[Added Eduard Zingerman in CC, who is implementing this same feature in
 clang/llvm and also the consumer component in the kernel (pahole).]

Hi Richard.

> On Tue, Jul 11, 2023 at 11:58 PM David Faust via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hello,
>>
>> This series adds support for a new attribute, "btf_decl_tag" in GCC.
>> The same attribute is already supported in clang, and is used by various
>> components of the BPF ecosystem.
>>
>> The purpose of the attribute is to allow to associate (to "tag")
>> declarations with arbitrary string annotations, which are emitted into
>> debugging information (DWARF and/or BTF) to facilitate post-compilation
>> analysis (the motivating use case being the Linux kernel BPF verifier).
>> Multiple tags are allowed on the same declaration.
>>
>> These strings are not interpreted by the compiler, and the attribute
>> itself has no effect on generated code, other than to produce additional
>> DWARF DIEs and/or BTF records conveying the annotations.
>>
>> This entails:
>>
>> - A new C-language-level attribute which allows to associate (to "tag")
>>   particular declarations with arbitrary strings.
>>
>> - The conveyance of that information in DWARF in the form of a new DIE,
>>   DW_TAG_GNU_annotation, with tag number (0x6000) and format matching
>>   that of the DW_TAG_LLVM_annotation extension supported in LLVM for
>>   the same purpose. These DIEs are already supported by BPF tooling,
>>   such as pahole.
>>
>> - The conveyance of that information in BTF debug info in the form of
>>   BTF_KIND_DECL_TAG records. These records are already supported by
>>   LLVM and other tools in the eBPF ecosystem, such as the Linux kernel
>>   eBPF verifier.
>>
>>
>> Background
>> ==========
>>
>> The purpose of these tags is to convey additional semantic information
>> to post-compilation consumers, in particular the Linux kernel eBPF
>> verifier. The verifier can make use of that information while analyzing
>> a BPF program to aid in determining whether to allow or reject the
>> program to be run. More background on these tags can be found in the
>> early support for them in the kernel here [1] and [2].
>>
>> The "btf_decl_tag" attribute is half the story; the other half is a
>> sibling attribute "btf_type_tag" which serves the same purpose but
>> applies to types. Support for btf_type_tag will come in a separate
>> patch series, since it is impacted by GCC bug 110439 which needs to be
>> addressed first.
>>
>> I submitted an initial version of this work (including btf_type_tag)
>> last spring [3], however at the time there were some open questions
>> about the behavior of the btf_type_tag attribute and issues with its
>> implementation. Since then we have clarified these details and agreed
>> to solutions with the BPF community and LLVM BPF folks.
>>
>> The main motivation for emitting the tags in DWARF is that the Linux
>> kernel generates its BTF information via pahole, using DWARF as a source:
>>
>>     +--------+  BTF                  BTF   +----------+
>>     | pahole |-------> vmlinux.btf ------->| verifier |
>>     +--------+                             +----------+
>>         ^                                        ^
>>         |                                        |
>>   DWARF |                                    BTF |
>>         |                                        |
>>       vmlinux                              +-------------+
>>       module1.ko                           | BPF program |
>>       module2.ko                           +-------------+
>>         ...
>>
>> This is because:
>>
>> a)  pahole adds additional kernel-specific information into the
>>     produced BTF based on additional analysis of kernel objects.
>>
>> b)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>
>> c)  GCC can generate BTF for whatever target with -gbtf, but there is no
>>     support for linking/deduplicating BTF in the linker.
>>
>> In the scenario above, the verifier needs access to the pointer tags of
>> both the kernel types/declarations (conveyed in the DWARF and translated
>> to BTF by pahole) and those of the BPF program (available directly in BTF).
>>
>>
>> DWARF Representation
>> ====================
>>
>> As noted above, btf_decl_tag is represented in DWARF via a new DIE
>> DW_TAG_GNU_annotation, with identical format to the LLVM DWARF
>> extension DW_TAG_LLVM_annotation serving the same purpose. The DIE has
>> the following format:
>>
>>   DW_TAG_GNU_annotation (0x6000)
>>     DW_AT_name: "btf_decl_tag"
>>     DW_AT_const_value: <string argument>
>>
>> These DIEs are placed in the DWARF tree as children of the DIE for the
>> appropriate declaration, and one such DIE is created for each occurrence
>> of the btf_decl_tag attribute on a declaration.
>>
>> For example:
>>
>>   const int * c __attribute__((btf_decl_tag ("__c"), btf_decl_tag ("devicemem")));
>>
>> This declaration produces the following DWARF:
>>
>>  <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>>     <1f>   DW_AT_name        : c
>>     <24>   DW_AT_type        : <0x49>
>>     ...
>>  <2><36>: Abbrev Number: 3 (User TAG value: 0x6000)
>>     <37>   DW_AT_name        : (indirect string, offset: 0x4c): btf_decl_tag
>>     <3b>   DW_AT_const_value : (indirect string, offset: 0): devicemem
>>  <2><3f>: Abbrev Number: 4 (User TAG value: 0x6000)
>>     <40>   DW_AT_name        : (indirect string, offset: 0x4c): btf_decl_tag
>>     <44>   DW_AT_const_value : __c
>>  <2><48>: Abbrev Number: 0
>>  <1><49>: Abbrev Number: 5 (DW_TAG_pointer_type)
>>  ...
>>
>> The DIEs for btf_decl_tag are placed as children of the DIE for
>> variable "c".
>
> It looks like a bit of overkill, and inefficient as well.  Why's the
> tags not referenced via the existing DW_AT_description?

The DWARF spec ("Entity Descriptions") seems to imply that the
DW_AT_description attribute is intended to hold alternative ways to
denote the same "debugging information" (object, type, ...), i.e.
aliases that refer to the same entity as the DW_AT_name.  For example,
for a type with name='foo' we could have description='aka. long int'.
We don't think this is the case for the btf tags, which are more like
properties partially characterizing the tagged "debugging information",
and which couldn't be used as an alias for the name.

Also, repurposing the DW_AT_description attribute to hold btf tag
information would require introducing a mini-language, and the clients
would then have to parse it: how to denote several tags, how to encode
the embedded string contents, etc.  You kick the complexity out the door
and it comes back in through the window :)

Finally, for all we know, the existing attribute may already be used by
some language and handled by some debugger in the way the spec
recommends.  That would be incompatible with having btf tags encoded
there.

> Iff you want new TAGs why require them as children for each DIE rather
> than referencing (and sharing!) them via a DIE reference from a new
> attribute?

Hmm, that's a very good question.  The Linux kernel sources use both
declaration tags and type tags, and not sharing the DIEs may result in
serious bloat, since the tags are brought into declarations and type
specifiers via macros...

> That said, I'd go with DW_AT_description 'btf_decl_tag ("devicemem")'.
>
> But well ...
>
> Richard.
>
>>
>> BTF Representation
>> ==================
>>
>> In BTF, BTF_KIND_DECL_TAG records convey the annotations. These records refer
>> to the annotated object by BTF type ID, as well as a component index which is
>> used for btf_decl_tags placed on struct/union members or function arguments.
>>
>> For example, the BTF for the above declaration is:
>>
>>   [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>   [2] CONST '(anon)' type_id=1
>>   [3] PTR '(anon)' type_id=2
>>   [4] DECL_TAG '__c' type_id=6 component_idx=-1
>>   [5] DECL_TAG 'devicemem' type_id=6 component_idx=-1
>>   [6] VAR 'c' type_id=3, linkage=global
>>   ...
>>
>> The BTF format is documented here [4].
>>
>>
>> References
>> ==========
>>
>> [1] https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
>> [2] https://lore.kernel.org/bpf/20211011040608.3031468-1-yhs@fb.com/
>> [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593936.html
>> [4] https://www.kernel.org/doc/Documentation/bpf/btf.rst
>>
>>
>> David Faust (9):
>>   c-family: add btf_decl_tag attribute
>>   include: add BTF decl tag defines
>>   dwarf: create annotation DIEs for decl tags
>>   dwarf: expose get_die_parent
>>   ctf: add support to pass through BTF tags
>>   dwarf2ctf: convert annotation DIEs to CTF types
>>   btf: create and output BTF_KIND_DECL_TAG types
>>   testsuite: add tests for BTF decl tags
>>   doc: document btf_decl_tag attribute
>>
>>  gcc/btfout.cc                                 | 81 ++++++++++++++++++-
>>  gcc/c-family/c-attribs.cc                     | 23 ++++++
>>  gcc/ctf-int.h                                 | 28 +++++++
>>  gcc/ctfc.cc                                   | 10 ++-
>>  gcc/ctfc.h                                    | 17 +++-
>>  gcc/doc/extend.texi                           | 47 +++++++++++
>>  gcc/dwarf2ctf.cc                              | 73 ++++++++++++++++-
>>  gcc/dwarf2out.cc                              | 37 ++++++++-
>>  gcc/dwarf2out.h                               |  1 +
>>  .../gcc.dg/debug/btf/btf-decltag-func.c       | 21 +++++
>>  .../gcc.dg/debug/btf/btf-decltag-sou.c        | 33 ++++++++
>>  .../gcc.dg/debug/btf/btf-decltag-var.c        | 19 +++++
>>  .../gcc.dg/debug/dwarf2/annotation-decl-1.c   |  9 +++
>>  .../gcc.dg/debug/dwarf2/annotation-decl-2.c   | 18 +++++
>>  .../gcc.dg/debug/dwarf2/annotation-decl-3.c   | 17 ++++
>>  include/btf.h                                 | 14 +++-
>>  include/dwarf2.def                            |  4 +
>>  17 files changed, 437 insertions(+), 15 deletions(-)
>>  create mode 100644 gcc/ctf-int.h
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-var.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-decl-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-decl-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-decl-3.c
>>
>> --
>> 2.40.1
>>
