public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
@ 2022-04-01 19:42 David Faust
  2022-04-01 19:42 ` [PATCH 1/8] dwarf: Add dw_get_die_parent function David Faust
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

Hello,

This patch series is a first attempt at adding support for:

- Two new C-language-level attributes that associate ("tag") particular
  declarations and types with arbitrary strings. As explained below, this is
  intended to be used to, for example, characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM. However, as we shall see, we have found some
problems implementing them so some discussion is in order.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify free-text
    tags on certain language elements, such as struct fields.

    The purpose of these annotations is to provide additional information about
    types, variables, and function parameters of interest to the kernel. A
    driving use case is to tag pointer types within the linux kernel and eBPF
    programs with additional semantic information, such as '__user' or '__rcu'.

    For example, consider the linux kernel function do_execve with the
    following declaration:

      static int do_execve(struct filename *filename,
         const char __user *const __user *__argv,
         const char __user *const __user *__envp);

    Here, __user could be defined with these annotations to record semantic
    information about the pointer parameters (e.g., they are user-provided) in
    DWARF and BTF information. Other kernel facilities such as the eBPF verifier
    can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

    The main motivation for emitting the tags in DWARF is that the Linux kernel
    generates its BTF information via pahole, using DWARF as a source:

        +--------+  BTF                  BTF   +----------+
        | pahole |-------> vmlinux.btf ------->| verifier |
        +--------+                             +----------+
            ^                                        ^
            |                                        |
      DWARF |                                    BTF |
            |                                        |
         vmlinux                              +-------------+
         module1.ko                           | BPF program |
         module2.ko                           +-------------+
           ...

    This is because:

    a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

    b)  GCC can generate BTF for whatever target with -gbtf, but there is no
        support for linking/deduplicating BTF in the linker.

    In the scenario above, the verifier needs access to the pointer tags of
    both the kernel types/declarations (conveyed in the DWARF and translated
    to BTF by pahole) and those of the BPF program (available directly in BTF).

    Another motivation for having the tag information in DWARF, unrelated to
    BPF and BTF, is that the drgn project (another DWARF consumer) also wants
    to benefit from these tags in order to differentiate between different
    kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

    This is easy: the main purpose of having this info in BTF is for the
    compiled eBPF programs. The kernel verifier can then access the tags
    of pointers used by the eBPF programs.
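
As a rough illustration of item 1, a kernel-style __user macro could be layered
on the new attribute as sketched below. The __has_attribute guard, the "user"
tag string and the count_args function are assumptions made for this example;
they are not taken from the patch series. On a compiler without btf_type_tag
the macro expands to nothing and the code still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Fall back gracefully on compilers that lack __has_attribute.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical definition: tag the pointee as user-provided when the
   compiler knows btf_type_tag, otherwise expand to nothing.  */
#if __has_attribute (btf_type_tag)
# define __user __attribute__((btf_type_tag ("user")))
#else
# define __user
#endif

/* Illustrative stand-in for a function such as do_execve: count the
   entries of a NULL-terminated argument vector.  The tags are only
   meant to reach the debug info; the loop itself is ordinary C.  */
static size_t
count_args (const char __user *const __user *argv)
{
  size_t n = 0;

  assert (argv != NULL);
  while (argv[n] != NULL)
    n++;
  return n;
}
```

Calling count_args on a vector such as { "prog", "-v", NULL } behaves the same
whether or not the compiler recognizes the attribute.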


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

  https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
  https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
  https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/


What is in this patch series
============================

This patch series adds support for these annotations in GCC. The implementation
is largely complete. However, in some cases the produced debug info (both DWARF
and BTF) differs significantly from that produced by LLVM. This issue is
discussed in detail below, along with a few specific questions for both GCC and
LLVM. Any input would be much appreciated.


Implementation Overview
=======================

To enable these annotations, two new C language attributes are added:
__attribute__((btf_decl_tag("foo"))) and __attribute__((btf_type_tag("bar"))).
Both attributes accept a single arbitrary string constant argument, which will
be recorded in the generated DWARF and/or BTF debugging information. They have
no effect on code generation.
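
A minimal sketch of how the two attributes might be used on a struct member, a
variable and a parameter follows. The function-like macros __typetag and
__decltag, the struct names and toy_read are hypothetical helpers for this
example only. The static assertions (C11) express the expectation that tagging
leaves layout untouched.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical wrappers that vanish on compilers without support.  */
#if __has_attribute (btf_type_tag)
# define __typetag(s) __attribute__((btf_type_tag (s)))
#else
# define __typetag(s)
#endif

#if __has_attribute (btf_decl_tag)
# define __decltag(s) __attribute__((btf_decl_tag (s)))
#else
# define __decltag(s)
#endif

/* The same struct, once without and once with annotations.  */
struct toy_plain
{
  int *p;
  int n;
};

struct toy_tagged
{
  int __typetag ("bar") *p;	/* Type tag on a pointer member.  */
  int n __decltag ("foo");	/* Decl tag on a member.  */
};

/* Tags are debug-info only, so size and member offsets should match.  */
static_assert (sizeof (struct toy_plain) == sizeof (struct toy_tagged),
	       "tags must not change the size");
static_assert (offsetof (struct toy_plain, n)
	       == offsetof (struct toy_tagged, n),
	       "tags must not change member offsets");

/* Decl tag on a variable.  */
int toy_counter __decltag ("foo") = 7;

/* Type tag on a pointer parameter.  */
static int
toy_read (int __typetag ("bar") *ptr)
{
  return *ptr;
}
```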

Note that we are using the same attribute names as LLVM, which include "btf"
in the name. This may be controversial, as these tags are not really
BTF-specific. A different name may be more appropriate. There was much
discussion about naming in the proposal for the functionality in LLVM; the
full thread can be found here:

  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151023.html

The name debug_info_annotate, suggested here, might better suit the attribute:

  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151042.html

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
the annotated type or declaration, one for each tag. These DIEs link the
arbitrary tag value to the item they annotate.

For example, the following variable declaration:

    #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
    #define __decltag1 __attribute__((btf_decl_tag("decl-tag-1")))
    #define __decltag2 __attribute__((btf_decl_tag("decl-tag-2")))

    int __typetag1 * x __decltag1 __decltag2;

Produces the following DIEs:

<1><1e>: Abbrev Number: 3 (DW_TAG_variable)
    <1f>   DW_AT_name        : x
    <21>   DW_AT_decl_file   : 1
    <22>   DW_AT_decl_line   : 6
    <23>   DW_AT_decl_column : 18
    <24>   DW_AT_type        : <0x49>
    <28>   DW_AT_external    : 1
    <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
    <32>   DW_AT_sibling     : <0x49>
 <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
    <37>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
    <3b>   DW_AT_const_value : (indirect string, offset: 0x0): decl-tag-2
 <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
    <40>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
    <44>   DW_AT_const_value : (indirect string, offset: 0x1d): decl-tag-1
 <2><48>: Abbrev Number: 0
 <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
    <4a>   DW_AT_byte_size   : 8
    <4b>   DW_AT_type        : <0x5d>
    <4f>   DW_AT_sibling     : <0x5d>
 <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
    <54>   DW_AT_name        : (indirect string, offset: 0x28): btf_type_tag
    <58>   DW_AT_const_value : (indirect string, offset: 0xd7): type-tag-1
 <2><5c>: Abbrev Number: 0
 <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
    <5e>   DW_AT_byte_size   : 4
    <5f>   DW_AT_encoding    : 5	(signed)
    <60>   DW_AT_name        : int
 <1><64>: Abbrev Number: 0
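
The parent/child relationship in the dump above can be modeled with a toy tree.
This is only a sketch: struct toy_die and the toy_* helpers are hypothetical
and much simpler than the DIE representation in dwarf2out.cc. It shows why a
consumer that starts from an annotation DIE needs access to the parent in order
to find the annotated item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a DIE; every name here is hypothetical.  */
enum toy_tag { TOY_TAG_variable, TOY_TAG_annotation };

struct toy_die
{
  enum toy_tag tag;
  const char *name;		/* Models DW_AT_name.  */
  const char *value;		/* Models DW_AT_const_value.  */
  struct toy_die *parent;
  struct toy_die *first_child;
  struct toy_die *next_sibling;
};

/* Link CHILD in as the first child of PARENT.  */
static void
toy_add_child (struct toy_die *parent, struct toy_die *child)
{
  child->parent = parent;
  child->next_sibling = parent->first_child;
  parent->first_child = child;
}

/* Count the annotation children of DIE whose name is KIND, e.g.
   "btf_decl_tag".  */
static int
toy_count_annotations (const struct toy_die *die, const char *kind)
{
  int n = 0;

  for (const struct toy_die *c = die->first_child; c; c = c->next_sibling)
    if (c->tag == TOY_TAG_annotation && strcmp (c->name, kind) == 0)
      n++;
  return n;
}

/* Return the name of the item that annotation DIE A applies to, found
   by walking up to the parent.  */
static const char *
toy_annotated_name (const struct toy_die *a)
{
  assert (a->tag == TOY_TAG_annotation && a->parent != NULL);
  return a->parent->name;
}
```

Adding "decl-tag-1" and then "decl-tag-2" as children of a variable DIE for x
leaves two btf_decl_tag children, and either child leads back to "x" through
its parent link. The child ordering is a property of this toy and says nothing
about how GCC orders the real DIEs.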

Please note that currently, the annotation DWARF DIEs will be generated only if
BTF debug information is requested (via -gbtf). Therefore, the annotation DIEs
will only be output if both BTF and DWARF are requested (e.g. -gbtf -gdwarf).
This will change, since these tags are needed even when not generating BTF,
for example in a GCC-built Linux kernel.

In the case of BTF, the annotations are recorded in two type kinds recently
added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
The above example declaration produces the following BTF information:

    [1] int 'int'(1U#B) size=4U#B offset=0UB#b bits=32UB#b SIGNED
    [2] ptr <anonymous> type=3
    [3] type_tag 'type-tag-1'(5U#B) type=1
    [4] decl_tag 'decl-tag-1'(18U#B) type=6 component_idx=-1
    [5] decl_tag 'decl-tag-2'(29U#B) type=6 component_idx=-1
    [6] var 'x'(16U#B) type=2 linkage=1 (global)
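
To make the references in this listing easier to follow, here is a small sketch
that stores the six records in a table and walks the 'type' links. The record
layout (struct toy_btf_type) is a simplification assumed for the example: real
BTF stores a string-table offset and packs the kind into an info word. Only the
kind numbers are taken from include/btf.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Kind numbers as defined in include/btf.h.  */
#define BTF_KIND_INT       1
#define BTF_KIND_PTR       2
#define BTF_KIND_VAR       14
#define BTF_KIND_DECL_TAG  17
#define BTF_KIND_TYPE_TAG  18

/* Simplified, hypothetical record used only in this sketch.  */
struct toy_btf_type
{
  uint32_t kind;
  const char *name;
  uint32_t type;          /* Referenced type ID, 0 for none.  */
  int32_t component_idx;  /* Only meaningful for DECL_TAG.  */
};

/* Index == type ID as in the listing; ID 0 is reserved for void.  */
static const struct toy_btf_type toy_types[] = {
  { 0, "void", 0, 0 },
  { BTF_KIND_INT, "int", 0, 0 },
  { BTF_KIND_PTR, "", 3, 0 },
  { BTF_KIND_TYPE_TAG, "type-tag-1", 1, 0 },
  { BTF_KIND_DECL_TAG, "decl-tag-1", 6, -1 },
  { BTF_KIND_DECL_TAG, "decl-tag-2", 6, -1 },
  { BTF_KIND_VAR, "x", 2, 0 },
};

#define TOY_NTYPES (sizeof toy_types / sizeof toy_types[0])

static const char *
toy_kind_name (uint32_t kind)
{
  switch (kind)
    {
    case BTF_KIND_INT: return "int";
    case BTF_KIND_PTR: return "ptr";
    case BTF_KIND_VAR: return "var";
    case BTF_KIND_DECL_TAG: return "decl_tag";
    case BTF_KIND_TYPE_TAG: return "type_tag";
    default: return "?";
    }
}

/* Follow the 'type' references starting at ID and write the kinds
   visited into BUF (capacity LEN), separated by " -> ".  */
static void
toy_format_chain (uint32_t id, char *buf, size_t len)
{
  assert (len > 0);
  buf[0] = '\0';
  while (id != 0)
    {
      assert (id < TOY_NTYPES);
      size_t used = strlen (buf);
      snprintf (buf + used, len - used, "%s%s", used ? " -> " : "",
                toy_kind_name (toy_types[id].kind));
      id = toy_types[id].type;
    }
}

/* Count the DECL_TAG records that point at type ID TARGET.  */
static int
toy_count_decl_tags (uint32_t target)
{
  int n = 0;

  for (size_t i = 1; i < TOY_NTYPES; i++)
    if (toy_types[i].kind == BTF_KIND_DECL_TAG
        && toy_types[i].type == target)
      n++;
  return n;
}
```

Starting from ID 6, the walk visits the var, the ptr, the type_tag and finally
the int. The decl tags are not on that path; they point at the var from the
side, which is why they are counted separately.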


Current issues in the implementation
====================================

The __attribute__((btf_type_tag ("foo"))) syntax does not work correctly for
types involving multiple pointers.

Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
  #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
  #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))

  int __typetag1 * __typetag2 __typetag3 * g;

The current implementation produces the following DWARF:

 <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
    <1f>   DW_AT_name        : g
    <21>   DW_AT_decl_file   : 1
    <22>   DW_AT_decl_line   : 6
    <23>   DW_AT_decl_column : 42
    <24>   DW_AT_type        : <0x32>
    <28>   DW_AT_external    : 1
    <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
 <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
    <33>   DW_AT_byte_size   : 8
    <33>   DW_AT_type        : <0x45>
    <37>   DW_AT_sibling     : <0x45>
 <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
    <3c>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
    <40>   DW_AT_const_value : (indirect string, offset: 0xc7): type-tag-1
 <2><44>: Abbrev Number: 0
 <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
    <46>   DW_AT_byte_size   : 8
    <46>   DW_AT_type        : <0x61>
    <4a>   DW_AT_sibling     : <0x61>
 <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
    <4f>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
    <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
 <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
    <58>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
    <5c>   DW_AT_const_value : (indirect string, offset: 0xd2): type-tag-2
 <2><60>: Abbrev Number: 0
 <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
    <62>   DW_AT_byte_size   : 4
    <63>   DW_AT_encoding    : 5	(signed)
    <64>   DW_AT_name        : int
 <1><68>: Abbrev Number: 0

This does not agree with the DWARF produced by LLVM/clang for the same case:
(clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)

<1><1e>: Abbrev Number: 2 (DW_TAG_variable)
    <1f>   DW_AT_name        : (indexed string: 0x3): g
    <20>   DW_AT_type        : <0x29>
    <24>   DW_AT_external    : 1
    <24>   DW_AT_decl_file   : 0
    <25>   DW_AT_decl_line   : 6
    <26>   DW_AT_location    : 2 byte block: a1 0 	((Unknown location op 0xa1))
 <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <2a>   DW_AT_type        : <0x35>
 <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
    <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
    <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
 <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
    <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
    <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
 <2><34>: Abbrev Number: 0
 <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <36>   DW_AT_type        : <0x3e>
 <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
    <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
    <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
 <2><3d>: Abbrev Number: 0
 <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
    <3f>   DW_AT_name        : (indexed string: 0x4): int
    <40>   DW_AT_encoding    : 5	(signed)
    <41>   DW_AT_byte_size   : 4
 <1><42>: Abbrev Number: 0

Notice the structural difference. From the DWARF produced by GCC (i.e. this
patch series), variable 'g' is a pointer with tag 'type-tag-1' to a pointer
with tags 'type-tag-2' and 'type-tag-3' to an int. But from the LLVM DWARF,
variable 'g' is a pointer with tags 'type-tag-2' and 'type-tag-3' to a pointer
with tag 'type-tag-1' to an int.

Because GCC produces BTF from the internal DWARF DIE tree, the BTF also differs.
This can be seen most obviously in the BTF type reference chains:

  GCC
    VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int

  LLVM
    VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
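
The two chains contain the same tags and differ only in which pointer level the
tags hang from. The sketch below makes that explicit by building a chain from a
per-level tag list. struct toy_ptr_level and the toy_* functions are
hypothetical names for this illustration, and the tag order within a level is
simply copied from the chains above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_TAGS 4

/* Tags attached to one pointer level, outermost level first.  */
struct toy_ptr_level
{
  const char *tags[TOY_MAX_TAGS + 1];  /* NULL-terminated.  */
};

/* Append STR to the string in BUF (capacity LEN), preceded by " -> "
   unless BUF is empty.  Output is truncated if it does not fit.  */
static void
toy_append (char *buf, size_t len, const char *str)
{
  size_t used = strlen (buf);

  snprintf (buf + used, len - used, "%s%s", used ? " -> " : "", str);
}

/* Write "ptr -> <tags> -> ptr -> <tags> -> BASE" for the NLEVELS
   pointer levels in LEVELS into BUF.  */
static void
toy_ptr_chain (const struct toy_ptr_level *levels, size_t nlevels,
               const char *base, char *buf, size_t len)
{
  assert (len > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < nlevels; i++)
    {
      toy_append (buf, len, "ptr");
      for (size_t j = 0; levels[i].tags[j] != NULL; j++)
        toy_append (buf, len, levels[i].tags[j]);
    }
  toy_append (buf, len, base);
}
```

With the outer level carrying { tag1 } and the inner level { tag3, tag2 } this
yields the GCC chain; swapping the two levels yields the LLVM chain.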

It seems that the ultimate cause for this is the structure of the TREE
produced by the C frontend parsing and attribute handling. I believe this may
be due to differences in __attribute__ syntax parsing between GCC and LLVM.

This is the TREE for variable 'g':
  int __typetag1 * __typetag2 __typetag3 * g;

 <var_decl 0x7ffff7547090 g
    type <pointer_type 0x7ffff7548000
        type <pointer_type 0x7ffff75097e0 type <integer_type 0x7ffff74495e8 int>
            asm_written unsigned DI
            size <integer_cst 0x7ffff743c450 constant 64>
            unit-size <integer_cst 0x7ffff743c468 constant 8>
            align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7450888
            attributes <tree_list 0x7ffff75275c8
                purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
                value <tree_list 0x7ffff7527550
                    value <string_cst 0x7ffff75292e0 type <array_type 0x7ffff7509738>
                        readonly constant static "type-tag-3\000">>
                chain <tree_list 0x7ffff75275a0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
                    value <tree_list 0x7ffff75274d8
                        value <string_cst 0x7ffff75292c0 type <array_type 0x7ffff7509738>
                            readonly constant static "type-tag-2\000">>>>
            pointer_to_this <pointer_type 0x7ffff7509888>>
        asm_written unsigned DI size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7509930
        attributes <tree_list 0x7ffff75275f0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
            value <tree_list 0x7ffff7527438
                value <string_cst 0x7ffff75292a0 type <array_type 0x7ffff7509738>
                    readonly constant static "type-tag-1\000">>>>
    public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
    align:64 warn_if_not_align:0>

To me this is surprising. I would have expected the int** type of "g" to have
the tags 'type-tag-2' and 'type-tag-3', and the inner (int*) pointer type to
have the 'type-tag-1' tag. So far my attempts at resolving this difference in
the new attribute handlers for the tag attributes have not been successful.

I do not understand why exactly the attributes are attached in this way. I think
that it may be related to the pointer cases discussed in the "All other
attributes" section here:

  https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html

In particular it seems similar to this example:

    char *__attribute__((aligned(8))) *f;

  specifies the type “pointer to 8-byte-aligned pointer to char”. Note again
  that this does not work with most attributes; for example, the usage of
  ‘aligned’ and ‘noreturn’ attributes given above is not yet supported.

I am not sure if this section of the documentation is outdated, if scenarios
like this one have not been an issue before now, or if there is a way to
resolve this within the attribute handler. I am by no means an expert in the C
frontend or attribute handling; if someone with more knowledge could help me
understand this case I would be very grateful. :)

Questions for GCC
=================

1)  How can this issue with the type tags be resolved? Is this a bug or
    limitation in the attribute parsing that hasn't been an issue until now?
    Or is it that the above case is somehow a "weird" usage of attributes?

2)  Are attributes the right tool for this? Is there some other mechanism that
    would better fit the design of these tags? In some ways the type tags seem
    more similar to const/volatile/restrict qualifiers than to most other
    attributes.


Questions for LLVM / kernel BPF
===============================

1)  What special handling does the LLVM frontend/clang do for these attributes?
    Is there anything specific? Or does it simply follow whatever is default?

2)  What is the correct BTF representation for type tags? The documentation for
    BTF_KIND_TYPE_TAG in linux/Documentation/bpf/btf.rst seems to conflict with
    the output of clang, and the format change that was discussed here:
      https://reviews.llvm.org/D113496
    I assume the kernel btf.rst might simply be outdated, but I want to be sure.

3)  Is the ordering of multiple type tags on the same type important?
    e.g. for this variable:
        int __tag1 __tag2 __tag3 * b;

    would it be "correct" (or at least, acceptable) to produce:
        VAR(b) -> ptr -> tag2 -> tag3 -> tag1 -> int

    or _must_ it be:
        VAR(b) -> ptr -> tag3 -> tag2 -> tag1 -> int

    In the DWARF representation, all tags are equal sibling children of the type
    they annotate, so this 'ordering' problem seems like it only arises because of
    the BTF format for type tags.

4)  Are types with the same tags in different orders considered distinct types?
    I think the answer is "no", but I am curious, since the format of the tags
    in BTF gives distinct chains for such types.
    e.g.
        int __tag1 __tag2 * x;
        int __tag2 __tag1 * y;

    produces
        VAR(x) -> ptr -> tag2 -> tag1 -> int
        VAR(y) -> ptr -> tag1 -> tag2 -> int

    but would
        VAR(y) -> ptr -> tag2 -> tag1 -> int

    be just as correct?

5)  According to the clang docs, type tags are currently ignored for non-pointer
    types. Is pointer tagging e.g. '__user' the only use case so far?

    This GCC implementation allows type tags on non-pointer types. Such tags
    can be represented in the DWARF but don't make much sense in BTF output,
    e.g.

        struct __typetag1 S {
            int a;
            int b;
        } __decltag1;

        struct S my_s;

    This will produce a type tag child DIE of S. In the current implementation,
    it will also produce a BTF type tag type, which refers to the __decltag1 BTF
    decl tag, which in turn refers to the struct type.  But nothing refers to
    the type tag type; currently, variable my_s in BTF refers to the struct
    type directly.

    In my opinion, the DWARF here is useful but the BTF seems odd. What would be
    "correct" BTF in such a case?

6)  Would LLVM be open to changing the name of the attribute, for example to
    'debug_info_annotate' (or any other suggestion)? The use cases for these
    tags have grown (e.g. drgn) since they were originally proposed, and the
    scope is no longer limited to BTF.

    The kernel eBPF developers have said they can accommodate whatever name we
    would like to use. So although we in GCC are not tied to the name LLVM
    uses, it would be ideal for everyone to use the same attribute name.

Thanks!

David

David Faust (8):
  dwarf: Add dw_get_die_parent function
  include: Add BTF tag defines to dwarf2 and btf
  c-family: Add BTF tag attribute handlers
  dwarf: create BTF decl and type tag DIEs
  ctfc: Add support to pass through BTF annotations
  dwarf2ctf: convert tag DIEs to CTF types
  Output BTF DECL_TAG and TYPE_TAG types
  testsuite: Add tests for BTF tags

 gcc/btfout.cc                                 |  28 +++++
 gcc/c-family/c-attribs.cc                     |  45 +++++++
 gcc/ctf-int.h                                 |  29 +++++
 gcc/ctfc.cc                                   |  11 +-
 gcc/ctfc.h                                    |  17 ++-
 gcc/dwarf2ctf.cc                              | 115 +++++++++++++++++-
 gcc/dwarf2out.cc                              | 110 +++++++++++++++++
 gcc/dwarf2out.h                               |   1 +
 .../gcc.dg/debug/btf/btf-decltag-func.c       |  18 +++
 .../gcc.dg/debug/btf/btf-decltag-sou.c        |  34 ++++++
 .../gcc.dg/debug/btf/btf-decltag-typedef.c    |  15 +++
 .../gcc.dg/debug/btf/btf-typetag-1.c          |  20 +++
 .../gcc.dg/debug/dwarf2/annotation-1.c        |  29 +++++
 include/btf.h                                 |  17 ++-
 include/dwarf2.def                            |   4 +
 15 files changed, 482 insertions(+), 11 deletions(-)
 create mode 100644 gcc/ctf-int.h
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c

-- 
2.35.1



* [PATCH 1/8] dwarf: Add dw_get_die_parent function
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 2/8] include: Add BTF tag defines to dwarf2 and btf David Faust
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

gcc/

	* dwarf2out.cc (dw_get_die_parent): New function.
	* dwarf2out.h (dw_get_die_parent): Declare it here.
---
 gcc/dwarf2out.cc | 8 ++++++++
 gcc/dwarf2out.h  | 1 +
 2 files changed, 9 insertions(+)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 5681b01749a..35322fb5f6e 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -5235,6 +5235,14 @@ dw_get_die_sib (dw_die_ref die)
   return die->die_sib;
 }
 
+/* Return a reference to the parent of a given DIE.  */
+
+dw_die_ref
+dw_get_die_parent (dw_die_ref die)
+{
+  return die->die_parent;
+}
+
 /* Add an address constant attribute value to a DIE.  When using
    dwarf_split_debug_info, address attributes in dies destined for the
    final executable should be direct references--setting the parameter
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index 656ef94afde..e6962fb4848 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -455,6 +455,7 @@ extern dw_die_ref lookup_type_die (tree);
 
 extern dw_die_ref dw_get_die_child (dw_die_ref);
 extern dw_die_ref dw_get_die_sib (dw_die_ref);
+extern dw_die_ref dw_get_die_parent (dw_die_ref);
 extern enum dwarf_tag dw_get_die_tag (dw_die_ref);
 
 /* Data about a single source file.  */
-- 
2.35.1



* [PATCH 2/8] include: Add BTF tag defines to dwarf2 and btf
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
  2022-04-01 19:42 ` [PATCH 1/8] dwarf: Add dw_get_die_parent function David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 3/8] c-family: Add BTF tag attribute handlers David Faust
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

include/

	* btf.h: Add BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG defines. Update
	comments.
	(struct btf_decl_tag): New.
	* dwarf2.def: Add new DWARF extension DW_TAG_GNU_annotation.
---
 include/btf.h      | 17 +++++++++++++++--
 include/dwarf2.def |  4 ++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/btf.h b/include/btf.h
index 78b551ced23..37deaef8b48 100644
--- a/include/btf.h
+++ b/include/btf.h
@@ -69,7 +69,7 @@ struct btf_type
 
   /* SIZE is used by INT, ENUM, STRUCT, UNION, DATASEC kinds.
      TYPE is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT, FUNC,
-     FUNC_PROTO and VAR kinds.  */
+     FUNC_PROTO, VAR and DECL_TAG kinds.  */
   union
   {
     uint32_t size;	/* Size of the entire type, in bytes.  */
@@ -109,7 +109,9 @@ struct btf_type
 #define BTF_KIND_VAR		14	/* Variable.  */
 #define BTF_KIND_DATASEC	15	/* Section such as .bss or .data.  */
 #define BTF_KIND_FLOAT		16	/* Floating point.  */
-#define BTF_KIND_MAX		BTF_KIND_FLOAT
+#define BTF_KIND_DECL_TAG	17	/* Decl Tag.  */
+#define BTF_KIND_TYPE_TAG	18	/* Type Tag.  */
+#define BTF_KIND_MAX		BTF_KIND_TYPE_TAG
 #define NR_BTF_KINDS		(BTF_KIND_MAX + 1)
 
 /* For some BTF_KINDs, struct btf_type is immediately followed by
@@ -190,6 +192,17 @@ struct btf_var_secinfo
   uint32_t size;	/* Size (in bytes) of variable.  */
 };
 
+/* BTF_KIND_DECL_TAG is followed by a single struct btf_decl_tag, which
+   describes the tag location:
+   - If component_idx == -1, then the tag is applied to a struct, union,
+     variable or function.
+   - Otherwise it is applied to a struct/union member or function argument
+     with the given index numbered 0..vlen-1.  */
+struct btf_decl_tag
+{
+  int32_t component_idx;
+};
+
 #ifdef	__cplusplus
 }
 #endif
diff --git a/include/dwarf2.def b/include/dwarf2.def
index 4214c80907a..e054890130a 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -174,6 +174,10 @@ DW_TAG (DW_TAG_GNU_formal_parameter_pack, 0x4108)
    are properly part of DWARF 5.  */
 DW_TAG (DW_TAG_GNU_call_site, 0x4109)
 DW_TAG (DW_TAG_GNU_call_site_parameter, 0x410a)
+
+/* Extension for BTF annotations.  */
+DW_TAG (DW_TAG_GNU_annotation, 0x6000)
+
 /* Extensions for UPC.  See: http://dwarfstd.org/doc/DWARF4.pdf.  */
 DW_TAG (DW_TAG_upc_shared_type, 0x8765)
 DW_TAG (DW_TAG_upc_strict_type, 0x8766)
-- 
2.35.1



* [PATCH 3/8] c-family: Add BTF tag attribute handlers
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
  2022-04-01 19:42 ` [PATCH 1/8] dwarf: Add dw_get_die_parent function David Faust
  2022-04-01 19:42 ` [PATCH 2/8] include: Add BTF tag defines to dwarf2 and btf David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 4/8] dwarf: create BTF decl and type tag DIEs David Faust
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

This patch adds attribute handlers in GCC for two attributes already
supported in LLVM: "btf_decl_tag" and "btf_type_tag". Both attributes
accept a single string constant argument, and are used to add arbitrary
annotations to debug information generated for the types/decls to which
they apply.

gcc/c-family/

	* c-attribs.cc (c_common_attribute_table): Add new attributes
	btf_decl_tag and btf_type_tag.
	(handle_btf_decl_tag_attribute): New.
	(handle_btf_type_tag_attribute): Likewise.
---
 gcc/c-family/c-attribs.cc | 45 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 111a33f405a..ec52f6defb4 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -174,6 +174,9 @@ static tree handle_signed_bool_precision_attribute (tree *, tree, tree, int,
 						    bool *);
 static tree handle_retain_attribute (tree *, tree, tree, int, bool *);
 
+static tree handle_btf_decl_tag_attribute (tree *, tree, tree, int, bool *);
+static tree handle_btf_type_tag_attribute (tree *, tree, tree, int, bool *);
+
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)	\
   { name, function, type, variable }
@@ -555,6 +558,12 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_dealloc_attribute, NULL },
   { "tainted_args",	      0, 0, true,  false, false, false,
 			      handle_tainted_args_attribute, NULL },
+
+  { "btf_type_tag",           1, 1, false, true, false, false,
+			      handle_btf_type_tag_attribute, NULL },
+  { "btf_decl_tag",           1, 1, false, false, false, false,
+			      handle_btf_decl_tag_attribute, NULL },
+
   { NULL,                     0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -5854,6 +5863,42 @@ handle_tainted_args_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "btf_decl_tag" attribute; arguments as in
+   struct attribute_spec.handler.   */
+
+static tree
+handle_btf_decl_tag_attribute (tree *, tree name, tree args, int,
+			       bool *no_add_attrs)
+{
+  if (!args)
+    *no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+    {
+      error ("%qE attribute requires a string", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
+/* Handle a "btf_type_tag" attribute; arguments as in
+   struct attribute_spec.handler.   */
+
+static tree
+handle_btf_type_tag_attribute (tree *, tree name, tree args, int,
+			       bool *no_add_attrs)
+{
+  if (!args)
+    *no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+    {
+      error ("%qE attribute requires a string", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
    it were to be applied to an entity OPER.  */
 
-- 
2.35.1



* [PATCH 4/8] dwarf: create BTF decl and type tag DIEs
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (2 preceding siblings ...)
  2022-04-01 19:42 ` [PATCH 3/8] c-family: Add BTF tag attribute handlers David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 5/8] ctfc: Add support to pass through BTF annotations David Faust
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

The "btf_decl_tag" and "btf_type_tag" attributes are handled by
constructing DW_TAG_GNU_annotation DIEs. The DIEs are children of the
declarations or types which they annotate, and convey the annotation via
a string constant.

Currently, all generation of these DIEs is gated behind
btf_debuginfo_p (). That is, they will not be generated nor output
unless BTF debug information is generated. The DIEs will be output in
DWARF if both -gbtf and -gdwarf are supplied by the user.

gcc/

	* dwarf2out.cc (gen_btf_decl_tag_dies): New function.
	(gen_btf_type_tag_dies): Likewise.
	(modified_type_die): Call them here, if appropriate.
	(gen_formal_parameter_die): Likewise.
	(gen_typedef_die): Likewise.
	(gen_type_die): Likewise.
	(gen_decl_die): Likewise.
---
 gcc/dwarf2out.cc | 102 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 35322fb5f6e..8f59213f96e 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -13612,6 +13612,78 @@ long_double_as_float128 (tree type)
   return NULL_TREE;
 }
 
+/* BTF support. Given a tree T, which may be a decl or a type, process any
+   "btf_decl_tag" attributes on T, provided in ATTR. Construct
+   DW_TAG_GNU_annotation DIEs appropriately as children of TARGET, usually
+   the DIE for T.  */
+
+static void
+gen_btf_decl_tag_dies (tree t, dw_die_ref target)
+{
+  dw_die_ref die;
+  tree attr;
+
+  if (t == NULL_TREE || !target)
+    return;
+
+  if (TYPE_P (t))
+    attr = lookup_attribute ("btf_decl_tag", TYPE_ATTRIBUTES (t));
+  else if (DECL_P (t))
+    attr = lookup_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
+  else
+    /* This is an error.  */
+    gcc_unreachable ();
+
+  while (attr != NULL_TREE)
+    {
+      die = new_die (DW_TAG_GNU_annotation, target, t);
+      add_name_attribute (die, IDENTIFIER_POINTER (get_attribute_name (attr)));
+      add_AT_string (die, DW_AT_const_value,
+		     TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+      attr = lookup_attribute ("btf_decl_tag", TREE_CHAIN (attr));
+    }
+
+  /* Strip the decl tag attribute to avoid creating multiple copies if we hit
+     this tree node again in some recursive call.  */
+  if (TYPE_P (t))
+    TYPE_ATTRIBUTES (t) =
+      remove_attribute ("btf_decl_tag", TYPE_ATTRIBUTES (t));
+  else if (DECL_P (t))
+    DECL_ATTRIBUTES (t) =
+      remove_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
+}
+
+/* BTF support. Given a tree TYPE, process any "btf_type_tag" attributes on
+   TYPE. Construct DW_TAG_GNU_annotation DIEs appropriately as children of
+   TARGET, usually the DIE for TYPE.  */
+
+static void
+gen_btf_type_tag_dies (tree type, dw_die_ref target)
+{
+  dw_die_ref die;
+  tree attr;
+
+  if (type == NULL_TREE || !target)
+    return;
+
+  gcc_assert (TYPE_P (type));
+
+  attr = lookup_attribute ("btf_type_tag", TYPE_ATTRIBUTES (type));
+  while (attr != NULL_TREE)
+    {
+      die = new_die (DW_TAG_GNU_annotation, target, type);
+      add_name_attribute (die, IDENTIFIER_POINTER (get_attribute_name (attr)));
+      add_AT_string (die, DW_AT_const_value,
+		     TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+      attr = lookup_attribute ("btf_type_tag", TREE_CHAIN (attr));
+    }
+
+  /* Strip the type tag attribute to avoid creating multiple copies if we hit
+     this type again in some recursive call.  */
+  TYPE_ATTRIBUTES (type) =
+    remove_attribute ("btf_type_tag", TYPE_ATTRIBUTES (type));
+}
+
 /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
    entry that chains the modifiers specified by CV_QUALS in front of the
    given type.  REVERSE is true if the type is to be interpreted in the
@@ -14010,6 +14082,10 @@ modified_type_die (tree type, int cv_quals, bool reverse,
   if (TYPE_ARTIFICIAL (type))
     add_AT_flag (mod_type_die, DW_AT_artificial, 1);
 
+  /* BTF support. Handle any "btf_type_tag" attributes on the type.  */
+  if (btf_debuginfo_p ())
+    gen_btf_type_tag_dies (type, mod_type_die);
+
   return mod_type_die;
 }
 
@@ -22986,6 +23062,10 @@ gen_formal_parameter_die (tree node, tree origin, bool emit_name_p,
       gcc_unreachable ();
     }
 
+  /* BTF Support */
+  if (btf_debuginfo_p ())
+    gen_btf_decl_tag_dies (node, parm_die);
+
   return parm_die;
 }
 
@@ -26060,6 +26140,10 @@ gen_typedef_die (tree decl, dw_die_ref context_die)
 
   if (get_AT (type_die, DW_AT_name))
     add_pubtype (decl, type_die);
+
+  /* BTF: handle attribute btf_decl_tag which may appear on the typedef.  */
+  if (btf_debuginfo_p ())
+    gen_btf_decl_tag_dies (decl, type_die);
 }
 
 /* Generate a DIE for a struct, class, enum or union type.  */
@@ -26373,6 +26457,20 @@ gen_type_die (tree type, dw_die_ref context_die)
 	  if (die)
 	    check_die (die);
 	}
+
+      /* BTF support. Handle any "btf_type_tag" or "btf_decl_tag" attributes
+	 on the type, constructing annotation DIEs as appropriate.  */
+      if (btf_debuginfo_p ())
+	{
+	  dw_die_ref die = lookup_type_die (type);
+	  if (die)
+	    {
+	      gen_btf_type_tag_dies (type, die);
+
+	      /* decl tags may also be attached to a type.  */
+	      gen_btf_decl_tag_dies (type, die);
+	    }
+	}
     }
 }
 
@@ -27129,6 +27227,10 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
       break;
     }
 
+  /* BTF: handle attribute btf_decl_tag.  */
+  if (btf_debuginfo_p ())
+    gen_btf_decl_tag_dies (decl, lookup_decl_die (decl));
+
   return NULL;
 }
 \f
-- 
2.35.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 5/8] ctfc: Add support to pass through BTF annotations
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (3 preceding siblings ...)
  2022-04-01 19:42 ` [PATCH 4/8] dwarf: create BTF decl and type tag DIEs David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 6/8] dwarf2ctf: convert tag DIEs to CTF types David Faust
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

BTF generation currently relies on the internal CTF representation to
convert debug info from DWARF DIEs. This patch adds a new internal
header, "ctf-int.h", which defines CTF kinds used internally to
represent BTF tags that must pass through the CTF container. It also
adds a new type for representing information specific to those tags, and
a member of that type in ctf_dtdef.

This patch also updates ctf_add_reftype to accept a const char * name
and pass it on to ctf_add_generic, so that the newly added tag types can
carry their tag string as a name.
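
To make the shape of the change easier to picture, here is a
standalone C model, not the GCC code itself. All model_* names are
hypothetical; only the kind values 62 and 63 come from ctf-int.h in
this patch, and 3 for the pointer kind is an assumption about ctf.h.
It shows a reference type that may carry a name, and a selector that
picks the annotation member of the union for the two internal kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CTF kinds.  62 and 63 mirror the
   values in ctf-int.h; the pointer value is assumed.  */
#define MODEL_K_POINTER   3u
#define MODEL_K_DECL_TAG 62u
#define MODEL_K_TYPE_TAG 63u

enum model_selector { MODEL_SEL_DEFAULT, MODEL_SEL_BTFNOTE };

/* Simplified stand-in for a type definition with a kind-specific
   union.  */
struct model_type
{
  uint32_t kind;
  const char *name;	/* Tag string for tag kinds, NULL otherwise.  */
  uint32_t ref;		/* Id of the referenced type.  */
  union { uint32_t component_idx; } u;
};

/* Select the live union member from the kind, as the selector in
   ctfc.cc does for the real union.  */
enum model_selector
model_union_selector (const struct model_type *t)
{
  switch (t->kind)
    {
    case MODEL_K_DECL_TAG:
    case MODEL_K_TYPE_TAG:
      return MODEL_SEL_BTFNOTE;
    default:
      return MODEL_SEL_DEFAULT;
    }
}

/* Create a reference type.  NAME is NULL for anonymous kinds such as
   pointers and holds the tag string for the tag kinds.  */
struct model_type
model_add_reftype (const char *name, uint32_t ref, uint32_t kind)
{
  struct model_type t;

  assert (kind <= 63u);
  t.kind = kind;
  t.name = name;
  t.ref = ref;
  t.u.component_idx = 0;
  return t;
}
```

Existing callers keep passing NULL for the name, which is why the
pointer and modifier call sites in this patch only gain a NULL
argument.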

gcc/

	* ctf-int.h: New file.
	* ctfc.cc (ctf_add_reftype): Add NAME parameter. Pass it to
	ctf_add_generic call.
	(ctf_add_pointer): Update ctf_add_reftype call accordingly.
	* ctfc.h (ctf_add_reftype): Analogous change.
	(ctf_btf_annotation): New.
	(ctf_dtdef): Add member for it.
	(enum ctf_dtu_d_union_enum): Likewise.
	* dwarf2ctf.cc (gen_ctf_modifier_type): Update call to
	ctf_add_reftype accordingly.
---
 gcc/ctf-int.h    | 29 +++++++++++++++++++++++++++++
 gcc/ctfc.cc      | 11 +++++++----
 gcc/ctfc.h       | 17 ++++++++++++++---
 gcc/dwarf2ctf.cc |  2 +-
 4 files changed, 51 insertions(+), 8 deletions(-)
 create mode 100644 gcc/ctf-int.h

diff --git a/gcc/ctf-int.h b/gcc/ctf-int.h
new file mode 100644
index 00000000000..fb5f4aacad6
--- /dev/null
+++ b/gcc/ctf-int.h
@@ -0,0 +1,29 @@
+/* ctf-int.h - GCC internal definitions used for CTF debug info.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_CTF_INT_H
+#define GCC_CTF_INT_H 1
+
+/* These CTF kinds only exist as a bridge to generating BTF types for
+   BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. They do not correspond to any
+   representable type kind in CTF.  */
+#define CTF_K_DECL_TAG  62
+#define CTF_K_TYPE_TAG  63
+
+#endif /* GCC_CTF_INT_H */
diff --git a/gcc/ctfc.cc b/gcc/ctfc.cc
index 6fe44d2e8d4..031a6fff65d 100644
--- a/gcc/ctfc.cc
+++ b/gcc/ctfc.cc
@@ -107,6 +107,9 @@ ctf_dtu_d_union_selector (ctf_dtdef_ref ctftype)
       return CTF_DTU_D_ARGUMENTS;
     case CTF_K_SLICE:
       return CTF_DTU_D_SLICE;
+    case CTF_K_DECL_TAG:
+    case CTF_K_TYPE_TAG:
+      return CTF_DTU_D_BTFNOTE;
     default:
       /* The largest member as default.  */
       return CTF_DTU_D_ARRAY;
@@ -394,15 +397,15 @@ ctf_add_encoded (ctf_container_ref ctfc, uint32_t flag, const char * name,
 }
 
 ctf_id_t
-ctf_add_reftype (ctf_container_ref ctfc, uint32_t flag, ctf_id_t ref,
-		 uint32_t kind, dw_die_ref die)
+ctf_add_reftype (ctf_container_ref ctfc, uint32_t flag, const char * name,
+		 ctf_id_t ref, uint32_t kind, dw_die_ref die)
 {
   ctf_dtdef_ref dtd;
   ctf_id_t type;
 
   gcc_assert (ref <= CTF_MAX_TYPE);
 
-  type = ctf_add_generic (ctfc, flag, NULL, &dtd, die);
+  type = ctf_add_generic (ctfc, flag, name, &dtd, die);
   dtd->dtd_data.ctti_info = CTF_TYPE_INFO (kind, flag, 0);
   /* Caller of this API must guarantee that a CTF type with id = ref already
      exists.  This will also be validated for us at link-time.  */
@@ -514,7 +517,7 @@ ctf_id_t
 ctf_add_pointer (ctf_container_ref ctfc, uint32_t flag, ctf_id_t ref,
 		 dw_die_ref die)
 {
-  return (ctf_add_reftype (ctfc, flag, ref, CTF_K_POINTER, die));
+  return (ctf_add_reftype (ctfc, flag, NULL, ref, CTF_K_POINTER, die));
 }
 
 ctf_id_t
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index 18c93c802a0..51f43cd01cb 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dwarf2ctf.h"
 #include "ctf.h"
 #include "btf.h"
+#include "ctf-int.h"
 
 /* Invalid CTF type ID definition.  */
 
@@ -151,6 +152,13 @@ typedef struct GTY (()) ctf_func_arg
 
 #define ctf_farg_list_next(elem) ((ctf_func_arg_t *)((elem)->farg_next))
 
+/* BTF support: a BTF type tag or decl tag.  */
+
+typedef struct GTY (()) ctf_btf_annotation
+{
+  uint32_t component_idx;
+} ctf_btf_annotation_t;
+
 /* Type definition for CTF generation.  */
 
 struct GTY ((for_user)) ctf_dtdef
@@ -173,6 +181,8 @@ struct GTY ((for_user)) ctf_dtdef
     ctf_func_arg_t * GTY ((tag ("CTF_DTU_D_ARGUMENTS"))) dtu_argv;
     /* slice.  */
     ctf_sliceinfo_t GTY ((tag ("CTF_DTU_D_SLICE"))) dtu_slice;
+    /* btf annotation.  */
+    ctf_btf_annotation_t GTY ((tag ("CTF_DTU_D_BTFNOTE"))) dtu_btfnote;
   } dtd_u;
 };
 
@@ -212,7 +222,8 @@ enum ctf_dtu_d_union_enum {
   CTF_DTU_D_ARRAY,
   CTF_DTU_D_ENCODING,
   CTF_DTU_D_ARGUMENTS,
-  CTF_DTU_D_SLICE
+  CTF_DTU_D_SLICE,
+  CTF_DTU_D_BTFNOTE
 };
 
 enum ctf_dtu_d_union_enum
@@ -396,8 +407,8 @@ extern ctf_dvdef_ref ctf_dvd_lookup (const ctf_container_ref ctfc,
 extern const char * ctf_add_string (ctf_container_ref, const char *,
 				    uint32_t *, int);
 
-extern ctf_id_t ctf_add_reftype (ctf_container_ref, uint32_t, ctf_id_t,
-				 uint32_t, dw_die_ref);
+extern ctf_id_t ctf_add_reftype (ctf_container_ref, uint32_t, const char *,
+				 ctf_id_t, uint32_t, dw_die_ref);
 extern ctf_id_t ctf_add_enum (ctf_container_ref, uint32_t, const char *,
 			      HOST_WIDE_INT, dw_die_ref);
 extern ctf_id_t ctf_add_slice (ctf_container_ref, uint32_t, ctf_id_t,
diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index 747b2f66107..32495cf4307 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -511,7 +511,7 @@ gen_ctf_modifier_type (ctf_container_ref ctfc, dw_die_ref modifier)
   gcc_assert (kind != CTF_K_MAX);
   /* Now register the modifier itself.  */
   if (!ctf_type_exists (ctfc, modifier, &modifier_type_id))
-    modifier_type_id = ctf_add_reftype (ctfc, CTF_ADD_ROOT,
+    modifier_type_id = ctf_add_reftype (ctfc, CTF_ADD_ROOT, NULL,
 					qual_type_id, kind,
 					modifier);
 
-- 
2.35.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 6/8] dwarf2ctf: convert tag DIEs to CTF types
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (4 preceding siblings ...)
  2022-04-01 19:42 ` [PATCH 5/8] ctfc: Add support to pass through BTF annotations David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 7/8] Output BTF DECL_TAG and TYPE_TAG types David Faust
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

This patch makes the DWARF-to-CTF conversion process aware of the new
DW_TAG_GNU_annotation DIEs. The DIEs are converted to CTF_K_DECL_TAG or
CTF_K_TYPE_TAG types as appropriate and added to the compilation unit CTF
container.
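
The type tag handling is the subtle part: tag DIEs are children of the
type they annotate, but in CTF/BTF they have to sit in the type chain.
The following standalone C sketch models that re-linking on a toy type
table. It is not the dwarf2ctf.cc code; every model_* name is
hypothetical and type ids are simply 1-based table indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_TYPES 16

/* A toy type: a name and the id of the type it refers to (0 = none).  */
struct model_type
{
  const char *name;
  uint32_t ref;
};

struct model_table
{
  struct model_type types[MODEL_MAX_TYPES];
  uint32_t count;
};

/* Append a type and return its 1-based id, or 0 if the table is full.  */
uint32_t
model_add (struct model_table *tab, const char *name, uint32_t ref)
{
  if (tab->count >= MODEL_MAX_TYPES)
    return 0;
  tab->types[tab->count].name = name;
  tab->types[tab->count].ref = ref;
  tab->count++;
  return tab->count;
}

/* Insert NTAGS type tags between PARENT_ID and the type it refers to:
   the first tag refers to the parent's old referent, each later tag
   refers to the previous tag, and the parent ends up referring to the
   last tag.  */
void
model_insert_type_tags (struct model_table *tab, uint32_t parent_id,
			const char *const *tags, size_t ntags)
{
  uint32_t target;
  size_t i;

  assert (parent_id >= 1 && parent_id <= tab->count);
  target = tab->types[parent_id - 1].ref;

  for (i = 0; i < ntags; i++)
    {
      uint32_t tag_id = model_add (tab, tags[i], target);
      if (tag_id == 0)
	return;
      tab->types[parent_id - 1].ref = tag_id;
      target = tag_id;
    }
}
```

For a pointer to int carrying tags "tag2" and "tag3", the resulting
chain in this model is pointer -> tag3 -> tag2 -> int.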

gcc/

	* dwarf2ctf.cc (handle_btf_tags): New function.
	(gen_ctf_sou_type): Call it here, if appropriate. Don't try to
	create member types for children that are not DW_TAG_member.
	(gen_ctf_function_type): Call handle_btf_tags if appropriate.
	(gen_ctf_variable): Likewise.
	(gen_ctf_function): Likewise.
	(gen_ctf_type): Likewise.
---
 gcc/dwarf2ctf.cc | 113 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index 32495cf4307..8811ec3e878 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -32,6 +32,12 @@ along with GCC; see the file COPYING3.  If not see
 static ctf_id_t
 gen_ctf_type (ctf_container_ref, dw_die_ref);
 
+static void
+gen_ctf_variable (ctf_container_ref, dw_die_ref);
+
+static void
+handle_btf_tags (ctf_container_ref, dw_die_ref, ctf_id_t, int);
+
 /* All the DIE structures we handle come from the DWARF information
    generated by GCC.  However, there are three situations where we need
    to create our own created DIE structures because GCC doesn't
@@ -547,6 +553,7 @@ gen_ctf_sou_type (ctf_container_ref ctfc, dw_die_ref sou, uint32_t kind)
   /* Now process the struct members.  */
   {
     dw_die_ref c;
+    int idx = 0;
 
     c = dw_get_die_child (sou);
     if (c)
@@ -559,6 +566,12 @@ gen_ctf_sou_type (ctf_container_ref ctfc, dw_die_ref sou, uint32_t kind)
 
 	  c = dw_get_die_sib (c);
 
+	  if (dw_get_die_tag (c) != DW_TAG_member)
+	    continue;
+
+	  if (c == dw_get_die_child (sou))
+	    idx = 0;
+
 	  field_name = get_AT_string (c, DW_AT_name);
 	  field_type = ctf_get_AT_type (c);
 	  field_location = ctf_get_AT_data_member_location (c);
@@ -626,6 +639,12 @@ gen_ctf_sou_type (ctf_container_ref ctfc, dw_die_ref sou, uint32_t kind)
 				 field_name,
 				 field_type_id,
 				 field_location);
+
+	  /* Handle BTF tags on the member.  */
+	  if (btf_debuginfo_p ())
+	    handle_btf_tags (ctfc, c, sou_type_id, idx);
+
+	  idx++;
 	}
       while (c != dw_get_die_child (sou));
   }
@@ -716,6 +735,9 @@ gen_ctf_function_type (ctf_container_ref ctfc, dw_die_ref function,
 	      arg_type = gen_ctf_type (ctfc, ctf_get_AT_type (c));
 	      /* Add the argument to the existing CTF function type.  */
 	      ctf_add_function_arg (ctfc, function, arg_name, arg_type);
+
+	      if (btf_debuginfo_p ())
+		handle_btf_tags (ctfc, c, function_type_id, i - 1);
 	    }
 	  else
 	    /* This is a local variable.  Ignore.  */
@@ -814,6 +836,11 @@ gen_ctf_variable (ctf_container_ref ctfc, dw_die_ref die)
   /* Generate the new CTF variable and update global counter.  */
   (void) ctf_add_variable (ctfc, var_name, var_type_id, die, external_vis);
   ctfc->ctfc_num_global_objts += 1;
+
+  /* Handle any BTF tags on the variable.  */
+  if (btf_debuginfo_p ())
+    handle_btf_tags (ctfc, die, CTF_NULL_TYPEID, -1);
+
 }
 
 /* Add a CTF function record for the given input DWARF DIE.  */
@@ -831,8 +858,12 @@ gen_ctf_function (ctf_container_ref ctfc, dw_die_ref die)
      counter.  Note that DWARF encodes function types in both
      DW_TAG_subroutine_type and DW_TAG_subprogram in exactly the same
      way.  */
-  (void) gen_ctf_function_type (ctfc, die, true /* from_global_func */);
+  function_type_id = gen_ctf_function_type (ctfc, die, true /* from_global_func */);
   ctfc->ctfc_num_global_funcs += 1;
+
+  /* Handle any BTF tags on the function itself.  */
+  if (btf_debuginfo_p ())
+    handle_btf_tags (ctfc, die, function_type_id, -1);
 }
 
 /* Add CTF type record(s) for the given input DWARF DIE and return its type id.
@@ -909,6 +940,10 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
       break;
     }
 
+  /* Handle any BTF tags on the type.  */
+  if (btf_debuginfo_p () && !unrecog_die)
+    handle_btf_tags (ctfc, die, type_id, -1);
+
   /* For all types unrepresented in CTF, use an explicit CTF type of kind
      CTF_K_UNKNOWN.  */
   if ((type_id == CTF_NULL_TYPEID) && (!unrecog_die))
@@ -917,6 +952,82 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
   return type_id;
 }
 
+/* BTF support. Handle any BTF tags attached to a given DIE, and generate
+   intermediate CTF types for them. Type tags are inserted into the type chain
+   at this point: the CTF type for DIE is updated in place to refer to the last
+   type tag created, so nothing is returned. TYPE_ID is the type that decl tags
+   refer to, and COMPONENT_IDX is the member or argument index (-1 for none).
+   Note that despite the name, the BTF spec seems to allow decl tags on types
+   as well as declarations.  */
+
+static void
+handle_btf_tags (ctf_container_ref ctfc, dw_die_ref die, ctf_id_t type_id,
+		 int component_idx)
+{
+  dw_die_ref c;
+  const char * name = NULL;
+  const char * value = NULL;
+  ctf_dtdef_ref dtd = ctf_dtd_lookup (ctfc, die);
+  ctf_id_t target_id, tag_id;
+
+  if (dtd)
+    target_id = dtd->dtd_data.ctti_type;
+  else
+    target_id = CTF_NULL_TYPEID;
+
+  c = dw_get_die_child (die);
+  if (c)
+    do
+      {
+	if (dw_get_die_tag (c) != DW_TAG_GNU_annotation)
+	  {
+	    c = dw_get_die_sib (c);
+	    continue;
+	  }
+
+	name = get_AT_string (c, DW_AT_name);
+
+	/* BTF decl tags add an arbitrary annotation to the thing they
+	   annotate. The annotated thing could be a variable or a type.  */
+	if (strcmp (name, "btf_decl_tag") == 0)
+	  {
+	    value = get_AT_string (c, DW_AT_const_value);
+	    if (!ctf_type_exists (ctfc, c, &tag_id))
+	      (void) ctf_add_reftype (ctfc, CTF_ADD_ROOT, value,
+				      type_id, CTF_K_DECL_TAG, c);
+	    ctf_dtdef_ref dtd = ctf_dtd_lookup (ctfc, c);
+	    dtd->dtd_u.dtu_btfnote.component_idx = component_idx;
+	  }
+
+	/* BTF type tags are part of the type chain similar to cvr quals.
+	   But the type tag DIEs are children of the DIEs they annotate.
+
+	   For each type tag on this type, create a CTF type for it and
+	   insert it into the type chain:
+	   - The first tag refers to the type referred to by the parent.
+	   - Each subsequent tag refers to the prior tag.
+	   - The parent type is updated to refer to the last tag.  */
+
+	/* TODO: given this type chain requirement, the representation of type
+	   tags in BTF only makes sense for pointer types. Should this be
+	   enforced here?  */
+	else if (strcmp (name, "btf_type_tag") == 0)
+	  {
+	    gcc_assert (dtd);
+	    value = get_AT_string (c, DW_AT_const_value);
+
+	    if (!ctf_type_exists (ctfc, c, &tag_id))
+	      tag_id = ctf_add_reftype (ctfc, CTF_ADD_ROOT, value,
+					target_id, CTF_K_TYPE_TAG, c);
+
+	    dtd->dtd_data.ctti_type = tag_id;
+	    target_id = tag_id;
+	  }
+	c = dw_get_die_sib (c);
+      }
+    while (c != dw_get_die_child (die));
+}
+
 /* Prepare for output and write out the CTF debug information.  */
 
 static void
-- 
2.35.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 7/8] Output BTF DECL_TAG and TYPE_TAG types
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (5 preceding siblings ...)
  2022-04-01 19:42 ` [PATCH 6/8] dwarf2ctf: convert tag DIEs to CTF types David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-01 19:42 ` [PATCH 8/8] testsuite: Add tests for BTF tags David Faust
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

This patch updates btfout.cc to be aware of the DECL_TAG and TYPE_TAG
kinds and output them appropriately.
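
For reference when reading the expected values in the tests, this is a
small C sketch of how the words involved are formed. The model_* names
are hypothetical, and the layout (vlen in bits 0-15, kind in bits
24-28, kind_flag in bit 31, DECL_TAG = 17, TYPE_TAG = 18) is taken from
the Linux uapi btf.h as I understand it rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Kind values assumed from the Linux uapi btf.h.  */
#define MODEL_BTF_KIND_DECL_TAG 17u
#define MODEL_BTF_KIND_TYPE_TAG 18u

/* Compose the 32-bit info word of a BTF type record.  */
uint32_t
model_btt_info (uint32_t kind, uint32_t vlen, int kflag)
{
  assert (kind <= 0x1fu);
  return ((kflag ? 1u : 0u) << 31) | (kind << 24) | (vlen & 0xffffu);
}

/* The single trailing word of a decl tag: the member or argument index,
   or -1 when the tag applies to the declaration or type itself.  */
uint32_t
model_decl_tag_word (int component_idx)
{
  return (uint32_t) component_idx;
}
```

With no vlen and no kind_flag, a decl tag gives 0x11000000 and a type
tag gives 0x12000000, and a component index of -1 is emitted as
0xffffffff.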

gcc/

	* btfout.cc (get_btf_kind): Handle TYPE_TAG and DECL_TAG kinds.
	(btf_calc_num_vbytes): Likewise.
	(btf_asm_type): Likewise.
	(output_asm_btf_vlen_bytes): Likewise.
---
 gcc/btfout.cc | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 31af50521da..f291cd925be 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -136,6 +136,8 @@ get_btf_kind (uint32_t ctf_kind)
     case CTF_K_VOLATILE: return BTF_KIND_VOLATILE;
     case CTF_K_CONST:    return BTF_KIND_CONST;
     case CTF_K_RESTRICT: return BTF_KIND_RESTRICT;
+    case CTF_K_TYPE_TAG: return BTF_KIND_TYPE_TAG;
+    case CTF_K_DECL_TAG: return BTF_KIND_DECL_TAG;
     default:;
     }
   return BTF_KIND_UNKN;
@@ -201,6 +203,7 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
     case BTF_KIND_CONST:
     case BTF_KIND_RESTRICT:
     case BTF_KIND_FUNC:
+    case BTF_KIND_TYPE_TAG:
     /* These kinds have no vlen data.  */
       break;
 
@@ -238,6 +241,10 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
       vlen_bytes += vlen * sizeof (struct btf_var_secinfo);
       break;
 
+    case BTF_KIND_DECL_TAG:
+      vlen_bytes += sizeof (struct btf_decl_tag);
+      break;
+
     default:
       break;
     }
@@ -636,6 +643,22 @@ btf_asm_type (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
       dw2_asm_output_data (4, dtd->dtd_data.ctti_size, "btt_size: %uB",
 			   dtd->dtd_data.ctti_size);
       return;
+    case BTF_KIND_DECL_TAG:
+      {
+	/* A decl tag might refer to (be the child DIE of) a variable. Try to
+	   lookup the parent DIE's CTF variable, and if it exists point to the
+	   corresponding BTF variable. This is an odd construction - we have a
+	   'type' which refers to a variable, rather than the reverse.  */
+	dw_die_ref parent = dw_get_die_parent (dtd->dtd_key);
+	ctf_dvdef_ref dvd = ctf_dvd_lookup (ctfc, parent);
+	if (dvd)
+	  {
+	    unsigned int var_id =
+	      *(btf_var_ids->get (dvd)) + num_types_added + 1;
+	    dw2_asm_output_data (4, var_id, "btt_type");
+	    return;
+	  }
+      }
     default:
       break;
     }
@@ -949,6 +972,11 @@ output_asm_btf_vlen_bytes (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
 	 at this point.  */
       gcc_unreachable ();
 
+    case BTF_KIND_DECL_TAG:
+      dw2_asm_output_data (4, dtd->dtd_u.dtu_btfnote.component_idx,
+			   "decltag_compidx");
+      break;
+
     default:
       /* All other BTF type kinds have no variable length data.  */
       break;
-- 
2.35.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 8/8] testsuite: Add tests for BTF tags
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (6 preceding siblings ...)
  2022-04-01 19:42 ` [PATCH 7/8] Output BTF DECL_TAG and TYPE_TAG types David Faust
@ 2022-04-01 19:42 ` David Faust
  2022-04-04 22:13 ` [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations Yonghong Song
  2022-04-18 19:36 ` [ping][PATCH " David Faust
  9 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-01 19:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: yhs, jose.marchesi

This commit adds tests for the tags, in BTF and in DWARF.

gcc/testsuite/

	* gcc.dg/debug/btf/btf-decltag-func.c: New test.
	* gcc.dg/debug/btf/btf-decltag-sou.c: Likewise.
	* gcc.dg/debug/btf/btf-decltag-typedef.c: Likewise.
	* gcc.dg/debug/btf/btf-typetag-1.c: Likewise.
	* gcc.dg/debug/dwarf2/annotation-1.c: Likewise.
---
 .../gcc.dg/debug/btf/btf-decltag-func.c       | 18 ++++++++++
 .../gcc.dg/debug/btf/btf-decltag-sou.c        | 34 +++++++++++++++++++
 .../gcc.dg/debug/btf/btf-decltag-typedef.c    | 15 ++++++++
 .../gcc.dg/debug/btf/btf-typetag-1.c          | 20 +++++++++++
 .../gcc.dg/debug/dwarf2/annotation-1.c        | 29 ++++++++++++++++
 5 files changed, 116 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c

diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
new file mode 100644
index 00000000000..aa2c31aaa32
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
@@ -0,0 +1,18 @@
+
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-times "\[\t \]0x11000000\[\t \]+\[^\n\]*btt_info" 4 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0xffffffff\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x1\[\t \]+\[^\n\]*decltag_compidx" 1 } } */
+
+#define __tag1 __attribute__((btf_decl_tag("decl-tag-1")))
+#define __tag2 __attribute__((btf_decl_tag("decl-tag-2")))
+#define __tag3 __attribute__((btf_decl_tag("decl-tag-3")))
+
+extern int bar (int __tag1, int __tag2) __tag3;
+
+int __tag1 __tag2 foo (int arg1, int *arg2 __tag2)
+  {
+    return bar (arg1 + 1, *arg2 + 2);
+  }
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
new file mode 100644
index 00000000000..be89d0d32de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
@@ -0,0 +1,34 @@
+
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-times "\[\t \]0x11000000\[\t \]+\[^\n\]*btt_info" 16 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0\[\t \]+\[^\n\]*decltag_compidx" 2 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x1\[\t \]+\[^\n\]*decltag_compidx" 1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x2\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x3\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x4\[\t \]+\[^\n\]*decltag_compidx" 1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0xffffffff\[\t \]+\[^\n\]*decltag_compidx" 6 } } */
+
+#define __tag1 __attribute__((btf_decl_tag("decl-tag-1")))
+#define __tag2 __attribute__((btf_decl_tag("decl-tag-2")))
+#define __tag3 __attribute__((btf_decl_tag("decl-tag-3")))
+
+struct t {
+  int a;
+  long b __tag3;
+  char c __tag2 __tag3;
+} __tag1 __tag2;
+
+struct t my_t __tag1 __tag3;
+
+
+union u {
+  char one __tag1 __tag2;
+  short two;
+  int three __tag1;
+  long four __tag1 __tag2 __tag3;
+  long long five __tag2;
+} __tag3;
+
+union u my_u __tag2;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c
new file mode 100644
index 00000000000..75be876f949
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c
@@ -0,0 +1,15 @@
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-times "\[\t \]0x11000000\[\t \]+\[^\n\]*btt_info" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0xffffffff\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+
+#define __tag1 __attribute__((btf_decl_tag("decl-tag-1")))
+#define __tag2 __attribute__((btf_decl_tag("decl-tag-2")))
+#define __tag3 __attribute__((btf_decl_tag("decl-tag-3")))
+
+struct s { int a; } __tag1;
+
+typedef struct s * sptr __tag2;
+
+sptr my_sptr __tag3;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c
new file mode 100644
index 00000000000..4b05663385f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-times "\[\t \]0x12000000\[\t \]+\[^\n\]*btt_info" 4 } } */
+
+#define __tag1 __attribute__((btf_type_tag("tag1")))
+#define __tag2 __attribute__((btf_type_tag("tag2")))
+#define __tag3 __attribute__((btf_type_tag("tag3")))
+
+int __tag1 * x;
+const int __tag2 * y;
+
+struct a;
+
+struct b
+{
+  struct a __tag2 __tag3 * inner_a;
+};
+
+struct b my_b;
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c b/gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c
new file mode 100644
index 00000000000..543cf771f92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-gbtf -gdwarf -dA" } */
+#define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
+#define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
+#define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
+
+#define __decltag1 __attribute__((btf_decl_tag("decl-tag-1")))
+#define __decltag2 __attribute__((btf_decl_tag("decl-tag-2")))
+#define __decltag3 __attribute__((btf_decl_tag("decl-tag-3")))
+
+struct S {
+  int a __decltag2 __decltag3;
+  int b __decltag1;
+} __decltag1 __decltag2;
+
+struct S my_S __decltag3;
+
+long __typetag1 __typetag2 * x;
+
+/* Verify that we get the expected DW_TAG_GNU_annotation DIEs for each tag.
+   Note: one more TAG in debug abbrev.  */
+/* { dg-final { scan-assembler-times " DW_TAG_GNU_annotation" 9 } } */
+/* { dg-final { scan-assembler-times " DW_AT_name: \"btf_decl_tag\"" 6 } } */
+/* { dg-final { scan-assembler-times " DW_AT_const_value: \"decl-tag-1\"" 2 } } */
+/* { dg-final { scan-assembler-times " DW_AT_const_value: \"decl-tag-2\"" 2 } } */
+/* { dg-final { scan-assembler-times " DW_AT_const_value: \"decl-tag-3\"" 2 } } */
+/* { dg-final { scan-assembler-times " DW_AT_name: \"btf_type_tag\"" 2 } } */
+/* { dg-final { scan-assembler-times " DW_AT_const_value: \"type-tag-1\"" 1 } } */
+/* { dg-final { scan-assembler-times " DW_AT_const_value: \"type-tag-2\"" 1 } } */
-- 
2.35.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (7 preceding siblings ...)
  2022-04-01 19:42 ` [PATCH 8/8] testsuite: Add tests for BTF tags David Faust
@ 2022-04-04 22:13 ` Yonghong Song
  2022-04-05 16:26   ` David Faust
  2022-04-18 19:36 ` [ping][PATCH " David Faust
  9 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-04-04 22:13 UTC (permalink / raw)
  To: David Faust, gcc-patches



On 4/1/22 12:42 PM, David Faust wrote:
> Hello,
> 
> This patch series is a first attempt at adding support for:
> 
> - Two new C-language-level attributes that allow to associate (to "tag")
>    particular declarations and types with arbitrary strings. As explained below,
>    this is intended to be used to, for example, characterize certain pointer
>    types.
> 
> - The conveyance of that information in the DWARF output in the form of a new
>    DIE: DW_TAG_GNU_annotation.
> 
> - The conveyance of that information in the BTF output in the form of two new
>    kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
> 
> All of these facilities are being added to the eBPF ecosystem, and support for
> them exists in some form in LLVM. However, as we shall see, we have found some
> problems implementing them so some discussion is in order.
> 
> Purpose
> =======
> 
> 1)  Addition of C-family language constructs (attributes) to specify free-text
>      tags on certain language elements, such as struct fields.
> 
>      The purpose of these annotations is to provide additional information about
>      types, variables, and function parameters of interest to the kernel. A
>      driving use case is to tag pointer types within the linux kernel and eBPF
>      programs with additional semantic information, such as '__user' or '__rcu'.
> 
>      For example, consider the linux kernel function do_execve with the
>      following declaration:
> 
>        static int do_execve(struct filename *filename,
>           const char __user *const __user *__argv,
>           const char __user *const __user *__envp);
> 
>      Here, __user could be defined with these annotations to record semantic
>      information about the pointer parameters (e.g., they are user-provided) in
>      DWARF and BTF information. Other kernel facilities such as the eBPF verifier
>      can read the tags and make use of the information.
> 
> 2)  Conveying the tags in the generated DWARF debug info.
> 
>      The main motivation for emitting the tags in DWARF is that the Linux kernel
>      generates its BTF information via pahole, using DWARF as a source:
> 
>          +--------+  BTF                  BTF   +----------+
>          | pahole |-------> vmlinux.btf ------->| verifier |
>          +--------+                             +----------+
>              ^                                        ^
>              |                                        |
>        DWARF |                                    BTF |
>              |                                        |
>           vmlinux                              +-------------+
>           module1.ko                           | BPF program |
>           module2.ko                           +-------------+
>             ...
> 
>      This is because:
> 
>      a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
> 
>      b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>          support for linking/deduplicating BTF in the linker.
> 
>      In the scenario above, the verifier needs access to the pointer tags of
>      both the kernel types/declarations (conveyed in the DWARF and translated
>      to BTF by pahole) and those of the BPF program (available directly in BTF).
> 
>      Another motivation for having the tag information in DWARF, unrelated to
>      BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>      to benefit from these tags in order to differentiate between different
>      kinds of pointers in the kernel.
> 
> 3)  Conveying the tags in the generated BTF debug info.
> 
>      This is easy: the main purpose of having this info in BTF is for the
>      compiled eBPF programs. The kernel verifier can then access the tags
>      of pointers used by the eBPF programs.
> 
> 
> For more information about these tags and the motivation behind them, please
> refer to the following linux kernel discussions:
> 
>    https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
>    https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
>    https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/
> 
> 
> What is in this patch series
> ============================
> 
> This patch series adds support for these annotations in GCC. The implementation
> is largely complete. However, in some cases the produced debug info (both DWARF
> and BTF) differs significantly from that produced by LLVM. This issue is
> discussed in detail below, along with a few specific questions for both GCC and
> LLVM. Any input would be much appreciated.

Hi, David, Thanks for the RFC implementation! I will answer your 
questions related to llvm and kernel.

> 
> 
> Implementation Overview
> =======================
> 
> To enable these annotations, two new C language attributes are added:
> __attribute__((btf_decl_tag("foo"))) and __attribute__((btf_type_tag("bar"))).
> Both attributes accept a single arbitrary string constant argument, which will
> be recorded in the generated DWARF and/or BTF debugging information. They have
> no effect on code generation.
> 
> Note that we are using the same attribute names as LLVM, which include "btf"
> in the name. This may be controversial, as these tags are not really
> BTF-specific. A different name may be more appropriate. There was much
> discussion about naming in the proposal for the functionality in LLVM, the
> full thread can be found here:
> 
>    https://lists.llvm.org/pipermail/llvm-dev/2021-June/151023.html
> 
> The name debug_info_annotate, suggested here, might better suit the attribute:
> 
>    https://lists.llvm.org/pipermail/llvm-dev/2021-June/151042.html
> 
> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
> declarations and types will be checked for the corresponding attributes. If
> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
> the annotated type or declaration, one for each tag. These DIEs link the
> arbitrary tag value to the item they annotate.
> 
> For example, the following variable declaration:
> 
>      #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>      #define __decltag1 __attribute__((btf_decl_tag("decl-tag-1")))
>      #define __decltag2 __attribute__((btf_decl_tag("decl-tag-2")))
> 
>      int __typetag1 * x __decltag1 __decltag2;
> 
> Produces the following DIEs:
> 
> <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>      <1f>   DW_AT_name        : x
>      <21>   DW_AT_decl_file   : 1
>      <22>   DW_AT_decl_line   : 6
>      <23>   DW_AT_decl_column : 18
>      <24>   DW_AT_type        : <0x49>
>      <28>   DW_AT_external    : 1
>      <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>      <32>   DW_AT_sibling     : <0x49>
>   <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>      <37>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
>      <3b>   DW_AT_const_value : (indirect string, offset: 0x0): decl-tag-2
>   <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>      <40>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
>      <44>   DW_AT_const_value : (indirect string, offset: 0x1d): decl-tag-1
>   <2><48>: Abbrev Number: 0
>   <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>      <4a>   DW_AT_byte_size   : 8
>      <4b>   DW_AT_type        : <0x5d>
>      <4f>   DW_AT_sibling     : <0x5d>
>   <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>      <54>   DW_AT_name        : (indirect string, offset: 0x28): btf_type_tag
>      <58>   DW_AT_const_value : (indirect string, offset: 0xd7): type-tag-1
>   <2><5c>: Abbrev Number: 0
>   <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>      <5e>   DW_AT_byte_size   : 4
>      <5f>   DW_AT_encoding    : 5	(signed)
>      <60>   DW_AT_name        : int
>   <1><64>: Abbrev Number: 0
> 
> Please note that currently, the annotation DWARF DIEs will be generated only if
> BTF debug information is requested (via -gbtf). Therefore, the annotation DIEs
> will only be output if both BTF and DWARF are requested (e.g. -gbtf -gdwarf).
> This will change, since these tags are needed even when not generating BTF,
> for example in a GCC-built Linux kernel.
> 
> In the case of BTF, the annotations are recorded in two type kinds recently
> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
> The above example declaration produces the following BTF information:
> 
>      [1] int 'int'(1U#B) size=4U#B offset=0UB#b bits=32UB#b SIGNED
>      [2] ptr <anonymous> type=3
>      [3] type_tag 'type-tag-1'(5U#B) type=1
>      [4] decl_tag 'decl-tag-1'(18U#B) type=6 component_idx=-1
>      [5] decl_tag 'decl-tag-2'(29U#B) type=6 component_idx=-1
>      [6] var 'x'(16U#B) type=2 linkage=1 (global)
> 
> 
> Current issues in the implementation
> ====================================
> 
> The __attribute__((btf_type_tag ("foo"))) syntax does not work correctly for
> types involving multiple pointers.
> 
> Consider the following example:
> 
>    #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>    #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>    #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
> 
>    int __typetag1 * __typetag2 __typetag3 * g;
> 
> The current implementation produces the following DWARF:
> 
>   <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>      <1f>   DW_AT_name        : g
>      <21>   DW_AT_decl_file   : 1
>      <22>   DW_AT_decl_line   : 6
>      <23>   DW_AT_decl_column : 42
>      <24>   DW_AT_type        : <0x32>
>      <28>   DW_AT_external    : 1
>      <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>   <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>      <33>   DW_AT_byte_size   : 8
>      <33>   DW_AT_type        : <0x45>
>      <37>   DW_AT_sibling     : <0x45>
>   <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>      <3c>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>      <40>   DW_AT_const_value : (indirect string, offset: 0xc7): type-tag-1
>   <2><44>: Abbrev Number: 0
>   <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>      <46>   DW_AT_byte_size   : 8
>      <46>   DW_AT_type        : <0x61>
>      <4a>   DW_AT_sibling     : <0x61>
>   <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>      <4f>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>      <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
>   <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>      <58>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>      <5c>   DW_AT_const_value : (indirect string, offset: 0xd2): type-tag-2
>   <2><60>: Abbrev Number: 0
>   <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>      <62>   DW_AT_byte_size   : 4
>      <63>   DW_AT_encoding    : 5	(signed)
>      <64>   DW_AT_name        : int
>   <1><68>: Abbrev Number: 0
> 
> This does not agree with the DWARF produced by LLVM/clang for the same case:
> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
> 
> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>      <1f>   DW_AT_name        : (indexed string: 0x3): g
>      <20>   DW_AT_type        : <0x29>
>      <24>   DW_AT_external    : 1
>      <24>   DW_AT_decl_file   : 0
>      <25>   DW_AT_decl_line   : 6
>      <26>   DW_AT_location    : 2 byte block: a1 0 	((Unknown location op 0xa1))
>   <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>      <2a>   DW_AT_type        : <0x35>
>   <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>      <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>      <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>   <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>      <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>      <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>   <2><34>: Abbrev Number: 0
>   <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>      <36>   DW_AT_type        : <0x3e>
>   <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>      <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>      <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>   <2><3d>: Abbrev Number: 0
>   <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>      <3f>   DW_AT_name        : (indexed string: 0x4): int
>      <40>   DW_AT_encoding    : 5	(signed)
>      <41>   DW_AT_byte_size   : 4
>   <1><42>: Abbrev Number: 0
> 
> Notice the structural difference. From the DWARF produced by GCC (i.e. this
> patch series), variable 'g' is a pointer with tag 'type-tag-1' to a pointer
> with tags 'type-tag-2' and 'type-tag-3' to an int. But from the LLVM DWARF,
> variable 'g' is a pointer with tags 'type-tag-2' and 'type-tag-3' to a pointer
> with tag 'type-tag-1' to an int.
> 
> Because GCC produces BTF from the internal DWARF DIE tree, the BTF also differs.
> This can be seen most obviously in the BTF type reference chains:
> 
>    GCC
>      VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
> 
>    LLVM
>      VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
> 
> It seems that the ultimate cause for this is the structure of the TREE
> produced by the C frontend parsing and attribute handling. I believe this may
> be due to differences in __attribute__ syntax parsing between GCC and LLVM.
> 
> This is the TREE for variable 'g':
>    int __typetag1 * __typetag2 __typetag3 * g;
> 
>   <var_decl 0x7ffff7547090 g
>      type <pointer_type 0x7ffff7548000
>          type <pointer_type 0x7ffff75097e0 type <integer_type 0x7ffff74495e8 int>
>              asm_written unsigned DI
>              size <integer_cst 0x7ffff743c450 constant 64>
>              unit-size <integer_cst 0x7ffff743c468 constant 8>
>              align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7450888
>              attributes <tree_list 0x7ffff75275c8
>                  purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>                  value <tree_list 0x7ffff7527550
>                      value <string_cst 0x7ffff75292e0 type <array_type 0x7ffff7509738>
>                          readonly constant static "type-tag-3\000">>
>                  chain <tree_list 0x7ffff75275a0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>                      value <tree_list 0x7ffff75274d8
>                          value <string_cst 0x7ffff75292c0 type <array_type 0x7ffff7509738>
>                              readonly constant static "type-tag-2\000">>>>
>              pointer_to_this <pointer_type 0x7ffff7509888>>
>          asm_written unsigned DI size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>          align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7509930
>          attributes <tree_list 0x7ffff75275f0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>              value <tree_list 0x7ffff7527438
>                  value <string_cst 0x7ffff75292a0 type <array_type 0x7ffff7509738>
>                      readonly constant static "type-tag-1\000">>>>
>      public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>      align:64 warn_if_not_align:0>
> 
> To me this is surprising. I would have expected the int** type of "g" to have
> the tags 'type-tag-2' and 'type-tag-3', and the inner (int*) pointer type to
> have the 'type-tag-1' tag. So far my attempts at resolving this difference in
> the new attribute handlers for the tag attributes have not been successful.
> 
> I do not understand why exactly the attributes are attached in this way. I think
> that it may be related to the pointer cases discussed in the "All other
> attributes" section here:
> 
>    https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
> 
> In particular it seems similar to this example:
> 
>      char *__attribute__((aligned(8))) *f;
> 
>    specifies the type “pointer to 8-byte-aligned pointer to char”. Note again
>    that this does not work with most attributes; for example, the usage of
>    ‘aligned’ and ‘noreturn’ attributes given above is not yet supported.
> 
> I am not sure if this section of the documentation is outdated, if scenarios
> like this one have not been an issue before now, or if there is a way to
> resolve this within the attribute handler. I am by no means an expert in the C
> frontend nor attribute handling, if someone with more knowledge could help me
> understand this case I would be very grateful. :)
> 
> Questions for GCC
> =================
> 
> 1)  How can this issue with the type tags be resolved? Is this a bug or
>      limitation in the attribute parsing that hasn't been an issue until now?
>      Or is it that the above case is somehow a "weird" usage of attributes?
> 
> 2)  Are attributes the right tool for this? Is there some other mechanism that
>      would better fit the design of these tags? In some ways the type tags seem
>      more similar to const/volatile/restrict qualifiers than to most other
>      attributes.
> 
> 
> Questions for LLVM / kernel BPF
> ===============================
> 
> 1)  What special handling does the LLVM frontend/clang do for these attributes?
>      Is there anything specific? Or does it simply follow whatever is default?

The LLVM frontend (clang) only processes these attributes and encodes them
in the AST; the attributes are then encoded in debuginfo.
For btf_type_tag, only tags on a pointee (like int __tag * __tag * var)
are encoded in debuginfo.

> 
> 2)  What is the correct BTF representation for type tags? The documentation for
>      BTF_KIND_TYPE_TAG in linux/Documentation/bpf/btf.rst seems to conflict with
>      the output of clang, and the format change that was discussed here:
>        https://reviews.llvm.org/D113496
>      I assume the kernel btf.rst might simply be outdated, but I want to be sure.

Yes, they should be the same.
The document in linux/Documentation/bpf/btf.rst:

   ptr -> [type_tag]*
       -> [const | volatile | restrict | typedef]*
       -> base_type

is related to BTF format, which is correct.

What btf.rst does not specify is how the above format is
converted to C code; that is covered in
https://reviews.llvm.org/D113496.
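
To make the chain format above concrete, here is a minimal sketch of how a
consumer might walk it. The struct layout and every name in it (struct type,
count_type_tags, skip_type_tags, the ty_* objects) are hypothetical and much
simpler than the real struct btf_type; the data mirrors the LLVM chain for
variable 'g' quoted earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a BTF type record.  The real
   struct btf_type is laid out differently.  */
enum kind { K_INT, K_PTR, K_TYPE_TAG };

struct type {
  enum kind kind;
  const char *name;        /* tag string or type name, NULL for ptr */
  const struct type *ref;  /* next type in the chain, NULL at the end */
};

/* VAR(g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int  (LLVM output) */
const struct type ty_int  = { K_INT,      "int",        NULL };
const struct type ty_tag1 = { K_TYPE_TAG, "type-tag-1", &ty_int };
const struct type ty_ptr1 = { K_PTR,      NULL,         &ty_tag1 };
const struct type ty_tag2 = { K_TYPE_TAG, "type-tag-2", &ty_ptr1 };
const struct type ty_tag3 = { K_TYPE_TAG, "type-tag-3", &ty_tag2 };
const struct type ty_g    = { K_PTR,      NULL,         &ty_tag3 };

/* Count the type tags that directly follow pointer PTR.  */
size_t
count_type_tags (const struct type *ptr)
{
  size_t n = 0;
  assert (ptr != NULL && ptr->kind == K_PTR);
  for (const struct type *t = ptr->ref;
       t != NULL && t->kind == K_TYPE_TAG; t = t->ref)
    n++;
  return n;
}

/* Return the first type after PTR that is not a type tag, i.e. the
   type the tags apply to.  */
const struct type *
skip_type_tags (const struct type *ptr)
{
  const struct type *t;
  assert (ptr != NULL && ptr->kind == K_PTR);
  t = ptr->ref;
  while (t != NULL && t->kind == K_TYPE_TAG)
    t = t->ref;
  return t;
}
```

With this data, count_type_tags (&ty_g) sees the two tags on the outer
pointer's pointee, and skip_type_tags (&ty_g) lands on the inner pointer.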

> 
> 3)  Is the ordering of multiple type tags on the same type important?
>      e.g. for this variable:
>          int __tag1 __tag2 __tag3 * b;
> 
>      would it be "correct" (or at least, acceptable) to produce:
>          VAR(b) -> ptr -> tag2 -> tag3 -> tag1 -> int
> 
>      or _must_ it be:
>          VAR(b) -> ptr -> tag3 -> tag2 -> tag1 -> int
> 
>      In the DWARF representation, all tags are equal sibling children of the type
>      they annotate, so this 'ordering' problem seems like it only arises because of
>      the BTF format for type tags.

No. They are all independent modifiers to the pointee. So any ordering
in the above should be correct.

> 
> 4)  Are types with the same tags in different orders considered distinct types?
>      I think the answer is "no", but given the format of the tags in BTF we get
>      distinct chains for the types I am curious.
>      e.g.
>          int __tag1 __tag2 * x;
>          int __tag2 __tag1 * y;
> 
>      produces
>          VAR(x) -> ptr -> tag2 -> tag1 -> int
>          VAR(y) -> ptr -> tag1 -> tag2 -> int
> 
>      but would
>          VAR(y) -> ptr -> tag2 -> tag1 -> int
> 
>      be just as correct?

Yes,
   VAR(y) -> ptr -> tag2 -> tag1 should be correct,
   although the original order is preferred, since
   when we generate vmlinux.h we would like the
   type definition to be as close to the original
   type definition as possible.
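
If a consumer wants to treat the chains for 'x' and 'y' above as carrying the
same tags, one option is to compare them as unordered sets. The sketch below
is only an illustration: same_tag_set, has_tag, tags_x and tags_y are
hypothetical names, and it assumes a tag string never appears twice in the
same list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tags in chain order for the example above:
   VAR(x) -> ptr -> tag2 -> tag1 -> int
   VAR(y) -> ptr -> tag1 -> tag2 -> int  */
const char *const tags_x[] = { "tag2", "tag1" };
const char *const tags_y[] = { "tag1", "tag2" };

/* Return true if NAME occurs among the first N entries of TAGS.  */
static bool
has_tag (const char *const *tags, size_t n, const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < n; i++)
    if (strcmp (tags[i], name) == 0)
      return true;
  return false;
}

/* Compare two tag lists while ignoring order.  Assumes that neither
   list contains the same string twice.  */
bool
same_tag_set (const char *const *a, size_t na,
              const char *const *b, size_t nb)
{
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (!has_tag (b, nb, a[i]))
      return false;
  return true;
}
```

Here same_tag_set (tags_x, 2, tags_y, 2) is true even though the two chains
list the tags in opposite order. A tool that regenerates C declarations would
still want to keep the original order, as noted above.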

> 
> 5)  According to the clang docs, type tags are currently ignored for non-pointer
>      types. Is pointer tagging e.g. '__user' the only use case so far?
> 
>      This GCC implementation allows type tags on non-pointer types. Such tags
>      can be represented in the DWARF but don't make much sense in BTF output,
>      e.g.
> 
>          struct __typetag1 S {
>              int a;
>              int b;
>          } __decltag1;
> 
>          struct S my_s;
> 
>      This will produce a type tag child DIE of S. In the current implementation,
>      it will also produce a BTF type tag type, which refers to the __decltag1 BTF
>      decl tag, which in turn refers to the struct type.  But nothing refers to
>      the type tag type, currently variable my_s in BTF refers to the struct type
>      directly.
> 
>      In my opinion, the DWARF here is useful but the BTF seems odd. What would be
>      "correct" BTF in such a case?

Currently in LLVM, __typetag1 will not be encoded in DWARF.

> 
> 6)  Would LLVM be open to changing the name of the attribute, for example to
>      'debug_info_annotate' (or any other suggestion)? The use cases for these
>      tags have grown (e.g. drgn) since they were originally proposed, and the
>      scope is no longer limited to BTF.
> 
>      The kernel eBPF developers have said they can accommodate whatever name we
>      would like to use. So although we in GCC are not tied to the name LLVM
>      uses, it would be ideal for everyone to use the same attribute name.

The attribute names, especially 'btf_type_tag', are already used in the Linux
kernel, so it would be great if we could use the same name.
I am not sure whether GCC supports attribute aliases; maybe it does?
Clang doesn't support them, though.
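
Independently of which name the compilers settle on, user code can hide the
attribute behind a macro and test for it with __has_attribute, so that only
one definition changes if the name does. The following is a minimal sketch
under that assumption; EXAMPLE_USER and example_strlen are hypothetical names
and this is not the kernel's actual definition of __user.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may not provide __has_attribute at all.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Expand to the type tag when the compiler knows the attribute, and
   to nothing otherwise.  Only this block depends on the name.  */
#if __has_attribute(btf_type_tag)
# define EXAMPLE_USER __attribute__((btf_type_tag("user")))
#else
# define EXAMPLE_USER
#endif

/* The tag annotates the pointee type of S.  It is only recorded in
   debug info and does not change the generated code.  */
size_t
example_strlen (const char EXAMPLE_USER *s)
{
  size_t n = 0;
  assert (s != NULL);
  while (s[n] != '\0')
    n++;
  return n;
}
```

The function behaves the same whether or not the attribute is available; the
difference shows up only in the DWARF/BTF emitted for its parameter type.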


[...]


* Re: [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-04-04 22:13 ` [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations Yonghong Song
@ 2022-04-05 16:26   ` David Faust
  0 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-04-05 16:26 UTC (permalink / raw)
  To: Yonghong Song, gcc-patches



On 4/4/22 15:13, Yonghong Song wrote:
> 
> 
> On 4/1/22 12:42 PM, David Faust wrote:
>> Hello,
>>
>> This patch series is a first attempt at adding support for:
>>
>> - Two new C-language-level attributes that allow to associate (to "tag")
>>     particular declarations and types with arbitrary strings. As explained below,
>>     this is intended to be used to, for example, characterize certain pointer
>>     types.
>>
>> - The conveyance of that information in the DWARF output in the form of a new
>>     DIE: DW_TAG_GNU_annotation.
>>
>> - The conveyance of that information in the BTF output in the form of two new
>>     kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>
>> All of these facilities are being added to the eBPF ecosystem, and support for
>> them exists in some form in LLVM. However, as we shall see, we have found some
>> problems implementing them so some discussion is in order.
>>
>> Purpose
>> =======
>>
>> 1)  Addition of C-family language constructs (attributes) to specify free-text
>>       tags on certain language elements, such as struct fields.
>>
>>       The purpose of these annotations is to provide additional information about
>>       types, variables, and function parameters of interest to the kernel. A
>>       driving use case is to tag pointer types within the linux kernel and eBPF
>>       programs with additional semantic information, such as '__user' or '__rcu'.
>>
>>       For example, consider the linux kernel function do_execve with the
>>       following declaration:
>>
>>         static int do_execve(struct filename *filename,
>>            const char __user *const __user *__argv,
>>            const char __user *const __user *__envp);
>>
>>       Here, __user could be defined with these annotations to record semantic
>>       information about the pointer parameters (e.g., they are user-provided) in
>>       DWARF and BTF information. Other kernel facilities such as the eBPF verifier
>>       can read the tags and make use of the information.
>>
>> 2)  Conveying the tags in the generated DWARF debug info.
>>
>>       The main motivation for emitting the tags in DWARF is that the Linux kernel
>>       generates its BTF information via pahole, using DWARF as a source:
>>
>>           +--------+  BTF                  BTF   +----------+
>>           | pahole |-------> vmlinux.btf ------->| verifier |
>>           +--------+                             +----------+
>>               ^                                        ^
>>               |                                        |
>>         DWARF |                                    BTF |
>>               |                                        |
>>            vmlinux                              +-------------+
>>            module1.ko                           | BPF program |
>>            module2.ko                           +-------------+
>>              ...
>>
>>       This is because:
>>
>>       a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>
>>       b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>>           support for linking/deduplicating BTF in the linker.
>>
>>       In the scenario above, the verifier needs access to the pointer tags of
>>       both the kernel types/declarations (conveyed in the DWARF and translated
>>       to BTF by pahole) and those of the BPF program (available directly in BTF).
>>
>>       Another motivation for having the tag information in DWARF, unrelated to
>>       BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>>       to benefit from these tags in order to differentiate between different
>>       kinds of pointers in the kernel.
>>
>> 3)  Conveying the tags in the generated BTF debug info.
>>
>>       This is easy: the main purpose of having this info in BTF is for the
>>       compiled eBPF programs. The kernel verifier can then access the tags
>>       of pointers used by the eBPF programs.
>>
>>
>> For more information about these tags and the motivation behind them, please
>> refer to the following linux kernel discussions:
>>
>>     https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
>>     https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
>>     https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/
>>
>>
>> What is in this patch series
>> ============================
>>
>> This patch series adds support for these annotations in GCC. The implementation
>> is largely complete. However, in some cases the produced debug info (both DWARF
>> and BTF) differs significantly from that produced by LLVM. This issue is
>> discussed in detail below, along with a few specific questions for both GCC and
>> LLVM. Any input would be much appreciated.
> 
> Hi, David, Thanks for the RFC implementation! I will answer your
> questions related to llvm and kernel.
> 

Hi Yonghong, thanks for the answers!

>>
>>
>> Implementation Overview
>> =======================
>>
>> To enable these annotations, two new C language attributes are added:
>> __attribute__((btf_decl_tag("foo"))) and __attribute__((btf_type_tag("bar"))).
>> Both attributes accept a single arbitrary string constant argument, which will
>> be recorded in the generated DWARF and/or BTF debugging information. They have
>> no effect on code generation.
>>
>> Note that we are using the same attribute names as LLVM, which include "btf"
>> in the name. This may be controversial, as these tags are not really
>> BTF-specific. A different name may be more appropriate. There was much
>> discussion about naming in the proposal for the functionality in LLVM, the
>> full thread can be found here:
>>
>>     https://lists.llvm.org/pipermail/llvm-dev/2021-June/151023.html
>>
>> The name debug_info_annotate, suggested here, might better suit the attribute:
>>
>>     https://lists.llvm.org/pipermail/llvm-dev/2021-June/151042.html
>>
>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
>> declarations and types will be checked for the corresponding attributes. If
>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
>> the annotated type or declaration, one for each tag. These DIEs link the
>> arbitrary tag value to the item they annotate.
>>
>> For example, the following variable declaration:
>>
>>       #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>       #define __decltag1 __attribute__((btf_decl_tag("decl-tag-1")))
>>       #define __decltag2 __attribute__((btf_decl_tag("decl-tag-2")))
>>
>>       int __typetag1 * x __decltag1 __decltag2;
>>
>> Produces the following DIEs:
>>
>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>       <1f>   DW_AT_name        : x
>>       <21>   DW_AT_decl_file   : 1
>>       <22>   DW_AT_decl_line   : 6
>>       <23>   DW_AT_decl_column : 18
>>       <24>   DW_AT_type        : <0x49>
>>       <28>   DW_AT_external    : 1
>>       <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>>       <32>   DW_AT_sibling     : <0x49>
>>    <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <37>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
>>       <3b>   DW_AT_const_value : (indirect string, offset: 0x0): decl-tag-2
>>    <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <40>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
>>       <44>   DW_AT_const_value : (indirect string, offset: 0x1d): decl-tag-1
>>    <2><48>: Abbrev Number: 0
>>    <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>       <4a>   DW_AT_byte_size   : 8
>>       <4b>   DW_AT_type        : <0x5d>
>>       <4f>   DW_AT_sibling     : <0x5d>
>>    <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <54>   DW_AT_name        : (indirect string, offset: 0x28): btf_type_tag
>>       <58>   DW_AT_const_value : (indirect string, offset: 0xd7): type-tag-1
>>    <2><5c>: Abbrev Number: 0
>>    <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>       <5e>   DW_AT_byte_size   : 4
>>       <5f>   DW_AT_encoding    : 5	(signed)
>>       <60>   DW_AT_name        : int
>>    <1><64>: Abbrev Number: 0
>>
>> Please note that currently, the annotation DWARF DIEs will be generated only if
>> BTF debug information is requested (via -gbtf). Therefore, the annotation DIEs
>> will only be output if both BTF and DWARF are requested (e.g. -gbtf -gdwarf).
>> This will change, since these tags are needed even when not generating BTF,
>> for example in a GCC-built Linux kernel.
>>
>> In the case of BTF, the annotations are recorded in two type kinds recently
>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>> The above example declaration produces the following BTF information:
>>
>>       [1] int 'int'(1U#B) size=4U#B offset=0UB#b bits=32UB#b SIGNED
>>       [2] ptr <anonymous> type=3
>>       [3] type_tag 'type-tag-1'(5U#B) type=1
>>       [4] decl_tag 'decl-tag-1'(18U#B) type=6 component_idx=-1
>>       [5] decl_tag 'decl-tag-2'(29U#B) type=6 component_idx=-1
>>       [6] var 'x'(16U#B) type=2 linkage=1 (global)
>>
>>
>> Current issues in the implementation
>> ====================================
>>
>> The __attribute__((btf_type_tag ("foo"))) syntax does not work correctly for
>> types involving multiple pointers.
>>
>> Consider the following example:
>>
>>     #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>     #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>     #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>
>>     int __typetag1 * __typetag2 __typetag3 * g;
>>
>> The current implementation produces the following DWARF:
>>
>>    <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>>       <1f>   DW_AT_name        : g
>>       <21>   DW_AT_decl_file   : 1
>>       <22>   DW_AT_decl_line   : 6
>>       <23>   DW_AT_decl_column : 42
>>       <24>   DW_AT_type        : <0x32>
>>       <28>   DW_AT_external    : 1
>>       <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>>    <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>       <33>   DW_AT_byte_size   : 8
>>       <33>   DW_AT_type        : <0x45>
>>       <37>   DW_AT_sibling     : <0x45>
>>    <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <3c>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>>       <40>   DW_AT_const_value : (indirect string, offset: 0xc7): type-tag-1
>>    <2><44>: Abbrev Number: 0
>>    <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>       <46>   DW_AT_byte_size   : 8
>>       <46>   DW_AT_type        : <0x61>
>>       <4a>   DW_AT_sibling     : <0x61>
>>    <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <4f>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>>       <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
>>    <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <58>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>>       <5c>   DW_AT_const_value : (indirect string, offset: 0xd2): type-tag-2
>>    <2><60>: Abbrev Number: 0
>>    <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>>       <62>   DW_AT_byte_size   : 4
>>       <63>   DW_AT_encoding    : 5	(signed)
>>       <64>   DW_AT_name        : int
>>    <1><68>: Abbrev Number: 0
>>
>> This does not agree with the DWARF produced by LLVM/clang for the same case:
>> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
>>
>> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>>       <1f>   DW_AT_name        : (indexed string: 0x3): g
>>       <20>   DW_AT_type        : <0x29>
>>       <24>   DW_AT_external    : 1
>>       <24>   DW_AT_decl_file   : 0
>>       <25>   DW_AT_decl_line   : 6
>>       <26>   DW_AT_location    : 2 byte block: a1 0 	((Unknown location op 0xa1))
>>    <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>       <2a>   DW_AT_type        : <0x35>
>>    <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>>       <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>       <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>>    <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>>       <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>       <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>>    <2><34>: Abbrev Number: 0
>>    <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>       <36>   DW_AT_type        : <0x3e>
>>    <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>>       <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>       <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>>    <2><3d>: Abbrev Number: 0
>>    <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>>       <3f>   DW_AT_name        : (indexed string: 0x4): int
>>       <40>   DW_AT_encoding    : 5	(signed)
>>       <41>   DW_AT_byte_size   : 4
>>    <1><42>: Abbrev Number: 0
>>
>> Notice the structural difference. From the DWARF produced by GCC (i.e. this
>> patch series), variable 'g' is a pointer with tag 'type-tag-1' to a pointer
>> with tags 'type-tag-2' and 'type-tag-3' to an int. But from the LLVM DWARF,
>> variable 'g' is a pointer with tags 'type-tag-2' and 'type-tag-3' to a pointer
>> with tag 'type-tag-1' to an int.
>>
>> Because GCC produces BTF from the internal DWARF DIE tree, the BTF also differs.
>> This can be seen most obviously in the BTF type reference chains:
>>
>>     GCC
>>       VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>
>>     LLVM
>>       VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>
>> It seems that the ultimate cause for this is the structure of the TREE
>> produced by the C frontend parsing and attribute handling. I believe this may
>> be due to differences in __attribute__ syntax parsing between GCC and LLVM.
>>
>> This is the TREE for variable 'g':
>>     int __typetag1 * __typetag2 __typetag3 * g;
>>
>>    <var_decl 0x7ffff7547090 g
>>       type <pointer_type 0x7ffff7548000
>>           type <pointer_type 0x7ffff75097e0 type <integer_type 0x7ffff74495e8 int>
>>               asm_written unsigned DI
>>               size <integer_cst 0x7ffff743c450 constant 64>
>>               unit-size <integer_cst 0x7ffff743c468 constant 8>
>>               align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7450888
>>               attributes <tree_list 0x7ffff75275c8
>>                   purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>                   value <tree_list 0x7ffff7527550
>>                       value <string_cst 0x7ffff75292e0 type <array_type 0x7ffff7509738>
>>                           readonly constant static "type-tag-3\000">>
>>                   chain <tree_list 0x7ffff75275a0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>                       value <tree_list 0x7ffff75274d8
>>                           value <string_cst 0x7ffff75292c0 type <array_type 0x7ffff7509738>
>>                               readonly constant static "type-tag-2\000">>>>
>>               pointer_to_this <pointer_type 0x7ffff7509888>>
>>           asm_written unsigned DI size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>           align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7509930
>>           attributes <tree_list 0x7ffff75275f0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>               value <tree_list 0x7ffff7527438
>>                   value <string_cst 0x7ffff75292a0 type <array_type 0x7ffff7509738>
>>                       readonly constant static "type-tag-1\000">>>>
>>       public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>       align:64 warn_if_not_align:0>
>>
>> To me this is surprising. I would have expected the int** type of "g" to have
>> the tags 'type-tag-2' and 'type-tag-3', and the inner (int*) pointer type to
>> have the 'type-tag-1' tag. So far my attempts at resolving this difference in
>> the new attribute handlers for the tag attributes have not been successful.
>>
>> I do not understand why exactly the attributes are attached in this way. I think
>> that it may be related to the pointer cases discussed in the "All other
>> attributes" section here:
>>
>>     https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
>>
>> In particular it seems similar to this example:
>>
>>       char *__attribute__((aligned(8))) *f;
>>
>>     specifies the type “pointer to 8-byte-aligned pointer to char”. Note again
>>     that this does not work with most attributes; for example, the usage of
>>     ‘aligned’ and ‘noreturn’ attributes given above is not yet supported.
>>
>> I am not sure if this section of the documentation is outdated, if scenarios
>> like this one have not been an issue before now, or if there is a way to
>> resolve this within the attribute handler. I am by no means an expert in the C
>> frontend or attribute handling; if someone with more knowledge could help me
>> understand this case I would be very grateful. :)
>>
>> Questions for GCC
>> =================
>>
>> 1)  How can this issue with the type tags be resolved? Is this a bug or
>>       limitation in the attribute parsing that hasn't been an issue until now?
>>       Or is it that the above case is somehow a "weird" usage of attributes?
>>
>> 2)  Are attributes the right tool for this? Is there some other mechanism that
>>       would better fit the design of these tags? In some ways the type tags seem
>>       more similar to const/volatile/restrict qualifiers than to most other
>>       attributes.
>>
>>
>> Questions for LLVM / kernel BPF
>> ===============================
>>
>> 1)  What special handling does the LLVM frontend/clang do for these attributes?
>>       Is there anything specific? Or does it simply follow whatever is default?
> 
> the llvm frontend/clang only processed these attributes and encoded them
> in AST, and then only these attributes are encoded in debuginfo.
> For btf_type_tag, only tags to pointee (like int __tag * __tag * var)
> are encoded in debuginfo.

OK. So the btf_type_tag attribute can be applied to non-pointer types, but 
it will be effectively ignored.

> 
>>
>> 2)  What is the correct BTF representation for type tags? The documentation for
>>       BTF_KIND_TYPE_TAG in linux/Documentation/bpf/btf.rst seems to conflict with
>>       the output of clang, and the format change that was discussed here:
>>         https://reviews.llvm.org/D113496
>>       I assume the kernel btf.rst might simply be outdated, but I want to be sure.
> 
> Yes, they should be the same.
> The document in linux/Documentation/bpf/btf.rst:
> 
>     ptr -> [type_tag]*
>         -> [const | volatile | restrict | typedef]*
>         -> base_type
> 
> is related to BTF format, which is correct.
> 
> What is not specified is how the following format is
> converted to C code, which is also specified in
> https://reviews.llvm.org/D113496.
> 
>>
>> 3)  Is the ordering of multiple type tags on the same type important?
>>       e.g. for this variable:
>>           int __tag1 __tag2 __tag3 * b;
>>
>>       would it be "correct" (or at least, acceptable) to produce:
>>           VAR(b) -> ptr -> tag2 -> tag3 -> tag1 -> int
>>
>>       or _must_ it be:
>>           VAR(b) -> ptr -> tag3 -> tag2 -> tag1 -> int
>>
>>       In the DWARF representation, all tags are equal sibling children of the type
>>       they annotate, so this 'ordering' problem seems like it only arises because of
>>       the BTF format for type tags.
> 
> No. They are all independent modifiers to the pointee. So any ordering
> in the above should be correct.

OK, thanks.

> 
>>
>> 4)  Are types with the same tags in different orders considered distinct types?
>>       I think the answer is "no", but given the format of the tags in BTF we get
>>       distinct chains for the types, so I am curious.
>>       e.g.
>>           int __tag1 __tag2 * x;
>>           int __tag2 __tag1 * y;
>>
>>       produces
>>           VAR(x) -> ptr -> tag2 -> tag1 -> int
>>           VAR(y) -> ptr -> tag1 -> tag2 -> int
>>
>>       but would
>>           VAR(y) -> ptr -> tag2 -> tag1 -> int
>>
>>       be just as correct?
> 
> Yes,
>     VAR(y) -> ptr -> tag2 -> tag1 should be correct
>     although the original order is preferred since
>     when we generate vmlinux.h we would like the
>     type definition as close to the original type
>     definition as possible.

I see. Different orderings are technically correct but there is one 
preferred format. Thanks for the clarification.
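
To make the "any ordering is correct" point concrete, here is a minimal
sketch (not part of the patch series) of how a consumer could compare two
type tag chains as unordered sets. The function names has_tag and
same_tag_set are hypothetical, and the tags are assumed to be plain C
strings already extracted from the BTF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if tag `t` occurs among the `n` tags. */
static bool has_tag(const char *const *tags, size_t n, const char *t)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tags[i], t) == 0)
            return true;
    return false;
}

/* Hypothetical helper: true if chains `a` and `b` carry the same set of
   tags, ignoring the order in which the tags appear in each chain. */
bool same_tag_set(const char *const *a, size_t na,
                  const char *const *b, size_t nb)
{
    assert(a != NULL && b != NULL);

    /* Every tag of `a` must appear in `b` ... */
    for (size_t i = 0; i < na; i++)
        if (!has_tag(b, nb, a[i]))
            return false;

    /* ... and every tag of `b` must appear in `a`. */
    for (size_t i = 0; i < nb; i++)
        if (!has_tag(a, na, b[i]))
            return false;

    return true;
}
```

With the chains from question 4, { "tag2", "tag1" } for x and
{ "tag1", "tag2" } for y, same_tag_set reports the two as equal, which
matches the answer that both orderings describe the same pointee.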

> 
>>
>> 5)  According to the clang docs, type tags are currently ignored for non-pointer
>>       types. Is pointer tagging e.g. '__user' the only use case so far?
>>
>>       This GCC implementation allows type tags on non-pointer types. Such tags
>>       can be represented in the DWARF but don't make much sense in BTF output,
>>       e.g.
>>
>>           struct __typetag1 S {
>>               int a;
>>               int b;
>>           } __decltag1;
>>
>>           struct S my_s;
>>
>>       This will produce a type tag child DIE of S. In the current implementation,
>>       it will also produce a BTF type tag type, which refers to the __decltag1 BTF
>>       decl tag, which in turn refers to the struct type.  But nothing refers to
>>       the type tag type; currently variable my_s in BTF refers to the struct type
>>       directly.
>>
>>       In my opinion, the DWARF here is useful but the BTF seems odd. What would be
>>       "correct" BTF in such a case?
> 
> Currently in llvm, __typetag1 will not be encoded in dwarf.

OK. Related to 1, type tags on non-pointer types have no effect on debug 
info generation. Got it.

> 
>>
>> 6)  Would LLVM be open to changing the name of the attribute, for example to
>>       'debug_info_annotate' (or any other suggestion)? The use cases for these
>>       tags have grown (e.g. drgn) since they were originally proposed, and the
>>       scope is no longer limited to BTF.
>>
>>       The kernel eBPF developers have said they can accommodate whatever name we
>>       would like to use. So although we in GCC are not tied to the name LLVM
>>       uses, it would be ideal for everyone to use the same attribute name.
> 
> The attribute name, esp. 'btf_type_tag' has been used in the linux
> kernel. So it would be great if we can use the same name.
> Not sure whether gcc supports it or not; maybe it has attribute aliases?
> clang doesn't support it though.
>

OK. Will certainly keep this in mind. As I understand it, the kernel could 
accommodate (via compiler.h) if we settle on a different name for GCC, 
but I agree using the same name between both compilers would be ideal.
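
For illustration only, a header-level accommodation could look roughly
like the sketch below. DEMO_TYPE_TAG and read_tagged are hypothetical
names (not what the kernel actually defines), and the sketch assumes the
compiler provides __has_attribute; where btf_type_tag is unavailable the
macro expands to nothing, so the code still builds.

```c
#include <assert.h>

/* DEMO_TYPE_TAG is a hypothetical macro name used only for this sketch. */
#if defined(__has_attribute)
# if __has_attribute(btf_type_tag)
#  define DEMO_TYPE_TAG(s) __attribute__((btf_type_tag(s)))
# endif
#endif
#ifndef DEMO_TYPE_TAG
# define DEMO_TYPE_TAG(s) /* attribute unavailable: expands to nothing */
#endif

/* Reads through a pointer whose pointee type carries the tag "user".
   The tag only affects debug info; it has no effect on code generation. */
int read_tagged(const int DEMO_TYPE_TAG("user") *p)
{
    assert(p != 0);
    return *p;
}
```

If a different attribute name were chosen for GCC, only the macro
definition would need a second branch; users of the macro would not
change.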

Thanks

> 
> [...]


* [ping][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
                   ` (8 preceding siblings ...)
  2022-04-04 22:13 ` [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations Yonghong Song
@ 2022-04-18 19:36 ` David Faust
  2022-05-02 16:57   ` [ping2][PATCH " David Faust
  9 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-04-18 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jose E. Marchesi, Yonghong Song

Gentle ping :)

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html

The series adds support for new attributes btf_type_tag and btf_decl_tag, 
for recording arbitrary string tags in DWARF and BTF debug info. The 
feature is to support kernel use cases.

Thanks,
David

On 4/1/22 12:42, David Faust via Gcc-patches wrote:
> [...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-04-18 19:36 ` [ping][PATCH " David Faust
@ 2022-05-02 16:57   ` David Faust
  2022-05-03 22:32     ` Joseph Myers
  0 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-02 16:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jose E. Marchesi, Yonghong Song

Pinging this series again.
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html

This series adds new C-family frontend attributes for recording string 
"tags" in DWARF and BTF debug info to support kernel use cases.

There remains one issue in the implementation which has not been 
resolved, which I hope someone in the GCC community may be able to shed 
some light on.

Specifically, it is related to how GCC parses the attributes: In cases 
where the new btf_type_tag attribute (which applies to types) is 
specified multiple times on different intermediate pointer types of a 
declaration, GCC seems to attach the attributes to the TREEs in an 
unexpected order.

As a result the behavior of the attribute in GCC cannot be reconciled 
with its definition in BTF or behavior in the clang compiler.

Consider the following example:

    #define __typetag1 __attribute__((btf_type_tag("tag1")))
    #define __typetag2 __attribute__((btf_type_tag("tag2")))
    #define __typetag3 __attribute__((btf_type_tag("tag3")))

    int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 
'tag3', to a pointer with tag 'tag1' to an int". i.e.:

   <var decl g
     <type pointer_type
       <attributes 'tag2', 'tag3'>
       <type pointer_type
         <attributes 'tag1'>
         <type int>>>>

But GCC's attribute parsing produces a variable 'g' which is "a pointer 
with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.

   <var decl g
     <type pointer_type
       <attributes 'tag1'>
       <type pointer_type
         <attributes 'tag2', 'tag3'>
         <type int>>>>

As a result, the DWARF and BTF generated for cases like this agree with 
neither the BTF type tag specification nor the output of the clang 
compiler, which already supports this feature.

(Please refer to the "Current issues in implementation" section in the 
series cover letter for the full details of this example.)
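
For anyone who wants to compile the example with a compiler that may lack 
the attribute, one possible guard is sketched below. TYPE_TAG and 
g_is_null are hypothetical names, not part of the series. The sketch 
assumes the compiler provides __has_attribute (GCC 5 and later, and 
clang, do) and expands the macro to nothing otherwise, so the tags are 
silently dropped on such compilers.

```c
/* Hypothetical helper macro (not a name used by the patch series):
   expands to the attribute only when the compiler reports support. */
#if defined(__has_attribute)
#  if __has_attribute(btf_type_tag)
#    define TYPE_TAG(s) __attribute__((btf_type_tag(s)))
#  endif
#endif
#ifndef TYPE_TAG
#  define TYPE_TAG(s) /* attribute unavailable: tag is dropped */
#endif

/* Same declaration as in the example above, written with the guard. */
int TYPE_TAG("tag1") * TYPE_TAG("tag2") TYPE_TAG("tag3") * g;

/* The tags do not affect code generation; g is an ordinary
   zero-initialized global pointer, so this returns 1. */
int g_is_null(void)
{
    return g == 0;
}
```

The guard only decides whether the attribute is emitted; it does not 
change which type each tag attaches to.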

So far I have been unable to resolve this issue in the btf_type_tag 
attribute handler. It seems to me that the cause must be "higher up" in 
the C frontend attribute parsing but I am not familiar with this area of 
GCC.

Any insight into understanding this issue or comments elsewhere in the 
series would be most welcome.

Thanks,
David

On 4/18/22 12:36, David Faust via Gcc-patches wrote:
> Gentle ping :)
> 
> Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html
> 
> The series adds support for new attributes btf_type_tag and btf_decl_tag,
> for recording arbitrary string tags in DWARF and BTF debug info. The
> feature is to support kernel use cases.
> 
> Thanks,
> David
> 
> On 4/1/22 12:42, David Faust via Gcc-patches wrote:
>> Hello,
>>
>> This patch series is a first attempt at adding support for:
>>
>> - Two new C-language-level attributes that allow to associate (to "tag")
>>     particular declarations and types with arbitrary strings. As explained below,
>>     this is intended to be used to, for example, characterize certain pointer
>>     types.
>>
>> - The conveyance of that information in the DWARF output in the form of a new
>>     DIE: DW_TAG_GNU_annotation.
>>
>> - The conveyance of that information in the BTF output in the form of two new
>>     kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>
>> All of these facilities are being added to the eBPF ecosystem, and support for
>> them exists in some form in LLVM. However, as we shall see, we have found some
>> problems implementing them so some discussion is in order.
>>
>> Purpose
>> =======
>>
>> 1)  Addition of C-family language constructs (attributes) to specify free-text
>>       tags on certain language elements, such as struct fields.
>>
>>       The purpose of these annotations is to provide additional information about
>>       types, variables, and function parameters of interest to the kernel. A
>>       driving use case is to tag pointer types within the linux kernel and eBPF
>>       programs with additional semantic information, such as '__user' or '__rcu'.
>>
>>       For example, consider the linux kernel function do_execve with the
>>       following declaration:
>>
>>         static int do_execve(struct filename *filename,
>>            const char __user *const __user *__argv,
>>            const char __user *const __user *__envp);
>>
>>       Here, __user could be defined with these annotations to record semantic
>>       information about the pointer parameters (e.g., they are user-provided) in
>>       DWARF and BTF information. Other kernel facilities such as the eBPF verifier
>>       can read the tags and make use of the information.
>>
>> 2)  Conveying the tags in the generated DWARF debug info.
>>
>>       The main motivation for emitting the tags in DWARF is that the Linux kernel
>>       generates its BTF information via pahole, using DWARF as a source:
>>
>>           +--------+  BTF                  BTF   +----------+
>>           | pahole |-------> vmlinux.btf ------->| verifier |
>>           +--------+                             +----------+
>>               ^                                        ^
>>               |                                        |
>>         DWARF |                                    BTF |
>>               |                                        |
>>            vmlinux                              +-------------+
>>            module1.ko                           | BPF program |
>>            module2.ko                           +-------------+
>>              ...
>>
>>       This is because:
>>
>>       a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>
>>       b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>>           support for linking/deduplicating BTF in the linker.
>>
>>       In the scenario above, the verifier needs access to the pointer tags of
>>       both the kernel types/declarations (conveyed in the DWARF and translated
>>       to BTF by pahole) and those of the BPF program (available directly in BTF).
>>
>>       Another motivation for having the tag information in DWARF, unrelated to
>>       BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>>       to benefit from these tags in order to differentiate between different
>>       kinds of pointers in the kernel.
>>
>> 3)  Conveying the tags in the generated BTF debug info.
>>
>>       This is easy: the main purpose of having this info in BTF is for the
>>       compiled eBPF programs. The kernel verifier can then access the tags
>>       of pointers used by the eBPF programs.
>>
>>
>> For more information about these tags and the motivation behind them, please
>> refer to the following linux kernel discussions:
>>
>>     https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
>>     https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
>>     https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/
>>
>>
>> What is in this patch series
>> ============================
>>
>> This patch series adds support for these annotations in GCC. The implementation
>> is largely complete. However, in some cases the produced debug info (both DWARF
>> and BTF) differs significantly from that produced by LLVM. This issue is
>> discussed in detail below, along with a few specific questions for both GCC and
>> LLVM. Any input would be much appreciated.
>>
>>
>> Implementation Overview
>> =======================
>>
>> To enable these annotations, two new C language attributes are added:
>> __attribute__((btf_decl_tag("foo"))) and __attribute__((btf_type_tag("bar"))).
>> Both attributes accept a single arbitrary string constant argument, which will
>> be recorded in the generated DWARF and/or BTF debugging information. They have
>> no effect on code generation.
>>
>> Note that we are using the same attribute names as LLVM, which include "btf"
>> in the name. This may be controversial, as these tags are not really
>> BTF-specific. A different name may be more appropriate. There was much
>> discussion about naming in the proposal for the functionality in LLVM, the
>> full thread can be found here:
>>
>>     https://lists.llvm.org/pipermail/llvm-dev/2021-June/151023.html
>>
>> The name debug_info_annotate, suggested here, might better suit the attribute:
>>
>>     https://lists.llvm.org/pipermail/llvm-dev/2021-June/151042.html
>>
>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
>> declarations and types will be checked for the corresponding attributes. If
>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
>> the annotated type or declaration, one for each tag. These DIEs link the
>> arbitrary tag value to the item they annotate.
>>
>> For example, the following variable declaration:
>>
>>       #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>       #define __decltag1 __attribute__((btf_decl_tag("decl-tag-1")))
>>       #define __decltag2 __attribute__((btf_decl_tag("decl-tag-2")))
>>
>>       int __typetag1 * x __decltag1 __decltag2;
>>
>> Produces the following DIEs:
>>
>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>       <1f>   DW_AT_name        : x
>>       <21>   DW_AT_decl_file   : 1
>>       <22>   DW_AT_decl_line   : 6
>>       <23>   DW_AT_decl_column : 18
>>       <24>   DW_AT_type        : <0x49>
>>       <28>   DW_AT_external    : 1
>>       <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>>       <32>   DW_AT_sibling     : <0x49>
>>    <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <37>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
>>       <3b>   DW_AT_const_value : (indirect string, offset: 0x0): decl-tag-2
>>    <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <40>   DW_AT_name        : (indirect string, offset: 0x10): btf_decl_tag
>>       <44>   DW_AT_const_value : (indirect string, offset: 0x1d): decl-tag-1
>>    <2><48>: Abbrev Number: 0
>>    <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>       <4a>   DW_AT_byte_size   : 8
>>       <4b>   DW_AT_type        : <0x5d>
>>       <4f>   DW_AT_sibling     : <0x5d>
>>    <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <54>   DW_AT_name        : (indirect string, offset: 0x28): btf_type_tag
>>       <58>   DW_AT_const_value : (indirect string, offset: 0xd7): type-tag-1
>>    <2><5c>: Abbrev Number: 0
>>    <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>       <5e>   DW_AT_byte_size   : 4
>>       <5f>   DW_AT_encoding    : 5	(signed)
>>       <60>   DW_AT_name        : int
>>    <1><64>: Abbrev Number: 0
>>
>> Please note that currently, the annotation DWARF DIEs will be generated only if
>> BTF debug information is requested (via -gbtf). Therefore, the annotation DIEs
>> will only be output if both BTF and DWARF are requested (e.g. -gbtf -gdwarf).
>> This will change, since these tags are needed even when not generating BTF,
>> for example in a GCC-built Linux kernel.
>>
>> In the case of BTF, the annotations are recorded in two type kinds recently
>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>> The above example declaration produces the following BTF information:
>>
>>       [1] int 'int'(1U#B) size=4U#B offset=0UB#b bits=32UB#b SIGNED
>>       [2] ptr <anonymous> type=3
>>       [3] type_tag 'type-tag-1'(5U#B) type=1
>>       [4] decl_tag 'decl-tag-1'(18U#B) type=6 component_idx=-1
>>       [5] decl_tag 'decl-tag-2'(29U#B) type=6 component_idx=-1
>>       [6] var 'x'(16U#B) type=2 linkage=1 (global)
>>
>>
>> Current issues in the implementation
>> ====================================
>>
>> The __attribute__((btf_type_tag ("foo"))) syntax does not work correctly for
>> types involving multiple pointers.
>>
>> Consider the following example:
>>
>>     #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>     #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>     #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>
>>     int __typetag1 * __typetag2 __typetag3 * g;
>>
>> The current implementation produces the following DWARF:
>>
>>    <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>>       <1f>   DW_AT_name        : g
>>       <21>   DW_AT_decl_file   : 1
>>       <22>   DW_AT_decl_line   : 6
>>       <23>   DW_AT_decl_column : 42
>>       <24>   DW_AT_type        : <0x32>
>>       <28>   DW_AT_external    : 1
>>       <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>>    <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>       <33>   DW_AT_byte_size   : 8
>>       <33>   DW_AT_type        : <0x45>
>>       <37>   DW_AT_sibling     : <0x45>
>>    <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <3c>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>>       <40>   DW_AT_const_value : (indirect string, offset: 0xc7): type-tag-1
>>    <2><44>: Abbrev Number: 0
>>    <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>       <46>   DW_AT_byte_size   : 8
>>       <46>   DW_AT_type        : <0x61>
>>       <4a>   DW_AT_sibling     : <0x61>
>>    <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <4f>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>>       <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
>>    <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>>       <58>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>>       <5c>   DW_AT_const_value : (indirect string, offset: 0xd2): type-tag-2
>>    <2><60>: Abbrev Number: 0
>>    <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>>       <62>   DW_AT_byte_size   : 4
>>       <63>   DW_AT_encoding    : 5	(signed)
>>       <64>   DW_AT_name        : int
>>    <1><68>: Abbrev Number: 0
>>
>> This does not agree with the DWARF produced by LLVM/clang for the same case:
>> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
>>
>> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>>       <1f>   DW_AT_name        : (indexed string: 0x3): g
>>       <20>   DW_AT_type        : <0x29>
>>       <24>   DW_AT_external    : 1
>>       <24>   DW_AT_decl_file   : 0
>>       <25>   DW_AT_decl_line   : 6
>>       <26>   DW_AT_location    : 2 byte block: a1 0 	((Unknown location op 0xa1))
>>    <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>       <2a>   DW_AT_type        : <0x35>
>>    <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>>       <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>       <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>>    <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>>       <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>       <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>>    <2><34>: Abbrev Number: 0
>>    <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>       <36>   DW_AT_type        : <0x3e>
>>    <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>>       <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>       <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>>    <2><3d>: Abbrev Number: 0
>>    <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>>       <3f>   DW_AT_name        : (indexed string: 0x4): int
>>       <40>   DW_AT_encoding    : 5	(signed)
>>       <41>   DW_AT_byte_size   : 4
>>    <1><42>: Abbrev Number: 0
>>
>> Notice the structural difference. From the DWARF produced by GCC (i.e. this
>> patch series), variable 'g' is a pointer with tag 'type-tag-1' to a pointer
>> with tags 'type-tag-2' and 'type-tag-3' to an int. But from the LLVM DWARF,
>> variable 'g' is a pointer with tags 'type-tag-2' and 'type-tag-3' to a pointer
>> with tag 'type-tag-1' to an int.
>>
>> Because GCC produces BTF from the internal DWARF DIE tree, the BTF also differs.
>> This can be seen most obviously in the BTF type reference chains:
>>
>>     GCC
>>       VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>
>>     LLVM
>>       VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>
>> It seems that the ultimate cause for this is the structure of the TREE
>> produced by the C frontend parsing and attribute handling. I believe this may
>> be due to differences in __attribute__ syntax parsing between GCC and LLVM.
>>
>> This is the TREE for variable 'g':
>>     int __typetag1 * __typetag2 __typetag3 * g;
>>
>>    <var_decl 0x7ffff7547090 g
>>       type <pointer_type 0x7ffff7548000
>>           type <pointer_type 0x7ffff75097e0 type <integer_type 0x7ffff74495e8 int>
>>               asm_written unsigned DI
>>               size <integer_cst 0x7ffff743c450 constant 64>
>>               unit-size <integer_cst 0x7ffff743c468 constant 8>
>>               align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7450888
>>               attributes <tree_list 0x7ffff75275c8
>>                   purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>                   value <tree_list 0x7ffff7527550
>>                       value <string_cst 0x7ffff75292e0 type <array_type 0x7ffff7509738>
>>                           readonly constant static "type-tag-3\000">>
>>                   chain <tree_list 0x7ffff75275a0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>                       value <tree_list 0x7ffff75274d8
>>                           value <string_cst 0x7ffff75292c0 type <array_type 0x7ffff7509738>
>>                               readonly constant static "type-tag-2\000">>>>
>>               pointer_to_this <pointer_type 0x7ffff7509888>>
>>           asm_written unsigned DI size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>           align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7509930
>>           attributes <tree_list 0x7ffff75275f0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>               value <tree_list 0x7ffff7527438
>>                   value <string_cst 0x7ffff75292a0 type <array_type 0x7ffff7509738>
>>                       readonly constant static "type-tag-1\000">>>>
>>       public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>       align:64 warn_if_not_align:0>
>>
>> To me this is surprising. I would have expected the int** type of "g" to have
>> the tags 'type-tag-2' and 'type-tag-3', and the inner (int*) pointer type to
>> have the 'type-tag-1' tag. So far my attempts at resolving this difference in
>> the new attribute handlers for the tag attributes have not been successful.
>>
>> I do not understand why exactly the attributes are attached in this way. I think
>> that it may be related to the pointer cases discussed in the "All other
>> attributes" section here:
>>
>>     https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
>>
>> In particular it seems similar to this example:
>>
>>       char *__attribute__((aligned(8))) *f;
>>
>>     specifies the type “pointer to 8-byte-aligned pointer to char”. Note again
>>     that this does not work with most attributes; for example, the usage of
>>     ‘aligned’ and ‘noreturn’ attributes given above is not yet supported.
>>
>> I am not sure if this section of the documentation is outdated, if scenarios
>> like this one have not been an issue before now, or if there is a way to
>> resolve this within the attribute handler. I am by no means an expert in the C
>> frontend or attribute handling; if someone with more knowledge could help me
>> understand this case I would be very grateful. :)
>>
>> Questions for GCC
>> =================
>>
>> 1)  How can this issue with the type tags be resolved? Is this a bug or
>>       limitation in the attribute parsing that hasn't been an issue until now?
>>       Or is it that the above case is somehow a "weird" usage of attributes?
>>
>> 2)  Are attributes the right tool for this? Is there some other mechanism that
>>       would better fit the design of these tags? In some ways the type tags seem
>>       more similar to const/volatile/restrict qualifiers than to most other
>>       attributes.
>>
>>
>> Questions for LLVM / kernel BPF
>> ===============================
>>
>> 1)  What special handling does the LLVM frontend/clang do for these attributes?
>>       Is there anything specific? Or does it simply follow whatever is default?
>>
>> 2)  What is the correct BTF representation for type tags? The documentation for
>>       BTF_KIND_TYPE_TAG in linux/Documentation/bpf/btf.rst seems to conflict with
>>       the output of clang, and the format change that was discussed here:
>>         https://reviews.llvm.org/D113496
>>       I assume the kernel btf.rst might simply be outdated, but I want to be sure.
>>
>> 3)  Is the ordering of multiple type tags on the same type important?
>>       e.g. for this variable:
>>           int __tag1 __tag2 __tag3 * b;
>>
>>       would it be "correct" (or at least, acceptable) to produce:
>>           VAR(b) -> ptr -> tag2 -> tag3 -> tag1 -> int
>>
>>       or _must_ it be:
>>           VAR(b) -> ptr -> tag3 -> tag2 -> tag1 -> int
>>
>>       In the DWARF representation, all tags are equal sibling children of the type
>>       they annotate, so this 'ordering' problem seems like it only arises because of
>>       the BTF format for type tags.
>>
>> 4)  Are types with the same tags in different orders considered distinct types?
>>       I think the answer is "no", but given the format of the tags in BTF we get
>>       distinct chains for the types, so I am curious.
>>       e.g.
>>           int __tag1 __tag2 * x;
>>           int __tag2 __tag1 * y;
>>
>>       produces
>>           VAR(x) -> ptr -> tag2 -> tag1 -> int
>>           VAR(y) -> ptr -> tag1 -> tag2 -> int
>>
>>       but would
>>           VAR(y) -> ptr -> tag2 -> tag1 -> int
>>
>>       be just as correct?
>>
>> 5)  According to the clang docs, type tags are currently ignored for non-pointer
>>       types. Is pointer tagging e.g. '__user' the only use case so far?
>>
>>       This GCC implementation allows type tags on non-pointer types. Such tags
>>       can be represented in the DWARF but don't make much sense in BTF output,
>>       e.g.
>>
>>           struct __typetag1 S {
>>               int a;
>>               int b;
>>           } __decltag1;
>>
>>           struct S my_s;
>>
>>       This will produce a type tag child DIE of S. In the current implementation,
>>       it will also produce a BTF type tag type, which refers to the __decltag1 BTF
>>       decl tag, which in turn refers to the struct type.  But nothing refers to
>>       the type tag type; currently variable my_s in BTF refers to the struct type
>>       directly.
>>
>>       In my opinion, the DWARF here is useful but the BTF seems odd. What would be
>>       "correct" BTF in such a case?
>>
>> 6)  Would LLVM be open to changing the name of the attribute, for example to
>>       'debug_info_annotate' (or any other suggestion)? The use cases for these
>>       tags have grown (e.g. drgn) since they were originally proposed, and the
>>       scope is no longer limited to BTF.
>>
>>       The kernel eBPF developers have said they can accommodate whatever name we
>>       would like to use. So although we in GCC are not tied to the name LLVM
>>       uses, it would be ideal for everyone to use the same attribute name.
>>
>> Thanks!
>>
>> David
>>
>> David Faust (8):
>>     dwarf: Add dw_get_die_parent function
>>     include: Add BTF tag defines to dwarf2 and btf
>>     c-family: Add BTF tag attribute handlers
>>     dwarf: create BTF decl and type tag DIEs
>>     ctfc: Add support to pass through BTF annotations
>>     dwarf2ctf: convert tag DIEs to CTF types
>>     Output BTF DECL_TAG and TYPE_TAG types
>>     testsuite: Add tests for BTF tags
>>
>>    gcc/btfout.cc                                 |  28 +++++
>>    gcc/c-family/c-attribs.cc                     |  45 +++++++
>>    gcc/ctf-int.h                                 |  29 +++++
>>    gcc/ctfc.cc                                   |  11 +-
>>    gcc/ctfc.h                                    |  17 ++-
>>    gcc/dwarf2ctf.cc                              | 115 +++++++++++++++++-
>>    gcc/dwarf2out.cc                              | 110 +++++++++++++++++
>>    gcc/dwarf2out.h                               |   1 +
>>    .../gcc.dg/debug/btf/btf-decltag-func.c       |  18 +++
>>    .../gcc.dg/debug/btf/btf-decltag-sou.c        |  34 ++++++
>>    .../gcc.dg/debug/btf/btf-decltag-typedef.c    |  15 +++
>>    .../gcc.dg/debug/btf/btf-typetag-1.c          |  20 +++
>>    .../gcc.dg/debug/dwarf2/annotation-1.c        |  29 +++++
>>    include/btf.h                                 |  17 ++-
>>    include/dwarf2.def                            |   4 +
>>    15 files changed, 482 insertions(+), 11 deletions(-)
>>    create mode 100644 gcc/ctf-int.h
>>    create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
>>    create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
>>    create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c
>>    create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c
>>    create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-02 16:57   ` [ping2][PATCH " David Faust
@ 2022-05-03 22:32     ` Joseph Myers
  2022-05-04 17:03       ` David Faust
  0 siblings, 1 reply; 30+ messages in thread
From: Joseph Myers @ 2022-05-03 22:32 UTC (permalink / raw)
  To: David Faust; +Cc: gcc-patches, Yonghong Song

On Mon, 2 May 2022, David Faust via Gcc-patches wrote:

> Consider the following example:
> 
>    #define __typetag1 __attribute__((btf_type_tag("tag1")))
>    #define __typetag2 __attribute__((btf_type_tag("tag2")))
>    #define __typetag3 __attribute__((btf_type_tag("tag3")))
> 
>    int __typetag1 * __typetag2 __typetag3 * g;
> 
> The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3',
> to a pointer with tag 'tag1' to an int". i.e.:

That's not a correct expectation for either GNU __attribute__ or C2x [[]] 
attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to 
the type to which g points, not to g or its type, just as if you had a 
type qualifier there.  You'd need to put the attributes (or qualifier) 
after the *, not before, to make them apply to the pointer type.  See 
"Attribute Syntax" in the GCC manual for how the syntax is defined for GNU 
attributes and deduce in turn, for each subsequence of the tokens matching 
the syntax for some kind of declarator, what the type for "T D1" would be 
as defined there and in the C standard, as deduced from the type for "T D" 
for a sub-declarator D.

> But GCC's attribute parsing produces a variable 'g' which is "a pointer with
> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.

In GNU syntax, __typetag1 applies to the declaration, whereas in C2x 
syntax it applies to int.  Again, if you wanted it to apply to the pointer 
type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute 
appertains to, I recommend using C2x syntax not GNU syntax.
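
As an illustration of the placement rule above, here is a small sketch 
that uses the standard qualifier const as a stand-in for the tag 
attributes, on the assumption (stated above) that an attribute in that 
position binds like a type qualifier would. The variable and function 
names are hypothetical. It only checks where a qualifier lands; it says 
nothing about how any particular compiler records btf_type_tag.

```c
#include <assert.h>

/* Qualifier written between the two '*' tokens: it belongs to the inner
   pointer, i.e. to the type that the variable points to. */
int * const * inner_qualified;

/* Qualifier written after the last '*': it belongs to the variable's own
   (outer) pointer type instead. */
int * * const outer_qualified = 0;

/* Returns 1 if the compiler agrees with both readings, 0 otherwise.
   &x has type "pointer to (type of x)", so _Generic compares the full
   declared type without dropping a top-level qualifier. */
int placement_matches_rule(void)
{
    int inner_ok = _Generic(&inner_qualified,
                            int * const * *: 1,
                            default: 0);
    int outer_ok = _Generic(&outer_qualified,
                            int * * const *: 1,
                            default: 0);
    return inner_ok && outer_ok;
}
```

A caller can check the result with assert(placement_matches_rule()); 
this requires a C11 compiler for _Generic.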

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-03 22:32     ` Joseph Myers
@ 2022-05-04 17:03       ` David Faust
  2022-05-05 23:00         ` Yonghong Song
  0 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-04 17:03 UTC (permalink / raw)
  To: Joseph Myers, Yonghong Song; +Cc: gcc-patches



On 5/3/22 15:32, Joseph Myers wrote:
> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
> 
>> Consider the following example:
>>
>>     #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>     #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>     #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>
>>     int __typetag1 * __typetag2 __typetag3 * g;
>>
>> The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3',
>> to a pointer with tag 'tag1' to an int". i.e.:
> 
> That's not a correct expectation for either GNU __attribute__ or C2x [[]]
> attribute syntax.  In either syntax, __typetag2 __typetag3 should apply to
> the type to which g points, not to g or its type, just as if you had a
> type qualifier there.  You'd need to put the attributes (or qualifier)
> after the *, not before, to make them apply to the pointer type.  See
> "Attribute Syntax" in the GCC manual for how the syntax is defined for GNU
> attributes and deduce in turn, for each subsequence of the tokens matching
> the syntax for some kind of declarator, what the type for "T D1" would be
> as defined there and in the C standard, as deduced from the type for "T D"
> for a sub-declarator D.
> 
>> But GCC's attribute parsing produces a variable 'g' which is "a pointer with
>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.
> 
> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
> syntax it applies to int.  Again, if you wanted it to apply to the pointer
> type it would need to go after the * not before.
> 
> If you are concerned with the fine details of what construct an attribute
> appertains to, I recommend using C2x syntax not GNU syntax.
> 

Joseph, thank you! This is very helpful. My understanding of the syntax 
was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the 
discussion of it in the series cover letter. But, the reason why it is 
incorrect is the same.)


Yonghong, is the specific ordering an expectation in BPF programs or 
other users of the tags?

This example comes from my testing against clang to check that the BTF 
generated by both toolchains is compatible. In this case we get 
different results when using the GNU attribute syntax.


To avoid confusion, here is the full example (from the cover letter). 
The difference in the results is clear in the DWARF.

> Consider the following example:
> 
>   #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>   #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>   #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
> 
>   int __typetag1 * __typetag2 __typetag3 * g;
> 
>  <var_decl 0x7ffff7547090 g
>     type <pointer_type 0x7ffff7548000
>         type <pointer_type 0x7ffff75097e0 type <integer_type 0x7ffff74495e8 int>
>             asm_written unsigned DI
>             size <integer_cst 0x7ffff743c450 constant 64>
>             unit-size <integer_cst 0x7ffff743c468 constant 8>
>             align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7450888
>             attributes <tree_list 0x7ffff75275c8
>                 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>                 value <tree_list 0x7ffff7527550
>                     value <string_cst 0x7ffff75292e0 type <array_type 0x7ffff7509738>
>                         readonly constant static "type-tag-3\000">>
>                 chain <tree_list 0x7ffff75275a0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>                     value <tree_list 0x7ffff75274d8
>                         value <string_cst 0x7ffff75292c0 type <array_type 0x7ffff7509738>
>                             readonly constant static "type-tag-2\000">>>>
>             pointer_to_this <pointer_type 0x7ffff7509888>>
>         asm_written unsigned DI size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>         align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff7509930
>         attributes <tree_list 0x7ffff75275f0 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>             value <tree_list 0x7ffff7527438
>                 value <string_cst 0x7ffff75292a0 type <array_type 0x7ffff7509738>
>                     readonly constant static "type-tag-1\000">>>>
>     public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>     align:64 warn_if_not_align:0>

> 
> The current implementation produces the following DWARF:
> 
>  <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>     <1f>   DW_AT_name        : g
>     <21>   DW_AT_decl_file   : 1
>     <22>   DW_AT_decl_line   : 6
>     <23>   DW_AT_decl_column : 42
>     <24>   DW_AT_type        : <0x32>
>     <28>   DW_AT_external    : 1
>     <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
>  <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>     <33>   DW_AT_byte_size   : 8
>     <33>   DW_AT_type        : <0x45>
>     <37>   DW_AT_sibling     : <0x45>
>  <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>     <3c>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>     <40>   DW_AT_const_value : (indirect string, offset: 0xc7): type-tag-1
>  <2><44>: Abbrev Number: 0
>  <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>     <46>   DW_AT_byte_size   : 8
>     <46>   DW_AT_type        : <0x61>
>     <4a>   DW_AT_sibling     : <0x61>
>  <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>     <4f>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>     <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
>  <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>     <58>   DW_AT_name        : (indirect string, offset: 0x18): btf_type_tag
>     <5c>   DW_AT_const_value : (indirect string, offset: 0xd2): type-tag-2
>  <2><60>: Abbrev Number: 0
>  <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>     <62>   DW_AT_byte_size   : 4
>     <63>   DW_AT_encoding    : 5	(signed)
>     <64>   DW_AT_name        : int
>  <1><68>: Abbrev Number: 0
> 
> This does not agree with the DWARF produced by LLVM/clang for the same case:
> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
> 
> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>     <1f>   DW_AT_name        : (indexed string: 0x3): g
>     <20>   DW_AT_type        : <0x29>
>     <24>   DW_AT_external    : 1
>     <24>   DW_AT_decl_file   : 0
>     <25>   DW_AT_decl_line   : 6
>     <26>   DW_AT_location    : 2 byte block: a1 0 	((Unknown location op 0xa1))
>  <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>     <2a>   DW_AT_type        : <0x35>
>  <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>     <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>     <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>  <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>     <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>     <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>  <2><34>: Abbrev Number: 0
>  <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>     <36>   DW_AT_type        : <0x3e>
>  <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>     <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>     <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>  <2><3d>: Abbrev Number: 0
>  <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>     <3f>   DW_AT_name        : (indexed string: 0x4): int
>     <40>   DW_AT_encoding    : 5	(signed)
>     <41>   DW_AT_byte_size   : 4
>  <1><42>: Abbrev Number: 0
> 
> 
> Because GCC produces BTF from the internal DWARF DIE tree, the BTF also differs.
> This can be seen most obviously in the BTF type reference chains:
> 
>   GCC
>     VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
> 
>   LLVM
>     VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int




* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-04 17:03       ` David Faust
@ 2022-05-05 23:00         ` Yonghong Song
  2022-05-06 21:18           ` David Faust
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-05-05 23:00 UTC (permalink / raw)
  To: David Faust, Joseph Myers; +Cc: gcc-patches



On 5/4/22 10:03 AM, David Faust wrote:
> 
> 
> On 5/3/22 15:32, Joseph Myers wrote:
>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>
>>> Consider the following example:
>>>
>>>     #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>     #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>     #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>
>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>
>>> The expected behavior is that 'g' is "a pointer with tags 'tag2' and 
>>> 'tag3',
>>> to a pointer with tag 'tag1' to an int". i.e.:
>>
>> That's not a correct expectation for either GNU __attribute__ or C2x [[]]
>> attribute syntax.  In either syntax, __typetag2 __typetag3 should 
>> apply to
>> the type to which g points, not to g or its type, just as if you had a
>> type qualifier there.  You'd need to put the attributes (or qualifier)
>> after the *, not before, to make them apply to the pointer type.  See
>> "Attribute Syntax" in the GCC manual for how the syntax is defined for 
>> GNU
>> attributes and deduce in turn, for each subsequence of the tokens 
>> matching
>> the syntax for some kind of declarator, what the type for "T D1" would be
>> as defined there and in the C standard, as deduced from the type for 
>> "T D"
>> for a sub-declarator D.
>>
>>> But GCC's attribute parsing produces a variable 'g' which is "a pointer with
>>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.
>>
>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>> syntax it applies to int.  Again, if you wanted it to apply to the 
>> pointer
>> type it would need to go after the * not before.
>>
>> If you are concerned with the fine details of what construct an attribute
>> appertains to, I recommend using C2x syntax not GNU syntax.
>>
> 
> Joseph, thank you! This is very helpful. My understanding of the syntax 
> was not correct.
> 
> (Actually, I made a bad mistake in paraphrasing this example from the 
> discussion of it in the series cover letter. But, the reason why it is 
> incorrect is the same.)
> 
> 
> Yonghong, is the specific ordering an expectation in BPF programs or 
> other users of the tags?

This is probably a wording issue. We have been saying that the tags only
apply to the pointer. We probably should say that they only apply to the
pointee.

$ cat t.c
int const *ptr;

the llvm ir debuginfo:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

int __attribute__((btf_type_tag("tag"))) *ptr;

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!7 = !{!8}
!8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because
we didn't invent a new DI type to encode btf_type_tag. But it would be
totally okay to have IR that looks like

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
!11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
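
To illustrate why the two encodings carry the same information, here is a
rough C model of them. The struct and function names (di_node, pointee_tag,
encodings_agree) are made up for this sketch and are not LLVM's API. It only
shows that a consumer can recover the tag for the pointee either from an
annotation on the pointer node or from a separate tag node placed between
the pointer and the base type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a debug-info type node. */
enum di_kind { DI_BASE, DI_POINTER, DI_TYPE_TAG };

struct di_node {
    enum di_kind kind;
    const char *name;            /* base type name or tag string, or NULL */
    const struct di_node *base;  /* next node in the chain, or NULL */
    const char *annotation;      /* tag recorded on a pointer node, or NULL */
};

/* Return the tag that describes what ptr points to, or NULL if none. */
const char *pointee_tag(const struct di_node *ptr)
{
    if (ptr == NULL || ptr->kind != DI_POINTER)
        return NULL;
    /* Encoding 1: the tag is an annotation on the pointer node. */
    if (ptr->annotation != NULL)
        return ptr->annotation;
    /* Encoding 2: a dedicated tag node sits between pointer and pointee. */
    if (ptr->base != NULL && ptr->base->kind == DI_TYPE_TAG)
        return ptr->base->name;
    return NULL;
}

/* Build both encodings of a tagged 'int *ptr' and compare the results. */
int encodings_agree(void)
{
    const struct di_node int_type = { DI_BASE, "int", NULL, NULL };

    /* pointer (annotation "tag") -> int */
    const struct di_node annotated_ptr = { DI_POINTER, NULL, &int_type, "tag" };

    /* pointer -> tag node "tag" -> int */
    const struct di_node tag_node = { DI_TYPE_TAG, "tag", &int_type, NULL };
    const struct di_node plain_ptr = { DI_POINTER, NULL, &tag_node, NULL };

    const char *a = pointee_tag(&annotated_ptr);
    const char *b = pointee_tag(&plain_ptr);
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```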


> 
> This example comes from my testing against clang to check that the BTF 
> generated by both toolchains is compatible. In this case we get 
> different results when using the GNU attribute syntax.
> 
> 
> To avoid confusion, here is the full example (from the cover letter). 
> The difference in the results is clear in the DWARF.
> 
>> Consider the following example:
>>
>>   #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>   #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>   #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>
>>   int __typetag1 * __typetag2 __typetag3 * g;
>>
>>  <var_decl 0x7ffff7547090 g
>>     type <pointer_type 0x7ffff7548000
>>         type <pointer_type 0x7ffff75097e0 type <integer_type 
>> 0x7ffff74495e8 int>
>>             asm_written unsigned DI
>>             size <integer_cst 0x7ffff743c450 constant 64>
>>             unit-size <integer_cst 0x7ffff743c468 constant 8>
>>             align:64 warn_if_not_align:0 symtab:0 alias-set -1 
>> canonical-type 0x7ffff7450888
>>             attributes <tree_list 0x7ffff75275c8
>>                 purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>                 value <tree_list 0x7ffff7527550
>>                     value <string_cst 0x7ffff75292e0 type <array_type 
>> 0x7ffff7509738>
>>                         readonly constant static "type-tag-3\000">>
>>                 chain <tree_list 0x7ffff75275a0 purpose 
>> <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>                     value <tree_list 0x7ffff75274d8
>>                         value <string_cst 0x7ffff75292c0 type 
>> <array_type 0x7ffff7509738>
>>                             readonly constant static "type-tag-2\000">>>>
>>             pointer_to_this <pointer_type 0x7ffff7509888>>
>>         asm_written unsigned DI size <integer_cst 0x7ffff743c450 64> 
>> unit-size <integer_cst 0x7ffff743c468 8>
>>         align:64 warn_if_not_align:0 symtab:0 alias-set -1 
>> canonical-type 0x7ffff7509930
>>         attributes <tree_list 0x7ffff75275f0 purpose <identifier_node 
>> 0x7ffff753a1e0 btf_type_tag>
>>             value <tree_list 0x7ffff7527438
>>                 value <string_cst 0x7ffff75292a0 type <array_type 
>> 0x7ffff7509738>
>>                     readonly constant static "type-tag-1\000">>>>
>>     public static unsigned DI defer-output 
>> /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst 
>> 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>     align:64 warn_if_not_align:0>
> 
>>
>> The current implementation produces the following DWARF:
>>
>>  <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>>     <1f>   DW_AT_name        : g
>>     <21>   DW_AT_decl_file   : 1
>>     <22>   DW_AT_decl_line   : 6
>>     <23>   DW_AT_decl_column : 42
>>     <24>   DW_AT_type        : <0x32>
>>     <28>   DW_AT_external    : 1
>>     <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0     
>> (DW_OP_addr: 0)
>>  <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>     <33>   DW_AT_byte_size   : 8
>>     <33>   DW_AT_type        : <0x45>
>>     <37>   DW_AT_sibling     : <0x45>
>>  <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>>     <3c>   DW_AT_name        : (indirect string, offset: 0x18): 
>> btf_type_tag
>>     <40>   DW_AT_const_value : (indirect string, offset: 0xc7): 
>> type-tag-1
>>  <2><44>: Abbrev Number: 0
>>  <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>     <46>   DW_AT_byte_size   : 8
>>     <46>   DW_AT_type        : <0x61>
>>     <4a>   DW_AT_sibling     : <0x61>
>>  <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>>     <4f>   DW_AT_name        : (indirect string, offset: 0x18): 
>> btf_type_tag
>>     <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
>>  <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>>     <58>   DW_AT_name        : (indirect string, offset: 0x18): 
>> btf_type_tag
>>     <5c>   DW_AT_const_value : (indirect string, offset: 0xd2): 
>> type-tag-2
>>  <2><60>: Abbrev Number: 0
>>  <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>>     <62>   DW_AT_byte_size   : 4
>>     <63>   DW_AT_encoding    : 5    (signed)
>>     <64>   DW_AT_name        : int
>>  <1><68>: Abbrev Number: 0
>>
>> This does not agree with the DWARF produced by LLVM/clang for the same 
>> case:
>> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
>>
>> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>>     <1f>   DW_AT_name        : (indexed string: 0x3): g
>>     <20>   DW_AT_type        : <0x29>
>>     <24>   DW_AT_external    : 1
>>     <24>   DW_AT_decl_file   : 0
>>     <25>   DW_AT_decl_line   : 6
>>     <26>   DW_AT_location    : 2 byte block: a1 0     ((Unknown 
>> location op 0xa1))
>>  <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>     <2a>   DW_AT_type        : <0x35>
>>  <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>>     <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>     <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>>  <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>>     <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>     <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>>  <2><34>: Abbrev Number: 0
>>  <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>     <36>   DW_AT_type        : <0x3e>
>>  <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>>     <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>     <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>>  <2><3d>: Abbrev Number: 0
>>  <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>>     <3f>   DW_AT_name        : (indexed string: 0x4): int
>>     <40>   DW_AT_encoding    : 5    (signed)
>>     <41>   DW_AT_byte_size   : 4
>>  <1><42>: Abbrev Number: 0
>>
>>
>> Because GCC produces BTF from the internal DWARF DIE tree, the BTF 
>> also differs.
>> This can be seen most obviously in the BTF type reference chains:
>>
>>   GCC
>>     VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>
>>   LLVM
>>     VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
> 
> 


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-05 23:00         ` Yonghong Song
@ 2022-05-06 21:18           ` David Faust
  2022-05-11  3:43             ` Yonghong Song
  0 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-06 21:18 UTC (permalink / raw)
  To: Yonghong Song; +Cc: gcc-patches, Joseph Myers



On 5/5/22 16:00, Yonghong Song wrote:
> 
> 
> On 5/4/22 10:03 AM, David Faust wrote:
>>
>>
>> On 5/3/22 15:32, Joseph Myers wrote:
>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>
>>>> Consider the following example:
>>>>
>>>>      #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>      #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>      #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>
>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>
>>>> The expected behavior is that 'g' is "a pointer with tags 'tag2' and
>>>> 'tag3',
>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>
>>> That's not a correct expectation for either GNU __attribute__ or C2x [[]]
>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>> apply to
>>> the type to which g points, not to g or its type, just as if you had a
>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>> after the *, not before, to make them apply to the pointer type.  See
>>> "Attribute Syntax" in the GCC manual for how the syntax is defined for
>>> GNU
>>> attributes and deduce in turn, for each subsequence of the tokens
>>> matching
>>> the syntax for some kind of declarator, what the type for "T D1" would be
>>> as defined there and in the C standard, as deduced from the type for
>>> "T D"
>>> for a sub-declarator D.
>>>
>>>> But GCC's attribute parsing produces a variable 'g' which is "a pointer with
>>>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.
>>>
>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>> pointer
>>> type it would need to go after the * not before.
>>>
>>> If you are concerned with the fine details of what construct an attribute
>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>
>>
>> Joseph, thank you! This is very helpful. My understanding of the syntax
>> was not correct.
>>
>> (Actually, I made a bad mistake in paraphrasing this example from the
>> discussion of it in the series cover letter. But, the reason why it is
>> incorrect is the same.)
>>
>>
>> Yonghong, is the specific ordering an expectation in BPF programs or
>> other users of the tags?
> 
> This is probably a wording issue. We have been saying that the tags only
> apply to the pointer. We probably should say that they only apply to the
> pointee.
> 
> $ cat t.c
> int const *ptr;
> 
> the llvm ir debuginfo:
> 
> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
> 
> We could replace 'const' with a tag like below:
> 
> int __attribute__((btf_type_tag("tag"))) *ptr;
> 
> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
> !7 = !{!8}
> !8 = !{!"btf_type_tag", !"tag"}
> 
> In the above IR, we attach the annotations to the pointer_type because
> we didn't invent a new DI type to encode btf_type_tag. But it would be
> totally okay to have IR that looks like
> 
> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
> 
OK, thanks.

There is still the question of why the DWARF generated for this case 
that I have been concerned about:

   int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it, I 
believe GCC is handling the attributes exactly as described in the 
Attribute Syntax portion of the GCC manual, where the GNU syntax is 
defined. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling the 
GNU attribute syntax in this particular case correctly, since it seems 
to be associating __typetag2 and __typetag3 to g's type rather than the 
type to which it points.
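
For reference, here is a sketch of how the GNU syntax positions would look 
for each pointer level, following Joseph's description. The __has_attribute 
guard is my addition so that the file also builds with compilers that lack 
btf_type_tag (the macros then expand to nothing), and g_outer and the 
helper function are illustrative names only.

```c
#include <assert.h>

/* Fall back to empty macros when the compiler has no btf_type_tag. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(btf_type_tag)
#define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
#define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
#define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
#else
#define __typetag1
#define __typetag2
#define __typetag3
#endif

/* The case under discussion: tag2 and tag3 are written before the
   second '*', so they are meant for the type g points to. */
int __typetag1 * __typetag2 __typetag3 * g;

/* Written after the second '*', the tags would instead be meant for
   g_outer's own pointer type. */
int __typetag1 * * __typetag2 __typetag3 g_outer;

/* The tags are metadata only; both variables are still pointer-sized. */
static_assert(sizeof(g) == sizeof(int **), "g is a plain int ** in size");
static_assert(sizeof(g_outer) == sizeof(int **), "g_outer likewise");

/* Run-time version of the same size check, for use from a test. */
int tag_placement_sizes_ok(void)
{
    return sizeof(g) == sizeof(int **) && sizeof(g_outer) == sizeof(int **);
}
```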

I am not sure whether this difference is very important for the intended 
uses of the tags, but it is worth noting.


As Joseph suggested, it may be better to encourage users of these tags 
to use the C2x attribute syntax if they are concerned with precisely 
which construct the tag applies to.

This would also be a way around any issues in handling the attributes 
due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a clang-15 
build, but ran into some issues (in particular, some of the tag 
attributes being ignored altogether). I couldn't find confirmation 
whether C2x attribute syntax is fully supported in clang yet, so maybe 
this isn't expected to work. Do you know whether the C2x syntax is fully 
supported in clang yet?

> 
>>
>> This example comes from my testing against clang to check that the BTF
>> generated by both toolchains is compatible. In this case we get
>> different results when using the GNU attribute syntax.
>>
>>
>> To avoid confusion, here is the full example (from the cover letter).
>> The difference in the results is clear in the DWARF.
>>
>>> Consider the following example:
>>>
>>>    #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>>    #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>>    #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>>
>>>    int __typetag1 * __typetag2 __typetag3 * g;
>>>
>>>   <var_decl 0x7ffff7547090 g
>>>      type <pointer_type 0x7ffff7548000
>>>          type <pointer_type 0x7ffff75097e0 type <integer_type
>>> 0x7ffff74495e8 int>
>>>              asm_written unsigned DI
>>>              size <integer_cst 0x7ffff743c450 constant 64>
>>>              unit-size <integer_cst 0x7ffff743c468 constant 8>
>>>              align:64 warn_if_not_align:0 symtab:0 alias-set -1
>>> canonical-type 0x7ffff7450888
>>>              attributes <tree_list 0x7ffff75275c8
>>>                  purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>>                  value <tree_list 0x7ffff7527550
>>>                      value <string_cst 0x7ffff75292e0 type <array_type
>>> 0x7ffff7509738>
>>>                          readonly constant static "type-tag-3\000">>
>>>                  chain <tree_list 0x7ffff75275a0 purpose
>>> <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>>                      value <tree_list 0x7ffff75274d8
>>>                          value <string_cst 0x7ffff75292c0 type
>>> <array_type 0x7ffff7509738>
>>>                              readonly constant static "type-tag-2\000">>>>
>>>              pointer_to_this <pointer_type 0x7ffff7509888>>
>>>          asm_written unsigned DI size <integer_cst 0x7ffff743c450 64>
>>> unit-size <integer_cst 0x7ffff743c468 8>
>>>          align:64 warn_if_not_align:0 symtab:0 alias-set -1
>>> canonical-type 0x7ffff7509930
>>>          attributes <tree_list 0x7ffff75275f0 purpose <identifier_node
>>> 0x7ffff753a1e0 btf_type_tag>
>>>              value <tree_list 0x7ffff7527438
>>>                  value <string_cst 0x7ffff75292a0 type <array_type
>>> 0x7ffff7509738>
>>>                      readonly constant static "type-tag-1\000">>>>
>>>      public static unsigned DI defer-output
>>> /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst
>>> 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>>      align:64 warn_if_not_align:0>
>>
>>>
>>> The current implementation produces the following DWARF:
>>>
>>>   <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>>>      <1f>   DW_AT_name        : g
>>>      <21>   DW_AT_decl_file   : 1
>>>      <22>   DW_AT_decl_line   : 6
>>>      <23>   DW_AT_decl_column : 42
>>>      <24>   DW_AT_type        : <0x32>
>>>      <28>   DW_AT_external    : 1
>>>      <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
>>> (DW_OP_addr: 0)
>>>   <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>>      <33>   DW_AT_byte_size   : 8
>>>      <33>   DW_AT_type        : <0x45>
>>>      <37>   DW_AT_sibling     : <0x45>
>>>   <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>      <3c>   DW_AT_name        : (indirect string, offset: 0x18):
>>> btf_type_tag
>>>      <40>   DW_AT_const_value : (indirect string, offset: 0xc7):
>>> type-tag-1
>>>   <2><44>: Abbrev Number: 0
>>>   <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>>      <46>   DW_AT_byte_size   : 8
>>>      <46>   DW_AT_type        : <0x61>
>>>      <4a>   DW_AT_sibling     : <0x61>
>>>   <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>      <4f>   DW_AT_name        : (indirect string, offset: 0x18):
>>> btf_type_tag
>>>      <53>   DW_AT_const_value : (indirect string, offset: 0xd): type-tag-3
>>>   <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>      <58>   DW_AT_name        : (indirect string, offset: 0x18):
>>> btf_type_tag
>>>      <5c>   DW_AT_const_value : (indirect string, offset: 0xd2):
>>> type-tag-2
>>>   <2><60>: Abbrev Number: 0
>>>   <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>>>      <62>   DW_AT_byte_size   : 4
>>>      <63>   DW_AT_encoding    : 5    (signed)
>>>      <64>   DW_AT_name        : int
>>>   <1><68>: Abbrev Number: 0
>>>
>>> This does not agree with the DWARF produced by LLVM/clang for the same
>>> case:
>>> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
>>>
>>> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>>>      <1f>   DW_AT_name        : (indexed string: 0x3): g
>>>      <20>   DW_AT_type        : <0x29>
>>>      <24>   DW_AT_external    : 1
>>>      <24>   DW_AT_decl_file   : 0
>>>      <25>   DW_AT_decl_line   : 6
>>>      <26>   DW_AT_location    : 2 byte block: a1 0     ((Unknown
>>> location op 0xa1))
>>>   <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>>      <2a>   DW_AT_type        : <0x35>
>>>   <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>>>      <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>>      <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>>>   <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>>>      <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>>      <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>>>   <2><34>: Abbrev Number: 0
>>>   <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>>      <36>   DW_AT_type        : <0x3e>
>>>   <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>>>      <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>>      <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>>>   <2><3d>: Abbrev Number: 0
>>>   <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>>>      <3f>   DW_AT_name        : (indexed string: 0x4): int
>>>      <40>   DW_AT_encoding    : 5    (signed)
>>>      <41>   DW_AT_byte_size   : 4
>>>   <1><42>: Abbrev Number: 0
>>>
>>>
>>> Because GCC produces BTF from the internal DWARF DIE tree, the BTF
>>> also differs.
>>> This can be seen most obviously in the BTF type reference chains:
>>>
>>>    GCC
>>>      VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>>
>>>    LLVM
>>>      VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>
>>


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-06 21:18           ` David Faust
@ 2022-05-11  3:43             ` Yonghong Song
  2022-05-11  5:05               ` Yonghong Song
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-05-11  3:43 UTC (permalink / raw)
  To: David Faust; +Cc: gcc-patches, Joseph Myers



On 5/6/22 2:18 PM, David Faust wrote:
> 
> 
> On 5/5/22 16:00, Yonghong Song wrote:
>>
>>
>> On 5/4/22 10:03 AM, David Faust wrote:
>>>
>>>
>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>
>>>>> Consider the following example:
>>>>>
>>>>>      #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>      #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>      #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>
>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>
>>>>> The expected behavior is that 'g' is "a pointer with tags 'tag2' and
>>>>> 'tag3',
>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>
>>>> That's not a correct expectation for either GNU __attribute__ or C2x 
>>>> [[]]
>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>> apply to
>>>> the type to which g points, not to g or its type, just as if you had a
>>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>>> after the *, not before, to make them apply to the pointer type.  See
>>>> "Attribute Syntax" in the GCC manual for how the syntax is defined for
>>>> GNU
>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>> matching
>>>> the syntax for some kind of declarator, what the type for "T D1" 
>>>> would be
>>>> as defined there and in the C standard, as deduced from the type for
>>>> "T D"
>>>> for a sub-declarator D.
>>>>
>>>>> But GCC's attribute parsing produces a variable 'g' which is "a pointer with
>>>>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.
>>>>
>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>> pointer
>>>> type it would need to go after the * not before.
>>>>
>>>> If you are concerned with the fine details of what construct an 
>>>> attribute
>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>
>>>
>>> Joseph, thank you! This is very helpful. My understanding of the syntax
>>> was not correct.
>>>
>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>> discussion of it in the series cover letter. But, the reason why it is
>>> incorrect is the same.)
>>>
>>>
>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>> other users of the tags?
>>
>> This is probably a wording issue. We have been saying that the tags only
>> apply to the pointer. We probably should say that they only apply to the
>> pointee.
>>
>> $ cat t.c
>> int const *ptr;
>>
>> the llvm ir debuginfo:
>>
>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>
>> We could replace 'const' with a tag like below:
>>
>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>
>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>> !7 = !{!8}
>> !8 = !{!"btf_type_tag", !"tag"}
>>
>> In the above IR, we attach the annotations to the pointer_type because
>> we didn't invent a new DI type to encode btf_type_tag. But it would be
>> totally okay to have IR that looks like
>>
>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>
> OK, thanks.
> 
> There is still the question of why the DWARF generated for this case 
> that I have been concerned about:
> 
>    int __typetag1 * __typetag2 __typetag3 * g;
> 
> differs between GCC (with this series) and clang. After studying it, I 
> believe GCC is handling the attributes exactly as described in the 
> Attribute Syntax portion of the GCC manual, where the GNU syntax is 
> defined. I do not think there is any problem here.
> 
> So the difference in DWARF suggests to me that clang is not handling the 
> GNU attribute syntax in this particular case correctly, since it seems 
> to be associating __typetag2 and __typetag3 to g's type rather than the 
> type to which it points.
> 
> I am not sure whether this difference is very important for the intended 
> uses of the tags, but it is worth noting.
> 
> 
> As Joseph suggested, it may be better to encourage users of these tags 
> to use the C2x attribute syntax if they are concerned with precisely 
> which construct the tag applies to.
> 
> This would also be a way around any issues in handling the attributes 
> due to the GNU syntax.
> 
> I tried a few test cases using C2x syntax BTF type tags with a clang-15 
> build, but ran into some issues (in particular, some of the tag 
> attributes being ignored altogether). I couldn't find confirmation 
> whether C2x attribute syntax is fully supported in clang yet, so maybe 
> this isn't expected to work. Do you know whether the C2x syntax is fully 
> supported in clang yet?

Actually, I don't know either. But the btf decl_tag and type_tag
attributes are also used when compiling the Linux kernel, and the minimum
compiler versions for building the kernel are gcc 5.1 and clang 11. I am
not sure whether gcc 5.1 supports the C2x syntax; I guess probably not.
So I think we most likely cannot use the C2x syntax.

> 
>>
>>>
>>> This example comes from my testing against clang to check that the BTF
>>> generated by both toolchains is compatible. In this case we get
>>> different results when using the GNU attribute syntax.
>>>
>>>
>>> To avoid confusion, here is the full example (from the cover letter).
>>> The difference in the results is clear in the DWARF.
>>>
>>>> Consider the following example:
>>>>
>>>>    #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>>>    #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>>>    #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>>>
>>>>    int __typetag1 * __typetag2 __typetag3 * g;
>>>>
>>>>   <var_decl 0x7ffff7547090 g
>>>>      type <pointer_type 0x7ffff7548000
>>>>          type <pointer_type 0x7ffff75097e0 type <integer_type
>>>> 0x7ffff74495e8 int>
>>>>              asm_written unsigned DI
>>>>              size <integer_cst 0x7ffff743c450 constant 64>
>>>>              unit-size <integer_cst 0x7ffff743c468 constant 8>
>>>>              align:64 warn_if_not_align:0 symtab:0 alias-set -1
>>>> canonical-type 0x7ffff7450888
>>>>              attributes <tree_list 0x7ffff75275c8
>>>>                  purpose <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>>>                  value <tree_list 0x7ffff7527550
>>>>                      value <string_cst 0x7ffff75292e0 type <array_type
>>>> 0x7ffff7509738>
>>>>                          readonly constant static "type-tag-3\000">>
>>>>                  chain <tree_list 0x7ffff75275a0 purpose
>>>> <identifier_node 0x7ffff753a1e0 btf_type_tag>
>>>>                      value <tree_list 0x7ffff75274d8
>>>>                          value <string_cst 0x7ffff75292c0 type
>>>> <array_type 0x7ffff7509738>
>>>>                              readonly constant static 
>>>> "type-tag-2\000">>>>
>>>>              pointer_to_this <pointer_type 0x7ffff7509888>>
>>>>          asm_written unsigned DI size <integer_cst 0x7ffff743c450 64>
>>>> unit-size <integer_cst 0x7ffff743c468 8>
>>>>          align:64 warn_if_not_align:0 symtab:0 alias-set -1
>>>> canonical-type 0x7ffff7509930
>>>>          attributes <tree_list 0x7ffff75275f0 purpose <identifier_node
>>>> 0x7ffff753a1e0 btf_type_tag>
>>>>              value <tree_list 0x7ffff7527438
>>>>                  value <string_cst 0x7ffff75292a0 type <array_type
>>>> 0x7ffff7509738>
>>>>                      readonly constant static "type-tag-1\000">>>>
>>>>      public static unsigned DI defer-output
>>>> /home/dfaust/playpen/btf/annotate.c:29:42 size <integer_cst
>>>> 0x7ffff743c450 64> unit-size <integer_cst 0x7ffff743c468 8>
>>>>      align:64 warn_if_not_align:0>
>>>
>>>>
>>>> The current implementation produces the following DWARF:
>>>>
>>>>   <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
>>>>      <1f>   DW_AT_name        : g
>>>>      <21>   DW_AT_decl_file   : 1
>>>>      <22>   DW_AT_decl_line   : 6
>>>>      <23>   DW_AT_decl_column : 42
>>>>      <24>   DW_AT_type        : <0x32>
>>>>      <28>   DW_AT_external    : 1
>>>>      <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
>>>> (DW_OP_addr: 0)
>>>>   <1><32>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>>>      <33>   DW_AT_byte_size   : 8
>>>>      <33>   DW_AT_type        : <0x45>
>>>>      <37>   DW_AT_sibling     : <0x45>
>>>>   <2><3b>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>      <3c>   DW_AT_name        : (indirect string, offset: 0x18):
>>>> btf_type_tag
>>>>      <40>   DW_AT_const_value : (indirect string, offset: 0xc7):
>>>> type-tag-1
>>>>   <2><44>: Abbrev Number: 0
>>>>   <1><45>: Abbrev Number: 2 (DW_TAG_pointer_type)
>>>>      <46>   DW_AT_byte_size   : 8
>>>>      <46>   DW_AT_type        : <0x61>
>>>>      <4a>   DW_AT_sibling     : <0x61>
>>>>   <2><4e>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>      <4f>   DW_AT_name        : (indirect string, offset: 0x18):
>>>> btf_type_tag
>>>>      <53>   DW_AT_const_value : (indirect string, offset: 0xd): 
>>>> type-tag-3
>>>>   <2><57>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>      <58>   DW_AT_name        : (indirect string, offset: 0x18):
>>>> btf_type_tag
>>>>      <5c>   DW_AT_const_value : (indirect string, offset: 0xd2):
>>>> type-tag-2
>>>>   <2><60>: Abbrev Number: 0
>>>>   <1><61>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>      <62>   DW_AT_byte_size   : 4
>>>>      <63>   DW_AT_encoding    : 5    (signed)
>>>>      <64>   DW_AT_name        : int
>>>>   <1><68>: Abbrev Number: 0
>>>>
>>>> This does not agree with the DWARF produced by LLVM/clang for the same
>>>> case:
>>>> (clang 15.0.0 git 142501117a78080d2615074d3986fa42aa6a0734)
>>>>
>>>> <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>>>>      <1f>   DW_AT_name        : (indexed string: 0x3): g
>>>>      <20>   DW_AT_type        : <0x29>
>>>>      <24>   DW_AT_external    : 1
>>>>      <24>   DW_AT_decl_file   : 0
>>>>      <25>   DW_AT_decl_line   : 6
>>>>      <26>   DW_AT_location    : 2 byte block: a1 0     ((Unknown
>>>> location op 0xa1))
>>>>   <1><29>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>>>      <2a>   DW_AT_type        : <0x35>
>>>>   <2><2e>: Abbrev Number: 4 (User TAG value: 0x6000)
>>>>      <2f>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>>>      <30>   DW_AT_const_value : (indexed string: 0x7): type-tag-2
>>>>   <2><31>: Abbrev Number: 4 (User TAG value: 0x6000)
>>>>      <32>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>>>      <33>   DW_AT_const_value : (indexed string: 0x8): type-tag-3
>>>>   <2><34>: Abbrev Number: 0
>>>>   <1><35>: Abbrev Number: 3 (DW_TAG_pointer_type)
>>>>      <36>   DW_AT_type        : <0x3e>
>>>>   <2><3a>: Abbrev Number: 4 (User TAG value: 0x6000)
>>>>      <3b>   DW_AT_name        : (indexed string: 0x5): btf_type_tag
>>>>      <3c>   DW_AT_const_value : (indexed string: 0x6): type-tag-1
>>>>   <2><3d>: Abbrev Number: 0
>>>>   <1><3e>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>      <3f>   DW_AT_name        : (indexed string: 0x4): int
>>>>      <40>   DW_AT_encoding    : 5    (signed)
>>>>      <41>   DW_AT_byte_size   : 4
>>>>   <1><42>: Abbrev Number: 0
>>>>
>>>>
>>>> Because GCC produces BTF from the internal DWARF DIE tree, the BTF
>>>> also differs.
>>>> This can be seen most obviously in the BTF type reference chains:
>>>>
>>>>    GCC
>>>>      VAR (g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>>>
>>>>    LLVM
>>>>      VAR (g) -> ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>>
>>>


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-11  3:43             ` Yonghong Song
@ 2022-05-11  5:05               ` Yonghong Song
  2022-05-11 18:44                 ` David Faust
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-05-11  5:05 UTC (permalink / raw)
  To: David Faust; +Cc: gcc-patches, Joseph Myers



On 5/10/22 8:43 PM, Yonghong Song wrote:
> 
> 
> On 5/6/22 2:18 PM, David Faust wrote:
>>
>>
>> On 5/5/22 16:00, Yonghong Song wrote:
>>>
>>>
>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>
>>>>
>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>
>>>>>> Consider the following example:
>>>>>>
>>>>>>      #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>      #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>      #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>
>>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>
>>>>>> The expected behavior is that 'g' is "a pointer with tags 'tag2' and
>>>>>> 'tag3',
>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>
>>>>> That's not a correct expectation for either GNU __attribute__ or 
>>>>> C2x [[]]
>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>> apply to
>>>>> the type to which g points, not to g or its type, just as if you had a
>>>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>>>> after the *, not before, to make them apply to the pointer type.  See
>>>>> "Attribute Syntax" in the GCC manual for how the syntax is defined for
>>>>> GNU
>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>> matching
>>>>> the syntax for some kind of declarator, what the type for "T D1" 
>>>>> would be
>>>>> as defined there and in the C standard, as deduced from the type for
>>>>> "T D"
>>>>> for a sub-declarator D.
>>>>>
>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to
>>>>>> an int", i.e.
>>>>>
>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>> pointer
>>>>> type it would need to go after the * not before.
>>>>>
>>>>> If you are concerned with the fine details of what construct an 
>>>>> attribute
>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>
>>>>
>>>> Joseph, thank you! This is very helpful. My understanding of the syntax
>>>> was not correct.
>>>>
>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>> discussion of it in the series cover letter. But, the reason why it is
>>>> incorrect is the same.)
>>>>
>>>>
>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>> other users of the tags?
>>>
>>> This is probably a language writing issue. We are saying tags only
>>> apply to pointer. We probably should say it only apply to pointee.
>>>
>>> $ cat t.c
>>> int const *ptr;
>>>
>>> the llvm ir debuginfo:
>>>
>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>
>>> We could replace 'const' with a tag like below:
>>>
>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>
>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>> annotations: !7)
>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>> !7 = !{!8}
>>> !8 = !{!"btf_type_tag", !"tag"}
>>>
>>> In the above IR, we generate annotations to pointer_type because
>>> we didn't invent a new DI type for encode btf_type_tag. But it is
>>> totally okay to have IR looks like
>>>
>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>
>> OK, thanks.
>>
>> There is still the question of why the DWARF generated for this case 
>> that I have been concerned about:
>>
>>    int __typetag1 * __typetag2 __typetag3 * g;
>>
>> differs between GCC (with this series) and clang. After studying it, 
>> GCC is doing with the attributes exactly as is described in the 
>> Attribute Syntax portion of the GCC manual where the GNU syntax is 
>> described. I do not think there is any problem here.
>>
>> So the difference in DWARF suggests to me that clang is not handling 
>> the GNU attribute syntax in this particular case correctly, since it 
>> seems to be associating __typetag2 and __typetag3 to g's type rather 
>> than the type to which it points.
>>
>> I am not sure whether for the use purposes of the tags this difference 
>> is very important, but it is worth noting.
>>
>>
>> As Joseph suggested, it may be better to encourage users of these tags 
>> to use the C2x attribute syntax if they are concerned with precisely 
>> which construct the tag applies.
>>
>> This would also be a way around any issues in handling the attributes 
>> due to the GNU syntax.
>>
>> I tried a few test cases using C2x syntax BTF type tags with a 
>> clang-15 build, but ran into some issues (in particular, some of the 
>> tag attributes being ignored altogether). I couldn't find confirmation 
>> whether C2x attribute syntax is fully supported in clang yet, so maybe 
>> this isn't expected to work. Do you know whether the C2x syntax is 
>> fully supported in clang yet?
> 
> Actually, I don't know either. But since the btf decl_tag and type_tag
> are also used to compile linux kernel and the minimum compiler version
> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
> supports c2x or not, I guess probably not. So I think we most likely
> cannot use c2x syntax.

Okay, I think we can guard the btf_tag attributes so that they are only
used with newer compiler versions. What kind of C2x syntax do you intend
to use? I can help compile the kernel with that syntax and llvm15 to see
what the issue is, and may be able to help fix it in clang if possible.
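
As a rough sketch of such a guard (not necessarily what the kernel does
or should do): the tag macro below expands to the attribute only when
the compiler reports support for it through __has_attribute, and to
nothing otherwise. Note that this swaps explicit version checks for a
feature test. The names BTF_TYPE_TAG, HAVE_BTF_TYPE_TAG, uptr and
read_tagged are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_attribute: treat every attribute as unsupported. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(btf_type_tag)
#define BTF_TYPE_TAG(s) __attribute__((btf_type_tag(s)))
#define HAVE_BTF_TYPE_TAG 1
#else
#define BTF_TYPE_TAG(s) /* expands to nothing on older compilers */
#define HAVE_BTF_TYPE_TAG 0
#endif

/* Hypothetical tagged pointer; a plain 'int *' when the tag is unavailable. */
int BTF_TYPE_TAG("user") *uptr;

/* The tag is meant for debug info only, so ordinary use is unchanged. */
static int read_tagged(int dflt)
{
	return uptr != NULL ? *uptr : dflt;
}
```

With a guard like this the same source should still build with the
older compilers; they just would not emit any tags.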

> 
>>
>>>
>>>>
>>>> This example comes from my testing against clang to check that the BTF
>>>> generated by both toolchains is compatible. In this case we get
>>>> different results when using the GNU attribute syntax.
>>>>
[...]


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-11  5:05               ` Yonghong Song
@ 2022-05-11 18:44                 ` David Faust
  2022-05-24  6:33                   ` Yonghong Song
  0 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-11 18:44 UTC (permalink / raw)
  To: Yonghong Song; +Cc: gcc-patches, Joseph Myers



On 5/10/22 22:05, Yonghong Song wrote:
> 
> 
> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>
>>
>> On 5/6/22 2:18 PM, David Faust wrote:
>>>
>>>
>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>
>>>>
>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>
>>>>>
>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>
>>>>>>> Consider the following example:
>>>>>>>
>>>>>>>       #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>       #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>       #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>
>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>
>>>>>>> The expected behavior is that 'g' is "a pointer with tags 'tag2' and
>>>>>>> 'tag3',
>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>
>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>> C2x [[]]
>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>> apply to
>>>>>> the type to which g points, not to g or its type, just as if you had a
>>>>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>>>>> after the *, not before, to make them apply to the pointer type.  See
>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is defined for
>>>>>> GNU
>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>> matching
>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>> would be
>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>> "T D"
>>>>>> for a sub-declarator D.
>>>>>>
>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to
>>>>>>> an int", i.e.
>>>>>>
>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>> pointer
>>>>>> type it would need to go after the * not before.
>>>>>>
>>>>>> If you are concerned with the fine details of what construct an
>>>>>> attribute
>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>
>>>>>
>>>>> Joseph, thank you! This is very helpful. My understanding of the syntax
>>>>> was not correct.
>>>>>
>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>> discussion of it in the series cover letter. But, the reason why it is
>>>>> incorrect is the same.)
>>>>>
>>>>>
>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>> other users of the tags?
>>>>
>>>> This is probably a language writing issue. We are saying tags only
>>>> apply to pointer. We probably should say it only apply to pointee.
>>>>
>>>> $ cat t.c
>>>> int const *ptr;
>>>>
>>>> the llvm ir debuginfo:
>>>>
>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>
>>>> We could replace 'const' with a tag like below:
>>>>
>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>
>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>> annotations: !7)
>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>> !7 = !{!8}
>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>
>>>> In the above IR, we generate annotations to pointer_type because
>>>> we didn't invent a new DI type for encode btf_type_tag. But it is
>>>> totally okay to have IR looks like
>>>>
>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>
>>> OK, thanks.
>>>
>>> There is still the question of why the DWARF generated for this case
>>> that I have been concerned about:
>>>
>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>
>>> differs between GCC (with this series) and clang. After studying it,
>>> GCC is doing with the attributes exactly as is described in the
>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>> described. I do not think there is any problem here.
>>>
>>> So the difference in DWARF suggests to me that clang is not handling
>>> the GNU attribute syntax in this particular case correctly, since it
>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>> than the type to which it points.
>>>
>>> I am not sure whether for the use purposes of the tags this difference
>>> is very important, but it is worth noting.
>>>
>>>
>>> As Joseph suggested, it may be better to encourage users of these tags
>>> to use the C2x attribute syntax if they are concerned with precisely
>>> which construct the tag applies.
>>>
>>> This would also be a way around any issues in handling the attributes
>>> due to the GNU syntax.
>>>
>>> I tried a few test cases using C2x syntax BTF type tags with a
>>> clang-15 build, but ran into some issues (in particular, some of the
>>> tag attributes being ignored altogether). I couldn't find confirmation
>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>> this isn't expected to work. Do you know whether the C2x syntax is
>>> fully supported in clang yet?
>>
>> Actually, I don't know either. But since the btf decl_tag and type_tag
>> are also used to compile linux kernel and the minimum compiler version
>> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
>> supports c2x or not, I guess probably not. So I think we most likely
>> cannot use c2x syntax.
> 
> Okay, I think we can guard btf_tag's with newer compiler versions.
> What kind of c2x syntax you intend to use? I can help compile kernel
> with that syntax and llvm15 to see what is the issue and may help
> fix it in clang if possible.


I am thinking of using the [[]] C2x standard attribute syntax. The syntax
makes it quite clear to which entity each attribute applies, and in my
opinion it is a little more intuitive/less surprising too.

It's documented here (PDF):
   https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf

See section 6.7.11 for the attribute syntax and section 6.7.6 for
declarators. Section 6.7.6.1 specifically describes using the attribute
syntax with pointer declarators.

The attribute syntax itself for BTF tags is:
   [[clang::btf_type_tag("tag1")]]
or
   [[gnu::btf_type_tag("tag1")]]


I am also looking into whether, with the C2x syntax, we really need two
separate attributes (type_tag and decl_tag) at the language level. It 
might be possible with C2x syntax to use just one language attribute 
(e.g. just btf_tag).


A simple declaration for a tagged pointer to an int:

   int * [[gnu::btf_type_tag("tag1")]] x;
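
In this syntax an attribute written after a '*' applies to that pointer
type, which is the same position where a qualifier such as const would
go. As a small sanity check of that placement rule, here is a sketch
using only const (no BTF attributes involved); the variable, macro and
function names are made up for illustration.

```c
#include <assert.h>

/* 'const' after the first '*' qualifies the inner pointer type. */
int *const *inner_const;

/* 'const' after the second '*' qualifies the outer pointer, i.e. the
   declared object itself. */
int **const outer_const = 0;

/* 1 if e has type "pointer to const pointer to int", else 0. */
#define IS_PTR_TO_CONST_PTR(e) _Generic((e), int *const *: 1, default: 0)

/* Checked at compile time; nothing here runs. */
static_assert(IS_PTR_TO_CONST_PTR(inner_const) == 1,
	      "const after the first '*' binds to the inner pointer");
static_assert(IS_PTR_TO_CONST_PTR(outer_const) == 0,
	      "const after the second '*' binds to the outer pointer");

static int inner_pointer_is_const(void)
{
	return IS_PTR_TO_CONST_PTR(inner_const);
}

static int outer_pointee_is_const(void)
{
	return IS_PTR_TO_CONST_PTR(outer_const);
}
```

The tags written after a '*' are expected to land on the same pointer
type that const lands on here.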

And for the example from this thread:

   #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
   #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
   #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]

   int * __typetag1 * __typetag2 __typetag3 g;

Here each tag applies to the preceding pointer, so the result is 
unsurprising.

Actually, this is where I found something that looks like an issue with
the C2x attribute syntax in clang: tags 2 and 3 go missing, with no
warning or other indication.

Compiling this example with gcc:

$ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c -o c2x.o --std=c2x
$ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o

0x0000000c: DW_TAG_compile_unit
               DW_AT_producer	("GNU C2X 12.0.1 20220401 (experimental) -gbtf -gdwarf -std=c2x")
               DW_AT_language	(DW_LANG_C11)
               DW_AT_name	("c2x.c")
               DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
               DW_AT_stmt_list	(0x00000000)

0x0000001e:   DW_TAG_variable
                 DW_AT_name	("g")
                 DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/c2x.c")
                 DW_AT_decl_line	(16)
                 DW_AT_decl_column	(0x2a)
                 DW_AT_type	(0x00000032 "int **")
                 DW_AT_external	(true)
                 DW_AT_location	(DW_OP_addr 0x0)

0x00000032:   DW_TAG_pointer_type
                 DW_AT_byte_size	(8)
                 DW_AT_type	(0x0000004e "int *")
                 DW_AT_sibling	(0x0000004e)

0x0000003b:     DW_TAG_LLVM_annotation
                   DW_AT_name	("btf_type_tag")
                   DW_AT_const_value	("type-tag-3")

0x00000044:     DW_TAG_LLVM_annotation
                   DW_AT_name	("btf_type_tag")
                   DW_AT_const_value	("type-tag-2")

0x0000004d:     NULL

0x0000004e:   DW_TAG_pointer_type
                 DW_AT_byte_size	(8)
                 DW_AT_type	(0x00000061 "int")
                 DW_AT_sibling	(0x00000061)

0x00000057:     DW_TAG_LLVM_annotation
                   DW_AT_name	("btf_type_tag")
                   DW_AT_const_value	("type-tag-1")

0x00000060:     NULL

0x00000061:   DW_TAG_base_type
                 DW_AT_byte_size	(0x04)
                 DW_AT_encoding	(DW_ATE_signed)
                 DW_AT_name	("int")

0x00000068:   NULL


and with clang (changing the attribute prefix to clang:: appropriately):

$ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll --std=c2x
$ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll

0x0000000c: DW_TAG_compile_unit
               DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
               DW_AT_language	(DW_LANG_C99)
               DW_AT_name	("c2x.c")
               DW_AT_str_offsets_base	(0x00000008)
               DW_AT_stmt_list	(0x00000000)
               DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
               DW_AT_addr_base	(0x00000008)

0x0000001e:   DW_TAG_variable
                 DW_AT_name	("g")
                 DW_AT_type	(0x00000029 "int **")
                 DW_AT_external	(true)
                 DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/c2x.c")
                 DW_AT_decl_line	(12)
                 DW_AT_location	(DW_OP_addrx 0x0)

0x00000029:   DW_TAG_pointer_type
                 DW_AT_type	(0x00000032 "int *")

0x0000002e:     DW_TAG_LLVM_annotation
                   DW_AT_name	("btf_type_tag")
                   DW_AT_const_value	("type-tag-1")

0x00000031:     NULL

0x00000032:   DW_TAG_pointer_type
                 DW_AT_type	(0x00000037 "int")

0x00000037:   DW_TAG_base_type
                 DW_AT_name	("int")
                 DW_AT_encoding	(DW_ATE_signed)
                 DW_AT_byte_size	(0x04)

0x0000003b:   NULL


> 
>>
>>>
>>>>
>>>>>
>>>>> This example comes from my testing against clang to check that the BTF
>>>>> generated by both toolchains is compatible. In this case we get
>>>>> different results when using the GNU attribute syntax.
>>>>>
> [...]


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-11 18:44                 ` David Faust
@ 2022-05-24  6:33                   ` Yonghong Song
  2022-05-24 11:07                     ` Jose E. Marchesi
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-05-24  6:33 UTC (permalink / raw)
  To: David Faust; +Cc: gcc-patches, Joseph Myers



On 5/11/22 11:44 AM, David Faust wrote:
> 
> 
> On 5/10/22 22:05, Yonghong Song wrote:
>>
>>
>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>
>>>
>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>
>>>>
>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>
>>>>>>
>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>
>>>>>>>> Consider the following example:
>>>>>>>>
>>>>>>>>       #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>       #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>       #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>
>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>
>>>>>>>> The expected behavior is that 'g' is "a pointer with tags 'tag2' 
>>>>>>>> and
>>>>>>>> 'tag3',
>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>
>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>> C2x [[]]
>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>> apply to
>>>>>>> the type to which g points, not to g or its type, just as if you 
>>>>>>> had a
>>>>>>> type qualifier there.  You'd need to put the attributes (or 
>>>>>>> qualifier)
>>>>>>> after the *, not before, to make them apply to the pointer type.  
>>>>>>> See
>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is 
>>>>>>> defined for
>>>>>>> GNU
>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>> matching
>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>> would be
>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>> "T D"
>>>>>>> for a sub-declarator D.
>>>>>>>
>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3'
>>>>>>>> to an int", i.e.
>>>>>>>
>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>> pointer
>>>>>>> type it would need to go after the * not before.
>>>>>>>
>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>> attribute
>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>
>>>>>>
>>>>>> Joseph, thank you! This is very helpful. My understanding of the 
>>>>>> syntax
>>>>>> was not correct.
>>>>>>
>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>> discussion of it in the series cover letter. But, the reason why 
>>>>>> it is
>>>>>> incorrect is the same.)
>>>>>>
>>>>>>
>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>> other users of the tags?
>>>>>
>>>>> This is probably a language writing issue. We are saying tags only
>>>>> apply to pointer. We probably should say it only apply to pointee.
>>>>>
>>>>> $ cat t.c
>>>>> int const *ptr;
>>>>>
>>>>> the llvm ir debuginfo:
>>>>>
>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>
>>>>> We could replace 'const' with a tag like below:
>>>>>
>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>
>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>> annotations: !7)
>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>> !7 = !{!8}
>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>
>>>>> In the above IR, we generate annotations to pointer_type because
>>>>> we didn't invent a new DI type for encode btf_type_tag. But it is
>>>>> totally okay to have IR looks like
>>>>>
>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>
>>>> OK, thanks.
>>>>
>>>> There is still the question of why the DWARF generated for this case
>>>> that I have been concerned about:
>>>>
>>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>>
>>>> differs between GCC (with this series) and clang. After studying it,
>>>> GCC is doing with the attributes exactly as is described in the
>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>> described. I do not think there is any problem here.
>>>>
>>>> So the difference in DWARF suggests to me that clang is not handling
>>>> the GNU attribute syntax in this particular case correctly, since it
>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>> than the type to which it points.
>>>>
>>>> I am not sure whether for the use purposes of the tags this difference
>>>> is very important, but it is worth noting.
>>>>
>>>>
>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>> which construct the tag applies.
>>>>
>>>> This would also be a way around any issues in handling the attributes
>>>> due to the GNU syntax.
>>>>
>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>> fully supported in clang yet?
>>>
>>> Actually, I don't know either. But since the btf decl_tag and type_tag
>>> are also used to compile linux kernel and the minimum compiler version
>>> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
>>> supports c2x or not, I guess probably not. So I think we most likely
>>> cannot use c2x syntax.
>>
>> Okay, I think we can guard btf_tag's with newer compiler versions.
>> What kind of c2x syntax you intend to use? I can help compile kernel
>> with that syntax and llvm15 to see what is the issue and may help
>> fix it in clang if possible.
> 
> 
> I am thinking to use the [[]] C2x standard attribute syntax. The syntax 
> makes it quite clear to which entity each attribute applies, and in my 
> opinion is a little more intuitive/less surprising too.
> 
> It's documented here (PDF):
>    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
> 
> See sections 6.7.11 for the syntax and 6.7.6 for declarations. Section 
> 6.7.6.1 specifically describes using the attribute syntax with pointer 
> declarators.
> 
> The attribute syntax itself for BTF tags is:
>    [[clang::btf_type_tag("tag1")]]
> or
>    [[gnu::btf_type_tag("tag1")]]
> 
> 
> I am also looking into whether, with the C2x syntax, we really need two
> separate attributes (type_tag and decl_tag) at the language level. It 
> might be possible with C2x syntax to use just one language attribute 
> (e.g. just btf_tag).
> 
> 
> A simple declaration for a tagged pointer to an int:
> 
>    int * [[gnu::btf_type_tag("tag1")]] x;
> 
> And for the example from this thread:
> 
>    #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>    #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>    #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
> 
>    int * __typetag1 * __typetag2 __typetag3 g;
> 
> Here each tag applies to the preceding pointer, so the result is 
> unsurprising.
> 
> Actually, this is where I found something that looks like an issue with 
> the C2x attribute syntax in clang. The tags 2 and 3 go missing, but with 
> no warning nor other indication.
> 
> Compiling this example with gcc:
> 
> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c -o 
> c2x.o --std=c2x
> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
> 
> 0x0000000c: DW_TAG_compile_unit
>                DW_AT_producer    ("GNU C2X 12.0.1 20220401 
> (experimental) -gbtf -gdwarf -std=c2x")
>                DW_AT_language    (DW_LANG_C11)
>                DW_AT_name    ("c2x.c")
>                DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>                DW_AT_stmt_list    (0x00000000)
> 
> 0x0000001e:   DW_TAG_variable
>                  DW_AT_name    ("g")
>                  DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>                  DW_AT_decl_line    (16)
>                  DW_AT_decl_column    (0x2a)
>                  DW_AT_type    (0x00000032 "int **")
>                  DW_AT_external    (true)
>                  DW_AT_location    (DW_OP_addr 0x0)
> 
> 0x00000032:   DW_TAG_pointer_type
>                  DW_AT_byte_size    (8)
>                  DW_AT_type    (0x0000004e "int *")
>                  DW_AT_sibling    (0x0000004e)
> 
> 0x0000003b:     DW_TAG_LLVM_annotation
>                    DW_AT_name    ("btf_type_tag")
>                    DW_AT_const_value    ("type-tag-3")
> 
> 0x00000044:     DW_TAG_LLVM_annotation
>                    DW_AT_name    ("btf_type_tag")
>                    DW_AT_const_value    ("type-tag-2")
> 
> 0x0000004d:     NULL
> 
> 0x0000004e:   DW_TAG_pointer_type
>                  DW_AT_byte_size    (8)
>                  DW_AT_type    (0x00000061 "int")
>                  DW_AT_sibling    (0x00000061)
> 
> 0x00000057:     DW_TAG_LLVM_annotation
>                    DW_AT_name    ("btf_type_tag")
>                    DW_AT_const_value    ("type-tag-1")
> 
> 0x00000060:     NULL
> 
> 0x00000061:   DW_TAG_base_type
>                  DW_AT_byte_size    (0x04)
>                  DW_AT_encoding    (DW_ATE_signed)
>                  DW_AT_name    ("int")
> 
> 0x00000068:   NULL
> 
> 
> and with clang (changing the attribute prefix to clang:: appropriately):
> 
> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll --std=c2x
> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
> 
> 0x0000000c: DW_TAG_compile_unit
>                DW_AT_producer    ("clang version 15.0.0 
> (https://github.com/llvm/llvm-project.git 
> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>                DW_AT_language    (DW_LANG_C99)
>                DW_AT_name    ("c2x.c")
>                DW_AT_str_offsets_base    (0x00000008)
>                DW_AT_stmt_list    (0x00000000)
>                DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>                DW_AT_addr_base    (0x00000008)
> 
> 0x0000001e:   DW_TAG_variable
>                  DW_AT_name    ("g")
>                  DW_AT_type    (0x00000029 "int **")
>                  DW_AT_external    (true)
>                  DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>                  DW_AT_decl_line    (12)
>                  DW_AT_location    (DW_OP_addrx 0x0)
> 
> 0x00000029:   DW_TAG_pointer_type
>                  DW_AT_type    (0x00000032 "int *")
> 
> 0x0000002e:     DW_TAG_LLVM_annotation
>                    DW_AT_name    ("btf_type_tag")
>                    DW_AT_const_value    ("type-tag-1")
> 
> 0x00000031:     NULL
> 
> 0x00000032:   DW_TAG_pointer_type
>                  DW_AT_type    (0x00000037 "int")
> 
> 0x00000037:   DW_TAG_base_type
>                  DW_AT_name    ("int")
>                  DW_AT_encoding    (DW_ATE_signed)
>                  DW_AT_byte_size    (0x04)
> 
> 0x0000003b:   NULL

Thanks. I checked with current clang. The generated code looks
like the above. Basically, for code like the one below

    #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
    #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
    #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]

    int * __typetag1 * __typetag2 __typetag3 g;

The IR type looks like
    __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int

The IR is similar to what we generate when using
__attribute__((btf_type_tag(""))), but the
semantic interpretation is quite different.
For example, with the c2x format,
    __typetag1 applies to ptr2
whereas with the __attribute__ format, it applies to the pointee of ptr1.

But more importantly, the c2x format is incompatible with
how the linux kernel uses these annotations. Below are a few kernel
__user usages. Here, __user is intended to be replaced with a btf_type_tag.

vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device *vdev, char __user *buf,
vfio_pci_core.h:                                  char __user *buf, size_t count,
vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
vfio_pci_core.h:                                      char __user *buf, size_t count,
vfio_pci_core.h:                                void __user *arg, size_t argsz);
vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
vringh.h:                    vring_desc_t __user *desc,
vringh.h:                    vring_avail_t __user *avail,
vringh.h:                    vring_used_t __user *used);
vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
vt_kern.h:int con_set_trans_old(unsigned char __user * table);
vt_kern.h:int con_get_trans_old(unsigned char __user * table);
vt_kern.h:int con_set_trans_new(unsigned short __user * table);
vt_kern.h:int con_get_trans_new(unsigned short __user * table);

As you can see, we will not be able to simply replace __user
with [[clang::btf_type_tag("user")]] because it won't work
according to c2x expectations.
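
For reference, a minimal sketch of why the GNU spelling drops into such
declarations unchanged: the macro sits before the '*', in the same
position where __user is written in the kernel headers listed above.
USER_TAG and peek_first are hypothetical names used only for
illustration (the kernel spells the macro __user), and the macro is
assumed to expand to nothing on compilers without btf_type_tag support.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __user macro. Expands to the
 * GNU-syntax attribute when the compiler knows btf_type_tag, and to
 * nothing otherwise, so the declaration below compiles either way. */
#if defined(__has_attribute)
# if __has_attribute(btf_type_tag)
#  define USER_TAG __attribute__((btf_type_tag("user")))
# endif
#endif
#ifndef USER_TAG
# define USER_TAG
#endif

/* Same shape as the kernel declarations above: the tag is written
 * before the '*', in the position a qualifier such as const would take.
 * peek_first is a hypothetical user-space helper; kernel code would not
 * dereference a __user pointer directly. */
char peek_first(const char USER_TAG *buf, size_t count)
{
    return count > 0 ? buf[0] : '\0';
}
```

Swapping the macro body for a [[...]] attribute while keeping it in this
position is the step that, per the discussion above, is not expected to
behave the same way.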

> 
> 
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> This example comes from my testing against clang to check that the 
>>>>>> BTF
>>>>>> generated by both toolchains is compatible. In this case we get
>>>>>> different results when using the GNU attribute syntax.
>>>>>>
>> [...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-24  6:33                   ` Yonghong Song
@ 2022-05-24 11:07                     ` Jose E. Marchesi
  2022-05-24 15:52                       ` Yonghong Song
  2022-05-24 15:53                       ` David Faust
  0 siblings, 2 replies; 30+ messages in thread
From: Jose E. Marchesi @ 2022-05-24 11:07 UTC (permalink / raw)
  To: Yonghong Song via Gcc-patches; +Cc: David Faust, Yonghong Song, Joseph Myers


> On 5/11/22 11:44 AM, David Faust wrote:
>> 
>> On 5/10/22 22:05, Yonghong Song wrote:
>>>
>>>
>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>
>>>>>
>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>
>>>>>>>>> Consider the following example:
>>>>>>>>>
>>>>>>>>>       #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>       #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>       #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>
>>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>
>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>> 'tag2' and
>>>>>>>>> 'tag3',
>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>
>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>> C2x [[]]
>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>> apply to
>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>> you had a
>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>> qualifier)
>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>> type.  See
>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>> defined for
>>>>>>>> GNU
>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>> matching
>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>> would be
>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>> "T D"
>>>>>>>> for a sub-declarator D.
>>>>>>>>    >> But GCC's attribute parsing produces a variable 'g'
>>>>>>>> which is "a
>>>>>>> pointer with
>>>>>>>>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an
>>>>>>>>> int", i.e.
>>>>>>>>
>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>> pointer
>>>>>>>> type it would need to go after the * not before.
>>>>>>>>
>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>> attribute
>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>
>>>>>>>
>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>> the syntax
>>>>>>> was not correct.
>>>>>>>
>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>> why it is
>>>>>>> incorrect is the same.)
>>>>>>>
>>>>>>>
>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>> other users of the tags?
>>>>>>
>>>>>> This is probably a language writing issue. We are saying tags only
>>>>>> apply to pointer. We probably should say it only apply to pointee.
>>>>>>
>>>>>> $ cat t.c
>>>>>> int const *ptr;
>>>>>>
>>>>>> the llvm ir debuginfo:
>>>>>>
>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>
>>>>>> We could replace 'const' with a tag like below:
>>>>>>
>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>
>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>> annotations: !7)
>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>> !7 = !{!8}
>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>
>>>>>> In the above IR, we generate annotations to pointer_type because
>>>>>> we didn't invent a new DI type for encode btf_type_tag. But it is
>>>>>> totally okay to have IR looks like
>>>>>>
>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>
>>>>> OK, thanks.
>>>>>
>>>>> There is still the question of why the DWARF generated for this case
>>>>> that I have been concerned about:
>>>>>
>>>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>>>
>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>> GCC is doing with the attributes exactly as is described in the
>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>> described. I do not think there is any problem here.
>>>>>
>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>> than the type to which it points.
>>>>>
>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>> is very important, but it is worth noting.
>>>>>
>>>>>
>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>> which construct the tag applies.
>>>>>
>>>>> This would also be a way around any issues in handling the attributes
>>>>> due to the GNU syntax.
>>>>>
>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>> fully supported in clang yet?
>>>>
>>>> Actually, I don't know either. But since the btf decl_tag and type_tag
>>>> are also used to compile linux kernel and the minimum compiler version
>>>> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
>>>> supports c2x or not, I guess probably not. So I think we most likely
>>>> cannot use c2x syntax.
>>>
>>> Okay, I think we can guard btf_tag's with newer compiler versions.
>>> What kind of c2x syntax you intend to use? I can help compile kernel
>>> with that syntax and llvm15 to see what is the issue and may help
>>> fix it in clang if possible.
>> 
>> I am thinking to use the [[]] C2x standard attribute syntax. The
>> syntax makes it quite clear to which entity each attribute applies,
>> and in my opinion is a little more intuitive/less surprising too.
>> It's documented here (PDF):
>>    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf 
>> See sections 6.7.11 for the syntax and 6.7.6 for
>> declarations. Section 6.7.6.1 specifically describes using the
>> attribute syntax with pointer declarators.
>> The attribute syntax itself for BTF tags is:
>>    [[clang::btf_type_tag("tag1")]]
>> or
>>    [[gnu::btf_type_tag("tag1")]]
>> 
>> I am also looking into whether, with the C2x syntax, we really need two
>> separate attributes (type_tag and decl_tag) at the language
>> level. It might be possible with C2x syntax to use just one language
>> attribute (e.g. just btf_tag).
>> 
>> A simple declaration for a tagged pointer to an int:
>>    int * [[gnu::btf_type_tag("tag1")]] x;
>> And for the example from this thread:
>>    #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>    #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>    #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>    int * __typetag1 * __typetag2 __typetag3 g;
>> Here each tag applies to the preceding pointer, so the result is 
>> unsurprising.
>> Actually, this is where I found something that looks like an issue
>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>> but with no warning nor other indication.
>> Compiling this example with gcc:
>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>> -o c2x.o --std=c2x
>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>> 0x0000000c: DW_TAG_compile_unit
>>                DW_AT_producer    ("GNU C2X 12.0.1 20220401
>> (experimental) -gbtf -gdwarf -std=c2x")
>>                DW_AT_language    (DW_LANG_C11)
>>                DW_AT_name    ("c2x.c")
>>                DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>                DW_AT_stmt_list    (0x00000000)
>> 0x0000001e:   DW_TAG_variable
>>                  DW_AT_name    ("g")
>>                  DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>                  DW_AT_decl_line    (16)
>>                  DW_AT_decl_column    (0x2a)
>>                  DW_AT_type    (0x00000032 "int **")
>>                  DW_AT_external    (true)
>>                  DW_AT_location    (DW_OP_addr 0x0)
>> 0x00000032:   DW_TAG_pointer_type
>>                  DW_AT_byte_size    (8)
>>                  DW_AT_type    (0x0000004e "int *")
>>                  DW_AT_sibling    (0x0000004e)
>> 0x0000003b:     DW_TAG_LLVM_annotation
>>                    DW_AT_name    ("btf_type_tag")
>>                    DW_AT_const_value    ("type-tag-3")
>> 0x00000044:     DW_TAG_LLVM_annotation
>>                    DW_AT_name    ("btf_type_tag")
>>                    DW_AT_const_value    ("type-tag-2")
>> 0x0000004d:     NULL
>> 0x0000004e:   DW_TAG_pointer_type
>>                  DW_AT_byte_size    (8)
>>                  DW_AT_type    (0x00000061 "int")
>>                  DW_AT_sibling    (0x00000061)
>> 0x00000057:     DW_TAG_LLVM_annotation
>>                    DW_AT_name    ("btf_type_tag")
>>                    DW_AT_const_value    ("type-tag-1")
>> 0x00000060:     NULL
>> 0x00000061:   DW_TAG_base_type
>>                  DW_AT_byte_size    (0x04)
>>                  DW_AT_encoding    (DW_ATE_signed)
>>                  DW_AT_name    ("int")
>> 0x00000068:   NULL
>> 
>> and with clang (changing the attribute prefix to clang:: appropriately):
>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>> --std=c2x
>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>> 0x0000000c: DW_TAG_compile_unit
>>                DW_AT_producer    ("clang version 15.0.0
>> (https://github.com/llvm/llvm-project.git 
>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>                DW_AT_language    (DW_LANG_C99)
>>                DW_AT_name    ("c2x.c")
>>                DW_AT_str_offsets_base    (0x00000008)
>>                DW_AT_stmt_list    (0x00000000)
>>                DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>                DW_AT_addr_base    (0x00000008)
>> 0x0000001e:   DW_TAG_variable
>>                  DW_AT_name    ("g")
>>                  DW_AT_type    (0x00000029 "int **")
>>                  DW_AT_external    (true)
>>                  DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>                  DW_AT_decl_line    (12)
>>                  DW_AT_location    (DW_OP_addrx 0x0)
>> 0x00000029:   DW_TAG_pointer_type
>>                  DW_AT_type    (0x00000032 "int *")
>> 0x0000002e:     DW_TAG_LLVM_annotation
>>                    DW_AT_name    ("btf_type_tag")
>>                    DW_AT_const_value    ("type-tag-1")
>> 0x00000031:     NULL
>> 0x00000032:   DW_TAG_pointer_type
>>                  DW_AT_type    (0x00000037 "int")
>> 0x00000037:   DW_TAG_base_type
>>                  DW_AT_name    ("int")
>>                  DW_AT_encoding    (DW_ATE_signed)
>>                  DW_AT_byte_size    (0x04)
>> 0x0000003b:   NULL
>
> Thanks. I checked with current clang. The generated code looks
> like above. Basically, for code like below
>
>    #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>    #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>    #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>
>    int * __typetag1 * __typetag2 __typetag3 g;
>
> The IR type looks like
>    __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>
> The IR is similar to what we did if using
> __attribute__((btf_type_tag(""))), but their
> semantic interpretation is quite different.
> For example, with c2x format,
>    __typetag1 applies to ptr2
> with __attribute__ format, it applies pointee of ptr1.
>
> But more importantly, c2x format is incompatible with
> the usage of linux kernel. The following are a bunch of kernel
> __user usages. Here, __user intends to be replaced with a btf_type_tag.
>
> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
> *vdev, char __user *buf,
> vfio_pci_core.h:                                  char __user *buf,
> size_t count,
> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
> vfio_pci_core_device *vdev, char __user *buf,
> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
> vfio_pci_core_device *vdev, char __user *buf,
> vfio_pci_core.h:                                      char __user
> *buf, size_t count,
> vfio_pci_core.h:                                void __user *arg,
> size_t argsz);
> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
> *core_vdev, char __user *buf,
> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
> *core_vdev, const char __user *buf,
> vringh.h:                    vring_desc_t __user *desc,
> vringh.h:                    vring_avail_t __user *avail,
> vringh.h:                    vring_used_t __user *used);
> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>
> You can see, we will not able to simply replace __user
> with [[clang::btf_type_tag("user")]] because it won't work
> according to c2x expectations.

Hi Yonghong.

I am a bit confused regarding the GNU attributes problem: our patch
supports it, but as David already noted:

>>>> There is still the question of why the DWARF generated for this case
>>>> that I have been concerned about:
>>>>
>>>>    int __typetag1 * __typetag2 __typetag3 * g;
>>>>
>>>> differs between GCC (with this series) and clang. After studying it,
>>>> GCC is doing with the attributes exactly as is described in the
>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>> described. I do not think there is any problem here.
>>>>
>>>> So the difference in DWARF suggests to me that clang is not handling
>>>> the GNU attribute syntax in this particular case correctly, since it
>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>> than the type to which it points.

Note the example he uses is:

  (a) int __typetag1 * __typetag2 __typetag3 * g;

Not

  (b) int * __typetag1 * __typetag2 __typetag3 g;

Apparently for (a) clang is generating DWARF that associates __typetag2
and __typetag3 to g's type (the pointer to pointer) instead of the
pointer to int, which contravenes the GNU syntax rules.

AFAIK that is where the DWARF we generate differs, and what is blocking
us.  David will correct me in the likely case I'm wrong :)
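
A rough way to see the difference between (a) and (b) without any BTF
support is to follow Joseph's remark that the tags placed after a '*'
bind "just as if you had a type qualifier there". The sketch below
assumes const as a stand-in for __typetag1 and volatile as a stand-in
for __typetag2 __typetag3; ga, gb and the two helper functions are
hypothetical names. It needs C11 for _Generic. The analogy is only
approximate for the leading __typetag1 in (a), which GNU syntax treats
differently, as noted earlier in the thread.

```c
/* (a) int __typetag1 * __typetag2 __typetag3 * g;
 *     const stands in for tag1, volatile for tag2+tag3. */
int const * volatile * ga;

/* (b) int * __typetag1 * __typetag2 __typetag3 g; */
int * const * volatile gb;

/* In (a) the middle qualifier lands on the pointer-to-int that ga
 * points to; ga's own type carries no qualifier. */
int a_middle_binds_to_pointee(void)
{
    return _Generic(&ga, const int * volatile * *: 1, default: 0);
}

/* In (b) the trailing qualifier lands on gb's own type (the pointer to
 * pointer), and the first one on the inner pointer-to-int. */
int b_trailing_binds_to_g_type(void)
{
    return _Generic(&gb, int * const * volatile *: 1, default: 0);
}
```

Both helpers return 1, which matches the reading that in (a) the tags
after the first '*' belong to the pointer to int and not to g's type.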

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-24 11:07                     ` Jose E. Marchesi
@ 2022-05-24 15:52                       ` Yonghong Song
  2022-05-24 15:53                       ` David Faust
  1 sibling, 0 replies; 30+ messages in thread
From: Yonghong Song @ 2022-05-24 15:52 UTC (permalink / raw)
  To: Jose E. Marchesi, Yonghong Song via Gcc-patches; +Cc: David Faust, Joseph Myers



On 5/24/22 4:07 AM, Jose E. Marchesi wrote:
> 
>> On 5/11/22 11:44 AM, David Faust wrote:
>>>
>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>
>>>>
>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>
>>>>>>
>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>
>>>>>>>>>> Consider the following example:
>>>>>>>>>>
>>>>>>>>>>        #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>        #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>        #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>
>>>>>>>>>>        int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>
>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>> 'tag2' and
>>>>>>>>>> 'tag3',
>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>
>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>> C2x [[]]
>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>> apply to
>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>> you had a
>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>> qualifier)
>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>> type.  See
>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>> defined for
>>>>>>>>> GNU
>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>> matching
>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>> would be
>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>> "T D"
>>>>>>>>> for a sub-declarator D.
>>>>>>>>>     >> But GCC's attribute parsing produces a variable 'g'
>>>>>>>>> which is "a
>>>>>>>> pointer with
>>>>>>>>>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an
>>>>>>>>>> int", i.e.
>>>>>>>>>
>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>> pointer
>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>
>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>> attribute
>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>> the syntax
>>>>>>>> was not correct.
>>>>>>>>
>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>> why it is
>>>>>>>> incorrect is the same.)
>>>>>>>>
>>>>>>>>
>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>> other users of the tags?
>>>>>>>
>>>>>>> This is probably a language writing issue. We are saying tags only
>>>>>>> apply to pointer. We probably should say it only apply to pointee.
>>>>>>>
>>>>>>> $ cat t.c
>>>>>>> int const *ptr;
>>>>>>>
>>>>>>> the llvm ir debuginfo:
>>>>>>>
>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>
>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>
>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>
>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>> annotations: !7)
>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>> !7 = !{!8}
>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>
>>>>>>> In the above IR, we generate annotations to pointer_type because
>>>>>>> we didn't invent a new DI type for encode btf_type_tag. But it is
>>>>>>> totally okay to have IR looks like
>>>>>>>
>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>
>>>>>> OK, thanks.
>>>>>>
>>>>>> There is still the question of why the DWARF generated for this case
>>>>>> that I have been concerned about:
>>>>>>
>>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>
>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>> described. I do not think there is any problem here.
>>>>>>
>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>> than the type to which it points.
>>>>>>
>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>> is very important, but it is worth noting.
>>>>>>
>>>>>>
>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>> which construct the tag applies.
>>>>>>
>>>>>> This would also be a way around any issues in handling the attributes
>>>>>> due to the GNU syntax.
>>>>>>
>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>> fully supported in clang yet?
>>>>>
>>>>> Actually, I don't know either. But since the btf decl_tag and type_tag
>>>>> are also used to compile linux kernel and the minimum compiler version
>>>>> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
>>>>> supports c2x or not, I guess probably not. So I think we most likely
>>>>> cannot use c2x syntax.
>>>>
>>>> Okay, I think we can guard btf_tag's with newer compiler versions.
>>>> What kind of c2x syntax you intend to use? I can help compile kernel
>>>> with that syntax and llvm15 to see what is the issue and may help
>>>> fix it in clang if possible.
>>>
>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>> syntax makes it quite clear to which entity each attribute applies,
>>> and in my opinion is a little more intuitive/less surprising too.
>>> It's documented here (PDF):
>>>     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>> declarations. Section 6.7.6.1 specifically describes using the
>>> attribute syntax with pointer declarators.
>>> The attribute syntax itself for BTF tags is:
>>>     [[clang::btf_type_tag("tag1")]]
>>> or
>>>     [[gnu::btf_type_tag("tag1")]]
>>>
>>> I am also looking into whether, with the C2x syntax, we really need two
>>> separate attributes (type_tag and decl_tag) at the language
>>> level. It might be possible with C2x syntax to use just one language
>>> attribute (e.g. just btf_tag).
>>>
>>> A simple declaration for a tagged pointer to an int:
>>>     int * [[gnu::btf_type_tag("tag1")]] x;
>>> And for the example from this thread:
>>>     #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>     #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>     #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>     int * __typetag1 * __typetag2 __typetag3 g;
>>> Here each tag applies to the preceding pointer, so the result is
>>> unsurprising.
>>> Actually, this is where I found something that looks like an issue
>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>> but with no warning nor other indication.
>>> Compiling this example with gcc:
>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>> -o c2x.o --std=c2x
>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>> 0x0000000c: DW_TAG_compile_unit
>>>                 DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>                 DW_AT_language    (DW_LANG_C11)
>>>                 DW_AT_name    ("c2x.c")
>>>                 DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>                 DW_AT_stmt_list    (0x00000000)
>>> 0x0000001e:   DW_TAG_variable
>>>                   DW_AT_name    ("g")
>>>                   DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>                   DW_AT_decl_line    (16)
>>>                   DW_AT_decl_column    (0x2a)
>>>                   DW_AT_type    (0x00000032 "int **")
>>>                   DW_AT_external    (true)
>>>                   DW_AT_location    (DW_OP_addr 0x0)
>>> 0x00000032:   DW_TAG_pointer_type
>>>                   DW_AT_byte_size    (8)
>>>                   DW_AT_type    (0x0000004e "int *")
>>>                   DW_AT_sibling    (0x0000004e)
>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name    ("btf_type_tag")
>>>                     DW_AT_const_value    ("type-tag-3")
>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name    ("btf_type_tag")
>>>                     DW_AT_const_value    ("type-tag-2")
>>> 0x0000004d:     NULL
>>> 0x0000004e:   DW_TAG_pointer_type
>>>                   DW_AT_byte_size    (8)
>>>                   DW_AT_type    (0x00000061 "int")
>>>                   DW_AT_sibling    (0x00000061)
>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name    ("btf_type_tag")
>>>                     DW_AT_const_value    ("type-tag-1")
>>> 0x00000060:     NULL
>>> 0x00000061:   DW_TAG_base_type
>>>                   DW_AT_byte_size    (0x04)
>>>                   DW_AT_encoding    (DW_ATE_signed)
>>>                   DW_AT_name    ("int")
>>> 0x00000068:   NULL
>>>
>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>> --std=c2x
>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>> 0x0000000c: DW_TAG_compile_unit
>>>                 DW_AT_producer    ("clang version 15.0.0
>>> (https://github.com/llvm/llvm-project.git
>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>                 DW_AT_language    (DW_LANG_C99)
>>>                 DW_AT_name    ("c2x.c")
>>>                 DW_AT_str_offsets_base    (0x00000008)
>>>                 DW_AT_stmt_list    (0x00000000)
>>>                 DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>                 DW_AT_addr_base    (0x00000008)
>>> 0x0000001e:   DW_TAG_variable
>>>                   DW_AT_name    ("g")
>>>                   DW_AT_type    (0x00000029 "int **")
>>>                   DW_AT_external    (true)
>>>                   DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>                   DW_AT_decl_line    (12)
>>>                   DW_AT_location    (DW_OP_addrx 0x0)
>>> 0x00000029:   DW_TAG_pointer_type
>>>                   DW_AT_type    (0x00000032 "int *")
>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name    ("btf_type_tag")
>>>                     DW_AT_const_value    ("type-tag-1")
>>> 0x00000031:     NULL
>>> 0x00000032:   DW_TAG_pointer_type
>>>                   DW_AT_type    (0x00000037 "int")
>>> 0x00000037:   DW_TAG_base_type
>>>                   DW_AT_name    ("int")
>>>                   DW_AT_encoding    (DW_ATE_signed)
>>>                   DW_AT_byte_size    (0x04)
>>> 0x0000003b:   NULL
>>
>> Thanks. I checked with current clang. The generated code looks
>> like above. Basically, for code like below
>>
>>     #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>     #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>     #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>
>>     int * __typetag1 * __typetag2 __typetag3 g;
>>
>> The IR type looks like
>>     __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>
>> The IR is similar to what we did if using
>> __attribute__((btf_type_tag(""))), but their
>> semantic interpretation is quite different.
>> For example, with c2x format,
>>     __typetag1 applies to ptr2
>> with __attribute__ format, it applies to the pointee of ptr1.
>>
>> But more importantly, c2x format is incompatible with
>> how the linux kernel uses these annotations. The following are a bunch
>> of kernel __user usages. Here, __user is intended to be replaced with
>> a btf_type_tag.
>>
>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>> *vdev, char __user *buf,
>> vfio_pci_core.h:                                  char __user *buf,
>> size_t count,
>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>> vfio_pci_core_device *vdev, char __user *buf,
>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>> vfio_pci_core_device *vdev, char __user *buf,
>> vfio_pci_core.h:                                      char __user
>> *buf, size_t count,
>> vfio_pci_core.h:                                void __user *arg,
>> size_t argsz);
>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>> *core_vdev, char __user *buf,
>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>> *core_vdev, const char __user *buf,
>> vringh.h:                    vring_desc_t __user *desc,
>> vringh.h:                    vring_avail_t __user *avail,
>> vringh.h:                    vring_used_t __user *used);
>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>
>> As you can see, we will not be able to simply replace __user
>> with [[clang::btf_type_tag("user")]] because it won't work
>> according to c2x expectations.
> 
> Hi Yonghong.
> 
> I am a bit confused regarding the GNU attributes problem: our patch
> supports it, but as David already noted:
> 
>>>>> There is still the question of why the DWARF generated for this case
>>>>> that I have been concerned about:
>>>>>
>>>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>>>
>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>> GCC is doing with the attributes exactly as is described in the
>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>> described. I do not think there is any problem here.
>>>>>
>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>> than the type to which it points.
> 
> Note the example he uses is:
> 
>    (a) int __typetag1 * __typetag2 __typetag3 * g;
> 
> Not
> 
>    (b) int * __typetag1 * __typetag2 __typetag3 g;
> 
> Apparently for (a) clang is generating DWARF that associates __typetag2
> and __typetag3 to g's type (the pointer to pointer) instead of the
> pointer to int, which contravenes the GNU syntax rules.
> 
> AFAIK that is where the DWARF we generate differs, and what is blocking
> us.  David will correct me in the likely case I'm wrong :)

Okay, for
   #define __typetag1 __attribute__((btf_type_tag("tag1")))
   #define __typetag2 __attribute__((btf_type_tag("tag2")))
   #define __typetag3 __attribute__((btf_type_tag("tag3")))
   int __typetag1 * __typetag2 __typetag3 * g;

As you are aware, clang generates IR like:
!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, 
annotations: !10)
!6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64, 
annotations: !8)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!8 = !{!9}
!9 = !{!"btf_type_tag", !"tag1"}
!10 = !{!11, !12}
!11 = !{!"btf_type_tag", !"tag2"}
!12 = !{!"btf_type_tag", !"tag3"}

As you mentioned, yes, we put the annotations on the
pointer itself. This is purely an implementation choice.
We could generate a
   ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
type chain and encode that chain in DWARF as well,
but that could make it hard for existing tools to work
with the new format.
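
To make the chain idea concrete, here is a small C sketch. It is only a
toy model, not clang or GCC code: struct type_node, render_chain and
example_chain are made-up names for illustration. It represents the type
chain above as a linked list in which every tag is its own node, rather
than an annotation hanging off a pointer node, and renders it as text:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds for a toy type chain (not a real DWARF/IR API). */
enum node_kind { NODE_INT, NODE_PTR, NODE_TAG };

struct type_node {
    enum node_kind kind;
    const char *tag;               /* tag string, used only by NODE_TAG */
    const struct type_node *next;  /* next type in the chain; NULL at the end */
};

/* Render a chain as "ptr -> tag3 -> ... -> int" into buf and return buf. */
char *render_chain(const struct type_node *n, char *buf, size_t size)
{
    assert(size > 0);
    buf[0] = '\0';
    for (; n != NULL; n = n->next) {
        const char *name = n->kind == NODE_PTR ? "ptr"
                         : n->kind == NODE_TAG ? n->tag
                         : "int";
        size_t used = strlen(buf);
        /* Append a separator before every element except the first. */
        snprintf(buf + used, size - used, "%s%s", used ? " -> " : "", name);
    }
    return buf;
}

/* Build the chain written out in the message above. */
const struct type_node *example_chain(void)
{
    static const struct type_node base = { NODE_INT, NULL, NULL };
    static const struct type_node tag1 = { NODE_TAG, "tag1", &base };
    static const struct type_node ptr2 = { NODE_PTR, NULL, &tag1 };
    static const struct type_node tag2 = { NODE_TAG, "tag2", &ptr2 };
    static const struct type_node tag3 = { NODE_TAG, "tag3", &tag2 };
    static const struct type_node ptr1 = { NODE_PTR, NULL, &tag3 };
    return &ptr1;
}
```

Calling render_chain(example_chain(), buf, sizeof buf) with a buffer of
around 128 bytes fills buf with the chain text. In this shape a consumer
never has to decide whether an annotation on a pointer node describes the
pointer or its pointee, because the position of the tag node in the chain
says it directly.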

What is your proposed DWARF format? It may require
adding a new TAG type, unless you have a way to use an existing one.

I think clang can change to the new format as well
if we reach agreement.


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-24 11:07                     ` Jose E. Marchesi
  2022-05-24 15:52                       ` Yonghong Song
@ 2022-05-24 15:53                       ` David Faust
  2022-05-24 16:03                         ` Yonghong Song
  1 sibling, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-24 15:53 UTC (permalink / raw)
  To: Yonghong Song, Jose E. Marchesi
  Cc: Joseph Myers, Yonghong Song via Gcc-patches



On 5/24/22 04:07, Jose E. Marchesi wrote:
> 
>> On 5/11/22 11:44 AM, David Faust wrote:
>>>
>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>
>>>>
>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>
>>>>>>
>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>
>>>>>>>>>> Consider the following example:
>>>>>>>>>>
>>>>>>>>>>       #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>       #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>       #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>
>>>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>
>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>> 'tag2' and
>>>>>>>>>> 'tag3',
>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>
>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>> C2x [[]]
>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>> apply to
>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>> you had a
>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>> qualifier)
>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>> type.  See
>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>> defined for
>>>>>>>>> GNU
>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>> matching
>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>> would be
>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>> "T D"
>>>>>>>>> for a sub-declarator D.
>>>>>>>>>
>>>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3'
>>>>>>>>>> to an int", i.e.
>>>>>>>>>
>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>> pointer
>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>
>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>> attribute
>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>> the syntax
>>>>>>>> was not correct.
>>>>>>>>
>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>> why it is
>>>>>>>> incorrect is the same.)
>>>>>>>>
>>>>>>>>
>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>> other users of the tags?
>>>>>>>
>>>>>>> This is probably a wording issue. We have been saying tags only
>>>>>>> apply to pointers. We probably should say they only apply to pointees.
>>>>>>>
>>>>>>> $ cat t.c
>>>>>>> int const *ptr;
>>>>>>>
>>>>>>> the llvm ir debuginfo:
>>>>>>>
>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>
>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>
>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>
>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>> annotations: !7)
>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>> !7 = !{!8}
>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>
>>>>>>> In the above IR, we attach the annotations to the pointer_type because
>>>>>>> we didn't invent a new DI type to encode btf_type_tag. But it is
>>>>>>> totally okay to have IR that looks like
>>>>>>>
>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>
>>>>>> OK, thanks.
>>>>>>
>>>>>> There is still the question of why the DWARF generated for this case
>>>>>> that I have been concerned about:
>>>>>>
>>>>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>
>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>> described. I do not think there is any problem here.
>>>>>>
>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>> than the type to which it points.
>>>>>>
>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>> is very important, but it is worth noting.
>>>>>>
>>>>>>
>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>> which construct the tag applies to.
>>>>>>
>>>>>> This would also be a way around any issues in handling the attributes
>>>>>> due to the GNU syntax.
>>>>>>
>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>> fully supported in clang yet?
>>>>>
>>>>> Actually, I don't know either. But the btf decl_tag and type_tag
>>>>> are also used to compile the linux kernel, and the minimum compiler
>>>>> versions for the kernel are gcc5.1 and clang11. I am not sure whether
>>>>> gcc5.1 supports c2x or not; I guess probably not. So I think we most
>>>>> likely cannot use c2x syntax.
>>>>
>>>> Okay, I think we can guard btf_tag's with newer compiler versions.
>>>> What kind of c2x syntax do you intend to use? I can help compile the
>>>> kernel with that syntax and llvm15 to see what the issue is, and may
>>>> help fix it in clang if possible.
>>>
>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>> syntax makes it quite clear to which entity each attribute applies,
>>> and in my opinion is a little more intuitive/less surprising too.
>>> It's documented here (PDF):
>>>    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf 
>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>> declarations. Section 6.7.6.1 specifically describes using the
>>> attribute syntax with pointer declarators.
>>> The attribute syntax itself for BTF tags is:
>>>    [[clang::btf_type_tag("tag1")]]
>>> or
>>>    [[gnu::btf_type_tag("tag1")]]
>>>
>>> I am also looking into whether, with the C2x syntax, we really need two
>>> separate attributes (type_tag and decl_tag) at the language
>>> level. It might be possible with C2x syntax to use just one language
>>> attribute (e.g. just btf_tag).
>>>
>>> A simple declaration for a tagged pointer to an int:
>>>    int * [[gnu::btf_type_tag("tag1")]] x;
>>> And for the example from this thread:
>>>    #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>    #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>    #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>    int * __typetag1 * __typetag2 __typetag3 g;
>>> Here each tag applies to the preceding pointer, so the result is 
>>> unsurprising.
>>> Actually, this is where I found something that looks like an issue
>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>> but with no warning nor other indication.
>>> Compiling this example with gcc:
>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>> -o c2x.o --std=c2x
>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>> 0x0000000c: DW_TAG_compile_unit
>>>                DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>                DW_AT_language    (DW_LANG_C11)
>>>                DW_AT_name    ("c2x.c")
>>>                DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>                DW_AT_stmt_list    (0x00000000)
>>> 0x0000001e:   DW_TAG_variable
>>>                  DW_AT_name    ("g")
>>>                  DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>                  DW_AT_decl_line    (16)
>>>                  DW_AT_decl_column    (0x2a)
>>>                  DW_AT_type    (0x00000032 "int **")
>>>                  DW_AT_external    (true)
>>>                  DW_AT_location    (DW_OP_addr 0x0)
>>> 0x00000032:   DW_TAG_pointer_type
>>>                  DW_AT_byte_size    (8)
>>>                  DW_AT_type    (0x0000004e "int *")
>>>                  DW_AT_sibling    (0x0000004e)
>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>                    DW_AT_name    ("btf_type_tag")
>>>                    DW_AT_const_value    ("type-tag-3")
>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>                    DW_AT_name    ("btf_type_tag")
>>>                    DW_AT_const_value    ("type-tag-2")
>>> 0x0000004d:     NULL
>>> 0x0000004e:   DW_TAG_pointer_type
>>>                  DW_AT_byte_size    (8)
>>>                  DW_AT_type    (0x00000061 "int")
>>>                  DW_AT_sibling    (0x00000061)
>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>                    DW_AT_name    ("btf_type_tag")
>>>                    DW_AT_const_value    ("type-tag-1")
>>> 0x00000060:     NULL
>>> 0x00000061:   DW_TAG_base_type
>>>                  DW_AT_byte_size    (0x04)
>>>                  DW_AT_encoding    (DW_ATE_signed)
>>>                  DW_AT_name    ("int")
>>> 0x00000068:   NULL
>>>
>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>> --std=c2x
>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>> 0x0000000c: DW_TAG_compile_unit
>>>                DW_AT_producer    ("clang version 15.0.0
>>> (https://github.com/llvm/llvm-project.git 
>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>                DW_AT_language    (DW_LANG_C99)
>>>                DW_AT_name    ("c2x.c")
>>>                DW_AT_str_offsets_base    (0x00000008)
>>>                DW_AT_stmt_list    (0x00000000)
>>>                DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>                DW_AT_addr_base    (0x00000008)
>>> 0x0000001e:   DW_TAG_variable
>>>                  DW_AT_name    ("g")
>>>                  DW_AT_type    (0x00000029 "int **")
>>>                  DW_AT_external    (true)
>>>                  DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>                  DW_AT_decl_line    (12)
>>>                  DW_AT_location    (DW_OP_addrx 0x0)
>>> 0x00000029:   DW_TAG_pointer_type
>>>                  DW_AT_type    (0x00000032 "int *")
>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>                    DW_AT_name    ("btf_type_tag")
>>>                    DW_AT_const_value    ("type-tag-1")
>>> 0x00000031:     NULL
>>> 0x00000032:   DW_TAG_pointer_type
>>>                  DW_AT_type    (0x00000037 "int")
>>> 0x00000037:   DW_TAG_base_type
>>>                  DW_AT_name    ("int")
>>>                  DW_AT_encoding    (DW_ATE_signed)
>>>                  DW_AT_byte_size    (0x04)
>>> 0x0000003b:   NULL
>>
>> Thanks. I checked with current clang. The generated code looks
>> like above. Basically, for code like below
>>
>>    #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>    #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>    #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>
>>    int * __typetag1 * __typetag2 __typetag3 g;
>>
>> The IR type looks like
>>    __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>
>> The IR is similar to what we did if using
>> __attribute__((btf_type_tag(""))), but their
>> semantic interpretation is quite different.
>> For example, with c2x format,
>>    __typetag1 applies to ptr2
>> with __attribute__ format, it applies to the pointee of ptr1.
>>
>> But more importantly, c2x format is incompatible with
>> how the linux kernel uses these annotations. The following are a bunch
>> of kernel __user usages. Here, __user is intended to be replaced with
>> a btf_type_tag.
>>
>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>> *vdev, char __user *buf,
>> vfio_pci_core.h:                                  char __user *buf,
>> size_t count,
>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>> vfio_pci_core_device *vdev, char __user *buf,
>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>> vfio_pci_core_device *vdev, char __user *buf,
>> vfio_pci_core.h:                                      char __user
>> *buf, size_t count,
>> vfio_pci_core.h:                                void __user *arg,
>> size_t argsz);
>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>> *core_vdev, char __user *buf,
>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>> *core_vdev, const char __user *buf,
>> vringh.h:                    vring_desc_t __user *desc,
>> vringh.h:                    vring_avail_t __user *avail,
>> vringh.h:                    vring_used_t __user *used);
>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>
>> As you can see, we will not be able to simply replace __user
>> with [[clang::btf_type_tag("user")]] because it won't work
>> according to c2x expectations.

Hi,

Thanks for checking this. I see that we probably cannot use the c2x
syntax in the kernel, since it will not work as a drop-in replacement
for the current uses.

> 
> Hi Yonghong.
> 
> I am a bit confused regarding the GNU attributes problem: our patch
> supports it, but as David already noted:
> 
>>>>> There is still the question of why the DWARF generated for this case
>>>>> that I have been concerned about:
>>>>>
>>>>>    int __typetag1 * __typetag2 __typetag3 * g;
>>>>>
>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>> GCC is doing with the attributes exactly as is described in the
>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>> described. I do not think there is any problem here.
>>>>>
>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>> than the type to which it points.
> 
> Note the example he uses is:
> 
>   (a) int __typetag1 * __typetag2 __typetag3 * g;
> 
> Not
> 
>   (b) int * __typetag1 * __typetag2 __typetag3 g;
> 
> Apparently for (a) clang is generating DWARF that associates __typetag2
> and __typetag3 to g's type (the pointer to pointer) instead of the
> pointer to int, which contravenes the GNU syntax rules.
> 
> AFAIK that is where the DWARF we generate differs, and what is blocking
> us.  David will correct me in the likely case I'm wrong :)

Right. This is what I hoped maybe the C2x syntax could resolve.

The issue I saw is that in case (a) above, when using the GNU
attribute syntax, GCC and clang produce different results. I think that
the underlying cause is some subtle difference in how clang is handling
the GNU attribute syntax in this case compared to GCC.
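
As a rough illustration of the placement rule Joseph described earlier in
the thread (attributes in that position bind "just as if you had a type
qualifier there"), the C11 sketch below puts volatile where
__typetag2 __typetag3 sit in cases (a) and (b) and asks the compiler which
type ended up qualified. HAS_TYPE and the function names are hypothetical
helpers invented for this sketch. It deliberately says nothing about
__typetag1, since per Joseph that position is treated differently by the
GNU and C2x syntaxes, and it makes no claim about what either compiler
does with the real btf_type_tag attribute.

```c
#include <assert.h>

/* Evaluates to 1 if expr has a type compatible with T, else 0 (C11 _Generic). */
#define HAS_TYPE(expr, T) _Generic((expr), T: 1, default: 0)

/* volatile stands in for "__typetag2 __typetag3". */
int * volatile * g_a;   /* same shape as (a): qualifier between the two '*' */
int ** volatile g_b;    /* same shape as (b): qualifier after the last '*'  */

/* Taking the address keeps every qualifier visible in the resulting type. */

/* (a): is the volatile on the type g_a points to (the pointer to int)? */
int a_volatile_on_pointee(void) { return HAS_TYPE(&g_a, int * volatile **); }

/* (a): is the volatile on g_a's own type (the pointer to pointer)? */
int a_volatile_on_g(void) { return HAS_TYPE(&g_a, int ** volatile *); }

/* (b): is the volatile on g_b's own type? */
int b_volatile_on_g(void) { return HAS_TYPE(&g_b, int ** volatile *); }

/* Compile-time restatement of the rule for case (a). */
static_assert(HAS_TYPE(&g_a, int * volatile **),
              "in (a) the qualifier lands on the pointed-to type");
```

With a qualifier in that position, case (a) marks the pointer to int and
not g's own type, which is the association the GNU syntax rules call for
and the one described above as differing between the two compilers.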


To remind ourselves, here is the full example. Notice the significant
difference in which objects the tags are associated with in DWARF.


#define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
#define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
#define __typetag3 __attribute__((btf_type_tag("type-tag-3")))

int __typetag1 * __typetag2 __typetag3 * g;


GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c

0x0000000c: DW_TAG_compile_unit
              DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
              DW_AT_language	(DW_LANG_C11)
              DW_AT_name	("annotate.c")
              DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
              DW_AT_stmt_list	(0x00000000)

0x0000001e:   DW_TAG_variable
                DW_AT_name	("g")
                DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
                DW_AT_decl_line	(11)
                DW_AT_decl_column	(0x2a)
                DW_AT_type	(0x00000032 "int **")
                DW_AT_external	(true)
                DW_AT_location	(DW_OP_addr 0x0)

0x00000032:   DW_TAG_pointer_type
                DW_AT_byte_size	(8)
                DW_AT_type	(0x00000045 "int *")
                DW_AT_sibling	(0x00000045)

0x0000003b:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf_type_tag")
                  DW_AT_const_value	("type-tag-1")

0x00000044:     NULL

0x00000045:   DW_TAG_pointer_type
                DW_AT_byte_size	(8)
                DW_AT_type	(0x00000061 "int")
                DW_AT_sibling	(0x00000061)

0x0000004e:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf_type_tag")
                  DW_AT_const_value	("type-tag-3")

0x00000057:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf_type_tag")
                  DW_AT_const_value	("type-tag-2")

0x00000060:     NULL

0x00000061:   DW_TAG_base_type
                DW_AT_byte_size	(0x04)
                DW_AT_encoding	(DW_ATE_signed)
                DW_AT_name	("int")

0x00000068:   NULL


clang: clang -target bpf -c -g annotate.c

0x0000000c: DW_TAG_compile_unit
              DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
              DW_AT_language	(DW_LANG_C99)
              DW_AT_name	("annotate.c")
              DW_AT_str_offsets_base	(0x00000008)
              DW_AT_stmt_list	(0x00000000)
              DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
              DW_AT_addr_base	(0x00000008)

0x0000001e:   DW_TAG_variable
                DW_AT_name	("g")
                DW_AT_type	(0x00000029 "int **")
                DW_AT_external	(true)
                DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
                DW_AT_decl_line	(11)
                DW_AT_location	(DW_OP_addrx 0x0)

0x00000029:   DW_TAG_pointer_type
                DW_AT_type	(0x00000035 "int *")

0x0000002e:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf_type_tag")
                  DW_AT_const_value	("type-tag-2")

0x00000031:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf_type_tag")
                  DW_AT_const_value	("type-tag-3")

0x00000034:     NULL

0x00000035:   DW_TAG_pointer_type
                DW_AT_type	(0x0000003e "int")

0x0000003a:     DW_TAG_LLVM_annotation
                  DW_AT_name	("btf_type_tag")
                  DW_AT_const_value	("type-tag-1")

0x0000003d:     NULL

0x0000003e:   DW_TAG_base_type
                DW_AT_name	("int")
                DW_AT_encoding	(DW_ATE_signed)
                DW_AT_byte_size	(0x04)

0x00000042:   NULL




* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-24 15:53                       ` David Faust
@ 2022-05-24 16:03                         ` Yonghong Song
  2022-05-24 17:04                           ` David Faust
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-05-24 16:03 UTC (permalink / raw)
  To: David Faust, Jose E. Marchesi; +Cc: Joseph Myers, Yonghong Song via Gcc-patches



On 5/24/22 8:53 AM, David Faust wrote:
> 
> 
> On 5/24/22 04:07, Jose E. Marchesi wrote:
>>
>>> On 5/11/22 11:44 AM, David Faust wrote:
>>>>
>>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>>
>>>>>>>>>>> Consider the following example:
>>>>>>>>>>>
>>>>>>>>>>>        #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>>        #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>>        #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>>
>>>>>>>>>>>        int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>
>>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>>> 'tag2' and
>>>>>>>>>>> 'tag3',
>>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>>
>>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>>> C2x [[]]
>>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>>> apply to
>>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>>> you had a
>>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>>> qualifier)
>>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>>> type.  See
>>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>>> defined for
>>>>>>>>>> GNU
>>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>>> matching
>>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>>> would be
>>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>>> "T D"
>>>>>>>>>> for a sub-declarator D.
>>>>>>>>>>
>>>>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3'
>>>>>>>>>>> to an int", i.e.
>>>>>>>>>>
>>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>>> pointer
>>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>>
>>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>>> attribute
>>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>>> the syntax
>>>>>>>>> was not correct.
>>>>>>>>>
>>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>>> why it is
>>>>>>>>> incorrect is the same.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>>> other users of the tags?
>>>>>>>>
>>>>>>>> This is probably a wording issue. We have been saying tags only
>>>>>>>> apply to pointers. We probably should say they only apply to pointees.
>>>>>>>>
>>>>>>>> $ cat t.c
>>>>>>>> int const *ptr;
>>>>>>>>
>>>>>>>> the llvm ir debuginfo:
>>>>>>>>
>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>
>>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>>
>>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>>
>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>>> annotations: !7)
>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>> !7 = !{!8}
>>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>>
>>>>>>>> In the above IR, we attach the annotations to the pointer_type because
>>>>>>>> we didn't invent a new DI type to encode btf_type_tag. But it is
>>>>>>>> totally okay to have IR that looks like
>>>>>>>>
>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>
>>>>>>> OK, thanks.
>>>>>>>
>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>> that I have been concerned about:
>>>>>>>
>>>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>
>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>> described. I do not think there is any problem here.
>>>>>>>
>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>> than the type to which it points.
>>>>>>>
>>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>>> is very important, but it is worth noting.
>>>>>>>
>>>>>>>
>>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>>> which construct the tag applies to.
>>>>>>>
>>>>>>> This would also be a way around any issues in handling the attributes
>>>>>>> due to the GNU syntax.
>>>>>>>
>>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>>> fully supported in clang yet?
>>>>>>
>>>>>> Actually, I don't know either. But the btf decl_tag and type_tag
>>>>>> attributes are also used when compiling the Linux kernel, and the
>>>>>> minimum compiler versions for the kernel are gcc5.1 and clang11. I am
>>>>>> not sure whether gcc5.1 supports c2x; I guess probably not. So I think
>>>>>> we most likely cannot use c2x syntax.
>>>>>
>>>>> Okay, I think we can guard the btf_tags with newer compiler versions.
>>>>> What kind of c2x syntax do you intend to use? I can help compile the
>>>>> kernel with that syntax and llvm15 to see what the issue is, and may
>>>>> help fix it in clang if possible.
>>>>
>>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>>> syntax makes it quite clear to which entity each attribute applies,
>>>> and in my opinion is a little more intuitive/less surprising too.
>>>> It's documented here (PDF):
>>>>     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>>> declarations. Section 6.7.6.1 specifically describes using the
>>>> attribute syntax with pointer declarators.
>>>> The attribute syntax itself for BTF tags is:
>>>>     [[clang::btf_type_tag("tag1")]]
>>>> or
>>>>     [[gnu::btf_type_tag("tag1")]]
>>>>
>>>> I am also looking into whether, with the C2x syntax, we really need two
>>>> separate attributes (type_tag and decl_tag) at the language
>>>> level. It might be possible with C2x syntax to use just one language
>>>> attribute (e.g. just btf_tag).
>>>>
>>>> A simple declaration for a tagged pointer to an int:
>>>>     int * [[gnu::btf_type_tag("tag1")]] x;
>>>> And for the example from this thread:
>>>>     #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>>     #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>>     #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>>     int * __typetag1 * __typetag2 __typetag3 g;
>>>> Here each tag applies to the preceding pointer, so the result is
>>>> unsurprising.
>>>> Actually, this is where I found something that looks like an issue
>>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>>> but with no warning nor other indication.
>>>> Compiling this example with gcc:
>>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>>> -o c2x.o --std=c2x
>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>>> 0x0000000c: DW_TAG_compile_unit
>>>>                 DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>>                 DW_AT_language    (DW_LANG_C11)
>>>>                 DW_AT_name    ("c2x.c")
>>>>                 DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>                 DW_AT_stmt_list    (0x00000000)
>>>> 0x0000001e:   DW_TAG_variable
>>>>                   DW_AT_name    ("g")
>>>>                   DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>                   DW_AT_decl_line    (16)
>>>>                   DW_AT_decl_column    (0x2a)
>>>>                   DW_AT_type    (0x00000032 "int **")
>>>>                   DW_AT_external    (true)
>>>>                   DW_AT_location    (DW_OP_addr 0x0)
>>>> 0x00000032:   DW_TAG_pointer_type
>>>>                   DW_AT_byte_size    (8)
>>>>                   DW_AT_type    (0x0000004e "int *")
>>>>                   DW_AT_sibling    (0x0000004e)
>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name    ("btf_type_tag")
>>>>                     DW_AT_const_value    ("type-tag-3")
>>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name    ("btf_type_tag")
>>>>                     DW_AT_const_value    ("type-tag-2")
>>>> 0x0000004d:     NULL
>>>> 0x0000004e:   DW_TAG_pointer_type
>>>>                   DW_AT_byte_size    (8)
>>>>                   DW_AT_type    (0x00000061 "int")
>>>>                   DW_AT_sibling    (0x00000061)
>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name    ("btf_type_tag")
>>>>                     DW_AT_const_value    ("type-tag-1")
>>>> 0x00000060:     NULL
>>>> 0x00000061:   DW_TAG_base_type
>>>>                   DW_AT_byte_size    (0x04)
>>>>                   DW_AT_encoding    (DW_ATE_signed)
>>>>                   DW_AT_name    ("int")
>>>> 0x00000068:   NULL
>>>>
>>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>>> --std=c2x
>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>>> 0x0000000c: DW_TAG_compile_unit
>>>>                 DW_AT_producer    ("clang version 15.0.0
>>>> (https://github.com/llvm/llvm-project.git
>>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>                 DW_AT_language    (DW_LANG_C99)
>>>>                 DW_AT_name    ("c2x.c")
>>>>                 DW_AT_str_offsets_base    (0x00000008)
>>>>                 DW_AT_stmt_list    (0x00000000)
>>>>                 DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>                 DW_AT_addr_base    (0x00000008)
>>>> 0x0000001e:   DW_TAG_variable
>>>>                   DW_AT_name    ("g")
>>>>                   DW_AT_type    (0x00000029 "int **")
>>>>                   DW_AT_external    (true)
>>>>                   DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>                   DW_AT_decl_line    (12)
>>>>                   DW_AT_location    (DW_OP_addrx 0x0)
>>>> 0x00000029:   DW_TAG_pointer_type
>>>>                   DW_AT_type    (0x00000032 "int *")
>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name    ("btf_type_tag")
>>>>                     DW_AT_const_value    ("type-tag-1")
>>>> 0x00000031:     NULL
>>>> 0x00000032:   DW_TAG_pointer_type
>>>>                   DW_AT_type    (0x00000037 "int")
>>>> 0x00000037:   DW_TAG_base_type
>>>>                   DW_AT_name    ("int")
>>>>                   DW_AT_encoding    (DW_ATE_signed)
>>>>                   DW_AT_byte_size    (0x04)
>>>> 0x0000003b:   NULL
>>>
>>> Thanks. I checked with current clang. The generated code looks
>>> like above. Basically, for code like below
>>>
>>>     #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>>     #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>>     #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>>
>>>     int * __typetag1 * __typetag2 __typetag3 g;
>>>
>>> The IR type looks like
>>>     __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>>
>>> The IR is similar to what we did if using
>>> __attribute__((btf_type_tag(""))), but their
>>> semantic interpretation is quite different.
>>> For example, with c2x format,
>>>     __typetag1 applies to ptr2
>>> with __attribute__ format, it applies to the pointee of ptr1.
>>>
>>> But more importantly, c2x format is incompatible with
>>> how the Linux kernel uses these annotations. The following are a bunch
>>> of kernel __user usages. Here, __user is intended to be replaced with a
>>> btf_type_tag.
>>>
>>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>>> *vdev, char __user *buf,
>>> vfio_pci_core.h:                                  char __user *buf,
>>> size_t count,
>>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>>> vfio_pci_core_device *vdev, char __user *buf,
>>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>>> vfio_pci_core_device *vdev, char __user *buf,
>>> vfio_pci_core.h:                                      char __user
>>> *buf, size_t count,
>>> vfio_pci_core.h:                                void __user *arg,
>>> size_t argsz);
>>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>>> *core_vdev, char __user *buf,
>>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>>> *core_vdev, const char __user *buf,
>>> vringh.h:                    vring_desc_t __user *desc,
>>> vringh.h:                    vring_avail_t __user *avail,
>>> vringh.h:                    vring_used_t __user *used);
>>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>>
>>> As you can see, we will not be able to simply replace __user
>>> with [[clang::btf_type_tag("user")]] because it won't work
>>> according to c2x expectations.
> 
> Hi,
> 
> Thanks for checking this. I see that we probably cannot use the c2x
> syntax in the kernel, since it will not work as a drop-in replacement
> for the current uses.
> 
>>
>> Hi Yonghong.
>>
>> I am a bit confused regarding the GNU attributes problem: our patch
>> supports it, but as David already noted:
>>
>>>>>> There is still the question of why the DWARF generated for this case
>>>>>> that I have been concerned about:
>>>>>>
>>>>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>
>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>> described. I do not think there is any problem here.
>>>>>>
>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>> than the type to which it points.
>>
>> Note the example he uses is:
>>
>>    (a) int __typetag1 * __typetag2 __typetag3 * g;
>>
>> Not
>>
>>    (b) int * __typetag1 * __typetag2 __typetag3 g;
>>
>> Apparently for (a) clang is generating DWARF that associates __typetag2
>> and __typetag3 to g's type (the pointer to pointer) instead of the
>> pointer to int, which contravenes the GNU syntax rules.
>>
>> AFAIK that is where the DWARF we generate differs, and what is blocking
>> us.  David will correct me in the likely case I'm wrong :)
> 
> Right. This is what I hoped maybe the C2x syntax could resolve.
> 
> The issue I saw is that in the case (a) above, when using the GNU
> attribute syntax, GCC and clang produce different results. I think that
> the underlying cause is some subtle difference in how clang is handling
> the GNU attribute syntax in this case compared to GCC.
> 
> 
> To remind ourselves, here is the full example. Notice the significant
> difference in which objects the tags are associated with in DWARF.
> 
> 
> #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
> #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
> #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
> 
> int __typetag1 * __typetag2 __typetag3 * g;
> 
> 
> GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c
> 
> 0x0000000c: DW_TAG_compile_unit
>                DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
>                DW_AT_language	(DW_LANG_C11)
>                DW_AT_name	("annotate.c")
>                DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>                DW_AT_stmt_list	(0x00000000)
> 
> 0x0000001e:   DW_TAG_variable
>                  DW_AT_name	("g")
>                  DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>                  DW_AT_decl_line	(11)
>                  DW_AT_decl_column	(0x2a)
>                  DW_AT_type	(0x00000032 "int **")
>                  DW_AT_external	(true)
>                  DW_AT_location	(DW_OP_addr 0x0)
> 
> 0x00000032:   DW_TAG_pointer_type
>                  DW_AT_byte_size	(8)
>                  DW_AT_type	(0x00000045 "int *")
>                  DW_AT_sibling	(0x00000045)
> 
> 0x0000003b:     DW_TAG_LLVM_annotation
>                    DW_AT_name	("btf_type_tag")
>                    DW_AT_const_value	("type-tag-1")
> 
> 0x00000044:     NULL
> 
> 0x00000045:   DW_TAG_pointer_type
>                  DW_AT_byte_size	(8)
>                  DW_AT_type	(0x00000061 "int")
>                  DW_AT_sibling	(0x00000061)
> 
> 0x0000004e:     DW_TAG_LLVM_annotation
>                    DW_AT_name	("btf_type_tag")
>                    DW_AT_const_value	("type-tag-3")
> 
> 0x00000057:     DW_TAG_LLVM_annotation
>                    DW_AT_name	("btf_type_tag")
>                    DW_AT_const_value	("type-tag-2")
> 
> 0x00000060:     NULL
> 
> 0x00000061:   DW_TAG_base_type
>                  DW_AT_byte_size	(0x04)
>                  DW_AT_encoding	(DW_ATE_signed)
>                  DW_AT_name	("int")
> 
> 0x00000068:   NULL

Do you have documentation that shows why GCC generates the attributes
this way? Would it help if the DWARF were generated as
     ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
instead?
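
On the placement question, Joseph's point earlier in the thread was that
attributes written before a '*' behave "just as if you had a type qualifier
there". A minimal sketch of that comparison follows, assuming standard
qualifiers are a fair stand-in for the tags: volatile sits in the
__typetag2/__typetag3 position and const in the __typetag1 position. Joseph
noted that a leading GNU attribute in the __typetag1 position follows
different rules, so only the volatile position is meant as the analogy. The
HAS_TYPE macro and the function names are hypothetical helpers for
illustration, not part of the patch series.

```c
#include <assert.h>

/* Qualifiers standing in for tags, in the token positions of
 *     int __typetag1 * __typetag2 __typetag3 * g;
 * (assumption: volatile plays __typetag2/__typetag3, const plays __typetag1). */
int const * volatile * g;

/* Hypothetical helper: 1 if lvalue `lv` has exactly type T, qualifiers
 * included. The operand of _Generic is not evaluated, so `*g` is safe. */
#define HAS_TYPE(lv, T) _Generic(&(lv), T *: 1, default: 0)

/* g itself is a plain (unqualified) pointer. */
int outer_pointer_is_unqualified(void) { return HAS_TYPE(g, int const * volatile *); }

/* The type g points to, the inner pointer, carries the volatile. */
int inner_pointer_is_volatile(void) { return HAS_TYPE(*g, int const * volatile); }

/* The innermost int carries the const. */
int base_int_is_const(void) { return HAS_TYPE(**g, int const); }

/* The volatile does not land on g's own type. */
int volatile_on_outer_pointer(void) { return HAS_TYPE(g, int const * * volatile); }

/* Compile-time form of the central claim. */
static_assert(HAS_TYPE(*g, int const * volatile),
              "qualifier before the second '*' applies to the type g points to");
```

If the analogy holds, the volatile position corresponds to the inner
pointer (the type g points to), which is the "int *" DW_TAG_pointer_type
where the GCC dump above places type-tag-2 and type-tag-3.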

> 
> 
> clang: clang -target bpf -c -g annotate.c
> 
> 0x0000000c: DW_TAG_compile_unit
>                DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>                DW_AT_language	(DW_LANG_C99)
>                DW_AT_name	("annotate.c")
>                DW_AT_str_offsets_base	(0x00000008)
>                DW_AT_stmt_list	(0x00000000)
>                DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>                DW_AT_addr_base	(0x00000008)
> 
> 0x0000001e:   DW_TAG_variable
>                  DW_AT_name	("g")
>                  DW_AT_type	(0x00000029 "int **")
>                  DW_AT_external	(true)
>                  DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>                  DW_AT_decl_line	(11)
>                  DW_AT_location	(DW_OP_addrx 0x0)
> 
> 0x00000029:   DW_TAG_pointer_type
>                  DW_AT_type	(0x00000035 "int *")
> 
> 0x0000002e:     DW_TAG_LLVM_annotation
>                    DW_AT_name	("btf_type_tag")
>                    DW_AT_const_value	("type-tag-2")
> 
> 0x00000031:     DW_TAG_LLVM_annotation
>                    DW_AT_name	("btf_type_tag")
>                    DW_AT_const_value	("type-tag-3")
> 
> 0x00000034:     NULL
> 
> 0x00000035:   DW_TAG_pointer_type
>                  DW_AT_type	(0x0000003e "int")
> 
> 0x0000003a:     DW_TAG_LLVM_annotation
>                    DW_AT_name	("btf_type_tag")
>                    DW_AT_const_value	("type-tag-1")
> 
> 0x0000003d:     NULL
> 
> 0x0000003e:   DW_TAG_base_type
>                  DW_AT_name	("int")
>                  DW_AT_encoding	(DW_ATE_signed)
>                  DW_AT_byte_size	(0x04)
> 
> 0x00000042:   NULL
> 
> 


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-24 16:03                         ` Yonghong Song
@ 2022-05-24 17:04                           ` David Faust
  2022-05-26  7:29                             ` Yonghong Song
  0 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-24 17:04 UTC (permalink / raw)
  To: Yonghong Song, Jose E. Marchesi
  Cc: Joseph Myers, Yonghong Song via Gcc-patches



On 5/24/22 09:03, Yonghong Song wrote:
> 
> 
> On 5/24/22 8:53 AM, David Faust wrote:
>>
>>
>> On 5/24/22 04:07, Jose E. Marchesi wrote:
>>>
>>>> On 5/11/22 11:44 AM, David Faust wrote:
>>>>>
>>>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Consider the following example:
>>>>>>>>>>>>
>>>>>>>>>>>>        #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>>>        #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>>>        #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>>>
>>>>>>>>>>>>        int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>>
>>>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>>>> 'tag2' and
>>>>>>>>>>>> 'tag3',
>>>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>>>
>>>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>>>> C2x [[]]
>>>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>>>> apply to
>>>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>>>> you had a
>>>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>>>> qualifier)
>>>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>>>> type.  See
>>>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>>>> defined for
>>>>>>>>>>> GNU
>>>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>>>> matching
>>>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>>>> would be
>>>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>>>> "T D"
>>>>>>>>>>> for a sub-declarator D.
>>>>>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an
>>>>>>>>>>>> int", i.e.
>>>>>>>>>>>
>>>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>>>> pointer
>>>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>>>
>>>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>>>> attribute
>>>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>>>> the syntax
>>>>>>>>>> was not correct.
>>>>>>>>>>
>>>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>>>> why it is
>>>>>>>>>> incorrect is the same.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>>>> other users of the tags?
>>>>>>>>>
>>>>>>>>> This is probably a wording issue. We have been saying that tags only
>>>>>>>>> apply to the pointer. We should probably say they only apply to the
>>>>>>>>> pointee.
>>>>>>>>>
>>>>>>>>> $ cat t.c
>>>>>>>>> int const *ptr;
>>>>>>>>>
>>>>>>>>> the llvm ir debuginfo:
>>>>>>>>>
>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>
>>>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>>>
>>>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>>>
>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>>>> annotations: !7)
>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>> !7 = !{!8}
>>>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>>>
>>>>>>>>> In the above IR, we attach the annotations to the pointer_type because
>>>>>>>>> we didn't invent a new DI type to encode btf_type_tag. But it is
>>>>>>>>> totally okay to have IR that looks like
>>>>>>>>>
>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>
>>>>>>>> OK, thanks.
>>>>>>>>
>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>> that I have been concerned about:
>>>>>>>>
>>>>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>
>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>
>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>> than the type to which it points.
>>>>>>>>
>>>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>>>> is very important, but it is worth noting.
>>>>>>>>
>>>>>>>>
>>>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>>>> which construct the tag applies to.
>>>>>>>>
>>>>>>>> This would also be a way around any issues in handling the attributes
>>>>>>>> due to the GNU syntax.
>>>>>>>>
>>>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>>>> fully supported in clang yet?
>>>>>>>
>>>>>>> Actually, I don't know either. But the btf decl_tag and type_tag
>>>>>>> attributes are also used when compiling the Linux kernel, and the
>>>>>>> minimum compiler versions for the kernel are gcc5.1 and clang11. I am
>>>>>>> not sure whether gcc5.1 supports c2x; I guess probably not. So I think
>>>>>>> we most likely cannot use c2x syntax.
>>>>>>
>>>>>> Okay, I think we can guard the btf_tags with newer compiler versions.
>>>>>> What kind of c2x syntax do you intend to use? I can help compile the
>>>>>> kernel with that syntax and llvm15 to see what the issue is, and may
>>>>>> help fix it in clang if possible.
>>>>>
>>>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>>>> syntax makes it quite clear to which entity each attribute applies,
>>>>> and in my opinion is a little more intuitive/less surprising too.
>>>>> It's documented here (PDF):
>>>>>     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>>>> declarations. Section 6.7.6.1 specifically describes using the
>>>>> attribute syntax with pointer declarators.
>>>>> The attribute syntax itself for BTF tags is:
>>>>>     [[clang::btf_type_tag("tag1")]]
>>>>> or
>>>>>     [[gnu::btf_type_tag("tag1")]]
>>>>>
>>>>> I am also looking into whether, with the C2x syntax, we really need two
>>>>> separate attributes (type_tag and decl_tag) at the language
>>>>> level. It might be possible with C2x syntax to use just one language
>>>>> attribute (e.g. just btf_tag).
>>>>>
>>>>> A simple declaration for a tagged pointer to an int:
>>>>>     int * [[gnu::btf_type_tag("tag1")]] x;
>>>>> And for the example from this thread:
>>>>>     #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>>>     #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>>>     #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>>>     int * __typetag1 * __typetag2 __typetag3 g;
>>>>> Here each tag applies to the preceding pointer, so the result is
>>>>> unsurprising.
>>>>> Actually, this is where I found something that looks like an issue
>>>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>>>> but with no warning nor other indication.
>>>>> Compiling this example with gcc:
>>>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>>>> -o c2x.o --std=c2x
>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>                 DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>>>                 DW_AT_language    (DW_LANG_C11)
>>>>>                 DW_AT_name    ("c2x.c")
>>>>>                 DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>                 DW_AT_stmt_list    (0x00000000)
>>>>> 0x0000001e:   DW_TAG_variable
>>>>>                   DW_AT_name    ("g")
>>>>>                   DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>                   DW_AT_decl_line    (16)
>>>>>                   DW_AT_decl_column    (0x2a)
>>>>>                   DW_AT_type    (0x00000032 "int **")
>>>>>                   DW_AT_external    (true)
>>>>>                   DW_AT_location    (DW_OP_addr 0x0)
>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>                   DW_AT_byte_size    (8)
>>>>>                   DW_AT_type    (0x0000004e "int *")
>>>>>                   DW_AT_sibling    (0x0000004e)
>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>                     DW_AT_name    ("btf_type_tag")
>>>>>                     DW_AT_const_value    ("type-tag-3")
>>>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>>>                     DW_AT_name    ("btf_type_tag")
>>>>>                     DW_AT_const_value    ("type-tag-2")
>>>>> 0x0000004d:     NULL
>>>>> 0x0000004e:   DW_TAG_pointer_type
>>>>>                   DW_AT_byte_size    (8)
>>>>>                   DW_AT_type    (0x00000061 "int")
>>>>>                   DW_AT_sibling    (0x00000061)
>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>                     DW_AT_name    ("btf_type_tag")
>>>>>                     DW_AT_const_value    ("type-tag-1")
>>>>> 0x00000060:     NULL
>>>>> 0x00000061:   DW_TAG_base_type
>>>>>                   DW_AT_byte_size    (0x04)
>>>>>                   DW_AT_encoding    (DW_ATE_signed)
>>>>>                   DW_AT_name    ("int")
>>>>> 0x00000068:   NULL
>>>>>
>>>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>>>> --std=c2x
>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>                 DW_AT_producer    ("clang version 15.0.0
>>>>> (https://github.com/llvm/llvm-project.git
>>>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>                 DW_AT_language    (DW_LANG_C99)
>>>>>                 DW_AT_name    ("c2x.c")
>>>>>                 DW_AT_str_offsets_base    (0x00000008)
>>>>>                 DW_AT_stmt_list    (0x00000000)
>>>>>                 DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>                 DW_AT_addr_base    (0x00000008)
>>>>> 0x0000001e:   DW_TAG_variable
>>>>>                   DW_AT_name    ("g")
>>>>>                   DW_AT_type    (0x00000029 "int **")
>>>>>                   DW_AT_external    (true)
>>>>>                   DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>                   DW_AT_decl_line    (12)
>>>>>                   DW_AT_location    (DW_OP_addrx 0x0)
>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>                   DW_AT_type    (0x00000032 "int *")
>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>                     DW_AT_name    ("btf_type_tag")
>>>>>                     DW_AT_const_value    ("type-tag-1")
>>>>> 0x00000031:     NULL
>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>                   DW_AT_type    (0x00000037 "int")
>>>>> 0x00000037:   DW_TAG_base_type
>>>>>                   DW_AT_name    ("int")
>>>>>                   DW_AT_encoding    (DW_ATE_signed)
>>>>>                   DW_AT_byte_size    (0x04)
>>>>> 0x0000003b:   NULL
>>>>
>>>> Thanks. I checked with current clang. The generated code looks
>>>> like above. Basically, for code like below
>>>>
>>>>     #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>>>     #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>>>     #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>>>
>>>>     int * __typetag1 * __typetag2 __typetag3 g;
>>>>
>>>> The IR type looks like
>>>>     __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>>>
>>>> The IR is similar to what we did if using
>>>> __attribute__((btf_type_tag(""))), but their
>>>> semantic interpretation is quite different.
>>>> For example, with c2x format,
>>>>     __typetag1 applies to ptr2
>>>> with __attribute__ format, it applies to the pointee of ptr1.
>>>>
>>>> But more importantly, c2x format is incompatible with
>>>> how the Linux kernel uses these annotations. The following are a bunch
>>>> of kernel __user usages. Here, __user is intended to be replaced with a
>>>> btf_type_tag.
>>>>
>>>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>>>> *vdev, char __user *buf,
>>>> vfio_pci_core.h:                                  char __user *buf,
>>>> size_t count,
>>>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>> vfio_pci_core.h:                                      char __user
>>>> *buf, size_t count,
>>>> vfio_pci_core.h:                                void __user *arg,
>>>> size_t argsz);
>>>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>>>> *core_vdev, char __user *buf,
>>>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>>>> *core_vdev, const char __user *buf,
>>>> vringh.h:                    vring_desc_t __user *desc,
>>>> vringh.h:                    vring_avail_t __user *avail,
>>>> vringh.h:                    vring_used_t __user *used);
>>>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>>>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>>>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>>>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>>>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>>>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>>>
>>>> You can see that we will not be able to simply replace __user
>>>> with [[clang::btf_type_tag("user")]] because it won't work
>>>> according to c2x expectations.
>>
>> Hi,
>>
>> Thanks for checking this. I see that we probably cannot use the c2x
>> syntax in the kernel, since it will not work as a drop-in replacement
>> for the current uses.
>>
>>>
>>> Hi Yonghong.
>>>
>>> I am a bit confused regarding the GNU attributes problem: our patch
>>> supports it, but as David already noted:
>>>
>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>> that I have been concerned about:
>>>>>>>
>>>>>>>     int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>
>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>> described. I do not think there is any problem here.
>>>>>>>
>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>> than the type to which it points.
>>>
>>> Note the example he uses is:
>>>
>>>    (a) int __typetag1 * __typetag2 __typetag3 * g;
>>>
>>> Not
>>>
>>>    (b) int * __typetag1 * __typetag2 __typetag3 g;
>>>
>>> Apparently for (a) clang is generating DWARF that associates __typetag2
>>> and __typetag3 to g's type (the pointer to pointer) instead of the
>>> pointer to int, which contravenes the GNU syntax rules.
>>>
>>> AFAIK that is where the DWARF we generate differs, and what is blocking
>>> us.  David will correct me in the likely case I'm wrong :)
>>
>> Right. This is what I hoped maybe the C2x syntax could resolve.
>>
>> The issue I saw is that in the case (a) above, when using the GNU
>> attribute syntax, GCC and clang produce different results. I think that
>> the underlying cause is some subtle difference in how clang is handling
>> the GNU attribute syntax in the case compared to GCC.
>>
>>
>> To remind ourselves, here is the full example. Notice the significant
>> difference in which objects the tags are associated with in DWARF.
>>
>>
>> #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>> #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>> #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>
>> int __typetag1 * __typetag2 __typetag3 * g;
>>
>>
>> GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c
>>
>> 0x0000000c: DW_TAG_compile_unit
>>                DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
>>                DW_AT_language	(DW_LANG_C11)
>>                DW_AT_name	("annotate.c")
>>                DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>                DW_AT_stmt_list	(0x00000000)
>>
>> 0x0000001e:   DW_TAG_variable
>>                  DW_AT_name	("g")
>>                  DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>                  DW_AT_decl_line	(11)
>>                  DW_AT_decl_column	(0x2a)
>>                  DW_AT_type	(0x00000032 "int **")
>>                  DW_AT_external	(true)
>>                  DW_AT_location	(DW_OP_addr 0x0)
>>
>> 0x00000032:   DW_TAG_pointer_type
>>                  DW_AT_byte_size	(8)
>>                  DW_AT_type	(0x00000045 "int *")
>>                  DW_AT_sibling	(0x00000045)
>>
>> 0x0000003b:     DW_TAG_LLVM_annotation
>>                    DW_AT_name	("btf_type_tag")
>>                    DW_AT_const_value	("type-tag-1")
>>
>> 0x00000044:     NULL
>>
>> 0x00000045:   DW_TAG_pointer_type
>>                  DW_AT_byte_size	(8)
>>                  DW_AT_type	(0x00000061 "int")
>>                  DW_AT_sibling	(0x00000061)
>>
>> 0x0000004e:     DW_TAG_LLVM_annotation
>>                    DW_AT_name	("btf_type_tag")
>>                    DW_AT_const_value	("type-tag-3")
>>
>> 0x00000057:     DW_TAG_LLVM_annotation
>>                    DW_AT_name	("btf_type_tag")
>>                    DW_AT_const_value	("type-tag-2")
>>
>> 0x00000060:     NULL
>>
>> 0x00000061:   DW_TAG_base_type
>>                  DW_AT_byte_size	(0x04)
>>                  DW_AT_encoding	(DW_ATE_signed)
>>                  DW_AT_name	("int")
>>
>> 0x00000068:   NULL
> 
> do you have documentation to show why gnu generates attribute this way?
> If dwarf generates
>      ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
> does this help?

Okay, I think I see the problem. Clang and GCC attach the attributes to
different nodes in their internal representations, and as a result they
produce different DWARF. This is clang's IR for the example:

!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !10)
!6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64, annotations: !8)
!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!8 = !{!9}
!9 = !{!"btf_type_tag", !"tag1"}
!10 = !{!11, !12}
!11 = !{!"btf_type_tag", !"tag2"}
!12 = !{!"btf_type_tag", !"tag3"}

If I am reading this IR right, then the tags "tag2" and "tag3" are being
applied to the int** (g's type), and "tag1" is applied to the int*.

But I don't think this lines up with how the attribute syntax is defined.
See
  https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
In particular the "All other attributes" section. (It's a bit dense).
Or, as Joseph summed it up nicely earlier in the thread:
> In either syntax, __typetag2 __typetag3 should apply to 
> the type to which g points, not to g or its type, just as if you had a 
> type qualifier there.  You'd need to put the attributes (or qualifier) 
> after the *, not before, to make them apply to the pointer type.


Compare that to GCC's internal representation, from which DWARF is generated:

 <var_decl 0x7ffff7535090 g
    type <pointer_type 0x7ffff74f8888
        type <pointer_type 0x7ffff74f8b28 type <integer_type 0x7ffff74385e8 int>
            unsigned DI
            size <integer_cst 0x7ffff742b450 constant 64>
            unit-size <integer_cst 0x7ffff742b468 constant 8>
            align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff743f888
            attributes <tree_list 0x7ffff75165c8
                purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
                value <tree_list 0x7ffff7516550
                    value <string_cst 0x7ffff75182e0 type <array_type 0x7ffff74f8738>
                        readonly constant static "type-tag-3\000">>
                chain <tree_list 0x7ffff75165a0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
                    value <tree_list 0x7ffff75164d8
                        value <string_cst 0x7ffff75182c0 type <array_type 0x7ffff74f8738>
                            readonly constant static "type-tag-2\000">>>>
            pointer_to_this <pointer_type 0x7ffff74f8bd0>>
        unsigned DI size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74f87e0
        attributes <tree_list 0x7ffff75165f0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
            value <tree_list 0x7ffff7516438
                value <string_cst 0x7ffff75182a0 type <array_type 0x7ffff74f8738>
                    readonly constant static "type-tag-1\000">>>>
    public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/annotate.c:10:42 size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
    align:64 warn_if_not_align:0>

See how "type-tag-2" and "type-tag-3" are associated with the pointer_type
0x7ffff74f8b28, that is, "the type to which g points".

From GCC's DWARF the BTF we get currently looks like:
  VAR(g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
which is obviously quite different and why this case caught my attention.

I think this difference is the root of our problems. It might not be
specifically related to the BTF tag attributes, but they do reveal some
discrepancy between how clang and GCC handle the attribute syntax.
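
To make Joseph's qualifier analogy concrete, here is a small sketch that
uses a real type qualifier (const) as a stand-in for the tag attributes.
It does not use btf_type_tag at all, so it only illustrates the placement
rule he describes, not what either compiler emits for the tags. The names
QUAL, g_a, g_b and the two macros are made up for this illustration, and
it assumes a C11 compiler that applies lvalue conversion to the _Generic
controlling expression.

```c
#include <assert.h>

/* Hypothetical stand-in for a tag attribute: an ordinary qualifier,
   placed where the attributes appear in the declarations discussed.  */
#define QUAL const

int * QUAL * g_a;        /* qualifier before the last '*', as in case (a) */
int * * QUAL g_b = 0;    /* qualifier after the last '*'                  */

/* 1 if the type P points to is const-qualified, 0 otherwise.  */
#define POINTEE_IS_QUALIFIED(p) \
  _Generic ((p), int * const *: 1, int **: 0)

/* 1 if P's own type is const-qualified, 0 otherwise.  Taking the address
   keeps the top-level qualifier visible to _Generic.  */
#define SELF_IS_QUALIFIED(p) \
  _Generic (&(p), int ** const *: 1, int * const **: 0)

/* Before the '*': the qualifier lands on the type g_a points to.  */
static_assert (POINTEE_IS_QUALIFIED (g_a) == 1, "pointee of g_a is qualified");
static_assert (SELF_IS_QUALIFIED (g_a) == 0, "g_a's own type is not");

/* After the '*': the qualifier lands on g_b's own pointer type.  */
static_assert (POINTEE_IS_QUALIFIED (g_b) == 0, "pointee of g_b is not");
static_assert (SELF_IS_QUALIFIED (g_b) == 1, "g_b's own type is qualified");
```

If the file compiles, all four checks hold: a qualifier written before the
last '*' ends up on "the type to which g points", and one written after it
ends up on the pointer type itself.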

> 
>>
>>
>> clang: clang -target bpf -c -g annotate.c
>>
>> 0x0000000c: DW_TAG_compile_unit
>>                DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>                DW_AT_language	(DW_LANG_C99)
>>                DW_AT_name	("annotate.c")
>>                DW_AT_str_offsets_base	(0x00000008)
>>                DW_AT_stmt_list	(0x00000000)
>>                DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>                DW_AT_addr_base	(0x00000008)
>>
>> 0x0000001e:   DW_TAG_variable
>>                  DW_AT_name	("g")
>>                  DW_AT_type	(0x00000029 "int **")
>>                  DW_AT_external	(true)
>>                  DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>                  DW_AT_decl_line	(11)
>>                  DW_AT_location	(DW_OP_addrx 0x0)
>>
>> 0x00000029:   DW_TAG_pointer_type
>>                  DW_AT_type	(0x00000035 "int *")
>>
>> 0x0000002e:     DW_TAG_LLVM_annotation
>>                    DW_AT_name	("btf_type_tag")
>>                    DW_AT_const_value	("type-tag-2")
>>
>> 0x00000031:     DW_TAG_LLVM_annotation
>>                    DW_AT_name	("btf_type_tag")
>>                    DW_AT_const_value	("type-tag-3")
>>
>> 0x00000034:     NULL
>>
>> 0x00000035:   DW_TAG_pointer_type
>>                  DW_AT_type	(0x0000003e "int")
>>
>> 0x0000003a:     DW_TAG_LLVM_annotation
>>                    DW_AT_name	("btf_type_tag")
>>                    DW_AT_const_value	("type-tag-1")
>>
>> 0x0000003d:     NULL
>>
>> 0x0000003e:   DW_TAG_base_type
>>                  DW_AT_name	("int")
>>                  DW_AT_encoding	(DW_ATE_signed)
>>                  DW_AT_byte_size	(0x04)
>>
>> 0x00000042:   NULL
>>
>>


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-24 17:04                           ` David Faust
@ 2022-05-26  7:29                             ` Yonghong Song
  2022-05-27 19:56                               ` David Faust
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-05-26  7:29 UTC (permalink / raw)
  To: David Faust, Jose E. Marchesi; +Cc: Joseph Myers, Yonghong Song via Gcc-patches



On 5/24/22 10:04 AM, David Faust wrote:
> 
> 
> On 5/24/22 09:03, Yonghong Song wrote:
>>
>>
>> On 5/24/22 8:53 AM, David Faust wrote:
>>>
>>>
>>> On 5/24/22 04:07, Jose E. Marchesi wrote:
>>>>
>>>>> On 5/11/22 11:44 AM, David Faust wrote:
>>>>>>
>>>>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Consider the following example:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>>>>         #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>>>>         #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>>>>
>>>>>>>>>>>>>         int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>>>
>>>>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>>>>> 'tag2' and
>>>>>>>>>>>>> 'tag3',
>>>>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>>>>
>>>>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>>>>> C2x [[]]
>>>>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>>>>> apply to
>>>>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>>>>> you had a
>>>>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>>>>> qualifier)
>>>>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>>>>> type.  See
>>>>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>>>>> defined for
>>>>>>>>>>>> GNU
>>>>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>>>>> matching
>>>>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>>>>> would be
>>>>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>>>>> "T D"
>>>>>>>>>>>> for a sub-declarator D.
>>>>>>>>>>>>
>>>>>>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3'
>>>>>>>>>>>>> to an int", i.e.
>>>>>>>>>>>>
>>>>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>>>>> pointer
>>>>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>>>>
>>>>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>>>>> attribute
>>>>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>>>>> the syntax
>>>>>>>>>>> was not correct.
>>>>>>>>>>>
>>>>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>>>>> why it is
>>>>>>>>>>> incorrect is the same.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>>>>> other users of the tags?
>>>>>>>>>>
>>>>>>>>>> This is probably a language writing issue. We are saying tags only
>>>>>>>>>> apply to pointer. We probably should say it only apply to pointee.
>>>>>>>>>>
>>>>>>>>>> $ cat t.c
>>>>>>>>>> int const *ptr;
>>>>>>>>>>
>>>>>>>>>> the llvm ir debuginfo:
>>>>>>>>>>
>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>
>>>>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>>>>
>>>>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>>>>
>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>>>>> annotations: !7)
>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>> !7 = !{!8}
>>>>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>>>>
>>>>>>>>>> In the above IR, we generate annotations to pointer_type because
>>>>>>>>>> we didn't invent a new DI type for encode btf_type_tag. But it is
>>>>>>>>>> totally okay to have IR looks like
>>>>>>>>>>
>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>
>>>>>>>>> OK, thanks.
>>>>>>>>>
>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>> that I have been concerned about:
>>>>>>>>>
>>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>
>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>
>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>> than the type to which it points.
>>>>>>>>>
>>>>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>>>>> is very important, but it is worth noting.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>>>>> which construct the tag applies.
>>>>>>>>>
>>>>>>>>> This would also be a way around any issues in handling the attributes
>>>>>>>>> due to the GNU syntax.
>>>>>>>>>
>>>>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>>>>> fully supported in clang yet?
>>>>>>>>
>>>>>>>> Actually, I don't know either. But since the btf decl_tag and type_tag
>>>>>>>> are also used to compile linux kernel and the minimum compiler version
>>>>>>>> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
>>>>>>>> supports c2x or not, I guess probably not. So I think we most likely
>>>>>>>> cannot use c2x syntax.
>>>>>>>
>>>>>>> Okay, I think we can guard btf_tag's with newer compiler versions.
>>>>>>> What kind of c2x syntax you intend to use? I can help compile kernel
>>>>>>> with that syntax and llvm15 to see what is the issue and may help
>>>>>>> fix it in clang if possible.
>>>>>>
>>>>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>>>>> syntax makes it quite clear to which entity each attribute applies,
>>>>>> and in my opinion is a little more intuitive/less surprising too.
>>>>>> It's documented here (PDF):
>>>>>>      https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>>>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>>>>> declarations. Section 6.7.6.1 specifically describes using the
>>>>>> attribute syntax with pointer declarators.
>>>>>> The attribute syntax itself for BTF tags is:
>>>>>>      [[clang::btf_type_tag("tag1")]]
>>>>>> or
>>>>>>      [[gnu::btf_type_tag("tag1")]]
>>>>>>
>>>>>> I am also looking into whether, with the C2x syntax, we really need two
>>>>>> separate attributes (type_tag and decl_tag) at the language
>>>>>> level. It might be possible with C2x syntax to use just one language
>>>>>> attribute (e.g. just btf_tag).
>>>>>>
>>>>>> A simple declaration for a tagged pointer to an int:
>>>>>>      int * [[gnu::btf_type_tag("tag1")]] x;
>>>>>> And for the example from this thread:
>>>>>>      #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>>>>      #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>>>>      #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>>>>      int * __typetag1 * __typetag2 __typetag3 g;
>>>>>> Here each tag applies to the preceding pointer, so the result is
>>>>>> unsurprising.
>>>>>> Actually, this is where I found something that looks like an issue
>>>>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>>>>> but with no warning nor other indication.
>>>>>> Compiling this example with gcc:
>>>>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>>>>> -o c2x.o --std=c2x
>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>                  DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>>>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>>>>                  DW_AT_language    (DW_LANG_C11)
>>>>>>                  DW_AT_name    ("c2x.c")
>>>>>>                  DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>                  DW_AT_stmt_list    (0x00000000)
>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>                    DW_AT_name    ("g")
>>>>>>                    DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>                    DW_AT_decl_line    (16)
>>>>>>                    DW_AT_decl_column    (0x2a)
>>>>>>                    DW_AT_type    (0x00000032 "int **")
>>>>>>                    DW_AT_external    (true)
>>>>>>                    DW_AT_location    (DW_OP_addr 0x0)
>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>                    DW_AT_byte_size    (8)
>>>>>>                    DW_AT_type    (0x0000004e "int *")
>>>>>>                    DW_AT_sibling    (0x0000004e)
>>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>                      DW_AT_const_value    ("type-tag-3")
>>>>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>                      DW_AT_const_value    ("type-tag-2")
>>>>>> 0x0000004d:     NULL
>>>>>> 0x0000004e:   DW_TAG_pointer_type
>>>>>>                    DW_AT_byte_size    (8)
>>>>>>                    DW_AT_type    (0x00000061 "int")
>>>>>>                    DW_AT_sibling    (0x00000061)
>>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>                      DW_AT_const_value    ("type-tag-1")
>>>>>> 0x00000060:     NULL
>>>>>> 0x00000061:   DW_TAG_base_type
>>>>>>                    DW_AT_byte_size    (0x04)
>>>>>>                    DW_AT_encoding    (DW_ATE_signed)
>>>>>>                    DW_AT_name    ("int")
>>>>>> 0x00000068:   NULL
>>>>>>
>>>>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>>>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>>>>> --std=c2x
>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>                  DW_AT_producer    ("clang version 15.0.0
>>>>>> (https://github.com/llvm/llvm-project.git
>>>>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>>                  DW_AT_language    (DW_LANG_C99)
>>>>>>                  DW_AT_name    ("c2x.c")
>>>>>>                  DW_AT_str_offsets_base    (0x00000008)
>>>>>>                  DW_AT_stmt_list    (0x00000000)
>>>>>>                  DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>                  DW_AT_addr_base    (0x00000008)
>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>                    DW_AT_name    ("g")
>>>>>>                    DW_AT_type    (0x00000029 "int **")
>>>>>>                    DW_AT_external    (true)
>>>>>>                    DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>                    DW_AT_decl_line    (12)
>>>>>>                    DW_AT_location    (DW_OP_addrx 0x0)
>>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>>                    DW_AT_type    (0x00000032 "int *")
>>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>                      DW_AT_const_value    ("type-tag-1")
>>>>>> 0x00000031:     NULL
>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>                    DW_AT_type    (0x00000037 "int")
>>>>>> 0x00000037:   DW_TAG_base_type
>>>>>>                    DW_AT_name    ("int")
>>>>>>                    DW_AT_encoding    (DW_ATE_signed)
>>>>>>                    DW_AT_byte_size    (0x04)
>>>>>> 0x0000003b:   NULL
>>>>>
>>>>> Thanks. I checked with current clang. The generated code looks
>>>>> like above. Basically, for code like below
>>>>>
>>>>>      #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>>>>      #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>>>>      #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>>>>
>>>>>      int * __typetag1 * __typetag2 __typetag3 g;
>>>>>
>>>>> The IR type looks like
>>>>>      __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>>>>
>>>>> The IR is similar to what we did if using
>>>>> __attribute__((btf_type_tag(""))), but their
>>>>> semantic interpretation is quite different.
>>>>> For example, with c2x format,
>>>>>      __typetag1 applies to ptr2
>>>>> with __attribute__ format, it applies pointee of ptr1.
>>>>>
>>>>> But more importantly, c2x format is incompatible with
>>>>> the usage of linux kernel. The following are a bunch of kernel
>>>>> __user usages. Here, __user intends to be replaced with a btf_type_tag.
>>>>>
>>>>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>>>>> *vdev, char __user *buf,
>>>>> vfio_pci_core.h:                                  char __user *buf,
>>>>> size_t count,
>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>>> vfio_pci_core.h:                                      char __user
>>>>> *buf, size_t count,
>>>>> vfio_pci_core.h:                                void __user *arg,
>>>>> size_t argsz);
>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>>>>> *core_vdev, char __user *buf,
>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>>>>> *core_vdev, const char __user *buf,
>>>>> vringh.h:                    vring_desc_t __user *desc,
>>>>> vringh.h:                    vring_avail_t __user *avail,
>>>>> vringh.h:                    vring_used_t __user *used);
>>>>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>>>>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>>>>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>>>>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>>>>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>>>>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>>>>
>>>>> You can see that we will not be able to simply replace __user
>>>>> with [[clang::btf_type_tag("user")]] because it won't work
>>>>> according to c2x expectations.
>>>
>>> Hi,
>>>
>>> Thanks for checking this. I see that we probably cannot use the c2x
>>> syntax in the kernel, since it will not work as a drop-in replacement
>>> for the current uses.
>>>
>>>>
>>>> Hi Yonghong.
>>>>
>>>> I am a bit confused regarding the GNU attributes problem: our patch
>>>> supports it, but as David already noted:
>>>>
>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>> that I have been concerned about:
>>>>>>>>
>>>>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>
>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>
>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>> than the type to which it points.
>>>>
>>>> Note the example he uses is:
>>>>
>>>>     (a) int __typetag1 * __typetag2 __typetag3 * g;
>>>>
>>>> Not
>>>>
>>>>     (b) int * __typetag1 * __typetag2 __typetag3 g;
>>>>
>>>> Apparently for (a) clang is generating DWARF that associates __typetag2
>>>> and __typetag3 to g's type (the pointer to pointer) instead of the
>>>> pointer to int, which contravenes the GNU syntax rules.
>>>>
>>>> AFAIK that is where the DWARF we generate differs, and what is blocking
>>>> us.  David will correct me in the likely case I'm wrong :)
>>>
>>> Right. This is what I hoped maybe the C2x syntax could resolve.
>>>
>>> The issue I saw is that in the case (a) above, when using the GNU
>>> attribute syntax, GCC and clang produce different results. I think that
>>> the underlying cause is some subtle difference in how clang is handling
>>> the GNU attribute syntax in the case compared to GCC.
>>>
>>>
>>> To remind ourselves, here is the full example. Notice the significant
>>> difference in which objects the tags are associated with in DWARF.
>>>
>>>
>>> #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>> #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>> #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>>
>>> int __typetag1 * __typetag2 __typetag3 * g;
>>>
>>>
>>> GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c
>>>
>>> 0x0000000c: DW_TAG_compile_unit
>>>                 DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
>>>                 DW_AT_language	(DW_LANG_C11)
>>>                 DW_AT_name	("annotate.c")
>>>                 DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>                 DW_AT_stmt_list	(0x00000000)
>>>
>>> 0x0000001e:   DW_TAG_variable
>>>                   DW_AT_name	("g")
>>>                   DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>                   DW_AT_decl_line	(11)
>>>                   DW_AT_decl_column	(0x2a)
>>>                   DW_AT_type	(0x00000032 "int **")
>>>                   DW_AT_external	(true)
>>>                   DW_AT_location	(DW_OP_addr 0x0)
>>>
>>> 0x00000032:   DW_TAG_pointer_type
>>>                   DW_AT_byte_size	(8)
>>>                   DW_AT_type	(0x00000045 "int *")
>>>                   DW_AT_sibling	(0x00000045)
>>>
>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name	("btf_type_tag")
>>>                     DW_AT_const_value	("type-tag-1")
>>>
>>> 0x00000044:     NULL
>>>
>>> 0x00000045:   DW_TAG_pointer_type
>>>                   DW_AT_byte_size	(8)
>>>                   DW_AT_type	(0x00000061 "int")
>>>                   DW_AT_sibling	(0x00000061)
>>>
>>> 0x0000004e:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name	("btf_type_tag")
>>>                     DW_AT_const_value	("type-tag-3")
>>>
>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name	("btf_type_tag")
>>>                     DW_AT_const_value	("type-tag-2")
>>>
>>> 0x00000060:     NULL
>>>
>>> 0x00000061:   DW_TAG_base_type
>>>                   DW_AT_byte_size	(0x04)
>>>                   DW_AT_encoding	(DW_ATE_signed)
>>>                   DW_AT_name	("int")
>>>
>>> 0x00000068:   NULL
>>
>> do you have documentation to show why gnu generates attribute this way?
>> If dwarf generates
>>       ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>> does this help?
> 
> Okay, I think I see the problem. The internal representations between clang
> and GCC attach the attributes to different nodes, and as a result they
> produce different DWARF:
> 
> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
> annotations: !10)
> !6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64,
> annotations: !8)
> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
> !8 = !{!9}
> !9 = !{!"btf_type_tag", !"tag1"}
> !10 = !{!11, !12}
> !11 = !{!"btf_type_tag", !"tag2"}
> !12 = !{!"btf_type_tag", !"tag3"}
> 
> If I am reading this IR right, then the tags "tag2" and "tag3" are being
> applied to the int**, and "tag1" is applied to the int*
> 
> But I don't think this lines up with how the attribute syntax is defined.
> See
>    https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
> In particular the "All other attributes" section. (It's a bit dense).
> Or, as Joseph summed it up nicely earlier in the thread:
>> In either syntax, __typetag2 __typetag3 should apply to
>> the type to which g points, not to g or its type, just as if you had a
>> type qualifier there.  You'd need to put the attributes (or qualifier)
>> after the *, not before, to make them apply to the pointer type.
> 
> 
> Compare that to GCC's internal representation, from which DWARF is generated:
> 
>   <var_decl 0x7ffff7535090 g
>      type <pointer_type 0x7ffff74f8888
>          type <pointer_type 0x7ffff74f8b28 type <integer_type 0x7ffff74385e8 int>
>              unsigned DI
>              size <integer_cst 0x7ffff742b450 constant 64>
>              unit-size <integer_cst 0x7ffff742b468 constant 8>
>              align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff743f888
>              attributes <tree_list 0x7ffff75165c8
>                  purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>                  value <tree_list 0x7ffff7516550
>                      value <string_cst 0x7ffff75182e0 type <array_type 0x7ffff74f8738>
>                          readonly constant static "type-tag-3\000">>
>                  chain <tree_list 0x7ffff75165a0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>                      value <tree_list 0x7ffff75164d8
>                          value <string_cst 0x7ffff75182c0 type <array_type 0x7ffff74f8738>
>                              readonly constant static "type-tag-2\000">>>>
>              pointer_to_this <pointer_type 0x7ffff74f8bd0>>
>          unsigned DI size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>          align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74f87e0
>          attributes <tree_list 0x7ffff75165f0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>              value <tree_list 0x7ffff7516438
>                  value <string_cst 0x7ffff75182a0 type <array_type 0x7ffff74f8738>
>                      readonly constant static "type-tag-1\000">>>>
>      public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/annotate.c:10:42 size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>      align:64 warn_if_not_align:0>
> 
> See how tags "tag2" and "tag3" are associated with the pointer_type 0x7ffff74f8b28,
> that is, "the type to which g points"
> 
> From GCC's DWARF, the BTF we currently get looks like:
>    VAR(g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
> which is obviously quite different and why this case caught my attention.
> 
> I think this difference is the root of our problems. It might not be
> specifically related to the BTF tag attributes but they do reveal some
> discrepancy between how clang and GCC handle the attribute syntax.

The btf_type_tag attribute is very similar to the address_space attribute.
For example,
$ cat t1.c
int __attribute__((address_space(1))) * p;
$ clang -g -S -emit-llvm t1.c

In IR, we will have
@p = dso_local global ptr addrspace(1) null, align 8, !dbg !0
...
!5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

Replacing address_space with btf_type_tag, we will get
ptr->type_tag->int in debuginfo.

But it looks like gcc doesn't support the address_space attribute:

$ gcc -g -S t1.c
t1.c:1:1: warning: ‘address_space’ attribute directive ignored [-Wattributes]
  int __attribute__((address_space(1))) * p;
  ^~~

Is it possible for gcc to follow the address_space attribute
semantics for the btf_type_tag attribute?

> 
>>
>>>
>>>
>>> clang: clang -target bpf -c -g annotate.c
>>>
>>> 0x0000000c: DW_TAG_compile_unit
>>>                 DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>                 DW_AT_language	(DW_LANG_C99)
>>>                 DW_AT_name	("annotate.c")
>>>                 DW_AT_str_offsets_base	(0x00000008)
>>>                 DW_AT_stmt_list	(0x00000000)
>>>                 DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>                 DW_AT_addr_base	(0x00000008)
>>>
>>> 0x0000001e:   DW_TAG_variable
>>>                   DW_AT_name	("g")
>>>                   DW_AT_type	(0x00000029 "int **")
>>>                   DW_AT_external	(true)
>>>                   DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>                   DW_AT_decl_line	(11)
>>>                   DW_AT_location	(DW_OP_addrx 0x0)
>>>
>>> 0x00000029:   DW_TAG_pointer_type
>>>                   DW_AT_type	(0x00000035 "int *")
>>>
>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name	("btf_type_tag")
>>>                     DW_AT_const_value	("type-tag-2")
>>>
>>> 0x00000031:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name	("btf_type_tag")
>>>                     DW_AT_const_value	("type-tag-3")
>>>
>>> 0x00000034:     NULL
>>>
>>> 0x00000035:   DW_TAG_pointer_type
>>>                   DW_AT_type	(0x0000003e "int")
>>>
>>> 0x0000003a:     DW_TAG_LLVM_annotation
>>>                     DW_AT_name	("btf_type_tag")
>>>                     DW_AT_const_value	("type-tag-1")
>>>
>>> 0x0000003d:     NULL
>>>
>>> 0x0000003e:   DW_TAG_base_type
>>>                   DW_AT_name	("int")
>>>                   DW_AT_encoding	(DW_ATE_signed)
>>>                   DW_AT_byte_size	(0x04)
>>>
>>> 0x00000042:   NULL
>>>
>>>

* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-26  7:29                             ` Yonghong Song
@ 2022-05-27 19:56                               ` David Faust
  2022-06-03  2:04                                 ` Yonghong Song
  0 siblings, 1 reply; 30+ messages in thread
From: David Faust @ 2022-05-27 19:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jose E. Marchesi, Joseph Myers, Yonghong Song via Gcc-patches



On 5/26/22 00:29, Yonghong Song wrote:
> 
> 
> On 5/24/22 10:04 AM, David Faust wrote:
>>
>>
>> On 5/24/22 09:03, Yonghong Song wrote:
>>>
>>>
>>> On 5/24/22 8:53 AM, David Faust wrote:
>>>>
>>>>
>>>> On 5/24/22 04:07, Jose E. Marchesi wrote:
>>>>>
>>>>>> On 5/11/22 11:44 AM, David Faust wrote:
>>>>>>>
>>>>>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the following example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>>>>>         #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>>>>>         #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>>>>>> 'tag2' and
>>>>>>>>>>>>>> 'tag3',
>>>>>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>>>>>> C2x [[]]
>>>>>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>>>>>> apply to
>>>>>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>>>>>> you had a
>>>>>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>>>>>> qualifier)
>>>>>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>>>>>> type.  See
>>>>>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>>>>>> defined for
>>>>>>>>>>>>> GNU
>>>>>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>>>>>> matching
>>>>>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>>>>>> would be
>>>>>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>>>>>> "T D"
>>>>>>>>>>>>> for a sub-declarator D.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3'
>>>>>>>>>>>>>> to an int", i.e.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>>>>>> pointer
>>>>>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>>>>>> attribute
>>>>>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>>>>>> the syntax
>>>>>>>>>>>> was not correct.
>>>>>>>>>>>>
>>>>>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>>>>>> why it is
>>>>>>>>>>>> incorrect is the same.)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>>>>>> other users of the tags?
>>>>>>>>>>>
>>>>>>>>>>> This is probably a wording issue. We have been saying that tags only
>>>>>>>>>>> apply to pointers. We should probably say they only apply to the pointee.
>>>>>>>>>>>
>>>>>>>>>>> $ cat t.c
>>>>>>>>>>> int const *ptr;
>>>>>>>>>>>
>>>>>>>>>>> the llvm ir debuginfo:
>>>>>>>>>>>
>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>
>>>>>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>>>>>
>>>>>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>>>>>
>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>>>>>> annotations: !7)
>>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>> !7 = !{!8}
>>>>>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>>>>>
>>>>>>>>>>> In the above IR, we generate annotations on the pointer_type because
>>>>>>>>>>> we didn't invent a new DI type for encoding btf_type_tag. But it is
>>>>>>>>>>> totally okay to have IR that looks like
>>>>>>>>>>>
>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>
>>>>>>>>>> OK, thanks.
>>>>>>>>>>
>>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>>> that I have been concerned about:
>>>>>>>>>>
>>>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>
>>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>>
>>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>>> than the type to which it points.
>>>>>>>>>>
>>>>>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>>>>>> is very important, but it is worth noting.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>>>>>> which construct the tag applies.
>>>>>>>>>>
>>>>>>>>>> This would also be a way around any issues in handling the attributes
>>>>>>>>>> due to the GNU syntax.
>>>>>>>>>>
>>>>>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>>>>>> fully supported in clang yet?
>>>>>>>>>
>>>>>>>>> Actually, I don't know either. But the btf decl_tag and type_tag
>>>>>>>>> are also used to compile the linux kernel, and the minimum compiler
>>>>>>>>> versions for the kernel are gcc5.1 and clang11. I am not sure whether
>>>>>>>>> gcc5.1 supports c2x or not; I guess probably not. So I think we most
>>>>>>>>> likely cannot use c2x syntax.
>>>>>>>>
>>>>>>>> Okay, I think we can guard btf_tag's with newer compiler versions.
>>>>>>>> What kind of c2x syntax do you intend to use? I can help compile the
>>>>>>>> kernel with that syntax and llvm15 to see what the issue is, and may
>>>>>>>> help fix it in clang if possible.
>>>>>>>
>>>>>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>>>>>> syntax makes it quite clear to which entity each attribute applies,
>>>>>>> and in my opinion is a little more intuitive/less surprising too.
>>>>>>> It's documented here (PDF):
>>>>>>>      https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>>>>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>>>>>> declarations. Section 6.7.6.1 specifically describes using the
>>>>>>> attribute syntax with pointer declarators.
>>>>>>> The attribute syntax itself for BTF tags is:
>>>>>>>      [[clang::btf_type_tag("tag1")]]
>>>>>>> or
>>>>>>>      [[gnu::btf_type_tag("tag1")]]
>>>>>>>
>>>>>>> I am also looking into whether, with the C2x syntax, we really need two
>>>>>>> separate attributes (type_tag and decl_tag) at the language
>>>>>>> level. It might be possible with C2x syntax to use just one language
>>>>>>> attribute (e.g. just btf_tag).
>>>>>>>
>>>>>>> A simple declaration for a tagged pointer to an int:
>>>>>>>      int * [[gnu::btf_type_tag("tag1")]] x;
>>>>>>> And for the example from this thread:
>>>>>>>      #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>>>>>      #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>>>>>      #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>>>>>      int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>> Here each tag applies to the preceding pointer, so the result is
>>>>>>> unsurprising.
>>>>>>> Actually, this is where I found something that looks like an issue
>>>>>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>>>>>> but with no warning nor other indication.
>>>>>>> Compiling this example with gcc:
>>>>>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>>>>>> -o c2x.o --std=c2x
>>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>>                  DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>>>>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>>>>>                  DW_AT_language    (DW_LANG_C11)
>>>>>>>                  DW_AT_name    ("c2x.c")
>>>>>>>                  DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>>                  DW_AT_stmt_list    (0x00000000)
>>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>>                    DW_AT_name    ("g")
>>>>>>>                    DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>>                    DW_AT_decl_line    (16)
>>>>>>>                    DW_AT_decl_column    (0x2a)
>>>>>>>                    DW_AT_type    (0x00000032 "int **")
>>>>>>>                    DW_AT_external    (true)
>>>>>>>                    DW_AT_location    (DW_OP_addr 0x0)
>>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>>                    DW_AT_byte_size    (8)
>>>>>>>                    DW_AT_type    (0x0000004e "int *")
>>>>>>>                    DW_AT_sibling    (0x0000004e)
>>>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>>                      DW_AT_const_value    ("type-tag-3")
>>>>>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>>                      DW_AT_const_value    ("type-tag-2")
>>>>>>> 0x0000004d:     NULL
>>>>>>> 0x0000004e:   DW_TAG_pointer_type
>>>>>>>                    DW_AT_byte_size    (8)
>>>>>>>                    DW_AT_type    (0x00000061 "int")
>>>>>>>                    DW_AT_sibling    (0x00000061)
>>>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>>                      DW_AT_const_value    ("type-tag-1")
>>>>>>> 0x00000060:     NULL
>>>>>>> 0x00000061:   DW_TAG_base_type
>>>>>>>                    DW_AT_byte_size    (0x04)
>>>>>>>                    DW_AT_encoding    (DW_ATE_signed)
>>>>>>>                    DW_AT_name    ("int")
>>>>>>> 0x00000068:   NULL
>>>>>>>
>>>>>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>>>>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>>>>>> --std=c2x
>>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>>                  DW_AT_producer    ("clang version 15.0.0
>>>>>>> (https://github.com/llvm/llvm-project.git
>>>>>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>>>                  DW_AT_language    (DW_LANG_C99)
>>>>>>>                  DW_AT_name    ("c2x.c")
>>>>>>>                  DW_AT_str_offsets_base    (0x00000008)
>>>>>>>                  DW_AT_stmt_list    (0x00000000)
>>>>>>>                  DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>>                  DW_AT_addr_base    (0x00000008)
>>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>>                    DW_AT_name    ("g")
>>>>>>>                    DW_AT_type    (0x00000029 "int **")
>>>>>>>                    DW_AT_external    (true)
>>>>>>>                    DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>>                    DW_AT_decl_line    (12)
>>>>>>>                    DW_AT_location    (DW_OP_addrx 0x0)
>>>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>>>                    DW_AT_type    (0x00000032 "int *")
>>>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>>>                      DW_AT_name    ("btf_type_tag")
>>>>>>>                      DW_AT_const_value    ("type-tag-1")
>>>>>>> 0x00000031:     NULL
>>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>>                    DW_AT_type    (0x00000037 "int")
>>>>>>> 0x00000037:   DW_TAG_base_type
>>>>>>>                    DW_AT_name    ("int")
>>>>>>>                    DW_AT_encoding    (DW_ATE_signed)
>>>>>>>                    DW_AT_byte_size    (0x04)
>>>>>>> 0x0000003b:   NULL
>>>>>>
>>>>>> Thanks. I checked with current clang. The generated code looks
>>>>>> like above. Basically, for code like below
>>>>>>
>>>>>>      #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>>>>>      #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>>>>>      #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>>>>>
>>>>>>      int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>
>>>>>> The IR type looks like
>>>>>>      __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>>>>>
>>>>>> The IR is similar to what we did if using
>>>>>> __attribute__((btf_type_tag(""))), but their
>>>>>> semantic interpretation is quite different.
>>>>>> For example, with c2x format,
>>>>>>      __typetag1 applies to ptr2
>>>>>> with __attribute__ format, it applies to the pointee of ptr1.
>>>>>>
>>>>>> But more importantly, c2x format is incompatible with
>>>>>> the usage of linux kernel. The following are a bunch of kernel
>>>>>> __user usages. Here, __user intends to be replaced with a btf_type_tag.
>>>>>>
>>>>>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>>>>>> *vdev, char __user *buf,
>>>>>> vfio_pci_core.h:                                  char __user *buf,
>>>>>> size_t count,
>>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>>>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>>>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>>>> vfio_pci_core.h:                                      char __user
>>>>>> *buf, size_t count,
>>>>>> vfio_pci_core.h:                                void __user *arg,
>>>>>> size_t argsz);
>>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>>>>>> *core_vdev, char __user *buf,
>>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>>>>>> *core_vdev, const char __user *buf,
>>>>>> vringh.h:                    vring_desc_t __user *desc,
>>>>>> vringh.h:                    vring_avail_t __user *avail,
>>>>>> vringh.h:                    vring_used_t __user *used);
>>>>>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>>>>>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>>>>>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>>>>>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>>>>>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>>>>>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>>>>>
>>>>>> As you can see, we will not be able to simply replace __user
>>>>>> with [[clang::btf_type_tag("user")]] because it won't work
>>>>>> according to c2x expectations.
>>>>
>>>> Hi,
>>>>
>>>> Thanks for checking this. I see that we probably cannot use the c2x
>>>> syntax in the kernel, since it will not work as a drop-in replacement
>>>> for the current uses.
>>>>
>>>>>
>>>>> Hi Yonghong.
>>>>>
>>>>> I am a bit confused regarding the GNU attributes problem: our patch
>>>>> supports it, but as David already noted:
>>>>>
>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>> that I have been concerned about:
>>>>>>>>>
>>>>>>>>>      int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>
>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>
>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>> than the type to which it points.
>>>>>
>>>>> Note the example he uses is:
>>>>>
>>>>>     (a) int __typetag1 * __typetag2 __typetag3 * g;
>>>>>
>>>>> Not
>>>>>
>>>>>     (b) int * __typetag1 * __typetag2 __typetag3 g;
>>>>>
>>>>> Apparently for (a) clang is generating DWARF that associates __typetag2
>>>>> and __typetag3 to g's type (the pointer to pointer) instead of the
>>>>> pointer to int, which contravenes the GNU syntax rules.
>>>>>
>>>>> AFAIK that is where the DWARF we generate differs, and what is blocking
>>>>> us.  David will correct me in the likely case I'm wrong :)
>>>>
>>>> Right. This is what I hoped maybe the C2x syntax could resolve.
>>>>
>>>> The issue I saw is that in the case (a) above, when using the GNU
>>>> attribute syntax, GCC and clang produce different results. I think that
>>>> the underlying cause is some subtle difference in how clang is handling
>>>> the GNU attribute syntax in this case compared to GCC.
>>>>
>>>>
>>>> To remind ourselves, here is the full example. Notice the significant
>>>> difference in which objects the tags are associated with in DWARF.
>>>>
>>>>
>>>> #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>>> #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>>> #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>>>
>>>> int __typetag1 * __typetag2 __typetag3 * g;
>>>>
>>>>
>>>> GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c
>>>>
>>>> 0x0000000c: DW_TAG_compile_unit
>>>>                 DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
>>>>                 DW_AT_language	(DW_LANG_C11)
>>>>                 DW_AT_name	("annotate.c")
>>>>                 DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>>                 DW_AT_stmt_list	(0x00000000)
>>>>
>>>> 0x0000001e:   DW_TAG_variable
>>>>                   DW_AT_name	("g")
>>>>                   DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>>                   DW_AT_decl_line	(11)
>>>>                   DW_AT_decl_column	(0x2a)
>>>>                   DW_AT_type	(0x00000032 "int **")
>>>>                   DW_AT_external	(true)
>>>>                   DW_AT_location	(DW_OP_addr 0x0)
>>>>
>>>> 0x00000032:   DW_TAG_pointer_type
>>>>                   DW_AT_byte_size	(8)
>>>>                   DW_AT_type	(0x00000045 "int *")
>>>>                   DW_AT_sibling	(0x00000045)
>>>>
>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name	("btf_type_tag")
>>>>                     DW_AT_const_value	("type-tag-1")
>>>>
>>>> 0x00000044:     NULL
>>>>
>>>> 0x00000045:   DW_TAG_pointer_type
>>>>                   DW_AT_byte_size	(8)
>>>>                   DW_AT_type	(0x00000061 "int")
>>>>                   DW_AT_sibling	(0x00000061)
>>>>
>>>> 0x0000004e:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name	("btf_type_tag")
>>>>                     DW_AT_const_value	("type-tag-3")
>>>>
>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name	("btf_type_tag")
>>>>                     DW_AT_const_value	("type-tag-2")
>>>>
>>>> 0x00000060:     NULL
>>>>
>>>> 0x00000061:   DW_TAG_base_type
>>>>                   DW_AT_byte_size	(0x04)
>>>>                   DW_AT_encoding	(DW_ATE_signed)
>>>>                   DW_AT_name	("int")
>>>>
>>>> 0x00000068:   NULL
>>>
>>> do you have documentation to show why gnu generates attribute this way?
>>> If dwarf generates
>>>       ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>> does this help?
>>
>> Okay, I think I see the problem. The internal representations between clang
>> and GCC attach the attributes to different nodes, and as a result they
>> produce different DWARF:
>>
>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>> annotations: !10)
>> !6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64,
>> annotations: !8)
>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>> !8 = !{!9}
>> !9 = !{!"btf_type_tag", !"tag1"}
>> !10 = !{!11, !12}
>> !11 = !{!"btf_type_tag", !"tag2"}
>> !12 = !{!"btf_type_tag", !"tag3"}
>>
>> If I am reading this IR right, then the tags "tag2" and "tag3" are being
>> applied to the int**, and "tag1" is applied to the int*
>>
>> But I don't think this lines up with how the attribute syntax is defined.
>> See
>>    https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
>> In particular the "All other attributes" section. (It's a bit dense).
>> Or, as Joseph summed it up nicely earlier in the thread:
>>> In either syntax, __typetag2 __typetag3 should apply to
>>> the type to which g points, not to g or its type, just as if you had a
>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>> after the *, not before, to make them apply to the pointer type.
>>
>>
>> Compare that to GCC's internal representation, from which DWARF is generated:
>>
>>   <var_decl 0x7ffff7535090 g
>>      type <pointer_type 0x7ffff74f8888
>>          type <pointer_type 0x7ffff74f8b28 type <integer_type 0x7ffff74385e8 int>
>>              unsigned DI
>>              size <integer_cst 0x7ffff742b450 constant 64>
>>              unit-size <integer_cst 0x7ffff742b468 constant 8>
>>              align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff743f888
>>              attributes <tree_list 0x7ffff75165c8
>>                  purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>                  value <tree_list 0x7ffff7516550
>>                      value <string_cst 0x7ffff75182e0 type <array_type 0x7ffff74f8738>
>>                          readonly constant static "type-tag-3\000">>
>>                  chain <tree_list 0x7ffff75165a0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>                      value <tree_list 0x7ffff75164d8
>>                          value <string_cst 0x7ffff75182c0 type <array_type 0x7ffff74f8738>
>>                              readonly constant static "type-tag-2\000">>>>
>>              pointer_to_this <pointer_type 0x7ffff74f8bd0>>
>>          unsigned DI size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>>          align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74f87e0
>>          attributes <tree_list 0x7ffff75165f0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>              value <tree_list 0x7ffff7516438
>>                  value <string_cst 0x7ffff75182a0 type <array_type 0x7ffff74f8738>
>>                      readonly constant static "type-tag-1\000">>>>
>>      public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/annotate.c:10:42 size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>>      align:64 warn_if_not_align:0>
>>
>> See how tags "tag2" and "tag3" are associated with the pointer_type 0x7ffff74f8b28,
>> that is, "the type to which g points"
>>
>> From GCC's DWARF, the BTF we currently get looks like:
>>    VAR(g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>> which is obviously quite different and why this case caught my attention.
>>
>> I think this difference is the root of our problems. It might not be
>> specifically related to the BTF tag attributes but they do reveal some
>> discrepancy between how clang and GCC handle the attribute syntax.
> 
> The btf_type attribute is very similar to address_space attribute.
> For example,
> $ cat t1.c
> int __attribute__((address_space(1))) * p;
> $ clang -g -S -emit-llvm t1.c
> 
> In IR, we will have
> @p = dso_local global ptr addrspace(1) null, align 8, !dbg !0
> ...
> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
> 
> Replacing address_space with btf_type_tag, we will get
> ptr->type_tag->int in debuginfo.
> 
> But it looks like gcc doesn't support address_space attribute
> 
> $ gcc -g -S t1.c
> t1.c:1:1: warning: ‘address_space’ attribute directive ignored 
> [-Wattributes]
>   int __attribute__((address_space(1))) * p;
>   ^~~
> 
> Is it possible for gcc to go with address_space attribute
> semantics for btf_type_tag attribute?

In cases like this the behavior is the same.
$ cat foo.c
int __attribute__((btf_type_tag("tag1"))) * p;
$ gcc -c -gdwarf -gbtf foo.c

Internally:
 <var_decl 0x7ffff743abd0 p
    type <pointer_type 0x7ffff7590150
        type <integer_type 0x7ffff74475e8 int public SI
            size <integer_cst 0x7ffff742bf90 constant 32>
            unit-size <integer_cst 0x7ffff742bfa8 constant 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74475e8 precision:32 min <integer_cst 0x7ffff742bf48 -2147483648> max <integer_cst 0x7ffff742bf60 2147483647>
            pointer_to_this <pointer_type 0x7ffff744fa80>>
        unsigned DI
        size <integer_cst 0x7ffff742bd50 constant 64>
        unit-size <integer_cst 0x7ffff742bd68 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff744fa80
        attributes <tree_list 0x7ffff7564d70
            purpose <identifier_node 0x7ffff757f2d0 btf_type_tag>
            value <tree_list 0x7ffff7564cf8
                value <string_cst 0x7ffff757c220 type <array_type 0x7ffff75900a8>
                    readonly constant static "tag1\000">>>>
    public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/foo.c:1:45 size <integer_cst 0x7ffff742bd50 64> unit-size <integer_cst 0x7ffff742bd68 8>
    align:64 warn_if_not_align:0>

And the resulting BTF:

[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] PTR '(anon)' type_id=3
[3] TYPE_TAG 'tag1' type_id=1
[4] VAR 'p' type_id=2, linkage=global
[5] DATASEC '.bss' size=0 vlen=1
	type_id=4 offset=0 size=8 (VAR 'p')

var(p) -> ptr -> type_tag -> int


> 
>>
>>>
>>>>
>>>>
>>>> clang: clang -target bpf -c -g annotate.c
>>>>
>>>> 0x0000000c: DW_TAG_compile_unit
>>>>                 DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>                 DW_AT_language	(DW_LANG_C99)
>>>>                 DW_AT_name	("annotate.c")
>>>>                 DW_AT_str_offsets_base	(0x00000008)
>>>>                 DW_AT_stmt_list	(0x00000000)
>>>>                 DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>>                 DW_AT_addr_base	(0x00000008)
>>>>
>>>> 0x0000001e:   DW_TAG_variable
>>>>                   DW_AT_name	("g")
>>>>                   DW_AT_type	(0x00000029 "int **")
>>>>                   DW_AT_external	(true)
>>>>                   DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>>                   DW_AT_decl_line	(11)
>>>>                   DW_AT_location	(DW_OP_addrx 0x0)
>>>>
>>>> 0x00000029:   DW_TAG_pointer_type
>>>>                   DW_AT_type	(0x00000035 "int *")
>>>>
>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name	("btf_type_tag")
>>>>                     DW_AT_const_value	("type-tag-2")
>>>>
>>>> 0x00000031:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name	("btf_type_tag")
>>>>                     DW_AT_const_value	("type-tag-3")
>>>>
>>>> 0x00000034:     NULL
>>>>
>>>> 0x00000035:   DW_TAG_pointer_type
>>>>                   DW_AT_type	(0x0000003e "int")
>>>>
>>>> 0x0000003a:     DW_TAG_LLVM_annotation
>>>>                     DW_AT_name	("btf_type_tag")
>>>>                     DW_AT_const_value	("type-tag-1")
>>>>
>>>> 0x0000003d:     NULL
>>>>
>>>> 0x0000003e:   DW_TAG_base_type
>>>>                   DW_AT_name	("int")
>>>>                   DW_AT_encoding	(DW_ATE_signed)
>>>>                   DW_AT_byte_size	(0x04)
>>>>
>>>> 0x00000042:   NULL
>>>>
>>>>

* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-05-27 19:56                               ` David Faust
@ 2022-06-03  2:04                                 ` Yonghong Song
  2022-06-07 21:42                                   ` David Faust
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2022-06-03  2:04 UTC (permalink / raw)
  To: David Faust; +Cc: Jose E. Marchesi, Joseph Myers, Yonghong Song via Gcc-patches



On 5/27/22 12:56 PM, David Faust wrote:
> 
> 
> On 5/26/22 00:29, Yonghong Song wrote:
>>
>>
>> On 5/24/22 10:04 AM, David Faust wrote:
>>>
>>>
>>> On 5/24/22 09:03, Yonghong Song wrote:
>>>>
>>>>
>>>> On 5/24/22 8:53 AM, David Faust wrote:
>>>>>
>>>>>
>>>>> On 5/24/22 04:07, Jose E. Marchesi wrote:
>>>>>>
>>>>>>> On 5/11/22 11:44 AM, David Faust wrote:
>>>>>>>>
>>>>>>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Consider the following example:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>>>>>>          #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>>>>>>          #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags
>>>>>>>>>>>>>>> 'tag2' and
>>>>>>>>>>>>>>> 'tag3',
>>>>>>>>>>>>>>> to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>>>>>>> C2x [[]]
>>>>>>>>>>>>>> attribute syntax.  In either syntax, __typetag2 __typetag3 should
>>>>>>>>>>>>>> apply to
>>>>>>>>>>>>>> the type to which g points, not to g or its type, just as if
>>>>>>>>>>>>>> you had a
>>>>>>>>>>>>>> type qualifier there.  You'd need to put the attributes (or
>>>>>>>>>>>>>> qualifier)
>>>>>>>>>>>>>> after the *, not before, to make them apply to the pointer
>>>>>>>>>>>>>> type.  See
>>>>>>>>>>>>>> "Attribute Syntax" in the GCC manual for how the syntax is
>>>>>>>>>>>>>> defined for
>>>>>>>>>>>>>> GNU
>>>>>>>>>>>>>> attributes and deduce in turn, for each subsequence of the tokens
>>>>>>>>>>>>>> matching
>>>>>>>>>>>>>> the syntax for some kind of declarator, what the type for "T D1"
>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>> as defined there and in the C standard, as deduced from the type for
>>>>>>>>>>>>>> "T D"
>>>>>>>>>>>>>> for a sub-declarator D.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But GCC's attribute parsing produces a variable 'g'
>>>>>>>>>>>>>>> which is "a pointer with
>>>>>>>>>>>>>>> tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an
>>>>>>>>>>>>>>> int", i.e.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>>>>>>> pointer
>>>>>>>>>>>>>> type it would need to go after the * not before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>>>>>>> attribute
>>>>>>>>>>>>>> appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Joseph, thank you! This is very helpful. My understanding of
>>>>>>>>>>>>> the syntax
>>>>>>>>>>>>> was not correct.
>>>>>>>>>>>>>
>>>>>>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>>>>>>> discussion of it in the series cover letter. But, the reason
>>>>>>>>>>>>> why it is
>>>>>>>>>>>>> incorrect is the same.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>>>>>>> other users of the tags?
>>>>>>>>>>>>
>>>>>>>>>>>> This is probably a wording issue. We are saying tags only
>>>>>>>>>>>> apply to pointers. We probably should say they only apply to the pointee.
>>>>>>>>>>>>
>>>>>>>>>>>> $ cat t.c
>>>>>>>>>>>> int const *ptr;
>>>>>>>>>>>>
>>>>>>>>>>>> the llvm ir debuginfo:
>>>>>>>>>>>>
>>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>>
>>>>>>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>>>>>>
>>>>>>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>>>>>>
>>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>>>>>>> annotations: !7)
>>>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>> !7 = !{!8}
>>>>>>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>>>>>>
>>>>>>>>>>>> In the above IR, we generate annotations to pointer_type because
>>>>>>>>>>>> we didn't invent a new DI type to encode btf_type_tag. But it is
>>>>>>>>>>>> totally okay to have IR that looks like
>>>>>>>>>>>>
>>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>>
>>>>>>>>>>> OK, thanks.
>>>>>>>>>>>
>>>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>>>> that I have been concerned about:
>>>>>>>>>>>
>>>>>>>>>>>        int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>
>>>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>>>
>>>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>>>> than the type to which it points.
>>>>>>>>>>>
>>>>>>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>>>>>>> is very important, but it is worth noting.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>>>>>>> which construct the tag applies.
>>>>>>>>>>>
>>>>>>>>>>> This would also be a way around any issues in handling the attributes
>>>>>>>>>>> due to the GNU syntax.
>>>>>>>>>>>
>>>>>>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>>>>>>> fully supported in clang yet?
>>>>>>>>>>
>>>>>>>>>> Actually, I don't know either. But since the btf decl_tag and type_tag
>>>>>>>>>> are also used to compile linux kernel and the minimum compiler version
>>>>>>>>>> to compile kernel is gcc5.1 and clang11. I am not sure whether gcc5.1
>>>>>>>>>> supports c2x or not, I guess probably not. So I think we most likely
>>>>>>>>>> cannot use c2x syntax.
>>>>>>>>>
>>>>>>>>> Okay, I think we can guard btf_tag's with newer compiler versions.
>>>>>>>>> What kind of c2x syntax you intend to use? I can help compile kernel
>>>>>>>>> with that syntax and llvm15 to see what is the issue and may help
>>>>>>>>> fix it in clang if possible.
>>>>>>>>
>>>>>>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>>>>>>> syntax makes it quite clear to which entity each attribute applies,
>>>>>>>> and in my opinion is a little more intuitive/less surprising too.
>>>>>>>> It's documented here (PDF):
>>>>>>>>       https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>>>>>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>>>>>>> declarations. Section 6.7.6.1 specifically describes using the
>>>>>>>> attribute syntax with pointer declarators.
>>>>>>>> The attribute syntax itself for BTF tags is:
>>>>>>>>       [[clang::btf_type_tag("tag1")]]
>>>>>>>> or
>>>>>>>>       [[gnu::btf_type_tag("tag1")]]
>>>>>>>>
>>>>>>>> I am also looking into whether, with the C2x syntax, we really need two
>>>>>>>> separate attributes (type_tag and decl_tag) at the language
>>>>>>>> level. It might be possible with C2x syntax to use just one language
>>>>>>>> attribute (e.g. just btf_tag).
>>>>>>>>
>>>>>>>> A simple declaration for a tagged pointer to an int:
>>>>>>>>       int * [[gnu::btf_type_tag("tag1")]] x;
>>>>>>>> And for the example from this thread:
>>>>>>>>       #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>>>>>>       #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>>>>>>       #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>>>>>>       int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>>> Here each tag applies to the preceding pointer, so the result is
>>>>>>>> unsurprising.
>>>>>>>> Actually, this is where I found something that looks like an issue
>>>>>>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>>>>>>> but with no warning nor other indication.
>>>>>>>> Compiling this example with gcc:
>>>>>>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c
>>>>>>>> -o c2x.o --std=c2x
>>>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>>>                   DW_AT_producer    ("GNU C2X 12.0.1 20220401
>>>>>>>> (experimental) -gbtf -gdwarf -std=c2x")
>>>>>>>>                   DW_AT_language    (DW_LANG_C11)
>>>>>>>>                   DW_AT_name    ("c2x.c")
>>>>>>>>                   DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>>>                   DW_AT_stmt_list    (0x00000000)
>>>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>>>                     DW_AT_name    ("g")
>>>>>>>>                     DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>>>                     DW_AT_decl_line    (16)
>>>>>>>>                     DW_AT_decl_column    (0x2a)
>>>>>>>>                     DW_AT_type    (0x00000032 "int **")
>>>>>>>>                     DW_AT_external    (true)
>>>>>>>>                     DW_AT_location    (DW_OP_addr 0x0)
>>>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>>>                     DW_AT_byte_size    (8)
>>>>>>>>                     DW_AT_type    (0x0000004e "int *")
>>>>>>>>                     DW_AT_sibling    (0x0000004e)
>>>>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>                       DW_AT_const_value    ("type-tag-3")
>>>>>>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>                       DW_AT_const_value    ("type-tag-2")
>>>>>>>> 0x0000004d:     NULL
>>>>>>>> 0x0000004e:   DW_TAG_pointer_type
>>>>>>>>                     DW_AT_byte_size    (8)
>>>>>>>>                     DW_AT_type    (0x00000061 "int")
>>>>>>>>                     DW_AT_sibling    (0x00000061)
>>>>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>                       DW_AT_const_value    ("type-tag-1")
>>>>>>>> 0x00000060:     NULL
>>>>>>>> 0x00000061:   DW_TAG_base_type
>>>>>>>>                     DW_AT_byte_size    (0x04)
>>>>>>>>                     DW_AT_encoding    (DW_ATE_signed)
>>>>>>>>                     DW_AT_name    ("int")
>>>>>>>> 0x00000068:   NULL
>>>>>>>>
>>>>>>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>>>>>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll
>>>>>>>> --std=c2x
>>>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>>>                   DW_AT_producer    ("clang version 15.0.0
>>>>>>>> (https://github.com/llvm/llvm-project.git
>>>>>>>> f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>>>>                   DW_AT_language    (DW_LANG_C99)
>>>>>>>>                   DW_AT_name    ("c2x.c")
>>>>>>>>                   DW_AT_str_offsets_base    (0x00000008)
>>>>>>>>                   DW_AT_stmt_list    (0x00000000)
>>>>>>>>                   DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>>>                   DW_AT_addr_base    (0x00000008)
>>>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>>>                     DW_AT_name    ("g")
>>>>>>>>                     DW_AT_type    (0x00000029 "int **")
>>>>>>>>                     DW_AT_external    (true)
>>>>>>>>                     DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>>>                     DW_AT_decl_line    (12)
>>>>>>>>                     DW_AT_location    (DW_OP_addrx 0x0)
>>>>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>>>>                     DW_AT_type    (0x00000032 "int *")
>>>>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>                       DW_AT_const_value    ("type-tag-1")
>>>>>>>> 0x00000031:     NULL
>>>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>>>                     DW_AT_type    (0x00000037 "int")
>>>>>>>> 0x00000037:   DW_TAG_base_type
>>>>>>>>                     DW_AT_name    ("int")
>>>>>>>>                     DW_AT_encoding    (DW_ATE_signed)
>>>>>>>>                     DW_AT_byte_size    (0x04)
>>>>>>>> 0x0000003b:   NULL
>>>>>>>
>>>>>>> Thanks. I checked with current clang. The generated code looks
>>>>>>> like above. Basically, for code like below
>>>>>>>
>>>>>>>       #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>>>>>>       #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>>>>>>       #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>>>>>>
>>>>>>>       int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>>
>>>>>>> The IR type looks like
>>>>>>>       __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>>>>>>
>>>>>>> The IR is similar to what we did if using
>>>>>>> __attribute__((btf_type_tag(""))), but their
>>>>>>> semantic interpretation is quite different.
>>>>>>> For example, with c2x format,
>>>>>>>       __typetag1 applies to ptr2
>>>>>>> with __attribute__ format, it applies to the pointee of ptr1.
>>>>>>>
>>>>>>> But more importantly, c2x format is incompatible with
>>>>>>> the usage of linux kernel. The following are a bunch of kernel
>>>>>>> __user usages. Here, __user intends to be replaced with a btf_type_tag.
>>>>>>>
>>>>>>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device
>>>>>>> *vdev, char __user *buf,
>>>>>>> vfio_pci_core.h:                                  char __user *buf,
>>>>>>> size_t count,
>>>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct
>>>>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct
>>>>>>> vfio_pci_core_device *vdev, char __user *buf,
>>>>>>> vfio_pci_core.h:                                      char __user
>>>>>>> *buf, size_t count,
>>>>>>> vfio_pci_core.h:                                void __user *arg,
>>>>>>> size_t argsz);
>>>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device
>>>>>>> *core_vdev, char __user *buf,
>>>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device
>>>>>>> *core_vdev, const char __user *buf,
>>>>>>> vringh.h:                    vring_desc_t __user *desc,
>>>>>>> vringh.h:                    vring_avail_t __user *avail,
>>>>>>> vringh.h:                    vring_used_t __user *used);
>>>>>>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>>>>>>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>>>>>>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>>>>>>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>>>>>>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>>>>>>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>>>>>>
>>>>>>> You can see, we will not be able to simply replace __user
>>>>>>> with [[clang::btf_type_tag("user")]] because it won't work
>>>>>>> according to c2x expectations.
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for checking this. I see that we probably cannot use the c2x
>>>>> syntax in the kernel, since it will not work as a drop-in replacement
>>>>> for the current uses.
>>>>>
>>>>>>
>>>>>> Hi Yonghong.
>>>>>>
>>>>>> I am a bit confused regarding the GNU attributes problem: our patch
>>>>>> supports it, but as David already noted:
>>>>>>
>>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>>> that I have been concerned about:
>>>>>>>>>>
>>>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>
>>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>>
>>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>>> than the type to which it points.
>>>>>>
>>>>>> Note the example he uses is:
>>>>>>
>>>>>>      (a) int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>
>>>>>> Not
>>>>>>
>>>>>>      (b) int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>
>>>>>> Apparently for (a) clang is generating DWARF that associates __typetag2
>>>>>> and __typetag3 to g's type (the pointer to pointer) instead of the
>>>>>> pointer to int, which contravenes the GNU syntax rules.
>>>>>>
>>>>>> AFAIK that is where the DWARF we generate differs, and what is blocking
>>>>>> us.  David will correct me in the likely case I'm wrong :)
>>>>>
>>>>> Right. This is what I hoped maybe the C2x syntax could resolve.
>>>>>
>>>>> The issue I saw is that in the case (a) above, when using the GNU
>>>>> attribute syntax, GCC and clang produce different results. I think that
>>>>> the underlying cause is some subtle difference in how clang is handling
>>>>> the GNU attribute syntax in the case compared to GCC.
>>>>>
>>>>>
>>>>> To remind ourselves, here is the full example. Notice the significant
>>>>> difference in which objects the tags are associated with in DWARF.
>>>>>
>>>>>
>>>>> #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>>>> #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>>>> #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>>>>
>>>>> int __typetag1 * __typetag2 __typetag3 * g;
>>>>>
>>>>>
>>>>> GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c
>>>>>
>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>                  DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
>>>>>                  DW_AT_language	(DW_LANG_C11)
>>>>>                  DW_AT_name	("annotate.c")
>>>>>                  DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>>>                  DW_AT_stmt_list	(0x00000000)
>>>>>
>>>>> 0x0000001e:   DW_TAG_variable
>>>>>                    DW_AT_name	("g")
>>>>>                    DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>>>                    DW_AT_decl_line	(11)
>>>>>                    DW_AT_decl_column	(0x2a)
>>>>>                    DW_AT_type	(0x00000032 "int **")
>>>>>                    DW_AT_external	(true)
>>>>>                    DW_AT_location	(DW_OP_addr 0x0)
>>>>>
>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>                    DW_AT_byte_size	(8)
>>>>>                    DW_AT_type	(0x00000045 "int *")
>>>>>                    DW_AT_sibling	(0x00000045)
>>>>>
>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>                      DW_AT_const_value	("type-tag-1")
>>>>>
>>>>> 0x00000044:     NULL
>>>>>
>>>>> 0x00000045:   DW_TAG_pointer_type
>>>>>                    DW_AT_byte_size	(8)
>>>>>                    DW_AT_type	(0x00000061 "int")
>>>>>                    DW_AT_sibling	(0x00000061)
>>>>>
>>>>> 0x0000004e:     DW_TAG_LLVM_annotation
>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>                      DW_AT_const_value	("type-tag-3")
>>>>>
>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>                      DW_AT_const_value	("type-tag-2")
>>>>>
>>>>> 0x00000060:     NULL
>>>>>
>>>>> 0x00000061:   DW_TAG_base_type
>>>>>                    DW_AT_byte_size	(0x04)
>>>>>                    DW_AT_encoding	(DW_ATE_signed)
>>>>>                    DW_AT_name	("int")
>>>>>
>>>>> 0x00000068:   NULL
>>>>
>>>> do you have documentation to show why gnu generates attribute this way?
>>>> If dwarf generates
>>>>        ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>>> does this help?
>>>
>>> Okay, I think I see the problem. The internal representations between clang
>>> and GCC attach the attributes to different nodes, and as a result they
>>> produce different DWARF:
>>>
>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>> annotations: !10)
>>> !6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64,
>>> annotations: !8)
>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>> !8 = !{!9}
>>> !9 = !{!"btf_type_tag", !"tag1"}
>>> !10 = !{!11, !12}
>>> !11 = !{!"btf_type_tag", !"tag2"}
>>> !12 = !{!"btf_type_tag", !"tag3"}
>>>
>>> If I am reading this IR right, then the tags "tag2" and "tag3" are being
>>> applied to the int**, and "tag1" is applied to the int*
>>>
>>> But I don't think this lines up with how the attribute syntax is defined.
>>> See
>>>     https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
>>> In particular the "All other attributes" section. (It's a bit dense).
>>> Or, as Joseph summed it up nicely earlier in the thread:
>>>> In either syntax, __typetag2 __typetag3 should apply to
>>>> the type to which g points, not to g or its type, just as if you had a
>>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>>> after the *, not before, to make them apply to the pointer type.
>>>
>>>
>>> Compare that to GCC's internal representation, from which DWARF is generated:
>>>
>>>    <var_decl 0x7ffff7535090 g
>>>       type <pointer_type 0x7ffff74f8888
>>>           type <pointer_type 0x7ffff74f8b28 type <integer_type 0x7ffff74385e8 int>
>>>               unsigned DI
>>>               size <integer_cst 0x7ffff742b450 constant 64>
>>>               unit-size <integer_cst 0x7ffff742b468 constant 8>
>>>               align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff743f888
>>>               attributes <tree_list 0x7ffff75165c8
>>>                   purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>>                   value <tree_list 0x7ffff7516550
>>>                       value <string_cst 0x7ffff75182e0 type <array_type 0x7ffff74f8738>
>>>                           readonly constant static "type-tag-3\000">>
>>>                   chain <tree_list 0x7ffff75165a0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>>                       value <tree_list 0x7ffff75164d8
>>>                           value <string_cst 0x7ffff75182c0 type <array_type 0x7ffff74f8738>
>>>                               readonly constant static "type-tag-2\000">>>>
>>>               pointer_to_this <pointer_type 0x7ffff74f8bd0>>
>>>           unsigned DI size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>>>           align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74f87e0
>>>           attributes <tree_list 0x7ffff75165f0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>>               value <tree_list 0x7ffff7516438
>>>                   value <string_cst 0x7ffff75182a0 type <array_type 0x7ffff74f8738>
>>>                       readonly constant static "type-tag-1\000">>>>
>>>       public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/annotate.c:10:42 size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>>>       align:64 warn_if_not_align:0>
>>>
>>> See how tags "tag2" and "tag3" are associated with the pointer_type 0x7ffff74f8b28,
>>> that is, "the type to which g points"
>>>
>>>   From GCC's DWARF the BTF we get currently looks like:
>>>     VAR(g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>> which is obviously quite different and why this case caught my attention.
>>>
>>> I think this difference is the root of our problems. It might not be
>>> specifically related to the BTF tag attributes but they do reveal some
>>> discrepancy between how clang and GCC handle the attribute syntax.
>>
>> The btf_type_tag attribute is very similar to the address_space attribute.
>> For example,
>> $ cat t1.c
>> int __attribute__((address_space(1))) * p;
>> $ clang -g -S -emit-llvm t1.c
>>
>> In IR, we will have
>> @p = dso_local global ptr addrspace(1) null, align 8, !dbg !0
>> ...
>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>
>> Replacing address_space with btf_type_tag, we will get
>> ptr->type_tag->int in debuginfo.
>>
>> But it looks like gcc doesn't support address_space attribute
>>
>> $ gcc -g -S t1.c
>> t1.c:1:1: warning: ‘address_space’ attribute directive ignored
>> [-Wattributes]
>>    int __attribute__((address_space(1))) * p;
>>    ^~~
>>
>> Is it possible for gcc to go with address_space attribute
>> semantics for btf_type_tag attribute?
> 
> In cases like this the behavior is the same.
> $ cat foo.c
> int __attribute__((btf_type_tag("tag1"))) * p;
> $ gcc -c -gdwarf -gbtf foo.c
> 
> Internally:
>   <var_decl 0x7ffff743abd0 p
>      type <pointer_type 0x7ffff7590150
>          type <integer_type 0x7ffff74475e8 int public SI
>              size <integer_cst 0x7ffff742bf90 constant 32>
>              unit-size <integer_cst 0x7ffff742bfa8 constant 4>
>              align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74475e8 precision:32 min <integer_cst 0x7ffff742bf48 -2147483648> max <integer_cst 0x7ffff742bf60 2147483647>
>              pointer_to_this <pointer_type 0x7ffff744fa80>>
>          unsigned DI
>          size <integer_cst 0x7ffff742bd50 constant 64>
>          unit-size <integer_cst 0x7ffff742bd68 constant 8>
>          align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff744fa80
>          attributes <tree_list 0x7ffff7564d70
>              purpose <identifier_node 0x7ffff757f2d0 btf_type_tag>
>              value <tree_list 0x7ffff7564cf8
>                  value <string_cst 0x7ffff757c220 type <array_type 0x7ffff75900a8>
>                      readonly constant static "tag1\000">>>>
>      public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/foo.c:1:45 size <integer_cst 0x7ffff742bd50 64> unit-size <integer_cst 0x7ffff742bd68 8>
>      align:64 warn_if_not_align:0>
> 
> And the resulting BTF:
> 
> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> [2] PTR '(anon)' type_id=3
> [3] TYPE_TAG 'tag1' type_id=1
> [4] VAR 'p' type_id=2, linkage=global
> [5] DATASEC '.bss' size=0 vlen=1
> 	type_id=4 offset=0 size=8 (VAR 'p')
> 
> var(p) -> ptr -> type_tag -> int

It would be good if we could generate a similar encoding in DWARF.
Currently in clang, we generate
     var(p) -> ptr (type_tag) -> int
but I am open to generating
     var(p) -> ptr -> type_tag -> int
in DWARF as well, if that is possible.
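
The difference between the two shapes can be sketched in C as follows.
This is only an illustration under stated assumptions: struct
annotated_ptr and render_chain are hypothetical names, not clang or GCC
internals, and the sketch handles a single tag only, since the ordering of
multiple tags is exactly what is under discussion in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified pointer node: the tag is carried as an
   annotation on the pointer itself, as in "ptr (type_tag) -> int". */
struct annotated_ptr {
    const char *tag;      /* btf_type_tag string, or NULL if untagged */
    const char *pointee;  /* name of the pointed-to type */
};

/* Render the same information as a chain in which the tag is its own
   node between the pointer and the pointee: "ptr -> type_tag -> int". */
void render_chain(const struct annotated_ptr *p, char *buf, size_t len)
{
    assert(p != NULL && buf != NULL && len > 0);
    if (p->tag != NULL)
        snprintf(buf, len, "ptr -> type_tag(%s) -> %s", p->tag, p->pointee);
    else
        snprintf(buf, len, "ptr -> %s", p->pointee);
}
```

Both shapes carry the same tag string; they differ only in whether the tag
is attached to the pointer node or sits as a separate node in front of the
pointee.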

> 
> 
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> clang: clang -target bpf -c -g annotate.c
>>>>>
>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>                  DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>                  DW_AT_language	(DW_LANG_C99)
>>>>>                  DW_AT_name	("annotate.c")
>>>>>                  DW_AT_str_offsets_base	(0x00000008)
>>>>>                  DW_AT_stmt_list	(0x00000000)
>>>>>                  DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>>>                  DW_AT_addr_base	(0x00000008)
>>>>>
>>>>> 0x0000001e:   DW_TAG_variable
>>>>>                    DW_AT_name	("g")
>>>>>                    DW_AT_type	(0x00000029 "int **")
>>>>>                    DW_AT_external	(true)
>>>>>                    DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>>>                    DW_AT_decl_line	(11)
>>>>>                    DW_AT_location	(DW_OP_addrx 0x0)
>>>>>
>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>                    DW_AT_type	(0x00000035 "int *")
>>>>>
>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>                      DW_AT_const_value	("type-tag-2")
>>>>>
>>>>> 0x00000031:     DW_TAG_LLVM_annotation
>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>                      DW_AT_const_value	("type-tag-3")
>>>>>
>>>>> 0x00000034:     NULL
>>>>>
>>>>> 0x00000035:   DW_TAG_pointer_type
>>>>>                    DW_AT_type	(0x0000003e "int")
>>>>>
>>>>> 0x0000003a:     DW_TAG_LLVM_annotation
>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>                      DW_AT_const_value	("type-tag-1")
>>>>>
>>>>> 0x0000003d:     NULL
>>>>>
>>>>> 0x0000003e:   DW_TAG_base_type
>>>>>                    DW_AT_name	("int")
>>>>>                    DW_AT_encoding	(DW_ATE_signed)
>>>>>                    DW_AT_byte_size	(0x04)
>>>>>
>>>>> 0x00000042:   NULL
>>>>>
>>>>>


* Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
  2022-06-03  2:04                                 ` Yonghong Song
@ 2022-06-07 21:42                                   ` David Faust
  0 siblings, 0 replies; 30+ messages in thread
From: David Faust @ 2022-06-07 21:42 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jose E. Marchesi, Joseph Myers, Yonghong Song via Gcc-patches



On 6/2/22 19:04, Yonghong Song wrote:
> 
> 
> On 5/27/22 12:56 PM, David Faust wrote:
>>
>>
>> On 5/26/22 00:29, Yonghong Song wrote:
>>>
>>>
>>> On 5/24/22 10:04 AM, David Faust wrote:
>>>>
>>>>
>>>> On 5/24/22 09:03, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 5/24/22 8:53 AM, David Faust wrote:
>>>>>>
>>>>>>
>>>>>> On 5/24/22 04:07, Jose E. Marchesi wrote:
>>>>>>>
>>>>>>>> On 5/11/22 11:44 AM, David Faust wrote:
>>>>>>>>>
>>>>>>>>> On 5/10/22 22:05, Yonghong Song wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/10/22 8:43 PM, Yonghong Song wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/6/22 2:18 PM, David Faust wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/5/22 16:00, Yonghong Song wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/4/22 10:03 AM, David Faust wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 5/3/22 15:32, Joseph Myers wrote:
>>>>>>>>>>>>>>> On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Consider the following example:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>          #define __typetag1 __attribute__((btf_type_tag("tag1")))
>>>>>>>>>>>>>>>>          #define __typetag2 __attribute__((btf_type_tag("tag2")))
>>>>>>>>>>>>>>>>          #define __typetag3 __attribute__((btf_type_tag("tag3")))
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>          int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The expected behavior is that 'g' is "a pointer with tags 'tag2'
>>>>>>>>>>>>>>>> and 'tag3', to a pointer with tag 'tag1' to an int". i.e.:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's not a correct expectation for either GNU __attribute__ or
>>>>>>>>>>>>>>> C2x [[]] attribute syntax.  In either syntax, __typetag2 __typetag3
>>>>>>>>>>>>>>> should apply to the type to which g points, not to g or its type,
>>>>>>>>>>>>>>> just as if you had a type qualifier there.  You'd need to put the
>>>>>>>>>>>>>>> attributes (or qualifier) after the *, not before, to make them
>>>>>>>>>>>>>>> apply to the pointer type.  See "Attribute Syntax" in the GCC manual
>>>>>>>>>>>>>>> for how the syntax is defined for GNU attributes and deduce in turn,
>>>>>>>>>>>>>>> for each subsequence of the tokens matching the syntax for some kind
>>>>>>>>>>>>>>> of declarator, what the type for "T D1" would be as defined there
>>>>>>>>>>>>>>> and in the C standard, as deduced from the type for "T D" for a
>>>>>>>>>>>>>>> sub-declarator D.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But GCC's attribute parsing produces a variable 'g' which is "a
>>>>>>>>>>>>>>>> pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3'
>>>>>>>>>>>>>>>> to an int", i.e.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In GNU syntax, __typetag1 applies to the declaration, whereas in C2x
>>>>>>>>>>>>>>> syntax it applies to int.  Again, if you wanted it to apply to the
>>>>>>>>>>>>>>> pointer type it would need to go after the * not before.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you are concerned with the fine details of what construct an
>>>>>>>>>>>>>>> attribute appertains to, I recommend using C2x syntax not GNU syntax.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Joseph, thank you! This is very helpful. My understanding of the
>>>>>>>>>>>>>> syntax was not correct.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Actually, I made a bad mistake in paraphrasing this example from the
>>>>>>>>>>>>>> discussion of it in the series cover letter. But, the reason why it
>>>>>>>>>>>>>> is incorrect is the same.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yonghong, is the specific ordering an expectation in BPF programs or
>>>>>>>>>>>>>> other users of the tags?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is probably a wording issue. We have been saying tags only
>>>>>>>>>>>>> apply to the pointer. We should probably say they only apply to the
>>>>>>>>>>>>> pointee.
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat t.c
>>>>>>>>>>>>> int const *ptr;
>>>>>>>>>>>>>
>>>>>>>>>>>>> the llvm ir debuginfo:
>>>>>>>>>>>>>
>>>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>>>>>>>>>>>> !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
>>>>>>>>>>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>>>
>>>>>>>>>>>>> We could replace 'const' with a tag like below:
>>>>>>>>>>>>>
>>>>>>>>>>>>> int __attribute__((btf_type_tag("tag"))) *ptr;
>>>>>>>>>>>>>
>>>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>>>>>>>>>>> annotations: !7)
>>>>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>>> !7 = !{!8}
>>>>>>>>>>>>> !8 = !{!"btf_type_tag", !"tag"}
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the above IR, we attach the annotations to the pointer_type because
>>>>>>>>>>>>> we didn't invent a new DI type to encode btf_type_tag. But it is
>>>>>>>>>>>>> totally okay to have the IR look like
>>>>>>>>>>>>>
>>>>>>>>>>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
>>>>>>>>>>>>> !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
>>>>>>>>>>>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>>>>>>>>>>>
>>>>>>>>>>>> OK, thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>>>>> that I have been concerned about:
>>>>>>>>>>>>
>>>>>>>>>>>>        int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>>
>>>>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>>>>
>>>>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>>>>> than the type to which it points.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not sure whether for the use purposes of the tags this difference
>>>>>>>>>>>> is very important, but it is worth noting.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As Joseph suggested, it may be better to encourage users of these tags
>>>>>>>>>>>> to use the C2x attribute syntax if they are concerned with precisely
>>>>>>>>>>>> which construct the tag applies.
>>>>>>>>>>>>
>>>>>>>>>>>> This would also be a way around any issues in handling the attributes
>>>>>>>>>>>> due to the GNU syntax.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried a few test cases using C2x syntax BTF type tags with a
>>>>>>>>>>>> clang-15 build, but ran into some issues (in particular, some of the
>>>>>>>>>>>> tag attributes being ignored altogether). I couldn't find confirmation
>>>>>>>>>>>> whether C2x attribute syntax is fully supported in clang yet, so maybe
>>>>>>>>>>>> this isn't expected to work. Do you know whether the C2x syntax is
>>>>>>>>>>>> fully supported in clang yet?
>>>>>>>>>>>
>>>>>>>>>>> Actually, I don't know either. But the btf decl_tag and type_tag
>>>>>>>>>>> are also used when compiling the linux kernel, and the minimum compiler
>>>>>>>>>>> versions for the kernel are gcc5.1 and clang11. I am not sure whether
>>>>>>>>>>> gcc5.1 supports c2x or not; I guess probably not. So I think we most
>>>>>>>>>>> likely cannot use c2x syntax.
>>>>>>>>>>
>>>>>>>>>> Okay, I think we can guard the btf_tag attributes with newer compiler
>>>>>>>>>> versions. What kind of c2x syntax do you intend to use? I can help
>>>>>>>>>> compile the kernel with that syntax and llvm15 to see what the issue
>>>>>>>>>> is, and may help fix it in clang if possible.
>>>>>>>>>
>>>>>>>>> I am thinking to use the [[]] C2x standard attribute syntax. The
>>>>>>>>> syntax makes it quite clear to which entity each attribute applies,
>>>>>>>>> and in my opinion is a little more intuitive/less surprising too.
>>>>>>>>> It's documented here (PDF):
>>>>>>>>>       https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
>>>>>>>>> See sections 6.7.11 for the syntax and 6.7.6 for
>>>>>>>>> declarations. Section 6.7.6.1 specifically describes using the
>>>>>>>>> attribute syntax with pointer declarators.
>>>>>>>>> The attribute syntax itself for BTF tags is:
>>>>>>>>>       [[clang::btf_type_tag("tag1")]]
>>>>>>>>> or
>>>>>>>>>       [[gnu::btf_type_tag("tag1")]]
>>>>>>>>>
>>>>>>>>> I am also looking into whether, with the C2x syntax, we really need two
>>>>>>>>> separate attributes (type_tag and decl_tag) at the language
>>>>>>>>> level. It might be possible with C2x syntax to use just one language
>>>>>>>>> attribute (e.g. just btf_tag).
>>>>>>>>>
>>>>>>>>> A simple declaration for a tagged pointer to an int:
>>>>>>>>>       int * [[gnu::btf_type_tag("tag1")]] x;
>>>>>>>>> And for the example from this thread:
>>>>>>>>>       #define __typetag1 [[gnu::btf_type_tag("type-tag-1")]]
>>>>>>>>>       #define __typetag2 [[gnu::btf_type_tag("type-tag-2")]]
>>>>>>>>>       #define __typetag3 [[gnu::btf_type_tag("type-tag-3")]]
>>>>>>>>>       int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>>>> Here each tag applies to the preceding pointer, so the result is
>>>>>>>>> unsurprising.
>>>>>>>>> Actually, this is where I found something that looks like an issue
>>>>>>>>> with the C2x attribute syntax in clang. The tags 2 and 3 go missing,
>>>>>>>>> but with no warning nor other indication.
>>>>>>>>> Compiling this example with gcc:
>>>>>>>>> $ ~/toolchains/bpf/bin/bpf-unknown-none-gcc -c -gbtf -gdwarf c2x.c -o c2x.o --std=c2x
>>>>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o
>>>>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>>>>                   DW_AT_producer    ("GNU C2X 12.0.1 20220401 (experimental) -gbtf -gdwarf -std=c2x")
>>>>>>>>>                   DW_AT_language    (DW_LANG_C11)
>>>>>>>>>                   DW_AT_name    ("c2x.c")
>>>>>>>>>                   DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>>>>                   DW_AT_stmt_list    (0x00000000)
>>>>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>>>>                     DW_AT_name    ("g")
>>>>>>>>>                     DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>>>>                     DW_AT_decl_line    (16)
>>>>>>>>>                     DW_AT_decl_column    (0x2a)
>>>>>>>>>                     DW_AT_type    (0x00000032 "int **")
>>>>>>>>>                     DW_AT_external    (true)
>>>>>>>>>                     DW_AT_location    (DW_OP_addr 0x0)
>>>>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>>>>                     DW_AT_byte_size    (8)
>>>>>>>>>                     DW_AT_type    (0x0000004e "int *")
>>>>>>>>>                     DW_AT_sibling    (0x0000004e)
>>>>>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>>                       DW_AT_const_value    ("type-tag-3")
>>>>>>>>> 0x00000044:     DW_TAG_LLVM_annotation
>>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>>                       DW_AT_const_value    ("type-tag-2")
>>>>>>>>> 0x0000004d:     NULL
>>>>>>>>> 0x0000004e:   DW_TAG_pointer_type
>>>>>>>>>                     DW_AT_byte_size    (8)
>>>>>>>>>                     DW_AT_type    (0x00000061 "int")
>>>>>>>>>                     DW_AT_sibling    (0x00000061)
>>>>>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>>                       DW_AT_const_value    ("type-tag-1")
>>>>>>>>> 0x00000060:     NULL
>>>>>>>>> 0x00000061:   DW_TAG_base_type
>>>>>>>>>                     DW_AT_byte_size    (0x04)
>>>>>>>>>                     DW_AT_encoding    (DW_ATE_signed)
>>>>>>>>>                     DW_AT_name    ("int")
>>>>>>>>> 0x00000068:   NULL
>>>>>>>>>
>>>>>>>>> and with clang (changing the attribute prefix to clang:: appropriately):
>>>>>>>>> $ ~/toolchains/llvm/bin/clang -target bpf -g -c c2x.c -o c2x.o.ll --std=c2x
>>>>>>>>> $ ~/toolchains/llvm/bin/llvm-dwarfdump c2x.o.ll
>>>>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>>>>                   DW_AT_producer    ("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>>>>>                   DW_AT_language    (DW_LANG_C99)
>>>>>>>>>                   DW_AT_name    ("c2x.c")
>>>>>>>>>                   DW_AT_str_offsets_base    (0x00000008)
>>>>>>>>>                   DW_AT_stmt_list    (0x00000000)
>>>>>>>>>                   DW_AT_comp_dir    ("/home/dfaust/playpen/btf/tags")
>>>>>>>>>                   DW_AT_addr_base    (0x00000008)
>>>>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>>>>                     DW_AT_name    ("g")
>>>>>>>>>                     DW_AT_type    (0x00000029 "int **")
>>>>>>>>>                     DW_AT_external    (true)
>>>>>>>>>                     DW_AT_decl_file    ("/home/dfaust/playpen/btf/tags/c2x.c")
>>>>>>>>>                     DW_AT_decl_line    (12)
>>>>>>>>>                     DW_AT_location    (DW_OP_addrx 0x0)
>>>>>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>>>>>                     DW_AT_type    (0x00000032 "int *")
>>>>>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>>>>>                       DW_AT_name    ("btf_type_tag")
>>>>>>>>>                       DW_AT_const_value    ("type-tag-1")
>>>>>>>>> 0x00000031:     NULL
>>>>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>>>>                     DW_AT_type    (0x00000037 "int")
>>>>>>>>> 0x00000037:   DW_TAG_base_type
>>>>>>>>>                     DW_AT_name    ("int")
>>>>>>>>>                     DW_AT_encoding    (DW_ATE_signed)
>>>>>>>>>                     DW_AT_byte_size    (0x04)
>>>>>>>>> 0x0000003b:   NULL
>>>>>>>>
>>>>>>>> Thanks. I checked with current clang. The generated code looks
>>>>>>>> like above. Basically, for code like below
>>>>>>>>
>>>>>>>>       #define __typetag1 [[clang::btf_type_tag("type-tag-1")]]
>>>>>>>>       #define __typetag2 [[clang::btf_type_tag("type-tag-2")]]
>>>>>>>>       #define __typetag3 [[clang::btf_type_tag("type-tag-3")]]
>>>>>>>>
>>>>>>>>       int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>>>
>>>>>>>> The IR type looks like
>>>>>>>>       __typetag3 -> __typetag2 -> * (ptr1) -> __typetag1 -> * (ptr2) -> int
>>>>>>>>
>>>>>>>> The IR is similar to what we get when using
>>>>>>>> __attribute__((btf_type_tag(""))), but the semantic
>>>>>>>> interpretation is quite different.
>>>>>>>> For example, with the c2x format,
>>>>>>>>       __typetag1 applies to ptr2
>>>>>>>> while with the __attribute__ format, it applies to the pointee of ptr1.
>>>>>>>>
>>>>>>>> But more importantly, the c2x format is incompatible with
>>>>>>>> existing usage in the linux kernel. The following are a bunch of kernel
>>>>>>>> __user usages. Here, __user is intended to be replaced with a btf_type_tag.
>>>>>>>>
>>>>>>>> vfio_pci_core.h:        ssize_t (*rw)(struct vfio_pci_core_device *vdev, char __user *buf,
>>>>>>>> vfio_pci_core.h:                                  char __user *buf, size_t count,
>>>>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>>>>>>> vfio_pci_core.h:extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>>>>>>> vfio_pci_core.h:                                      char __user *buf, size_t count,
>>>>>>>> vfio_pci_core.h:                                void __user *arg, size_t argsz);
>>>>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>>>>>>>> vfio_pci_core.h:ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>>>>>>>> vringh.h:                    vring_desc_t __user *desc,
>>>>>>>> vringh.h:                    vring_avail_t __user *avail,
>>>>>>>> vringh.h:                    vring_used_t __user *used);
>>>>>>>> vt_kern.h:int con_set_cmap(unsigned char __user *cmap);
>>>>>>>> vt_kern.h:int con_get_cmap(unsigned char __user *cmap);
>>>>>>>> vt_kern.h:int con_set_trans_old(unsigned char __user * table);
>>>>>>>> vt_kern.h:int con_get_trans_old(unsigned char __user * table);
>>>>>>>> vt_kern.h:int con_set_trans_new(unsigned short __user * table);
>>>>>>>> vt_kern.h:int con_get_trans_new(unsigned short __user * table);
>>>>>>>>
>>>>>>>> As you can see, we will not be able to simply replace __user
>>>>>>>> with [[clang::btf_type_tag("user")]] because it won't work
>>>>>>>> according to c2x expectations.
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for checking this. I see that we probably cannot use the c2x
>>>>>> syntax in the kernel, since it will not work as a drop-in replacement
>>>>>> for the current uses.
>>>>>>
>>>>>>>
>>>>>>> Hi Yonghong.
>>>>>>>
>>>>>>> I am a bit confused regarding the GNU attributes problem: our patch
>>>>>>> supports it, but as David already noted:
>>>>>>>
>>>>>>>>>>> There is still the question of why the DWARF generated for this case
>>>>>>>>>>> that I have been concerned about:
>>>>>>>>>>>
>>>>>>>>>>>       int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>>>>>
>>>>>>>>>>> differs between GCC (with this series) and clang. After studying it,
>>>>>>>>>>> GCC is doing with the attributes exactly as is described in the
>>>>>>>>>>> Attribute Syntax portion of the GCC manual where the GNU syntax is
>>>>>>>>>>> described. I do not think there is any problem here.
>>>>>>>>>>>
>>>>>>>>>>> So the difference in DWARF suggests to me that clang is not handling
>>>>>>>>>>> the GNU attribute syntax in this particular case correctly, since it
>>>>>>>>>>> seems to be associating __typetag2 and __typetag3 to g's type rather
>>>>>>>>>>> than the type to which it points.
>>>>>>>
>>>>>>> Note the example he uses is:
>>>>>>>
>>>>>>>      (a) int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>>
>>>>>>> Not
>>>>>>>
>>>>>>>      (b) int * __typetag1 * __typetag2 __typetag3 g;
>>>>>>>
>>>>>>> Apparently for (a) clang is generating DWARF that associates __typetag2
>>>>>>> and __typetag3 to g's type (the pointer to pointer) instead of the
>>>>>>> pointer to int, which contravenes the GNU syntax rules.
>>>>>>>
>>>>>>> AFAIK that is where the DWARF we generate differs, and what is blocking
>>>>>>> us.  David will correct me in the likely case I'm wrong :)
>>>>>>
>>>>>> Right. This is what I hoped maybe the C2x syntax could resolve.
>>>>>>
>>>>>> The issue I saw is that in the case (a) above, when using the GNU
>>>>>> attribute syntax, GCC and clang produce different results. I think that
>>>>>> the underlying cause is some subtle difference in how clang is handling
>>>>>> the GNU attribute syntax in the case compared to GCC.
>>>>>>
>>>>>>
>>>>>> To remind ourselves, here is the full example. Notice the significant
>>>>>> difference in which objects the tags are associated with in DWARF.
>>>>>>
>>>>>>
>>>>>> #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
>>>>>> #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
>>>>>> #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))
>>>>>>
>>>>>> int __typetag1 * __typetag2 __typetag3 * g;
>>>>>>
>>>>>>
>>>>>> GCC: bpf-unknown-none-gcc -c -gdwarf -gbtf annotate.c
>>>>>>
>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>                  DW_AT_producer	("GNU C17 12.0.1 20220401 (experimental) -gdwarf -gbtf")
>>>>>>                  DW_AT_language	(DW_LANG_C11)
>>>>>>                  DW_AT_name	("annotate.c")
>>>>>>                  DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>>>>                  DW_AT_stmt_list	(0x00000000)
>>>>>>
>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>                    DW_AT_name	("g")
>>>>>>                    DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>>>>                    DW_AT_decl_line	(11)
>>>>>>                    DW_AT_decl_column	(0x2a)
>>>>>>                    DW_AT_type	(0x00000032 "int **")
>>>>>>                    DW_AT_external	(true)
>>>>>>                    DW_AT_location	(DW_OP_addr 0x0)
>>>>>>
>>>>>> 0x00000032:   DW_TAG_pointer_type
>>>>>>                    DW_AT_byte_size	(8)
>>>>>>                    DW_AT_type	(0x00000045 "int *")
>>>>>>                    DW_AT_sibling	(0x00000045)
>>>>>>
>>>>>> 0x0000003b:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>>                      DW_AT_const_value	("type-tag-1")
>>>>>>
>>>>>> 0x00000044:     NULL
>>>>>>
>>>>>> 0x00000045:   DW_TAG_pointer_type
>>>>>>                    DW_AT_byte_size	(8)
>>>>>>                    DW_AT_type	(0x00000061 "int")
>>>>>>                    DW_AT_sibling	(0x00000061)
>>>>>>
>>>>>> 0x0000004e:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>>                      DW_AT_const_value	("type-tag-3")
>>>>>>
>>>>>> 0x00000057:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>>                      DW_AT_const_value	("type-tag-2")
>>>>>>
>>>>>> 0x00000060:     NULL
>>>>>>
>>>>>> 0x00000061:   DW_TAG_base_type
>>>>>>                    DW_AT_byte_size	(0x04)
>>>>>>                    DW_AT_encoding	(DW_ATE_signed)
>>>>>>                    DW_AT_name	("int")
>>>>>>
>>>>>> 0x00000068:   NULL
>>>>>
>>>>> do you have documentation to show why gnu generates attribute this way?
>>>>> If dwarf generates
>>>>>        ptr -> tag3 -> tag2 -> ptr -> tag1 -> int
>>>>> does this help?
>>>>
>>>> Okay, I think I see the problem. The internal representations between clang
>>>> and GCC attach the attributes to different nodes, and as a result they
>>>> produce different DWARF:
>>>>
>>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64,
>>>> annotations: !10)
>>>> !6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64,
>>>> annotations: !8)
>>>> !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>> !8 = !{!9}
>>>> !9 = !{!"btf_type_tag", !"tag1"}
>>>> !10 = !{!11, !12}
>>>> !11 = !{!"btf_type_tag", !"tag2"}
>>>> !12 = !{!"btf_type_tag", !"tag3"}
>>>>
>>>> If I am reading this IR right, then the tags "tag2" and "tag3" are being
>>>> applied to the int**, and "tag1" is applied to the int*
>>>>
>>>> But I don't think this lines up with how the attribute syntax is defined.
>>>> See
>>>>     https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
>>>> In particular the "All other attributes" section. (It's a bit dense).
>>>> Or, as Joseph summed it up nicely earlier in the thread:
>>>>> In either syntax, __typetag2 __typetag3 should apply to
>>>>> the type to which g points, not to g or its type, just as if you had a
>>>>> type qualifier there.  You'd need to put the attributes (or qualifier)
>>>>> after the *, not before, to make them apply to the pointer type.
>>>>
>>>>
>>>> Compare that to GCC's internal representation, from which DWARF is generated:
>>>>
>>>>    <var_decl 0x7ffff7535090 g
>>>>       type <pointer_type 0x7ffff74f8888
>>>>           type <pointer_type 0x7ffff74f8b28 type <integer_type 0x7ffff74385e8 int>
>>>>               unsigned DI
>>>>               size <integer_cst 0x7ffff742b450 constant 64>
>>>>               unit-size <integer_cst 0x7ffff742b468 constant 8>
>>>>               align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff743f888
>>>>               attributes <tree_list 0x7ffff75165c8
>>>>                   purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>>>                   value <tree_list 0x7ffff7516550
>>>>                       value <string_cst 0x7ffff75182e0 type <array_type 0x7ffff74f8738>
>>>>                           readonly constant static "type-tag-3\000">>
>>>>                   chain <tree_list 0x7ffff75165a0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>>>                       value <tree_list 0x7ffff75164d8
>>>>                           value <string_cst 0x7ffff75182c0 type <array_type 0x7ffff74f8738>
>>>>                               readonly constant static "type-tag-2\000">>>>
>>>>               pointer_to_this <pointer_type 0x7ffff74f8bd0>>
>>>>           unsigned DI size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>>>>           align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74f87e0
>>>>           attributes <tree_list 0x7ffff75165f0 purpose <identifier_node 0x7ffff75290f0 btf_type_tag>
>>>>               value <tree_list 0x7ffff7516438
>>>>                   value <string_cst 0x7ffff75182a0 type <array_type 0x7ffff74f8738>
>>>>                       readonly constant static "type-tag-1\000">>>>
>>>>       public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/annotate.c:10:42 size <integer_cst 0x7ffff742b450 64> unit-size <integer_cst 0x7ffff742b468 8>
>>>>       align:64 warn_if_not_align:0>
>>>>
>>>> See how tags "tag2" and "tag3" are associated with the pointer_type 0x7ffff74f8b28,
>>>> that is, "the type to which g points"
>>>>
>>>> From GCC's DWARF the BTF we get currently looks like:
>>>>     VAR(g) -> ptr -> tag1 -> ptr -> tag3 -> tag2 -> int
>>>> which is obviously quite different and why this case caught my attention.
>>>>
>>>> I think this difference is the root of our problems. It might not be
>>>> specifically related to the BTF tag attributes but they do reveal some
>>>> discrepancy between how clang and GCC handle the attribute syntax.
>>>
>>> The btf_type_tag attribute is very similar to the address_space attribute.
>>> For example,
>>> $ cat t1.c
>>> int __attribute__((address_space(1))) * p;
>>> $ clang -g -S -emit-llvm t1.c
>>>
>>> In IR, we will have
>>> @p = dso_local global ptr addrspace(1) null, align 8, !dbg !0
>>> ...
>>> !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
>>> !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
>>>
>>> Replacing address_space with btf_type_tag, we will get
>>> ptr->type_tag->int in debuginfo.
>>>
>>> But it looks like gcc doesn't support address_space attribute
>>>
>>> $ gcc -g -S t1.c
>>> t1.c:1:1: warning: ‘address_space’ attribute directive ignored [-Wattributes]
>>>    int __attribute__((address_space(1))) * p;
>>>    ^~~
>>>
>>> Is it possible for gcc to go with address_space attribute
>>> semantics for btf_type_tag attribute?
>>
>> In cases like this the behavior is the same.
>> $ cat foo.c
>> int __attribute__((btf_type_tag("tag1"))) * p;
>> $ gcc -c -gdwarf -gbtf foo.c
>>
>> Internally:
>>   <var_decl 0x7ffff743abd0 p
>>      type <pointer_type 0x7ffff7590150
>>          type <integer_type 0x7ffff74475e8 int public SI
>>              size <integer_cst 0x7ffff742bf90 constant 32>
>>              unit-size <integer_cst 0x7ffff742bfa8 constant 4>
>>              align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff74475e8 precision:32 min <integer_cst 0x7ffff742bf48 -2147483648> max <integer_cst 0x7ffff742bf60 2147483647>
>>              pointer_to_this <pointer_type 0x7ffff744fa80>>
>>          unsigned DI
>>          size <integer_cst 0x7ffff742bd50 constant 64>
>>          unit-size <integer_cst 0x7ffff742bd68 constant 8>
>>          align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff744fa80
>>          attributes <tree_list 0x7ffff7564d70
>>              purpose <identifier_node 0x7ffff757f2d0 btf_type_tag>
>>              value <tree_list 0x7ffff7564cf8
>>                  value <string_cst 0x7ffff757c220 type <array_type 0x7ffff75900a8>
>>                      readonly constant static "tag1\000">>>>
>>      public static unsigned DI defer-output /home/dfaust/playpen/btf/tags/foo.c:1:45 size <integer_cst 0x7ffff742bd50 64> unit-size <integer_cst 0x7ffff742bd68 8>
>>      align:64 warn_if_not_align:0>
>>
>> And the resulting BTF:
>>
>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>> [2] PTR '(anon)' type_id=3
>> [3] TYPE_TAG 'tag1' type_id=1
>> [4] VAR 'p' type_id=2, linkage=global
>> [5] DATASEC '.bss' size=0 vlen=1
>> 	type_id=4 offset=0 size=8 (VAR 'p')
>>
>> var(p) -> ptr -> type_tag -> int
> 
> It would be good if we can generate similar encoding in dwarf.
> Currently in clang, we generate
>      var(p) -> ptr (type_tag) -> int
> but I am open to generate
>      var(p) -> ptr -> type_tag -> int
> in dwarf as well if it is possible.
> 

The DWARF encodings are the same between GCC and LLVM: in both, the
annotation DIE is emitted as a child of the pointer type DIE.

In the case we've looked at in this thread where the DWARF is not
the same, it is a result of clang's attribute parsing not following
the GNU attribute syntax correctly and associating the attribute
with the wrong part of the declaration. But this is not a problem
with DWARF.
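
To make the placement rule concrete, here is a small sketch that leans on
Joseph's remark that the attributes behave "just as if you had a type
qualifier there". It uses plain `const` as a stand-in for the tags (it does
not use btf_type_tag at all), and the names TYPE_ID, g_type_id and h_type_id
are made up for illustration. It only covers the position between and after
the '*' tokens, not the leading __typetag1 position, where Joseph notes the
GNU attribute applies to the declaration.

```c
#include <assert.h>   /* static_assert (C11) */

/* Qualifier written between the two '*' tokens, i.e. where
   __typetag2 __typetag3 sit in the example from this thread.  */
int * const * g;

/* Qualifier written after the last '*'.  */
int ** const h = 0;

/* Classify the type of an object through its address, so that the
   object's own top-level qualifiers stay visible to _Generic.
   1: the object is an unqualified pointer to 'int * const'
   2: the object is itself a const pointer to 'int *'  */
#define TYPE_ID(obj) \
  _Generic(&(obj), int * const **: 1, int ** const *: 2, default: 0)

/* The qualifier before the last '*' lands on the type g points to.  */
static_assert(TYPE_ID(g) == 1, "const applies to the type g points to");

/* The qualifier after the last '*' lands on h's own pointer type.  */
static_assert(TYPE_ID(h) == 2, "const applies to h's own type");

int g_type_id(void) { return TYPE_ID(g); }
int h_type_id(void) { return TYPE_ID(h); }
```

This should compile as a standalone translation unit with -std=c11 -c. If
the first assertion holds for qualifiers, then by the rule Joseph describes
the tags written between the two '*' tokens belong on the inner pointer
type, which is where GCC puts them in the tree dump quoted above.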

>>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> clang: clang -target bpf -c -g annotate.c
>>>>>>
>>>>>> 0x0000000c: DW_TAG_compile_unit
>>>>>>                  DW_AT_producer	("clang version 15.0.0 (https://github.com/llvm/llvm-project.git f80e369f61ebd33dd9377bb42fcab64d17072b18)")
>>>>>>                  DW_AT_language	(DW_LANG_C99)
>>>>>>                  DW_AT_name	("annotate.c")
>>>>>>                  DW_AT_str_offsets_base	(0x00000008)
>>>>>>                  DW_AT_stmt_list	(0x00000000)
>>>>>>                  DW_AT_comp_dir	("/home/dfaust/playpen/btf/tags")
>>>>>>                  DW_AT_addr_base	(0x00000008)
>>>>>>
>>>>>> 0x0000001e:   DW_TAG_variable
>>>>>>                    DW_AT_name	("g")
>>>>>>                    DW_AT_type	(0x00000029 "int **")
>>>>>>                    DW_AT_external	(true)
>>>>>>                    DW_AT_decl_file	("/home/dfaust/playpen/btf/tags/annotate.c")
>>>>>>                    DW_AT_decl_line	(11)
>>>>>>                    DW_AT_location	(DW_OP_addrx 0x0)
>>>>>>
>>>>>> 0x00000029:   DW_TAG_pointer_type
>>>>>>                    DW_AT_type	(0x00000035 "int *")
>>>>>>
>>>>>> 0x0000002e:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>>                      DW_AT_const_value	("type-tag-2")
>>>>>>
>>>>>> 0x00000031:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>>                      DW_AT_const_value	("type-tag-3")
>>>>>>
>>>>>> 0x00000034:     NULL
>>>>>>
>>>>>> 0x00000035:   DW_TAG_pointer_type
>>>>>>                    DW_AT_type	(0x0000003e "int")
>>>>>>
>>>>>> 0x0000003a:     DW_TAG_LLVM_annotation
>>>>>>                      DW_AT_name	("btf_type_tag")
>>>>>>                      DW_AT_const_value	("type-tag-1")
>>>>>>
>>>>>> 0x0000003d:     NULL
>>>>>>
>>>>>> 0x0000003e:   DW_TAG_base_type
>>>>>>                    DW_AT_name	("int")
>>>>>>                    DW_AT_encoding	(DW_ATE_signed)
>>>>>>                    DW_AT_byte_size	(0x04)
>>>>>>
>>>>>> 0x00000042:   NULL
>>>>>>
>>>>>>


end of thread, 2022-06-07 21:42 UTC

Thread overview: 30+ messages
2022-04-01 19:42 [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations David Faust
2022-04-01 19:42 ` [PATCH 1/8] dwarf: Add dw_get_die_parent function David Faust
2022-04-01 19:42 ` [PATCH 2/8] include: Add BTF tag defines to dwarf2 and btf David Faust
2022-04-01 19:42 ` [PATCH 3/8] c-family: Add BTF tag attribute handlers David Faust
2022-04-01 19:42 ` [PATCH 4/8] dwarf: create BTF decl and type tag DIEs David Faust
2022-04-01 19:42 ` [PATCH 5/8] ctfc: Add support to pass through BTF annotations David Faust
2022-04-01 19:42 ` [PATCH 6/8] dwarf2ctf: convert tag DIEs to CTF types David Faust
2022-04-01 19:42 ` [PATCH 7/8] Output BTF DECL_TAG and TYPE_TAG types David Faust
2022-04-01 19:42 ` [PATCH 8/8] testsuite: Add tests for BTF tags David Faust
2022-04-04 22:13 ` [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations Yonghong Song
2022-04-05 16:26   ` David Faust
2022-04-18 19:36 ` [ping][PATCH " David Faust
2022-05-02 16:57   ` [ping2][PATCH " David Faust
2022-05-03 22:32     ` Joseph Myers
2022-05-04 17:03       ` David Faust
2022-05-05 23:00         ` Yonghong Song
2022-05-06 21:18           ` David Faust
2022-05-11  3:43             ` Yonghong Song
2022-05-11  5:05               ` Yonghong Song
2022-05-11 18:44                 ` David Faust
2022-05-24  6:33                   ` Yonghong Song
2022-05-24 11:07                     ` Jose E. Marchesi
2022-05-24 15:52                       ` Yonghong Song
2022-05-24 15:53                       ` David Faust
2022-05-24 16:03                         ` Yonghong Song
2022-05-24 17:04                           ` David Faust
2022-05-26  7:29                             ` Yonghong Song
2022-05-27 19:56                               ` David Faust
2022-06-03  2:04                                 ` Yonghong Song
2022-06-07 21:42                                   ` David Faust
