public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: John Moon <quic_johmoo@quicinc.com>
To: Dodji Seketeli <dodji@seketeli.org>
Cc: <libabigail@sourceware.org>, Trilok Soni <quic_tsoni@quicinc.com>,
	"Satya Durga Srinivasu Prabhala" <quic_satyap@quicinc.com>
Subject: Re: abidiff improvements for kernel UAPI checker
Date: Wed, 5 Jul 2023 09:52:21 -0700	[thread overview]
Message-ID: <ea010309-c08b-111f-61e2-80843e5561f6@quicinc.com> (raw)
In-Reply-To: <153dc338-ff1f-4273-1663-e934124e4bcc@quicinc.com>

On 5/23/2023 12:59 PM, John Moon wrote:
> On 5/10/2023 7:21 AM, Dodji Seketeli wrote:
>>> === Enum expansion ===
>>> Here are some places in the kernel where the script reported expansion
>>> as a breaking change:
>>>
>>> One "MAX" member:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool.h#L1789
>>>
>>> Two "MAX" members:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L6398
>>>
>>> Three "MAX" members:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L3311
>>>
>>> Having two is pretty common. I can only find the one occurrence of
>>> having three of these variants... maybe the suppression could take an
>>> "n_allowed" parameter of some sort?
>>
>> I guess this can certainly be done, yes.  And it does make sense to
>> me.
>>
>>> Also, would it make sense to limit the check to enumerator variants
>>> that have somewhat common names for that case? It could evolve with
>>> time, but so far I've found "MAX", "LAST", "NUM", and "NBITS" being
>>> fairly common. Maybe the suppression could also take
>>> "allowed_strings"? Something like this?
>>
>> Just so you know, there is already the "changed_enumerators" property
>> that is available[1].  It takes the list of enumerators that are allowed
>> to change.  It doesn't support regular expressions, yet.  Maybe we can
>> either make it take enumerators or add a new one named
>> "changed_enumerators_regexp" which takes a list of regular expressions
>> describing the allowed changed enumerators.  In any case, that
>> property will have to be made to work in tandem with the
>> allow_n_last_enumarator_changes property.
>>
>> [1]: https://sourceware.org/libabigail/manual/libabigail-concepts.
>> [Look for the "changed_enumerator" word in there]
>>
>>
>>> [suppress_type]
>>>    type_kind = enum
>>>    allow_n_last_enumerator_changes = 3
>>>    allow_n_last_enumerator_change_names = [".*MAX.*", ".*LAST.*", 
>>> ".*NUM.*", ".*NBITS.*"]
>>
>> Right.  Or, rather, the syntax would be:
>>
>>   [suppress_type]
>>     type_kind = enum
>>     allow_n_last_enumerator_changes = 3
>>     changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.*
>>
>> (no need for the string delimiters as everything is considered a
>>   string by default)
>>
>> Would that be OK?
> 
> Yes, this looks perfect!
> 
>>> === Padding field expansion ===
>>> For example, you'll see lots of padding fields like this:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fuse.h#L908
>>>
>>> These don't necessarily have to occur at the end of a struct. You
>>> could expand into a padding field in the middle somewhere.
>>>
>>> This is the algorithm I was working on to detect these changes:
>>>    - The type size did not change
>>>    - There is only one data member being changed
>>>    - The changed member name contains at least one of: "pad",
>>>      "reserved", "end", or "last"
>>>    - The first new member being inserted has an offset equal to the
>>>      changed member's previous offset
>>
>> Just so you know, there are a number of properties of the
>> [suppress_type] section that are designed to address this kind of
>> suppressions.  I think some can be extended to address this use case.
>>
>> E.g, there is the has_data_member_inserted_at property which I believe
>> can help with this use case.  Here is a quick screen shot to show you
>> what it does so far, in the latest 2.3 libabigail version that got
>> recently released:
>>
>>      $ cat -n test-v0.cc
>>      1    #include <cstdint>
>>      2
>>      3    struct S
>>      4    {
>>      5      int a;
>>      6      char padding[64];
>>      7      int b;
>>      8    };
>>      9
>>     10    void
>>     11    foo(S&)
>>     12    {}
>>
>>      $ cat -n test-v1.cc
>>      1    #include <cstdint>
>>      2
>>      3    struct S
>>      4    {
>>      5      int a;
>>      6      uint32_t added;
>>      7      char padding[60];
>>      8      int b;
>>      9    };
>>     10
>>     11    void
>>     12    foo(S&)
>>     13    {}
>>
>>      $ diff -u test-v0.cc test-v1.cc
>>      --- test-v0.cc    2023-05-10 13:51:19.798072787 +0200
>>      +++ test-v1.cc    2023-05-10 13:54:53.879082831 +0200
>>      @@ -3,7 +3,8 @@
>>       struct S
>>       {
>>         int a;
>>      -  char padding[64];
>>      +  uint32_t added;
>>      +  char padding[60];
>>         int b;
>>       };
>>
>>      $ gcc -g -c test-v{0,1}.cc
>>      $ abidiff test-v0.o test-v1.o
>>      Functions changes summary: 0 Removed, 1 Changed, 0 Added function
>>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>>
>>      1 function with some indirect sub-type change:
>>
>>        [C] 'function void foo(S&)' at test-v0.cc:11:1 has some 
>> indirect sub-type changes:
>>     parameter 1 of type 'S&' has sub-type changes:
>>       in referenced type 'struct S' at test-v1.cc:3:1:
>>         type size hasn't changed
>>         1 data member insertion:
>>           'uint32_t added', at offset 32 (in bits) at test-v1.cc:6:1
>>         1 data member change:
>>           type of 'char padding[64]' changed:
>>         type name changed from 'char[64]' to 'char[60]'
>>         array type size changed from 512 to 480
>>         array type subrange 1 changed length from 64 to 60
>>           and offset changed from 32 to 64 (in bits) (by +32 bits)
>>
>>      $ cat padding.suppr
>>      [suppress_type]
>>        type_kind = struct
>>        has_data_member_inserted_at = 
>> offset_of_first_data_member_regexp(.*padding)
>>
>>      $ /home/dodji/git/libabigail/master/build/tools/abidiff --suppr 
>> padding.suppr test-v0.o test-v1.o
>>      Functions changes summary: 0 Removed, 0 Changed (1 filtered out), 
>> 0 Added function
>>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>>
>>      $
>>
> 
> This looks very close! Could we just repeat the suppression for each of 
> the member names we're interested in ignoring or would we need to add 
> support for the "offset_of_first_data_member_regexp()" function to take 
> in a list of regexp?
> 
>>>
>>>
>>> === Flex array expansion ===
>>> For example:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/rseq.h#L149.
>>>
>>> Leaving a flex array at the end essentially gives the struct an
>>> "infinite" padding field. The algorithm to detect looks almost
>>> identical to the ordinary padding expansion except this class can only
>>> occur with the flex array at the end of the struct.
>>
>> I am not sure to understand this case in light of what we have been
>> talking about earlier.  If a data member is inserted anywhere around
>> the flex array field, I would have thought that it should be flagged
>> as being a meaningful change to report.  What am I missing?
> 
> Flex arrays can only be at the end of a struct, so I guess the rule 
> should be "you can add members between existing members and the ending 
> flex array".
> 
> E.g. this should be fine:
> 
>   struct foo {
>       __u32 a;
>       __u32 b;
> +    __u32 c;
>       char pad[];
>   };
> 
> But not this:
> 
>   struct foo {
>       __u32 a;
> +    __u32 c;
>       __u32 b;
>       char pad[];
>   };
> 
> 
>>> === Anonymous Enums ===
>>> Another issue that comes up when comparing ABI across wide swaths of
>>> kernel commit history are changes to anonymous enums. From what I can
>>> tell, there's not a great way to handle this. If someone adds a new
>>> anonymous enum, the tag abidiff gives (e.g. enum __anonymous_enum__6)
>>> can change, so abidiff reports a new anonymous enum with all the same
>>> fields even though it was basically just a "rename".
>>>
>>> For reference, this file is full of anonymous enums:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool_netlink.h#L15.
>>> Basically any change in there is triggering a failure from the
>>> script.
>>
>> By 'any change', do you mean any addition of enumerator?
>>
>> I think we can do something similar to what we do for anonymous
>> structs/unions, which it to consider content of the enums, rather than
>> just their "made-up" names.
>>
>> I guess we need to have well defined examples of changes we want to
>> handle and I think we can come to a solution.
>>
>>> I'm not sure how to handle these changes... there are possible ABI
>>> breakages when changing anonymous enums, but it's unclear to me how to
>>> detect them.
>>
>> Let's come up with some small examples first, I'd say.
> 
> Here's one example of the issue: 
> https://github.com/torvalds/linux/commit/1d91855304c2046115ee10be2c93161d93d5d40d (specifically the change to include/uapi/rdma/hns-abi.h).
> 
> Here, a variant was added to two anonymous enums:
> 
>   enum {
>       HNS_ROCE_EXSGE_FLAGS = 1 << 0,
>       HNS_ROCE_RQ_INLINE_FLAGS = 1 << 1,
> +    HNS_ROCE_CQE_INLINE_FLAGS = 1 << 2,
>   };
> 
>   enum {
>        HNS_ROCE_RSP_EXSGE_FLAGS = 1 << 0,
>        HNS_ROCE_RSP_RQ_INLINE_FLAGS = 1 << 1,
> +     HNS_ROCE_RSP_CQE_INLINE_FLAGS = 1 << 2,
>   };
> 
> abidiff outputs:
> 
>      [C] 'enum __anonymous_enum__1' changed:
>        type size hasn't changed
>        1 enumerator deletion:
>          '__anonymous_enum__1::HNS_ROCE_RSP_EXSGE_FLAGS' value '1'
>        3 enumerator insertions:
>          '__anonymous_enum__::HNS_ROCE_EXSGE_FLAGS' value '1'
>          '__anonymous_enum__::HNS_ROCE_RQ_INLINE_FLAGS' value '2'
>          '__anonymous_enum__::HNS_ROCE_CQE_INLINE_FLAGS' value '4'
>    1 added type unreachable from any public interface:
>      [A] 'enum __anonymous_enum__1' at hns-abi.h:94:1
> 
> It seems like it mixed them up while processing? I'm not exactly sure 
> what's going on internally.
> 
> Another example: 
> https://github.com/w1ldptr/linux/commit/d219df60a70ed0739aa5dd34b477763311fc5a7b (specifically the change to include/uapi/linux/bpf.h).
> 
> Here, just one enum was modified:
> 
>   enum {
>       BPF_F_ADJ_ROOM_FIXED_GSO    = (1ULL << 0),
>       BPF_F_ADJ_ROOM_ENCAP_L3_IPV4    = (1ULL << 1),
>       BPF_F_ADJ_ROOM_ENCAP_L3_IPV6    = (1ULL << 2),
>       BPF_F_ADJ_ROOM_ENCAP_L4_GRE    = (1ULL << 3),
>       BPF_F_ADJ_ROOM_ENCAP_L4_UDP    = (1ULL << 4),
>       BPF_F_ADJ_ROOM_NO_CSUM_RESET    = (1ULL << 5),
>       BPF_F_ADJ_ROOM_ENCAP_L2_ETH    = (1ULL << 6),
> +    BPF_F_ADJ_ROOM_DECAP_L3_IPV4    = (1ULL << 7),
> +    BPF_F_ADJ_ROOM_DECAP_L3_IPV6    = (1ULL << 8),
>   };
> 
> 
> But abidiff outputs:
> 
>      [C] 'enum __anonymous_enum__9' changed:
>        type size hasn't changed
>        3 enumerator deletions:
>          '__anonymous_enum__9::BPF_F_ZERO_CSUM_TX' value '2'
>          '__anonymous_enum__9::BPF_F_DONT_FRAGMENT' value '4'
>          '__anonymous_enum__9::BPF_F_SEQ_NUMBER' value '8'
>        9 enumerator insertions:
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_FIXED_GSO' value '1'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV4' value '2'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV6' value '4'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_GRE' value '8'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_UDP' value '16'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_NO_CSUM_RESET' value '32'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L2_ETH' value '64'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV4' value '128'
>          '__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV6' value '256'
>    3 added types unreachable from any public interface:
>      [A] 'enum __anonymous_enum__9' at bpf.h:5781:1
>      [A] 'struct bpf_rb_node' at bpf.h:6931:1
>      [A] 'struct bpf_rb_root' at bpf.h:6926:1
> 
> Any ideas on how to handle these issues?
> 
> Thank you!
> 
> - John
>

Hi Dodji,

I wanted to follow up with you and see if/how we can assist with these 
features. I'm happy to jump in and try implementation if we agree on the 
feature requirements, though I'm not very familiar with the codebase.

To summarize, here are the four usecases we have so far and plans to 
address them:

1. Enum expansion
    - Add support for this kind of suppression:
    [suppress_type]
      type_kind = enum
      allow_n_last_enumerator_changes = 3
      changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.*


2. Padding field expansion
    - Current feature set is close, but how can we handle multiple names?
    - Do we want support for this kind suppression (multiple regex's)?
    [suppress_type]
       type_kind = struct
     has_data_member_inserted_at = 
offset_of_first_data_member_regexp(.*padding, .*reserved, .*unused)


3. Flex array expansion
    - Not sure how to detect/suppress this condition
    - Flex arrays can only be at the ends of structs, so we just need to 
make sure new struct members come "second to last"


4. Anonymous enums
    - Not sure how to associate these enums when diffing since their 
"made up" names don't always match before/after a patch.


I can try tackling items 1 and 2 if you'd like - just let me know! For 
items 3 and 4, can we work up a path forward?

Thanks so much!

- John

  reply	other threads:[~2023-07-05 16:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-11  0:45 John Moon
2023-04-16 18:33 ` Trilok Soni
2023-04-21 18:21 ` Dodji Seketeli
2023-04-21 20:03   ` Dodji Seketeli
2023-04-24 18:39     ` John Moon
2023-05-10 14:21       ` Dodji Seketeli
2023-05-23 19:59         ` John Moon
2023-07-05 16:52           ` John Moon [this message]
2023-10-05 13:44             ` Support suppressing data members inserted right before flexible array members (was Re: abidiff improvements for kernel UAPI checker) Dodji Seketeli
2023-07-10 10:55           ` abidiff improvements for kernel UAPI checker Dodji Seketeli
2023-09-22 11:39       ` Dodji Seketeli
2023-09-22 11:51         ` Dodji Seketeli
2023-09-22 18:28           ` John Moon
2023-09-22 20:02             ` John Moon
2023-09-26  8:38             ` Dodji Seketeli
2023-09-27 17:37               ` John Moon
2023-09-29  9:52                 ` Dodji Seketeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea010309-c08b-111f-61e2-80843e5561f6@quicinc.com \
    --to=quic_johmoo@quicinc.com \
    --cc=dodji@seketeli.org \
    --cc=libabigail@sourceware.org \
    --cc=quic_satyap@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).