public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Dodji Seketeli <dodji@seketeli.org>
To: John Moon <quic_johmoo@quicinc.com>
Cc: <libabigail@sourceware.org>,
	 Trilok Soni <quic_tsoni@quicinc.com>,
	"Satya Durga Srinivasu Prabhala" <quic_satyap@quicinc.com>
Subject: Re: abidiff improvements for kernel UAPI checker
Date: Mon, 10 Jul 2023 12:55:27 +0200	[thread overview]
Message-ID: <87jzv8yr1c.fsf@seketeli.org> (raw)
In-Reply-To: <153dc338-ff1f-4273-1663-e934124e4bcc@quicinc.com> (John Moon's message of "Tue, 23 May 2023 12:59:00 -0700")

Hello John,

Sorry for the delayed reply.

[...]


> On 5/10/2023 7:21 AM, Dodji Seketeli wrote:

[...]

>>> === Enum expansion ===

[...]


>> the syntax would be:
>>   [suppress_type]
>>     type_kind = enum
>>     allow_n_last_enumerator_changes = 3
>>     changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.*
>> (no need for the string delimiters as everything is considered a
>>   string by default)
>> Would that be OK?

John Moon wrote:

>
> Yes, this looks perfect!
>

OK, deal.

>>> === Padding field expansion ===

[...]



>>      $ cat -n test-v0.cc
>> 	 1	#include <cstdint>
>> 	 2	
>> 	 3	struct S
>> 	 4	{
>> 	 5	  int a;
>> 	 6	  char padding[64];
>> 	 7	  int b;
>> 	 8	};
>> 	 9	
>> 	10	void
>> 	11	foo(S&)
>> 	12	{}
>>      $ cat -n test-v1.cc
>> 	 1	#include <cstdint>
>> 	 2	
>> 	 3	struct S
>> 	 4	{
>> 	 5	  int a;
>> 	 6	  uint32_t added;
>> 	 7	  char padding[60];
>> 	 8	  int b;
>> 	 9	};
>> 	10	
>> 	11	void
>> 	12	foo(S&)
>> 	13	{}
>>      $ diff -u test-v0.cc test-v1.cc
>>      --- test-v0.cc	2023-05-10 13:51:19.798072787 +0200
>>      +++ test-v1.cc	2023-05-10 13:54:53.879082831 +0200
>>      @@ -3,7 +3,8 @@
>>       struct S
>>       {
>>         int a;
>>      -  char padding[64];
>>      +  uint32_t added;
>>      +  char padding[60];
>>         int b;
>>       };
>>      $ gcc -g -c test-v{0,1}.cc
>>      $ abidiff test-v0.o test-v1.o
>>      Functions changes summary: 0 Removed, 1 Changed, 0 Added function
>>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>>      1 function with some indirect sub-type change:
>>        [C] 'function void foo(S&)' at test-v0.cc:11:1 has some
>> indirect sub-type changes:
>> 	parameter 1 of type 'S&' has sub-type changes:
>> 	  in referenced type 'struct S' at test-v1.cc:3:1:
>> 	    type size hasn't changed
>> 	    1 data member insertion:
>> 	      'uint32_t added', at offset 32 (in bits) at test-v1.cc:6:1
>> 	    1 data member change:
>> 	      type of 'char padding[64]' changed:
>> 		type name changed from 'char[64]' to 'char[60]'
>> 		array type size changed from 512 to 480
>> 		array type subrange 1 changed length from 64 to 60
>> 	      and offset changed from 32 to 64 (in bits) (by +32 bits)
>>      $ cat padding.suppr
>>      [suppress_type]
>>        type_kind = struct
>>        has_data_member_inserted_at = offset_of_first_data_member_regexp(.*padding)
>>      $ /home/dodji/git/libabigail/master/build/tools/abidiff --suppr
>> padding.suppr test-v0.o test-v1.o
>>      Functions changes summary: 0 Removed, 0 Changed (1 filtered out), 0 Added function
>>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>>      $
>>

John Moon wrote:

>
> This looks very close! Could we just repeat the suppression for each
> of the member names we're interested in ignoring or would we need to
> add support for the "offset_of_first_data_member_regexp()" function to
> take in a list of regexp?

This has been designed so that (I haven't tested it right now) having one [suppress_type]
directive per data member insertion we want should work, E.g:

      [suppress_type]
        type_kind = struct
        # Suppress a data member insertion after a data member
        # ending with the suffix 'padding'.
        has_data_member_inserted_at = offset_of_first_data_member_regexp(.*padding)

      [suppress_type]
        type_kind = struct
        # Suppress a data member insertion after a data member ending
        # with the suffix 'buffer'
        has_data_member_inserted_at = offset_of_first_data_member_regexp(.*buffer)


John Moon wrote:

>>> === Flex array expansion ===
>>> For example:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/rseq.h#L149.
>>>
>>> Leaving a flex array at the end essentially gives the struct an
>>> "infinite" padding field. The algorithm to detect looks almost
>>> identical to the ordinary padding expansion except this class can only
>>> occur with the flex array at the end of the struct.


Dodji Seketeli wrote:

>> I am not sure to understand this case in light of what we have been
>> talking about earlier.  If a data member is inserted anywhere around
>> the flex array field, I would have thought that it should be flagged
>> as being a meaningful change to report.  What am I missing?

John Moon wrote:

>
> Flex arrays can only be at the end of a struct, so I guess the rule
> should be "you can add members between existing members and the ending 
> flex array".
>
> E.g. this should be fine:
>
>  struct foo {
>      __u32 a;
>      __u32 b;
> +    __u32 c;
>      char pad[];
>  };

But in this case, the size of "struct foo" changed and the offset of the
data member 'pad' changed too.  So I'd flag this as a meaningful ABI
change to report in all cases.  For instance, if a struct embeds this
type, then the size of that struct changes etc.

What am I missing?

>
> But not this:
>
>  struct foo {
>      __u32 a;
> +    __u32 c;
>      __u32 b;
>      char pad[];
>  };
>

Yes, this change is meaningful to report in all cases too.

>>> === Anonymous Enums ===
>>> Another issue that comes up when comparing ABI across wide swaths of
>>> kernel commit history are changes to anonymous enums. From what I can
>>> tell, there's not a great way to handle this. If someone adds a new
>>> anonymous enum, the tag abidiff gives (e.g. enum __anonymous_enum__6)
>>> can change, so abidiff reports a new anonymous enum with all the same
>>> fields even though it was basically just a "rename".
>>>
>>> For reference, this file is full of anonymous enums:
>>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool_netlink.h#L15.
>>> Basically any change in there is triggering a failure from the
>>> script.
>> By 'any change', do you mean any addition of enumerator?
>> I think we can do something similar to what we do for anonymous
>> structs/unions, which it to consider content of the enums, rather than
>> just their "made-up" names.
>> I guess we need to have well defined examples of changes we want to
>> handle and I think we can come to a solution.
>> 
>>> I'm not sure how to handle these changes... there are possible ABI
>>> breakages when changing anonymous enums, but it's unclear to me how to
>>> detect them.
>> Let's come up with some small examples first, I'd say.
>
> Here's one example of the issue:
> https://github.com/torvalds/linux/commit/1d91855304c2046115ee10be2c93161d93d5d40d 
> (specifically the change to include/uapi/rdma/hns-abi.h).
>
> Here, a variant was added to two anonymous enums:
>
>  enum {
>      HNS_ROCE_EXSGE_FLAGS = 1 << 0,
>      HNS_ROCE_RQ_INLINE_FLAGS = 1 << 1,
> +    HNS_ROCE_CQE_INLINE_FLAGS = 1 << 2,
>  };
>
>  enum {
>       HNS_ROCE_RSP_EXSGE_FLAGS = 1 << 0,
>       HNS_ROCE_RSP_RQ_INLINE_FLAGS = 1 << 1,
> +     HNS_ROCE_RSP_CQE_INLINE_FLAGS = 1 << 2,
>  };
>
> abidiff outputs:
>
>     [C] 'enum __anonymous_enum__1' changed:
>       type size hasn't changed
>       1 enumerator deletion:
>         '__anonymous_enum__1::HNS_ROCE_RSP_EXSGE_FLAGS' value '1'
>       3 enumerator insertions:
>         '__anonymous_enum__::HNS_ROCE_EXSGE_FLAGS' value '1'
>         '__anonymous_enum__::HNS_ROCE_RQ_INLINE_FLAGS' value '2'
>         '__anonymous_enum__::HNS_ROCE_CQE_INLINE_FLAGS' value '4'
>   1 added type unreachable from any public interface:
>     [A] 'enum __anonymous_enum__1' at hns-abi.h:94:1
>
> It seems like it mixed them up while processing? I'm not exactly sure
> what's going on internally.

Yes, this is a bug.  I need to look into it.

>
> Another example:
> https://github.com/w1ldptr/linux/commit/d219df60a70ed0739aa5dd34b477763311fc5a7b 
> (specifically the change to include/uapi/linux/bpf.h).
>
> Here, just one enum was modified:
>
>  enum {
>      BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
>      BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
>      BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
>      BPF_F_ADJ_ROOM_ENCAP_L4_GRE	= (1ULL << 3),
>      BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
>      BPF_F_ADJ_ROOM_NO_CSUM_RESET	= (1ULL << 5),
>      BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
> +    BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
> +    BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
>  };
>
>
> But abidiff outputs:
>
>     [C] 'enum __anonymous_enum__9' changed:
>       type size hasn't changed
>       3 enumerator deletions:
>         '__anonymous_enum__9::BPF_F_ZERO_CSUM_TX' value '2'
>         '__anonymous_enum__9::BPF_F_DONT_FRAGMENT' value '4'
>         '__anonymous_enum__9::BPF_F_SEQ_NUMBER' value '8'
>       9 enumerator insertions:
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_FIXED_GSO' value '1'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV4' value '2'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV6' value '4'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_GRE' value '8'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_UDP' value '16'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_NO_CSUM_RESET' value '32'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L2_ETH' value '64'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV4' value '128'
>         '__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV6' value '256'
>   3 added types unreachable from any public interface:
>     [A] 'enum __anonymous_enum__9' at bpf.h:5781:1
>     [A] 'struct bpf_rb_node' at bpf.h:6931:1
>     [A] 'struct bpf_rb_root' at bpf.h:6926:1
>
> Any ideas on how to handle these issues?

These are bugs that ought to be fixed, I'll certainly look into them.

>
> Thank you!

Thank you for this excellent write up.

[...]


John Moon wrote:


> Hi Dodji,
>
> I wanted to follow up with you and see if/how we can assist with these
> features. I'm happy to jump in and try implementation if we agree on
> the feature requirements, though I'm not very familiar with the
> codebase.
>
> To summarize, here are the four usecases we have so far and plans to
> address them:
>
> 1. Enum expansion
>    - Add support for this kind of suppression:
>    [suppress_type]
>      type_kind = enum
>      allow_n_last_enumerator_changes = 3
>      changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.*

Right.

>
> 2. Padding field expansion
>    - Current feature set is close, but how can we handle multiple names?
>    - Do we want support for this kind suppression (multiple regex's)?
>    [suppress_type]
>       type_kind = struct
>     has_data_member_inserted_at =
>     offset_of_first_data_member_regexp(.*padding, .*reserved,
>    .*unused)

I think this one can be handled already by writing one type suppression
specification per pattern of data member after which insertion is
allowed.  Would this work for you at least for a start?


> 3. Flex array expansion
>    - Not sure how to detect/suppress this condition
>    - Flex arrays can only be at the ends of structs, so we just need
>      to make sure new struct members come "second to last"

I am still not sure why we think these can be suppressed; I am pretty
sure I am missing something but I am not sure what yet.


> 4. Anonymous enums
>    - Not sure how to associate these enums when diffing since their
>      "made up" names don't always match before/after a patch.
>

This is a plain bug that ought to be fixed, I think.

> I can try tackling items 1 and 2 if you'd like - just let me know! For
> items 3 and 4, can we work up a path forward?

First thank you for proposing your help, that would be greatly
appreciated :-) So, I have created a branch "kuapi-verify" at
https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/users/dodji/kuapi-verify.

Its purpose is to contain the code tu support this kernel uapi
verification effort, on the libabigail side.  I've folded the content of
the branch "last-enumerator-change" into it.  That branch is at
https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/users/dodji/filter-last-enumerator-change.

So the idea is that we can fold the necessary features into
"kuapi-verify" as we progress on them, if you like.  If you fancy, you
can start looking into how to extend the content of the branch
https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/users/dodji/filter-last-enumerator-change
to make it support the feature 1.  In the mean time, I'll look into fixing 4.  Until
then, I'll be more clear on 3, I guess.

I am available on IRC at irc.oftc.net#libabigail during western
European working hours if you need higher bandwidth exchanges.  I'd be
glad to chat with you there!

Thank you for following up!

Cheers,

-- 
		Dodji

  parent reply	other threads:[~2023-07-10 10:55 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-11  0:45 John Moon
2023-04-16 18:33 ` Trilok Soni
2023-04-21 18:21 ` Dodji Seketeli
2023-04-21 20:03   ` Dodji Seketeli
2023-04-24 18:39     ` John Moon
2023-05-10 14:21       ` Dodji Seketeli
2023-05-23 19:59         ` John Moon
2023-07-05 16:52           ` John Moon
2023-10-05 13:44             ` Support suppressing data members inserted right before flexible array members (was Re: abidiff improvements for kernel UAPI checker) Dodji Seketeli
2023-07-10 10:55           ` Dodji Seketeli [this message]
2023-09-22 11:39       ` abidiff improvements for kernel UAPI checker Dodji Seketeli
2023-09-22 11:51         ` Dodji Seketeli
2023-09-22 18:28           ` John Moon
2023-09-22 20:02             ` John Moon
2023-09-26  8:38             ` Dodji Seketeli
2023-09-27 17:37               ` John Moon
2023-09-29  9:52                 ` Dodji Seketeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jzv8yr1c.fsf@seketeli.org \
    --to=dodji@seketeli.org \
    --cc=libabigail@sourceware.org \
    --cc=quic_johmoo@quicinc.com \
    --cc=quic_satyap@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).