public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: John Moon <quic_johmoo@quicinc.com>
To: Dodji Seketeli <dodji@seketeli.org>
Cc: <libabigail@sourceware.org>, Trilok Soni <quic_tsoni@quicinc.com>,
	"Satya Durga Srinivasu Prabhala" <quic_satyap@quicinc.com>
Subject: Re: abidiff improvements for kernel UAPI checker
Date: Tue, 23 May 2023 12:59:00 -0700	[thread overview]
Message-ID: <153dc338-ff1f-4273-1663-e934124e4bcc@quicinc.com> (raw)
In-Reply-To: <87sfc42rnw.fsf@seketeli.org>

On 5/10/2023 7:21 AM, Dodji Seketeli wrote:
>> === Enum expansion ===
>> Here are some places in the kernel where the script reported expansion
>> as a breaking change:
>>
>> One "MAX" member:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool.h#L1789
>>
>> Two "MAX" members:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L6398
>>
>> Three "MAX" members:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L3311
>>
>> Having two is pretty common. I can only find the one occurrence of
>> having three of these variants... maybe the suppression could take an
>> "n_allowed" parameter of some sort?
> 
> I guess this can certainly be done, yes.  And it does make sense to
> me.
> 
>> Also, would it make sense to limit the check to enumerator variants
>> that have somewhat common names for that case? It could evolve with
>> time, but so far I've found "MAX", "LAST", "NUM", and "NBITS" being
>> fairly common. Maybe the suppression could also take
>> "allowed_strings"? Something like this?
> 
> Just so you know, there is already the "changed_enumerators" property
> that is available[1].  It takes the list of enumerators that are allowed
> to change.  It doesn't support regular expressions, yet.  Maybe we can
> either make it take enumerators or add a new one named
> "changed_enumerators_regexp" which takes a list of regular expressions
> describing the allowed changed enumerators.  In any case, that
> property will have to be made to work in tandem with the
> allow_n_last_enumarator_changes property.
> 
> [1]: https://sourceware.org/libabigail/manual/libabigail-concepts.
> [Look for the "changed_enumerator" word in there]
> 
> 
>> [suppress_type]
>>    type_kind = enum
>>    allow_n_last_enumerator_changes = 3
>>    allow_n_last_enumerator_change_names = [".*MAX.*", ".*LAST.*", ".*NUM.*", ".*NBITS.*"]
> 
> Right.  Or, rather, the syntax would be:
> 
>   [suppress_type]
>     type_kind = enum
>     allow_n_last_enumerator_changes = 3
>     changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.*
> 
> (no need for the string delimiters as everything is considered a
>   string by default)
> 
> Would that be OK?

Yes, this looks perfect!

>> === Padding field expansion ===
>> For example, you'll see lots of padding fields like this:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fuse.h#L908
>>
>> These don't necessarily have to occur at the end of a struct. You
>> could expand into a padding field in the middle somewhere.
>>
>> This is the algorithm I was working on to detect these changes:
>>    - The type size did not change
>>    - There is only one data member being changed
>>    - The changed member name contains at least one of: "pad",
>>      "reserved", "end", or "last"
>>    - The first new member being inserted has an offset equal to the
>>      changed member's previous offset
> 
> Just so you know, there are a number of properties of the
> [suppress_type] section that are designed to address this kind of
> suppressions.  I think some can be extended to address this use case.
> 
> E.g, there is the has_data_member_inserted_at property which I believe
> can help with this use case.  Here is a quick screen shot to show you
> what it does so far, in the latest 2.3 libabigail version that got
> recently released:
> 
>      $ cat -n test-v0.cc
> 	 1	#include <cstdint>
> 	 2	
> 	 3	struct S
> 	 4	{
> 	 5	  int a;
> 	 6	  char padding[64];
> 	 7	  int b;
> 	 8	};
> 	 9	
> 	10	void
> 	11	foo(S&)
> 	12	{}
> 
>      $ cat -n test-v1.cc
> 	 1	#include <cstdint>
> 	 2	
> 	 3	struct S
> 	 4	{
> 	 5	  int a;
> 	 6	  uint32_t added;
> 	 7	  char padding[60];
> 	 8	  int b;
> 	 9	};
> 	10	
> 	11	void
> 	12	foo(S&)
> 	13	{}
> 
>      $ diff -u test-v0.cc test-v1.cc
>      --- test-v0.cc	2023-05-10 13:51:19.798072787 +0200
>      +++ test-v1.cc	2023-05-10 13:54:53.879082831 +0200
>      @@ -3,7 +3,8 @@
>       struct S
>       {
>         int a;
>      -  char padding[64];
>      +  uint32_t added;
>      +  char padding[60];
>         int b;
>       };
> 
>      $ gcc -g -c test-v{0,1}.cc
>      $ abidiff test-v0.o test-v1.o
>      Functions changes summary: 0 Removed, 1 Changed, 0 Added function
>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> 
>      1 function with some indirect sub-type change:
> 
>        [C] 'function void foo(S&)' at test-v0.cc:11:1 has some indirect sub-type changes:
> 	parameter 1 of type 'S&' has sub-type changes:
> 	  in referenced type 'struct S' at test-v1.cc:3:1:
> 	    type size hasn't changed
> 	    1 data member insertion:
> 	      'uint32_t added', at offset 32 (in bits) at test-v1.cc:6:1
> 	    1 data member change:
> 	      type of 'char padding[64]' changed:
> 		type name changed from 'char[64]' to 'char[60]'
> 		array type size changed from 512 to 480
> 		array type subrange 1 changed length from 64 to 60
> 	      and offset changed from 32 to 64 (in bits) (by +32 bits)
> 
>      $ cat padding.suppr
>      [suppress_type]
>        type_kind = struct
>        has_data_member_inserted_at = offset_of_first_data_member_regexp(.*padding)
> 
>      $ /home/dodji/git/libabigail/master/build/tools/abidiff --suppr padding.suppr test-v0.o test-v1.o
>      Functions changes summary: 0 Removed, 0 Changed (1 filtered out), 0 Added function
>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> 
>      $
> 

This looks very close! Could we just repeat the suppression for each of 
the member names we're interested in ignoring or would we need to add 
support for the "offset_of_first_data_member_regexp()" function to take 
in a list of regexp?

>>
>>
>> === Flex array expansion ===
>> For example:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/rseq.h#L149.
>>
>> Leaving a flex array at the end essentially gives the struct an
>> "infinite" padding field. The algorithm to detect looks almost
>> identical to the ordinary padding expansion except this class can only
>> occur with the flex array at the end of the struct.
> 
> I am not sure to understand this case in light of what we have been
> talking about earlier.  If a data member is inserted anywhere around
> the flex array field, I would have thought that it should be flagged
> as being a meaningful change to report.  What am I missing?

Flex arrays can only be at the end of a struct, so I guess the rule 
should be "you can add members between existing members and the ending 
flex array".

E.g. this should be fine:

  struct foo {
      __u32 a;
      __u32 b;
+    __u32 c;
      char pad[];
  };

But not this:

  struct foo {
      __u32 a;
+    __u32 c;
      __u32 b;
      char pad[];
  };


>> === Anonymous Enums ===
>> Another issue that comes up when comparing ABI across wide swaths of
>> kernel commit history are changes to anonymous enums. From what I can
>> tell, there's not a great way to handle this. If someone adds a new
>> anonymous enum, the tag abidiff gives (e.g. enum __anonymous_enum__6)
>> can change, so abidiff reports a new anonymous enum with all the same
>> fields even though it was basically just a "rename".
>>
>> For reference, this file is full of anonymous enums:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool_netlink.h#L15.
>> Basically any change in there is triggering a failure from the
>> script.
> 
> By 'any change', do you mean any addition of enumerator?
> 
> I think we can do something similar to what we do for anonymous
> structs/unions, which it to consider content of the enums, rather than
> just their "made-up" names.
> 
> I guess we need to have well defined examples of changes we want to
> handle and I think we can come to a solution.
> 
>> I'm not sure how to handle these changes... there are possible ABI
>> breakages when changing anonymous enums, but it's unclear to me how to
>> detect them.
> 
> Let's come up with some small examples first, I'd say.

Here's one example of the issue: 
https://github.com/torvalds/linux/commit/1d91855304c2046115ee10be2c93161d93d5d40d 
(specifically the change to include/uapi/rdma/hns-abi.h).

Here, a variant was added to two anonymous enums:

  enum {
      HNS_ROCE_EXSGE_FLAGS = 1 << 0,
      HNS_ROCE_RQ_INLINE_FLAGS = 1 << 1,
+    HNS_ROCE_CQE_INLINE_FLAGS = 1 << 2,
  };

  enum {
       HNS_ROCE_RSP_EXSGE_FLAGS = 1 << 0,
       HNS_ROCE_RSP_RQ_INLINE_FLAGS = 1 << 1,
+     HNS_ROCE_RSP_CQE_INLINE_FLAGS = 1 << 2,
  };

abidiff outputs:

     [C] 'enum __anonymous_enum__1' changed:
       type size hasn't changed
       1 enumerator deletion:
         '__anonymous_enum__1::HNS_ROCE_RSP_EXSGE_FLAGS' value '1'
       3 enumerator insertions:
         '__anonymous_enum__::HNS_ROCE_EXSGE_FLAGS' value '1'
         '__anonymous_enum__::HNS_ROCE_RQ_INLINE_FLAGS' value '2'
         '__anonymous_enum__::HNS_ROCE_CQE_INLINE_FLAGS' value '4'
   1 added type unreachable from any public interface:
     [A] 'enum __anonymous_enum__1' at hns-abi.h:94:1

It seems like it mixed them up while processing? I'm not exactly sure 
what's going on internally.

Another example: 
https://github.com/w1ldptr/linux/commit/d219df60a70ed0739aa5dd34b477763311fc5a7b 
(specifically the change to include/uapi/linux/bpf.h).

Here, just one enum was modified:

  enum {
      BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
      BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
      BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
      BPF_F_ADJ_ROOM_ENCAP_L4_GRE	= (1ULL << 3),
      BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
      BPF_F_ADJ_ROOM_NO_CSUM_RESET	= (1ULL << 5),
      BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
+    BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
+    BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
  };


But abidiff outputs:

     [C] 'enum __anonymous_enum__9' changed:
       type size hasn't changed
       3 enumerator deletions:
         '__anonymous_enum__9::BPF_F_ZERO_CSUM_TX' value '2'
         '__anonymous_enum__9::BPF_F_DONT_FRAGMENT' value '4'
         '__anonymous_enum__9::BPF_F_SEQ_NUMBER' value '8'
       9 enumerator insertions:
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_FIXED_GSO' value '1'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV4' value '2'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV6' value '4'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_GRE' value '8'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_UDP' value '16'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_NO_CSUM_RESET' value '32'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L2_ETH' value '64'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV4' value '128'
         '__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV6' value '256'
   3 added types unreachable from any public interface:
     [A] 'enum __anonymous_enum__9' at bpf.h:5781:1
     [A] 'struct bpf_rb_node' at bpf.h:6931:1
     [A] 'struct bpf_rb_root' at bpf.h:6926:1

Any ideas on how to handle these issues?

Thank you!

- John


  reply	other threads:[~2023-05-23 19:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-11  0:45 John Moon
2023-04-16 18:33 ` Trilok Soni
2023-04-21 18:21 ` Dodji Seketeli
2023-04-21 20:03   ` Dodji Seketeli
2023-04-24 18:39     ` John Moon
2023-05-10 14:21       ` Dodji Seketeli
2023-05-23 19:59         ` John Moon [this message]
2023-07-05 16:52           ` John Moon
2023-10-05 13:44             ` Support suppressing data members inserted right before flexible array members (was Re: abidiff improvements for kernel UAPI checker) Dodji Seketeli
2023-07-10 10:55           ` abidiff improvements for kernel UAPI checker Dodji Seketeli
2023-09-22 11:39       ` Dodji Seketeli
2023-09-22 11:51         ` Dodji Seketeli
2023-09-22 18:28           ` John Moon
2023-09-22 20:02             ` John Moon
2023-09-26  8:38             ` Dodji Seketeli
2023-09-27 17:37               ` John Moon
2023-09-29  9:52                 ` Dodji Seketeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=153dc338-ff1f-4273-1663-e934124e4bcc@quicinc.com \
    --to=quic_johmoo@quicinc.com \
    --cc=dodji@seketeli.org \
    --cc=libabigail@sourceware.org \
    --cc=quic_satyap@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).