From: John Moon <quic_johmoo@quicinc.com>
To: Dodji Seketeli <dodji@seketeli.org>
Cc: <libabigail@sourceware.org>, Trilok Soni <quic_tsoni@quicinc.com>,
"Satya Durga Srinivasu Prabhala" <quic_satyap@quicinc.com>
Subject: Re: abidiff improvements for kernel UAPI checker
Date: Tue, 23 May 2023 12:59:00 -0700 [thread overview]
Message-ID: <153dc338-ff1f-4273-1663-e934124e4bcc@quicinc.com> (raw)
In-Reply-To: <87sfc42rnw.fsf@seketeli.org>
On 5/10/2023 7:21 AM, Dodji Seketeli wrote:
>> === Enum expansion ===
>> Here are some places in the kernel where the script reported expansion
>> as a breaking change:
>>
>> One "MAX" member:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool.h#L1789
>>
>> Two "MAX" members:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L6398
>>
>> Three "MAX" members:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L3311
>>
>> Having two is pretty common. I can only find the one occurrence of
>> having three of these variants... maybe the suppression could take an
>> "n_allowed" parameter of some sort?
>
> I guess this can certainly be done, yes. And it does make sense to
> me.
>
>> Also, would it make sense to limit the check to enumerator variants
>> that have somewhat common names for that case? It could evolve with
>> time, but so far I've found "MAX", "LAST", "NUM", and "NBITS" being
>> fairly common. Maybe the suppression could also take
>> "allowed_strings"? Something like this?
>
> Just so you know, there is already the "changed_enumerators" property
> that is available[1]. It takes the list of enumerators that are allowed
> to change. It doesn't support regular expressions, yet. Maybe we can
> either make it take enumerators or add a new one named
> "changed_enumerators_regexp" which takes a list of regular expressions
> describing the allowed changed enumerators. In any case, that
> property will have to be made to work in tandem with the
> allow_n_last_enumarator_changes property.
>
> [1]: https://sourceware.org/libabigail/manual/libabigail-concepts.
> [Look for the "changed_enumerator" word in there]
>
>
>> [suppress_type]
>> type_kind = enum
>> allow_n_last_enumerator_changes = 3
>> allow_n_last_enumerator_change_names = [".*MAX.*", ".*LAST.*", ".*NUM.*", ".*NBITS.*"]
>
> Right. Or, rather, the syntax would be:
>
> [suppress_type]
> type_kind = enum
> allow_n_last_enumerator_changes = 3
> changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.*
>
> (no need for the string delimiters as everything is considered a
> string by default)
>
> Would that be OK?
Yes, this looks perfect!
>> === Padding field expansion ===
>> For example, you'll see lots of padding fields like this:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fuse.h#L908
>>
>> These don't necessarily have to occur at the end of a struct. You
>> could expand into a padding field in the middle somewhere.
>>
>> This is the algorithm I was working on to detect these changes:
>> - The type size did not change
>> - There is only one data member being changed
>> - The changed member name contains at least one of: "pad",
>> "reserved", "end", or "last"
>> - The first new member being inserted has an offset equal to the
>> changed member's previous offset
>
> Just so you know, there are a number of properties of the
> [suppress_type] section that are designed to address this kind of
> suppressions. I think some can be extended to address this use case.
>
> E.g, there is the has_data_member_inserted_at property which I believe
> can help with this use case. Here is a quick screen shot to show you
> what it does so far, in the latest 2.3 libabigail version that got
> recently released:
>
> $ cat -n test-v0.cc
> 1 #include <cstdint>
> 2
> 3 struct S
> 4 {
> 5 int a;
> 6 char padding[64];
> 7 int b;
> 8 };
> 9
> 10 void
> 11 foo(S&)
> 12 {}
>
> $ cat -n test-v1.cc
> 1 #include <cstdint>
> 2
> 3 struct S
> 4 {
> 5 int a;
> 6 uint32_t added;
> 7 char padding[60];
> 8 int b;
> 9 };
> 10
> 11 void
> 12 foo(S&)
> 13 {}
>
> $ diff -u test-v0.cc test-v1.cc
> --- test-v0.cc 2023-05-10 13:51:19.798072787 +0200
> +++ test-v1.cc 2023-05-10 13:54:53.879082831 +0200
> @@ -3,7 +3,8 @@
> struct S
> {
> int a;
> - char padding[64];
> + uint32_t added;
> + char padding[60];
> int b;
> };
>
> $ gcc -g -c test-v{0,1}.cc
> $ abidiff test-v0.o test-v1.o
> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>
> 1 function with some indirect sub-type change:
>
> [C] 'function void foo(S&)' at test-v0.cc:11:1 has some indirect sub-type changes:
> parameter 1 of type 'S&' has sub-type changes:
> in referenced type 'struct S' at test-v1.cc:3:1:
> type size hasn't changed
> 1 data member insertion:
> 'uint32_t added', at offset 32 (in bits) at test-v1.cc:6:1
> 1 data member change:
> type of 'char padding[64]' changed:
> type name changed from 'char[64]' to 'char[60]'
> array type size changed from 512 to 480
> array type subrange 1 changed length from 64 to 60
> and offset changed from 32 to 64 (in bits) (by +32 bits)
>
> $ cat padding.suppr
> [suppress_type]
> type_kind = struct
> has_data_member_inserted_at = offset_of_first_data_member_regexp(.*padding)
>
> $ /home/dodji/git/libabigail/master/build/tools/abidiff --suppr padding.suppr test-v0.o test-v1.o
> Functions changes summary: 0 Removed, 0 Changed (1 filtered out), 0 Added function
> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
>
> $
>
This looks very close! Could we just repeat the suppression for each of
the member names we're interested in ignoring or would we need to add
support for the "offset_of_first_data_member_regexp()" function to take
in a list of regexp?
>>
>>
>> === Flex array expansion ===
>> For example:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/rseq.h#L149.
>>
>> Leaving a flex array at the end essentially gives the struct an
>> "infinite" padding field. The algorithm to detect looks almost
>> identical to the ordinary padding expansion except this class can only
>> occur with the flex array at the end of the struct.
>
> I am not sure to understand this case in light of what we have been
> talking about earlier. If a data member is inserted anywhere around
> the flex array field, I would have thought that it should be flagged
> as being a meaningful change to report. What am I missing?
Flex arrays can only be at the end of a struct, so I guess the rule
should be "you can add members between existing members and the ending
flex array".
E.g. this should be fine:
struct foo {
__u32 a;
__u32 b;
+ __u32 c;
char pad[];
};
But not this:
struct foo {
__u32 a;
+ __u32 c;
__u32 b;
char pad[];
};
>> === Anonymous Enums ===
>> Another issue that comes up when comparing ABI across wide swaths of
>> kernel commit history are changes to anonymous enums. From what I can
>> tell, there's not a great way to handle this. If someone adds a new
>> anonymous enum, the tag abidiff gives (e.g. enum __anonymous_enum__6)
>> can change, so abidiff reports a new anonymous enum with all the same
>> fields even though it was basically just a "rename".
>>
>> For reference, this file is full of anonymous enums:
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool_netlink.h#L15.
>> Basically any change in there is triggering a failure from the
>> script.
>
> By 'any change', do you mean any addition of enumerator?
>
> I think we can do something similar to what we do for anonymous
> structs/unions, which it to consider content of the enums, rather than
> just their "made-up" names.
>
> I guess we need to have well defined examples of changes we want to
> handle and I think we can come to a solution.
>
>> I'm not sure how to handle these changes... there are possible ABI
>> breakages when changing anonymous enums, but it's unclear to me how to
>> detect them.
>
> Let's come up with some small examples first, I'd say.
Here's one example of the issue:
https://github.com/torvalds/linux/commit/1d91855304c2046115ee10be2c93161d93d5d40d
(specifically the change to include/uapi/rdma/hns-abi.h).
Here, a variant was added to two anonymous enums:
enum {
HNS_ROCE_EXSGE_FLAGS = 1 << 0,
HNS_ROCE_RQ_INLINE_FLAGS = 1 << 1,
+ HNS_ROCE_CQE_INLINE_FLAGS = 1 << 2,
};
enum {
HNS_ROCE_RSP_EXSGE_FLAGS = 1 << 0,
HNS_ROCE_RSP_RQ_INLINE_FLAGS = 1 << 1,
+ HNS_ROCE_RSP_CQE_INLINE_FLAGS = 1 << 2,
};
abidiff outputs:
[C] 'enum __anonymous_enum__1' changed:
type size hasn't changed
1 enumerator deletion:
'__anonymous_enum__1::HNS_ROCE_RSP_EXSGE_FLAGS' value '1'
3 enumerator insertions:
'__anonymous_enum__::HNS_ROCE_EXSGE_FLAGS' value '1'
'__anonymous_enum__::HNS_ROCE_RQ_INLINE_FLAGS' value '2'
'__anonymous_enum__::HNS_ROCE_CQE_INLINE_FLAGS' value '4'
1 added type unreachable from any public interface:
[A] 'enum __anonymous_enum__1' at hns-abi.h:94:1
It seems like it mixed them up while processing? I'm not exactly sure
what's going on internally.
Another example:
https://github.com/w1ldptr/linux/commit/d219df60a70ed0739aa5dd34b477763311fc5a7b
(specifically the change to include/uapi/linux/bpf.h).
Here, just one enum was modified:
enum {
BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0),
BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1),
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3),
BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4),
BPF_F_ADJ_ROOM_NO_CSUM_RESET = (1ULL << 5),
BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6),
+ BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7),
+ BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8),
};
But abidiff outputs:
[C] 'enum __anonymous_enum__9' changed:
type size hasn't changed
3 enumerator deletions:
'__anonymous_enum__9::BPF_F_ZERO_CSUM_TX' value '2'
'__anonymous_enum__9::BPF_F_DONT_FRAGMENT' value '4'
'__anonymous_enum__9::BPF_F_SEQ_NUMBER' value '8'
9 enumerator insertions:
'__anonymous_enum__14::BPF_F_ADJ_ROOM_FIXED_GSO' value '1'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV4' value '2'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L3_IPV6' value '4'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_GRE' value '8'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L4_UDP' value '16'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_NO_CSUM_RESET' value '32'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_ENCAP_L2_ETH' value '64'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV4' value '128'
'__anonymous_enum__14::BPF_F_ADJ_ROOM_DECAP_L3_IPV6' value '256'
3 added types unreachable from any public interface:
[A] 'enum __anonymous_enum__9' at bpf.h:5781:1
[A] 'struct bpf_rb_node' at bpf.h:6931:1
[A] 'struct bpf_rb_root' at bpf.h:6926:1
Any ideas on how to handle these issues?
Thank you!
- John
next prev parent reply other threads:[~2023-05-23 19:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-11 0:45 John Moon
2023-04-16 18:33 ` Trilok Soni
2023-04-21 18:21 ` Dodji Seketeli
2023-04-21 20:03 ` Dodji Seketeli
2023-04-24 18:39 ` John Moon
2023-05-10 14:21 ` Dodji Seketeli
2023-05-23 19:59 ` John Moon [this message]
2023-07-05 16:52 ` John Moon
2023-10-05 13:44 ` Support suppressing data members inserted right before flexible array members (was Re: abidiff improvements for kernel UAPI checker) Dodji Seketeli
2023-07-10 10:55 ` abidiff improvements for kernel UAPI checker Dodji Seketeli
2023-09-22 11:39 ` Dodji Seketeli
2023-09-22 11:51 ` Dodji Seketeli
2023-09-22 18:28 ` John Moon
2023-09-22 20:02 ` John Moon
2023-09-26 8:38 ` Dodji Seketeli
2023-09-27 17:37 ` John Moon
2023-09-29 9:52 ` Dodji Seketeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=153dc338-ff1f-4273-1663-e934124e4bcc@quicinc.com \
--to=quic_johmoo@quicinc.com \
--cc=dodji@seketeli.org \
--cc=libabigail@sourceware.org \
--cc=quic_satyap@quicinc.com \
--cc=quic_tsoni@quicinc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).