From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay2-d.mail.gandi.net (relay2-d.mail.gandi.net [217.70.183.194]) by sourceware.org (Postfix) with ESMTPS id 3D006384600C for ; Wed, 10 May 2023 14:21:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3D006384600C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=seketeli.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=seketeli.org Received: (Authenticated sender: dodji@seketeli.org) by mail.gandi.net (Postfix) with ESMTPSA id 57C4440012; Wed, 10 May 2023 14:21:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=seketeli.org; s=gm1; t=1683728501; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=2aUvd+jKfztf8deFJXT98J6czVeB8lxPqhy/i45QgM4=; b=URmbqbhOJUSiYGZ4MHA7ek1FoJq0OiHqtuCyKImRBJvcLEPa1hJyaE6ivBla7i3VwN3Z0m 2Y3qaUHSXvGT6dlTJ+kbX2F0BUA085nFVy3fBaBSSS3bOMl0sh6jnwXm5ZjkoedfD9GHzv lxtXFCYAJCpAJ/EOTcK2OA4YFT97pOrZJMWLxCHFp/TwG/uhP3SVPxQzvEv3fxigSho/wz 5D2itsO/gW6W3naBcJSNkTApuCjSmL9A2dOP8SiX/Yd/TmrauvN63SNSOwUKaU6J6vDm9a aOBhyhxh8MXnlkdAR80xEXAkciMcUFwiAA97xOX1n/IvyMsuAQPZ1Vyu4ZXExw== Received: by localhost (Postfix, from userid 1000) id 6C9FEB5078; Wed, 10 May 2023 16:21:39 +0200 (CEST) From: Dodji Seketeli To: John Moon Cc: , Trilok Soni , "Satya Durga Srinivasu Prabhala" Subject: Re: abidiff improvements for kernel UAPI checker Organization: Me, myself and I References: <5363161d-8167-284e-e35d-9a8ef20adea9@quicinc.com> <877cu5t7tw.fsf@seketeli.org> <87354tt32o.fsf@seketeli.org> <340b33bd-2b43-9f99-58e1-f1b77a51b48a@quicinc.com> X-Operating-System: CentOS Stream release 9 X-URL: http://www.seketeli.net/~dodji Date: Wed, 10 May 2023 16:21:39 +0200 In-Reply-To: <340b33bd-2b43-9f99-58e1-f1b77a51b48a@quicinc.com> (John Moon's message of "Mon, 24 Apr 2023 11:39:48 -0700") Message-ID: <87sfc42rnw.fsf@seketeli.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello John, [...] > Thanks so much for getting the ball rolling on this! You are welcome! > I agree that suppressions seem to be the correct mechanism here and > I don't foresee any issue using them from the check script we're > working on. Nice to hear :-) > Before seeing this, I had gotten started on implementing some (hacky) > filters based on the output of abidiff. While testing on the Linux > codebase, I found some interesting corner cases. I'll go into the > details here so hopefully we can have more discussion and design the > right solution. Thank you. > There are basically three classes of "false positives" that we'd like > to eventually be able to filter: > > 1) Enum expansion > 2) Padding field expansion > 3) Flex array expansion ACK. > === Enum expansion === > Here are some places in the kernel where the script reported expansion > as a breaking change: > > One "MAX" member: > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool.h#L1789 > > Two "MAX" members: > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L6398 > > Three "MAX" members: > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/nl80211.h#L3311 > > Having two is pretty common. I can only find the one occurrence of > having three of these variants... maybe the suppression could take an > "n_allowed" parameter of some sort? I guess this can certainly be done, yes. And it does make sense to me. > Also, would it make sense to limit the check to enumerator variants > that have somewhat common names for that case? It could evolve with > time, but so far I've found "MAX", "LAST", "NUM", and "NBITS" being > fairly common. Maybe the suppression could also take > "allowed_strings"? Something like this? Just so you know, there is already the "changed_enumerators" property that is available[1]. It takes the list of enumerators that are allowed to change. It doesn't support regular expressions, yet. Maybe we can either make it take enumerators or add a new one named "changed_enumerators_regexp" which takes a list of regular expressions describing the allowed changed enumerators. In any case, that property will have to be made to work in tandem with the allow_n_last_enumarator_changes property. [1]: https://sourceware.org/libabigail/manual/libabigail-concepts. [Look for the "changed_enumerator" word in there] > [suppress_type] > type_kind = enum > allow_n_last_enumerator_changes = 3 > allow_n_last_enumerator_change_names = [".*MAX.*", ".*LAST.*", ".*NUM.*", ".*NBITS.*"] Right. Or, rather, the syntax would be: [suppress_type] type_kind = enum allow_n_last_enumerator_changes = 3 changed_enumerators_regexp = .*MAX.*, .*LAST.*, .*NUM.*, .*NBITS.* (no need for the string delimiters as everything is considered a string by default) Would that be OK? > This is the algorithm I was working on to detect these changes: > - The type size did not change > - The change is to an enum > - The number of changed variants is < 4 > - The changed variants contain at least one of the above strings > - The changed variants' values are all >= the last newly-inserted value ACK. > === Padding field expansion === > For example, you'll see lots of padding fields like this: > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fuse.h#L908 > > These don't necessarily have to occur at the end of a struct. You > could expand into a padding field in the middle somewhere. > > This is the algorithm I was working on to detect these changes: > - The type size did not change > - There is only one data member being changed > - The changed member name contains at least one of: "pad", > "reserved", "end", or "last" > - The first new member being inserted has an offset equal to the > changed member's previous offset Just so you know, there are a number of properties of the [suppress_type] section that are designed to address this kind of suppressions. I think some can be extended to address this use case. E.g, there is the has_data_member_inserted_at property which I believe can help with this use case. Here is a quick screen shot to show you what it does so far, in the latest 2.3 libabigail version that got recently released: $ cat -n test-v0.cc 1 #include 2 3 struct S 4 { 5 int a; 6 char padding[64]; 7 int b; 8 }; 9 10 void 11 foo(S&) 12 {} $ cat -n test-v1.cc 1 #include 2 3 struct S 4 { 5 int a; 6 uint32_t added; 7 char padding[60]; 8 int b; 9 }; 10 11 void 12 foo(S&) 13 {} $ diff -u test-v0.cc test-v1.cc --- test-v0.cc 2023-05-10 13:51:19.798072787 +0200 +++ test-v1.cc 2023-05-10 13:54:53.879082831 +0200 @@ -3,7 +3,8 @@ struct S { int a; - char padding[64]; + uint32_t added; + char padding[60]; int b; }; $ gcc -g -c test-v{0,1}.cc $ abidiff test-v0.o test-v1.o Functions changes summary: 0 Removed, 1 Changed, 0 Added function Variables changes summary: 0 Removed, 0 Changed, 0 Added variable 1 function with some indirect sub-type change: [C] 'function void foo(S&)' at test-v0.cc:11:1 has some indirect sub-type changes: parameter 1 of type 'S&' has sub-type changes: in referenced type 'struct S' at test-v1.cc:3:1: type size hasn't changed 1 data member insertion: 'uint32_t added', at offset 32 (in bits) at test-v1.cc:6:1 1 data member change: type of 'char padding[64]' changed: type name changed from 'char[64]' to 'char[60]' array type size changed from 512 to 480 array type subrange 1 changed length from 64 to 60 and offset changed from 32 to 64 (in bits) (by +32 bits) $ cat padding.suppr [suppress_type] type_kind = struct has_data_member_inserted_at = offset_of_first_data_member_regexp(.*padding) $ /home/dodji/git/libabigail/master/build/tools/abidiff --suppr padding.suppr test-v0.o test-v1.o Functions changes summary: 0 Removed, 0 Changed (1 filtered out), 0 Added function Variables changes summary: 0 Removed, 0 Changed, 0 Added variable $ > > > === Flex array expansion === > For example: > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/rseq.h#L149. > > Leaving a flex array at the end essentially gives the struct an > "infinite" padding field. The algorithm to detect looks almost > identical to the ordinary padding expansion except this class can only > occur with the flex array at the end of the struct. I am not sure to understand this case in light of what we have been talking about earlier. If a data member is inserted anywhere around the flex array field, I would have thought that it should be flagged as being a meaningful change to report. What am I missing? > === Anonymous Enums === > Another issue that comes up when comparing ABI across wide swaths of > kernel commit history are changes to anonymous enums. From what I can > tell, there's not a great way to handle this. If someone adds a new > anonymous enum, the tag abidiff gives (e.g. enum __anonymous_enum__6) > can change, so abidiff reports a new anonymous enum with all the same > fields even though it was basically just a "rename". > > For reference, this file is full of anonymous enums: > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool_netlink.h#L15. > Basically any change in there is triggering a failure from the > script. By 'any change', do you mean any addition of enumerator? I think we can do something similar to what we do for anonymous structs/unions, which it to consider content of the enums, rather than just their "made-up" names. I guess we need to have well defined examples of changes we want to handle and I think we can come to a solution. > I'm not sure how to handle these changes... there are possible ABI > breakages when changing anonymous enums, but it's unclear to me how to > detect them. Let's come up with some small examples first, I'd say. > Again, thanks so much for taking the initiative here. Let me know what > you think about the above topics and what next steps we could take! No problem, my pleasure. Cheers, -- Dodji