From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io1-xd44.google.com (mail-io1-xd44.google.com [IPv6:2607:f8b0:4864:20::d44]) by sourceware.org (Postfix) with ESMTPS id 6E7CE39730F5 for ; Fri, 1 May 2020 15:57:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6E7CE39730F5 Received: by mail-io1-xd44.google.com with SMTP id f3so5293514ioj.1 for ; Fri, 01 May 2020 08:57:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=RhAcLXMCB00zs3eYARH4IbGi4cPiXQ/UzYYxMJmQ6S4=; b=B3cRZ0e0JbduSPmmqqY8ERaljLS4ewALg4pUEYgPeevkaW0QstvpjpnL3zGgQ7eIFu XXOhTL1Kc2UuOiCEfYNQLInNURvwp5lZ6ZelOdZvXUEUkbhQwfKy85nwTbq8Ap4vPxuI pk7ZEaty8Zt2T3dt5VaCCHwmScm35cBE77Fg/ngyuccB53cE+2AeS4H8nXDJhtQUIdFm eme8tYzdNBq8C5M+pTReomXkXGyoJlhiAPuAqZCggwgDuxYB6DkEx+moiMMpiT92fSHS XccZc5LNcr2wndKvyAUAvhfM6+aSjBhN7PAFbMXA+1KT7KvtLDvDKQhdzxa3EMxYNQcR lzFQ== X-Gm-Message-State: AGi0PuZHZiEiiQiVVtSScMLT0wWILa2VrIFo1kmkwXqB0hMdfKwBmTZp 6WsKc+X36LsaQ2wAD9Xt9W9ybx0xXsh7Md3NX1PGLg== X-Google-Smtp-Source: APiQypImL+Vx7HARCnq4hjRQsCMgUf6w6ZU8RZHQyGv/ZomaUULJOLR1sHheFyEF3gws5v4ywkOAg7be5q1jBhtGvmo= X-Received: by 2002:a02:c959:: with SMTP id u25mr3760048jao.46.1588348619680; Fri, 01 May 2020 08:56:59 -0700 (PDT) MIME-Version: 1.0 References: <20200420164153.5213-1-gprocida@google.com> <87a733d42i.fsf@seketeli.org> In-Reply-To: From: Giuliano Procida Date: Fri, 1 May 2020 16:56:43 +0100 Message-ID: Subject: Re: [PATCH 1/3] test-diff-suppr: Improve regexes in tests. To: Dodji Seketeli Cc: libabigail@sourceware.org, kernel-team@android.com, =?UTF-8?Q?Matthias_M=C3=A4nnich?= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-39.1 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libabigail mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 May 2020 15:57:02 -0000 Hi Dodji. On Wed, 22 Apr 2020 at 20:11, Giuliano Procida wrote: > > Hi there. > > Thanks for the review. > > On Wed, 22 Apr 2020 at 14:48, Dodji Seketeli wrote: > > > > Hello Giuliano, > > > > Giuliano Procida a =C3=A9crit: > > > > > The are some (mostly inconsequential) issues with regexes: > > > > > > - Leading and trailing .* are redundant and can be removed. > > > > They are redundant I agree, but that redundancy can be useful, I think. > > The reason why I have put them in the test is that in the past, I have > > tried different regular expression engines. Some of them would > > implement the quantifier in a greedy manner, rather than in a lazy > > manner. Whereas, what I find more useful is the lazy variant. So that > > writing ".*blah" would do what you would expect (rather than having to > > write .*?blah) > > If we only care about match existence (which is the case for > libabigail), greediness shouldn't matter, the engine should backtrack. > Do you mean possessive matching (I've just found this on Wikipedia, > it's not something I thought regex engines had)? That would be painful > to work with. > > If you want more features (like being able to match foofoo for any > foo), you'll want PCRE. If you are thinking of embedding libabigail in > a server, you'll want RE2. > > > So putting that in those tests explicitly will hopefully help us catch > > potential issues if we are to switch to a different engine tomorrow. S= o > > I'd prefer keeping that redundancy in those tests. > > Fair enough. > > > [...] > > > > > - File name matches are full path > > > > Are they? > > Correction, some file name matches seem to be full path. This may be a > bug. I've found: Have you had a chance to look at these cases where full path matching is being done? Would you like a patch that makes it base name matching instead? Regards, Giuliano. > Path in XML > corpus_is_suppressed_by_soname_or_filename > suppression_matches_soname_or_filename > matches_binary_name (@param binary_name the full path to the binary) > > ELF path > suppression_can_match > matches_binary_name (@param binary_name the full path to the binary) > > I've not traced through the code, only looked at it. This change > breaks make check: > > diff --git a/tests/data/test-diff-suppr/test24-soname-suppr-2.txt > b/tests/data/test-diff-suppr/test24-soname-suppr-2.txt > index d11e632c..8ac78817 100644 > --- a/tests/data/test-diff-suppr/test24-soname-suppr-2.txt > +++ b/tests/data/test-diff-suppr/test24-soname-suppr-2.txt > @@ -1,4 +1,4 @@ > [suppress_type] > - file_name_regexp =3D .*/libtest24-soname-v0.so$ > + file_name_regexp =3D ^libtest24-soname-v0.so$ > name =3D S > accessed_through =3D reference-or-pointer > diff --git a/tests/data/test-diff-suppr/test24-soname-suppr-3.txt > b/tests/data/test-diff-suppr/test24-soname-suppr-3.txt > index 00d3428a..e94a6477 100644 > --- a/tests/data/test-diff-suppr/test24-soname-suppr-3.txt > +++ b/tests/data/test-diff-suppr/test24-soname-suppr-3.txt > @@ -1,4 +1,4 @@ > [suppress_type] > - file_name_regexp =3D .*/libtest24-soname-v1.so$ > + file_name_regexp =3D ^libtest24-soname-v1.so$ > name =3D S > accessed_through =3D reference-or-pointer > > > It seems to me file_suppression::suppress_file() acts on the base name > > of the file, rather than on its full path. > > > > I might be missing something here, please tell me. > > > > > so should start with (^|/) if trying to match a base name, assuming a > > > Unix-like filesystem. > > > > I'd need the above to be clarified to say for sure. > > > > > Given these are just tests, it's not that important, but they > > > still serve as examples. > > > - In cases where the ^ anchor was used, full paths would usually > > > fail to match. In such cases, the regex was being ignored for > > > other reasons (see later patch) or is expected not to match > > > anyway. > > > - In many cases, the $ anchor could be considered to be missing. > > > - The .ini parser unescapes string values, so escaping regex > > > metacharacters requires a double backslash. Single backslashes > > > are pointless. > > > - The dot metacharacter is used unescaped in a few places where a > > > literal was likely intended, so should be escaped. > > > - The characters [ and ] don't need to be (.ini) escaped. > > > > They don't *need* to (in a string) I agree. But it they are escaped, > > that should work. It's a way to test the init parser as well. So I'd > > keep the redundancy there as well. > > I'd go as far to say "\." should be "\\.". In fact, changing all the > tests to this exposes a bug in the init printer (it doesn't re-escape > strings). > > For the rest, I'm much less opinionated. :-) > > > [...] > > > > Cheers, > > Regards, > Giuliano. > > > -- > > Dodji > > > > -- > > To unsubscribe from this group and stop receiving emails from it, send = an email to kernel-team+unsubscribe@android.com. > >