Date: Mon, 8 May 2023 00:56:18 +0200
From: наб
Cc: libc-alpha@sourceware.org
Subject: [PATCH v4 3/3] posix: regexec(): fix REG_STARTEND, pmatch->rm_so != 0 w/^ anchor
References: <1d5642ecb4bb477c9fd7e1ebaee868fe4ccbefc7.1683500149.git.nabijaczleweli@nabijaczleweli.xyz>
In-Reply-To: <1d5642ecb4bb477c9fd7e1ebaee868fe4ccbefc7.1683500149.git.nabijaczleweli@nabijaczleweli.xyz>
User-Agent: NeoMutt/20230407

re_search_internal () starts with
  /* If initial states with non-begbuf contexts have no elements,
     the regex must be anchored. If preg->newline_anchor is set,
     we'll never use init_state_nl, so do not check it. */
  if (dfa->init_state->nodes.nelem == 0
      && dfa->init_state_word->nodes.nelem == 0
      && (dfa->init_state_nl->nodes.nelem == 0
          || !preg->newline_anchor))
    {
      if (start != 0 && last_start != 0)
        return REG_NOMATCH;
      start = last_start = 0;
    }
and heretofore, under REG_STARTEND with a non-zero pmatch->rm_so,
start and last_start were != 0 (for example with "abc", {1, 2},
so matching just the "b"), so the early return was taken for a "^b"
regex, which is erroneous.

Fix this by giving re_search_internal (string+rm_so, start=0),
then fixing up the returned matches in an after-pass.

This brings us to compatibility with the BSD spec and implementations.

Signed-off-by: Ahelenia Ziemiańska
---
 posix/regexec.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/posix/regexec.c b/posix/regexec.c
index bd0cd412d0..2ef868e1f6 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -187,38 +187,53 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
    string; if REG_NOTEOL is set, then $ does not match at the end.
 
    Return 0 if a match is found, REG_NOMATCH if not, REG_BADPAT if
-   EFLAGS is invalid. */
+   EFLAGS is invalid.
+
+   If REG_STARTEND, the bounds are
+     [STRING + PMATCH->rm_so, STRING + PMATCH->rm_eo)
+   instead of the usual
+     [STRING, STRING + strlen(STRING)),
+   but returned matches are still referenced to STRING,
+   and matching is unaffected (i.e. "abc", {1, 2} matches regex "^b$").
+   re_search_internal () has a built-in assumption of
+   (start != 0) <=> (^ doesn't match), so give it a truncated view
+   and fix up the matches afterward. */
 
 int
 regexec (const regex_t *__restrict preg, const char *__restrict string,
          size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
 {
   reg_errcode_t err;
-  Idx start, length;
+  Idx startoff = 0, length;
   re_dfa_t *dfa = preg->buffer;
+  size_t i = 0;
 
   if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND))
     return REG_BADPAT;
 
   if (eflags & REG_STARTEND)
     {
-      start = pmatch[0].rm_so;
-      length = pmatch[0].rm_eo;
+      startoff = pmatch[0].rm_so;
+      string += startoff;
+      length = pmatch[0].rm_eo - startoff;
     }
   else
-    {
-      start = 0;
-      length = strlen (string);
-    }
+    length = strlen (string);
 
   lock_lock (dfa->lock);
   if (preg->no_sub)
-    err = re_search_internal (preg, string, length, start, length,
-                              length, 0, NULL, eflags);
-  else
-    err = re_search_internal (preg, string, length, start, length,
-                              length, nmatch, pmatch, eflags);
+    nmatch = 0;
+  err = re_search_internal (preg, string, length, 0, length,
+                            length, nmatch, pmatch, eflags);
   lock_unlock (dfa->lock);
+
+  if (err == REG_NOERROR && startoff)
+    for (i = 0; i < nmatch; ++i)
+      if (pmatch[i].rm_so != -1)
+        {
+          pmatch[i].rm_so += startoff;
+          pmatch[i].rm_eo += startoff;
+        }
   return err != REG_NOERROR;
 }
 
-- 
2.30.2
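
For illustration only, the "truncated view, then fix up the matches" idea
can be sketched outside of libc. This is not the code from the patch:
regexec_startend_compat () and demo_startend_compat () are hypothetical
names made up for this example. The sketch uses only the portable
regcomp ()/regexec () interface, copies the requested range into a
NUL-terminated buffer instead of passing a length, and assumes the regex
was compiled without REG_NOSUB, that eflags holds at most
REG_NOTBOL | REG_NOTEOL, and that the range contains no embedded NUL bytes.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of any libc: emulate REG_STARTEND by
   matching a NUL-terminated copy of
     [STRING + PMATCH[0].rm_so, STRING + PMATCH[0].rm_eo)
   and then rebasing the reported offsets onto STRING.
   Assumptions: PREG was compiled without REG_NOSUB, PMATCH has at least
   one element, and EFLAGS does not itself contain REG_STARTEND.  */
static int
regexec_startend_compat (const regex_t *preg, const char *string,
                         size_t nmatch, regmatch_t pmatch[], int eflags)
{
  regoff_t startoff = pmatch[0].rm_so;
  regoff_t len = pmatch[0].rm_eo - startoff;
  if (startoff < 0 || len < 0)
    return REG_NOMATCH;

  /* Truncated view: the matcher sees offset 0 as the start of the
     buffer, so ^ can match there.  */
  char *view = malloc ((size_t) len + 1);
  if (view == NULL)
    return REG_ESPACE;
  memcpy (view, string + startoff, (size_t) len);
  view[len] = '\0';

  int err = regexec (preg, view, nmatch, pmatch, eflags);
  free (view);

  /* After-pass: make the offsets relative to STRING again, skipping
     subexpressions that did not participate in the match.  */
  if (err == 0 && startoff != 0)
    for (size_t i = 0; i < nmatch; ++i)
      if (pmatch[i].rm_so != -1)
        {
          pmatch[i].rm_so += startoff;
          pmatch[i].rm_eo += startoff;
        }
  return err;
}

/* Returns 1 if "^b$" matches just the "b" of "abc" (range {1, 2}) and
   the match is reported at offsets [1, 2) of the original string,
   0 if not, -1 if the pattern fails to compile.  */
static int
demo_startend_compat (void)
{
  regex_t re;
  if (regcomp (&re, "^b$", 0) != 0)
    return -1;
  regmatch_t m[1] = { { .rm_so = 1, .rm_eo = 2 } };
  int err = regexec_startend_compat (&re, "abc", 1, m, 0);
  regfree (&re);
  return err == 0 && m[0].rm_so == 1 && m[0].rm_eo == 2;
}
```

The copy costs an allocation per call, which regexec () itself avoids by
passing a length down to re_search_internal (); the sketch is only meant
to show the semantics the commit message describes for "abc", {1, 2}.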