Date: Fri, 28 Apr 2023 14:36:54 +0200
From: наб
To: libc-alpha@sourceware.org
Subject: [PATCH v3 3/3] posix: regexec(): fix REG_STARTEND, pmatch->rm_so != 0 w/^ anchor
Message-ID: <7b9d17a0ffc408cfe5da6d4aac7fbb2f8a6997b3.1682685278.git.nabijaczleweli@nabijaczleweli.xyz>
References: <9dee3d2ba84f09e883cf7a7dfcda486fda735382.1682685278.git.nabijaczleweli@nabijaczleweli.xyz>
In-Reply-To: <9dee3d2ba84f09e883cf7a7dfcda486fda735382.1682685278.git.nabijaczleweli@nabijaczleweli.xyz>

re_search_internal () starts with
  /* If initial states with non-begbuf contexts have no elements,
     the regex must be anchored.  If preg->newline_anchor is set,
     we'll never use init_state_nl, so do not check it.  */
  if (dfa->init_state->nodes.nelem == 0
      && dfa->init_state_word->nodes.nelem == 0
      && (dfa->init_state_nl->nodes.nelem == 0
          || !preg->newline_anchor))
    {
      if (start != 0 && last_start != 0)
        return REG_NOMATCH;
      start = last_start = 0;
    }
and heretofore, with REG_STARTEND and pmatch[0].rm_so != 0 (for example
"abc" with {1, 2}, so matching just the "b"), start and last_start
were both != 0, so the early return was taken for a "^b" regex,
which is erroneous.

Fix this by giving re_search_internal (string+rm_so, start=0),
then fixing up the returned matches in an after-pass.

This brings us to compatibility with the BSD spec and implementations.

Signed-off-by: Ahelenia Ziemiańska
---
Keep me in CC, please.

 posix/regexec.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/posix/regexec.c b/posix/regexec.c
index bd0cd412d0..2ef868e1f6 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -187,38 +187,53 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
    string; if REG_NOTEOL is set, then $ does not match at the end.
 
    Return 0 if a match is found, REG_NOMATCH if not, REG_BADPAT if
-   EFLAGS is invalid.  */
+   EFLAGS is invalid.
+
+   If REG_STARTEND, the bounds are
+     [STRING + PMATCH->rm_so, STRING + PMATCH->rm_eo)
+   instead of the usual
+     [STRING, STRING + strlen(STRING)),
+   but returned matches are still referenced to STRING,
+   and matching is unaffected (i.e. "abc", {1, 2} matches regex "^b$").
+   re_search_internal () has a built-in assumption of
+   (start != 0) <=> (^ doesn't match), so give it a truncated view
+   and fix up the matches afterward.  */
 
 int
 regexec (const regex_t *__restrict preg, const char *__restrict string,
          size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
 {
   reg_errcode_t err;
-  Idx start, length;
+  Idx startoff = 0, length;
   re_dfa_t *dfa = preg->buffer;
+  size_t i = 0;
 
   if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND))
     return REG_BADPAT;
 
   if (eflags & REG_STARTEND)
     {
-      start = pmatch[0].rm_so;
-      length = pmatch[0].rm_eo;
+      startoff = pmatch[0].rm_so;
+      string += startoff;
+      length = pmatch[0].rm_eo - startoff;
     }
   else
-    {
-      start = 0;
-      length = strlen (string);
-    }
+    length = strlen (string);
 
   lock_lock (dfa->lock);
   if (preg->no_sub)
-    err = re_search_internal (preg, string, length, start, length,
-                              length, 0, NULL, eflags);
-  else
-    err = re_search_internal (preg, string, length, start, length,
-                              length, nmatch, pmatch, eflags);
+    nmatch = 0;
+  err = re_search_internal (preg, string, length, 0, length,
+                            length, nmatch, pmatch, eflags);
   lock_unlock (dfa->lock);
+
+  if (err == REG_NOERROR && startoff)
+    for (i = 0; i < nmatch; ++i)
+      if (pmatch[i].rm_so != -1)
+        {
+          pmatch[i].rm_so += startoff;
+          pmatch[i].rm_eo += startoff;
+        }
   return err != REG_NOERROR;
 }
 
-- 
2.30.2
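
For illustration only: the "truncated view, then fix up the matches" idea
from the patch can be sketched in user space on top of plain regexec (),
without relying on REG_STARTEND at all. startend_match () is a hypothetical
helper name, not a glibc function. It assumes nmatch >= 1, a regex compiled
without REG_NOSUB, and no NUL bytes inside the requested range; unlike the
patch, it copies the range instead of passing a length.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): match PREG against
   [STRING + pmatch[0].rm_so, STRING + pmatch[0].rm_eo) as if that range
   were the whole string, then report offsets relative to STRING.
   Assumes nmatch >= 1 and that PREG was compiled without REG_NOSUB.
   EFLAGS may carry REG_NOTBOL / REG_NOTEOL only.  */
static int
startend_match (const regex_t *preg, const char *string, size_t nmatch,
                regmatch_t pmatch[], int eflags)
{
  /* Save the bounds first: regexec () overwrites pmatch[0] on success.  */
  regoff_t startoff = pmatch[0].rm_so;
  regoff_t endoff = pmatch[0].rm_eo;
  size_t length, i;
  char *view;
  int ret;

  assert (startoff >= 0 && endoff >= startoff);
  length = (size_t) (endoff - startoff);

  /* Truncated view: a NUL-terminated copy of the requested range, so that
     ^ and $ anchor at the range's own start and end.  */
  view = malloc (length + 1);
  if (view == NULL)
    return REG_ESPACE;
  memcpy (view, string + startoff, length);
  view[length] = '\0';

  ret = regexec (preg, view, nmatch, pmatch, eflags);
  free (view);

  /* After-pass: shift every used match back so it is referenced to STRING;
     unused subexpressions keep their -1 offsets.  */
  if (ret == 0 && startoff != 0)
    for (i = 0; i < nmatch; ++i)
      if (pmatch[i].rm_so != -1)
        {
          pmatch[i].rm_so += startoff;
          pmatch[i].rm_eo += startoff;
        }
  return ret;
}
```

With "abc" and pmatch[0] = {1, 2}, a regex compiled from "^b$" matches
through this helper and pmatch[0] comes back as {1, 2}, which is the
behaviour the patch aims for with REG_STARTEND itself.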