From: наб
Date: Mon, 29 May 2023 15:22:48 +0200
Cc: libc-alpha@sourceware.org
Subject: [PATCH v5 3/3] posix: regexec(): fix REG_STARTEND, pmatch->rm_so != 0 w/^ anchor
References: <1d5642ecb4bb477c9fd7e1ebaee868fe4ccbefc7.1683500149.git.nabijaczleweli@nabijaczleweli.xyz>
In-Reply-To: <1d5642ecb4bb477c9fd7e1ebaee868fe4ccbefc7.1683500149.git.nabijaczleweli@nabijaczleweli.xyz>

re_search_internal () starts with
  /* If initial states with non-begbuf contexts have no elements,
     the regex must be anchored.  If preg->newline_anchor is set,
     we'll never use init_state_nl, so do not check it.  */
  if (dfa->init_state->nodes.nelem == 0
      && dfa->init_state_word->nodes.nelem == 0
      && (dfa->init_state_nl->nodes.nelem == 0
          || !preg->newline_anchor))
    {
      if (start != 0 && last_start != 0)
        return REG_NOMATCH;
      start = last_start = 0;
    }

and heretofore start and last_start (for example when "abc", {1, 2},
so matching just the "b") were != 0, and the return was taken for a "^b"
regex, which is erroneous.

Fix this by giving re_search_internal (string+rm_so, start=0),
then fixing up the returned matches in an after-pass.

This brings us to compatibility with the BSD spec and implementations.

Signed-off-by: Ahelenia Ziemiańska
---
 posix/regexec.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/posix/regexec.c b/posix/regexec.c
index bd0cd412d0..2ef868e1f6 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -187,38 +187,53 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
    string; if REG_NOTEOL is set, then $ does not match at the end.
 
    Return 0 if a match is found, REG_NOMATCH if not, REG_BADPAT if
-   EFLAGS is invalid.  */
+   EFLAGS is invalid.
+
+   If REG_STARTEND, the bounds are
+     [STRING + PMATCH->rm_so, STRING + PMATCH->rm_eo)
+   instead of the usual
+     [STRING, STRING + strlen(STRING)),
+   but returned matches are still referenced to STRING,
+   and matching is unaffected (i.e. "abc", {1, 2} matches regex "^b$").
+   re_search_internal () has a built-in assumption of
+   (start != 0) <=> (^ doesn't match), so give it a truncated view
+   and fix up the matches afterward.  */
 
 int
 regexec (const regex_t *__restrict preg, const char *__restrict string,
          size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
 {
   reg_errcode_t err;
-  Idx start, length;
+  Idx startoff = 0, length;
   re_dfa_t *dfa = preg->buffer;
+  size_t i = 0;
 
   if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND))
     return REG_BADPAT;
 
   if (eflags & REG_STARTEND)
     {
-      start = pmatch[0].rm_so;
-      length = pmatch[0].rm_eo;
+      startoff = pmatch[0].rm_so;
+      string += startoff;
+      length = pmatch[0].rm_eo - startoff;
     }
   else
-    {
-      start = 0;
-      length = strlen (string);
-    }
+    length = strlen (string);
 
   lock_lock (dfa->lock);
   if (preg->no_sub)
-    err = re_search_internal (preg, string, length, start, length,
-                              length, 0, NULL, eflags);
-  else
-    err = re_search_internal (preg, string, length, start, length,
-                              length, nmatch, pmatch, eflags);
+    nmatch = 0;
+  err = re_search_internal (preg, string, length, 0, length,
+                            length, nmatch, pmatch, eflags);
   lock_unlock (dfa->lock);
+
+  if (err == REG_NOERROR && startoff)
+    for (i = 0; i < nmatch; ++i)
+      if (pmatch[i].rm_so != -1)
+        {
+          pmatch[i].rm_so += startoff;
+          pmatch[i].rm_eo += startoff;
+        }
   return err != REG_NOERROR;
 }
 
-- 
2.30.2
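
For illustration only, here is a rough userspace sketch of the same "truncated view, then fix up the matches" idea. It is not the glibc code and does not use REG_STARTEND at all. The names match_range and demo_abc are made up for this example. The sketch assumes 0 <= so <= eo <= strlen (string), that the range holds no embedded NUL bytes, that eflags contains only REG_NOTBOL and/or REG_NOTEOL, and that the caller passes nmatch = 0 for a pattern compiled with REG_NOSUB.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): match PREG against the
   range [string + so, string + eo) and report offsets relative to
   STRING.  Assumes 0 <= so <= eo and no embedded NULs in the range.  */
static int
match_range (const regex_t *preg, const char *string,
             regoff_t so, regoff_t eo,
             size_t nmatch, regmatch_t *pmatch, int eflags)
{
  size_t len = (size_t) (eo - so);

  /* Plain regexec () needs a NUL-terminated string, so copy the range.
     The copy is the "truncated view": ^ sees offset 0, $ sees the end.  */
  char *view = malloc (len + 1);
  if (view == NULL)
    return REG_ESPACE;
  memcpy (view, string + so, len);
  view[len] = '\0';

  int err = regexec (preg, view, nmatch, pmatch, eflags);
  free (view);

  /* After-pass: shift every reported match back into STRING's frame.
     Unused subexpression slots are marked with rm_so == -1; skip them.  */
  if (err == 0 && so != 0)
    for (size_t i = 0; i < nmatch; ++i)
      if (pmatch[i].rm_so != -1)
        {
          pmatch[i].rm_so += so;
          pmatch[i].rm_eo += so;
        }
  return err;
}

/* The case from the patch description: "abc", {1, 2}, regex "^b$".
   Returns 0 when the range matches, otherwise a regex error code
   (or -1 if the pattern failed to compile).  */
static int
demo_abc (void)
{
  regex_t re;
  regmatch_t m[1];

  if (regcomp (&re, "^b$", 0) != 0)
    return -1;
  int err = match_range (&re, "abc", 1, 2, 1, m, 0);
  regfree (&re);
  if (err != 0)
    return err;

  /* The match is reported relative to "abc", not to the view "b".  */
  assert (m[0].rm_so == 1 && m[0].rm_eo == 2);
  return 0;
}
```

Copying the range costs an allocation that the patch avoids by handing re_search_internal () a pointer and a length directly. The copy is used here only so that the sketch behaves the same on any POSIX regexec ().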