public inbox for libc-alpha@sourceware.org
From: наб <nabijaczleweli@nabijaczleweli.xyz>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org, Carlos O'Donell <carlos@redhat.com>
Subject: [PATCH v7 2/3] posix: regexec(): fix REG_STARTEND, pmatch->rm_so != 0 w/^ anchor
Date: Mon, 12 Jun 2023 02:47:52 +0200
Message-ID: <695cd581035b59c759477b806640ee0b70df05f7.1686530834.git.nabijaczleweli@nabijaczleweli.xyz>
In-Reply-To: <4450a8f3-3774-5bbc-ebe2-64d8a25fdacc@linaro.org>

re_search_internal () starts with:
  /* If initial states with non-begbuf contexts have no elements,
     the regex must be anchored.  If preg->newline_anchor is set,
     we'll never use init_state_nl, so do not check it.  */
  if (dfa->init_state->nodes.nelem == 0
      && dfa->init_state_word->nodes.nelem == 0
      && (dfa->init_state_nl->nodes.nelem == 0
	  || !preg->newline_anchor))
    {
      if (start != 0 && last_start != 0)
        return REG_NOMATCH;
      start = last_start = 0;
    }
Until now, with REG_STARTEND and pmatch[0].rm_so != 0 (for example
"abc", {1, 2}, which should match against just the "b"), both start and
last_start were != 0, so a "^b" regex took the early return and
erroneously reported REG_NOMATCH.
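
For reference, the case described above can be exercised with a small
helper along these lines. This is only a sketch: startend_match () is a
hypothetical name, not part of the patch or of libc, and REG_STARTEND is
an extension that not every libc defines, hence the #ifdef. The result
for an anchored pattern depends on whether the libc in use carries this
fix, so no particular output is claimed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): compile RE as a basic
   regex and run it over bytes [so, eo) of S using REG_STARTEND.
   Returns regexec ()'s result (0 on match, REG_NOMATCH otherwise),
   or -1 if RE does not compile or REG_STARTEND is unavailable.  */
int
startend_match (const char *re, const char *s, regoff_t so, regoff_t eo)
{
  assert (so >= 0 && so <= eo);
#ifdef REG_STARTEND
  regex_t preg;
  regmatch_t pm[1];

  if (regcomp (&preg, re, 0) != 0)
    return -1;

  /* With REG_STARTEND, pmatch[0] carries the bounds in and the
     match out.  */
  pm[0].rm_so = so;
  pm[0].rm_eo = eo;
  int rc = regexec (&preg, s, 1, pm, REG_STARTEND);
  regfree (&preg);
  return rc;
#else
  (void) re;
  (void) s;
  return -1;
#endif
}
```

Calling startend_match ("^b", "abc", 1, 2) is the "^b" case from the
text; an unanchored "b" over the same bounds does not go through the
early return at all.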

Fix this by passing re_search_internal () a view of the string that
begins at string + rm_so, with start = 0, then adding rm_so back to the
returned match offsets in an after-pass.
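
The same "truncated view, then fix up" idea can be sketched outside of
libc using only POSIX regexec (). This is an illustration, not the code
in the patch: regexec_slice () is a hypothetical name, and because
plain regexec () needs a NUL-terminated string, the sketch copies the
slice, whereas the patch passes a length to re_search_internal ()
instead. It also assumes the regex was not compiled with REG_NOSUB
(otherwise pass nmatch = 0) and that the slice has no embedded NUL.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match PREG against bytes [so, eo) of S as if
   that slice were the whole string, but report offsets relative to S.
   Returns 0 on match, REG_NOMATCH on no match, REG_ESPACE if the
   temporary copy cannot be allocated.  */
int
regexec_slice (const regex_t *preg, const char *s, regoff_t so,
	       regoff_t eo, size_t nmatch, regmatch_t pmatch[], int eflags)
{
  assert (so >= 0 && so <= eo);

  /* Build the truncated, NUL-terminated view.  */
  size_t len = (size_t) (eo - so);
  char *view = malloc (len + 1);
  if (view == NULL)
    return REG_ESPACE;
  memcpy (view, s + so, len);
  view[len] = '\0';

  /* Offset 0 of the view is the start of the slice, so ^ can match.  */
  int rc = regexec (preg, view, nmatch, pmatch, eflags);
  free (view);

  /* After-pass: shift every used match back into S's coordinates.  */
  if (rc == 0)
    for (size_t i = 0; i < nmatch; ++i)
      if (pmatch[i].rm_so != -1)
	{
	  pmatch[i].rm_so += so;
	  pmatch[i].rm_eo += so;
	}
  return rc;
}
```

With a regex compiled from "^b$", regexec_slice (&re, "abc", 1, 2, 1,
pm, 0) returns 0 and leaves pm[0] as {1, 2}, which is the behaviour the
comment added to regexec () below describes.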

This makes regexec () compatible with the BSD spec and implementations.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 posix/regexec.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/posix/regexec.c b/posix/regexec.c
index bd0cd412d0..2ef868e1f6 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -187,38 +187,53 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
    string; if REG_NOTEOL is set, then $ does not match at the end.
 
    Return 0 if a match is found, REG_NOMATCH if not, REG_BADPAT if
-   EFLAGS is invalid.  */
+   EFLAGS is invalid.
+
+   If REG_STARTEND, the bounds are
+     [STRING + PMATCH->rm_so, STRING + PMATCH->rm_eo)
+   instead of the usual
+     [STRING, STRING + strlen(STRING)),
+   but returned matches are still referenced to STRING,
+   and matching is unaffected (i.e. "abc", {1, 2} matches regex "^b$").
+   re_search_internal () has a built-in assumption of
+   (start != 0) <=> (^ doesn't match), so give it a truncated view
+   and fix up the matches afterward.  */
 
 int
 regexec (const regex_t *__restrict preg, const char *__restrict string,
 	 size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
 {
   reg_errcode_t err;
-  Idx start, length;
+  Idx startoff = 0, length;
   re_dfa_t *dfa = preg->buffer;
+  size_t i = 0;
 
   if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND))
     return REG_BADPAT;
 
   if (eflags & REG_STARTEND)
     {
-      start = pmatch[0].rm_so;
-      length = pmatch[0].rm_eo;
+      startoff = pmatch[0].rm_so;
+      string += startoff;
+      length = pmatch[0].rm_eo - startoff;
     }
   else
-    {
-      start = 0;
-      length = strlen (string);
-    }
+    length = strlen (string);
 
   lock_lock (dfa->lock);
   if (preg->no_sub)
-    err = re_search_internal (preg, string, length, start, length,
-			      length, 0, NULL, eflags);
-  else
-    err = re_search_internal (preg, string, length, start, length,
-			      length, nmatch, pmatch, eflags);
+    nmatch = 0;
+  err = re_search_internal (preg, string, length, 0, length,
+			    length, nmatch, pmatch, eflags);
   lock_unlock (dfa->lock);
+
+  if (err == REG_NOERROR && startoff)
+    for (i = 0; i < nmatch; ++i)
+      if (pmatch[i].rm_so != -1)
+	{
+	  pmatch[i].rm_so += startoff;
+	  pmatch[i].rm_eo += startoff;
+	}
   return err != REG_NOERROR;
 }
 
-- 
2.39.2

