public inbox for libc-hacker@sourceware.org
* Fix wrong implementation of \B (BZ #693)
From: Paolo Bonzini @ 2005-01-26 19:02 UTC (permalink / raw)
  To: libc-hacker; +Cc: karl, arnold, drepper

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

This is pretty easy to do, as all the code is already there to implement
\b.  Just like \b is lowered to \<|\>, \B is lowered to the disjunction
of two (otherwise unavailable) constraints, "inside word" and "outside
word".
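To make the lowering concrete, here is a minimal standalone sketch of the
idea.  It is not the regcomp.c code: the bit values, the helper names
(context_at, satisfies, match_word_delim, match_not_word_delim) and the
treatment of buffer edges as non-word characters are assumptions made for
illustration.  Only the four combined constraint names come from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative bit values only; the real ones are in regex_internal.h.  */
enum
{
  PREV_WORD = 1 << 0,
  PREV_NOTWORD = 1 << 1,
  NEXT_WORD = 1 << 2,
  NEXT_NOTWORD = 1 << 3
};

/* Combined constraints, named as in the patch.  */
enum
{
  INSIDE_WORD = PREV_WORD | NEXT_WORD,
  OUTSIDE_WORD = PREV_NOTWORD | NEXT_NOTWORD,
  WORD_FIRST = PREV_NOTWORD | NEXT_WORD,
  WORD_LAST = PREV_WORD | NEXT_NOTWORD
};

int
is_word_char (int c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Context bits that hold at offset POS of S.  Assumption: the buffer
   edges behave like non-word characters.  */
int
context_at (const char *s, size_t pos)
{
  size_t len = strlen (s);
  int prev = pos > 0 && is_word_char (s[pos - 1]);
  int next = pos < len && is_word_char (s[pos]);
  return (prev ? PREV_WORD : PREV_NOTWORD)
	 | (next ? NEXT_WORD : NEXT_NOTWORD);
}

/* A constraint is satisfied when all of its bits hold in CONTEXT.  */
int
satisfies (int context, int constraint)
{
  return (context & constraint) == constraint;
}

/* \b lowered to the alternation WORD_FIRST | WORD_LAST.  */
int
match_word_delim (const char *s, size_t pos)
{
  int ctx = context_at (s, pos);
  return satisfies (ctx, WORD_FIRST) || satisfies (ctx, WORD_LAST);
}

/* \B lowered to the alternation INSIDE_WORD | OUTSIDE_WORD.  */
int
match_not_word_delim (const char *s, size_t pos)
{
  int ctx = context_at (s, pos);
  return satisfies (ctx, INSIDE_WORD) || satisfies (ctx, OUTSIDE_WORD);
}

/* In this model exactly one of \b and \B holds at every offset.  */
void
check_partition (const char *s)
{
  size_t pos, len = strlen (s);
  for (pos = 0; pos <= len; pos++)
    assert (match_word_delim (s, pos) != match_not_word_delim (s, pos));
}
```

In this model, offsets 2 and 3 of "x.!?y" are "outside word" positions.
They satisfy \B only because of the OUTSIDE_WORD alternative; with
INSIDE_WORD alone they would not.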

The patch is on top of the BZ #605 and BZ #611 patches.  (You were waiting
for the BZ #605 changelog; it can be found in the bugzilla audit trail,
as well as in the ping I sent you by private mail :-)

Once my patch queue is empty (apart from these, there are just a couple
more patches), I'll backport this to 2.3 unless someone beats me to it.

Paolo

[-- Attachment #2: 693-fix-slashB.patch --]
[-- Type: text/plain, Size: 2576 bytes --]

2005-01-26  Paolo Bonzini  <bonzini@gnu.org>

	* posix/regcomp.c (peek_token): Fix ctx_type for \B.
	(parse_expression): Lower token->opr.ctx_type == NOT_WORD_DELIM.
	* posix/regex_internal.h (re_context_type): Add NOT_WORD_DELIM
	and OUTSIDE_WORD.
	* posix/PCRE.tests: Adjust \B tests to check if it matches outside
	a word.


--- orig/posix/regcomp.c
+++ mod/posix/regcomp.c
@@ -1864,7 +1864,7 @@ peek_token (token, input, syntax)
 	  if (!(syntax & RE_NO_GNU_OPS))
 	    {
 	      token->type = ANCHOR;
-	      token->opr.ctx_type = INSIDE_WORD;
+	      token->opr.ctx_type = NOT_WORD_DELIM;
 	    }
 	  break;
 	case 'w':
@@ -2352,15 +2352,16 @@ parse_expression (regexp, preg, token, s
       break;
     case ANCHOR:
       if ((token->opr.ctx_type
-	   & (WORD_DELIM | INSIDE_WORD | WORD_FIRST | WORD_LAST))
+	   & (WORD_DELIM | NOT_WORD_DELIM | WORD_FIRST | WORD_LAST))
 	  && dfa->word_ops_used == 0)
 	init_word_char (dfa);
-      if (token->opr.ctx_type == WORD_DELIM)
+      if (token->opr.ctx_type >= DUMMY_CONSTRAINT)
 	{
+	  int delim = (token->opr.ctx_type == WORD_DELIM);
 	  bin_tree_t *tree_first, *tree_last;
-	  token->opr.ctx_type = WORD_FIRST;
+	  token->opr.ctx_type = delim ? WORD_FIRST : INSIDE_WORD;
 	  tree_first = create_token_tree (dfa, NULL, NULL, token);
-	  token->opr.ctx_type = WORD_LAST;
+	  token->opr.ctx_type = delim ? WORD_LAST : OUTSIDE_WORD;
 	  tree_last = create_token_tree (dfa, NULL, NULL, token);
 	  tree = create_tree (dfa, tree_first, tree_last, OP_ALT);
 	  if (BE (tree_first == NULL || tree_last == NULL || tree == NULL, 0))


--- orig/posix/regex_internal.h
+++ mod/posix/regex_internal.h
@@ -148,13 +148,15 @@ static inline void bitset_mask (bitset d
 typedef enum
 {
   INSIDE_WORD = PREV_WORD_CONSTRAINT | NEXT_WORD_CONSTRAINT,
+  OUTSIDE_WORD = PREV_NOTWORD_CONSTRAINT | NEXT_NOTWORD_CONSTRAINT,
   WORD_FIRST = PREV_NOTWORD_CONSTRAINT | NEXT_WORD_CONSTRAINT,
   WORD_LAST = PREV_WORD_CONSTRAINT | NEXT_NOTWORD_CONSTRAINT,
   LINE_FIRST = PREV_NEWLINE_CONSTRAINT,
   LINE_LAST = NEXT_NEWLINE_CONSTRAINT,
   BUF_FIRST = PREV_BEGBUF_CONSTRAINT,
   BUF_LAST = NEXT_ENDBUF_CONSTRAINT,
-  WORD_DELIM = DUMMY_CONSTRAINT
+  WORD_DELIM = DUMMY_CONSTRAINT,
+  NOT_WORD_DELIM = DUMMY_CONSTRAINT << 1,
 } re_context_type;
 
 typedef struct


--- orig/posix/PCRE.tests
+++ mod/posix/PCRE.tests
@@ -1420,17 +1420,23 @@ No match
     -a-
 No match
 
-/\By\b/
+/\B.\b/
     xy
  0: y
+    x.!?y
+ 0: ?
 
-/\by\B/
+/\b.\B/
     yz
  0: y
+    x.!?y
+ 0: .
 
-/\By\B/
+/\B.\B/
     xyz
  0: y
+    x.!?y
+ 0: !
 
 /\w/
     a



* Re: Fix wrong implementation of \B (BZ #693)
From: Jakub Jelinek @ 2005-01-26 19:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: libc-hacker, karl, arnold, drepper

On Wed, Jan 26, 2005 at 08:02:15PM +0100, Paolo Bonzini wrote:
> This is pretty easy to do, as all the code is already there to implement
> \b.  Just like \b is lowered to \<|\>, \B is lowered to the disjunction
> of two (otherwise unavailable) constraints, "inside word" and "outside
> word".

Heh, I posted a similar patch earlier today.  I wonder how bug-regex19.c
could pass for you though; there were several tests in it that expected
(\b|\B) not to be the same as nothing.

	Jakub
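
For anyone who wants to check the behaviour from user space, here is a
rough sketch that runs the three adjusted PCRE.tests patterns through the
POSIX regcomp/regexec interface.  The helper names match_start and
check_not_word_delim are made up for this example.  \b and \B are GNU
extensions, so the expected offsets assume a libc whose regex
implementation supports them and includes a fix along these lines.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the start offset of the first match of PATTERN in SUBJECT,
   or -1 if there is no match or the pattern fails to compile.  */
long
match_start (const char *pattern, const char *subject)
{
  regex_t re;
  regmatch_t m[1];
  long start = -1;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, subject, 1, m, 0) == 0)
    start = (long) m[0].rm_so;
  regfree (&re);
  return start;
}

/* Expected results taken from the PCRE.tests hunk in the patch:
   "?" is at offset 3, "." at offset 1 and "!" at offset 2.  */
void
check_not_word_delim (void)
{
  assert (match_start ("\\B.\\b", "x.!?y") == 3);
  assert (match_start ("\\b.\\B", "x.!?y") == 1);
  assert (match_start ("\\B.\\B", "x.!?y") == 2);
}
```

If \B only meant "inside word", none of these would match where the
tests expect, because every \B position in "x.!?y" has a non-word
character on both sides.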
