* Help: match '\0' with regexec(3)
@ 2024-02-03 20:33 Dirk Gouders
2024-02-03 20:50 ` Dirk Gouders
From: Dirk Gouders @ 2024-02-03 20:33 UTC
To: libc-help
[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]
Hi,
I would like to ask for an explanation of, or a hint about, my error in
trying to use regexec(3) to match null characters ('\0').
To illustrate it, I wrote the attached test program. What I do not
understand is why I get wrong match positions when testing with a string
that contains '\0' (I am not sure whether '.' is supposed to match '\0').
Here is some "normal" output:
$ printf ".\nab\n" | ./test_regex
Compiling regex "."
Testing string "ab"...
regexec match: pos 0 length 1
"ab"
Testing string "b"...
regexec match: pos 1 length 1
"b"
Testing string ""...
But when I insert a '\0' into that string, the result is confusing to
me:
$ printf ".\na\0b\n" | ./test_regex
Compiling regex "."
Testing string "a"...
regexec match: pos 0 length 1
"a"
Testing string ""...
regexec match: pos 2 length 1
"b"
Testing string "b"...
regexec match: pos 2 length 1
"b"
Testing string ""...
My apologies in advance if I could easily have answered this question
myself by searching properly.
Regards,
Dirk
[-- Attachment #2: regexec(3) test-program --]
[-- Type: text/plain, Size: 1334 bytes --]
#include <stdlib.h>
#include <stdio.h>
#include <regex.h>

int main()
{
	int ret;

	char *line = NULL;
	char *reg_expr = NULL;
	size_t line_len = 256;
	size_t l;

	static regex_t preg;

	regmatch_t pmatch[1];

	ret = getline(&reg_expr, &line_len, stdin);

	if (ret < 1)
		exit(1);

	reg_expr[ret - 1] = '\0'; /* remove newline */

	printf("Compiling regex \"%s\"\n", reg_expr);

	/* Parenthesized so that ret holds regcomp()'s error code, not 0/1. */
	if ((ret = regcomp(&preg, reg_expr, REG_EXTENDED | REG_NEWLINE)) != 0) {
		fprintf(stderr, "regcomp() failed: %d\n", ret);
		exit(1);
	}

	while (1) {
		ret = getline(&line, &line_len, stdin);

		/* Check for EOF or error before indexing with ret. */
		if (ret < 1)
			break;

		line[ret - 1] = '\0'; /* remove newline */
		line_len = ret - 1;

		for (int i = 0; i < line_len; i += l ? l : 1) {

			/* REG_STARTEND: search the window [rm_so, rm_eo) of line + i. */
			pmatch[0].rm_so = 0;
			pmatch[0].rm_eo = line_len - i;

			printf("Testing string \"");
			for (int j = i; j < line_len; j++)
				printf("%c", line[j]);
			printf("\"...\n");

			ret = regexec(&preg, line + i, 1, pmatch, REG_NOTEOL | REG_STARTEND);

			if (ret != 0) {
				printf("No match.\n");
				break;
			} else
				printf("regexec match: pos %d length %d\n\t\"%s\"\n",
				       (int)(pmatch[0].rm_so + i),
				       (int)(pmatch[0].rm_eo - pmatch[0].rm_so),
				       line + i + pmatch[0].rm_so);

			l = pmatch[0].rm_eo - pmatch[0].rm_so;
		}
	}
}
* Re: Help: match '\0' with regexec(3)
From: Dirk Gouders @ 2024-02-03 20:50 UTC
To: libc-help
[-- Attachment #1: Type: text/plain, Size: 687 bytes --]
Hi again,
I'm very sorry: the mail had only just gone out when I found an error in
the program (corrected version attached).
This perhaps resolves my uncertainty about '.':
$ printf ".\na\0b\n" | ./test_regex
Compiling regex "."
Testing string "610062"...
regexec match: pos 0 length 1
"a"
Testing string "0062"...
regexec match: pos 2 length 1
"b"
But this expression matches '\0':
$ printf "[^\\\x01-\\\xff]\na\0b\n" | ./test_regex
Compiling regex "[^\x01-\xff]"
Testing string "610062"...
regexec match: pos 0 length 1
"a"
Testing string "0062"...
regexec match: pos 1 length 1
""
Testing string "62"...
regexec match: pos 2 length 1
"b"
Regards,
Dirk
[-- Attachment #2: regexec(3) test-program --]
[-- Type: text/plain, Size: 1749 bytes --]
#include <stdlib.h>
#include <stdio.h>
#include <regex.h>

int main()
{
	char *line = NULL;
	int ret;

	/*
	 * \b (backspace) is used to produce bold or underlined text.
	 */
	char *ref_regex = "[A-Za-z\b._-]+\\(..?\\)";

	static regex_t *preg;
	regmatch_t pmatch[1];
	char *line_ptr;
	size_t line_length = 0; /* buffer size maintained by getline() */

	if (preg == NULL) {
		preg = calloc(1, sizeof(regex_t));
		if (preg == NULL)
			exit(1);
		/* Parenthesized so that ret holds regcomp()'s error code. */
		if ((ret = regcomp(preg, ref_regex, REG_EXTENDED)) != 0) {
			fprintf(stderr, "regcomp() failed: %d\n", ret);
			exit(1);
		}
	}

	while (1) {
		/* position to next non empty line */
		while (1) {
			ret = getline(&line, &line_length, stdin);
			if (ret == -1) {
				regfree(preg);
				return 0;
			}
			/*
			 * ret is the number of bytes read, newline included;
			 * line_length is only the size of the buffer.
			 */
			if (ret > 1)
				break;
		}

		printf("%s: while finished: %d\n", __func__, ret);

		line_ptr = line;
		while (1) {
			ret = regexec(preg, line_ptr, 1, pmatch, 0);
			if (ret != 0)
				break;
			printf("regexec match \"%s\"\n", line_ptr + pmatch[0].rm_so);
			/* Continue the search after the end of this match. */
			line_ptr += pmatch[0].rm_eo;
		}
	}
}