public inbox for libc-help@sourceware.org
* Help: match '\0' with regexec(3)
@ 2024-02-03 20:33 Dirk Gouders
From: Dirk Gouders @ 2024-02-03 20:33 UTC (permalink / raw)
  To: libc-help

Hi,

I would like to ask for an explanation of, or a hint about, my error in
trying to use regexec(3) to match null characters ('\0').

To illustrate it, I wrote the attached test program. What I do not
understand is why I get wrong match positions when testing with a string
that contains '\0' (I am not absolutely sure whether '.' is supposed to
match '\0').

Here is some "normal" output:

$ printf ".\nab\n" | ./test_regex
Compiling regex "."
Testing string "ab"...
regexec match: pos 0 length 1
        "ab"
Testing string "b"...
regexec match: pos 1 length 1
        "b"
Testing string ""...

But when I insert a '\0' into that string, the result is confusing to
me:

$ printf ".\na\0b\n" | ./test_regex
Compiling regex "."
Testing string "a"...
regexec match: pos 0 length 1
        "a"
Testing string ""...
regexec match: pos 2 length 1
        "b"
Testing string "b"...
regexec match: pos 2 length 1
        "b"
Testing string ""...

My apologies in advance if I could easily have answered this question
myself by googling it properly.

Regards,

Dirk


[-- Attachment #2: regexec(3) test-program --]

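#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void)
{
        int ret;
        ssize_t nread;          /* return value of getline() */

        char *line = NULL;
        char *reg_expr = NULL;
        size_t regex_cap = 0;   /* buffer capacities, managed by getline() */
        size_t line_cap = 0;
        size_t line_len;        /* length of the current line, newline removed */
        size_t l;               /* length of the last match */

        static regex_t preg;

        regmatch_t pmatch[1];

        nread = getline(&reg_expr, &regex_cap, stdin);

        if (nread < 1)
                exit(1);

        reg_expr[nread - 1] = '\0'; /* remove newline */

        printf("Compiling regex \"%s\"\n", reg_expr);

        /* Parenthesize the assignment so that ret holds regcomp()'s error code. */
        if ((ret = regcomp(&preg, reg_expr, REG_EXTENDED | REG_NEWLINE)) != 0) {
                fprintf(stderr, "regcomp() failed: %d\n", ret);
                exit(1);
        }

        while (1) {
                nread = getline(&line, &line_cap, stdin);

                /* Check for EOF or error before touching the buffer. */
                if (nread < 1)
                        break;

                line[nread - 1] = '\0'; /* remove newline */
                line_len = nread - 1;

                for (size_t i = 0; i < line_len; i += l ? l : 1) {

                        /* With REG_STARTEND, regexec() searches the bytes
                           [line + i + rm_so, line + i + rm_eo). */
                        pmatch[0].rm_so = 0;
                        pmatch[0].rm_eo = line_len - i;

                        printf("Testing string \"");
                        for (size_t j = i; j < line_len; j++)
                                printf("%c", line[j]);
                        printf("\"...\n");

                        ret = regexec(&preg, line + i, 1, pmatch, REG_NOTEOL | REG_STARTEND);

                        if (ret != 0) {
                                printf("No match.\n");
                                break;
                        }

                        /* Offsets are relative to line + i; %s stops at the first '\0'. */
                        printf("regexec match: pos %zu length %d\n\t\"%s\"\n",
                               i + (size_t)pmatch[0].rm_so,
                               (int)(pmatch[0].rm_eo - pmatch[0].rm_so),
                               line + i + pmatch[0].rm_so);

                        l = pmatch[0].rm_eo - pmatch[0].rm_so;
                }
        }

        regfree(&preg);
        free(reg_expr);
        free(line);
        return 0;
}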
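
A smaller sketch may help to separate the two questions in this mail:
whether REG_STARTEND lets regexec(3) search past an embedded '\0', and
whether '.' itself matches '\0'. The snippet below only looks at the first
one. The helper names search_range() and find_b_after_nul() are
hypothetical and not part of the attached program, and REG_STARTEND is
assumed to be available (it is a non-POSIX extension found in glibc and the
BSDs). As far as I can tell, POSIX describes '.' as matching any character
except NUL, which could explain a match that skips over the '\0', but I
have not verified how a particular libc implements that.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: search the bytes buf[start..end) for the compiled
 * regex. Assumes REG_STARTEND is supported. Returns the regexec() result;
 * on success, m holds offsets relative to buf (not to buf + start).
 */
static int search_range(const regex_t *re, const char *buf,
                        size_t start, size_t end, regmatch_t *m)
{
        m->rm_so = (regoff_t)start;
        m->rm_eo = (regoff_t)end;
        return regexec(re, buf, 1, m, REG_STARTEND);
}

/*
 * Look for the literal "b" in the three bytes 'a', '\0', 'b'.
 * Returns the start offset of the match, or -1 if nothing was found
 * or the regex could not be compiled.
 */
long find_b_after_nul(void)
{
        static const char buf[] = "a\0b";   /* 3 data bytes + terminator */
        const size_t len = 3;
        regex_t re;
        regmatch_t m;
        long pos = -1;

        if (regcomp(&re, "b", REG_EXTENDED) != 0)
                return -1;

        /* Without REG_STARTEND the search would stop at the '\0'. */
        if (search_range(&re, buf, 0, len, &m) == 0)
                pos = (long)m.rm_so;

        regfree(&re);
        return pos;
}
```

Calling find_b_after_nul() from a small main() and printing the result
shows whether the 'b' behind the '\0' is reachable. Replacing the pattern
"b" with "." and starting the search at offset 1 would then show how the
library treats '.' at the position of the '\0'.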
