From: Dirk Gouders
To: libc-help@sourceware.org
Subject: Re: Help:
match '\0' with regexec(3)
In-Reply-To: (Dirk Gouders's message of "Sat, 03 Feb 2024 21:33:42 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13)
Date: Sat, 03 Feb 2024 21:50:59 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Hi again,

I'm very sorry: the mail had already gone out when I found an error in
the program (corrected version attached).

This perhaps resolves my uncertainty about '.':

$ printf ".\na\0b\n" | ./test_regex
Compiling regex "."
Testing string "610062"...
regexec match: pos 0 length 1
        "a"
Testing string "0062"...
regexec match: pos 2 length 1
        "b"

But this expression matches '\0':

$ printf "[^\\\x01-\\\xff]\na\0b\n" | ./test_regex
Compiling regex "[^\x01-\xff]"
Testing string "610062"...
regexec match: pos 0 length 1
        "a"
Testing string "0062"...
regexec match: pos 1 length 1
        ""
Testing string "62"...
regexec match: pos 2 length 1
        "b"

Regards,

Dirk

--=-=-=
Content-Type: text/plain
Content-Disposition: inline; filename=test_regex.c
Content-Description: regexec(3) test-program

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
	char *line = NULL;
	int ret;

	/*
	 * \b (backspace) is used to produce bold or underlined text.
	 */
	char *ref_regex = "[A-Za-z\b._-]+\\(..?\\)";
	static regex_t *preg;
	regmatch_t pmatch[1];
	char *line_ptr;
	size_t line_length;

	if (preg == NULL) {
		preg = calloc(1, sizeof(regex_t));
		/* parenthesize the assignment so ret holds regcomp()'s code */
		if ((ret = regcomp(preg, ref_regex, REG_EXTENDED)) != 0) {
			fprintf(stderr, "regcomp() failed: %d\n", ret);
			exit(1);
		}
	}

	while (1) {
		/* position to next non empty line */
		while (1) {
			ret = getline(&line, &line_length, stdin);
			if (ret == -1) {
				regfree(preg);
				return 0;
			}
			if (line_length > 1)
				break;
			if (line_length < 1) {
				regfree(preg);
				return 0;
			}
		}
		printf("%s: while finished: %zu\n", __func__, line_length);
		line_ptr = line;
		while (1) {
			ret = regexec(preg, line_ptr, 1, pmatch, 0);
			if (ret != 0)
				break;
			printf("regexec match \"%s\"\n", line_ptr + pmatch[0].rm_so);
			line_ptr += pmatch[0].rm_eo;
		}
	}
}

--=-=-=
Content-Type: text/plain

Dirk Gouders writes:

> Hi,
>
> I would like to ask for an explanation or hint to my error for my
> attempt to use regexec(3) to match null-characters ('\0').
>
> To illustrate it, I wrote the attached test-program and what I do not
> understand is why I get false match-positions when testing with a string
> that contains '\0' (I am not absolutely sure if '.' is supposed to match '\0').
>
> Here is some "normal" output:
>
> $ printf ".\nab\n" | ./test_regex
> Compiling regex "."
> Testing string "ab"...
> regexec match: pos 0 length 1
>         "ab"
> Testing string "b"...
> regexec match: pos 1 length 1
>         "b"
> Testing string ""...
>
> But when I insert a '\0' into that string, the result is confusing to
> me:
>
> $ printf ".\na\0b\n" | ./test_regex
> Compiling regex "."
> Testing string "a"...
> regexec match: pos 0 length 1
>         "a"
> Testing string ""...
> regexec match: pos 2 length 1
>         "b"
> Testing string "b"...
> regexec match: pos 2 length 1
>         "b"
> Testing string ""...
>
> My apologies in advance should this question be easy to answer myself
> if I had googled it correctly.
>
> Regards,
>
> Dirk
>
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main()
> {
> 	int ret;
>
> 	char *line = NULL;
> 	char *reg_expr = NULL;
> 	size_t line_len = 256;
> 	size_t l;
>
> 	static regex_t preg;
>
> 	regmatch_t pmatch[1];
>
>
> 	ret = getline(&reg_expr, &line_len, stdin);
>
> 	if (ret < 1)
> 		exit(1);
>
> 	reg_expr[ret - 1] = '\0'; /* remove newline */
>
> 	printf("Compiling regex \"%s\"\n", reg_expr);
>
> 	if (ret = regcomp(&preg, reg_expr, REG_EXTENDED | REG_NEWLINE) != 0) {
> 		fprintf(stderr, "regcomp() failed: %d\n", ret);
> 		exit(1);
> 	}
>
>
> 	while (1) {
> 		ret = getline(&line, &line_len, stdin);
>
> 		line[ret - 1] = '\0'; /* remove newline */
> 		line_len = ret - 1;
>
> 		if (ret < 1)
> 			break;
>
> 		for (int i = 0; i < line_len; i += l ? l : 1) {
>
> 			pmatch[0].rm_so = 0;
> 			pmatch[0].rm_eo = line_len - i;
>
> 			printf("Testing string \"");
> 			for (int j = i; j < line_len; j++)
> 				printf("%c", line[j]);
> 			printf("\"...\n");
>
> 			ret = regexec(&preg, line + i, 1, pmatch, REG_NOTEOL | REG_STARTEND);
>
> 			if (ret != 0) {
> 				printf("No match.\n");
> 				break;
> 			} else
> 				printf("regexec match: pos %u length %u\n\t\"%s\"\n",
> 				       pmatch[0].rm_so + i,
> 				       pmatch[0].rm_eo - pmatch[0].rm_so,
> 				       line + i + pmatch[0].rm_so);
>
> 			l = pmatch[0].rm_eo - pmatch[0].rm_so;
> 		}
> 	}
> }

--=-=-=--