From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dirk Gouders
To: libc-help@sourceware.org
Subject: Help: match '\0' with regexec(3)
User-Agent: Gnus/5.13 (Gnus v5.13)
Date: Sat, 03 Feb 2024 21:33:42 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Hi,

I would like to ask for an explanation of, or a hint about, what I am
doing wrong when I try to use regexec(3) to match null characters
('\0').

To illustrate the problem, I wrote the attached test program. What I do
not understand is why I get wrong match positions when testing a string
that contains '\0' (I am not absolutely sure whether '.' is supposed to
match '\0').

Here is some "normal" output:

$ printf ".\nab\n" | ./test_regex
Compiling regex "."
Testing string "ab"...
regexec match: pos 0 length 1
	"ab"
Testing string "b"...
regexec match: pos 1 length 1
	"b"
Testing string ""...

But when I insert a '\0' into that string, the result confuses me:

$ printf ".\na\0b\n" | ./test_regex
Compiling regex "."
Testing string "a"...
regexec match: pos 0 length 1
	"a"
Testing string ""...
regexec match: pos 2 length 1
	"b"
Testing string "b"...
regexec match: pos 2 length 1
	"b"
Testing string ""...

My apologies in advance if I could easily have answered this question
myself by searching more carefully.

Regards,

Dirk

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment; filename=test_regex.c
Content-Description: regexec(3) test-program

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

int main(void)
{
	int ret;
	ssize_t nread;
	char *line = NULL;
	char *reg_expr = NULL;
	size_t reg_cap = 0;	/* buffer capacities, managed by getline() */
	size_t line_cap = 0;
	size_t line_len;
	size_t l;
	static regex_t preg;
	regmatch_t pmatch[1];

	/* The first input line is the regular expression. */
	nread = getline(&reg_expr, &reg_cap, stdin);
	if (nread < 1)
		exit(1);
	reg_expr[nread - 1] = '\0';	/* remove newline */

	printf("Compiling regex \"%s\"\n", reg_expr);
	if ((ret = regcomp(&preg, reg_expr, REG_EXTENDED | REG_NEWLINE)) != 0) {
		fprintf(stderr, "regcomp() failed: %d\n", ret);
		exit(1);
	}

	/* Every following line is a test string; it may contain '\0'. */
	while (1) {
		nread = getline(&line, &line_cap, stdin);
		if (nread < 1)	/* check for EOF before touching the buffer */
			break;
		line[nread - 1] = '\0';	/* remove newline */
		line_len = nread - 1;

		/* Advance by the length of the last match (at least 1). */
		for (size_t i = 0; i < line_len; i += l ? l : 1) {
			/* REG_STARTEND: search line[i .. line_len) */
			pmatch[0].rm_so = 0;
			pmatch[0].rm_eo = line_len - i;

			printf("Testing string \"");
			for (size_t j = i; j < line_len; j++)
				printf("%c", line[j]);
			printf("\"...\n");

			ret = regexec(&preg, line + i, 1, pmatch,
				      REG_NOTEOL | REG_STARTEND);
			if (ret != 0) {
				printf("No match.\n");
				break;
			}
			printf("regexec match: pos %d length %d\n\t\"%s\"\n",
			       (int)(pmatch[0].rm_so + i),
			       (int)(pmatch[0].rm_eo - pmatch[0].rm_so),
			       line + i + pmatch[0].rm_so);
			l = pmatch[0].rm_eo - pmatch[0].rm_so;
		}
	}

	regfree(&preg);
	free(line);
	free(reg_expr);
	return 0;
}
--=-=-=--
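
For comparison, here is a minimal sketch (not part of the original attachment) of one way to walk a buffer that contains '\0' bytes with REG_STARTEND. It assumes the glibc/BSD behavior in which the offsets returned in pmatch are relative to the start of the string passed to regexec(). It resumes each search at the end of the previous match instead of advancing by the match length. The function name find_matches and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: store up to max_out match start offsets of
 * `pattern` within buf[0 .. len), where buf may contain '\0' bytes.
 * Returns the number of offsets stored, or -1 if regcomp() fails.
 * Assumption: regexec() supports the REG_STARTEND extension.
 */
int find_matches(const char *pattern, const char *buf, size_t len,
                 regoff_t *out, int max_out)
{
    regex_t preg;
    regmatch_t m[1];
    size_t pos = 0;
    int n = 0;

    assert(pattern != NULL && buf != NULL && out != NULL);

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    while (pos < len && n < max_out) {
        /* With REG_STARTEND the search window is buf[rm_so .. rm_eo). */
        m[0].rm_so = (regoff_t)pos;
        m[0].rm_eo = (regoff_t)len;
        if (regexec(&preg, buf, 1, m, REG_STARTEND) != 0)
            break;

        /* Assumed: offsets are relative to buf, not to the window start. */
        out[n++] = m[0].rm_so;

        /* Resume after the end of the match; step one byte if it was empty. */
        if (m[0].rm_eo > m[0].rm_so)
            pos = (size_t)m[0].rm_eo;
        else
            pos = (size_t)m[0].rm_eo + 1;
    }

    regfree(&preg);
    return n;
}
```

Called as find_matches("[ab]", "a\0b", 3, out, 4), this should store the offsets 0 and 2 under the assumptions above. Whether '.' itself matches '\0' appears to depend on the implementation, so the sketch deliberately uses a bracket expression instead.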