From: Alejandro Colomar
To: наб
Cc: GNU C Library, Siddhesh Poyarekar
Subject: Re: regexec(3): REG_STARTEND is not documented
Date: Fri, 21 Apr 2023 03:28:42 +0200
Message-ID: <2f3a3aa5-9e01-8f46-7b98-de03cf304aad@gmail.com>
References:
 <0de87674-1b35-8dc8-7d2b-8dacd6b015ff@gmail.com>

Hi наб!

On 4/21/23 03:15, наб wrote:
> On Fri, Apr 21, 2023 at 03:07:00AM +0200, Alejandro Colomar wrote:
>> On 4/21/23 02:45, Alejandro Colomar wrote:
>>> Is the following call valid, or is it UB?
>>>
>>>     regmatch_t pmatch = {
>>>         .rm_so = string,
>>>         .rm_eo = string + 42,  // Assume this offset is valid
>>>     };
>>>     regexec(preg, string, 0, pmatch, REG_NOSUB | REG_STARTEND);
>>>
>>> How about this?
>>>
>>>     regexec(preg, string, 999, pmatch, REG_NOSUB | REG_STARTEND);
> (If you make that "&pmatch",
> and put the REG_NOSUB into a preceding regcomp(), my bet is on "valid".)

D'oh!  I should check what I write before putting it in a bottle.
Yeah, I meant that, or at least should have meant that :)

>
>>> Current implementations will work, because nmatch is effectively
>>> ignored.  But is it intended to be this way, or just an implementation
>>> detail?
> My bet is on "intended", quoth 4.4BSD-Lite regex(3):
>     REG_STARTEND   The string is considered to start at string +
>                    pmatch[0].rm_so and to have a terminating NUL located at
>                    string + pmatch[0].rm_eo (there need not actually be a
>                    NUL at that location), regardless of the value of nmatch.
>                    See below for the definition of pmatch and nmatch.  This
>                    is an extension, compatible with but not specified by
>                    POSIX 1003.2, and should be used with caution in software
>                    intended to be portable to other systems.  Note that a
>                    non-zero rm_so does not imply REG_NOTBOL; REG_STARTEND
>                    affects only the location of the string, not how it is
>                    matched.

While this paragraph is not crystal clear to me, the one you quoted below
pretty much is.
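
For reference, the corrected form of the first call (pass &pmatch, and give
REG_NOSUB to regcomp() rather than regexec()) might look roughly like the
sketch below.  The helper name match_in_range() and its parameters are made
up for illustration, and it assumes an implementation that provides
REG_STARTEND (glibc and the BSDs do, as far as I know).  It also fills
rm_so and rm_eo with offsets into string rather than pointers, which is
what regoff_t holds.

```c
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regex implementation that provides REG_STARTEND"
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if pattern matches somewhere inside string[start .. end),
 * 0 if it does not, and -1 on any error.
 */
static int
match_in_range(const char *pattern, const char *string,
               size_t start, size_t end)
{
    regex_t     preg;
    regmatch_t  pmatch;
    int         ret;

    /* REG_NOSUB is a compilation flag, so it goes to regcomp(). */
    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* rm_so and rm_eo are offsets from the start of string. */
    pmatch.rm_so = (regoff_t) start;
    pmatch.rm_eo = (regoff_t) end;

    /*
     * nmatch is 0 here, but with REG_STARTEND the implementation still
     * reads pmatch[0] to learn where the string starts and ends.
     */
    ret = regexec(&preg, string, 0, &pmatch, REG_STARTEND);
    regfree(&preg);

    if (ret == 0)
        return 1;
    return (ret == REG_NOMATCH) ? 0 : -1;
}
```

With that helper, match_in_range("bar", "foo bar baz", 4, 7) should report
a match, while the range 0..5 (which only covers "foo b") should not.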
>
>> Here's a related question:
>>
>>     regmatch_t pmatch = {
>>         .rm_so = string,
>>         .rm_eo = string + 42,  // Assume this offset is valid
>>     };
>>     regexec(preg, string, 0, pmatch, REG_STARTEND);
>>
>> Should regexec(3) write to the 1st element in pmatch[] because it knows
>> it exists (otherwise the call would be UB because it needs to read it)?
> (Which would run counter to how POSIX defines the API.)
>
>> Or is passing 0 in nmatch effectively another way of performing
>> REG_NOSUB behavior without actually using the flag?
> Hilariously enough, quoth 4.4BSD-Lite regex(3) again,
> which phrases it exactly like you do:
>     If REG_NOSUB was specified in the compilation of the RE, or if nmatch
>     is 0, regexec ignores the pmatch argument (but see below for the case
>     where REG_STARTEND is specified).

Touché; it looks like you're right.  That sentence is unambiguous.  BTW, is
the reference to some other text about REG_STARTEND the one quoted first
(above)?

Cheers,
Alex

>     Otherwise, pmatch points to an array
>     of nmatch structures of type regmatch_t.  Such a structure has at least
>     the members rm_so and rm_eo, both of type regoff_t (a signed arithmetic
>     type at least as large as an off_t and a ssize_t), containing
>     respectively the offset of the first character of a substring and the
>     offset of the first character after the end of the substring.  Offsets
>     are measured from the beginning of the string argument given to regexec.
>     An empty substring is denoted by equal offsets, both indicating the
>     character following the empty substring.
> (you know how I'm betting here).

:)

>
> наб

-- 
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
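
A footnote on the regoff_t paragraph quoted above: when nmatch is non-zero
and REG_NOSUB was not used, the offsets written back should be measured from
the beginning of the string argument, not from string + rm_so.  The sketch
below shows one way to check that.  The helper name find_in_range() is
hypothetical, and the behavior described is my understanding of glibc and
the BSD implementations, not something POSIX specifies.

```c
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regex implementation that provides REG_STARTEND"
#endif

/*
 * Hypothetical helper, for illustration only.
 * Searches string[start .. end) for pattern.  On a match, returns 0 and
 * leaves the match offsets in *out; otherwise returns the regexec() error
 * code (REG_NOMATCH if nothing matched), or -1 if compilation failed.
 */
static int
find_in_range(const char *pattern, const char *string,
              size_t start, size_t end, regmatch_t *out)
{
    regex_t  preg;
    int      ret;

    /* No REG_NOSUB here: we want the match offsets reported. */
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    /* On input, pmatch[0] delimits the part of string to search. */
    out->rm_so = (regoff_t) start;
    out->rm_eo = (regoff_t) end;

    /* On output, pmatch[0] is overwritten with the offsets of the match. */
    ret = regexec(&preg, string, 1, out, REG_STARTEND);
    regfree(&preg);

    return ret;
}
```

Searching "foo bar baz" for "ba." over the range 5..11 skips the "bar" at
offset 4 and should find "baz", reported as rm_so 8 and rm_eo 11, that is,
relative to string itself.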