Date: Fri, 21 Apr 2023 03:15:48 +0200
From: наб
To: Alejandro Colomar
Cc: GNU C Library, Siddhesh Poyarekar
Subject: Re: regexec(3): REG_STARTEND is not documented
References: <0de87674-1b35-8dc8-7d2b-8dacd6b015ff@gmail.com>
User-Agent: NeoMutt/20230407

On Fri, Apr 21, 2023 at 03:07:00AM +0200, Alejandro Colomar wrote:
> On 4/21/23 02:45, Alejandro Colomar wrote:
> > Is the following call valid, or is it UB?
> >     regmatch_t pmatch = {
> >         .rm_so = string,
> >         .rm_eo = string + 42,  // Assume this offset is valid
> >     };
> >     regexec(preg, string, 0, pmatch, REG_NOSUB | REG_STARTEND);
> > How about this?
> >     regexec(preg, string, 999, pmatch, REG_NOSUB | REG_STARTEND);

(If you make that "&pmatch", and put the REG_NOSUB into a preceding
regcomp(), my bet is on "valid".)

> > Current implementations will work, because nmatch is effectively
> > ignored.  But is it intended to be this way, or just an implementation
> > detail?

My bet is on "intended", quoth 4.4BSD-Lite regex(3):
     REG_STARTEND  The string is considered to start at string +
                   pmatch[0].rm_so and to have a terminating NUL located at
                   string + pmatch[0].rm_eo (there need not actually be a
                   NUL at that location), regardless of the value of nmatch.
                   See below for the definition of pmatch and nmatch.  This
                   is an extension, compatible with but not specified by
                   POSIX 1003.2, and should be used with caution in software
                   intended to be portable to other systems.  Note that a
                   non-zero rm_so does not imply REG_NOTBOL; REG_STARTEND
                   affects only the location of the string, not how it is
                   matched.

> Here's a related question:
>     regmatch_t pmatch = {
>         .rm_so = string,
>         .rm_eo = string + 42,  // Assume this offset is valid
>     };
>     regexec(preg, string, 0, pmatch, REG_STARTEND);
> Should regexec(3) write to the 1st element in pmatch[] because it knows
> it exists (otherwise the call would be UB because it needs to read it)?

(Which would run counter to how POSIX defines the API.)

> Or is passing 0 in nmatch effectively another way of performing
> REG_NOSUB behavior without actually using the flag?

Hilariously enough, quoth 4.4BSD-Lite regex(3) again, which phrases it
exactly like you do:
     If REG_NOSUB was specified in the compilation of the RE, or if nmatch
     is 0, regexec ignores the pmatch argument (but see below for the case
     where REG_STARTEND is specified).  Otherwise, pmatch points to an array
     of nmatch structures of type regmatch_t.  Such a structure has at least
     the members rm_so and rm_eo, both of type regoff_t (a signed arithmetic
     type at least as large as an off_t and a ssize_t), containing
     respectively the offset of the first character of a substring and the
     offset of the first character after the end of the substring.  Offsets
     are measured from the beginning of the string argument given to
     regexec.  An empty substring is denoted by equal offsets, both
     indicating the character following the empty substring.
(you know how I'm betting here).

наб
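
For reference, a minimal sketch of the conservative way to call regexec(3) with REG_STARTEND: pass a real one-element array and nmatch = 1, so the call does not depend on how the nmatch = 0 case discussed above is resolved. The helper name match_range and its parameters are hypothetical, not part of any library. rm_so and rm_eo are byte offsets (regoff_t), not pointers. The sketch assumes an implementation that defines REG_STARTEND (glibc and the BSDs do; other libcs may not) and that reports match offsets relative to the start of string, as the quoted 4.4BSD-Lite text describes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "REG_STARTEND is an extension and is not available in this libc"
#endif

/*
 * Hypothetical helper: run preg against the bytes [so, eo) of string.
 * Returns whatever regexec() returns (0 on match, REG_NOMATCH otherwise).
 * On a match, and if out is non-NULL, *out receives the whole-match
 * offsets as reported by regexec().
 */
static int match_range(const regex_t *preg, const char *string,
                       regoff_t so, regoff_t eo, regmatch_t *out)
{
    regmatch_t pm[1];
    int rc;

    assert(so >= 0 && so <= eo);   /* the range must be well-formed */

    /* With REG_STARTEND, pmatch[0] is an input as well as an output. */
    pm[0].rm_so = so;
    pm[0].rm_eo = eo;

    /* nmatch = 1 matches the size of pm[], avoiding the nmatch = 0 question. */
    rc = regexec(preg, string, 1, pm, REG_STARTEND);
    if (rc == 0 && out != NULL)
        *out = pm[0];
    return rc;
}
```

Usage would look like match_range(&re, "foo bar baz", 4, 7, &m) after a successful regcomp(&re, "ba[rz]", REG_EXTENDED): the subject is limited to "bar", and no NUL is needed at offset 7. If the pattern was compiled with REG_NOSUB, the return value is still meaningful, but the contents of pm[0] after the call should not be relied on.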