From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zimbra.cs.ucla.edu (zimbra.cs.ucla.edu [131.179.128.68]) by sourceware.org (Postfix) with ESMTPS id 99EE03855014 for ; Sun, 22 Aug 2021 05:06:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 99EE03855014 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=cs.ucla.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=cs.ucla.edu Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 9A21D160078; Sat, 21 Aug 2021 22:06:27 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id c0grtZRTZzuz; Sat, 21 Aug 2021 22:06:22 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 5FACB16007D; Sat, 21 Aug 2021 22:06:22 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id F7HR9GZ29Hmf; Sat, 21 Aug 2021 22:06:22 -0700 (PDT) Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 13E7F160078; Sat, 21 Aug 2021 22:06:22 -0700 (PDT) From: Paul Eggert To: Martin Sebor Cc: GNU C Library References: <15a32181-8060-4135-cb72-9e79831697d5@gmail.com> <4251e9f2-d4d1-b649-8e86-a4336164a5b1@cs.ucla.edu> <1024a9e9-a880-7da2-7b99-3e8b8012a94a@cs.ucla.edu> Organization: UCLA Computer Science Department Subject: Re: [PATCH v3] remove attribute access from regexec Message-ID: <3e8fa13e-14e4-0e98-18e2-952cf5e9c278@cs.ucla.edu> Date: Sat, 21 Aug 2021 22:06:21 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------E347BAD65C7200D1CCCB3BB7" Content-Language: en-US X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 22 Aug 2021 05:06:39 -0000 This is a multi-part message in MIME format. --------------E347BAD65C7200D1CCCB3BB7 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable On 8/19/21 4:50 PM, Martin Sebor wrote: > Is for some reason not considered a system header in your > test environment? Yes, because Coreutils etc. currently use Gnulib regex.h instead of=20 /usr/include/regex.h, as Glibc regex has a bug and the fix (in Gnulib)=20 hasn't made its way back to Glibc yet. This sort of thing is all too common, as regex development (such as it=20 is) often occurs in Gnulib first, which means it's common for=20 'configure' to discover that the current Glibc has a bug and to=20 substitute Gnulib regex instead. And when we compile Gnulib headers we=20 don't consider them to be system headers, because we want the extra=20 static checking that gcc does for non system headers. I think the simplest workaround for the warning is to disable -Wvla for=20 that particular declaration. Please see attached Gnulib patch for how I=20 suggest doing this. > I didn't know mixing and matching two implementations like this > was even possible.=C2=A0 Thanks for explaining it (though it seems > like a pretty cumbersome arrangement). Yes, it is a bit awkward. I am tempted to clean it up but that really=20 should be a separate patch and so I will focus only on this thread's issu= e. > +#ifndef __ARG_NELTS > +# ifdef __ARG_NELTS A clear typo. The suggestion was to define and use _ARG_NELTS_ instead. Some other thoughts. If we're going to change regexec, we should change __compat_regex to be=20 consistent. We should also change regexec.c's internal functions to be consistent=20 with regexec. This includes re_search_internal, update_regs, etc. These internal functions don't have regexec's quirk that pmatch is=20 ignored when not substituting, so their parameters can use the syntax=20 "regmatch_t pmatch[__ARG_NELTS (static nmatch)]" to let the compiler=20 know that the array must be of the given size. Proposed Gnulib patch attached. It should be easy to aplly to glibc=20 though glibc also needs its own regex.h patched. I have not installed=20 this into Gnulib yet. At some point we should sync Gnulib and Glibc regex of course. --------------E347BAD65C7200D1CCCB3BB7 Content-Type: text/x-patch; charset=UTF-8; name="0001-regex-use-C99-style-array-arg-syntax.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-regex-use-C99-style-array-arg-syntax.patch" =46rom 284581e11190ba13ff5a6f1dfe14d292866f1dc5 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sat, 21 Aug 2021 21:58:16 -0700 Subject: [PATCH] regex: use C99-style array arg syntax This should help with some static checking. Derived from a suggestion by Martin Sebor in: https://sourceware.org/pipermail/libc-alpha/2021-August/130336.html * lib/cdefs.h (__ARG_NELTS): New macro. * lib/regex.c: Ignore -Wvla for the whole file. * lib/regex.h (_ARG_NELTS_): New macro. Ignore -Wvla when declaring regexec. * lib/regexec.c (regexec, __compat_regexec, re_copy_regs) (re_search_internal, proceed_next_node, push_fail_stack) (pop_fail_stack, set_regs, update_regs): Use __ARG_NELTS for each array parameter whose size is another arg. --- ChangeLog | 15 ++++++++++ lib/cdefs.h | 9 ++++++ lib/regex.c | 1 + lib/regex.h | 21 +++++++++++++- lib/regexec.c | 78 +++++++++++++++++++++++++++++---------------------- 5 files changed, 89 insertions(+), 35 deletions(-) diff --git a/ChangeLog b/ChangeLog index 732b6f1fff..346e17d305 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,18 @@ +2021-08-21 Paul Eggert + + regex: use C99-style array arg syntax + This should help with some static checking. + Derived from a suggestion by Martin Sebor in: + https://sourceware.org/pipermail/libc-alpha/2021-August/130336.html + * lib/cdefs.h (__ARG_NELTS): New macro. + * lib/regex.c: Ignore -Wvla for the whole file. + * lib/regex.h (_ARG_NELTS_): New macro. + Ignore -Wvla when declaring regexec. + * lib/regexec.c (regexec, __compat_regexec, re_copy_regs) + (re_search_internal, proceed_next_node, push_fail_stack) + (pop_fail_stack, set_regs, update_regs): + Use __ARG_NELTS for each array parameter whose size is another arg. + 2021-08-21 Bruno Haible =20 c-stack: Test for libsigsegv once, not twice. diff --git a/lib/cdefs.h b/lib/cdefs.h index 4dac9d264d..13c5542bfd 100644 --- a/lib/cdefs.h +++ b/lib/cdefs.h @@ -634,4 +634,13 @@ _Static_assert (0, "IEEE 128-bits long double requir= es redirection on this platf # define __attribute_returns_twice__ /* Ignore. */ #endif =20 +/* Specify the number of elements of a function's array parameter, + as in 'int f (int n, int a[__ARG_NELTS (n)]);'. */ +#if (defined __STDC_VERSION__ && 199901L <=3D __STDC_VERSION__ \ + && !defined __STDC_NO_VLA__) +# define __ARG_NELTS(n) n +#else +# define __ARG_NELTS(n) +#endif + #endif /* sys/cdefs.h */ diff --git a/lib/regex.c b/lib/regex.c index 7296be0f08..d32863972c 100644 --- a/lib/regex.c +++ b/lib/regex.c @@ -24,6 +24,7 @@ =20 # if __GNUC_PREREQ (4, 6) # pragma GCC diagnostic ignored "-Wsuggest-attribute=3Dpure" +# pragma GCC diagnostic ignored "-Wvla" # endif # if __GNUC_PREREQ (4, 3) # pragma GCC diagnostic ignored "-Wold-style-definition" diff --git a/lib/regex.h b/lib/regex.h index 8e4ef45578..561af78298 100644 --- a/lib/regex.h +++ b/lib/regex.h @@ -640,6 +640,22 @@ extern int re_exec (const char *); # endif #endif =20 +#ifndef _ARG_NELTS_ +# ifdef __ARG_NELTS +# define _ARG_NELTS_(arg) __ARG_NELTS (arg) +# elif (defined __STDC_VERSION__ && 199901L <=3D __STDC_VERSION__ \ + && !defined __STDC_NO_VLA__) +# define _ARG_NELTS_(n) n +# else +# define _ARG_NELTS_(n) +# endif +#endif + +#if defined __GNUC__ && 4 < __GNUC__ + (6 <=3D __GNUC_MINOR__) +# pragma GCC diagnostic push +# pragma GCC diagnostic ignored "-Wvla" +#endif + /* POSIX compatibility. */ extern int regcomp (regex_t *_Restrict_ __preg, const char *_Restrict_ __pattern, @@ -647,7 +663,7 @@ extern int regcomp (regex_t *_Restrict_ __preg, =20 extern int regexec (const regex_t *_Restrict_ __preg, const char *_Restrict_ __String, size_t __nmatch, - regmatch_t __pmatch[_Restrict_arr_], + regmatch_t __pmatch[_Restrict_arr_ _ARG_NELTS_ (__nmatch)], int __eflags); =20 extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg,= @@ -655,6 +671,9 @@ extern size_t regerror (int __errcode, const regex_t = *_Restrict_ __preg, =20 extern void regfree (regex_t *__preg); =20 +#if defined __GNUC__ && 4 < __GNUC__ + (6 <=3D __GNUC_MINOR__) +# pragma GCC diagnostic pop +#endif =20 #ifdef __cplusplus } diff --git a/lib/regexec.c b/lib/regexec.c index 5e4eb497a6..da1fb7fafa 100644 --- a/lib/regexec.c +++ b/lib/regexec.c @@ -31,11 +31,11 @@ static re_sub_match_last_t * match_ctx_add_sublast (r= e_sub_match_top_t *subtop, static void sift_ctx_init (re_sift_context_t *sctx, re_dfastate_t **sift= ed_sts, re_dfastate_t **limited_sts, Idx last_node, Idx last_str_idx); -static reg_errcode_t re_search_internal (const regex_t *preg, - const char *string, Idx length, - Idx start, Idx last_start, Idx stop, - size_t nmatch, regmatch_t pmatch[], - int eflags); +static reg_errcode_t +re_search_internal (const regex_t *preg, const char *string, Idx length,= + Idx start, Idx last_start, Idx stop, size_t nmatch, + regmatch_t pmatch[__ARG_NELTS (static nmatch)], + int eflags); static regoff_t re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1, Idx length1, const char *string2, Idx length2, @@ -47,23 +47,29 @@ static regoff_t re_search_stub (struct re_pattern_buf= fer *bufp, regoff_t range, Idx stop, struct re_registers *regs, bool ret_len); -static unsigned re_copy_regs (struct re_registers *regs, regmatch_t *pma= tch, - Idx nregs, int regs_allocated); +static unsigned re_copy_regs (struct re_registers *regs, Idx nregs, + regmatch_t pmatch[__ARG_NELTS (static nregs)], + int regs_allocated); static reg_errcode_t prune_impossible_nodes (re_match_context_t *mctx); static Idx check_matching (re_match_context_t *mctx, bool fl_longest_mat= ch, Idx *p_match_first); static Idx check_halt_state_context (const re_match_context_t *mctx, const re_dfastate_t *state, Idx idx); -static void update_regs (const re_dfa_t *dfa, regmatch_t *pmatch, - regmatch_t *prev_idx_match, Idx cur_node, - Idx cur_idx, Idx nmatch); -static reg_errcode_t push_fail_stack (struct re_fail_stack_t *fs, - Idx str_idx, Idx dest_node, Idx nregs, - regmatch_t *regs, regmatch_t *prevregs, - re_node_set *eps_via_nodes); +static void +update_regs (const re_dfa_t *dfa, Idx nmatch, + regmatch_t pmatch[__ARG_NELTS (static nmatch)], + regmatch_t prev_idx_match[__ARG_NELTS (static nmatch)], + Idx cur_node, Idx cur_idx); +static reg_errcode_t +push_fail_stack (struct re_fail_stack_t *fs, + Idx str_idx, Idx dest_node, Idx nregs, + regmatch_t regs[__ARG_NELTS (static nregs)], + regmatch_t prevregs[__ARG_NELTS (static nregs)], + re_node_set *eps_via_nodes); static reg_errcode_t set_regs (const regex_t *preg, const re_match_context_t *mctx, - size_t nmatch, regmatch_t *pmatch, + size_t nmatch, + regmatch_t pmatch[__ARG_NELTS (static nmatch)], bool fl_backtrack); static reg_errcode_t free_fail_stack_return (struct re_fail_stack_t *fs)= ; =20 @@ -191,7 +197,7 @@ static reg_errcode_t extend_buffers (re_match_context= _t *mctx, int min_len); =20 int regexec (const regex_t *__restrict preg, const char *__restrict string, - size_t nmatch, regmatch_t pmatch[], int eflags) + size_t nmatch, regmatch_t pmatch[__ARG_NELTS (nmatch)], int eflags) { reg_errcode_t err; Idx start, length; @@ -212,12 +218,8 @@ regexec (const regex_t *__restrict preg, const char = *__restrict string, } =20 lock_lock (dfa->lock); - if (preg->no_sub) - err =3D re_search_internal (preg, string, length, start, length, - length, 0, NULL, eflags); - else - err =3D re_search_internal (preg, string, length, start, length, - length, nmatch, pmatch, eflags); + err =3D re_search_internal (preg, string, length, start, length, + length, preg->no_sub ? 0 : nmatch, pmatch, eflags); lock_unlock (dfa->lock); return err !=3D REG_NOERROR; } @@ -235,7 +237,7 @@ int attribute_compat_text_section __compat_regexec (const regex_t *__restrict preg, const char *__restrict string, size_t nmatch, - regmatch_t pmatch[], int eflags) + regmatch_t pmatch[__ARG_NELTS (nmatch)], int eflags) { return regexec (preg, string, nmatch, pmatch, eflags & (REG_NOTBOL | REG_NOTEOL)); @@ -434,7 +436,7 @@ re_search_stub (struct re_pattern_buffer *bufp, const= char *string, Idx length, else if (regs !=3D NULL) { /* If caller wants register contents data back, copy them. */ - bufp->regs_allocated =3D re_copy_regs (regs, pmatch, nregs, + bufp->regs_allocated =3D re_copy_regs (regs, nregs, pmatch, bufp->regs_allocated); if (__glibc_unlikely (bufp->regs_allocated =3D=3D REGS_UNALLOCATED= )) rval =3D -2; @@ -457,7 +459,8 @@ re_search_stub (struct re_pattern_buffer *bufp, const= char *string, Idx length, } =20 static unsigned -re_copy_regs (struct re_registers *regs, regmatch_t *pmatch, Idx nregs, +re_copy_regs (struct re_registers *regs, Idx nregs, + regmatch_t pmatch[__ARG_NELTS (static nregs)], int regs_allocated) { int rval =3D REGS_REALLOCATE; @@ -585,7 +588,8 @@ static reg_errcode_t __attribute_warn_unused_result__ re_search_internal (const regex_t *preg, const char *string, Idx length,= Idx start, Idx last_start, Idx stop, size_t nmatch, - regmatch_t pmatch[], int eflags) + regmatch_t pmatch[__ARG_NELTS (static nmatch)], + int eflags) { reg_errcode_t err; const re_dfa_t *dfa =3D preg->buffer; @@ -1210,8 +1214,9 @@ check_halt_state_context (const re_match_context_t = *mctx, return -1 on match failure, -2 on error. */ =20 static Idx -proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t= *regs, - regmatch_t *prevregs, +proceed_next_node (const re_match_context_t *mctx, Idx nregs, + regmatch_t regs[__ARG_NELTS (static nregs)], + regmatch_t prevregs[__ARG_NELTS (static nregs)], Idx *pidx, Idx node, re_node_set *eps_via_nodes, struct re_fail_stack_t *fs) { @@ -1321,7 +1326,9 @@ proceed_next_node (const re_match_context_t *mctx, = Idx nregs, regmatch_t *regs, static reg_errcode_t __attribute_warn_unused_result__ push_fail_stack (struct re_fail_stack_t *fs, Idx str_idx, Idx dest_node,= - Idx nregs, regmatch_t *regs, regmatch_t *prevregs, + Idx nregs, + regmatch_t regs[__ARG_NELTS (static nregs)], + regmatch_t prevregs[__ARG_NELTS (static nregs)], re_node_set *eps_via_nodes) { reg_errcode_t err; @@ -1349,7 +1356,8 @@ push_fail_stack (struct re_fail_stack_t *fs, Idx st= r_idx, Idx dest_node, =20 static Idx pop_fail_stack (struct re_fail_stack_t *fs, Idx *pidx, Idx nregs, - regmatch_t *regs, regmatch_t *prevregs, + regmatch_t regs[__ARG_NELTS (static nregs)], + regmatch_t prevregs[__ARG_NELTS (static nregs)], re_node_set *eps_via_nodes) { if (fs =3D=3D NULL || fs->num =3D=3D 0) @@ -1379,7 +1387,7 @@ pop_fail_stack (struct re_fail_stack_t *fs, Idx *pi= dx, Idx nregs, static reg_errcode_t __attribute_warn_unused_result__ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nm= atch, - regmatch_t *pmatch, bool fl_backtrack) + regmatch_t pmatch[__ARG_NELTS (static nmatch)], bool fl_backtrack) { const re_dfa_t *dfa =3D preg->buffer; Idx idx, cur_node; @@ -1415,7 +1423,7 @@ set_regs (const regex_t *preg, const re_match_conte= xt_t *mctx, size_t nmatch, =20 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) { - update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); + update_regs (dfa, nmatch, pmatch, prev_idx_match, cur_node, idx); =20 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D mctx->last_node= ) || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) @@ -1487,8 +1495,10 @@ free_fail_stack_return (struct re_fail_stack_t *fs= ) } =20 static void -update_regs (const re_dfa_t *dfa, regmatch_t *pmatch, - regmatch_t *prev_idx_match, Idx cur_node, Idx cur_idx, Idx nmatch)= +update_regs (const re_dfa_t *dfa, Idx nmatch, + regmatch_t pmatch[__ARG_NELTS (static nmatch)], + regmatch_t prev_idx_match[__ARG_NELTS (static nmatch)], + Idx cur_node, Idx cur_idx) { int type =3D dfa->nodes[cur_node].type; if (type =3D=3D OP_OPEN_SUBEXP) --=20 2.31.1 --------------E347BAD65C7200D1CCCB3BB7--