From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x830.google.com (mail-qt1-x830.google.com [IPv6:2607:f8b0:4864:20::830]) by sourceware.org (Postfix) with ESMTPS id 3D1813858427 for ; Sat, 14 Aug 2021 20:08:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3D1813858427 Received: by mail-qt1-x830.google.com with SMTP id d9so11113075qty.12 for ; Sat, 14 Aug 2021 13:08:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=WyAN51JyD2weVNRFP3W/RaFqU7VqIff9xKbgn4vwlYs=; b=Ys0WfF9ZYlk7uRYoZopjd2NDshG+xryrutmDhDSPTUDzh8cHCur/iJ+rd6Kiur9B39 pQjEdxVvHhHHF9URdeGPGGspcQnUIi+okVm85Wj+9wYL2gdworyzlHxQ+aRq606a6oZt BeUSvxO+QwzgJDyEVJ5vujb4+oqRWjtgTbfxJyT7NoQNheQ9gNh6jrZFS8xui/mn2n0p ZX6fjKbn5ErUDLwaJJY5ufNqgOqAB/1HEWQ6/PX9QJ3X4VdLD1AZ/O7lT5Ws+Zc8Qlvk N2iI4+RQx77ZKVaabOfe9wrhQLMRLS+EhmkGTUVAL+TI9Kz+Quiynqh7jFFvUGwmG3IL lBaA== X-Gm-Message-State: AOAM531ovUoKj3c6CAdPJf2A0aPok8SL6KEFKwjMZtO8hEb3EGfIK7SA fP8YSkoiDJ2Hzz/yOeswO+eLK0l0Ii8= X-Google-Smtp-Source: ABdhPJyZzsw+cMbRYkYuXidttgp6r5Ja0zZcYww4sxA4OfF+SAYwGl4w+QhetI4n3PCgd3gk4916gg== X-Received: by 2002:a05:622a:15c1:: with SMTP id d1mr7633082qty.93.1628971692758; Sat, 14 Aug 2021 13:08:12 -0700 (PDT) Received: from [192.168.0.41] (97-118-104-61.hlrn.qwest.net. [97.118.104.61]) by smtp.gmail.com with ESMTPSA id b19sm631020qkc.7.2021.08.14.13.08.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 14 Aug 2021 13:08:12 -0700 (PDT) Subject: Re: [PATCH] remove attribute access from regexec To: Paul Eggert Cc: GNU C Library References: <15a32181-8060-4135-cb72-9e79831697d5@gmail.com> <4251e9f2-d4d1-b649-8e86-a4336164a5b1@cs.ucla.edu> From: Martin Sebor Message-ID: Date: Sat, 14 Aug 2021 14:08:11 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: <4251e9f2-d4d1-b649-8e86-a4336164a5b1@cs.ucla.edu> Content-Type: multipart/mixed; boundary="------------4FEAB7897D37171F438D0270" Content-Language: en-US X-Spam-Status: No, score=-8.9 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Aug 2021 20:08:15 -0000 This is a multi-part message in MIME format. --------------4FEAB7897D37171F438D0270 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 8/13/21 4:34 PM, Paul Eggert wrote: > On 8/13/21 2:30 PM, Martin Sebor wrote: >> Attached is a revised patch with this approach. > > The revised patch is to include/regex.h but the original patch was to > posix/regex.h. Is that intentional? Yes, they need to be consistent, otherwise GCC issues -Wvla-parameter. (That's to help detect inadvertent array/VLA mismatches as well as mismatches in the VLA parameter bounds.) > > We need to check whether __STDC_VERSION__ is defined. Also, no need for > parens around arg of 'defined'. Something like this perhaps: > >   #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ >        && !defined __STDC_NO_VLA__) > > Also, the duplication of the declarations make the headers harder to > read and encourage typos (I noticed one typo: "_Restrict_arr" without > the trailing "_"). Instead, I suggest something like this: > >   #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ >        && !defined __STDC_NO_VLA__) >   # define _REGEX_VLA(arg) arg >   #else >   # define _REGEX_VLA(arg) >   #endif > > That way, we can simply change "regmatch_t __pmatch[_Restrict_arr_]" to > "regmatch_t __pmatch[_Restrict_arr_ _REGEX_VLA (__nmatch)]" without > having to duplicate the entire function declaration. Sounds good. I've defined the macro in cdefs.h and mamed it _VLA_ARG to make it usable in other contexts. Please see the attached revision. > >> PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so >> strictly speaking, warning for such calls to it in that case is >> also a false positive. > > Ouch, this casts doubt on the entire exercise. It's not simply about > warnings: it's about the code being generated for the matcher. For > example, for: > > int > f (_Bool flag, unsigned long n, int a[n]) > { >   return n == 0 ? 0 : flag ? a[n - 1] : a[0]; > } > > a compiler is allowed to generate code that loads a[n - 1] even when > FLAG is false. Similarly, if we add this VLA business to regexec, the > generated machine code could dereference pmatch unconditionally even if > our source code makes the dereferencing conditional on REG_NOSUB, and > the resulting behavior would fail to conform to POSIX. The VLA bound by itself doesn't affect codegen. I suspect you're thinking of a[static n]? With just a[n], without static, there is no requirement that a point to an array with n elements. It simply declares an ordinary pointer, same as [] or *. Martin --------------4FEAB7897D37171F438D0270 Content-Type: text/x-patch; charset=UTF-8; name="glibc-28170.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="glibc-28170.diff" diff --git a/include/regex.h b/include/regex.h index 24eca2c297..5cf3ef7636 100644 --- a/include/regex.h +++ b/include/regex.h @@ -37,7 +37,8 @@ extern int __regcomp (regex_t *__preg, const char *__pattern, int __cflags); libc_hidden_proto (__regcomp) extern int __regexec (const regex_t *__preg, const char *__string, - size_t __nmatch, regmatch_t __pmatch[], int __eflags); + size_t __nmatch, regmatch_t __pmatch[_VLA_ARG (__nmatch)], + int __eflags); libc_hidden_proto (__regexec) extern size_t __regerror (int __errcode, const regex_t *__preg, diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h index e490fc1aeb..6b6e4c233c 100644 --- a/misc/sys/cdefs.h +++ b/misc/sys/cdefs.h @@ -632,4 +632,14 @@ _Static_assert (0, "IEEE 128-bits long double requires redirection on this platf # define __attribute_returns_twice__ /* Ignore. */ #endif + +#if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ + && !defined __STDC_NO_VLA__) +/* Used to specify a variable bound in a declaration of a function + VLA-like parameter, as in 'int f (int n, int[_VLA_ARG (n)n]);' */ +# define _VLA_ARG(arg) arg +#else +# define _VLA_ARG(arg) +#endif + #endif /* sys/cdefs.h */ diff --git a/posix/regex.h b/posix/regex.h index 14fb1d8364..52ccc4b577 100644 --- a/posix/regex.h +++ b/posix/regex.h @@ -654,9 +654,8 @@ extern int regcomp (regex_t *_Restrict_ __preg, extern int regexec (const regex_t *_Restrict_ __preg, const char *_Restrict_ __String, size_t __nmatch, - regmatch_t __pmatch[_Restrict_arr_], - int __eflags) - __attr_access ((__write_only__, 4, 3)); + regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)], + int __eflags); extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg, char *_Restrict_ __errbuf, size_t __errbuf_size) --------------4FEAB7897D37171F438D0270--