From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zimbra.cs.ucla.edu (zimbra.cs.ucla.edu [131.179.128.68]) by sourceware.org (Postfix) with ESMTPS id 4BC883857C6D for ; Fri, 8 Jan 2021 22:30:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4BC883857C6D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=cs.ucla.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=eggert@cs.ucla.edu Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C6CFD16008F; Fri, 8 Jan 2021 14:30:22 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id DXGRwi4NZuHk; Fri, 8 Jan 2021 14:30:21 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 54C3A1600BE; Fri, 8 Jan 2021 14:30:21 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jPH72n07T8Ms; Fri, 8 Jan 2021 14:30:21 -0800 (PST) Received: from [192.168.1.9] (cpe-23-243-218-95.socal.res.rr.com [23.243.218.95]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 26AB816008F; Fri, 8 Jan 2021 14:30:21 -0800 (PST) To: Adhemerval Zanella References: <20210106181707.1738066-1-adhemerval.zanella@linaro.org> <20210106181707.1738066-2-adhemerval.zanella@linaro.org> From: Paul Eggert Organization: UCLA Computer Science Department Cc: libc-alpha@sourceware.org, bug-gnulib@gnu.org Subject: Re: [PATCH 2/3] posix: Remove alloca usage on regex build_trtable Message-ID: <831a8e3b-7a4a-0c9a-418e-0656fc137765@cs.ucla.edu> Date: Fri, 8 Jan 2021 14:30:20 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0 MIME-Version: 1.0 In-Reply-To: <20210106181707.1738066-2-adhemerval.zanella@linaro.org> Content-Type: multipart/mixed; boundary="------------0E52406FDC342D35E8B40E93" Content-Language: en-US X-Spam-Status: No, score=-10.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jan 2021 22:30:25 -0000 This is a multi-part message in MIME format. --------------0E52406FDC342D35E8B40E93 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable On 1/6/21 10:17 AM, Adhemerval Zanella wrote: > __libc_use_alloca/alloca is replaced with malloc regardless. These allocations are so small that they should be put on the stack=20 instead of using malloc. I did that in Gnulib by installing the attached=20 patch. The idea is that the resulting regexec.c file should be copyable=20 unchanged into glibc. From a Gnulib point of view this code uses a 20 KiB frame (on a 64-bit=20 host) which goes past the usual 4032-byte limit for stack frames, but I=20 think we can stretch the point here. In glibc the limit is 64 KiB so=20 there's no problem. --------------0E52406FDC342D35E8B40E93 Content-Type: text/x-patch; charset=UTF-8; name="0001-regexec-remove-alloca-usage-in-build_trtable.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-regexec-remove-alloca-usage-in-build_trtable.patch" =46rom 025e89118912adaa309374201c9012f6fb46d583 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 8 Jan 2021 14:22:15 -0800 Subject: [PATCH] regexec: remove alloca usage in build_trtable MIME-Version: 1.0 Content-Type: text/plain; charset=3DUTF-8 Content-Transfer-Encoding: 8bit Prompted by this different change proposed by Adhemerval Zanella: https://sourceware.org/pipermail/libc-alpha/2021-January/121373.html * lib/regexec.c (build_trtable): Prevent inlining, so that it doesn=E2=80=99t bloat the caller=E2=80=99s stack. Use auto variables instead of alloca/malloc. After these changes, build_trtable=E2=80=99s total stack allocation is only 20 KiB on a 64-bit machine, and this is less than glibc=E2=80=99s 64= KiB cutoff so there=E2=80=99s little point to using alloca to shrink it. Although Gnulib traditionally has used a 4 KiB cutoff, going to 20 KiB here should not be a significant problem in practice; Gnulib-using packages concerned about overflow of tiny stacks can compile with something like gcc -fstack-clash-protection. * config/srclist.txt: Comment out regexec.c for now. --- ChangeLog | 14 +++++++++ config/srclist.txt | 2 +- lib/regexec.c | 75 ++++++++-------------------------------------- 3 files changed, 28 insertions(+), 63 deletions(-) diff --git a/ChangeLog b/ChangeLog index 540f15a3c..708a266b0 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,19 @@ 2021-01-08 Paul Eggert =20 + regexec: remove alloca usage in build_trtable + Prompted by this different change proposed by Adhemerval Zanella: + https://sourceware.org/pipermail/libc-alpha/2021-January/121373.html + * lib/regexec.c (build_trtable): Prevent inlining, + so that it doesn=E2=80=99t bloat the caller=E2=80=99s stack. + Use auto variables instead of alloca/malloc. + After these changes, build_trtable=E2=80=99s total stack allocation is + only 20 KiB on a 64-bit machine, and this is less than glibc=E2=80=99s = 64 + KiB cutoff so there=E2=80=99s little point to using alloca to shrink it= =2E + Although Gnulib traditionally has used a 4 KiB cutoff, going to 20 + KiB here should not be a significant problem in practice; + Gnulib-using packages concerned about overflow of tiny stacks can + compile with something like gcc -fstack-clash-protection. + scratch_buffer: add scratch_buffer_dupfree macro * lib/scratch_buffer.h (__libc_scratch_buffer_dupfree): New macro, needed to support recent changes in this module. diff --git a/config/srclist.txt b/config/srclist.txt index e7ceeb088..d669fd8f0 100644 --- a/config/srclist.txt +++ b/config/srclist.txt @@ -69,7 +69,7 @@ $LIBCSRC posix/regex.c lib #$LIBCSRC posix/regex.h lib #$LIBCSRC posix/regex_internal.c lib #$LIBCSRC posix/regex_internal.h lib -$LIBCSRC posix/regexec.c lib +#$LIBCSRC posix/regexec.c lib #$LIBCSRC stdlib/canonicalize lib/canonicalize-lgpl.c #$LIBCSRC sysdeps/generic/eloop-threshold.h lib $LIBCSRC time/timegm.c lib diff --git a/lib/regexec.c b/lib/regexec.c index 889b10be9..f7b4f9cfc 100644 --- a/lib/regexec.c +++ b/lib/regexec.c @@ -3247,7 +3247,7 @@ expand_bkref_cache (re_match_context_t *mctx, re_no= de_set *cur_nodes, /* Build transition table for the state. Return true if successful. */ =20 -static bool +static bool __attribute_noinline__ build_trtable (const re_dfa_t *dfa, re_dfastate_t *state) { reg_errcode_t err; @@ -3255,36 +3255,20 @@ build_trtable (const re_dfa_t *dfa, re_dfastate_t= *state) int ch; bool need_word_trtable =3D false; bitset_word_t elem, mask; - bool dests_node_malloced =3D false; - bool dest_states_malloced =3D false; Idx ndests; /* Number of the destination states from 'state'. */ re_dfastate_t **trtable; - re_dfastate_t **dest_states =3D NULL, **dest_states_word, **dest_state= s_nl; - re_node_set follows, *dests_node; - bitset_t *dests_ch; + re_dfastate_t *dest_states[SBC_MAX]; + re_dfastate_t *dest_states_word[SBC_MAX]; + re_dfastate_t *dest_states_nl[SBC_MAX]; + re_node_set follows; bitset_t acceptable; =20 - struct dests_alloc - { - re_node_set dests_node[SBC_MAX]; - bitset_t dests_ch[SBC_MAX]; - } *dests_alloc; - /* We build DFA states which corresponds to the destination nodes from 'state'. 'dests_node[i]' represents the nodes which i-th destination state contains, and 'dests_ch[i]' represents the characters which i-th destination state accepts. */ - if (__libc_use_alloca (sizeof (struct dests_alloc))) - dests_alloc =3D (struct dests_alloc *) alloca (sizeof (struct dests_= alloc)); - else - { - dests_alloc =3D re_malloc (struct dests_alloc, 1); - if (__glibc_unlikely (dests_alloc =3D=3D NULL)) - return false; - dests_node_malloced =3D true; - } - dests_node =3D dests_alloc->dests_node; - dests_ch =3D dests_alloc->dests_ch; + re_node_set dests_node[SBC_MAX]; + bitset_t dests_ch[SBC_MAX]; =20 /* Initialize transition table. */ state->word_trtable =3D state->trtable =3D NULL; @@ -3294,8 +3278,6 @@ build_trtable (const re_dfa_t *dfa, re_dfastate_t *= state) ndests =3D group_nodes_into_DFAstates (dfa, state, dests_node, dests_c= h); if (__glibc_unlikely (ndests <=3D 0)) { - if (dests_node_malloced) - re_free (dests_alloc); /* Return false in case of an error, true otherwise. */ if (ndests =3D=3D 0) { @@ -3310,38 +3292,14 @@ build_trtable (const re_dfa_t *dfa, re_dfastate_t= *state) =20 err =3D re_node_set_alloc (&follows, ndests + 1); if (__glibc_unlikely (err !=3D REG_NOERROR)) - goto out_free; - - /* Avoid arithmetic overflow in size calculation. */ - size_t ndests_max - =3D ((SIZE_MAX - (sizeof (re_node_set) + sizeof (bitset_t)) * SBC_MA= X) - / (3 * sizeof (re_dfastate_t *))); - if (__glibc_unlikely (ndests_max < ndests)) - goto out_free; - - if (__libc_use_alloca ((sizeof (re_node_set) + sizeof (bitset_t)) * SB= C_MAX - + ndests * 3 * sizeof (re_dfastate_t *))) - dest_states =3D (re_dfastate_t **) - alloca (ndests * 3 * sizeof (re_dfastate_t *)); - else { - dest_states =3D re_malloc (re_dfastate_t *, ndests * 3); - if (__glibc_unlikely (dest_states =3D=3D NULL)) - { -out_free: - if (dest_states_malloced) - re_free (dest_states); - re_node_set_free (&follows); - for (i =3D 0; i < ndests; ++i) - re_node_set_free (dests_node + i); - if (dests_node_malloced) - re_free (dests_alloc); - return false; - } - dest_states_malloced =3D true; + out_free: + re_node_set_free (&follows); + for (i =3D 0; i < ndests; ++i) + re_node_set_free (dests_node + i); + return false; } - dest_states_word =3D dest_states + ndests; - dest_states_nl =3D dest_states_word + ndests; + bitset_empty (acceptable); =20 /* Then build the states for all destinations. */ @@ -3466,16 +3424,9 @@ out_free: } } =20 - if (dest_states_malloced) - re_free (dest_states); - re_node_set_free (&follows); for (i =3D 0; i < ndests; ++i) re_node_set_free (dests_node + i); - - if (dests_node_malloced) - re_free (dests_alloc); - return true; } =20 --=20 2.27.0 --------------0E52406FDC342D35E8B40E93--