From: Fāng-ruì Sòng
Date: Wed, 27 Oct 2021 21:12:03 -0700
Subject: Re: [PATCH] regex: Unnest nested functions in regcomp.c
To: Joseph Myers
Cc: Paul Eggert, libc-alpha@sourceware.org
References: <20211027052959.2549214-1-maskray@google.com>

On Wed, Oct 27, 2021 at 9:11 AM Joseph Myers wrote:
>
> On Wed, 27 Oct 2021, Paul Eggert wrote:
>
> > On 10/26/21 22:29, Fangrui Song wrote:
> > > collseqwc, table_size, symb_table, extra are now initialized to appease
> > > GCC -Werror=maybe-uninitialized false positive.
> >
> > Are the diagnostics really false positives? As I understand it, the modified
> > code would have undefined behavior if these variables were not initialized.

Sorry, the diagnostics are legitimate. collseqwc, table_size,
symb_table, and extra are now passed as arguments, so they need to be
initialized. Previously they were accessed from `auto inline` nested
functions. With that delayed access, GCC did not warn.

In file included from regex.c:74:
regcomp.c: In function ‘parse_expression’:
regcomp.c:3102:11: error: ‘table_size’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 3102 |   int32_t table_size;
      |           ^~~~~~~~~~

I don't know how to measure the performance difference, but the new
version is just 16 bytes larger in .text on x86-64 (gcc -O2).

% ~/projects/bloaty/Release/bloaty /tmp/c/regex1.o -- /tmp/c/regex.o
    FILE SIZE        VM SIZE
 --------------  --------------
  +118%     +20  [ = ]       0    [Unmapped]
  +0.0%     +16  +0.0%     +16    .text
  -5.6%      -4  [ = ]       0    .data
  +0.0%     +32  +0.0%     +16    TOTAL

The code move apparently causes GCC to move large parts of the generated
assembly around. The 16-byte .text increase is likely due to the zero
initialization that is now needed for the four variables.

% diff -u =(llvm-objdump -d --symbolize-operands -M intel --no-leading-addr --no-show-raw-insn /tmp/c/regex.o) =(llvm-objdump -d --symbolize-operands -M intel --no-leading-addr --no-show-raw-insn /tmp/c/regex1.o)
...
@@ -13236,7 +13236,11 @@
     mov r15d, ebx
     test ebx, ebx
     jne
-:
+    mov qword ptr [rsp + 96], 0
+    mov qword ptr [rsp + 120], 0
+    mov dword ptr [rsp + 116], 0
+    mov qword ptr [rsp + 88], 0
+:
     mov esi, 1
     mov edi, 32
     call

> And if they are false positives, the glibc convention in such cases is
> generally to use the DIAG_* macros from libc-diag.h to suppress the
> warnings, with appropriate comments explaining why they are false
> positives, rather than adding an initialization that should not be
> necessary.  (Sometimes another approach is more appropriate to warning
> suppression - for example, a call to __builtin_unreachable that enables
> the compiler to see that certain paths through the code cannot actually
> occur.)

Thanks for the tip.

> --
> Joseph S. Myers
> joseph@codesourcery.com
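
P.S. For anyone following along, here is a minimal sketch of why the warning appears once a helper is un-nested. It is not the regcomp.c code: lookup_slot, find_slot, and the constant 7 are made up for illustration. The only name borrowed from the patch is table_size. When the helper takes the value as an argument, the caller reads the variable at the call site on every path, so it must hold a defined value there even on paths where the helper ends up ignoring it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an un-nested helper.  It receives the
   table size by value instead of reading the caller's local.  */
static int32_t
lookup_slot (int32_t table_size, int32_t hash)
{
  assert (table_size >= 0);
  if (table_size == 0)
    return -1;                  /* No table available.  */
  return hash % table_size;
}

/* Hypothetical caller.  table_size is assigned only when a table
   exists, but it is passed to lookup_slot on every path.  */
static int32_t
find_slot (int have_table, int32_t hash)
{
  /* Without "= 0", the call below would read an indeterminate value
     whenever have_table is false, which is what
     -Wmaybe-uninitialized reports.  */
  int32_t table_size = 0;
  if (have_table)
    table_size = 7;
  return lookup_slot (table_size, hash);
}
```

In this sketch find_slot (1, 10) returns 3 and find_slot (0, 10) returns -1. The initializer costs one store per variable, which is consistent with the four added mov instructions in the objdump diff.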