From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 115362 invoked by alias); 3 Aug 2018 19:59:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 115346 invoked by uid 89); 3 Aug 2018 19:59:15 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=tone, shock X-HELO: mail-qt0-f196.google.com Received: from mail-qt0-f196.google.com (HELO mail-qt0-f196.google.com) (209.85.216.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 03 Aug 2018 19:59:13 +0000 Received: by mail-qt0-f196.google.com with SMTP id z8-v6so7445176qto.9 for ; Fri, 03 Aug 2018 12:59:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=/aQrqt0z0zQIuzXxrx8bEM2z0AxCMW8RkeInLNYoIrw=; b=NxDT9ArQnsT+J3mvBVfFc8r0EvPTul2bzsm3KGeh0yM9vUD6Nc7Ry3EZi+S4qyOgsw gPZASe4M5+hjJ/7ry2HOtSus6ntfFjKVWSxpRScRaWHpSM41tptws9Ng2E7ugXdH2qU7 Ap7E3KcfE/+gzhDfpy4AEk8oZinK93uMkzQMfcnpUbag2960eqissPmWfVb4vdmXoyPk SFG6A+nhFn8ppgIysrx/BKcQJeidnQpNFvMp02MoaS7ce39IckuKGAhAYMeDGBIlX7iO yPAymScTTi8vT8sZKwNF8OcUwF6hOoUoyb4MqIPt2QoxbMtzlvTpO//AjKYJIjt2NlV+ OlNA== Return-Path: Received: from localhost.localdomain (97-118-124-30.hlrn.qwest.net. [97.118.124.30]) by smtp.gmail.com with ESMTPSA id a6-v6sm3264309qth.8.2018.08.03.12.59.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 03 Aug 2018 12:59:10 -0700 (PDT) Subject: Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) To: Bernd Edlinger , Gcc Patch List References: <805ad9c9-e6e4-a159-7fe1-6bb5562b8fe0@gmail.com> <3136e337-a551-3853-c536-b3b9ff0acdb9@gmail.com> <39ae51d8-0807-7604-d1f5-78d31d6fece1@gmail.com> From: Martin Sebor Message-ID: <8d668d1a-45f8-d595-b048-d0472ed3efc5@gmail.com> Date: Fri, 03 Aug 2018 19:59:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2018-08/txt/msg00311.txt.bz2 On 08/03/2018 07:00 AM, Bernd Edlinger wrote: > On 08/02/18 22:34, Martin Sebor wrote: >> On 08/02/2018 12:56 PM, Bernd Edlinger wrote: >>> On 08/02/18 15:26, Bernd Edlinger wrote: >>>>> >>>>> /* If the length can be computed at compile-time, return it. */ >>>>> - len = c_strlen (src, 0); >>>>> + tree array; >>>>> + tree len = c_strlen (src, 0, &array); >>>> >>>> You know the c_strlen tries to compute wide character sizes, >>>> but strlen does not do that, strlen (L"abc") should give 1 >>>> (or 0 on a BE machine) >>>> I wonder if that is correct. >>>> >>> [snip] >>>>> >>>>> static tree >>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg) >>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg) >>>>> { >>>>> if (!validate_arg (arg, POINTER_TYPE)) >>>>> return NULL_TREE; >>>>> else >>>>> { >>>>> - tree len = c_strlen (arg, 0); >>>>> - >>>>> + tree arr = NULL_TREE; >>>>> + tree len = c_strlen (arg, 0, &arr); >>>> >>>> Is it possible to write a test case where strlen(L"test") reaches this point? >>>> what will c_strlen return then? >>>> >>> >>> Yes, of course it is: >>> >>> $ cat y.c >>> int f(char *x) >>> { >>> return __builtin_strlen(x); >>> } >>> >>> int main () >>> { >>> return f((char*)&L"abcdef"[0]); >>> } >>> $ gcc -O3 -S y.c >>> $ cat y.s >>> main: >>> .LFB1: >>> .cfi_startproc >>> movl $6, %eax >>> ret >>> .cfi_endproc >>> >>> The reason is that c_strlen tries to fold wide chars at all. >>> I do not know when that was introduced, was that already before your last patches? >> >> The function itself was introduced in 1992 if not earlier, >> before wide strings even existed. AFAICS, it has always >> accepted strings of all widths. Until r241489 (in GCC 7) >> it computed their length in bytes, not characters. I don't >> know if that was on purpose or if it was just never changed >> to compute the length in characters when wide strings were >> first introduced. From the name I assume it's the latter. >> The difference wasn't detected until sprintf tests were added >> for wide string directives. The ChangeLog description for >> the change reads: Correctly handle wide strings. I didn't >> consider pathological cases like strlen (L"abc"). It >> shouldn't be difficult to change to fix this case. >> > > Oh, oh, oh.... > > $ cat y3.c > int main () > { > char c[100]; > int x = __builtin_sprintf (c, "%S", L"\uFFFF"); > > __builtin_printf("%d %ld\n", x,__builtin_strlen(c)); > } > > $ gcc-4.8 -O3 -std=c99 y3.c > $ ./a.out > -1 0 > $ gcc -O3 y3.c > $ ./a.out > 1 0 > $ echo $LANG > de_DE.UTF-8 > > I would have expected L"\uFFFF" to converted to UTF-8 > or another encoding, so the return value if sprintf is > far from obvious, and probably language dependent. > > Why do you think it is a good idea to use really every > opportunity to optimize totally unnecessary things like > using the return value from the sprintf function as it is? > > Did you never think this adds a significant maintenance > burden on the rest of us as well? Your condescending tone is uncalled for, and you clearly speak out of ignorance. I don't owe you an explanation but as I have said multiple times: most of my work, including the sprintf pass, is primarily motivated by detecting bugs like buffer overflow. Optimization is only a secondary goal (but bug detection depends on it). It may come as a shock to you but mistakes happen. That's why it's important to make an effort to detect them. This is one is a simple typo (handling %S the same way as %s instead of %ls. If you are incapable of a professional tone I would suggest you go harass someone else. Martin