From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30359 invoked by alias); 17 Aug 2018 10:22:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 30329 invoked by uid 89); 17 Aug 2018 10:22:06 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy=assembled, consensus, hurry X-HELO: EUR02-HE1-obe.outbound.protection.outlook.com Received: from mail-oln040092068079.outbound.protection.outlook.com (HELO EUR02-HE1-obe.outbound.protection.outlook.com) (40.92.68.79) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 17 Aug 2018 10:22:04 +0000 Received: from AM5EUR02FT059.eop-EUR02.prod.protection.outlook.com (10.152.8.60) by AM5EUR02HT183.eop-EUR02.prod.protection.outlook.com (10.152.9.73) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.20.1059.14; Fri, 17 Aug 2018 10:22:01 +0000 Received: from AM5PR0701MB2657.eurprd07.prod.outlook.com (10.152.8.57) by AM5EUR02FT059.mail.protection.outlook.com (10.152.9.204) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.1059.14 via Frontend Transport; Fri, 17 Aug 2018 10:22:01 +0000 Received: from AM5PR0701MB2657.eurprd07.prod.outlook.com ([fe80::24cf:823c:758c:41b7]) by AM5PR0701MB2657.eurprd07.prod.outlook.com ([fe80::24cf:823c:758c:41b7%5]) with mapi id 15.20.1059.007; Fri, 17 Aug 2018 10:22:01 +0000 From: Bernd Edlinger To: Richard Biener , Jeff Law CC: "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH] Check the STRING_CSTs in varasm.c Date: Fri, 17 Aug 2018 10:22:00 -0000 Message-ID: References: , In-Reply-To: received-spf: None (protection.outlook.com: hotmail.de does not designate permitted sender hosts) authentication-results: spf=none (sender IP is ) smtp.mailfrom=bernd.edlinger@hotmail.de; Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-SW-Source: 2018-08/txt/msg01011.txt.bz2 Richard Biener wrote: > Note that I'm a little bit confused here given build_string > already appends a '\0' after TREE_STRING_LENGTH. So it is safe > to call strlen() on all STRING_CSTs. I think I raised this in > the review of one of Bernds patches but to be honest all the > various threads have become a bit convoluted (esp. if joining > in after two weeks of vacation). > > So -- why is what we do currently not enough? > > Quoting our single STRING_CST creator: > > tree > build_string (int len, const char *str) > { > tree s; > size_t length; > > /* Do not waste bytes provided by padding of struct tree_string. */ > length =3D len + offsetof (struct tree_string, str) + 1; > > record_node_allocation_statistics (STRING_CST, length); > > s =3D (tree) ggc_internal_alloc (length); > > memset (s, 0, sizeof (struct tree_typed)); > TREE_SET_CODE (s, STRING_CST); > TREE_CONSTANT (s) =3D 1; > TREE_STRING_LENGTH (s) =3D len; > memcpy (s->string.str, str, len); > s->string.str[len] =3D '\0'; > > return s; >} Thanks for this question. I have two issues with that. 1. it adds not a wide character nul, only a single byte '\0'. 2. the '\0' here is _guaranteed_ not to be assembled by varasm. Since it is at TREE_STRING_LENGTH+1. That is fine for some string constants, like ASM constaraints. it makes most of the time sense, as long as it is not used for folding of memxxx functions. Of course the whole patch series is dependent on the consensus about how we want to handle string constants in the middle-end, so no problem with going back to the drawing board, that's the reason why I did not apply the already approved bits, and I guess we are not in a hurry. What I see, is that string constants are most of the time handled like the C language strings, that is there are different character wides, and the array_type on the string constants, includes another NUL (wide) char which is assembled along with the rest of the bytes, but it is not written in the language string constant. The front end appends this before giving the string constant to build_string. That means at least most of the time. So I thought it might be good to have some more easily checkable things regarding the zero termiation, or what is allowed for excess precision. It's possible that this shoots over the target, but I think this hardening in the varasm is of some value. This way how this patch works has one advantage: That it is easy to check for "there must be one wide char zero", if it is someting that can't be checked, like: "there may be a nul or not", then it is impossible to be checked. Well yeah, it's an idea. Bernd.