From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 65765 invoked by alias); 4 Aug 2016 20:18:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 65753 invoked by uid 89); 4 Aug 2016 20:18:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=clexc, CPP_STRING32, cpp_reader, sk:string_ X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 04 Aug 2016 20:18:20 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C83B6C056790; Thu, 4 Aug 2016 20:18:18 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-194.phx2.redhat.com [10.3.116.194]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u74KIIfO014238; Thu, 4 Aug 2016 16:18:18 -0400 Subject: Re: [PATCH 2/4] (v3) On-demand locations within string-literals To: David Malcolm , gcc-patches@gcc.gnu.org References: <1469841382.17384.65.camel@redhat.com> <1470239113-42666-1-git-send-email-dmalcolm@redhat.com> <1470239113-42666-2-git-send-email-dmalcolm@redhat.com> <1470338501.8203.47.camel@redhat.com> Cc: Joseph Myers From: Jeff Law Message-ID: <46aafdfa-3cbe-8072-cef1-c827ec144b7e@redhat.com> Date: Thu, 04 Aug 2016 20:18:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1470338501.8203.47.camel@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2016-08/txt/msg00390.txt.bz2 On 08/04/2016 01:21 PM, David Malcolm wrote: >>> + >>> +static enum cpp_ttype >>> +get_cpp_ttype_from_string_type (tree string_type) >>> +{ >>> + gcc_assert (string_type); >>> + if (TREE_CODE (string_type) != ARRAY_TYPE) >>> + return CPP_OTHER; >>> + >>> + tree element_type = TREE_TYPE (string_type); >>> + if (TREE_CODE (element_type) != INTEGER_TYPE) >>> + return CPP_OTHER; >>> + >>> + int bits_per_character = TYPE_PRECISION (element_type); >>> + switch (bits_per_character) >>> + { >>> + case 8: >>> + return CPP_STRING; /* It could have also been >>> CPP_UTF8STRING. */ >>> + case 16: >>> + return CPP_STRING16; >>> + case 32: >>> + return CPP_STRING32; >>> + } >>> + >>> + if (bits_per_character == TYPE_PRECISION (wchar_type_node)) >>> + return CPP_WSTRING; >> Doesn't the switch above effectively mean we don't use CPP_WSTRING? >> In >> what cases do you expect it to be used? > > I was attempting to provide an inverse of lex_string and > fix_string_type, going back from a tree type to a cpp_ttype. And I'm guessing (without looking closely at those routines) that you may not be able to reliably map backwards because CPP_WSTRING and one of CCP_STRING{16,32} are indistinguishable at this point. At least I think they are indistinguishable. The > purpose of the ttype is to determine the execution charset of a > STRING_CST to enforce the requirement in cpp_interpret_string_ranges > that there's a 1:1 correspondence between bytes in the source encoding > and bytes in the execution encoding. > > c-lex.c: lex_string has: > case CPP_WSTRING: > value = build_string (TYPE_PRECISION (wchar_type_node) > / TYPE_PRECISION (char_type_node), > "\0\0\0"); /* widest supported wchar_t > is 32 bits */ > > Given that, it looks like it's not possible for that conditional to > fire (unless we somehow have a 24-bit wchar_t???) I think wchar_t has to be an integral type, so this could only happen if one of the standard integral types was 24 bits. I guess that is possible. I think the code as written would catch that case -- but then again, if we had such a target in GCC, we'd probably end up defining CPP_STRING24 and the WCHAR code wouldn't fire. > > Should I just drop the CPP_WSTRING conditional? (and update the > function comment, to capture the fact that the cpp_ttype is one with > the same execution encoding as the STRING_CST, not necessarily equal to > the exact cpp_ttype that was in use). I'd probably put a default case with some kind of assert/checking failure so that if some of this nature ever happens we'll get a nice loud message that our assumptions were incorrect ;-) And yes, I think your suggestion on the function comment is spot-on. OK with those changes. > >>> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h >>> index 8c80574..7b5da57 100644 >>> --- a/gcc/c-family/c-common.h >>> +++ b/gcc/c-family/c-common.h >>> @@ -1110,6 +1110,35 @@ extern time_t cb_get_source_date_epoch >>> (cpp_reader *pfile); >>> __TIME__ can store. */ >>> #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799) >>> >>> +extern GTY(()) string_concat_db *g_string_concat_db; >> Presumably this DB needs to persist through the entire compilation >> unit >> and the nodes inside reference GC'd objects, right? Just want to >> make >> 100% sure that we need to expose this to the GC system before ack >> -ing. > > It needs to persist for as long as we might make queries about > substring locations. It doesn't reference GC'd objects. However it > might be desirable to locate locations within string constants loaded > from PCH files, hence I put it in GTY memory. Is this overkill? Ugh. OK. I'd rather it not be in GC, but I see the motivation. And presumably with Martin's code wanting to issue warnings from the middle-end, we have to keep the table around through the middle end. jeff