From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-433265-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 65765 invoked by alias); 4 Aug 2016 20:18:23 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 65753 invoked by uid 89); 4 Aug 2016 20:18:22 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=clexc, CPP_STRING32, cpp_reader, sk:string_
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 04 Aug 2016 20:18:20 +0000
Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id C83B6C056790;	Thu,  4 Aug 2016 20:18:18 +0000 (UTC)
Received: from localhost.localdomain (ovpn-116-194.phx2.redhat.com [10.3.116.194])	by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u74KIIfO014238;	Thu, 4 Aug 2016 16:18:18 -0400
Subject: Re: [PATCH 2/4] (v3) On-demand locations within string-literals
To: David Malcolm <dmalcolm@redhat.com>, gcc-patches@gcc.gnu.org
References: <1469841382.17384.65.camel@redhat.com> <1470239113-42666-1-git-send-email-dmalcolm@redhat.com> <1470239113-42666-2-git-send-email-dmalcolm@redhat.com> <b1ca3950-b9c1-c887-a881-82e00126ca2a@redhat.com> <1470338501.8203.47.camel@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>
From: Jeff Law <law@redhat.com>
Message-ID: <46aafdfa-3cbe-8072-cef1-c827ec144b7e@redhat.com>
Date: Thu, 04 Aug 2016 20:18:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <1470338501.8203.47.camel@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2016-08/txt/msg00390.txt.bz2

On 08/04/2016 01:21 PM, David Malcolm wrote:

>>> +
>>> +static enum cpp_ttype
>>> +get_cpp_ttype_from_string_type (tree string_type)
>>> +{
>>> +  gcc_assert (string_type);
>>> +  if (TREE_CODE (string_type) != ARRAY_TYPE)
>>> +    return CPP_OTHER;
>>> +
>>> +  tree element_type = TREE_TYPE (string_type);
>>> +  if (TREE_CODE (element_type) != INTEGER_TYPE)
>>> +    return CPP_OTHER;
>>> +
>>> +  int bits_per_character = TYPE_PRECISION (element_type);
>>> +  switch (bits_per_character)
>>> +    {
>>> +    case 8:
>>> +      return CPP_STRING;  /* It could have also been
>>> CPP_UTF8STRING.  */
>>> +    case 16:
>>> +      return CPP_STRING16;
>>> +    case 32:
>>> +      return CPP_STRING32;
>>> +    }
>>> +
>>> +  if (bits_per_character == TYPE_PRECISION (wchar_type_node))
>>> +    return CPP_WSTRING;
>> Doesn't the switch above effectively mean we don't use CPP_WSTRING?
>>  In
>> what cases do you expect it to be used?
>
> I was attempting to provide an inverse of lex_string and
> fix_string_type, going back from a tree type to a cpp_ttype.
And I'm guessing (without looking closely at those routines) that you 
may not be able to reliably map backwards because CPP_WSTRING and one of 
CCP_STRING{16,32} are indistinguishable at this point.  At least I think 
they are indistinguishable.


The
> purpose of the ttype is to determine the execution charset of a
> STRING_CST to enforce the requirement in cpp_interpret_string_ranges
> that there's a 1:1 correspondence between bytes in the source encoding
> and bytes in the execution encoding.
>
> c-lex.c: lex_string has:
> 	case CPP_WSTRING:
> 	  value = build_string (TYPE_PRECISION (wchar_type_node)
> 				/ TYPE_PRECISION (char_type_node),
> 				"\0\0\0");  /* widest supported wchar_t
> 					       is 32 bits */
>
> Given that, it looks like it's not possible for that conditional to
> fire (unless we somehow have a 24-bit wchar_t???)
I think wchar_t has to be an integral type, so this could only happen if 
one of the standard integral types was 24 bits.  I guess that is 
possible.  I think the code as written would catch that case -- but then 
again, if we had such a target in GCC, we'd probably end up defining 
CPP_STRING24 and the WCHAR code wouldn't fire.


>
> Should I just drop the CPP_WSTRING conditional?  (and update the
> function comment, to capture the fact that the cpp_ttype is one with
> the same execution encoding as the STRING_CST, not necessarily equal to
> the exact cpp_ttype that was in use).
I'd probably put a default case with some kind of assert/checking 
failure so that if some of this nature ever happens we'll get a nice 
loud message that our assumptions were incorrect ;-)

And yes, I think your suggestion on the function comment is spot-on.

OK with those changes.

>
>>> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
>>> index 8c80574..7b5da57 100644
>>> --- a/gcc/c-family/c-common.h
>>> +++ b/gcc/c-family/c-common.h
>>> @@ -1110,6 +1110,35 @@ extern time_t cb_get_source_date_epoch
>>> (cpp_reader *pfile);
>>>     __TIME__ can store.  */
>>>  #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799)
>>>
>>> +extern GTY(()) string_concat_db *g_string_concat_db;
>> Presumably this DB needs to persist through the entire compilation
>> unit
>> and the nodes inside reference GC'd objects, right?  Just want to
>> make
>> 100% sure that we need to expose this to the GC system before ack
>> -ing.
>
> It needs to persist for as long as we might make queries about
> substring locations.  It doesn't reference GC'd objects.  However it
> might be desirable to locate locations within string constants loaded
> from PCH files, hence I put it in GTY memory.  Is this overkill?
Ugh.  OK.  I'd rather it not be in GC, but I see the motivation.

And presumably with Martin's code wanting to issue warnings from the 
middle-end, we have to keep the table around through the middle end.

jeff