From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 66590 invoked by alias); 4 Aug 2016 17:38:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 66580 invoked by uid 89); 4 Aug 2016 17:38:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.6 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_HELO_PASS,UNSUBSCRIBE_BODY autolearn=no version=3.3.2 spammy=knock, Manu, matrix, tree_static X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 04 Aug 2016 17:37:58 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C40D72EEF; Thu, 4 Aug 2016 17:37:56 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-194.phx2.redhat.com [10.3.116.194]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u74HbuKR016591; Thu, 4 Aug 2016 13:37:56 -0400 Subject: Re: [PATCH 2/4] (v3) On-demand locations within string-literals To: David Malcolm , gcc-patches@gcc.gnu.org References: <1469841382.17384.65.camel@redhat.com> <1470239113-42666-1-git-send-email-dmalcolm@redhat.com> <1470239113-42666-2-git-send-email-dmalcolm@redhat.com> Cc: Joseph Myers From: Jeff Law Message-ID: Date: Thu, 04 Aug 2016 17:38:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1470239113-42666-2-git-send-email-dmalcolm@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2016-08/txt/msg00368.txt.bz2 On 08/03/2016 09:45 AM, David Malcolm wrote: > Changes in v3: > - Avoid including cpplib.h from input.h > - Properly handle stringified macro arguments (with tests for this) > - Minor whitespace fixes > - Move selftest.h changes to a separate patch > > Changes in v2: > - Tweaks to substring location selftests > - Many more selftests (EBCDIC, the various wide string types, etc) > - Clean up conditions in charset.c; require source == execution charset > to have substring locations > - Make string_concat_db field private > - Return error messages rather than bool > - Fix source_range for charset.c:convert_escape > - Introduce class substring_loc > - Handle bad input locations more gracefully > - Ensure that we can read substring information for a token which > starts in one linemap and ends in another (seen in > gcc.dg/cpp/pr69985.c) > > This version addresses Joseph's qn about stringification of macro > arguments (by failing gracefully on them), and the modularity > concerns noted by Manu. > > Successfully bootstrapped®rtested in conjunction with the rest of the > patch kit on x86_64-pc-linux-gnu. > > v2 of the kit successfully passes a full config-list.mk and a successful selftest > run for stage 1 on powerpc-ibm-aix7.1.3.0 (gcc111), both in conjunction with the > rest of the patch kit; I plan to repeat those tests. > > I believe I can self-approve the changes to input.c, input.h, libcpp, > and the testsuite; the remaining changes needing approval are those > to c-family and to gcc.c. I think that's a fair assessment. You might consider pulling those out as a distinct hunk in the future -- if you haven't noticed, I often try to knock out the smaller patches first (without even looking to see how much might be bits the author can self-approve). > > OK for trunk if it passes testing? (by itself) > > > gcc/c-family/ChangeLog: > * c-common.c: Include "substring-locations.h". > (get_cpp_ttype_from_string_type): New function. > (g_string_concat_db): New global. > (substring_loc::get_range): New method. > * c-common.h (g_string_concat_db): New declaration. > (class substring_loc): New class. > * c-lex.c (lex_string): When concatenating strings, capture the > locations of all tokens using a new obstack, and record the > concatenation locations within g_string_concat_db. > * c-opts.c (c_common_init_options): Construct g_string_concat_db > on the ggc-heap. > > gcc/ChangeLog: > * gcc.c (cpp_options): Rename string to... > (cpp_options_): ...this, to avoid clashing with struct in > cpplib.h. > (static_specs): Update initialize for above renaming > * input.c (string_concat::string_concat): New constructor. > (string_concat_db::string_concat_db): New constructor. > (string_concat_db::record_string_concatenation): New method. > (string_concat_db::get_string_concatenation): New method. > (string_concat_db::get_key_loc): New method. > (class auto_cpp_string_vec): New class. > (get_substring_ranges_for_loc): New function. > (get_source_range_for_substring): New function. > (get_num_source_ranges_for_substring): New function. > (class selftest::lexer_test_options): New class. > (struct selftest::lexer_test): New struct. > (class selftest::ebcdic_execution_charset): New class. > (selftest::ebcdic_execution_charset::s_singleton): New variable. > (selftest::lexer_test::lexer_test): New constructor. > (selftest::lexer_test::~lexer_test): New destructor. > (selftest::lexer_test::get_token): New method. > (selftest::assert_char_at_range): New function. > (ASSERT_CHAR_AT_RANGE): New macro. > (selftest::assert_num_substring_ranges): New function. > (ASSERT_NUM_SUBSTRING_RANGES): New macro. > (selftest::assert_has_no_substring_ranges): New function. > (ASSERT_HAS_NO_SUBSTRING_RANGES): New macro. > (selftest::test_lexer_string_locations_simple): New function. > (selftest::test_lexer_string_locations_ebcdic): New function. > (selftest::test_lexer_string_locations_hex): New function. > (selftest::test_lexer_string_locations_oct): New function. > (selftest::test_lexer_string_locations_letter_escape_1): New function. > (selftest::test_lexer_string_locations_letter_escape_2): New function. > (selftest::test_lexer_string_locations_ucn4): New function. > (selftest::test_lexer_string_locations_ucn8): New function. > (selftest::uint32_from_big_endian): New function. > (selftest::test_lexer_string_locations_wide_string): New function. > (selftest::uint16_from_big_endian): New function. > (selftest::test_lexer_string_locations_string16): New function. > (selftest::test_lexer_string_locations_string32): New function. > (selftest::test_lexer_string_locations_u8): New function. > (selftest::test_lexer_string_locations_utf8_source): New function. > (selftest::test_lexer_string_locations_concatenation_1): New > function. > (selftest::test_lexer_string_locations_concatenation_2): New > function. > (selftest::test_lexer_string_locations_concatenation_3): New > function. > (selftest::test_lexer_string_locations_macro): New function. > (selftest::test_lexer_string_locations_stringified_macro_argument): > New function. > (selftest::test_lexer_string_locations_non_string): New function. > (selftest::test_lexer_string_locations_long_line): New function. > (selftest::test_lexer_char_constants): New function. > (selftest::input_c_tests): Call the new test functions once per > case within the line_table test matrix. > * input.h (struct string_concat): New struct. > (struct location_hash): New struct. > (class string_concat_db): New class. > * substring-locations.h: New header. > > gcc/testsuite/ChangeLog: > * gcc.dg/plugin/diagnostic-test-string-literals-1.c: New file. > * gcc.dg/plugin/diagnostic-test-string-literals-2.c: New file. > * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: New file. > * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above new files. > > libcpp/ChangeLog: > * charset.c (cpp_substring_ranges::cpp_substring_ranges): New > constructor. > (cpp_substring_ranges::~cpp_substring_ranges): New destructor. > (cpp_substring_ranges::add_range): New method. > (cpp_substring_ranges::add_n_ranges): New method. > (_cpp_valid_ucn): Add "char_range" and "loc_reader" params; if > they are non-NULL, read position information from *loc_reader > and update char_range->m_finish accordingly. > (convert_ucn): Add "char_range", "loc_reader", and "ranges" > params. If loc_reader is non-NULL, read location information from > it, and update *ranges accordingly, using char_range. > Conditionalize the conversion into tbuf on tbuf being non-NULL. > (convert_hex): Likewise, conditionalizing the call to > emit_numeric_escape on tbuf. > (convert_oct): Likewise. > (convert_escape): Add params "loc_reader" and "ranges". If > loc_reader is non-NULL, read location information from it, and > update *ranges accordingly. Conditionalize the conversion into > tbuf on tbuf being non-NULL. > (cpp_interpret_string): Rename to... > (cpp_interpret_string_1): ...this, adding params "loc_readers" and > "out". Use "to" to conditionalize the initialization and usage of > "tbuf", such as running the converter. If "loc_readers" is > non-NULL, use the instances within it, reading location > information from them, and passing them to convert_escape; likewise > write to "out" if loc_readers is non-NULL. Check for leading > quote and issue an error if it is not present. Update boundary > check from "== limit" to ">= limit" to protect against erroneous > location values to calls that are not parsing string literals. > (cpp_interpret_string): Reimplement in terms to > cpp_interpret_string_1. > (noop_error_cb): New function. > (cpp_interpret_string_ranges): New function. > (cpp_string_location_reader::cpp_string_location_reader): New > constructor. > (cpp_string_location_reader::get_next): New method. > * include/cpplib.h (class cpp_string_location_reader): New class. > (class cpp_substring_ranges): New class. > (cpp_interpret_string_ranges): New prototype. > * internal.h (_cpp_valid_ucn): Add params "char_range" and > "loc_reader". > * lex.c (forms_identifier_p): Pass NULL for new params to > _cpp_valid_ucn. > --- > > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c > index 27031b5..7a8b6ea 100644 > --- a/gcc/c-family/c-common.c > +++ b/gcc/c-family/c-common.c > @@ -1098,6 +1099,67 @@ fix_string_type (tree value) > TREE_STATIC (value) = 1; > return value; > } > + > +/* Given a string of type STRING_TYPE, determine what kind of string > + token created it: CPP_STRING, CPP_STRING16, CPP_STRING32, or > + CPP_WSTRING. Return CPP_OTHER in case of error. > + > + This effectively reverses part of the logic in > + lex_string and fix_string_type. */ > + > +static enum cpp_ttype > +get_cpp_ttype_from_string_type (tree string_type) > +{ > + gcc_assert (string_type); > + if (TREE_CODE (string_type) != ARRAY_TYPE) > + return CPP_OTHER; > + > + tree element_type = TREE_TYPE (string_type); > + if (TREE_CODE (element_type) != INTEGER_TYPE) > + return CPP_OTHER; > + > + int bits_per_character = TYPE_PRECISION (element_type); > + switch (bits_per_character) > + { > + case 8: > + return CPP_STRING; /* It could have also been CPP_UTF8STRING. */ > + case 16: > + return CPP_STRING16; > + case 32: > + return CPP_STRING32; > + } > + > + if (bits_per_character == TYPE_PRECISION (wchar_type_node)) > + return CPP_WSTRING; Doesn't the switch above effectively mean we don't use CPP_WSTRING? In what cases do you expect it to be used? > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h > index 8c80574..7b5da57 100644 > --- a/gcc/c-family/c-common.h > +++ b/gcc/c-family/c-common.h > @@ -1110,6 +1110,35 @@ extern time_t cb_get_source_date_epoch (cpp_reader *pfile); > __TIME__ can store. */ > #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799) > > +extern GTY(()) string_concat_db *g_string_concat_db; Presumably this DB needs to persist through the entire compilation unit and the nodes inside reference GC'd objects, right? Just want to make 100% sure that we need to expose this to the GC system before ack-ing. The rest looks reasonable. So we just need to reach closure on those two issues IMHO. jeff