From: Pedro Alves
To: gdb-cvs@sourceware.org
Subject: [binutils-gdb] Clarify why we unit test matching symbol names with 0xff characters
Date: Tue, 31 May 2022 12:57:06 +0000 (GMT)
X-Git-Refname: refs/heads/master
X-Git-Oldrev: e595ad4cc20a9b34fbda044b161cc7daccdfcf66
X-Git-Newrev: 102a644eaaa8b258f021da71028c32e0744d73ce

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=102a644eaaa8b258f021da71028c32e0744d73ce

commit 102a644eaaa8b258f021da71028c32e0744d73ce
Author: Pedro Alves
Date:   Tue May 31 13:36:32 2022 +0100

    Clarify why we unit test matching symbol names with 0xff characters

    In the name matching unit tests in gdb/dwarf2/read.c, explain better
    why we test symbols with \377 / 0xff characters (Latin1 'ÿ').

    Change-Id: I517f13adfff2e4d3cd783fec1d744e2b26e18b8e

Diff:
---
 gdb/dwarf2/read.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index c4578c687d2..848fd5627b8 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -3628,10 +3628,17 @@ static const char *test_symbols[] = {
      is "function" in PT).  */
   u8"u8função",
 
-  /* \377 (0xff) is Latin1 'ÿ'.  */
+  /* Test a symbol name that ends with a 0xff character, which is a
+     valid character in non-UTF-8 source character sets (e.g. Latin1
+     'ÿ'), and we can't rule out compilers allowing it in identifiers.
+     We test this because the completion algorithm finds the upper
+     bound of symbols by looking for the insertion point of
+     "func"-with-last-character-incremented, i.e. "fund", and adding 1
+     to 0xff should wraparound and carry to the previous character.
+     See comments in make_sort_after_prefix_name.  */
   "yfunc\377",
 
-  /* \377 (0xff) is Latin1 'ÿ'.  */
+  /* Some more symbols with \377 (0xff).  See above.  */
   "\377",
   "\377\377123",
 
@@ -3701,7 +3708,8 @@ test_mapped_index_find_name_component_bounds ()
   }
 
   /* Check that the increment-last-char in the name matching algorithm
-     for completion doesn't get confused with Ansi1 'ÿ' / 0xff.  */
+     for completion doesn't get confused with Ansi1 'ÿ' / 0xff.  See
+     make_sort_after_prefix_name.  */
   {
     static const char *expected_syms1[] = {
       "\377",
@@ -3770,7 +3778,8 @@ test_dw2_expand_symtabs_matching_symbol ()
   }
 
   /* Check that the name matching algorithm for completion doesn't get
-     confused with Latin1 'ÿ' / 0xff.  */
+     confused with Latin1 'ÿ' / 0xff.  See
+     make_sort_after_prefix_name.  */
   {
     static const char str[] = "\377";
     CHECK_MATCH (str, symbol_name_match_type::FULL, true,