From: Tom Tromey
To: Eli Zaretskii
Cc: Tom Tromey, gdb-patches@sourceware.org
Subject: Re: [PATCH 5/5] Handle non-ASCII identifiers in Ada
Date: Tue, 01 Mar 2022 07:49:29 -0700
In-Reply-To: <83wnhel5bn.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 01 Mar 2022 05:28:12 +0200")
Message-ID: <87czj5locm.fsf@tromey.com>

>> Ordinarily, yes, but in practice the Ada compiler uses quite old data,
>> and so whatever is provided by a recent-ish Python is more than good
>> enough.

Eli> How old is "old data", and how recent-ish should be "recent-ish
Eli> Python", for this purpose?

The Ada front end doesn't actually document this, aside from:

   -- Note these tables are derived from those given in AI-285. For details
   -- see www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00285.TXT?rev=1.22.

... which I know to be false because other changes have been made to
some of these tables after this.  (You can see this code in
gcc/gnat/libgnat/s-utf_32.adb.)

However, when I examine the case-folding tables (e.g. look for
"Lower_Case_Letters"), the last letters seen are:

     (16#10428#, 16#1044F#),  -- DESERET SMALL LETTER LONG I .. DESERET SMALL LETTER EW
     (16#E0061#, 16#E007A#)); -- TAG LATIN SMALL LETTER A .. TAG LATIN SMALL LETTER Z

These were in Unicode back in 2001.

Eli> Or maybe we should document what is the
Eli> oldest version of Python that currently suits the needs?

Most people shouldn't run this script.  The output is checked in.  And
if they do and get wildly different results, that will be caught in
review.

Of course, it won't really matter, because you can't really write an Ada
program -- at least, not using GNAT -- that uses anything after 2001
anyway.  This covers all the Python versions that are in normal use.

For example Python 2.7, the oldest one I have around (and for which gdb
is going to drop support soon anyway):

    >>> import unicodedata
    >>> unicodedata.unidata_version
    '5.2.0'

This version of the data comes from 2009, plenty new enough.

Tom
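
For anyone who wants to repeat the unicodedata.unidata_version check from the message on their own interpreter, here is a minimal sketch. It assumes Python 3 (chr() on code points above U+FFFF is not reliable on every Python 2 build). The names unidata_tuple, range_is_lowercase and LAST_DESERET_RANGE are made up for illustration and are not part of the gdb patch or its script. It only looks at the Deseret range quoted from the GNAT table.

```python
import unicodedata


def unidata_tuple(version=unicodedata.unidata_version):
    """Parse a version string such as '5.2.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def range_is_lowercase(first, last):
    """Return True if every code point in [first, last] has category Ll."""
    return all(
        unicodedata.category(chr(cp)) == "Ll" for cp in range(first, last + 1)
    )


# DESERET SMALL LETTER LONG I .. DESERET SMALL LETTER EW, as quoted above.
LAST_DESERET_RANGE = (0x10428, 0x1044F)

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print(
        "Deseret range known as lowercase:",
        range_is_lowercase(*LAST_DESERET_RANGE),
    )
```

Comparing unidata_tuple() against a floor such as (5, 2, 0) is one way to decide whether an interpreter's data is at least as new as the Python 2.7 data mentioned above; the floor itself is a judgment call, not something the Ada front end specifies.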