From: Raiki Tamura
Date: Mon, 20 Mar 2023 19:19:43 +0900
Subject: Re: [GSoC] gccrs Unicode support
To: Jakub Jelinek
Cc: Jonathan Wakely, Mark Wielaard, Thomas Schwinge, Philip Herron, "gcc@gcc.gnu.org", gcc-rust@gcc.gnu.org, David Edelsohn, Arthur Cohen, Arsen Arsenović
References: <87lejxujso.fsf@euler.schwinge.homeip.net>

On Sat, 18 Mar 2023 at 18:28, Jakub Jelinek wrote:

> That is a pretty simple thing, so no need to use an extra library for that.
> As is documented in contrib/unicode/README, the Unicode *.txt files are
> already checked in and there are several generators of tables.
> libcpp/makeucnid.cc already creates tables based on the
> UnicodeData.txt DerivedNormalizationProps.txt DerivedCoreProperties.txt
> files, including NFC/NKFC, it is true it doesn't currently compute
> whether a character is alphanumeric. That is either Alphabetic
> DerivedCoreProperties.txt property, or for numeric Nd, Nl or No category
> (3rd column) in UnicodeData.txt.
> Should be a few lines to add that support
> to libcpp/makeucnid.cc, the only question is if it won't make the ucnranges
> array much larger if it differentiates based on another ALPHANUM flag.
> If it doesn't grow too much, let's put it there, if it would grow too much,
> perhaps we should emit it in a separate table.
>

Sounds good. I now have a concrete idea of how to implement this.
Thank you, everyone, for your advice.

Sincerely yours,
Raiki Tamura
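
P.S. For reference, the check described in the quoted text could be sketched roughly as below. This is only a standalone illustration, not a patch to libcpp/makeucnid.cc: the names `CodePointRange`, `unicode_data_line_is_numeric` and `parse_alphabetic_range` are hypothetical, and the sketch assumes the usual semicolon-separated layout of UnicodeData.txt (general category in the 3rd column) and the `range ; property # comment` layout of DerivedCoreProperties.txt.

```cpp
#include <cassert>    // used only by the self-checks in the usage note
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type: an inclusive range of code points.
struct CodePointRange {
  unsigned long first;
  unsigned long last;
};

// Split a line on ';', the field separator used by the Unicode data files.
static std::vector<std::string> split_fields(const std::string &line) {
  std::vector<std::string> fields;
  std::string field;
  std::istringstream in(line);
  while (std::getline(in, field, ';'))
    fields.push_back(field);
  return fields;
}

// Remove leading and trailing whitespace.
static std::string trim(const std::string &s) {
  const char *ws = " \t\r\n";
  const std::string::size_type b = s.find_first_not_of(ws);
  if (b == std::string::npos)
    return "";
  const std::string::size_type e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// True if a UnicodeData.txt line has general category Nd, Nl or No
// (the 3rd column, i.e. field index 2).
bool unicode_data_line_is_numeric(const std::string &line) {
  const std::vector<std::string> f = split_fields(line);
  if (f.size() < 3)
    return false;
  const std::string gc = trim(f[2]);
  return gc == "Nd" || gc == "Nl" || gc == "No";
}

// If a DerivedCoreProperties.txt line assigns the Alphabetic property,
// store its code point range in `out` and return true.
bool parse_alphabetic_range(const std::string &line, CodePointRange &out) {
  // Everything after '#' is a comment.
  const std::string data = line.substr(0, line.find('#'));
  const std::vector<std::string> f = split_fields(data);
  if (f.size() < 2 || trim(f[1]) != "Alphabetic")
    return false;
  const std::string range = trim(f[0]);
  if (range.empty())
    return false;
  const std::string::size_type dots = range.find("..");
  try {
    out.first = std::stoul(range.substr(0, dots), nullptr, 16);
    out.last = (dots == std::string::npos)
                   ? out.first
                   : std::stoul(range.substr(dots + 2), nullptr, 16);
  } catch (const std::exception &) {
    return false;  // malformed hexadecimal field
  }
  return true;
}
```

A code point would then count as alphanumeric if it falls in any range accepted by `parse_alphabetic_range`, or if its UnicodeData.txt line passes `unicode_data_line_is_numeric`. For example, `assert(unicode_data_line_is_numeric("0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;"));` holds. Whether the result ends up as an extra flag in the existing table or in a separate table is left open here, as in the discussion above.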