From: Bruno Haible
To: cygwin@cygwin.com, Brian Inglis
Subject: Re: character class "alpha"
Date: Mon, 31 Jul 2023 23:37:08 +0200
Message-ID: <18620212.dDkQJl9nhx@nimes>
In-Reply-To: <223e3d56-1a63-57ef-5236-bc1df37716a0@Shaw.ca>
References: <3884636.3uDm00564X@nimes> <4474610.kIfH5X4irW@nimes> <223e3d56-1a63-57ef-5236-bc1df37716a0@Shaw.ca>

Brian Inglis wrote:
> It seems to me that most application developers needing to support
> non-Western-European languages might want a non-POSIX interpretation of digits.

Sure.
GNU libunistring has dedicated API for this:
  - https://www.gnu.org/software/libunistring/manual/html_node/Object-oriented-API.html
    UC_DECIMAL_DIGIT_NUMBER.
  - https://www.gnu.org/software/libunistring/manual/html_node/Decimal-digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-objects.html
    UC_PROPERTY_DECIMAL_DIGIT
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-functions.html
    uc_is_property_decimal_digit

I'm sure ICU4C has similar APIs too.

> Are the Unicode character attribute classes supported for those application use
> cases that need more than POSIX limitations allow?

POSIX allows the libc to define additional character classes. But these will
be platform and locale dependent, and I don't know of any application which
makes use of such additional character classes via wctype() and iswctype().

> I know that I sometimes want to see some alternative numeric digit forms and
> expect to be able to find those with an appropriate grep expression.

I think you can do so with GNU 'grep', when it was built with PCRE support.
PCRE includes support for Unicode character classes.

Bruno
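
For illustration only, here is a minimal sketch of the "Unicode decimal digit" idea discussed above. It uses Python's standard `unicodedata` module rather than libunistring or ICU4C, so it is an analogue and not either library's API. The helper names (`is_decimal_digit`, `decimal_value`, `find_non_ascii_digits`) are hypothetical and chosen just for this example. The general category `Nd` is the Unicode "Decimal_Number" category, which is presumably what UC_DECIMAL_DIGIT_NUMBER refers to in libunistring.

```python
import unicodedata
from typing import List, Optional, Tuple


def is_decimal_digit(ch: str) -> bool:
    """True if ch has Unicode general category Nd (Decimal_Number)."""
    return unicodedata.category(ch) == "Nd"


def decimal_value(ch: str) -> Optional[int]:
    """Decimal digit value of ch, or None if ch has no such value."""
    return unicodedata.decimal(ch, None)


def find_non_ascii_digits(text: str) -> List[Tuple[str, int]]:
    """Collect (character, value) pairs for decimal digits outside ASCII."""
    return [
        (ch, unicodedata.decimal(ch))
        for ch in text
        if is_decimal_digit(ch) and not ch.isascii()
    ]


if __name__ == "__main__":
    # U+0663 is ARABIC-INDIC DIGIT THREE; U+00B2 is SUPERSCRIPT TWO,
    # which is numeric (category No) but not a decimal digit.
    sample = "a1\u0663b\u00b2"
    for ch, value in find_non_ascii_digits(sample):
        print(f"U+{ord(ch):04X} has decimal value {value}")
```

Unlike the additional character classes a libc may expose through wctype() and iswctype(), this lookup depends on the Unicode data bundled with the Python interpreter and not on the platform or the current locale.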