From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <cygwin@tlinx.org>
Received: from Ishtar.sc.tlinx.org (ishtar.tlinx.org [173.164.175.65])
 by sourceware.org (Postfix) with ESMTPS id 7D5BB3858023
 for <cygwin@cygwin.com>; Sun,  4 Apr 2021 20:22:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 7D5BB3858023
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=tlinx.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=cygwin@tlinx.org
Received: from [192.168.3.12] (Athenae [192.168.3.12])
 by Ishtar.sc.tlinx.org (8.14.7/8.14.4/SuSE Linux 0.8) with ESMTP id
 134KMlr6093529; Sun, 4 Apr 2021 13:22:51 -0700
Message-ID: <606A2017.2040405@tlinx.org>
Date: Sun, 04 Apr 2021 13:22:47 -0700
From: L A Walsh <cygwin@tlinx.org>
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
To: Mark Aitchison <M.Aitchison@cyberXpress.co.nz>
CC: cygwin@cygwin.com
Subject: Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
References: <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz>
In-Reply-To: <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_00, BODY_8BITS,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TRACKER_ID,
 TXREP autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: cygwin@cygwin.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
 <mailto:cygwin-request@cygwin.com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin@cygwin.com>
List-Help: <mailto:cygwin-request@cygwin.com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
 <mailto:cygwin-request@cygwin.com?subject=subscribe>
X-List-Received-Date: Sun, 04 Apr 2021 20:23:01 -0000

On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ)
> that are basically ASCII with accent marks to their closest ASCII equivalents,
---
    Hmm... have you tried installing it from CPAN?

I just tried it and it seems to work.

>  cpan -i Text::Unidecode
>  cat /tmp/in

ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ

>  cat /tmp/in | perl -e '
use Text::Unidecode;
# Transliterate each input line to its closest ASCII form.
while (<>) {
    print unidecode($_);
}'

CCCCcccGGGGggggEIIIIOOOO

---
I.e. it stripped off all the accent marks.  Is that what you
want?
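
If the same pipeline gives garbage on another setup, one possible cause is that perl is reading the input as raw bytes rather than as characters; unidecode() works on decoded character strings. A minimal sketch of a variant that asks perl to do the decoding itself, assuming Text::Unidecode is installed and the input is UTF-8 (the function name to_ascii is made up for illustration):

```shell
# Sketch: transliterate UTF-8 text on stdin to plain ASCII.
# Assumes Text::Unidecode is installed (e.g. via cpan) and input is UTF-8.
# -CSD: treat STDIN/STDOUT/STDERR and files opened by perl as UTF-8.
to_ascii() {
    perl -CSD -MText::Unidecode -ne 'print unidecode($_)'
}
```

Usage would then be something like "to_ascii < /tmp/in".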


    (The cpan install spewed some warnings, but the module seemed to
test out OK, so I tried it.  I put your characters in a file, "/tmp/in".
I know, not a very creative name.)

    Where are you seeing those characters, and how do you know they are
not already in Unicode?  I.e., the fact that I'm seeing the characters
"CcGgEIO" with accents indicates they are already in Unicode.

What do you want to do: just convert them to the ASCII characters
with the accent marks stripped off?


> but I'd
> like to do more with Unicode in the future, without going down any dead-ends as far as
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but
> cannot see anything relating to perl modules, and I don't see any easy way to search many
> months of the mailing list for a keyword... is there any information I should know about?
>
>
> Thanks,
>
> Mark Aitchison