From: L A Walsh
Date: Sun, 04 Apr 2021 13:22:47 -0700
To: Mark Aitchison
CC: cygwin@cygwin.com
Subject: Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
Message-ID: <606A2017.2040405@tlinx.org>

On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ)
> that are basically ASCII with accent marks to their closest ASCII equivalents,
---
Hmm... have you tried installing from CPAN? I just tried it and it
seems to work.

> cpan -i Text::Unidecode;
>
> cat /tmp/in
ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ
> cat /tmp/in| perl -e '
  use Text::Unidecode;
  while (<>) { print unidecode($_); }'
CCCCcccGGGGggggEIIIIOOOO
---
I.e. it stripped off all the accent marks. Is that what you want?
(It spewed some warnings, but seemed to test out OK, so I tried it.)

To reproduce: put your characters in a file "/tmp/in" (I know, not very
creative), check it with

> cat /tmp/in

and then run:

cat /tmp/in| perl -e '
  use Text::Unidecode;
  while (<>) { print unidecode($_); }'
CCCCcccGGGGggggEIIIIOOOO

Where are you seeing those characters, and how do you know they are not
already in Unicode? That I'm seeing the characters "CcGgEIO" with
accents indicates they are already in Unicode. What are you wanting to
do: just convert them to the ASCII characters with the accent marks
stripped off?

> but I'd like to do more with Unicode in the future, without going down any dead-ends as far as
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but
> cannot see anything relating to perl modules, and I don't see any easy way to search many
> months of the mailing list for a keyword... is there any information I should know about?
>
> Thanks,
>
> Mark Aitchison
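
If installing Text::Unidecode from CPAN turns out to be a problem, plain accent stripping can probably also be done with Unicode::Normalize, which ships with perl. What follows is only a minimal sketch under a few assumptions: the input is already decoded UTF-8, every character of interest is a base letter plus combining marks (true for the sample characters in this thread), and strip_accents is a hypothetical helper name, not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the source below contains UTF-8 literals
use Unicode::Normalize qw(NFD);    # core module, no CPAN install needed

# Hypothetical helper (the name is illustrative only): decompose each
# character into base letter + combining marks, then delete the marks.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;   # \p{Mn} matches nonspacing (combining) marks
    return $decomposed;
}

binmode(STDOUT, ':encoding(UTF-8)');

# The sample characters from the original question.
my $sample = "ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ";
print strip_accents($sample), "\n";    # prints CCCCcccGGGGggggEIIIIOOOO
```

This only removes combining marks. Letters that have no decomposition (for example ø, ß or Đ) pass through unchanged, whereas Text::Unidecode transliterates those as well, so which one fits depends on how far beyond "ASCII with accent marks" the input goes. To filter a file such as /tmp/in instead of a literal, open it with the ':encoding(UTF-8)' layer so that perl sees characters rather than raw bytes.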