From: L A Walsh <cygwin@tlinx.org>
To: Mark Aitchison <M.Aitchison@cyberXpress.co.nz>
Cc: cygwin@cygwin.com
Subject: Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
Date: Sun, 04 Apr 2021 13:22:47 -0700
Message-ID: <606A2017.2040405@tlinx.org>
In-Reply-To: <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz>
On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ)
> that are basically ASCII with accent marks to their closest ASCII equivalents,
---
Hmm...have you tried installing from cpan?
I just tried it and it seems to work.
> cpan -i Text::Unidecode;
> cat /tmp/in
ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ
> cat /tmp/in| perl -e '
use Text::Unidecode;
while (<>) {
print unidecode($_);
}'
CCCCcccGGGGggggEIIIIOOOO
---
I.e. it stripped off all the accent marks. Is that what you
want?
(The install spewed some warnings, but the module seemed to test out OK, so
I tried it. To reproduce, put your characters in a file, "/tmp/in" (I know,
not very creative), and pipe it through the perl snippet above.)
Where are you seeing those characters, and how do you know they are not
already in Unicode? That I'm seeing the characters "CcGgEIO" with
accents indicates they are already in Unicode.
What do you want to do: just convert them to the ASCII characters
with the accent marks stripped off?
> but I'd
> like to do more with Unicode in the future, without going down any dead-ends as far as
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but
> cannot see anything relating to perl modules, and I don't see any easy way to search many
> months of the mailing list for a keyword... is there any information I should know about?
>
>
> Thanks,
>
> Mark Aitchison