From: L A Walsh <cygwin@tlinx.org>
To: Mark Aitchison <M.Aitchison@cyberXpress.co.nz>
Cc: cygwin@cygwin.com
Subject: Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
Date: Sun, 04 Apr 2021 13:22:47 -0700
Message-ID: <606A2017.2040405@tlinx.org>
In-Reply-To: <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz>

On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need 
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ) 
> that are basically ASCII with accent marks to their closest ASCII equivalents, 
---
    Hmm...have you tried installing from cpan?

I just tried it and it seems to work.

$ cpan -i Text::Unidecode
$ cat /tmp/in

ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ

$ cat /tmp/in | perl -e '
use Text::Unidecode;
while (<>) {
print unidecode($_);
}'

CCCCcccGGGGggggEIIIIOOOO

---
I.e. it stripped off all the accent marks.  Is that what you
want?


 

    (The install spewed some warnings, but the module seemed to test out OK,
so I tried it. I put your characters in a file, "/tmp/in" -- I know, not very
creative -- and ran the same script on it, with the result shown above.)

    Where are you seeing those characters, and how do you know they are not
already in Unicode?  I.e., the fact that I'm seeing the characters "CcGgEIO"
with accents indicates they are already in Unicode.

What are you wanting to do?  Just convert them to the ASCII characters
with the accent marks stripped off?


> but I'd 
> like to do more with Unicode in the future, without going down any dead-ends as far as 
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but 
> cannot see anything relating to perl modules, and I don't see any easy way to search many 
> months of the mailing list for a keyword... is there any information I should know about?
>
>
> Thanks,
>
> Mark Aitchison

