From: Joel Rees
Date: Tue, 6 Apr 2021 07:39:42 +0900
Subject: Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
To: cygwin@cygwin.com

Well, in the following, do your plans take into account that many major
languages do not have a partition between vowels and consonants? Do you
plan to target only those languages which do?

On Tue, 6 Apr 2021 at 6:50, Mark Aitchison wrote:
>
> A little more detail... I realise that stripping accents off is often
> not a good thing to do, but at the moment that basically is what I'm
> after, or to be more specific: I want to know if the character is a
> consonant or vowel... I basically throw away vowels and punctuation in
> this odd application. Later I will want to do all sorts of things with
> input text that might be UTF-8 or UTF-16 or some encoding that
> (hopefully) I can guess and translate to the same standard and
> ultimately spit out on a web page.
>
> There seem to be many Perl modules that do similar things... I want to
> be able to distribute my code and not require people to download things
> from CPAN. I'd like to stick with modules that are as stock standard as
> standard can be, i.e. are in a standard Cygwin distribution, and are
> normally found in other Perl environments.
> In a sense, searching CPAN gives me too many options, because that
> includes modules that might require a customer to do more than I should
> ask of them, when that could have been avoided by me choosing a more
> standard way of achieving the goal in the first place.
>
> What I probably should have asked is...
> 1. What Perl module, that comes with Cygwin, is good for telling
>    whether a letter is a consonant?
> 2. Later on I will also need something that makes a reasonable guess as
>    to what kind of encoding is used in some text (that might not have a
>    helpful header telling me the answer), with a view to converting it
>    to whatever encoding I want. I can find software to do this, but I
>    would like to restrict options to just those a Cygwin user can
>    install with the setup program... if I'm not being too unrealistic
>    about that requirement.
>
> Thanks, Mark
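
The two numbered questions above could be approached roughly as follows.
This is only a sketch, not a recommendation from the thread. It assumes
Latin-script input, which sidesteps rather than answers the question of
languages without a vowel/consonant split. It uses Unicode::Normalize and
Encode::Guess, both of which ship with core Perl; whether a given Cygwin
perl package includes them should be checked rather than taken for
granted. The sub names is_consonant, consonants_only and decode_guessed
are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                           # this file contains UTF-8 literals
use Unicode::Normalize qw(NFD);     # core module
use Encode::Guess;                  # core module; exports guess_encoding

# Hypothetical helper: true if $ch is a single Latin-script letter that
# is not a, e, i, o or u once accents are stripped. Treating "y" as a
# consonant is an arbitrary choice. Letters with no canonical
# decomposition (for example "ø" or "æ") are NOT reduced to a base
# vowel, so this sketch misclassifies them as consonants.
sub is_consonant {
    my ($ch) = @_;
    my $base = NFD($ch);            # "é" becomes "e" followed by U+0301
    $base =~ s/\p{Mn}+//g;          # drop the combining marks
    return 0 unless $base =~ /\A\p{Latin}\z/ && $base =~ /\A\p{L}\z/;
    return $base =~ /\A[aeiou]\z/i ? 0 : 1;
}

# Keep only the consonants; vowels, punctuation, digits and spaces go.
sub consonants_only {
    my ($text) = @_;
    return join '', grep { is_consonant($_) } split //, $text;
}

# Hypothetical helper: decode raw bytes using Encode::Guess with its
# default suspects (ASCII, UTF-8, and UTF-16/32 detected by BOM).
# Returns undef when the guess fails, e.g. for Latin-1 bytes.
sub decode_guessed {
    my ($bytes) = @_;
    my $enc = guess_encoding($bytes);   # object on success, string on failure
    return ref $enc ? $enc->decode($bytes) : undef;
}

binmode STDOUT, ':encoding(UTF-8)';
print consonants_only("Café, señor!"), "\n";    # prints "Cfsñr"
```

guess_encoding() also accepts extra suspect encodings as further
arguments, but adding a single-byte encoding such as iso-8859-1 next to
UTF-8 tends to make guesses ambiguous, so the sketch sticks to the
defaults and reports failure instead.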