public inbox for cygwin@cygwin.com
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Cc: Brian Inglis <Brian.Inglis@shaw.ca>
Subject: Re: [ANNOUNCEMENT] Updated: dash 0.5.12-2
Date: Wed, 15 Feb 2023 14:52:23 +0100
Message-ID: <Y+zjl5E4SsUZpQ4Y@calimero.vinschen.de>
In-Reply-To: <Y+qRXYAzPKsSHWAy@calimero.vinschen.de>

Hi Brian,

On Feb 13 20:37, Corinna Vinschen via Cygwin wrote:
> On Feb 13 12:03, Brian Inglis via Cygwin wrote:
> > On 2023-02-13 10:43, ASSI via Cygwin wrote:
> > > Corinna Vinschen via Cygwin writes:
> > > > Can you give me an example?  I'm a bit puzzled because fnmatch as well
> > > > as glob in Cygwin support native characters.
> > 
> > But not locale dependent named character classes like regexp in paths.
> 
> I checked the code in current dash git, and while its internal glob
> implementation supports character classes, they are not localized; it
> uses the standard single-byte functions isalnum, isalpha, etc. under the hood.
> 
> So, yeah, as you say further down in this mail, it looks like dash
> supports locale-dependent character classes only with glibc.
> [...]
> Either way, I don't care much for what a certain application provides by
> itself.  I'm talking about our libc, that is Cygwin, and what it
> provides to processes calling its implementations of regcomp/regexec,
> glob and fnmatch.
> 
> All these functions have been taken from FreeBSD, and all three suffer
> from shortcomings:
> 
> - regcomp/regexec supports POSIX named character classes, collating
>   symbols, and equivalence class expressions, but all of them only work
>   for ASCII chars.
> 
> - fnmatch and glob support none of named character classes,
>   collating symbols, or equivalence class expressions.
> 
> I checked the upstream code in FreeBSD, OpenBSD and NetBSD, and none of
> these functions has been improved to support locales (regcomp) or any of
> the character class features (fnmatch/glob).
> 
> So, if we want to add this support to Cygwin (and thus, to all
> applications calling the libc implementation of these functions),
> quite a bit of work is required.
> 
> Being able to fetch the implementation from some other source
> would reduce the effort enormously :}

I took the liberty of adding [:<class>:] support to Cygwin's fnmatch(3) and
glob(3) functions.  They also recognize collating symbols [.<coll>.] and
equivalence class expressions [=<equiv>=], but the latter two are not
implemented yet and fnmatch/glob simply skip them in the pattern.

Given that glob and fnmatch use wide characters internally, the support
for character classes is internationalized by default, albeit in a
slightly different way than in glibc.  The classes a Unicode character
belongs to are not locale-dependent in Cygwin/newlib.  All characters
have their classes assigned all the time, so, for instance, the German
character 'ä' is lower and alpha even in the en_US.utf8 locale.

The Cygwin test release 3.5.0-0.174.gd6d4436145b8, currently building,
contains the new code.  Would you mind building a dash for testing so we
can see if and how it works?


Thanks,
Corinna
