public inbox for cygwin@cygwin.com
* manpage searches "^\s+keyword\s" vs. ???
From: L A Walsh @ 2019-01-28  1:47 UTC
  To: cygwin

I've always used "^\s+keyword\s" as a way to search for some
keyword starting a section. 

On Linux it still works, but on Cygwin it doesn't like '\s' as a
symbol for white space.

Any idea why there might be a difference?

I note an option that could do similar in less -- '&pattern'
turns OFF single special characters, I tried that on Linux
and it turned off the '\s' matching space.  That's nice..um
how about other way?

Well didn't know if there might be some other op to go the
other way, but didn't see anything.

any ideas?

thanks...



--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: manpage searches "^\s+keyword\s" vs. ???
From: Andrey Repin @ 2019-01-30 18:50 UTC
  To: L A Walsh, cygwin

Greetings, L A Walsh!

> I've always used "^\s+keyword\s" as a way to search for some
> keyword starting a section. 

Welcome to the club.

> On Linux it still works, but on Cygwin it doesn't like '\s' as a
> symbol for white space.

> Any idea why there might be a difference?

> I note an option that could do similar in less -- '&pattern'
> turns OFF single special characters, I tried that on Linux
> and it turned off the '\s' matching space.  That's nice..um
> how about other way?

> Well didn't know if there might be some other op to go the
> other way, but didn't see anything.

> any ideas?

I've been puzzled by this since… forever, it seems.
This is something in less, but all the `man less` says is "regular expression
library provided by your system".
I guess this is down to compilation options at this point.


-- 
With best regards,
Andrey Repin
Wednesday, January 30, 2019 21:36:27

Sorry for my terrible english...

* Re: manpage searches "^\s+keyword\s" vs. ???
From: Eric Blake @ 2019-01-30 19:09 UTC
  To: cygwin


On 1/30/19 12:40 PM, Andrey Repin wrote:

> 
> I've been puzzled by this since… forever, it seems.
> This is something in less, but all the `man less` says is "regular expression
> library provided by your system".

\s is a non-standard regex extension - glibc provides it, Cygwin has not
(at least, historically).  POSIX provides [[:space:]] as a portable
alternative (although not all libc have implemented all of POSIX yet),
but it is annoyingly long to type.

Similarly, BSD regex (which is where Cygwin derives its regex from)
supports the non-standard regex extension [[:<:]] as a word boundary,
while glibc has the same feature but spelled \<.  I also seem to recall
a patch in the past to teach Cygwin to respect \< by expanding it to
[[:<:]] before calling into the BSD-derived code (although I couldn't
actually find one in a quick search); a similar patch to expand \s into
[[:space:]] would be a reasonable idea.

> I guess this is down to compilation options at this point.

Not so much compilation options of man and less, but rather the code
used in Cygwin itself for handling regex.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: manpage searches "^\s+keyword\s" vs. ???
From: Corinna Vinschen @ 2019-01-30 19:34 UTC
  To: cygwin

On Jan 30 13:09, Eric Blake wrote:
> On 1/30/19 12:40 PM, Andrey Repin wrote:
> 
> > 
> > I've been puzzled by this since… forever, it seems.
> > This is something in less, but all the `man less` says is "regular expression
> > library provided by your system".
> 
> \s is a non-standard regex extension - glibc provides it, Cygwin has not
> (at least, historically).  POSIX provides [[:space:]] as a portable
> alternative (although not all libc have implemented all of POSIX yet),
> but it is annoyingly long to type.
> 
> Similarly, BSD regex (which is where Cygwin derives its regex from)
> supports the non-standard regex extension [[:<:]] as a word boundary,
> while glibc has the same feature but spelled \<.  I also seem to recall
> a patch in the past to teach Cygwin to respect \< by expanding it to
> [[:<:]] before calling into the BSD-derived code (although I couldn't
> actually find one in a quick search); a similar patch to expand \s into
> [[:space:]] would be a reasonable idea.
> 
> > I guess this is down to compilation options at this point.
> 
> Not so much compilation options of man and less, but rather the code
> used in Cygwin itself for handling regex.

It's FreeBSD code, since we can't use glibc code for licensing reasons.

As usual: Patches welcome!  (Even a complete replacement wouldn't hurt,
as long as licensing is no issue.)


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer


* Re: manpage searches "^\s+keyword\s" vs. ???
From: Eric Blake @ 2019-01-30 19:35 UTC
  To: cygwin


On 1/30/19 1:09 PM, Eric Blake wrote:

> \s is a non-standard regex extension - glibc provides it, Cygwin has not
> (at least, historically).  POSIX provides [[:space:]] as a portable
> alternative (although not all libc have implemented all of POSIX yet),
> but it is annoyingly long to type.
> 
> Similarly, BSD regex (which is where Cygwin derives its regex from)
> supports the non-standard regex extension [[:<:]] as a word boundary,
> while glibc has the same feature but spelled \<.  I also seem to recall
> a patch in the past to teach Cygwin to respect \< by expanding it to
> [[:<:]] before calling into the BSD-derived code (although I couldn't
> actually find one in a quick search); a similar patch to expand \s into
> [[:space:]] would be a reasonable idea.

Found it:
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/regex/regcomp.c;h=180f599c#l425

and indeed, Cygwin fakes \< and \> but NOT \s or \b (for those, you'd
have to submit a patch to that spot in regcomp.c).
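
To illustrate the kind of rewrite being discussed, here is a minimal
Python sketch of a pre-processing pass that expands the glibc-style
shortcuts into their BSD/POSIX spellings before the pattern reaches the
regex compiler.  It is not the regcomp.c code; the names REWRITES and
expand_shortcuts are made up for this example, and it ignores bracket
expressions, where a backslash is an ordinary character.

```python
# Hypothetical helper: illustrates the rewrite idea only; this is not the
# Cygwin regcomp.c code, and the names here are invented for the example.
REWRITES = {
    "s": "[[:space:]]",  # the expansion suggested in this thread
    "<": "[[:<:]]",      # BSD spelling of start-of-word
    ">": "[[:>:]]",      # BSD spelling of end-of-word
}


def expand_shortcuts(pattern: str) -> str:
    """Expand \\s, \\< and \\> outside bracket expressions (not handled here)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Rewrite known shortcuts; copy any other escape pair unchanged,
            # so an escaped backslash followed by "s" is left alone.
            out.append(REWRITES.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_shortcuts(r"^\s+keyword\s"))
```

For the pattern from the original question this prints
^[[:space:]]+keyword[[:space:]], which is the form a BSD-derived regcomp
already accepts.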

> 
>> I guess this is down to compilation options at this point.
> 
> Not so much compilation options of man and less, but rather the code
> used in Cygwin itself for handling regex.

Also a good read:

https://stackoverflow.com/questions/9792702/does-bash-support-word-boundary-regular-expressions

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: manpage searches "^\s+keyword\s" vs. ???
From: Wayne Davison @ 2019-01-30 21:03 UTC
  To: cygwin

On Wed, Jan 30, 2019 at 11:09 AM Eric Blake wrote:
> Not so much compilation options of man and less, but rather the code
> used in Cygwin itself for handling regex.

The configuration of less supports many different regex libraries.  I
downloaded the source, ran "./configure --with-regex=pcre", and
built a nice version of less that fully supports \b and the various
other Perl regex extensions.  The output of Cygwin's standard "less
--version" indicates it was compiled with POSIX regex, while Linux
distributions seem to all use GNU regex (which also supports various
Perl-isms these days).

I think it would be nice to tweak the less package to be compiled with
PCRE regex.
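
A small Python sketch for checking which regex library a given less was
built with.  It assumes the first line of "less --version" looks like
"less 551 (POSIX regular expressions)"; that shape, the version number
551, and the function name regex_library are assumptions made for this
example, not something taken from the less documentation.

```python
import re
from typing import Optional


def regex_library(version_output: str) -> Optional[str]:
    """Return the regex library named on the first line of `less --version`.

    Assumes a first line shaped like "less 551 (POSIX regular expressions)"
    and returns None when the line does not look like that.
    """
    lines = version_output.splitlines()
    if not lines:
        return None
    match = re.search(r"\((.+?) regular expressions\)", lines[0])
    return match.group(1) if match else None


# Sample strings for illustration only, not captured from a real system.
print(regex_library("less 551 (POSIX regular expressions)"))
print(regex_library("less 551 (GNU regular expressions)"))
```

To try it on a real system, pass in the text returned by
subprocess.run(["less", "--version"], capture_output=True, text=True).stdout.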

..wayne..


* Re: manpage searches "^\s+keyword\s" vs. ???
From: Brian Inglis @ 2019-01-31  0:55 UTC
  To: cygwin

On 2019-01-30 11:40, Andrey Repin wrote:
>> I've always used "^\s+keyword\s" as a way to search for some keyword
>> starting a section.
> Welcome to the club.
>> On Linux it still works, but on Cygwin it doesn't like '\s' as a symbol for
>> white space.
>> Any idea why there might be a difference?
>> I note an option that could do similar in less -- '&pattern' turns OFF
>> single special characters, I tried that on Linux and it turned off the '\s'
>> matching space.  That's nice..um how about other way?
>> Well didn't know if there might be some other op to go the other way, but
>> didn't see anything. any ideas?
> I've been puzzled by this since… forever, it seems.
> This is something in less, but all the `man less` says is "regular expression
> library provided by your system".
> I guess this is down to compilation options at this point.

The full class [[:space:]] works as expected.
This is probably down to config options picking the BSD POSIX ERE library, which
has no character class escape shortcuts, rather than allowing a Glib, ICU, or PCRE
regex library that has them.
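
As a rough Python sketch of that workaround, the helper below builds the
portable form of the "^\s+keyword\s" search for any keyword.  The name
section_pattern is invented here, and the sketch assumes the pager
treats the pattern as a POSIX extended regular expression.

```python
# Characters that are special in a POSIX extended regular expression
# outside a bracket expression.
ERE_SPECIAL = set(".[\\()*+?{|^$")


def section_pattern(keyword: str) -> str:
    """Build a portable search pattern for a keyword that starts a section."""
    # Backslash-escape ERE metacharacters so the keyword matches literally.
    escaped = "".join("\\" + c if c in ERE_SPECIAL else c for c in keyword)
    return "^[[:space:]]+" + escaped + "[[:space:]]"


print(section_pattern("keyword"))
print(section_pattern("--color"))
```

The first call prints ^[[:space:]]+keyword[[:space:]], which can be typed
after "/" in the pager in place of the \s form.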

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
