public inbox for cygwin@cygwin.com
From: Erik Bray <erik.m.bray@gmail.com>
To: cygwin@cygwin.com
Subject: Re: Python2 "narrow" build, Unicode issue in regex package
Date: Wed, 24 May 2017 12:00:00 -0000
Message-ID: <CAOTD34aAqvP7D++DeCXPPg6Mrpd-undtnPVdPcc3s8g0sVzt4A@mail.gmail.com>
In-Reply-To: <CAKw7uVhWj0WhU5k020GjrMZ2hx3dPDHr7Uj7-bRYpSpjmS1cHg@mail.gmail.com>

On Wed, May 24, 2017 at 10:30 AM, Václav Haisman wrote:
> Hi.
>
> I have recently hit an issue ([1]) with Python 2.7 and regex package
> for it on Cygwin. It appears that Cygwin's Python 2.7 is so called
> narrow build. This causes issues when working with Unicode code point
> outside BMP, like the emoji code points in my issue.
>
> Is there a chance Cygwin's Python could be rebuilt as a wide build?
>
> [1] https://bitbucket.org/mrabarnett/mrab-regex/issues/241/issues-matching-unicode-code-ranges-with-p

I've been bitten by this before too.  I don't know whether Cygwin has
a specific policy behind choosing the narrow build, but narrow builds
are typical on Windows: their 16-bit string units map directly onto
native Windows wide-character strings, whereas a wide build introduces
significantly more overhead.
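
To check which kind of build you have, and to see why code points
outside the BMP cause trouble on a narrow one, something like the
following rough sketch should work on both Python 2.7 and Python 3.
The helper names (is_narrow_build, utf16_code_units) are mine, purely
for illustration:

```python
from __future__ import print_function

import sys


def is_narrow_build():
    # Narrow builds cap sys.maxunicode at 0xFFFF; wide builds and
    # Python 3.3+ report 0x10FFFF.
    return sys.maxunicode == 0xFFFF


def utf16_code_units(text):
    # Number of 16-bit units the text needs in UTF-16.  On a narrow
    # build this is also what len() reports, because code points
    # outside the BMP are stored as surrogate pairs.
    return len(text.encode("utf-16-le")) // 2


emoji = u"\U0001F600"  # a single code point outside the BMP
print("narrow build:", is_narrow_build())
print("len(emoji):", len(emoji))
print("UTF-16 code units:", utf16_code_units(emoji))
```

On a narrow build len(emoji) matches the UTF-16 count (two units for
one character), which is presumably what trips up the regex ranges.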

I know it's trite to answer "use a different tool", but if at all
possible you might consider switching to Python 3, which is the
future. Heck, it's really the present.  Even most of the scientific
Python community has switched over to Python 3 (well, at least the
development community has--users are understandably a little slower).
Many large corporations, such as Instagram, have switched.  And Python
2 support is ending in 2020, so the sooner the better.  I know it's a
hassle though.

Anyways, on current versions of Python 3 (I think 3.3 and above) there
is no longer a wide- versus narrow- distinction.  Instead, each string
is stored in the smallest possible representation that fits the
highest codepoint in the string.
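
You can see the per-string representation for yourself with a small
experiment.  This is only a sketch and assumes CPython 3.3 or later;
sys.getsizeof() reports implementation-specific sizes, and the helper
name bytes_per_char is made up for the example:

```python
import sys


def bytes_per_char(ch):
    # Compare two strings made of the same character that differ in
    # length by 100, so the fixed object header cancels out and only
    # the per-character storage cost remains (CPython-specific).
    return (sys.getsizeof(ch * 101) - sys.getsizeof(ch)) // 100


for label, ch in [("ASCII", "a"),
                  ("BMP", "\u20ac"),
                  ("non-BMP", "\U0001F600")]:
    print(label, bytes_per_char(ch), "byte(s) per character")
```

The width is chosen per string, so a single emoji in an otherwise
ASCII string widens the storage of that one string only.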

If you need a wide character build on Cygwin you could also build it
yourself.  Just make sure to get a few Cygwin patches from
https://github.com/cygwinports/python2

Best,
Erik

