public inbox for libc-alpha@sourceware.org
From: "Ondřej Bílka" <neleai@seznam.cz>
To: Leonhard Holz <leonhard.holz@web.de>
Cc: libc-alpha@sourceware.org
Subject: Re: [RFC] Add fast path for strcoll and strcasecmp
Date: Tue, 25 Nov 2014 20:36:00 -0000
Message-ID: <20141125203612.GA21077@domone>
In-Reply-To: <5474E0B4.9020908@web.de>

On Tue, Nov 25, 2014 at 09:04:04PM +0100, Leonhard Holz wrote:
> On 24.11.2014 00:47, Ondřej Bílka wrote:
> >On Sun, Nov 23, 2014 at 11:52:06PM +0100, Leonhard Holz wrote:
> >>Hi Ondřej,
> >>
> >>as far as I understood, the current strcoll implementation scans
> >>both strings for collation sequences and compares the weights of
> >>them, whereby a collation sequence can be multiple bytes long. So
> >>whatever strcmp_l returns as index, you would need a general way of
> >>finding the start of the collation sequence this index is in.
> >>Unfortunately I cannot tell if or how this can be done.
> >>
> >As I wrote below, you do not have to do that. Just precompute a table that
> >is zero for characters that are part of some collation sequence, and use
> >the old method when one of the compared characters is flagged in that table.
> >
> 
> Ok, I understand the idea and it would be great if it worked. BTW do
> you know how UTF-8 chars above 7F are handled?
>
A UTF-8 character above 0x7f consists of a starting byte larger than 0xbf
followed by one or more continuation bytes in the 0x80-0xbf range, see

http://en.wikipedia.org/wiki/UTF-8
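
As a rough illustration of that byte layout (a sketch only; both helper
names are hypothetical and the input is assumed to be valid UTF-8), an
index that lands inside a multi-byte character can be moved back to the
character's first byte by skipping continuation bytes:

```c
#include <stddef.h>

/* Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80-0xbf.  */
static int
is_utf8_continuation (unsigned char c)
{
  return (c & 0xc0) == 0x80;
}

/* Hypothetical helper: return the index of the first byte of the UTF-8
   character that contains byte IDX of S.  Assumes S is valid UTF-8.  */
static size_t
utf8_char_start (const char *s, size_t idx)
{
  while (idx > 0 && is_utf8_continuation ((unsigned char) s[idx]))
    idx--;
  return idx;
}
```

For example, in "a\xc5\x99" (the letter a followed by U+0159) index 2 is
a continuation byte and the character it belongs to starts at index 1.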

 
> >From a performance perspective these are not a problem, as they should be
> >infrequent enough. Ignored ones are worse, as they could make otherwise
> >identical long prefixes different.
> >
> >
> >>BTW I have implemented a benchmark for strcoll that is
> >>not-yet-pushed because I didn't manage to patch the bench-tests
> >>Makefile to generate additionally needed locales
> >>(https://sourceware.org/ml/libc-alpha/2014-10/msg00431.html).
> >>
> 
> I can send you the test files if you like.
> 
> Leonhard
