public inbox for libc-locales@sourceware.org
* QUESTION: LC_COLLATE minimal requirements?
@ 2008-10-14  8:06 Harshula
  2008-10-14  8:06 ` Pravin S
  2008-10-14 12:23 ` Keld Jørn Simonsen
  0 siblings, 2 replies; 9+ messages in thread
From: Harshula @ 2008-10-14  8:06 UTC
  To: libc-locales; +Cc: Pravin S

Hi,

I was unable to find much documentation on LC_COLLATE except for [1].
Hence I have a few questions.

Firstly, some background information. The Sinhala collation sequence
(SLS1134) is relatively simple.

* It does not have multiple characters mapping to a single
collation element.
* It does not consider composed and decomposed dependent vowels as
equivalent [2].
* It does not have to deal with secondary and tertiary weights.
* It has a few simple tailoring rules [3] that need to be applied to the
DUCET [4].


Q1) Is it a requirement to use the collating-symbol keyword to define
ALL symbols? If not, is this patch sufficient and acceptable for glibc?
http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log

Q2) Instead of explicitly listing all the characters in order, is it
possible to use the reorder-after keyword to only define variations to
the DUCET?

Q3) I couldn't find any documentation on:

translit_start
include  "translit_combining";""
translit_end

/usr/share/i18n/locales/translit_combining
------------------------------------------
% SINHALA VOWEL SIGN DIGA KOMBUVA
<U0DDA> "<U0DD9><U0DCA>"
% SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA
<U0DDC> "<U0DD9><U0DCF>"
% SINHALA VOWEL SIGN KOMBUVA HAA DIGA AELA-PILLA
<U0DDD> "<U0DDC><U0DCA>"
% SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA
<U0DDE> "<U0DD9><U0DDF>"
------------------------------------------
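
For reference, the four mappings above appear to coincide with the canonical decompositions recorded in the Unicode Character Database. A minimal sketch to check this, assuming Python 3 and its standard unicodedata module (the helper name canonical_decomposition is made up for illustration, and the result depends on the Unicode version bundled with the interpreter):

```python
import unicodedata

# The four composed vowel signs from the translit_combining excerpt.
COMPOSED = ["\u0DDA", "\u0DDC", "\u0DDD", "\u0DDE"]


def canonical_decomposition(ch):
    """Return the one-step decomposition of ch as '<Uxxxx>' symbols."""
    fields = unicodedata.decomposition(ch).split()
    return "".join("<U{}>".format(cp) for cp in fields)


for ch in COMPOSED:
    # Print each entry in the same layout as the locale file excerpt.
    print("% " + unicodedata.name(ch))
    print('<U{:04X}> "{}"'.format(ord(ch), canonical_decomposition(ch)))
```

This only inspects Unicode data; it does not read or exercise the glibc translit_combining file itself.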

Does translit_start have an effect on LC_COLLATE?

Thanks,
#

[1]
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html

[2]
http://sourceforge.net/mailarchive/forum.php?thread_name=1223803982.4898.16.camel%40B1.HOME&forum_name=sinhala-technical

[3]
http://www.nongnu.org/sinhala/doc/howto/sinhala-howto.html#DEV-DATABASES

[4] http://unicode.org/Public/UCA/latest/allkeys.txt


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-14  8:06 QUESTION: LC_COLLATE minimal requirements? Harshula
@ 2008-10-14  8:06 ` Pravin S
  2008-10-14 21:56   ` Ulrich Drepper
  2008-10-14 12:23 ` Keld Jørn Simonsen
  1 sibling, 1 reply; 9+ messages in thread
From: Pravin S @ 2008-10-14  8:06 UTC
  To: Harshula; +Cc: libc-locales, Ulrich Drepper

2008/10/12 Harshula <harshula@gmail.com>:
> Hi,
>
> I was unable to find much documentation on LC_COLLATE except for [1].
> Hence I have a few questions.
>
> Firstly, some background information. The Sinhala collation sequence
> (SLS1134) is relatively simple.
>
> * It does not have multiple characters mapping to a single
> collation element.
> * It does not consider composed and decomposed dependent vowels as
> equivalent [2].
> * It does not have to deal with secondary and tertiary weights.
> * It has a few simple tailoring rules [3] that need to be applied to the
> DUCET [4].
>
>
> Q1) Is it a requirement to use the collating-symbol keyword to define
> ALL symbols? If not, is this patch sufficient and acceptable for glibc?
> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log
>
> Q2) Instead of explicitly listing all the characters in order, is it
> possible to use the reorder-after keyword to only define variations to
> the DUCET?
>
> Q3) I couldn't find any documentation on:
>
> translit_start
> include  "translit_combining";""
> translit_end
>
> /usr/share/i18n/locales/translit_combining
> ------------------------------------------
> % SINHALA VOWEL SIGN DIGA KOMBUVA
> <U0DDA> "<U0DD9><U0DCA>"
> % SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA
> <U0DDC> "<U0DD9><U0DCF>"
> % SINHALA VOWEL SIGN KOMBUVA HAA DIGA AELA-PILLA
> <U0DDD> "<U0DDC><U0DCA>"
> % SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA
> <U0DDE> "<U0DD9><U0DDF>"
> ------------------------------------------
>
> Does translit_start have an effect on LC_COLLATE?
>
> Thanks,
> #
>
> [1]
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
>
> [2]
> http://sourceforge.net/mailarchive/forum.php?thread_name=1223803982.4898.16.camel%40B1.HOME&forum_name=sinhala-technical
>
> [3]
> http://www.nongnu.org/sinhala/doc/howto/sinhala-howto.html#DEV-DATABASES
>
> [4] http://unicode.org/Public/UCA/latest/allkeys.txt
>
>


Adding Ulrich to the Cc list.

Thanks & Regards,
----------------------
Pravin Satpute


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-14  8:06 QUESTION: LC_COLLATE minimal requirements? Harshula
  2008-10-14  8:06 ` Pravin S
@ 2008-10-14 12:23 ` Keld Jørn Simonsen
  1 sibling, 0 replies; 9+ messages in thread
From: Keld Jørn Simonsen @ 2008-10-14 12:23 UTC
  To: Harshula; +Cc: libc-locales, Pravin S

On Mon, Oct 13, 2008 at 02:52:09AM +1100, Harshula wrote:
> Hi,
> 
> I was unable to find much documentation on LC_COLLATE except for [1].
> Hence I have a few questions.

There is a more recent spec at
http://www.open-std.org/JTC1/sc22/WG20/docs/n972-14652ft.pdf
> 
> [1]
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html

best regards
keld


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-14  8:06 ` Pravin S
@ 2008-10-14 21:56   ` Ulrich Drepper
  2008-10-19 16:40     ` Harshula
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Drepper @ 2008-10-14 21:56 UTC
  To: Pravin S; +Cc: Harshula, libc-locales

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Pravin S wrote:
>> Q1) Is it a requirement to use the collating-symbol keyword to define
>> ALL symbols? If not, is this patch sufficient and acceptable for glibc?
>> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log

It's better to follow the example of the other languages.  This results
in better tables.  And it's trivial.  Just use

   <U0DF4>   <U0DF4>;<BAS>;<MIN>;IGNORE

etc


>> Q2) Instead of explicitly listing all the characters in order, is it
>> possible to use the reorder-after keyword to only define variations to
>> the DUCET?

I have no idea what this means.  Each Unicode position can only appear
once in the entire file.  If you add a new language with new characters,
then just put them in the right order.  If you need to change the
collation for existing characters, then you must use reorder-after in
the locale description, outside the collation tables themselves.
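
The "only once in the entire file" rule can be checked mechanically. A rough sketch, assuming Python 3 and a table whose entries start with a single <Uxxxx> symbol at the beginning of the line; duplicate_code_points and the sample text are hypothetical and not part of glibc:

```python
import re
from collections import Counter

# Matches a table line that starts with one <Uxxxx> symbol followed by
# whitespace. Comment lines (starting with '%') and keywords do not match.
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s")


def duplicate_code_points(table_text):
    """Return the code points (uppercase hex) defined on more than one line."""
    counts = Counter()
    for line in table_text.splitlines():
        match = ENTRY.match(line)
        if match:
            counts[match.group(1).upper()] += 1
    return sorted(cp for cp, n in counts.items() if n > 1)


# Hypothetical excerpt for illustration only, not the real table.
sample = """\
% hypothetical excerpt
<U0DF4> <U0DF4>;<BAS>;<MIN>;IGNORE
<U0D85> <U0D85>;<BAS>;<MIN>;IGNORE
<U0DF4> <U0DF4>;<BAS>;<MIN>;IGNORE
"""
print(duplicate_code_points(sample))
```

Running it over a patched copy of the collation table before submitting would flag any code point that was added twice by accident.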


>> Q3) I couldn't find any documentation on:
>>
>> translit_start
>> include  "translit_combining";""
>> translit_end

Just look at the files.  There is no magic.  It's a 1:N mapping.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkj030AACgkQ2ijCOnn/RHSBKACeIdqAhld7yAS4UUKZOye3QzYy
NakAn3hwJtj9Ft8V3x9hz09f2fQIj0RQ
=uJHD
-----END PGP SIGNATURE-----


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-14 21:56   ` Ulrich Drepper
@ 2008-10-19 16:40     ` Harshula
  2008-10-19 18:01       ` Ulrich Drepper
  2008-10-20  7:11       ` Pravin S
  0 siblings, 2 replies; 9+ messages in thread
From: Harshula @ 2008-10-19 16:40 UTC
  To: Ulrich Drepper; +Cc: Pravin S, libc-locales

Hi Ulrich,

On Tue, 2008-10-14 at 11:04 -0700, Ulrich Drepper wrote:

> >> Q1) Is it a requirement to use the collating-symbol keyword to define
> >> ALL symbols? If not, is this patch sufficient and acceptable for glibc?
> >> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log
> 
> It's better to follow the example of the other languages.  This results
> in better tables.  And it's trivial.  Just use
> 
>    <U0DF4>   <U0DF4>;<BAS>;<MIN>;IGNORE
> 
> etc

Thanks, I've made the changes.

Patch:
http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log

Testcase:
http://cvs.savannah.gnu.org/viewvc/sinhala/patches/mysql-data-sinhala.txt?root=sinhala&view=log

Correct output from 'sort':
http://cvs.savannah.gnu.org/viewvc/sinhala/patches/glibc-collation-correct.txt?root=sinhala&view=log
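
To reproduce the comparison against the 'sort' output without the command-line tool, a sketch along these lines may help. It assumes Python 3 and that the patched locale is installed under the name si_LK.UTF-8 (an assumption; adjust as needed); sort_with_locale is a hypothetical helper:

```python
import locale


def sort_with_locale(lines, locale_name="si_LK.UTF-8"):
    """Sort lines by the LC_COLLATE rules of locale_name.

    Falls back to the "C" locale if locale_name is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            # Requested locale is unavailable; use plain code-point order.
            locale.setlocale(locale.LC_COLLATE, "C")
        return sorted(lines, key=locale.strxfrm)
    finally:
        # Restore whatever collation locale was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_with_locale(["b", "c", "a"], "C"))
```

Feeding it the lines of the testcase file and diffing the result against the expected output should show whether the installed table orders the Sinhala test data as intended.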


> >> Q3) I couldn't find any documentation on:
> >>
> >> translit_start
> >> include  "translit_combining";""
> >> translit_end
> 
> Just look at the files.  There is no magic.  It's a 1:N mapping.

The original question was "Does translit_start have an effect on
LC_COLLATE?"

translit_combining file contains this comment:
-----------------------------------------------
% Transliterations that remove all combining characters (accents,
% pronounciation marks, etc.).
% Generated from UnicodeData.txt.
-----------------------------------------------

Should this be interpreted as always converting to the composed form?

Thanks,
#


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-19 16:40     ` Harshula
@ 2008-10-19 18:01       ` Ulrich Drepper
  2008-10-20  7:11       ` Pravin S
  1 sibling, 0 replies; 9+ messages in thread
From: Ulrich Drepper @ 2008-10-19 18:01 UTC
  To: Harshula; +Cc: Pravin S, libc-locales

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Harshula wrote:
> The original question was "Does translit_start have an effect on
> LC_COLLATE?"

Transliteration has nothing to do with collation.  Why should it?

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkj7ckQACgkQ2ijCOnn/RHSxpQCfUWbu2ZlxCkYdnG4hVZTgKsOW
MtkAn07peGanR39l5/XCKQcGVGi++vX3
=7kG0
-----END PGP SIGNATURE-----


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-19 16:40     ` Harshula
  2008-10-19 18:01       ` Ulrich Drepper
@ 2008-10-20  7:11       ` Pravin S
  2008-10-21 13:03         ` Harshula
  1 sibling, 1 reply; 9+ messages in thread
From: Pravin S @ 2008-10-20  7:11 UTC
  To: Harshula; +Cc: Ulrich Drepper, libc-locales

Hi All,

2008/10/19 Harshula <harshula@gmail.com>:
> Hi Ulrich,
>
> On Tue, 2008-10-14 at 11:04 -0700, Ulrich Drepper wrote:
>
>> >> Q1) Is it a requirement to use the collating-symbol keyword to define
>> >> ALL symbols? If not, is this patch sufficient and acceptable for glibc?
>> >> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log
>>
>> It's better to follow the example of the other languages.  This results
>> in better tables.  And it's trivial.  Just use
>>
>>    <U0DF4>   <U0DF4>;<BAS>;<MIN>;IGNORE
>>
>> etc
>
> Thanks, I've made the changes.
>
> Patch:
> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log
>
> Testcase:
> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/mysql-data-sinhala.txt?root=sinhala&view=log
>
> Correct output from 'sort':
> http://cvs.savannah.gnu.org/viewvc/sinhala/patches/glibc-collation-correct.txt?root=sinhala&view=log
>
>

Harshula, I think it would be good to file a bug at
http://sourceware.org/bugzilla/enter_bug.cgi?product=glibc
and submit the patch there; that makes it easier to track.





Thanks & Regards,
----------------------
Pravin Satpute


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-20  7:11       ` Pravin S
@ 2008-10-21 13:03         ` Harshula
  2008-10-22  6:22           ` Pravin S
  0 siblings, 1 reply; 9+ messages in thread
From: Harshula @ 2008-10-21 13:03 UTC
  To: Pravin S; +Cc: Ulrich Drepper, libc-locales

On Mon, 2008-10-20 at 12:20 +0530, Pravin S wrote:
> 2008/10/19 Harshula <harshula@gmail.com>:

> > Thanks, I've made the changes.
> >
> > Patch:
> > http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log
> >
> > Testcase:
> > http://cvs.savannah.gnu.org/viewvc/sinhala/patches/mysql-data-sinhala.txt?root=sinhala&view=log
> >
> > Correct output from 'sort':
> > http://cvs.savannah.gnu.org/viewvc/sinhala/patches/glibc-collation-correct.txt?root=sinhala&view=log
> >
> >
> 
> Harshula, I think it would be good to file a bug at
> http://sourceware.org/bugzilla/enter_bug.cgi?product=glibc
> and submit the patch there; that makes it easier to track.

Done:
http://sourceware.org/bugzilla/show_bug.cgi?id=6968

cya,
#


* Re: QUESTION: LC_COLLATE minimal requirements?
  2008-10-21 13:03         ` Harshula
@ 2008-10-22  6:22           ` Pravin S
  0 siblings, 0 replies; 9+ messages in thread
From: Pravin S @ 2008-10-22  6:22 UTC
  To: Harshula; +Cc: Ulrich Drepper, libc-locales

2008/10/21 Harshula <harshula@gmail.com>:
> On Mon, 2008-10-20 at 12:20 +0530, Pravin S wrote:
>> 2008/10/19 Harshula <harshula@gmail.com>:
>
>> > Thanks, I've made the changes.
>> >
>> > Patch:
>> > http://cvs.savannah.gnu.org/viewvc/sinhala/patches/iso14651_t1_common-glibc.patch?root=sinhala&view=log
>> >
>> > Testcase:
>> > http://cvs.savannah.gnu.org/viewvc/sinhala/patches/mysql-data-sinhala.txt?root=sinhala&view=log
>> >
>> > Correct output from 'sort':
>> > http://cvs.savannah.gnu.org/viewvc/sinhala/patches/glibc-collation-correct.txt?root=sinhala&view=log
>> >
>> >
>>
>> Harshula, I think it would be good to file a bug at
>> http://sourceware.org/bugzilla/enter_bug.cgi?product=glibc
>> and submit the patch there; that makes it easier to track.
>
> Done:
> http://sourceware.org/bugzilla/show_bug.cgi?id=6968
>

Thanks, Harshula.

Best Regards,
-----------------
Pravin Satpute
