public inbox for libc-locales@sourceware.org
* [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
@ 2015-10-13 21:57 Egmont Koblinger
  2015-10-13 22:37 ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2015-10-13 21:57 UTC
  To: libc-locales

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

Hi,

Could you please review and apply the attached patch?

Recommended commit message body (feel free to edit as you please):
-----
Fix sorting of long consonants, a regression introduced by #13547. Fix
inconsistencies in uppercase vs. lowercase sorting. Fix diacritic
ordering. Fix ordering of foreign accents.

Add an extensive test file.

    [BZ #18934]
    * locales/hu_HU: Fix multiple bugs.
    * hu_HU.in: New file.
    * Makefile (test-input): Add hu_HU.UTF-8.
-----

I know that one patch per issue is generally the cleaner approach, but
this time I apologize for an all-in-one: the patches would heavily
conflict, and it would be really cumbersome to unittest an incremental
series. Instead, think of it as TDD (test-driven development): I
attach a decent unittest with explanations and pointers to the rules,
and a locale definition that implements them.

The addressed bugs are:

- The fix to bug 13547 was incorrect and introduced a regression. It
fixed a corner case, but I didn't realize it broke a more typical
one. See details over there.

- Two minor bugs/inconsistencies wrt. sorting upper/lowercase values,
as described in bug 18587.

- Someone enabled backwards ordering of diacritics by default (bug
17750), breaking tons of locales including Hungarian. So disable
backwards ordering in this locale definition.

- Foreign accents should be sorted after the native Hungarian ones;
this wasn't the case so far.
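
To illustrate the long-consonant rule behind the first fix: the
comments in the test file explain that a long compound letter is
written in shorthand ("ccs" for <cs><cs>, "ddzs" for <dzs><dzs>) but
must be sorted by its actual letters. Below is a minimal Python sketch
of that tokenization. It is not how the locale definition implements
the rule; the `tokenize` helper and its tables are hypothetical, and
the sketch always assumes the compound reading (so a rare <c><s> is
treated as <cs>, as the test file also assumes).

```python
# Hypothetical helper, not part of the patch: split a Hungarian word into
# collation tokens, expanding shorthand long consonants ("ccs" -> cs, cs).

# Shorthand doubled forms, mapped to the compound letter they stand for twice.
DOUBLED = {
    "ddzs": "dzs", "ccs": "cs", "ddz": "dz", "ggy": "gy", "lly": "ly",
    "nny": "ny", "ssz": "sz", "tty": "ty", "zzs": "zs",
}
# Compound letters; "dzs" is listed before "dz" so the longer one wins.
COMPOUND = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]


def tokenize(word):
    """Greedily split `word` into Hungarian letters (longest match first)."""
    lower = word.lower()
    tokens = []
    i = 0
    while i < len(lower):
        # First try the shorthand long forms, longest first.
        for short in sorted(DOUBLED, key=len, reverse=True):
            if lower.startswith(short, i):
                tokens += [DOUBLED[short]] * 2
                i += len(short)
                break
        else:
            # Then try a plain compound letter, else take one character.
            for letter in COMPOUND:
                if lower.startswith(letter, i):
                    tokens.append(letter)
                    i += len(letter)
                    break
            else:
                tokens.append(lower[i])
                i += 1
    return tokens


print(tokenize("kassza"))  # ['k', 'a', 'sz', 'sz', 'a']
print(tokenize("kasza"))   # ['k', 'a', 'sz', 'a']
```

A greedy split like this cannot recognize compound-word boundaries by
itself; it merely happens to give the same tokens for "nagygyakorlat"
and "naggyal" because both contain <gy><gy>.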

Plus, a unittest is added which is far more extensive than that of any
other locale. It includes all the examples from the corresponding
sections of the official rules of Hungarian orthography, as well as
thorough tests of my own for all the corner cases I could think of,
with comments all around.
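
For readers unfamiliar with this kind of test: the input file is
already in the expected order, so a check can shuffle the lines, sort
them again with the locale's collation, and compare the result with
the original. The Python sketch below only illustrates that idea; it
is not the glibc harness (the real tests are the collate-test and
xfrm-test programs listed in localedata/Makefile). The helper names
are hypothetical, and treating everything after ';' as a comment is
an assumption based on the layout of the test file.

```python
import random


def sort_key_part(line):
    """Return the text before the first ';' comment marker, right-stripped."""
    return line.split(";", 1)[0].rstrip()


def check_sorted_file(lines, collate_key, seed=0):
    """Shuffle `lines`, re-sort them by `collate_key`, compare with the input.

    `collate_key` maps a string to a sort key; it is a parameter so the
    sketch can run without any particular locale being installed.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    resorted = sorted(shuffled, key=lambda line: collate_key(sort_key_part(line)))
    return resorted == list(lines)


# Plain code-point ordering stands in for a real collation here.
sample = ["t a ; first", "t b", "t c ; third"]
print(check_sorted_file(sample, str))                  # True
print(check_sorted_file(list(reversed(sample)), str))  # False
```

With the locale installed, one could call
locale.setlocale(locale.LC_COLLATE, "hu_HU.UTF-8") and pass
locale.strxfrm as `collate_key`; whether that locale is available
depends on the system.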

In addition to fixing a(n unfortunately relatively insignificant)
locale, I hope that this unittest file will encourage other locale
maintainers to create similarly extensive tests, increasing the
quality of other locales in the long run and preventing regressions
(such as the backward diacritics ordering) from sneaking in.
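
As an illustration of what backward diacritic ordering changes: when
two words differ only in accents, forward ordering decides on the
leftmost accent difference, while backward ordering decides on the
rightmost one. The test file expects the forward behaviour ("aá"
before "áa"). The Python sketch below uses a deliberately crude,
hypothetical weighting (0 for an unaccented letter, 1 for a long
Hungarian vowel); the real collation tables are far more detailed.

```python
# Hypothetical two-weight accent model, for illustration only.
LONG_VOWELS = set("áéíóőúű")


def accent_weights(word):
    """Return one accent weight per character: 1 for a long vowel, else 0."""
    return [1 if ch in LONG_VOWELS else 0 for ch in word.lower()]


def secondary_key(word, backward=False):
    """Accent-level sort key, compared from the end when `backward` is set."""
    weights = accent_weights(word)
    return weights[::-1] if backward else weights


words = ["áa", "aá"]
forward = sorted(words, key=secondary_key)
backward = sorted(words, key=lambda w: secondary_key(w, backward=True))

print(forward)   # ['aá', 'áa']
print(backward)  # ['áa', 'aá']
```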

Thanks a lot,
egmont

[-- Attachment #2: glibc-18934-hu-collate-v4.patch --]
[-- Type: application/mbox, Size: 34903 bytes --]

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2015-10-13 21:57 [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs Egmont Koblinger
@ 2015-10-13 22:37 ` Egmont Koblinger
  2015-10-26 15:25   ` Egmont Koblinger
  2016-04-21  6:13   ` Mike Frysinger
  0 siblings, 2 replies; 33+ messages in thread
From: Egmont Koblinger @ 2015-10-13 22:37 UTC
  To: libc-locales

[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]

Hi,

Please use the patch attached to this mail, not the one attached to
the previous mail. Sorry for the confusion!

I checked the previous patch many times, yet I missed something that
I discovered only after sending the previous mail: I had left one of
the compound letters out of the unittest.

The only change from the previous patch is the addition of a few
more lines to the unittest, so it has even better coverage. The
patch to the locale definition is unchanged.

I've re-run the test and of course it still passes :)

Thanks,
egmont

[-- Attachment #2: glibc-18934-hu-collate-v5.patch --]
[-- Type: application/mbox, Size: 35189 bytes --]

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2015-10-13 22:37 ` Egmont Koblinger
@ 2015-10-26 15:25   ` Egmont Koblinger
  2015-11-15 21:34     ` Egmont Koblinger
  2016-04-21  6:13   ` Mike Frysinger
  1 sibling, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2015-10-26 15:25 UTC
  To: libc-locales

Hello,

Friendly ping - could you please take a look at this patch (version 5)?

Is there anything I can help you with?

Thanks,
egmont


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2015-10-26 15:25   ` Egmont Koblinger
@ 2015-11-15 21:34     ` Egmont Koblinger
  2016-01-14 12:54       ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2015-11-15 21:34 UTC
  To: libc-locales

Hi,

Friendly ping... what's going on with this one?

I was the one who made the last few changes to this locale (including
an unfortunate regression), and now I'm also adding the most extensive
unittest any locale has (protecting against such regressions now and
in the future), so even without looking at the patch I guess you can
be quite confident that it only makes things better, not worse.

Would it help if I broke it down to like 4 or 5 small patches on top
of each other, and added the unittests in the last step?

thanks,
egmont



* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2015-11-15 21:34     ` Egmont Koblinger
@ 2016-01-14 12:54       ` Egmont Koblinger
  2016-04-16  8:50         ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2016-01-14 12:54 UTC
  To: libc-locales

Hi,

Friendly ping...

Is there anything I could do to help this patch get accepted?

Regards,
egmont


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2016-01-14 12:54       ` Egmont Koblinger
@ 2016-04-16  8:50         ` Egmont Koblinger
  0 siblings, 0 replies; 33+ messages in thread
From: Egmont Koblinger @ 2016-04-16  8:50 UTC
  To: libc-locales

Hi guys,

This is about the sixth time I'm sending this patch to the list.

No response whatsoever so far -- I see you're in the middle of fixing
tons of locales right now, so I really do hope it's going to be
different this time.

The patch fixes a few bugs (as detailed in previous mails and the
bugzilla entry), and backs them up with by far the most extensive
unittest any locale definition has.

Because the fixes are driven by the unittests, splitting the patch into
smaller changes (which would not be tested individually) would have
required tons of extra work: I would have had to create intermediate,
deliberately somewhat broken definitions in addition to the correct
one. I hope that's not a problem; let me know if it is.

Please kindly review and apply,

Cheers,
Egmont


diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 541c34f..59320a1 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,10 @@
+2015-09-09  Egmont Koblinger  <egmont@gmail.com>
+
+    [BZ #18934]
+    * locales/hu_HU: Fix multiple collate bugs.
+    * hu_HU.in: New file.
+    * Makefile (test-input): Add hu_HU.UTF-8.
+
 2016-04-15  Mike Frysinger  <vapier@gentoo.org>

     [BZ #16374]
diff --git a/localedata/Makefile b/localedata/Makefile
index 4ecb192..7e62b7e 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
          tst-ctype tst-langinfo tst-langinfo-static tst-numeric
 test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
           hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
-          si_LK.UTF-8 uk_UA.UTF-8
+          si_LK.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 test-input-data = $(addsuffix .in, $(basename $(test-input)))
 test-output := $(foreach s, .out .xout, \
              $(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
        hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
        nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
        zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-       tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
+       tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 include ../gen-locales.mk
 endif

diff --git a/localedata/hu_HU.in b/localedata/hu_HU.in
new file mode 100644
index 0000000..4eb8eee
--- /dev/null
+++ b/localedata/hu_HU.in
@@ -0,0 +1,560 @@
+AkH-14-a1 acél          ; These tests are from:
+AkH-14-a1 cukor         ;
+AkH-14-a1 csók          ; A magyar helyesírás szabályai, 12. kiadás
+AkH-14-a1 gép           ; [The Rules of Hungarian Orthography, 12th edition]
+AkH-14-a1 hideg         ;
+AkH-14-a1 kettő         ; often referred to as akadémiai helyesírás (AkH.) [academic orthography]
+AkH-14-a1 Nagy          ;
+AkH-14-a1 nyúl          ; http://helyesiras.mta.hu/helyesiras/default/akh12
+AkH-14-a1 olasz         ;
+AkH-14-a1 öröm          ; Alphabetical ordering described in #14-16.
+AkH-14-a1 remény
+AkH-14-a1 sokáig        ; #14-a1: Sort based on first letter.
+AkH-14-a1 szabad
+AkH-14-a1 Tamás
+AkH-14-a1 vásárol
+AkH-14-a2 jácint        ; #14-a2: If no other difference, lowercase initial precedes uppercase.
+AkH-14-a2 Jácint
+AkH-14-a2 opera
+AkH-14-a2 Opera
+AkH-14-a2 szűcs
+AkH-14-a2 Szűcs
+AkH-14-a2 viola
+AkH-14-a2 Viola
+AkH-14-a3 cudar         ; #14-a3: Compound letters (cs, dz, dzs, gy, ly, ny, sz, ty, zs)
+AkH-14-a3 cukor         ; are sorted separately, after their first letter:
+AkH-14-a3 cuppant       ; a b c cs d dz dzs e f g gy h ... l ly m n ny o ... s sz t ty u ... z zs
+AkH-14-a3 csalit
+AkH-14-a3 csata
+AkH-14-a3 Csepel
+AkH-14-a3 Zoltán
+AkH-14-a3 zongora
+AkH-14-a3 zúdul
+AkH-14-a3 zsalu
+AkH-14-a3 zseni
+AkH-14-a3 Zsigmond
+AkH-14-b1 lom           ; #14-b1: The first difference matters.
+AkH-14-b1 lomb
+AkH-14-b1 lombik
+AkH-14-b1 Lontay
+AkH-14-b1 lovagol
+AkH-14-b1 pirinkó
+AkH-14-b1 pirinyó
+AkH-14-b1 pirít
+AkH-14-b1 pirkad
+AkH-14-b1 Piroska
+AkH-14-b1 tükör
+AkH-14-b1 Tünde
+AkH-14-b1 tünemény
+AkH-14-b1 tüntet
+AkH-14-b1 tüzér
+AkH-14-b2 kas           ; #14-b2: If a compound letter is pronounced long, only the first letter
+AkH-14-b2 Kasmír        ; is duplicated in writing: <cs><cs> becomes "ccs", <dzs><dzs> is "ddzs" etc.
+AkH-14-b2 Kassák        ; (unless it's at the boundary of a compound word when it's written out twice).
+AkH-14-b2 kastély       ; Sort according to the actual tokens, not the shorthand written form.
+AkH-14-b2 kasza         ; <k><a><sz><a>
+AkH-14-b2 kaszinó       ; <k><a><sz><i><n><ó>
+AkH-14-b2 kassza        ; <k><a><sz><sz><a>
+AkH-14-b2 kaszt         ; <k><a><sz><t>
+AkH-14-b2 mennek
+AkH-14-b2 mennének
+AkH-14-b2 menü
+AkH-14-b2 menza
+AkH-14-b2 meny          ; <m><e><ny>
+AkH-14-b2 Menyhért      ; <M><e><ny><h><é><r><t>
+AkH-14-b2 mennybolt     ; <m><e><ny><ny><b><o><l><t>
+AkH-14-b2 mennyi        ; <m><e><ny><ny><i>
+AkH-14-b2 nagy          ; <n><a><gy>
+AkH-14-b2 naggyá        ; <n><a><gy><gy><á>
+AkH-14-b2 nagygyakorlat ; <n><a><gy><gy><a><k><o><r><l><a><t> (compound word: nagy+gyakorlat)
+AkH-14-b2 naggyal       ; <n><a><gy><gy><a><l>
+AkH-14-b2 nagyít        ; <n><a><gy><í><t>
+AkH-14-b2 nagyobb
+AkH-14-b2 nagyol
+AkH-14-b2 nagyoll
+AkH-14-c1 ír            ; #14-c1: Vowels collate equally in pairs: a-á, e-é, i-í, o-ó, ö-ő, u-ú, ü-ű.
+AkH-14-c1 Irak
+AkH-14-c1 iram
+AkH-14-c1 Irán
+AkH-14-c1 írandó
+AkH-14-c1 iránt
+AkH-14-c1 író
+AkH-14-c1 iroda
+AkH-14-c1 irónia
+AkH-14-c2 Eger          ; #14-c2: Short vowel (unaccented, or with diaeresis) comes first if that's the only difference.
+AkH-14-c2 egér
+AkH-14-c2 egyfelé
+AkH-14-c2 egyféle
+AkH-14-c2 elöl
+AkH-14-c2 elől
+AkH-14-c2 kerek
+AkH-14-c2 kerék
+AkH-14-c2 keres
+AkH-14-c2 kérés
+AkH-14-c2 koros
+AkH-14-c2 kóros
+AkH-14-c2 szel
+AkH-14-c2 szél
+AkH-14-c2 szeles
+AkH-14-c2 széles
+AkH-14-c2 szüret
+AkH-14-c2 szűret
+AkH-14-d1 kis részben   ; #14-d1: Spaces, hyphens are ignored.
+AkH-14-d1 kissé
+AkH-14-d1 Kiss Ernő
+AkH-14-d1 kis sorozat
+AkH-14-d1 kissorozat-gyártás
+AkH-14-d1 kis számban
+AkH-14-d1 kistányér
+AkH-14-d1 kis virág
+AkH-14-d1 márvány
+AkH-14-d1 márványkő
+AkH-14-d1 márvány sírkő
+AkH-14-d1 Márvány-tenger
+AkH-14-d1 márványtömb
+AkH-14-d1 Márvány Zsolt
+AkH-14-d1 másféle
+AkH-14-d1 másol
+AkH-14-d1 tiszafa
+AkH-14-d1 Tiszahát
+AkH-14-d1 Tisza Kálmán
+AkH-14-d1 Tisza menti
+AkH-14-d1 Tiszántúl
+AkH-14-d1 Tisza-part
+AkH-14-d1 tiszavirág
+AkH-14-d1 tiszt
+AkH-15 cérna            ; #15: Foreign accents are ignored, unless they're the only difference,
+AkH-15 Černý            ; in which case they are sorted after the Hungarian ones (in unspecified order).
+AkH-15 Champagne
+AkH-15 Cholnoky
+AkH-15 címez
+AkH-15 cukor
+AkH-15 Czuczor
+AkH-15 csapat
+AkH-15 Gaal
+AkH-15 galamb
+AkH-15 Gärtner
+AkH-15 gáz
+AkH-15 geodézia
+AkH-15 Georges
+AkH-15 góc
+AkH-15 Goethe
+AkH-15 moshat
+AkH-15 mosna
+AkH-15 Mošna
+AkH-15 mosópor
+AkH-15 Møsstrand
+AkH-15 mostan
+AkH-15 munka
+AkH-15 Muñoz
+alphabet a              ; These tests were created by egmont@gmail.com.
+alphabet á
+alphabet aa             ; a = á unless that's the only difference in which case a < á.
+alphabet aá             ; (Same for e = é, i = í, o = ó, ö = ő, u = ú, ü = ű below.)
+alphabet áa             ; Differences in accents matter from left to right.
+alphabet áá
+alphabet áp
+alphabet aq
+alphabet b
+alphabet c
+alphabet cz             ; <c><z>
+alphabet cs             ; <cs>        -- or rarely <c><s>, can't tell for sure, assume <cs>.
+alphabet csc            ; <cs><c>
+alphabet ccs            ; <cs><cs>    -- or rarely <c><cs>, can't tell for sure, assume <cs><cs>.
+alphabet cscs           ; <cs><cs>    -- Make sure ccs and cscs don't collate as equal, see bug 13547.
+alphabet ccsa           ; <cs><cs><a>
+alphabet cscsa          ; <cs><cs><a> -- (These comments also apply to all other compound letters below.)
+alphabet csd            ; <cs><d>
+alphabet d
+alphabet dz             ; <dz>
+alphabet dzd            ; <dz><d>
+alphabet ddz            ; <dz><dz>
+alphabet dzdz           ; <dz><dz>
+alphabet ddza           ; <dz><dz><a>
+alphabet dzdza          ; <dz><dz><a>
+alphabet dzdzs          ; <dz><dzs>
+alphabet dze            ; <dz><e>
+alphabet dzz            ; <dz><z>
+alphabet dzs            ; <dzs>
+alphabet dzsdz          ; <dzs><dz>
+alphabet ddzs           ; <dzs><dzs>
+alphabet dzsdzs         ; <dzs><dzs>
+alphabet ddzsa          ; <dzs><dzs><a>
+alphabet dzsdzsa        ; <dzs><dzs><a>
+alphabet dzse           ; <dzs><e>
+alphabet e
+alphabet é
+alphabet ee
+alphabet eé
+alphabet ée
+alphabet éé
+alphabet ép
+alphabet eq
+alphabet f
+alphabet g
+alphabet gz             ; <g><z>
+alphabet gy             ; <gy>
+alphabet gyg            ; <gy><g>
+alphabet ggy            ; <gy><gy>
+alphabet gygy           ; <gy><gy>
+alphabet ggya           ; <gy><gy><a>
+alphabet gygya          ; <gy><gy><a>
+alphabet gyh            ; <gy><h>
+alphabet h
+alphabet i
+alphabet í
+alphabet ii
+alphabet ií
+alphabet íi
+alphabet íí
+alphabet íp
+alphabet iq
+alphabet j
+alphabet k
+alphabet l
+alphabet lz             ; <l><z>
+alphabet ly             ; <ly>
+alphabet lyl            ; <ly><l>
+alphabet lly            ; <ly><ly>
+alphabet lyly           ; <ly><ly>
+alphabet llya           ; <ly><ly><a>
+alphabet lylya          ; <ly><ly><a>
+alphabet lym            ; <ly><m>
+alphabet m
+alphabet n
+alphabet nz             ; <n><z>
+alphabet ny             ; <ny>
+alphabet nyn            ; <ny><n>
+alphabet nny            ; <ny><ny>
+alphabet nyny           ; <ny><ny>
+alphabet nnya           ; <ny><ny><a>
+alphabet nynya          ; <ny><ny><a>
+alphabet nyo            ; <ny><o>
+alphabet o
+alphabet ó
+alphabet oo
+alphabet oó
+alphabet óo
+alphabet óó
+alphabet óp
+alphabet oq
+alphabet ö              ; ö = ő (unless that's the only difference), but these come strictly after o and ó.
+alphabet ő
+alphabet öö
+alphabet öő
+alphabet őö
+alphabet őő
+alphabet őp
+alphabet öq
+alphabet p
+alphabet q
+alphabet r
+alphabet s
+alphabet sz             ; <sz>
+alphabet szs            ; <sz><s>
+alphabet ssz            ; <sz><sz>
+alphabet szsz           ; <sz><sz>
+alphabet ssza           ; <sz><sz><a>
+alphabet szsza          ; <sz><sz><a>
+alphabet szt            ; <sz><t>
+alphabet t
+alphabet tz             ; <t><z>
+alphabet ty             ; <ty>
+alphabet tyt            ; <ty><t>
+alphabet tty            ; <ty><ty>
+alphabet tyty           ; <ty><ty>
+alphabet ttya           ; <ty><ty><a>
+alphabet tytya          ; <ty><ty><a>
+alphabet tyu            ; <ty><u>
+alphabet u
+alphabet ú
+alphabet úp
+alphabet uq
+alphabet uu
+alphabet uú
+alphabet úu
+alphabet úú
+alphabet ü              ; ü = ű (unless that's the only difference), but these come strictly after u and ú.
+alphabet ű
+alphabet űp
+alphabet üq
+alphabet üü
+alphabet üű
+alphabet űü
+alphabet űű
+alphabet v
+alphabet w
+alphabet x
+alphabet y
+alphabet z
+alphabet zz             ; <z><z>
+alphabet zs             ; <zs>
+alphabet zsz            ; <zs><z>
+alphabet zzs            ; <zs><zs>
+alphabet zszs           ; <zs><zs>
+alphabet zzsa           ; <zs><zs><a>
+alphabet zszsa          ; <zs><zs><a>
+case a                  ; #14-a2 specifies that if the same word appears in lowercase as well as with
+case A                  ; uppercase initial, the lowercase one is to be sorted first.
+case á                  ; Extend this to all other weird combinations of upper- and lowercases.
+case Á
+case cs                 ; <cs>
+case cS
+case Cs
+case CS
+case ccs                ; <cs><cs>
+case ccS
+case cCs
+case cCS
+case Ccs
+case CcS
+case CCs
+case CCS
+case dz                 ; <dz>
+case dZ
+case Dz
+case DZ
+case ddz                ; <dz><dz>
+case ddZ
+case dDz
+case dDZ
+case Ddz
+case DdZ
+case DDz
+case DDZ
+case dzs                ; <dzs>
+case dzS
+case dZs
+case dZS
+case Dzs
+case DzS
+case DZs
+case DZS
+case ddzs               ; <dzs><dzs>
+case ddzS
+case ddZs
+case ddZS
+case dDzs
+case dDzS
+case dDZs
+case dDZS
+case Ddzs
+case DdzS
+case DdZs
+case DdZS
+case DDzs
+case DDzS
+case DDZs
+case DDZS
+case e
+case E
+case é
+case É
+case gy                 ; <gy>
+case gY
+case Gy
+case GY
+case ggy                ; <gy><gy>
+case ggY
+case gGy
+case gGY
+case Ggy
+case GgY
+case GGy
+case GGY
+case i
+case I
+case í
+case Í
+case ly                 ; <ly>
+case lY
+case Ly
+case LY
+case lly                ; <ly><ly>
+case llY
+case lLy
+case lLY
+case Lly
+case LlY
+case LLy
+case LLY
+case ny                 ; <ny>
+case nY
+case Ny
+case NY
+case nny                ; <ny><ny>
+case nnY
+case nNy
+case nNY
+case Nny
+case NnY
+case NNy
+case NNY
+case o
+case O
+case ó
+case Ó
+case ö
+case Ö
+case ő
+case Ő
+case sz                 ; <sz>
+case sZ
+case Sz
+case SZ
+case ssz                ; <sz><sz>
+case ssZ
+case sSz
+case sSZ
+case Ssz
+case SsZ
+case SSz
+case SSZ
+case ty                 ; <ty>
+case tY
+case Ty
+case TY
+case tty                ; <ty><ty>
+case ttY
+case tTy
+case tTY
+case Tty
+case TtY
+case TTy
+case TTY
+case u
+case U
+case ú
+case Ú
+case ü
+case Ü
+case ű
+case Ű
+case zs                 ; <zs>
+case zS
+case Zs
+case ZS
+case zzs                ; <zs><zs>
+case zzS
+case zZs
+case zZS
+case Zzs
+case ZzS
+case ZZs
+case ZZS
+foreign-a1 á            ; More thorough tests for foreign accents (#15).
+foreign-a1 à
+foreign-a1 àp
+foreign-a1 áq
+foreign-a2 á
+foreign-a2 â
+foreign-a2 âp
+foreign-a2 áq
+foreign-a3 á
+foreign-a3 ã
+foreign-a3 ãp
+foreign-a3 áq
+foreign-a4 á
+foreign-a4 ä
+foreign-a4 äp
+foreign-a4 áq
+foreign-a5 á
+foreign-a5 å
+foreign-a5 åp
+foreign-a5 áq
+foreign-a6 á
+foreign-a6 ă
+foreign-a6 ăp
+foreign-a6 áq
+foreign-c1 c
+foreign-c1 ç
+foreign-c1 çp
+foreign-c1 cq
+foreign-d1 d
+foreign-d1 đ
+foreign-d1 đp
+foreign-d1 dq
+foreign-e1 é
+foreign-e1 è
+foreign-e1 èp
+foreign-e1 éq
+foreign-e2 é
+foreign-e2 ê
+foreign-e2 êp
+foreign-e2 éq
+foreign-e3 é
+foreign-e3 ë
+foreign-e3 ëp
+foreign-e3 éq
+foreign-e4 é
+foreign-e4 ě
+foreign-e4 ěp
+foreign-e4 éq
+foreign-i1 í
+foreign-i1 ì
+foreign-i1 ìp
+foreign-i1 íq
+foreign-i2 í
+foreign-i2 î
+foreign-i2 îp
+foreign-i2 íq
+foreign-i3 í
+foreign-i3 ï
+foreign-i3 ïp
+foreign-i3 íq
+foreign-l1 l
+foreign-l1 ł
+foreign-l1 łp
+foreign-l1 lq
+foreign-n1 n
+foreign-n1 ñ
+foreign-n1 ñp
+foreign-n1 nq
+foreign-n2 n
+foreign-n2 ň
+foreign-n2 ňp
+foreign-n2 nq
+foreign-o1 ó            ; The rules are not explicit whether foreign accents on top of o or u
+foreign-o1 ò            ; should be sorted among o-ó and u-ú, or among ö-ő and ü-ű,
+foreign-o1 òp           ; but the example with Møsstrand makes it clear that it's the former.
+foreign-o1 óq
+foreign-o2 ó
+foreign-o2 ô
+foreign-o2 ôp
+foreign-o2 óq
+foreign-o3 ó
+foreign-o3 õ
+foreign-o3 õp
+foreign-o3 óq
+foreign-o4 ó
+foreign-o4 ø
+foreign-o4 øp
+foreign-o4 óq
+foreign-r1 r
+foreign-r1 ř
+foreign-r1 řp
+foreign-r1 rq
+foreign-s1 s
+foreign-s1 š
+foreign-s1 šp
+foreign-s1 sq
+foreign-u1 ú
+foreign-u1 ù
+foreign-u1 ùp
+foreign-u1 úq
+foreign-u2 ú
+foreign-u2 û
+foreign-u2 ûp
+foreign-u2 úq
+foreign-u3 ú
+foreign-u3 ũ
+foreign-u3 ũp
+foreign-u3 úq
+foreign-u4 ú
+foreign-u4 ů
+foreign-u4 ůp
+foreign-u4 úq
+foreign-y1 y
+foreign-y1 ÿ
+foreign-y1 ÿp
+foreign-y1 yq
diff --git a/localedata/locales/hu_HU b/localedata/locales/hu_HU
index d76226d..8d1d95b 100644
--- a/localedata/locales/hu_HU
+++ b/localedata/locales/hu_HU
@@ -64,6 +64,7 @@ category "i18n:2012";LC_MEASUREMENT
 END LC_IDENTIFICATION

 LC_COLLATE
+define DIACRIT_FORWARD
 copy "iso14651_t1"

 %% a b c cs d dz dzs e f g gy h i j k l ly m n ny o o: p q
@@ -77,15 +78,18 @@ copy "iso14651_t1"
 %% dzs+dzs becomes ddzs, and so on.
 %% However, c+cs is also spelled as ccs, you need to speak
 %% the language to tell which one is the case.
-%% Tokenize ccs as <c_or_cs><cs>, and sort the tokens as
-%% a b c c_or_cs cs d... This effectively assumes cs+cs
-%% which is more frequent than c+cs, but guarantees that the
-%% strings ccs and cscs don't collate as equal.
+%% Tokenize ccs as <cs><cs> since this is much more frequent
+%% than <c><cs>, but apply SINGLE-OR-COMPOUND and COMPOUND
+%% to the tokens so that the strings ccs and cscs don't collate
+%% as equal.
+%% The same goes for all other compound consonants.

 collating-symbol  <odouble>
 collating-symbol  <udouble>

-collating-symbol  <c_or_cs>
+collating-symbol  <SINGLE-OR-COMPOUND>
+collating-symbol  <COMPOUND>
+
 collating-symbol  <cs>
 collating-element <C-S> from "<U0043><U0053>"
 collating-element <C-s> from "<U0043><U0073>"
@@ -100,7 +104,6 @@ collating-element <c-C-s> from "<U0063><U0043><U0073>"
 collating-element <c-c-S> from "<U0063><U0063><U0053>"
 collating-element <c-c-s> from "<U0063><U0063><U0073>"

-collating-symbol  <d_or_dz>
 collating-symbol  <dz>
 collating-element <D-Z> from "<U0044><U005A>"
 collating-element <D-z> from "<U0044><U007A>"
@@ -115,7 +118,6 @@ collating-element <d-D-z> from "<U0064><U0044><U007A>"
 collating-element <d-d-Z> from "<U0064><U0064><U005A>"
 collating-element <d-d-z> from "<U0064><U0064><U007A>"

-collating-symbol  <d_or_dzs>
 collating-symbol  <dzs>
 collating-element <D-Z-S> from "<U0044><U005A><U0053>"
 collating-element <D-Z-s> from "<U0044><U005A><U0073>"
@@ -142,7 +144,6 @@ collating-element <d-d-Z-s> from "<U0064><U0064><U005A><U0073>"
 collating-element <d-d-z-S> from "<U0064><U0064><U007A><U0053>"
 collating-element <d-d-z-s> from "<U0064><U0064><U007A><U0073>"

-collating-symbol  <g_or_gy>
 collating-symbol  <gy>
 collating-element <G-Y> from "<U0047><U0059>"
 collating-element <G-y> from "<U0047><U0079>"
@@ -157,7 +158,6 @@ collating-element <g-G-y> from "<U0067><U0047><U0079>"
 collating-element <g-g-Y> from "<U0067><U0067><U0059>"
 collating-element <g-g-y> from "<U0067><U0067><U0079>"

-collating-symbol  <l_or_ly>
 collating-symbol  <ly>
 collating-element <L-Y> from "<U004C><U0059>"
 collating-element <L-y> from "<U004C><U0079>"
@@ -172,7 +172,6 @@ collating-element <l-L-y> from "<U006C><U004C><U0079>"
 collating-element <l-l-Y> from "<U006C><U006C><U0059>"
 collating-element <l-l-y> from "<U006C><U006C><U0079>"

-collating-symbol  <n_or_ny>
 collating-symbol  <ny>
 collating-element <N-Y> from "<U004E><U0059>"
 collating-element <N-y> from "<U004E><U0079>"
@@ -187,7 +186,6 @@ collating-element <n-N-y> from "<U006E><U004E><U0079>"
 collating-element <n-n-Y> from "<U006E><U006E><U0059>"
 collating-element <n-n-y> from "<U006E><U006E><U0079>"

-collating-symbol  <s_or_sz>
 collating-symbol  <sz>
 collating-element <S-Z> from "<U0053><U005A>"
 collating-element <S-z> from "<U0053><U007A>"
@@ -202,7 +200,6 @@ collating-element <s-S-z> from "<U0073><U0053><U007A>"
 collating-element <s-s-Z> from "<U0073><U0073><U005A>"
 collating-element <s-s-z> from "<U0073><U0073><U007A>"

-collating-symbol  <t_or_ty>
 collating-symbol  <ty>
 collating-element <T-Y> from "<U0054><U0059>"
 collating-element <T-y> from "<U0054><U0079>"
@@ -217,7 +214,6 @@ collating-element <t-T-y> from "<U0074><U0054><U0079>"
 collating-element <t-t-Y> from "<U0074><U0074><U0059>"
 collating-element <t-t-y> from "<U0074><U0074><U0079>"

-collating-symbol  <z_or_zs>
 collating-symbol  <zs>
 collating-element <Z-S> from "<U005A><U0053>"
 collating-element <Z-s> from "<U005A><U0073>"
@@ -232,8 +228,10 @@ collating-element <z-Z-s> from "<U007A><U005A><U0073>"
 collating-element <z-z-S> from "<U007A><U007A><U0053>"
 collating-element <z-z-s> from "<U007A><U007A><U0073>"

+collating-symbol <CAP-CAP>
 collating-symbol <CAP-MIN>
 collating-symbol <MIN-CAP>
+collating-symbol <MIN-MIN>
 collating-symbol <CAP-CAP-CAP>
 collating-symbol <CAP-CAP-MIN>
 collating-symbol <CAP-MIN-CAP>
@@ -244,6 +242,7 @@ collating-symbol <MIN-MIN-CAP>
 collating-symbol <MIN-MIN-MIN>

 reorder-after <MIN>
+<MIN-MIN>
 <MIN-CAP>
 <MIN-MIN-MIN>
 <MIN-MIN-CAP>
@@ -252,42 +251,38 @@ reorder-after <MIN>

 reorder-after <CAP>
 <CAP-MIN>
+<CAP-CAP>
 <CAP-MIN-MIN>
 <CAP-MIN-CAP>
 <CAP-CAP-MIN>
 <CAP-CAP-CAP>

 reorder-after <c>
-<c_or_cs>
 <cs>
 reorder-after <d>
-<d_or_dz>
-<d_or_dzs>
 <dz>
 <dzs>
 reorder-after <g>
-<g_or_gy>
 <gy>
 reorder-after <l>
-<l_or_ly>
 <ly>
 reorder-after <n>
-<n_or_ny>
 <ny>
 reorder-after <o>
 <odouble>
 reorder-after <s>
-<s_or_sz>
 <sz>
 reorder-after <t>
-<t_or_ty>
 <ty>
 reorder-after <u>
 <udouble>
 reorder-after <z>
-<z_or_zs>
 <zs>

+reorder-after <BAS>
+<SINGLE-OR-COMPOUND>
+<COMPOUND>
+
 reorder-after <o>
 <U00F6>    <odouble>;<REU>;<MIN>;IGNORE
 <U0151>    <odouble>;<DAC>;<MIN>;IGNORE
@@ -300,152 +295,157 @@ reorder-after <u>
 <U00DC>    <udouble>;<REU>;<CAP>;IGNORE
 <U0170>    <udouble>;<DAC>;<CAP>;IGNORE

+reorder-after <BAS>
+<ACA>
+<REU>
+<DAC>
+
 reorder-after <U0043>
-<C-S>        <cs>;<BAS>;<CAP>;IGNORE
-<C-s>        <cs>;<BAS>;<CAP-MIN>;IGNORE
-<C-C-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<C-C-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<C-c-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<C-c-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<C-S>        <cs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<C-s>        <cs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<C-C-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<C-C-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<C-c-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<C-c-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0063>
-<c-S>        <cs>;<BAS>;<MIN-CAP>;IGNORE
-<c-s>        <cs>;<BAS>;<MIN>;IGNORE
-<c-C-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<c-C-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<c-c-S>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<c-c-s>        "<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<c-S>        <cs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<c-s>        <cs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<c-C-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<c-C-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<c-c-S>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<c-c-s>        "<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0044>
-<D-Z>        <dz>;<BAS>;<CAP>;IGNORE
-<D-z>        <dz>;<BAS>;<CAP-MIN>;IGNORE
-<D-D-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<D-D-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<D-d-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<D-d-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<D-Z>        <dz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<D-z>        <dz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<D-D-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<D-D-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<D-d-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<D-d-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z>        <dz>;<BAS>;<MIN-CAP>;IGNORE
-<d-z>        <dz>;<BAS>;<MIN>;IGNORE
-<d-D-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<d-D-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<d-d-Z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<d-d-z>        "<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<d-Z>        <dz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<d-z>        <dz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<d-D-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<d-D-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<d-d-Z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<d-d-z>        "<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0044>
-<D-Z-S>        <dzs>;<BAS>;<CAP-CAP-CAP>;IGNORE
-<D-Z-s>        <dzs>;<BAS>;<CAP-CAP-MIN>;IGNORE
-<D-z-S>        <dzs>;<BAS>;<CAP-MIN-CAP>;IGNORE
-<D-z-s>        <dzs>;<BAS>;<CAP-MIN-MIN>;IGNORE
-<D-D-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-D-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-D-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-D-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
-<D-d-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-d-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-d-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-d-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-Z-S>        <dzs>;<COMPOUND>;<CAP-CAP-CAP>;IGNORE
+<D-Z-s>        <dzs>;<COMPOUND>;<CAP-CAP-MIN>;IGNORE
+<D-z-S>        <dzs>;<COMPOUND>;<CAP-MIN-CAP>;IGNORE
+<D-z-s>        <dzs>;<COMPOUND>;<CAP-MIN-MIN>;IGNORE
+<D-D-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-CAP>";IGNORE
+<D-D-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-MIN>";IGNORE
+<D-D-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-CAP>";IGNORE
+<D-D-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-d-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-CAP>";IGNORE
+<D-d-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-MIN>";IGNORE
+<D-d-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-CAP>";IGNORE
+<D-d-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z-S>        <dzs>;<BAS>;<MIN-CAP-CAP>;IGNORE
-<d-Z-s>        <dzs>;<BAS>;<MIN-CAP-MIN>;IGNORE
-<d-z-S>        <dzs>;<BAS>;<MIN-MIN-CAP>;IGNORE
-<d-z-s>        <dzs>;<BAS>;<MIN-MIN-MIN>;IGNORE
-<d-D-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-D-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-D-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-D-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
-<d-d-Z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-d-Z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-d-z-S>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-d-z-s>    "<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-Z-S>        <dzs>;<COMPOUND>;<MIN-CAP-CAP>;IGNORE
+<d-Z-s>        <dzs>;<COMPOUND>;<MIN-CAP-MIN>;IGNORE
+<d-z-S>        <dzs>;<COMPOUND>;<MIN-MIN-CAP>;IGNORE
+<d-z-s>        <dzs>;<COMPOUND>;<MIN-MIN-MIN>;IGNORE
+<d-D-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-CAP>";IGNORE
+<d-D-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-MIN>";IGNORE
+<d-D-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-CAP>";IGNORE
+<d-D-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-d-Z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-CAP>";IGNORE
+<d-d-Z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-MIN>";IGNORE
+<d-d-z-S>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-CAP>";IGNORE
+<d-d-z-s>    "<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-MIN>";IGNORE

 reorder-after <U0047>
-<G-Y>        <gy>;<BAS>;<CAP>;IGNORE
-<G-y>        <gy>;<BAS>;<CAP-MIN>;IGNORE
-<G-G-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<G-G-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<G-g-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<G-g-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<G-Y>        <gy>;<COMPOUND>;<CAP-CAP>;IGNORE
+<G-y>        <gy>;<COMPOUND>;<CAP-MIN>;IGNORE
+<G-G-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<G-G-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<G-g-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<G-g-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0067>
-<g-y>        <gy>;<BAS>;<MIN>;IGNORE
-<g-Y>        <gy>;<BAS>;<MIN-CAP>;IGNORE
-<g-G-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<g-G-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<g-g-Y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<g-g-y>        "<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<g-Y>        <gy>;<COMPOUND>;<MIN-CAP>;IGNORE
+<g-y>        <gy>;<COMPOUND>;<MIN-MIN>;IGNORE
+<g-G-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<g-G-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<g-g-Y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<g-g-y>        "<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U004C>
-<L-Y>        <ly>;<BAS>;<CAP>;IGNORE
-<L-y>        <ly>;<BAS>;<CAP-MIN>;IGNORE
-<L-L-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<L-L-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<L-l-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<L-l-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<L-Y>        <ly>;<COMPOUND>;<CAP-CAP>;IGNORE
+<L-y>        <ly>;<COMPOUND>;<CAP-MIN>;IGNORE
+<L-L-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<L-L-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<L-l-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<L-l-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006C>
-<l-y>        <ly>;<BAS>;<MIN>;IGNORE
-<l-Y>        <ly>;<BAS>;<MIN-CAP>;IGNORE
-<l-L-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<l-L-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<l-l-Y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<l-l-y>        "<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<l-Y>        <ly>;<COMPOUND>;<MIN-CAP>;IGNORE
+<l-y>        <ly>;<COMPOUND>;<MIN-MIN>;IGNORE
+<l-L-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<l-L-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<l-l-Y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<l-l-y>        "<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U004E>
-<N-Y>        <ny>;<BAS>;<CAP>;IGNORE
-<N-y>        <ny>;<BAS>;<CAP-MIN>;IGNORE
-<N-N-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<N-N-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<N-n-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<N-n-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<N-Y>        <ny>;<COMPOUND>;<CAP-CAP>;IGNORE
+<N-y>        <ny>;<COMPOUND>;<CAP-MIN>;IGNORE
+<N-N-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<N-N-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<N-n-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<N-n-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006E>
-<n-y>        <ny>;<BAS>;<MIN>;IGNORE
-<n-Y>        <ny>;<BAS>;<MIN-CAP>;IGNORE
-<n-N-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<n-N-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<n-n-Y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<n-n-y>        "<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<n-Y>        <ny>;<COMPOUND>;<MIN-CAP>;IGNORE
+<n-y>        <ny>;<COMPOUND>;<MIN-MIN>;IGNORE
+<n-N-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<n-N-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<n-n-Y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<n-n-y>        "<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0053>
-<S-Z>        <sz>;<BAS>;<CAP>;IGNORE
-<S-z>        <sz>;<BAS>;<CAP-MIN>;IGNORE
-<S-S-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<S-S-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<S-s-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<S-s-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<S-Z>        <sz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<S-z>        <sz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<S-S-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<S-S-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<S-s-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<S-s-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0073>
-<s-Z>        <sz>;<BAS>;<MIN-CAP>;IGNORE
-<s-z>        <sz>;<BAS>;<MIN>;IGNORE
-<s-S-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<s-S-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<s-s-Z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<s-s-z>        "<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<s-Z>        <sz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<s-z>        <sz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<s-S-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<s-S-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<s-s-Z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<s-s-z>        "<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U0054>
-<T-Y>        <ty>;<BAS>;<CAP>;IGNORE
-<T-y>        <ty>;<BAS>;<CAP-MIN>;IGNORE
-<T-T-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<T-T-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<T-t-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<T-t-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<T-Y>        <ty>;<COMPOUND>;<CAP-CAP>;IGNORE
+<T-y>        <ty>;<COMPOUND>;<CAP-MIN>;IGNORE
+<T-T-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<T-T-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<T-t-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<T-t-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0074>
-<t-Y>        <ty>;<BAS>;<MIN-CAP>;IGNORE
-<t-y>        <ty>;<BAS>;<MIN>;IGNORE
-<t-T-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<t-T-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<t-t-Y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<t-t-y>        "<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<t-Y>        <ty>;<COMPOUND>;<MIN-CAP>;IGNORE
+<t-y>        <ty>;<COMPOUND>;<MIN-MIN>;IGNORE
+<t-T-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<t-T-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<t-t-Y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<t-t-y>        "<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-after <U005A>
-<Z-S>        <zs>;<BAS>;<CAP>;IGNORE
-<Z-s>        <zs>;<BAS>;<CAP-MIN>;IGNORE
-<Z-Z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<Z-Z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<Z-z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<Z-z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<Z-S>        <zs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<Z-s>        <zs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<Z-Z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<Z-Z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<Z-z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<Z-z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U007A>
-<z-S>        <zs>;<BAS>;<MIN-CAP>;IGNORE
-<z-s>        <zs>;<BAS>;<MIN>;IGNORE
-<z-Z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<z-Z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<z-z-S>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<z-z-s>        "<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<z-S>        <zs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<z-s>        <zs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<z-Z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<z-Z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<z-z-S>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<z-z-s>        "<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE

 reorder-end
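
Illustration (not part of the patch): the comment in the hunk above says
that ccs is tokenized as <cs><cs>, while the second-level weights
SINGLE-OR-COMPOUND and COMPOUND keep ccs and cscs from collating as equal.
A minimal Python sketch of that idea follows, restricted to lowercase
c, cs and ccs. The numeric weights and the names ELEMENTS and collation_key
are assumptions made up for this example; only the relative order of the
weights is taken from the patch, and the <c> entry assumes the usual
<BAS>/<MIN> weights.

```python
# Toy model of the first three weight levels, lowercase c / cs / ccs only.
# The numbers are illustrative; only their relative order matters.
C, CS = 1, 2                                   # level 1: <c> before <cs>
BAS, SINGLE_OR_COMPOUND, COMPOUND = 0, 1, 2    # level 2
MIN, MIN_MIN = 0, 1                            # level 3

# collating element -> (level-1 weights, level-2 weights, level-3 weights)
ELEMENTS = {
    "c":   ([C],      [BAS],                          [MIN]),
    "cs":  ([CS],     [COMPOUND],                     [MIN_MIN]),
    "ccs": ([CS, CS], [SINGLE_OR_COMPOUND, COMPOUND], [MIN, MIN_MIN]),
}


def collation_key(text):
    """Tokenize with longest match and return one weight tuple per level."""
    levels = ([], [], [])
    i = 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in ELEMENTS:
                for level, weights in zip(levels, ELEMENTS[chunk]):
                    level.extend(weights)
                i += len(chunk)
                break
        else:
            raise ValueError(f"no collating element at position {i}")
    # Comparing the tuple of levels compares all of level 1 first,
    # then all of level 2, then all of level 3.
    return tuple(tuple(level) for level in levels)


words = sorted(["cscs", "ccs", "cs", "c"], key=collation_key)
```

In this model ccs and cscs have identical first-level weights and differ
only from the second level on, which is the property the comment describes.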



On Thu, Jan 14, 2016 at 1:53 PM, Egmont Koblinger <egmont@gmail.com> wrote:
> Hi,
>
> Friendly ping...
>
> Is there anything I could do to help this patch get accepted?
>
> Regards,
> egmont
>
> On Sun, Nov 15, 2015 at 10:34 PM, Egmont Koblinger <egmont@gmail.com> wrote:
>> Hi,
>>
>> Friendly ping... what's going on with this one?
>>
>> I was the guy making the last few changes to this locale (even an
>> unfortunate regression), and now I also add the most extensive
>> unittesting any locale has (protecting against such regressions now or
>> in the future), so without even looking at this patch I guess you
>> should be quite confident that the patch only makes things better, not
>> worse.
>>
>> Would it help if I broke it down to like 4 or 5 small patches on top
>> of each other, and added the unittests in the last step?
>>
>> thanks,
>> egmont
>>
>>
>> On Mon, Oct 26, 2015 at 4:24 PM, Egmont Koblinger <egmont@gmail.com> wrote:
>>> Hello,
>>>
>>> Friendly ping - could you please take a look at this patch (version 5)?
>>>
>>> Is there anything I can help you with?
>>>
>>> Thanks,
>>> egmont
>>>
>>> On Wed, Oct 14, 2015 at 12:36 AM, Egmont Koblinger <egmont@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Please use the patch I attach now to this mail, not to the previous
>>>> one. Sorry for the confusion!
>>>>
>>>> I checked the previous patch many times, yet I missed something that
>>>> I've just discovered after sending the previous mail. I forgot one of
>>>> the compound letters from the unittest.
>>>>
>>>> The only change from the previous patch is the addition of these few
>>>> more lines in the unittest, so it has an even better coverage. The
>>>> patch to the locale definiton is unchanged.
>>>>
>>>> I've re-run the test and of course it still passes :)
>>>>
>>>> Thanks,
>>>> egmont
>>>>
>>>> On Tue, Oct 13, 2015 at 11:56 PM, Egmont Koblinger <egmont@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Could you please review and apply the attached patch?
>>>>>
>>>>> Recommended commit message body (feel free to edit as you please):
>>>>> -----
>>>>> Fix sorting of long consonants, a regression introduced by #13547. Fix
>>>>> inconsistencies in uppercase vs. lowercase sorting. Fix diacritic
>>>>> ordering. Fix ordering of foreign accents.
>>>>>
>>>>> Add an extensive test file.
>>>>>
>>>>>     [BZ #18934]
>>>>>     * locales/hu_HU: Fix multiple bugs.
>>>>>     * hu_HU.in: New file.
>>>>>     * Makefile (test-input): Add hu_HU.UTF-8.
>>>>> -----
>>>>>
>>>>> I know that generally one patch per issue is a cleaner approach, but
>>>>> this time apologize for an all-in-one: the patches would heavily
>>>>> conflict, and it would be really cumbersome to unittest an incremental
>>>>> series. Instead, think about it as TDD (test driven development): I
>>>>> attach a decent unittest with explanations and pointers to the rules,
>>>>> and a locale definition that implements them.
>>>>>
>>>>> The addressed bugs are:
>>>>>
>>>>> - The fix to bug 13547 was incorrect and introduced a regression. It
>>>>> fixed a corner case, whereas I didn't realize it broke a more typical
>>>>> once. See details over there.
>>>>>
>>>>> - Two minor bugs/inconsistencies wrt. sorting upper/lowercase values,
>>>>> as described in bug 18587.
>>>>>
>>>>> - Someone enabled backwards ordering of diacrits by default (bug
>>>>> 17750), breaking tons of locales including Hungarian. So disable
>>>>> backwards ordering in this locale definition.
>>>>>
>>>>> - Foreign accents should be sorted after the native Hungarian ones, it
>>>>> wasn't the case so far.
>>>>>
>>>>> Plus, a unittest is added which is far more extensive than any other
>>>>> locale has. It includes all the examples from the official rules of
>>>>> Hungarian orthography's corresponding sections, as well as thorough
>>>>> testing of all corner cases I could think of, created by me; and
>>>>> comments all around.
>>>>>
>>>>> In addition to fixing a(n unfortunately relatively unsignificant)
>>>>> locale, I hope that this unittest file will encourage other locale
>>>>> maintainers to create similarly extensive tests, increasing the
>>>>> quality of other locales in the long run and preventing regressions
>>>>> (such as the backward diacritics ordering) from sneaking in.
>>>>>
>>>>> Thanks a lot,
>>>>> egmont

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2015-10-13 22:37 ` Egmont Koblinger
  2015-10-26 15:25   ` Egmont Koblinger
@ 2016-04-21  6:13   ` Mike Frysinger
  2016-04-21 11:15     ` Egmont Koblinger
  2016-06-29 21:01     ` Egmont Koblinger
  1 sibling, 2 replies; 33+ messages in thread
From: Mike Frysinger @ 2016-04-21  6:13 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: libc-locales

[-- Attachment #1: Type: text/plain, Size: 2768 bytes --]

On 14 Oct 2015 00:36, Egmont Koblinger wrote:
> I checked the previous patch many times, yet I missed something that
> I've just discovered after sending the previous mail. I forgot one of
> the compound letters from the unittest.
> 
> The only change from the previous patch is the addition of these few
> more lines in the unittest, so it has an even better coverage. The
> patch to the locale definiton is unchanged.
> 
> I've re-run the test and of course it still passes :)

i'm inclined to merge this since you've added a test ;).  i've started
looking at LC_COLLATE in general, but it's taking a bit to internalize.
CLDR provides guidance here, but it uses the Unicode rule syntax which
is even more packed than the POSIX format.  i don't suppose you can read
it ? :)  i'll have to find/write a parser/converter for this ...

here's what CLDR says about hu:
	&C<cs<<<Cs<<<CS
	&D<dz<<<Dz<<<DZ
	&DZ<dzs<<<Dzs<<<DZS
	&G<gy<<<Gy<<<GY
	&L<ly<<<Ly<<<LY
	&N<ny<<<Ny<<<NY
	&S<sz<<<Sz<<<SZ
	&T<ty<<<Ty<<<TY
	&Z<zs<<<Zs<<<ZS
	&O<ö<<<Ö<<ő<<<Ő
	&U<ü<<<Ü<<ű<<<Ű
	&cs<<<ccs/cs
	&Cs<<<Ccs/cs
	&CS<<<CCS/CS
	&dz<<<ddz/dz
	&Dz<<<Ddz/dz
	&DZ<<<DDZ/DZ
	&dzs<<<ddzs/dzs
	&Dzs<<<Ddzs/dzs
	&DZS<<<DDZS/DZS
	&gy<<<ggy/gy
	&Gy<<<Ggy/gy
	&GY<<<GGY/GY
	&ly<<<lly/ly
	&Ly<<<Lly/ly
	&LY<<<LLY/LY
	&ny<<<nny/ny
	&Ny<<<Nny/ny
	&NY<<<NNY/NY
	&sz<<<ssz/sz
	&Sz<<<Ssz/sz
	&SZ<<<SSZ/SZ
	&ty<<<tty/ty
	&Ty<<<Tty/ty
	&TY<<<TTY/TY
	&zs<<<zzs/zs
	&Zs<<<Zzs/zs
	&ZS<<<ZZS/ZS

but i think there's a proposal to replace it with:
	&C<cs<<<cS<<<Cs<<<CS
	&D<dz<<<dZ<<<Dz<<<DZ
	&DZ<dzs<<<dzS<<<dZs<<<dZS<<<Dzs<<<DzS<<<DZs<<<DZS
	&G<gy<<<gY<<<Gy<<<GY
	&L<ly<<<lY<<<Ly<<<LY
	&N<ny<<<nY<<<Ny<<<NY
	&S<sz<<<sZ<<<Sz<<<SZ
	&T<ty<<<tY<<<Ty<<<TY
	&Z<zs<<<zS<<<Zs<<<ZS
	&O<ö<<<Ö<<ő<<<Ő
	&U<ü<<<Ü<<ű<<<Ű
	&cs<<<ccs/cs<<<ccS/cS<<<cCs/Cs<<<cCS/CS
	&Cs<<<Ccs/cs<<<CcS/cS<<<CCs/Cs
	&CS<<<CCS/CS
	&dz<<<ddz/dz<<<ddZ/dZ<<<dDz/Dz<<<dDZ/DZ
	&Dz<<<Ddz/dz<<<DdZ/dZ<<<DDz/Dz
	&DZ<<<DDZ/DZ
	&dzs<<<ddzs/dzs<<<ddzS/dzS<<<ddZs/dZs<<<ddZS/dZS<<<dDzs/Dzs<<<dDzS/DzS<<<dDZs/DZs
	<<<dDZS/DZS
	&Dzs<<<Ddzs/dzs<<<DdzS/dzS<<<DdZs/dZs<<<DdZS/dZS<<<DDzs/Dzs<<<DDzS/DzS<<<DDZs/DZs
	&DZS<<<DDZS/DZS
	&gy<<<ggy/gy<<<ggY/gY<<<gGy/Gy<<<gGY/GY
	&Gy<<<Ggy/gy<<<GgY/gY<<<GGy/Gy
	&GY<<<GGY/GY
	&ly<<<lly/ly<<<llY/lY<<<lLy/Ly<<<lLY/LY
	&Ly<<<Lly/ly<<<LlY/lY<<<LLy/Ly
	&LY<<<LLY/LY
	&ny<<<nny/ny<<<nnY/nY<<<nNy/Ny<<<nNY/NY
	&Ny<<<Nny/ny<<<NnY/nY<<<NNy/Ny
	&NY<<<NNY/NY
	&sz<<<ssz/sz<<<ssZ/sZ<<<sSz/Sz<<<sSZ/SZ
	&Sz<<<Ssz/sz<<<SsZ/sZ<<<SSz/Sz
	&SZ<<<SSZ/SZ
	&ty<<<tty/ty<<<ttY/tY<<<tTy/Ty<<<tTY/TY
	&Ty<<<Tty/ty<<<TtY/tY<<<TTy/Ty
	&TY<<<TTY/TY
	&zs<<<zzs/zs<<<zzS/zS<<<zZs/Zs<<<zZS/ZS
	&Zs<<<Zzs/zs<<<ZzS/zS<<<ZZs/Zs
	&ZS<<<ZZS/ZS
-mike
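
Illustration (not part of the original message): a minimal sketch of how
the rule lines quoted above could be split into their parts, as a first
step towards the parser/converter mentioned. It assumes only the operators
that appear in the quoted rules: & (reset), <, << and <<< (primary,
secondary and tertiary difference) and / (expansion). The function name
parse_rule is hypothetical, and continuation lines that start with an
operator instead of & are not handled.

```python
import re

# Operator -> strength of the difference it introduces.
STRENGTH = {"<": "primary", "<<": "secondary", "<<<": "tertiary"}


def parse_rule(rule):
    """Split one rule line into (reset, [(strength, text, expansion), ...])."""
    rule = rule.strip()
    if not rule.startswith("&"):
        raise ValueError("rule does not start with a reset (&)")
    # Everything between '&' and the first operator is the reset point.
    reset, sep, tail = rule[1:].partition("<")
    relations = []
    for op, body in re.findall(r"(<{1,3})([^<]+)", sep + tail):
        text, _, expansion = body.partition("/")
        relations.append((STRENGTH[op], text, expansion or None))
    return reset, relations


parsed = parse_rule("&cs<<<ccs/cs")
```

Here parsed holds the reset point "cs" and a single tertiary relation for
"ccs" with the expansion "cs".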

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2016-04-21  6:13   ` Mike Frysinger
@ 2016-04-21 11:15     ` Egmont Koblinger
  2016-04-21 15:18       ` Mike Frysinger
  2016-06-29 21:01     ` Egmont Koblinger
  1 sibling, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2016-04-21 11:15 UTC (permalink / raw)
  To: libc-locales

Hi,

You're right: I cannot read the Unicode rule syntax format :)  Its
compactness makes me worry whether it's able to express all the
subtleties that glibc's format can.

Even if I could read this syntax, I wouldn't dare to review the
definitions themselves without being able to verify against the
unittests.

Do you happen to have a link to where the proposal for the new
definitions was made? I'd like to ask them to validate the rules
against my test file.

That being said, I hope fixing/verifying CLDR isn't a blocker for
applying the fix to glibc.

Thanks a lot,
egmont
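
Illustration (not part of the original message): validating a set of
collation rules against a test file that is already in the expected order
can be sketched as shuffling the lines, sorting them again with the
collation under test, and comparing the result with the file order. The
names first_misordered and hungarian_key are hypothetical, and this is
only a rough stand-in for whatever harness is actually used; the locale
name is an assumption and has to be installed for hungarian_key to work.

```python
import locale
import random


def first_misordered(lines, key):
    """Return the first index where re-sorting disagrees with the file order."""
    shuffled = list(lines)
    random.Random(0).shuffle(shuffled)      # fixed seed keeps runs repeatable
    resorted = sorted(shuffled, key=key)
    for index, (expected, actual) in enumerate(zip(lines, resorted)):
        if expected != actual:
            return index
    return None                             # every line came back in place


def hungarian_key(locale_name="hu_HU.UTF-8"):
    """Return a sort key based on the C library's collation for the locale."""
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return locale.strxfrm


# Plain code point order stands in for a real collation here.
result = first_misordered(["a", "b", "c"], key=str)
```

With the locale installed, first_misordered(lines, hungarian_key()) would
check the lines of a test file against the system's collation.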

On Thu, Apr 21, 2016 at 8:13 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On 14 Oct 2015 00:36, Egmont Koblinger wrote:
>> I checked the previous patch many times, yet I missed something that
>> I've just discovered after sending the previous mail. I forgot one of
>> the compound letters from the unittest.
>>
>> The only change from the previous patch is the addition of these few
>> more lines in the unittest, so it has an even better coverage. The
>> patch to the locale definition is unchanged.
>>
>> I've re-run the test and of course it still passes :)
>
> i'm inclined to merge this since you've added a test ;).  i've started
> looking at LC_COLLATE in general, but it's taking a bit to internalize.
> CLDR provides guidance here, but it uses the Unicode rule syntax which
> is even more packed than the POSIX format.  i don't suppose you can read
> it ? :)  i'll have to find/write a parser/converter for this ...
>
> here's what CLDR says about hu:
>         &C<cs<<<Cs<<<CS
>         &D<dz<<<Dz<<<DZ
>         &DZ<dzs<<<Dzs<<<DZS
>         &G<gy<<<Gy<<<GY
>         &L<ly<<<Ly<<<LY
>         &N<ny<<<Ny<<<NY
>         &S<sz<<<Sz<<<SZ
>         &T<ty<<<Ty<<<TY
>         &Z<zs<<<Zs<<<ZS
>         &O<ö<<<Ö<<ő<<<Ő
>         &U<ü<<<Ü<<ű<<<Ű
>         &cs<<<ccs/cs
>         &Cs<<<Ccs/cs
>         &CS<<<CCS/CS
>         &dz<<<ddz/dz
>         &Dz<<<Ddz/dz
>         &DZ<<<DDZ/DZ
>         &dzs<<<ddzs/dzs
>         &Dzs<<<Ddzs/dzs
>         &DZS<<<DDZS/DZS
>         &gy<<<ggy/gy
>         &Gy<<<Ggy/gy
>         &GY<<<GGY/GY
>         &ly<<<lly/ly
>         &Ly<<<Lly/ly
>         &LY<<<LLY/LY
>         &ny<<<nny/ny
>         &Ny<<<Nny/ny
>         &NY<<<NNY/NY
>         &sz<<<ssz/sz
>         &Sz<<<Ssz/sz
>         &SZ<<<SSZ/SZ
>         &ty<<<tty/ty
>         &Ty<<<Tty/ty
>         &TY<<<TTY/TY
>         &zs<<<zzs/zs
>         &Zs<<<Zzs/zs
>         &ZS<<<ZZS/ZS
>
> but i think there's a proposal to replace it with:
>         &C<cs<<<cS<<<Cs<<<CS
>         &D<dz<<<dZ<<<Dz<<<DZ
>         &DZ<dzs<<<dzS<<<dZs<<<dZS<<<Dzs<<<DzS<<<DZs<<<DZS
>         &G<gy<<<gY<<<Gy<<<GY
>         &L<ly<<<lY<<<Ly<<<LY
>         &N<ny<<<nY<<<Ny<<<NY
>         &S<sz<<<sZ<<<Sz<<<SZ
>         &T<ty<<<tY<<<Ty<<<TY
>         &Z<zs<<<zS<<<Zs<<<ZS
>         &O<ö<<<Ö<<ő<<<Ő
>         &U<ü<<<Ü<<ű<<<Ű
>         &cs<<<ccs/cs<<<ccS/cS<<<cCs/Cs<<<cCS/CS
>         &Cs<<<Ccs/cs<<<CcS/cS<<<CCs/Cs
>         &CS<<<CCS/CS
>         &dz<<<ddz/dz<<<ddZ/dZ<<<dDz/Dz<<<dDZ/DZ
>         &Dz<<<Ddz/dz<<<DdZ/dZ<<<DDz/Dz
>         &DZ<<<DDZ/DZ
>         &dzs<<<ddzs/dzs<<<ddzS/dzS<<<ddZs/dZs<<<ddZS/dZS<<<dDzs/Dzs<<<dDzS/DzS<<<dDZs/DZs
>         <<<dDZS/DZS
>         &Dzs<<<Ddzs/dzs<<<DdzS/dzS<<<DdZs/dZs<<<DdZS/dZS<<<DDzs/Dzs<<<DDzS/DzS<<<DDZs/DZs
>         &DZS<<<DDZS/DZS
>         &gy<<<ggy/gy<<<ggY/gY<<<gGy/Gy<<<gGY/GY
>         &Gy<<<Ggy/gy<<<GgY/gY<<<GGy/Gy
>         &GY<<<GGY/GY
>         &ly<<<lly/ly<<<llY/lY<<<lLy/Ly<<<lLY/LY
>         &Ly<<<Lly/ly<<<LlY/lY<<<LLy/Ly
>         &LY<<<LLY/LY
>         &ny<<<nny/ny<<<nnY/nY<<<nNy/Ny<<<nNY/NY
>         &Ny<<<Nny/ny<<<NnY/nY<<<NNy/Ny
>         &NY<<<NNY/NY
>         &sz<<<ssz/sz<<<ssZ/sZ<<<sSz/Sz<<<sSZ/SZ
>         &Sz<<<Ssz/sz<<<SsZ/sZ<<<SSz/Sz
>         &SZ<<<SSZ/SZ
>         &ty<<<tty/ty<<<ttY/tY<<<tTy/Ty<<<tTY/TY
>         &Ty<<<Tty/ty<<<TtY/tY<<<TTy/Ty
>         &TY<<<TTY/TY
>         &zs<<<zzs/zs<<<zzS/zS<<<zZs/Zs<<<zZS/ZS
>         &Zs<<<Zzs/zs<<<ZzS/zS<<<ZZs/Zs
>         &ZS<<<ZZS/ZS
> -mike
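
For readers who, like me, find the rule syntax quoted above hard to scan,
here is a rough sketch that splits one such rule into its parts. It covers
only the subset that appears in this thread: "&" names the reset point,
"<", "<<" and "<<<" introduce a primary, secondary or tertiary difference,
and a "/" suffix is, as far as I understand the syntax, an expansion (so
"&cs<<<ccs/cs" says that "ccs" is treated like "cs" followed by "cs").
Quoting, escapes and any other operators are not handled, and parse_rule is
a made-up name, not part of any CLDR or ICU tool.

```python
import re

# Longest operator first so that "<<<" is not split into "<<" + "<".
_RELATION = re.compile(r"(<<<|<<|<)")
_LEVELS = {"<": "primary", "<<": "secondary", "<<<": "tertiary"}


def parse_rule(rule):
    """Split a rule such as '&cs<<<ccs/cs' into (reset_point, relations).

    Each relation is a tuple (level, text, expansion), where expansion is
    None when the item has no '/...' suffix.
    """
    rule = rule.strip()
    if not rule.startswith("&"):
        raise ValueError("rule must start with '&'")
    parts = _RELATION.split(rule[1:])
    reset_point = parts[0]
    relations = []
    # After the reset point, parts alternates: operator, item, operator, item...
    for op, item in zip(parts[1::2], parts[2::2]):
        text, _, expansion = item.partition("/")
        relations.append((_LEVELS[op], text, expansion or None))
    return reset_point, relations
```

For example, parse_rule("&O<ö<<<Ö<<ő<<<Ő") gives the reset point "O" and
four relations: ö primary, Ö tertiary, ő secondary, Ő tertiary.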

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2016-04-21 11:15     ` Egmont Koblinger
@ 2016-04-21 15:18       ` Mike Frysinger
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Frysinger @ 2016-04-21 15:18 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: libc-locales

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

On 21 Apr 2016 13:14, Egmont Koblinger wrote:
> You're right: I cannot read the Unicode rule syntax format :)  Its
> compactness makes me worry whether it's able to express all the
> subtleties that glibc's format can.
> 
> Even if I could read this syntax, I wouldn't dare to review the
> definitions themselves without being able to verify against the
> unittests.
> 
> Do you happen to have a link to where the proposal for the new
> definitions was made? I'd like to ask its authors to validate the
> rules against my test file.

it's not clear where that is coming from.  i searched but couldn't
find a ticket for it.

> That being said, I hope fixing/verifying CLDR isn't a blocker for
> applying the fix to glibc.

no, we're not holding up collation changes at this time.  we're moving
in that direction though, so it'll need to be fixed in cldr eventually.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2016-04-21  6:13   ` Mike Frysinger
  2016-04-21 11:15     ` Egmont Koblinger
@ 2016-06-29 21:01     ` Egmont Koblinger
  2017-01-31 23:17       ` Egmont Koblinger
  1 sibling, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2016-06-29 21:01 UTC (permalink / raw)
  To: libc-locales

Hi,

On Thu, Apr 21, 2016 at 8:13 AM, Mike Frysinger <vapier@gentoo.org> wrote:

> i'm inclined to merge this since you've added a test ;).

Any news on this one?


thanks,
egmont

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2016-06-29 21:01     ` Egmont Koblinger
@ 2017-01-31 23:17       ` Egmont Koblinger
  2017-02-01  0:48         ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-01-31 23:17 UTC (permalink / raw)
  To: libc-locales

[-- Attachment #1: Type: text/plain, Size: 774 bytes --]

Hi guys,

Could we please return to this old topic and get it addressed once and for all?

It's approximately the 8th (!!!) time I'm sending this out and/or
pinging you. The patch has been there for more than a year. Not a
single complaint, concern, question etc. was raised.

The patch not only addresses multiple bugs, but also backs the fixes up
with by far the most exhaustive, clean, modular collation unittest any
locale has.

Mike, last April you said "i'm inclined to merge this"... but then
nothing happened, and I wonder why.

Please guys, either commit this patch, or give me clear and
actionable feedback about your concerns. Let me know if there's
anything else you need, but pretty please, let's not delay this issue
any further.


Thank you,

egmont

[-- Attachment #2: glibc-18934-hu-collate-v5.patch --]
[-- Type: text/x-patch, Size: 35189 bytes --]

diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 5bcb822..0a37afa 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,10 @@
+2015-09-09  Egmont Koblinger  <egmont@gmail.com>
+
+	[BZ #18934]
+	* locales/hu_HU: Fix multiple collate bugs.
+	* hu_HU.in: New file.
+	* Makefile (test-input): Add hu_HU.UTF-8.
+
 2015-09-03  Egmont Koblinger  <egmont@gmail.com>
 
 	[BZ #18918]
diff --git a/localedata/Makefile b/localedata/Makefile
index ebf6ac9..637a10f 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
 	     tst-ctype tst-langinfo tst-langinfo-static tst-numeric
 test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
 	      hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
-	      si_LK.UTF-8 uk_UA.UTF-8
+	      si_LK.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 test-input-data = $(addsuffix .in, $(basename $(test-input)))
 test-output := $(foreach s, .out .xout, \
 			 $(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
 	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
 	   nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
 	   zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
+	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 include ../gen-locales.mk
 endif
 
diff --git a/localedata/hu_HU.in b/localedata/hu_HU.in
new file mode 100644
index 0000000..4eb8eee
--- /dev/null
+++ b/localedata/hu_HU.in
@@ -0,0 +1,560 @@
+AkH-14-a1 acél          ; These tests are from:
+AkH-14-a1 cukor         ;
+AkH-14-a1 csók          ; A magyar helyesírás szabályai, 12. kiadás
+AkH-14-a1 gép           ; [The Rules of Hungarian Orthography, 12th edition]
+AkH-14-a1 hideg         ;
+AkH-14-a1 kettő         ; often referred to as akadémiai helyesírás (AkH.) [academic orthography]
+AkH-14-a1 Nagy          ;
+AkH-14-a1 nyúl          ; http://helyesiras.mta.hu/helyesiras/default/akh12
+AkH-14-a1 olasz         ;
+AkH-14-a1 öröm          ; Alphabetical ordering described in #14-16.
+AkH-14-a1 remény
+AkH-14-a1 sokáig        ; #14-a1: Sort based on first letter.
+AkH-14-a1 szabad
+AkH-14-a1 Tamás
+AkH-14-a1 vásárol
+AkH-14-a2 jácint        ; #14-a2: If no other difference, lowercase initial precedes uppercase.
+AkH-14-a2 Jácint
+AkH-14-a2 opera
+AkH-14-a2 Opera
+AkH-14-a2 szűcs
+AkH-14-a2 Szűcs
+AkH-14-a2 viola
+AkH-14-a2 Viola
+AkH-14-a3 cudar         ; #14-a3: Compound letters (cs, dz, dzs, gy, ly, ny, sz, ty, zs)
+AkH-14-a3 cukor         ; are sorted separately, after their first letter:
+AkH-14-a3 cuppant       ; a b c cs d dz dzs e f g gy h ... l ly m n ny o ... s sz t ty u ... z zs
+AkH-14-a3 csalit
+AkH-14-a3 csata
+AkH-14-a3 Csepel
+AkH-14-a3 Zoltán
+AkH-14-a3 zongora
+AkH-14-a3 zúdul
+AkH-14-a3 zsalu
+AkH-14-a3 zseni
+AkH-14-a3 Zsigmond
+AkH-14-b1 lom           ; #14-b1: The first difference matters.
+AkH-14-b1 lomb
+AkH-14-b1 lombik
+AkH-14-b1 Lontay
+AkH-14-b1 lovagol
+AkH-14-b1 pirinkó
+AkH-14-b1 pirinyó
+AkH-14-b1 pirít
+AkH-14-b1 pirkad
+AkH-14-b1 Piroska
+AkH-14-b1 tükör
+AkH-14-b1 Tünde
+AkH-14-b1 tünemény
+AkH-14-b1 tüntet
+AkH-14-b1 tüzér
+AkH-14-b2 kas           ; #14-b2: If a compound letter is pronounced long, only the first letter
+AkH-14-b2 Kasmír        ; is duplicated in writing: <cs><cs> becomes "ccs", <dzs><dzs> is "ddzs" etc.
+AkH-14-b2 Kassák        ; (unless it's at the boundary of a compound word when it's written out twice).
+AkH-14-b2 kastély       ; Sort according to the actual tokens, not the shorthand written form.
+AkH-14-b2 kasza         ; <k><a><sz><a>
+AkH-14-b2 kaszinó       ; <k><a><sz><i><n><ó>
+AkH-14-b2 kassza        ; <k><a><sz><sz><a>
+AkH-14-b2 kaszt         ; <k><a><sz><t>
+AkH-14-b2 mennek
+AkH-14-b2 mennének
+AkH-14-b2 menü
+AkH-14-b2 menza
+AkH-14-b2 meny          ; <m><e><ny>
+AkH-14-b2 Menyhért      ; <M><e><ny><h><é><r><t>
+AkH-14-b2 mennybolt     ; <m><e><ny><ny><b><o><l><t>
+AkH-14-b2 mennyi        ; <m><e><ny><ny><i>
+AkH-14-b2 nagy          ; <n><a><gy>
+AkH-14-b2 naggyá        ; <n><a><gy><gy><á>
+AkH-14-b2 nagygyakorlat ; <n><a><gy><gy><a><k><o><r><l><a><t> (compound word: nagy+gyakorlat)
+AkH-14-b2 naggyal       ; <n><a><gy><gy><a><l>
+AkH-14-b2 nagyít        ; <n><a><gy><í><t>
+AkH-14-b2 nagyobb
+AkH-14-b2 nagyol
+AkH-14-b2 nagyoll
+AkH-14-c1 ír            ; #14-c1: Vowels collate equally in pairs: a-á, e-é, i-í, o-ó, ö-ő, u-ú, ü-ű.
+AkH-14-c1 Irak
+AkH-14-c1 iram
+AkH-14-c1 Irán
+AkH-14-c1 írandó
+AkH-14-c1 iránt
+AkH-14-c1 író
+AkH-14-c1 iroda
+AkH-14-c1 irónia
+AkH-14-c2 Eger          ; #14-c2: Short vowel (unaccented, or with diaeresis) comes first if that's the only difference.
+AkH-14-c2 egér
+AkH-14-c2 egyfelé
+AkH-14-c2 egyféle
+AkH-14-c2 elöl
+AkH-14-c2 elől
+AkH-14-c2 kerek
+AkH-14-c2 kerék
+AkH-14-c2 keres
+AkH-14-c2 kérés
+AkH-14-c2 koros
+AkH-14-c2 kóros
+AkH-14-c2 szel
+AkH-14-c2 szél
+AkH-14-c2 szeles
+AkH-14-c2 széles
+AkH-14-c2 szüret
+AkH-14-c2 szűret
+AkH-14-d1 kis részben   ; #14-d1: Spaces, hyphens are ignored.
+AkH-14-d1 kissé
+AkH-14-d1 Kiss Ernő
+AkH-14-d1 kis sorozat
+AkH-14-d1 kissorozat-gyártás
+AkH-14-d1 kis számban
+AkH-14-d1 kistányér
+AkH-14-d1 kis virág
+AkH-14-d1 márvány
+AkH-14-d1 márványkő
+AkH-14-d1 márvány sírkő
+AkH-14-d1 Márvány-tenger
+AkH-14-d1 márványtömb
+AkH-14-d1 Márvány Zsolt
+AkH-14-d1 másféle
+AkH-14-d1 másol
+AkH-14-d1 tiszafa
+AkH-14-d1 Tiszahát
+AkH-14-d1 Tisza Kálmán
+AkH-14-d1 Tisza menti
+AkH-14-d1 Tiszántúl
+AkH-14-d1 Tisza-part
+AkH-14-d1 tiszavirág
+AkH-14-d1 tiszt
+AkH-15 cérna            ; #15: Foreign accents are ignored, unless they're the only difference,
+AkH-15 Černý            ; in which case they are sorted after the Hungarian ones (in unspecified order).
+AkH-15 Champagne
+AkH-15 Cholnoky
+AkH-15 címez
+AkH-15 cukor
+AkH-15 Czuczor
+AkH-15 csapat
+AkH-15 Gaal
+AkH-15 galamb
+AkH-15 Gärtner
+AkH-15 gáz
+AkH-15 geodézia
+AkH-15 Georges
+AkH-15 góc
+AkH-15 Goethe
+AkH-15 moshat
+AkH-15 mosna
+AkH-15 Mošna
+AkH-15 mosópor
+AkH-15 Møsstrand
+AkH-15 mostan
+AkH-15 munka
+AkH-15 Muñoz
+alphabet a              ; These tests were created by egmont@gmail.com.
+alphabet á
+alphabet aa             ; a = á unless that's the only difference in which case a < á.
+alphabet aá             ; (Same for e = é, i = í, o = ó, ö = ő, u = ú, ü = ű below.)
+alphabet áa             ; Differences in accents matter from left to right.
+alphabet áá
+alphabet áp
+alphabet aq
+alphabet b
+alphabet c
+alphabet cz             ; <c><z>
+alphabet cs             ; <cs>        -- or rarely <c><s>, can't tell for sure, assume <cs>.
+alphabet csc            ; <cs><c>
+alphabet ccs            ; <cs><cs>    -- or rarely <c><cs>, can't tell for sure, assume <cs><cs>.
+alphabet cscs           ; <cs><cs>    -- Make sure ccs and cscs don't collate as equal, see bug 13547.
+alphabet ccsa           ; <cs><cs><a>
+alphabet cscsa          ; <cs><cs><a> -- (These comments also apply to all other compound letters below.)
+alphabet csd            ; <cs><d>
+alphabet d
+alphabet dz             ; <dz>
+alphabet dzd            ; <dz><d>
+alphabet ddz            ; <dz><dz>
+alphabet dzdz           ; <dz><dz>
+alphabet ddza           ; <dz><dz><a>
+alphabet dzdza          ; <dz><dz><a>
+alphabet dzdzs          ; <dz><dzs>
+alphabet dze            ; <dz><e>
+alphabet dzz            ; <dz><z>
+alphabet dzs            ; <dzs>
+alphabet dzsdz          ; <dzs><dz>
+alphabet ddzs           ; <dzs><dzs>
+alphabet dzsdzs         ; <dzs><dzs>
+alphabet ddzsa          ; <dzs><dzs><a>
+alphabet dzsdzsa        ; <dzs><dzs><a>
+alphabet dzse           ; <dzs><e>
+alphabet e
+alphabet é
+alphabet ee
+alphabet eé
+alphabet ée
+alphabet éé
+alphabet ép
+alphabet eq
+alphabet f
+alphabet g
+alphabet gz             ; <g><z>
+alphabet gy             ; <gy>
+alphabet gyg            ; <gy><g>
+alphabet ggy            ; <gy><gy>
+alphabet gygy           ; <gy><gy>
+alphabet ggya           ; <gy><gy><a>
+alphabet gygya          ; <gy><gy><a>
+alphabet gyh            ; <gy><h>
+alphabet h
+alphabet i
+alphabet í
+alphabet ii
+alphabet ií
+alphabet íi
+alphabet íí
+alphabet íp
+alphabet iq
+alphabet j
+alphabet k
+alphabet l
+alphabet lz             ; <l><z>
+alphabet ly             ; <ly>
+alphabet lyl            ; <ly><l>
+alphabet lly            ; <ly><ly>
+alphabet lyly           ; <ly><ly>
+alphabet llya           ; <ly><ly><a>
+alphabet lylya          ; <ly><ly><a>
+alphabet lym            ; <ly><m>
+alphabet m
+alphabet n
+alphabet nz             ; <n><z>
+alphabet ny             ; <ny>
+alphabet nyn            ; <ny><n>
+alphabet nny            ; <ny><ny>
+alphabet nyny           ; <ny><ny>
+alphabet nnya           ; <ny><ny><a>
+alphabet nynya          ; <ny><ny><a>
+alphabet nyo            ; <ny><o>
+alphabet o
+alphabet ó
+alphabet oo
+alphabet oó
+alphabet óo
+alphabet óó
+alphabet óp
+alphabet oq
+alphabet ö              ; ö = ő (unless that's the only difference), but these come strictly after o and ó.
+alphabet ő
+alphabet öö
+alphabet öő
+alphabet őö
+alphabet őő
+alphabet őp
+alphabet öq
+alphabet p
+alphabet q
+alphabet r
+alphabet s
+alphabet sz             ; <sz>
+alphabet szs            ; <sz><s>
+alphabet ssz            ; <sz><sz>
+alphabet szsz           ; <sz><sz>
+alphabet ssza           ; <sz><sz><a>
+alphabet szsza          ; <sz><sz><a>
+alphabet szt            ; <sz><t>
+alphabet t
+alphabet tz             ; <t><z>
+alphabet ty             ; <ty>
+alphabet tyt            ; <ty><t>
+alphabet tty            ; <ty><ty>
+alphabet tyty           ; <ty><ty>
+alphabet ttya           ; <ty><ty><a>
+alphabet tytya          ; <ty><ty><a>
+alphabet tyu            ; <ty><u>
+alphabet u
+alphabet ú
+alphabet úp
+alphabet uq
+alphabet uu
+alphabet uú
+alphabet úu
+alphabet úú
+alphabet ü              ; ü = ű (unless that's the only difference), but these come strictly after u and ú.
+alphabet ű
+alphabet űp
+alphabet üq
+alphabet üü
+alphabet üű
+alphabet űü
+alphabet űű
+alphabet v
+alphabet w
+alphabet x
+alphabet y
+alphabet z
+alphabet zz             ; <z><z>
+alphabet zs             ; <zs>
+alphabet zsz            ; <zs><z>
+alphabet zzs            ; <zs><zs>
+alphabet zszs           ; <zs><zs>
+alphabet zzsa           ; <zs><zs><a>
+alphabet zszsa          ; <zs><zs><a>
+case a                  ; #14-a2 specifies that if the same word appears in lowercase as well as with
+case A                  ; uppercase initial, the lowercase one is to be sorted first.
+case á                  ; Extend this to all other weird combinations of upper- and lowercases.
+case Á
+case cs                 ; <cs>
+case cS
+case Cs
+case CS
+case ccs                ; <cs><cs>
+case ccS
+case cCs
+case cCS
+case Ccs
+case CcS
+case CCs
+case CCS
+case dz                 ; <dz>
+case dZ
+case Dz
+case DZ
+case ddz                ; <dz><dz>
+case ddZ
+case dDz
+case dDZ
+case Ddz
+case DdZ
+case DDz
+case DDZ
+case dzs                ; <dzs>
+case dzS
+case dZs
+case dZS
+case Dzs
+case DzS
+case DZs
+case DZS
+case ddzs               ; <dzs><dzs>
+case ddzS
+case ddZs
+case ddZS
+case dDzs
+case dDzS
+case dDZs
+case dDZS
+case Ddzs
+case DdzS
+case DdZs
+case DdZS
+case DDzs
+case DDzS
+case DDZs
+case DDZS
+case e
+case E
+case é
+case É
+case gy                 ; <gy>
+case gY
+case Gy
+case GY
+case ggy                ; <gy><gy>
+case ggY
+case gGy
+case gGY
+case Ggy
+case GgY
+case GGy
+case GGY
+case i
+case I
+case í
+case Í
+case ly                 ; <ly>
+case lY
+case Ly
+case LY
+case lly                ; <ly><ly>
+case llY
+case lLy
+case lLY
+case Lly
+case LlY
+case LLy
+case LLY
+case ny                 ; <ny>
+case nY
+case Ny
+case NY
+case nny                ; <ny><ny>
+case nnY
+case nNy
+case nNY
+case Nny
+case NnY
+case NNy
+case NNY
+case o
+case O
+case ó
+case Ó
+case ö
+case Ö
+case ő
+case Ő
+case sz                 ; <sz>
+case sZ
+case Sz
+case SZ
+case ssz                ; <sz><sz>
+case ssZ
+case sSz
+case sSZ
+case Ssz
+case SsZ
+case SSz
+case SSZ
+case ty                 ; <ty>
+case tY
+case Ty
+case TY
+case tty                ; <ty><ty>
+case ttY
+case tTy
+case tTY
+case Tty
+case TtY
+case TTy
+case TTY
+case u
+case U
+case ú
+case Ú
+case ü
+case Ü
+case ű
+case Ű
+case zs                 ; <zs>
+case zS
+case Zs
+case ZS
+case zzs                ; <zs><zs>
+case zzS
+case zZs
+case zZS
+case Zzs
+case ZzS
+case ZZs
+case ZZS
+foreign-a1 á            ; More thorough tests for foreign accents (#15).
+foreign-a1 à
+foreign-a1 àp
+foreign-a1 áq
+foreign-a2 á
+foreign-a2 â
+foreign-a2 âp
+foreign-a2 áq
+foreign-a3 á
+foreign-a3 ã
+foreign-a3 ãp
+foreign-a3 áq
+foreign-a4 á
+foreign-a4 ä
+foreign-a4 äp
+foreign-a4 áq
+foreign-a5 á
+foreign-a5 å
+foreign-a5 åp
+foreign-a5 áq
+foreign-a6 á
+foreign-a6 ă
+foreign-a6 ăp
+foreign-a6 áq
+foreign-c1 c
+foreign-c1 ç
+foreign-c1 çp
+foreign-c1 cq
+foreign-d1 d
+foreign-d1 đ
+foreign-d1 đp
+foreign-d1 dq
+foreign-e1 é
+foreign-e1 è
+foreign-e1 èp
+foreign-e1 éq
+foreign-e2 é
+foreign-e2 ê
+foreign-e2 êp
+foreign-e2 éq
+foreign-e3 é
+foreign-e3 ë
+foreign-e3 ëp
+foreign-e3 éq
+foreign-e4 é
+foreign-e4 ě
+foreign-e4 ěp
+foreign-e4 éq
+foreign-i1 í
+foreign-i1 ì
+foreign-i1 ìp
+foreign-i1 íq
+foreign-i2 í
+foreign-i2 î
+foreign-i2 îp
+foreign-i2 íq
+foreign-i3 í
+foreign-i3 ï
+foreign-i3 ïp
+foreign-i3 íq
+foreign-l1 l
+foreign-l1 ł
+foreign-l1 łp
+foreign-l1 lq
+foreign-n1 n
+foreign-n1 ñ
+foreign-n1 ñp
+foreign-n1 nq
+foreign-n2 n
+foreign-n2 ň
+foreign-n2 ňp
+foreign-n2 nq
+foreign-o1 ó            ; The rules are not explicit whether foreign accents on top of o or u
+foreign-o1 ò            ; should be sorted among o-ó and u-ú, or among ö-ő and ü-ű,
+foreign-o1 òp           ; but the example with Møsstrand makes it clear that it's the former.
+foreign-o1 óq
+foreign-o2 ó
+foreign-o2 ô
+foreign-o2 ôp
+foreign-o2 óq
+foreign-o3 ó
+foreign-o3 õ
+foreign-o3 õp
+foreign-o3 óq
+foreign-o4 ó
+foreign-o4 ø
+foreign-o4 øp
+foreign-o4 óq
+foreign-r1 r
+foreign-r1 ř
+foreign-r1 řp
+foreign-r1 rq
+foreign-s1 s
+foreign-s1 š
+foreign-s1 šp
+foreign-s1 sq
+foreign-u1 ú
+foreign-u1 ù
+foreign-u1 ùp
+foreign-u1 úq
+foreign-u2 ú
+foreign-u2 û
+foreign-u2 ûp
+foreign-u2 úq
+foreign-u3 ú
+foreign-u3 ũ
+foreign-u3 ũp
+foreign-u3 úq
+foreign-u4 ú
+foreign-u4 ů
+foreign-u4 ůp
+foreign-u4 úq
+foreign-y1 y
+foreign-y1 ÿ
+foreign-y1 ÿp
+foreign-y1 yq
diff --git a/localedata/locales/hu_HU b/localedata/locales/hu_HU
index 0a8a17c..f6c5aa1 100644
--- a/localedata/locales/hu_HU
+++ b/localedata/locales/hu_HU
@@ -59,6 +59,7 @@ category  "hu_HU:2000";LC_MEASUREMENT
 END LC_IDENTIFICATION
 
 LC_COLLATE
+define DIACRIT_FORWARD
 copy "iso14651_t1"
 
 %% a b c cs d dz dzs e f g gy h i j k l ly m n ny o o: p q
@@ -72,15 +73,18 @@ copy "iso14651_t1"
 %% dzs+dzs becomes ddzs, and so on.
 %% However, c+cs is also spelled as ccs, you need to speak
 %% the language to tell which one is the case.
-%% Tokenize ccs as <c_or_cs><cs>, and sort the tokens as
-%% a b c c_or_cs cs d... This effectively assumes cs+cs
-%% which is more frequent than c+cs, but guarantees that the
-%% strings ccs and cscs don't collate as equal.
+%% Tokenize ccs as <cs><cs> since this is much more frequent
+%% than <c><cs>, but apply SINGLE-OR-COMPOUND and COMPOUND
+%% to the tokens so that the strings ccs and cscs don't collate
+%% as equal.
+%% The same goes for all other compound consonants.
 
 collating-symbol  <odouble>
 collating-symbol  <udouble>
 
-collating-symbol  <c_or_cs>
+collating-symbol  <SINGLE-OR-COMPOUND>
+collating-symbol  <COMPOUND>
+
 collating-symbol  <cs>
 collating-element <C-S> from "<U0043><U0053>"
 collating-element <C-s> from "<U0043><U0073>"
@@ -95,7 +99,6 @@ collating-element <c-C-s> from "<U0063><U0043><U0073>"
 collating-element <c-c-S> from "<U0063><U0063><U0053>"
 collating-element <c-c-s> from "<U0063><U0063><U0073>"
 
-collating-symbol  <d_or_dz>
 collating-symbol  <dz>
 collating-element <D-Z> from "<U0044><U005A>"
 collating-element <D-z> from "<U0044><U007A>"
@@ -110,7 +113,6 @@ collating-element <d-D-z> from "<U0064><U0044><U007A>"
 collating-element <d-d-Z> from "<U0064><U0064><U005A>"
 collating-element <d-d-z> from "<U0064><U0064><U007A>"
 
-collating-symbol  <d_or_dzs>
 collating-symbol  <dzs>
 collating-element <D-Z-S> from "<U0044><U005A><U0053>"
 collating-element <D-Z-s> from "<U0044><U005A><U0073>"
@@ -137,7 +139,6 @@ collating-element <d-d-Z-s> from "<U0064><U0064><U005A><U0073>"
 collating-element <d-d-z-S> from "<U0064><U0064><U007A><U0053>"
 collating-element <d-d-z-s> from "<U0064><U0064><U007A><U0073>"
 
-collating-symbol  <g_or_gy>
 collating-symbol  <gy>
 collating-element <G-Y> from "<U0047><U0059>"
 collating-element <G-y> from "<U0047><U0079>"
@@ -152,7 +153,6 @@ collating-element <g-G-y> from "<U0067><U0047><U0079>"
 collating-element <g-g-Y> from "<U0067><U0067><U0059>"
 collating-element <g-g-y> from "<U0067><U0067><U0079>"
 
-collating-symbol  <l_or_ly>
 collating-symbol  <ly>
 collating-element <L-Y> from "<U004C><U0059>"
 collating-element <L-y> from "<U004C><U0079>"
@@ -167,7 +167,6 @@ collating-element <l-L-y> from "<U006C><U004C><U0079>"
 collating-element <l-l-Y> from "<U006C><U006C><U0059>"
 collating-element <l-l-y> from "<U006C><U006C><U0079>"
 
-collating-symbol  <n_or_ny>
 collating-symbol  <ny>
 collating-element <N-Y> from "<U004E><U0059>"
 collating-element <N-y> from "<U004E><U0079>"
@@ -182,7 +181,6 @@ collating-element <n-N-y> from "<U006E><U004E><U0079>"
 collating-element <n-n-Y> from "<U006E><U006E><U0059>"
 collating-element <n-n-y> from "<U006E><U006E><U0079>"
 
-collating-symbol  <s_or_sz>
 collating-symbol  <sz>
 collating-element <S-Z> from "<U0053><U005A>"
 collating-element <S-z> from "<U0053><U007A>"
@@ -197,7 +195,6 @@ collating-element <s-S-z> from "<U0073><U0053><U007A>"
 collating-element <s-s-Z> from "<U0073><U0073><U005A>"
 collating-element <s-s-z> from "<U0073><U0073><U007A>"
 
-collating-symbol  <t_or_ty>
 collating-symbol  <ty>
 collating-element <T-Y> from "<U0054><U0059>"
 collating-element <T-y> from "<U0054><U0079>"
@@ -212,7 +209,6 @@ collating-element <t-T-y> from "<U0074><U0054><U0079>"
 collating-element <t-t-Y> from "<U0074><U0074><U0059>"
 collating-element <t-t-y> from "<U0074><U0074><U0079>"
 
-collating-symbol  <z_or_zs>
 collating-symbol  <zs>
 collating-element <Z-S> from "<U005A><U0053>"
 collating-element <Z-s> from "<U005A><U0073>"
@@ -227,8 +223,10 @@ collating-element <z-Z-s> from "<U007A><U005A><U0073>"
 collating-element <z-z-S> from "<U007A><U007A><U0053>"
 collating-element <z-z-s> from "<U007A><U007A><U0073>"
 
+collating-symbol <CAP-CAP>
 collating-symbol <CAP-MIN>
 collating-symbol <MIN-CAP>
+collating-symbol <MIN-MIN>
 collating-symbol <CAP-CAP-CAP>
 collating-symbol <CAP-CAP-MIN>
 collating-symbol <CAP-MIN-CAP>
@@ -239,6 +237,7 @@ collating-symbol <MIN-MIN-CAP>
 collating-symbol <MIN-MIN-MIN>
 
 reorder-after <MIN>
+<MIN-MIN>
 <MIN-CAP>
 <MIN-MIN-MIN>
 <MIN-MIN-CAP>
@@ -247,42 +246,38 @@ reorder-after <MIN>
 
 reorder-after <CAP>
 <CAP-MIN>
+<CAP-CAP>
 <CAP-MIN-MIN>
 <CAP-MIN-CAP>
 <CAP-CAP-MIN>
 <CAP-CAP-CAP>
 
 reorder-after <c>
-<c_or_cs>
 <cs>
 reorder-after <d>
-<d_or_dz>
-<d_or_dzs>
 <dz>
 <dzs>
 reorder-after <g>
-<g_or_gy>
 <gy>
 reorder-after <l>
-<l_or_ly>
 <ly>
 reorder-after <n>
-<n_or_ny>
 <ny>
 reorder-after <o>
 <odouble>
 reorder-after <s>
-<s_or_sz>
 <sz>
 reorder-after <t>
-<t_or_ty>
 <ty>
 reorder-after <u>
 <udouble>
 reorder-after <z>
-<z_or_zs>
 <zs>
 
+reorder-after <BAS>
+<SINGLE-OR-COMPOUND>
+<COMPOUND>
+
 reorder-after <o>
 <U00F6>	<odouble>;<REU>;<MIN>;IGNORE
 <U0151>	<odouble>;<DAC>;<MIN>;IGNORE
@@ -295,152 +290,157 @@ reorder-after <u>
 <U00DC>	<udouble>;<REU>;<CAP>;IGNORE
 <U0170>	<udouble>;<DAC>;<CAP>;IGNORE
 
+reorder-after <BAS>
+<ACA>
+<REU>
+<DAC>
+
 reorder-after <U0043>
-<C-S>		<cs>;<BAS>;<CAP>;IGNORE
-<C-s>		<cs>;<BAS>;<CAP-MIN>;IGNORE
-<C-C-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<C-C-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<C-c-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<C-c-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<C-S>		<cs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<C-s>		<cs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<C-C-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<C-C-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<C-c-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<C-c-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0063>
-<c-S>		<cs>;<BAS>;<MIN-CAP>;IGNORE
-<c-s>		<cs>;<BAS>;<MIN>;IGNORE
-<c-C-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<c-C-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<c-c-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<c-c-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<c-S>		<cs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<c-s>		<cs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<c-C-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<c-C-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<c-c-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<c-c-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0044>
-<D-Z>		<dz>;<BAS>;<CAP>;IGNORE
-<D-z>		<dz>;<BAS>;<CAP-MIN>;IGNORE
-<D-D-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<D-D-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<D-d-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<D-d-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<D-Z>		<dz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<D-z>		<dz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<D-D-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<D-D-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<D-d-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<D-d-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z>		<dz>;<BAS>;<MIN-CAP>;IGNORE
-<d-z>		<dz>;<BAS>;<MIN>;IGNORE
-<d-D-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<d-D-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<d-d-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<d-d-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<d-Z>		<dz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<d-z>		<dz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<d-D-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<d-D-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<d-d-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<d-d-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0044>
-<D-Z-S>		<dzs>;<BAS>;<CAP-CAP-CAP>;IGNORE
-<D-Z-s>		<dzs>;<BAS>;<CAP-CAP-MIN>;IGNORE
-<D-z-S>		<dzs>;<BAS>;<CAP-MIN-CAP>;IGNORE
-<D-z-s>		<dzs>;<BAS>;<CAP-MIN-MIN>;IGNORE
-<D-D-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-D-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-D-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-D-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
-<D-d-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-d-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-d-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-d-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-Z-S>		<dzs>;<COMPOUND>;<CAP-CAP-CAP>;IGNORE
+<D-Z-s>		<dzs>;<COMPOUND>;<CAP-CAP-MIN>;IGNORE
+<D-z-S>		<dzs>;<COMPOUND>;<CAP-MIN-CAP>;IGNORE
+<D-z-s>		<dzs>;<COMPOUND>;<CAP-MIN-MIN>;IGNORE
+<D-D-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-CAP>";IGNORE
+<D-D-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-MIN>";IGNORE
+<D-D-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-CAP>";IGNORE
+<D-D-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-d-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-CAP>";IGNORE
+<D-d-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-MIN>";IGNORE
+<D-d-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-CAP>";IGNORE
+<D-d-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z-S>		<dzs>;<BAS>;<MIN-CAP-CAP>;IGNORE
-<d-Z-s>		<dzs>;<BAS>;<MIN-CAP-MIN>;IGNORE
-<d-z-S>		<dzs>;<BAS>;<MIN-MIN-CAP>;IGNORE
-<d-z-s>		<dzs>;<BAS>;<MIN-MIN-MIN>;IGNORE
-<d-D-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-D-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-D-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-D-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
-<d-d-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-d-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-d-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-d-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-Z-S>		<dzs>;<COMPOUND>;<MIN-CAP-CAP>;IGNORE
+<d-Z-s>		<dzs>;<COMPOUND>;<MIN-CAP-MIN>;IGNORE
+<d-z-S>		<dzs>;<COMPOUND>;<MIN-MIN-CAP>;IGNORE
+<d-z-s>		<dzs>;<COMPOUND>;<MIN-MIN-MIN>;IGNORE
+<d-D-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-CAP>";IGNORE
+<d-D-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-MIN>";IGNORE
+<d-D-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-CAP>";IGNORE
+<d-D-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-d-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-CAP>";IGNORE
+<d-d-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-MIN>";IGNORE
+<d-d-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-CAP>";IGNORE
+<d-d-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-MIN>";IGNORE
 
 reorder-after <U0047>
-<G-Y>		<gy>;<BAS>;<CAP>;IGNORE
-<G-y>		<gy>;<BAS>;<CAP-MIN>;IGNORE
-<G-G-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<G-G-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<G-g-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<G-g-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<G-Y>		<gy>;<COMPOUND>;<CAP-CAP>;IGNORE
+<G-y>		<gy>;<COMPOUND>;<CAP-MIN>;IGNORE
+<G-G-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<G-G-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<G-g-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<G-g-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0067>
-<g-y>		<gy>;<BAS>;<MIN>;IGNORE
-<g-Y>		<gy>;<BAS>;<MIN-CAP>;IGNORE
-<g-G-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<g-G-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<g-g-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<g-g-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<g-Y>		<gy>;<COMPOUND>;<MIN-CAP>;IGNORE
+<g-y>		<gy>;<COMPOUND>;<MIN-MIN>;IGNORE
+<g-G-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<g-G-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<g-g-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<g-g-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U004C>
-<L-Y>		<ly>;<BAS>;<CAP>;IGNORE
-<L-y>		<ly>;<BAS>;<CAP-MIN>;IGNORE
-<L-L-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<L-L-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<L-l-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<L-l-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<L-Y>		<ly>;<COMPOUND>;<CAP-CAP>;IGNORE
+<L-y>		<ly>;<COMPOUND>;<CAP-MIN>;IGNORE
+<L-L-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<L-L-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<L-l-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<L-l-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006C>
-<l-y>		<ly>;<BAS>;<MIN>;IGNORE
-<l-Y>		<ly>;<BAS>;<MIN-CAP>;IGNORE
-<l-L-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<l-L-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<l-l-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<l-l-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<l-Y>		<ly>;<COMPOUND>;<MIN-CAP>;IGNORE
+<l-y>		<ly>;<COMPOUND>;<MIN-MIN>;IGNORE
+<l-L-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<l-L-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<l-l-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<l-l-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U004E>
-<N-Y>		<ny>;<BAS>;<CAP>;IGNORE
-<N-y>		<ny>;<BAS>;<CAP-MIN>;IGNORE
-<N-N-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<N-N-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<N-n-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<N-n-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<N-Y>		<ny>;<COMPOUND>;<CAP-CAP>;IGNORE
+<N-y>		<ny>;<COMPOUND>;<CAP-MIN>;IGNORE
+<N-N-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<N-N-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<N-n-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<N-n-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006E>
-<n-y>		<ny>;<BAS>;<MIN>;IGNORE
-<n-Y>		<ny>;<BAS>;<MIN-CAP>;IGNORE
-<n-N-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<n-N-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<n-n-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<n-n-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<n-Y>		<ny>;<COMPOUND>;<MIN-CAP>;IGNORE
+<n-y>		<ny>;<COMPOUND>;<MIN-MIN>;IGNORE
+<n-N-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<n-N-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<n-n-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<n-n-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0053>
-<S-Z>		<sz>;<BAS>;<CAP>;IGNORE
-<S-z>		<sz>;<BAS>;<CAP-MIN>;IGNORE
-<S-S-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<S-S-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<S-s-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<S-s-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<S-Z>		<sz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<S-z>		<sz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<S-S-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<S-S-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<S-s-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<S-s-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0073>
-<s-Z>		<sz>;<BAS>;<MIN-CAP>;IGNORE
-<s-z>		<sz>;<BAS>;<MIN>;IGNORE
-<s-S-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<s-S-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<s-s-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<s-s-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<s-Z>		<sz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<s-z>		<sz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<s-S-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<s-S-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<s-s-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<s-s-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0054>
-<T-Y>		<ty>;<BAS>;<CAP>;IGNORE
-<T-y>		<ty>;<BAS>;<CAP-MIN>;IGNORE
-<T-T-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<T-T-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<T-t-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<T-t-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<T-Y>		<ty>;<COMPOUND>;<CAP-CAP>;IGNORE
+<T-y>		<ty>;<COMPOUND>;<CAP-MIN>;IGNORE
+<T-T-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<T-T-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<T-t-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<T-t-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0074>
-<t-Y>		<ty>;<BAS>;<MIN-CAP>;IGNORE
-<t-y>		<ty>;<BAS>;<MIN>;IGNORE
-<t-T-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<t-T-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<t-t-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<t-t-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<t-Y>		<ty>;<COMPOUND>;<MIN-CAP>;IGNORE
+<t-y>		<ty>;<COMPOUND>;<MIN-MIN>;IGNORE
+<t-T-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<t-T-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<t-t-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<t-t-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U005A>
-<Z-S>		<zs>;<BAS>;<CAP>;IGNORE
-<Z-s>		<zs>;<BAS>;<CAP-MIN>;IGNORE
-<Z-Z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<Z-Z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<Z-z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<Z-z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<Z-S>		<zs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<Z-s>		<zs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<Z-Z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<Z-Z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<Z-z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<Z-z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U007A>
-<z-S>		<zs>;<BAS>;<MIN-CAP>;IGNORE
-<z-s>		<zs>;<BAS>;<MIN>;IGNORE
-<z-Z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<z-Z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<z-z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<z-z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<z-S>		<zs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<z-s>		<zs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<z-Z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<z-Z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<z-z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<z-z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-end
 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-01-31 23:17       ` Egmont Koblinger
@ 2017-02-01  0:48         ` Carlos O'Donell
  2017-02-01  1:56           ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-02-01  0:48 UTC (permalink / raw)
  To: Egmont Koblinger, libc-locales

On 01/31/2017 06:16 PM, Egmont Koblinger wrote:
> Hi guys,
> 
> Could we please return to this old topic and get it addressed once and for all?
> 
> It's approximately the 8th (!!!) time I'm sending this out and/or
> pinging you. The patch has been there for more than a year. Not a
> single complaint, concern, question etc. was raised.
> 
> The patch not only addresses multiple bugs, but also backs it up with
> by far the most exhaustive, clean, modular collation unittest any
> locale has.
> 
> Mike, the last time (last April) you said "i'm inclined to merge
> this"... but then nothing happened, I wonder why.
> 
> Please guys, either commit this patch, or give me a clean and
> actionable feedback about your concerns. Let me know if there's
> anything else you need, but pretty please, let's not delay this issue
> any further.

Egmont,

Thank you for your patience. Perhaps the best way to restart this conversation
is to cover what, if any, review, the changes have received and reference old
discussions about them.

My questions are:

* How does this compare to CLDR?

We are looking to try and harmonize between glibc and CLDR so we have common
collation across all languages that use the two collation APIs. This may be
an impossible goal to get perfect, but harmonization is just that, getting as
close as we can. I say impossible because glibc needs to be updated to actually
use the Unicode collation algorithm instead of the current code before any true
harmonization could happen.

If we had a test harness that could compare glibc to CLDR that would be great,
but we don't currently. So I'm not asking for this, but perhaps a spot test of
certain problematic values to see how CLDR compares would be good enough.

* Does the regression test pass?

* What kind of consequences might this have on existing programs?

* Can you find a Hungarian speaker to review and validate your changes?

I try to get secondary review from a native speaker for any Spanish work that
I do, that way at least it has had some peer review.

-- 
Cheers,
Carlos.


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-01  0:48         ` Carlos O'Donell
@ 2017-02-01  1:56           ` Egmont Koblinger
  2017-02-01 16:01             ` Luis Javier Merino
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-01  1:56 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-locales

Hi Carlos,

> Thank you for your patience. Perhaps the best way to restart this conversation
> is to cover what, if any, review, the changes have received and reference old
> discussions about them.

I've described/linked all the individual bugs from this "meta"
bugzilla bug #18934. You can also look up this thread in the mailing
list archive, although I doubt there is much additional information
there; I tried to make sure every important piece of information is
present in bugzilla.

> * How does this compare to CLDR?

Unfortunately I have no information whatsoever about CLDR's Hungarian
collation implementation.

As much as it would be great to make sure both versions are correct
and pass this unittest, unfortunately CLDR is outside of my personal
radar of interest. I hope they'll notice our unittests and adjust
their implementation accordingly if required. Or we could just file an
"FYI-bug" against them to get them to look into it.

(When I originally created this patch, you could have probably
convinced me to take a look at CLDR too. Nowadays (i.e. for the next
couple of years) I have extremely limited free time due to personal
reasons. I just pretty much want to close out the pending issues (like
this one). I don't have time to pick up any nontrivial new task.)

> * Does the regression test pass?

What exactly do you mean by "regression tests"? There was no unittest
for hu_HU previously, the newly created one obviously passes during a
"make tests". I suspect you're referring to something else; if so
please clarify.

> * What kind of consequences might this have on existing programs?

There's nothing brand new, no "big" change in the collation
order. From the users' point of view, it's really a few "small" fixes
for a few rare corner cases.

(Let me give an interesting example. After 30 years, a new standard
for the Hungarian grammar rules was released in Sep 2015. The previous
one did not specify the collation order of the uppercase and lowercase
counterparts of the same letter. The new one does. By coincidence,
however, we did not need to change anything: the old implementation
just happens to be what the standard now specifies. Had it been the
other way around, it would probably have been a noticeable "big"
change.)

So, the fixes only concern somewhat special cases and affect
only a tiny subset of actual words or artificially made-up
strings.

I recommend that you sort the unittest file with the old locale
definition (take care to remove the comments and trailing spaces if
you do it "manually" with "sort" rather than with glibc's "make
tests") and see the diff. Especially at the first part (the examples
from the official rules, rather than my tests which focus on the
corner cases) you won't see too much difference.
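
A minimal sketch of that manual comparison in Python, under stated
assumptions: comments in the test file are taken to start with "#"
(adjust comment_prefix if the file uses something else), and the old and
new locale definitions are assumed to be compiled and installed under
names of your choosing. The helper names are made up for this
illustration and are not part of glibc.

```python
import locale


def clean_lines(text, comment_prefix="#"):
    """Drop comments, trailing whitespace and empty lines.

    comment_prefix is an assumption about the test file's syntax.
    """
    cleaned = []
    for raw in text.splitlines():
        line = raw.split(comment_prefix, 1)[0].rstrip()
        if line:
            cleaned.append(line)
    return cleaned


def sort_with_locale(lines, locale_name):
    """Sort lines with the collation rules of an installed locale."""
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if locale_name is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def moved_lines(old_sorted, new_sorted):
    """Return (position, old_line, new_line) where the two orders differ."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(old_sorted, new_sorted))
        if a != b
    ]
```

Calling sort_with_locale twice on the cleaned lines, once per installed
locale name, and passing both results to moved_lines shows which entries
change position.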

> * Can you find a Hungarian speaker to review and validate your changes?

I'm the person who contributed the last perhaps 5 or 6 (maybe even
more) changes to the locale file, some of them improving the
collation, some touching other parts. I also admit in the "meta" bug
that one of the changes did introduce a regression that I did not
notice then; I fixed it now. None of those previous changes were
backed up by any tests. The new ones are, and me having introduced a
regression was a huge motivation for creating these tests.

As linked from the meta bug, someone introduced a change that broke
many locales, including Hungarian. I really doubt he was asked to get
his work peer reviewed. In fact, this is still an open issue nobody
cares about!!!

I remember many-many years ago some random Hungarian guy came along,
submitted a patch to the collation definition which got accepted.
Turned out, he implemented his personal favorite rather than the
standard. Then I had to prove by scanning pages from dictionaries that
he was wrong with the sorting order to get it reverted. (I'm too lazy to
look up pointers, sorry.)

I'm pondering... seems to me that in your project if someone comes
along and just changes something without explanation, you accept it;
but if he gives quite a lot of proof about his work's quality then you
ask for even more??

Please take a look at the comments of the new unittest. It links to
the official online version of the Hungarian grammar rules, gives a
short summary about each collation rule (because Hungarian is such a
weird language that you don't have much chance to understand what
Google/Bing Translate says), and copies all the examples from there.
You're free to verify that I've copied them correctly. Plus I add a
whole lot more which are also explained in comments. Note that the
basic collation rules are explained in the locale definition file
itself, I haven't changed them and they're consistent with what I say
in the unittests.

I'm sorry but I don't know any Hungarian guy who has any insight into
these locale definitions to do a peer review (other than the one who
implemented his personal favorite - I wouldn't trust him).

Seriously, please take a close look at the "meta" bug, the individual
bugs linked from there, and the new unittest itself and the comments
within. Please let me know if they are not convincing enough. I'm not
claiming that the new version is 100% guaranteed to be bug-free
(although I sure hope so). I'm saying it's obviously significantly
better than the previous one, and the unittests provide solid
grounds for further improving it without possibly introducing a
regression, should there still be any bugs to fix.


Thanks a lot,
egmont


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-01  1:56           ` Egmont Koblinger
@ 2017-02-01 16:01             ` Luis Javier Merino
  2017-02-02  0:04               ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Luis Javier Merino @ 2017-02-01 16:01 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Carlos O'Donell, libc-locales

I did some investigation of Hungarian collation for a code golf at
http://codegolf.stackexchange.com/a/75599/267

Hungarian has digraphs and trigraphs (cs, dz, dzs, gy, ly, ny, sz, ty,
zs). It also has geminated (long) consonants, which are represented by
writing the consonant twice. In the case of digraphs/trigraphs, they
can be written in a long (duplicate the whole digraph/trigraph) and
short form (duplicate only the first consonant of the
digraph/trigraph).

Not all occurrences of the consonants in a digraph/trigraph represent
a digraph/trigraph, e.g: in házszám zs doesn't represent a digraph,
but sz does. This means you need a dictionary or similar to get a
(nearly) fully correct collation. IIRC, LibreOffice uses libhnj, which
uses rules derived from a dictionary.

These are the differences I noticed between Egmont's testsuite and ICU:

 - Egmont collates the short forms before the full forms (ssz < szsz,
..., zzs < zszs ), ICU collates the long forms before the short forms
starting at L3 Case and Variants (szsz <3 ssz, ..., zszs <3 zzs ). I
don't think that is specified in the grammar rules, but I can't read
Hungarian.

 - ICU treats weirdly capitalized groups as
non-contractions/non-digraphs/non-trigraphs, e.g: ccS <3 CcS <3 cCs <3
cCS <3 CCs <3 cS <3 cs <3 Cs <3 CS <3 ccs <3 Ccs <3 CCS

I don't know which behavior comes from the CLDR, and which is specific to ICU.

(where I talk about glibc in the post at codegolf.se, I actually talk
about glibc with Egmont's patch, which I assumed would be merged
soon).


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-01 16:01             ` Luis Javier Merino
@ 2017-02-02  0:04               ` Egmont Koblinger
  2017-02-02 13:28                 ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-02  0:04 UTC (permalink / raw)
  To: Luis Javier Merino; +Cc: Carlos O'Donell, libc-locales

Hi Luis, others,

TLDR:

Nice investigation from someone not speaking our language, thumbs up :)

Your observations are all correct. I'm extending them with more
examples and explanation. Note that my patch does not change the
intent of how tokenization should happen, basically whatever you show is
what is meant to be implemented now. There are just bugs in its
implementation which I'm fixing (and, as you point out, there'll still
remain bugs due to not adding a dictionary for the ambiguous cases).

The differences between ICU and my version: Yup, they are not
specified in the standard, and are about artificially made up strings
that don't occur in Hungarian text. I give some rationale why I chose
the way I chose, but ICU is not wrong at all either.


Long version:

> I did some investigation of Hungarian collation for a code golf at
> http://codegolf.stackexchange.com/a/75599/267
>
> Hungarian has digraphs and trigraphs (cs, dz, dzs, gy, ly, ny, sz, ty,
> zs). It also has geminated (long) consonants, which are represented by
> writing the consonant twice. In the case of digraphs/trigraphs, they
> can be written in a long (duplicate the whole digraph/trigraph) and
> short form (duplicate only the first consonant of the
> digraph/trigraph).

This is absolutely correct. (I'm also happy to learn the proper
terminology from you.)

Note: It's not up to the writer to freely choose between the two
forms. The long form must be used at compound word boundaries, e.g.
(see both in your stackexchange page and in my unittests) "nagy" [big]
+ "gyakorlat" [exercise] becomes "nagygyakorlat". The short form
must be used otherwise, e.g. "naggyal" [with big], the agglutinative
suffix ("gyal" [with] in this particular case) does not count as a
separate word to form a compound word with.

> Not all occurrences of the consonants in a digraph/trigraph represent
> a digraph/trigraph, e.g: in házszám zs doesn't represent a digraph,
> but sz does. This means you need a dictionary or similar to get a
> (nearly) fully correct collation. IIRC, LibreOffice uses libhnj, which
> uses rules derived from a dictionary.

Again, this is correct. Combinations such as "zsz" require knowing the
language to tell whether it's z+sz or zs+z. Someone not speaking the
language would have roughly a 50-50 chance of guessing it right.

Even simple digraphs are ambiguous and require knowing the language,
e.g. the words "pácsó" or "malacsült" are compound words with the
boundary between c and s, so it's not a cs digraph.

Another interesting ambiguous case is "ssz", this can stand for s+sz
or sz+sz. Example: "karosszék" [armchair] is "kar" [arm] -> "karos"
[something with an arm] + "szék" [chair], hence it's s+sz. For
"karosszéria" [car's body/chassis] one could think it's coming from
"karos" + "széria" [series], but this doesn't make any sense. It's
probably coming from Italian "carrozzeria", hence it's sz+sz.
(Obviously the pronunciation is also different in these two cases.)

The current implementation is eager, always tries to combine as many
letters as possible to form a short or long digraph or trigraph. As
such, it erroneously tokenizes "házszám" as h+á+zs+z+á+m instead of
h+á+z+sz+á+m, "pácsó" as p+á+cs+ó instead of p+á+c+s+ó, etc. This has
been the case (except for bugs maybe, so let's rather say this has
been the clear intent) probably even before I first touched the locale
definition, and is still the intent.
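
The eager matching described above can be sketched in Python as follows.
This only illustrates the intent; glibc does not work this way
internally (there the matching comes from the collating elements in the
locale definition). The names MULTIGRAPHS, expand_short_form and
tokenize are made up for the example, and the sketch assumes precomposed
accented letters and ignores case.

```python
# Hungarian digraphs/trigraphs; "dzs" comes first so that it wins over "dz".
MULTIGRAPHS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]


def expand_short_form(word, i):
    """Detect a short (contracted) long consonant starting at index i.

    "ssz" stands for sz+sz, "ddzs" for dzs+dzs, and so on.
    Returns (tokens, consumed_length) or None.
    """
    for graph in MULTIGRAPHS:
        short = graph[0] + graph
        if word.startswith(short, i):
            return [graph, graph], len(short)
    return None


def tokenize(word):
    """Eagerly split a word into letters, preferring the longest match."""
    word = word.lower()
    tokens = []
    i = 0
    while i < len(word):
        hit = expand_short_form(word, i)
        if hit is not None:
            expanded, consumed = hit
            tokens.extend(expanded)
            i += consumed
            continue
        for graph in MULTIGRAPHS:
            if word.startswith(graph, i):
                tokens.append(graph)
                i += len(graph)
                break
        else:
            # No digraph/trigraph starts here: plain single letter.
            tokens.append(word[i])
            i += 1
    return tokens
```

With this sketch, tokenize("házszám") yields h, á, zs, z, á, m and
tokenize("pácsó") yields p, á, cs, ó, i.e. exactly the mistokenizations
mentioned above; a dictionary would be needed to do better.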

I have no plans to add a dictionary of exception words to the glibc
locale definition, nor to analyze the frequencies (e.g. probably "zsz"
more often stands for z+sz than zs+z, yet glibc goes for the latter).
The current rules are good enough in the sense that they mistokenize
only a tiny, almost negligible subset of words, and even when they do,
the chance of this resulting in swapping order with another word is
even much smaller.

> These are the differences I noticed between Egmont's testsuite and ICU:
>
>  - Egmont collates the short forms before the full forms (ssz < szsz,
> ..., zzs < zszs ), ICU collates the long forms before the short forms
> starting at L3 Case and Variants (szsz <3 ssz, ..., zszs <3 zzs ). I
> don't think that is specified in the grammar rules, but I can't read
> Hungarian.

(I'm not sure what ICU is and what's its relation to CLDR. Nevermind.)

You are correct that this is not specified in the orthography rules.
This is probably because there are no actual words that do make sense
with both ways of spelling.

It would cause problems if e.g. someone invented a new word
"karosszéria" in the meaning [series with arms]. It could also have
caused problems for a year after the release of the 12th version of
the rules: "ész" [mind] + "szerű" [-like, -ish] was spelled "ésszerű"
[rational, reasonable] according to the previous standard ("szerű"
used to be considered an agglutinative suffix), but is now spelled
"észszerű" ("szerű" is now considered a standalone word, so it has
become a compound word). For a year both the old and the new versions
of the standard were valid. That year has already elapsed, making
the previous spelling incorrect.

In an earlier version of glibc, ssz and szsz used to collate equally,
causing problems for some users and some software. See bug 13547, with
a further pointer to where I found this problem being reported. The
reporter ran sort and uniq on files that contained lines similar to
ZZSZSSPKKPKP and found that uniq removed some unique lines. (This is
where my fix unfortunately introduced a regression which I'm also
fixing now.)
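
To illustrate why equal collation is harmful there, here is a toy Python
model. It assumes the deduplicating tool treats lines that collate equal
as duplicates; equal_weight_key is a made-up stand-in for the old
behaviour (short and long sz forms getting identical weights), not
glibc's real key.

```python
from itertools import groupby


def equal_weight_key(line):
    """Toy model of the old behaviour: "szsz" and "ssz" weigh the same."""
    return line.lower().replace("szsz", "ssz")


def uniq_by_key(lines, key):
    """Sort, then keep one line per run of lines whose keys are equal."""
    ordered = sorted(lines, key=key)
    return [next(group) for _, group in groupby(ordered, key=key)]


lines = ["kassza", "kaszsza", "kassza"]
# With equal weights, two distinct strings collapse into one entry.
collapsed = uniq_by_key(lines, equal_weight_key)
# With a key that distinguishes them, both distinct strings survive.
distinct = uniq_by_key(lines, str)
```

Here collapsed has one entry while distinct has two, which is the kind
of data loss the report was about.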

I decided on ssz < szsz based on this reasoning: I was thinking that as
per "karosszék" vs. "karosszéria" above, if you can't tell for sure
whether it's s+sz or sz+sz, let's sort in between the two that are
known for sure. To prove my point, let's make up two compound words:
"kés" [knife] + "szerű" and "kész" [ready] + "szerű". The correct
tokenization is obviously s+sz and sz+sz respectively, and hence this
is the required alphabetical ordering. My new version tokenizes both
as sz+sz, yet a weaker property ends up sorting them correctly. It
breaks, however, as soon as you continue the first word with another
agglutinative suffix. ICU sorts them incorrectly right away. Since
we're already in the gray zone of ambiguous, easily mistokenizable
words and rare, artificially constructed examples, I cannot say that
ICU's approach is wrong at all.

>  - ICU treats weirdly capitalized groups as
> non-contractions/non-digraphs/non-trigraphs, e.g: ccS <3 CcS <3 cCs <3
> cCS <3 CCs <3 cS <3 cs <3 Cs <3 CS <3 ccs <3 Ccs <3 CCS

Yet again an absolutely forced corner case that does not happen with real words.

The official rules [akh12 - link below] bullet points 3 and 8 show
that in all-uppercase context all the letters of digraphs/trigraphs
become uppercase. Abbreviations and similar constructs are detailed in
276-289. With very few exceptions, the examples (in bold) are either
all lowercase, or an initial uppercase followed by all lowercase, or
all uppercase (at least up to the hyphen which attaches agglutinative
suffixes). The few exceptions are units (e.g. kB, kWh), ÉNy DNy
[northwest, southwest], that's pretty much it. So "weirdly capitalized
groups", as you say, really don't matter in practice.

I see rationale in what ICU does, but it also raises some questions.
E.g. no Hungarian word starts with a geminated (long) consonant. So then
shouldn't "Ccs" be tokenized as C+c+s? Or C+cs? Whereas "CCS" and
"ccs" should still be tokenized as CS+CS or cs+cs because those can
easily appear in the middle of words.

Again it's a gray zone, I went for the one that's simpler, technically
cleaner, provides a nicer structure in the definition file as well as
the tests etc, but I can't say the other approach is wrong.

Again, it's not specified in the standard, is of marginal (if any)
practical use, and I did not conceptually change glibc's behavior,
just fixed bugs/inconsistencies in its previous implementations.


Note: You've covered the collation of consonants, not the vowels.
That's another (simpler, almost unambiguous) story.


Thanks a lot,
egmont

[akh12] http://helyesiras.mta.hu/helyesiras/default/akh12


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-02  0:04               ` Egmont Koblinger
@ 2017-02-02 13:28                 ` Carlos O'Donell
  2017-02-02 19:00                   ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-02-02 13:28 UTC (permalink / raw)
  To: Egmont Koblinger, Luis Javier Merino; +Cc: libc-locales

On 02/01/2017 07:04 PM, Egmont Koblinger wrote:
> Hi Luis, others,
> 
> TLDR:
> 
> Nice investigation from someone not speaking our language, thumbs up :)

I agree. I enjoyed reading both of your detailed discussions.

> (I'm not sure what ICU is and what's its relation to CLDR. Nevermind.)

ICU is "International Components for Unicode"
http://site.icu-project.org/

ICU provides a very thorough API for Unicode and globalization.

CLDR is "Unicode Common Locale Data Repository"

CLDR provides the data that is used by ICU implementations.

So in any given program you would link against the ICU library and
the manipulations you make are driven by the data from CLDR.

The goal with glibc locale data is to attempt to harmonize with CLDR
such that applications using ICU APIs and glibc APIs get as close to
the same results as possible.

It does users a disservice if Java and C are arbitrarily different
in this regard across the two most commonly used APIs for localization.

-- 
Cheers,
Carlos.


* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-02 13:28                 ` Carlos O'Donell
@ 2017-02-02 19:00                   ` Egmont Koblinger
  2017-02-05 12:16                     ` Luis Javier Merino
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-02 19:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

Hi,

Thanks a lot for the information about ICU.

I'd like to emphasize again that my patch does not really touch those
bits where glibc and ICU differ (the unspecified cases) (or actually,
for the second of the two pointed out by Luis, I think I made glibc
more consistent with itself - but not touched the basic intent around
tokenization of weird capitalizations etc.). If we care about
harmonizing glibc and ICU in these unspecified cases, this should be
another extremely-low-prio issue, and we'd need to begin with
verifying that the well specified cases indeed work correctly and the
same way in the two.

Instead of discussing possible extremely-low-prio issues around
unspecified cases that my patch doesn't change, could we please focus
on the things that my patch actually does? :)

thanks,
egmont





* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-02 19:00                   ` Egmont Koblinger
@ 2017-02-05 12:16                     ` Luis Javier Merino
  2017-02-05 16:30                       ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Luis Javier Merino @ 2017-02-05 12:16 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Carlos O'Donell, libc-locales

I've had a further look at Egmont's patch. It does the following:

- It reverts b008d4c (the "fix" for BZ#13547, which broke collation in
other ways). Reverting this brings collation more in line with ICU.
- It defines DIACRIT_FORWARD. This brings collation more in line with ICU.
- It fixes BZ#18587, defining collating symbols <MIN-MIN> and
<CAP-CAP>. Before, collation went cs (<MIN>) < cS (<MIN-CAP>) < CS
(<CAP>) < Cs (<CAP-MIN>). After, it goes cs (<MIN-MIN>) < cS
(<MIN-CAP>) < Cs (<CAP-MIN>) < CS (<CAP-CAP>). This brings collation a
little more in line with ICU.
- It introduces <SINGLE_OR_COMPOUND> and <COMPOUND> collating symbols,
and assigns secondary weights to digraphs/trigraphs and contracted
digraphs/trigraphs using them. <SINGLE_OR_COMPOUND> is ordered before
<COMPOUND>, which makes short forms collate before long forms. b008d4c
already made short forms collate before long forms, by ordering
<c_or_cs> and the like before <cs> and the like. ICU doesn't collate
long forms before short forms until level 3. Perl collates them
stably, i.e. just as they appear in the input. In any case, ordering
<COMPOUND> before <SINGLE_OR_COMPOUND> would give ICU's ordering,
which I'm not at all sure is better. Applying Egmont's patch doesn't
diverge from ICU further than b008d4c did, and fixes other things.
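
To make the BZ#18587 reordering above concrete, here is a minimal
Python sketch. The symbol names are the ones quoted above; the numeric
ranks, the two tables and the helpers `case_symbol` and `sort_by_case`
are hypothetical and only model the relative order of the four case
patterns, not the actual weights in locales/hu_HU.

```python
# Relative order of the case collating symbols, as described in this thread.
# The numeric ranks are illustrative assumptions, not values from locales/hu_HU.
BEFORE = {"MIN": 0, "MIN-CAP": 1, "CAP": 2, "CAP-MIN": 3}
AFTER = {"MIN-MIN": 0, "MIN-CAP": 1, "CAP-MIN": 2, "CAP-CAP": 3}


def case_symbol(digraph, patched):
    """Name the case symbol for a two-letter digraph such as 'cS'."""
    first = "CAP" if digraph[0].isupper() else "MIN"
    second = "CAP" if digraph[1].isupper() else "MIN"
    if first == second and not patched:
        return first  # old scheme: plain <MIN> / <CAP> for uniform case
    return first + "-" + second


def sort_by_case(digraphs, patched):
    """Sort digraphs that differ only in case by the chosen symbol table."""
    table = AFTER if patched else BEFORE
    return sorted(digraphs, key=lambda d: table[case_symbol(d, patched)])


forms = ["CS", "Cs", "cS", "cs"]
print(sort_by_case(forms, patched=False))  # ['cs', 'cS', 'CS', 'Cs']
print(sort_by_case(forms, patched=True))   # ['cs', 'cS', 'Cs', 'CS']
```

The only difference between the two runs is where the all-caps form
lands: third before the patch, last after it.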

I've noticed another difference with respect to ICU:

- When a word appears both with and without hyphen (pingpong and
ping-pong), they collate differently. This probably applies to all
glibc locales. ICU probably changes ordering when selecting a
different algorithm for variable weighting: Perl gives glibc ordering
(hyphenated word before non-hyphenated word) for "Shifted" and
"Non-Ignorable", the opposite ordering for "Shift-Trimmed" and
"Blanked".

So, to recap the other differences to ICU:

- ICU sorts long forms before short forms at L3. Perl collates as per
the input ordering. This can be changed in Egmont's patch by
reordering <COMPOUND> before <SINGLE_OR_COMPOUND>, but I'm not sure
that's better.
- ICU doesn't recognize some mixed case combinations as
digraphs/trigraphs, e.g. cS is treated as
<c><s>;<BAS><BAS>;<MIN><CAP>, not as <cs>;<BAS>;<MIN-CAP>. Perl and
glibc recognize them. Looking at some historical files in CLDR repo,
AIX and MS behaved as ICU, Sun JDK and IBM JDK behaved as glibc. I
haven't looked at the full CLDR repo. The following may be
interesting: http://unicode.org/cldr/trac/ticket/889 and
http://unicode.org/cldr/trac/changeset/1450/trunk/common/collation/hu.xml
, recognition of those digraphs is still marked as unconfirmed draft
in the latest version of the file.
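
The digraph-recognition difference in the last point can be sketched
as a toy tokenizer in Python. The compound letter list is the one from
the Hungarian alphabet (cs, dz, dzs, gy, ly, ny, sz, ty, zs); the
`tokenize` function and its `mixed_case` switch are hypothetical. In
particular, treating exactly the lowercase, titlecase and uppercase
spellings as "recognized" when `mixed_case` is off is an assumption
about the ICU-like behaviour, and geminated forms such as ccs are not
handled at all.

```python
# Compound letters, longest first so that "dzs" wins over "dz".
COMPOUND_LETTERS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def tokenize(word, mixed_case=True):
    """Split a word into collation units.

    mixed_case=True  : any capitalization of a compound letter is one unit
                       (the behaviour described above for glibc and Perl).
    mixed_case=False : only 'cs', 'Cs' and 'CS' style spellings are one unit
                       (an assumed model of the ICU-like behaviour).
    """
    tokens = []
    i = 0
    while i < len(word):
        for letter in COMPOUND_LETTERS:
            chunk = word[i:i + len(letter)]
            regular = chunk in (letter, letter.capitalize(), letter.upper())
            if chunk.lower() == letter and (mixed_case or regular):
                tokens.append(chunk)
                i += len(letter)
                break
        else:
            # No compound letter starts here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("cS"))                    # ['cS']
print(tokenize("cS", mixed_case=False))  # ['c', 'S']
print(tokenize("kasza"))                 # ['k', 'a', 'sz', 'a']
```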

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-05 12:16                     ` Luis Javier Merino
@ 2017-02-05 16:30                       ` Egmont Koblinger
  2017-02-09 22:20                         ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-05 16:30 UTC (permalink / raw)
  To: Luis Javier Merino; +Cc: Carlos O'Donell, libc-locales

Hi Luis,

Thanks again for your valuable input! I hope it'll help us move forward.

> - When a word appears both with and without hyphen (pingpong and
> ping-pong), they collate differently.

This is another case where I haven't touched anything. I was happy
that spaces, hyphens were "accidentally" treated the way the Hungarian
rules specify (and the unittests verify to some extent). The rules say
that spaces and hyphens should be ignored -- but do not specify what
should happen if they are the only difference. Glibc's ordering seems
to be "pingpong" < "ping pong" < "ping-pong" which I personally don't
like, I'd prefer "pingpong" being at the end. Anyway, if we're about
to change this at all, it should be a subsequent separate change.
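
A minimal Python sketch of "ignore spaces and hyphens unless they are
the only difference". The function `sort_key` and the ranks in
`SEPARATOR_RANK` are hypothetical: they are picked only so that the
output matches the order reported above for glibc, and say nothing
about how glibc actually computes it.

```python
# Hypothetical ranks for the ignored separators; chosen only so that the
# result matches the glibc order reported in this thread.
SEPARATOR_RANK = {" ": 1, "-": 2}


def sort_key(text):
    """Compare letters first; separators only break ties."""
    letters = "".join(ch for ch in text if ch not in SEPARATOR_RANK)
    separators = [SEPARATOR_RANK[ch] for ch in text if ch in SEPARATOR_RANK]
    return (letters, separators)


words = ["ping-pong", "pingpong", "ping pong", "pingvin", "ping"]
print(sorted(words, key=sort_key))
# ['ping', 'pingpong', 'ping pong', 'ping-pong', 'pingvin']
```

Moving "pingpong" to the end of its group would only need the
tie-breaker to rank "no separator" last, which is why this can be a
separate, later change.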

The standard is not only unspecified in certain cases, it also says in
bullet point 14e that in some cases different rules than the ones
specified might be used, e.g. sort based on the first unit. Similarly,
point 16 mentions that in some cases it's desired to use a generic
Latin alphabet that doesn't know anything about Hungarian compound
letters and such.

Back to 14e, one typical example is phone books. Note that in
Hungarian the names are in "reverse" order, family name followed by
given name. According to 14d, the ordering should be "Kiss Tamás" <
"Kis Tamás". This is counterintuitive and prevents grouping (family
name written out only once for multiple entries). Phone books order
the family names, and within the same family name they order the given
names.
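
The two orderings can be contrasted with a short Python sketch. Both
key functions are hypothetical, and they compare plain case-folded
code points rather than real Hungarian collation weights, which is
enough for this one pair of names.

```python
def key_14d(name):
    """Whole-entry comparison with spaces ignored (the 14d behaviour)."""
    return name.replace(" ", "").casefold()


def key_phone_book(name):
    """Family name first, given name only as a tie-breaker."""
    family, _, given = name.partition(" ")
    return (family.casefold(), given.casefold())


names = ["Kis Tamás", "Kiss Tamás"]
print(sorted(names, key=key_14d))         # ['Kiss Tamás', 'Kis Tamás']
print(sorted(names, key=key_phone_book))  # ['Kis Tamás', 'Kiss Tamás']
```

With spaces ignored, the comparison reaches "s" versus "T" at the
fourth letter, so "Kiss Tamás" comes first; with the family name as a
separate unit, "Kis" is a prefix of "Kiss" and comes first.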

I think it's beyond glibc's scope to address different possible
variations of collations. I, for one, have no desire whatsoever trying
to come up with various hu_HU@whatever collation definitions.

cheers,
egmont

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-05 16:30                       ` Egmont Koblinger
@ 2017-02-09 22:20                         ` Egmont Koblinger
  2017-02-10 15:06                           ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-09 22:20 UTC (permalink / raw)
  To: Luis Javier Merino; +Cc: Carlos O'Donell, libc-locales

Carlos, any news?

Did Luis's and my comment help you move forward?

I'd like to emphasize again that my patch does not do anything
serious. No big redesign, no fundamental change, nothing like this.
The things Luis mentioned were either already implemented that way, or
I did not touch them. It's just a few, technically small bugfixes that
I made. Really no big deal. Plus unittests.

I have, a long time ago, offered that I can turn this all-in-one patch
into like 4-5 patches to be applied on top of each other. But then
they'd have to be reviewed and applied in a particular order (because
they'd heavily conflict) at once. I know that generally this is the
preferred approach, however, it cannot work together with test driven
development since there's no way to test the intermediate (i.e.
deliberately still broken) states. Having chosen TDD, the result of my
work was a patch that fixes all the referred bugs in a single step. I
can, I still offer to spend some more time on it to create a few
smaller, easier to review patches *if* seriously that is what's
missing from getting my work accepted. Let me know.

cheers,
egmont




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-09 22:20                         ` Egmont Koblinger
@ 2017-02-10 15:06                           ` Carlos O'Donell
  2017-02-15 18:03                             ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-02-10 15:06 UTC (permalink / raw)
  To: Egmont Koblinger, Luis Javier Merino; +Cc: libc-locales

On 02/09/2017 05:19 PM, Egmont Koblinger wrote:
> Carlos, any news?
> 
> Did Luis's and my comment help you move forward?
> 
> I'd like to emphasize again that my patch does not do anything
> serious. No big redesign, no fundamental change, nothing like this.
> The things Luis mentioned were either already implemented that way, or
> I did not touch them. It's just a few, technically small bugfixes that
> I made. Really no big deal. Plus unittests.
> 
> I have, a long time ago, offered that I can turn this all-in-one patch
> into like 4-5 patches to be applied on top of each other. But then
> they'd have to be reviewed and applied in a particular order (because
> they'd heavily conflict) at once. I know that generally this is the
> preferred approach, however, it cannot work together with test driven
> development since there's no way to test the intermediate (i.e.
> deliberately still broken) states. Having chosen TDD, the result of my
> work was a patch that fixes all the referred bugs in a single step. I
> can, I still offer to spend some more time on it to create a few
> smaller, easier to review patches *if* seriously that is what's
> missing from getting my work accepted. Let me know.

I think your patch is ready to go, but we need a senior person with
commit privileges to review and check it in.

Please ping me again next week on Wednesday and I'll arrange to
test it and check it in on Thursday.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-10 15:06                           ` Carlos O'Donell
@ 2017-02-15 18:03                             ` Egmont Koblinger
  2017-02-16  2:36                               ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-15 18:03 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

Hi Carlos,

Here's the Wednesday ping you requested :)

thanks,
egmont


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-15 18:03                             ` Egmont Koblinger
@ 2017-02-16  2:36                               ` Carlos O'Donell
  2017-02-21 14:55                                 ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-02-16  2:36 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Luis Javier Merino, libc-locales

On 02/15/2017 01:02 PM, Egmont Koblinger wrote:
> Hi Carlos,
> 
> Here's the Wednesday ping you requested :)

Thanks, I'll schedule this in for my Thursday reviews.

Cheers,
Carlos.
 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-16  2:36                               ` Carlos O'Donell
@ 2017-02-21 14:55                                 ` Egmont Koblinger
  2017-02-22 17:36                                   ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-02-21 14:55 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

Hi Carlos,

Any news on this one?

In the meantime I'd like to confirm that the unittests still pass as
of the brand new Unicode 9.0 commit.

cheers,
egmont


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-21 14:55                                 ` Egmont Koblinger
@ 2017-02-22 17:36                                   ` Carlos O'Donell
  2017-03-15 20:37                                     ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-02-22 17:36 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Luis Javier Merino, libc-locales

On 02/21/2017 09:54 AM, Egmont Koblinger wrote:
> Hi Carlos,
> 
> Any news on this one?
> 
> In the meantime I'd like to confirm that the unittests still pass as
> of the brand new Unicode 9.0 commit.

I've started a regression test with your v5 patch against master.

Thanks for your patience.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-02-22 17:36                                   ` Carlos O'Donell
@ 2017-03-15 20:37                                     ` Egmont Koblinger
  2017-03-16 17:41                                       ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-03-15 20:37 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

Hi guys,

Any progress on this one?

What is blocking or delaying this issue? Is there anything I can do
to help move it forward quicker?

thanks,
egmont


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-15 20:37                                     ` Egmont Koblinger
@ 2017-03-16 17:41                                       ` Carlos O'Donell
  2017-03-21 22:40                                         ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-03-16 17:41 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Luis Javier Merino, libc-locales

On 03/15/2017 04:36 PM, Egmont Koblinger wrote:
> Hi guys,
> 
> Any progress on this one?
> 
> What is blocking or delaying this issue? Is there anything I can do
> to help move it forward quicker?

At a high level the patch looks perfect.

I have only minor details that we should fix before commit. Almost
there.

Please review the changes below.

In hu_HU.in:

(a) tests you created.

+alphabet a              ; These tests were created by egmont@gmail.com.

Exactly which tests were created by you?

Normally we don't include attribution in source files like this, instead
I'll add an item to NEWS about the additional coverage for Hungarian
and credit you there with the various tests you created.

In summary:
- Remove this comment.
- Add a bigger NEWS entry to the patch and I'll help review that.
  You'll also get attribution in the git commit author.
  Is that sufficient for you?

(b) foreign accents.

+foreign-o1 ó            ; The rules are not explicit whether foreign accents on top of o or u
+foreign-o1 ò            ; should be sorted among o-ó and u-ú, or among ö-ő and ü-ű,
+foreign-o1 òp           ; but the example with Møsstrand makes it clear that it's the former.

I assume the Møsstrand example refers to:

+AkH-15 cérna            ; #15: Foreign accents are ignored, unless they're the only difference,
+AkH-15 Černý            ; in which case they are sorted after the Hungarian ones (in unspecified order).
...
+AkH-15 Goethe
+AkH-15 moshat
+AkH-15 mosna
+AkH-15 Mošna
+AkH-15 mosópor
+AkH-15 Møsstrand
        ^^^^^^^^^
+AkH-15 mostan
+AkH-15 munka
+AkH-15 Muñoz
...

If so then please clarify in the comment that you are referring
to test AkH-15.

Did you also design the foreign accent tests?

+foreign-a1 á            ; More thorough tests for foreign accents (#15).
+foreign-a1 à
+foreign-a1 àp
+foreign-a1 áq
+foreign-a2 á
...

If you did create them then you should mention the new tests in the NEWS entry also.

In summary:
- Remove attribution at the file level.
- Add clarification to Møsstrand comment for future readers.
- Add a NEWS entry with verbose description of the update and new tests created by you.

Please post v2 and I'll commit that right away.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-16 17:41                                       ` Carlos O'Donell
@ 2017-03-21 22:40                                         ` Egmont Koblinger
  2017-03-22  1:03                                           ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-03-21 22:40 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]

Hi Carlos,

I'm not sure I properly understood or addressed all your concerns, so
please double check :)

> +alphabet a              ; These tests were created by egmont@gmail.com.
>
> Exactly which tests were created by you?
>
> Normally we don't include attribution in source files like this

It just would have felt weird to me not to say anything about where
these tests come from, especially after the "AkH" section that points
to the source. I've reworded as "All the remaining tests were added by
glibc", is this okay?

> , instead
> I'll add an item to NEWS about the additional coverage for Hungarian
> and credit you there with the various tests you created.

I'm not sure if we're on the same page with the NEWS file. It seems to
highlight the bigger changes, and have an automatically generated
1-liner for smaller ones. I don't think the Hungarian sorting order
deserves to be among the bigger ones, I'm perfectly fine with the
1-liner.

Instead, I see the git log entries contain a pretty detailed
description of the changes, so I tried to create a quite verbose one.

> In summary:
> - Remove this comment.
> - Add a bigger NEWS entry to the patch and I'll help review that.
>   You'll also get attribution in the git commit author.
>   Is that sufficient for you?

Absolutely! By the way, I'm also mentioned in the locale definition
file, I've added the current date there as well. We might squeeze that
a bit, e.g. keep the years only, what do you think? (I don't care
about crediting too much, I just think it's a big enough change within
this file not to forget mentioning its date).

> I assume the Møsstrand example refers to: [...]
> If so then please clarify in the comment that you are referring
> to test AkH-15.

Yup, clarified.

> Did you also design the foreign accent tests?

Yes.

> If you did create them then you should mention the new tests in the NEWS entry also.

As said, I went for the git changelog entry rather than NEWS. Let me
know if you think that I made it generic/specific enough.

Note that I've also changed a tiny bit of wording in the unittest's
comments, most notably added the word "arbitrarily" twice at locations
where Luis was kind enough to point out that I indeed picked one
possible behavior arbitrarily where it's unspecified.

Please let me know if there's anything you'd still like to improve.
Enhancements in phrasing English sentences in comments, git log etc.
are also truly welcome, I'm sure you've already noticed that I'm not
a native English speaker. (If you feel like just editing the last few
tiny bits yourself, I'm happy if you do that.)


thanks a lot,
egmont

[-- Attachment #2: glibc-18934-hu-collate-v6.patch --]
[-- Type: text/x-patch, Size: 36741 bytes --]

commit 61adba52927edfc966115177e5dee7559bb6ed87
Author: Egmont Koblinger <egmont@gmail.com>
Date:   Tue Mar 21 22:29:12 2017 +0100

    localedata: hu_HU: fix multiple sorting bugs [BZ #18934]
    
    Fix the incorrect sorting order of a digraph and its geminated variant,
    regression introduced by a faulty fix to bug 13547 in commit
    b008d4c85619a753e441d7f473ba8af0db400bd6.
    
    Fix two inconsistencies in sorting unusual capitalization of digraphs
    (bug #18587).
    
    Enable DIACRIT_FORWARD to work around bug #17750.
    
    Sort foreign accents after the Hungarian ones.
    
    Add extensive unittests containing all the examples from The Rules of
    Hungarian Orthography and many more, including explanatory comments.

diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 3b257a2..68987f1 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,10 @@
+2017-03-21  Egmont Koblinger  <egmont@gmail.com>
+
+	[BZ #18934]
+	* locales/hu_HU: Fix multiple collate bugs.
+	* hu_HU.in: New file.
+	* Makefile (test-input): Add hu_HU.UTF-8.
+
 2017-02-20  Mike FABIAN  <mfabian@redhat.com>
 
 	[BZ #20313]
diff --git a/localedata/Makefile b/localedata/Makefile
index f6a70a3..47ca39d 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
 	     tst-ctype tst-langinfo tst-langinfo-static tst-numeric
 test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
 	      hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
-	      si_LK.UTF-8 uk_UA.UTF-8
+	      si_LK.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 test-input-data = $(addsuffix .in, $(basename $(test-input)))
 test-output := $(foreach s, .out .xout, \
 			 $(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
 	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
 	   nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
 	   zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
+	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
 include ../gen-locales.mk
 endif
 
diff --git a/localedata/hu_HU.in b/localedata/hu_HU.in
new file mode 100644
index 0000000..7736ac0
--- /dev/null
+++ b/localedata/hu_HU.in
@@ -0,0 +1,560 @@
+AkH-14-a1 acél          ; The "AkH" tests are from:
+AkH-14-a1 cukor         ;
+AkH-14-a1 csók          ; A magyar helyesírás szabályai, 12. kiadás
+AkH-14-a1 gép           ; [The Rules of Hungarian Orthography, 12th edition]
+AkH-14-a1 hideg         ;
+AkH-14-a1 kettő         ; often referred to as akadémiai helyesírás (AkH.) [academic orthography]
+AkH-14-a1 Nagy          ;
+AkH-14-a1 nyúl          ; http://helyesiras.mta.hu/helyesiras/default/akh12
+AkH-14-a1 olasz         ;
+AkH-14-a1 öröm          ; Alphabetical ordering described in #14-16.
+AkH-14-a1 remény
+AkH-14-a1 sokáig        ; #14-a1: Sort based on first letter.
+AkH-14-a1 szabad
+AkH-14-a1 Tamás
+AkH-14-a1 vásárol
+AkH-14-a2 jácint        ; #14-a2: If no other difference, lowercase initial precedes uppercase.
+AkH-14-a2 Jácint
+AkH-14-a2 opera
+AkH-14-a2 Opera
+AkH-14-a2 szűcs
+AkH-14-a2 Szűcs
+AkH-14-a2 viola
+AkH-14-a2 Viola
+AkH-14-a3 cudar         ; #14-a3: Compound letters (cs, dz, dzs, gy, ly, ny, sz, ty, zs)
+AkH-14-a3 cukor         ; are sorted separately, after their first letter:
+AkH-14-a3 cuppant       ; a b c cs d dz dzs e f g gy h ... l ly m n ny o ... s sz t ty u ... z zs
+AkH-14-a3 csalit
+AkH-14-a3 csata
+AkH-14-a3 Csepel
+AkH-14-a3 Zoltán
+AkH-14-a3 zongora
+AkH-14-a3 zúdul
+AkH-14-a3 zsalu
+AkH-14-a3 zseni
+AkH-14-a3 Zsigmond
+AkH-14-b1 lom           ; #14-b1: The first difference matters.
+AkH-14-b1 lomb
+AkH-14-b1 lombik
+AkH-14-b1 Lontay
+AkH-14-b1 lovagol
+AkH-14-b1 pirinkó
+AkH-14-b1 pirinyó
+AkH-14-b1 pirít
+AkH-14-b1 pirkad
+AkH-14-b1 Piroska
+AkH-14-b1 tükör
+AkH-14-b1 Tünde
+AkH-14-b1 tünemény
+AkH-14-b1 tüntet
+AkH-14-b1 tüzér
+AkH-14-b2 kas           ; #14-b2: If a compound letter is pronounced long, only the first letter
+AkH-14-b2 Kasmír        ; is duplicated in writing: <cs><cs> becomes ccs, <dzs><dzs> is ddzs etc.
+AkH-14-b2 Kassák        ; (unless it's at the boundary of a compound word where it's written out twice).
+AkH-14-b2 kastély       ; Sort according to the actual tokens, not the shorthand written form.
+AkH-14-b2 kasza         ; <k><a><sz><a>
+AkH-14-b2 kaszinó       ; <k><a><sz><i><n><ó>
+AkH-14-b2 kassza        ; <k><a><sz><sz><a>
+AkH-14-b2 kaszt         ; <k><a><sz><t>
+AkH-14-b2 mennek
+AkH-14-b2 mennének
+AkH-14-b2 menü
+AkH-14-b2 menza
+AkH-14-b2 meny          ; <m><e><ny>
+AkH-14-b2 Menyhért      ; <M><e><ny><h><é><r><t>
+AkH-14-b2 mennybolt     ; <m><e><ny><ny><b><o><l><t>
+AkH-14-b2 mennyi        ; <m><e><ny><ny><i>
+AkH-14-b2 nagy          ; <n><a><gy>
+AkH-14-b2 naggyá        ; <n><a><gy><gy><á>
+AkH-14-b2 nagygyakorlat ; <n><a><gy><gy><a><k><o><r><l><a><t> (compound word: nagy+gyakorlat)
+AkH-14-b2 naggyal       ; <n><a><gy><gy><a><l>
+AkH-14-b2 nagyít        ; <n><a><gy><í><t>
+AkH-14-b2 nagyobb
+AkH-14-b2 nagyol
+AkH-14-b2 nagyoll
+AkH-14-c1 ír            ; #14-c1: Vowels collate equally in pairs: a-á, e-é, i-í, o-ó, ö-ő, u-ú, ü-ű.
+AkH-14-c1 Irak
+AkH-14-c1 iram
+AkH-14-c1 Irán
+AkH-14-c1 írandó
+AkH-14-c1 iránt
+AkH-14-c1 író
+AkH-14-c1 iroda
+AkH-14-c1 irónia
+AkH-14-c2 Eger          ; #14-c2: Short vowel (unaccented, or with diaeresis) comes first if that's the only difference.
+AkH-14-c2 egér
+AkH-14-c2 egyfelé
+AkH-14-c2 egyféle
+AkH-14-c2 elöl
+AkH-14-c2 elől
+AkH-14-c2 kerek
+AkH-14-c2 kerék
+AkH-14-c2 keres
+AkH-14-c2 kérés
+AkH-14-c2 koros
+AkH-14-c2 kóros
+AkH-14-c2 szel
+AkH-14-c2 szél
+AkH-14-c2 szeles
+AkH-14-c2 széles
+AkH-14-c2 szüret
+AkH-14-c2 szűret
+AkH-14-d1 kis részben   ; #14-d1: Spaces, hyphens are ignored.
+AkH-14-d1 kissé
+AkH-14-d1 Kiss Ernő
+AkH-14-d1 kis sorozat
+AkH-14-d1 kissorozat-gyártás
+AkH-14-d1 kis számban
+AkH-14-d1 kistányér
+AkH-14-d1 kis virág
+AkH-14-d1 márvány
+AkH-14-d1 márványkő
+AkH-14-d1 márvány sírkő
+AkH-14-d1 Márvány-tenger
+AkH-14-d1 márványtömb
+AkH-14-d1 Márvány Zsolt
+AkH-14-d1 másféle
+AkH-14-d1 másol
+AkH-14-d1 tiszafa
+AkH-14-d1 Tiszahát
+AkH-14-d1 Tisza Kálmán
+AkH-14-d1 Tisza menti
+AkH-14-d1 Tiszántúl
+AkH-14-d1 Tisza-part
+AkH-14-d1 tiszavirág
+AkH-14-d1 tiszt
+AkH-15 cérna            ; #15: Foreign accents are ignored, unless they're the only difference,
+AkH-15 Černý            ; in which case they are sorted after the Hungarian ones (in unspecified order).
+AkH-15 Champagne
+AkH-15 Cholnoky
+AkH-15 címez
+AkH-15 cukor
+AkH-15 Czuczor
+AkH-15 csapat
+AkH-15 Gaal
+AkH-15 galamb
+AkH-15 Gärtner
+AkH-15 gáz
+AkH-15 geodézia
+AkH-15 Georges
+AkH-15 góc
+AkH-15 Goethe
+AkH-15 moshat
+AkH-15 mosna
+AkH-15 Mošna
+AkH-15 mosópor
+AkH-15 Møsstrand
+AkH-15 mostan
+AkH-15 munka
+AkH-15 Muñoz
+alphabet a              ; All the remaining tests were added by glibc.
+alphabet á
+alphabet aa             ; a = á unless that's the only difference in which case a < á.
+alphabet aá             ; (Same for e = é, i = í, o = ó, ö = ő, u = ú, ü = ű below.)
+alphabet áa             ; Differences in accents matter from left to right.
+alphabet áá
+alphabet áp
+alphabet aq
+alphabet b
+alphabet c
+alphabet cz             ; <c><z>
+alphabet cs             ; <cs>        -- or rarely <c><s>, can't tell for sure, assume <cs>.
+alphabet csc            ; <cs><c>
+alphabet ccs            ; <cs><cs>    -- or rarely <c><cs>, can't tell for sure, assume <cs><cs>.
+alphabet cscs           ; <cs><cs>    -- Make sure ccs and cscs don't collate as equal, see bug 13547.
+alphabet ccsa           ; <cs><cs><a> -- The order of ccs and cscs is not specified in the rules and is arbitrarily chosen by glibc.
+alphabet cscsa          ; <cs><cs><a>
+alphabet csd            ; <cs><d>     -- (These comments also apply to all other compound letters below.)
+alphabet d
+alphabet dz             ; <dz>
+alphabet dzd            ; <dz><d>
+alphabet ddz            ; <dz><dz>
+alphabet dzdz           ; <dz><dz>
+alphabet ddza           ; <dz><dz><a>
+alphabet dzdza          ; <dz><dz><a>
+alphabet dzdzs          ; <dz><dzs>
+alphabet dze            ; <dz><e>
+alphabet dzz            ; <dz><z>
+alphabet dzs            ; <dzs>
+alphabet dzsdz          ; <dzs><dz>
+alphabet ddzs           ; <dzs><dzs>
+alphabet dzsdzs         ; <dzs><dzs>
+alphabet ddzsa          ; <dzs><dzs><a>
+alphabet dzsdzsa        ; <dzs><dzs><a>
+alphabet dzse           ; <dzs><e>
+alphabet e
+alphabet é
+alphabet ee
+alphabet eé
+alphabet ée
+alphabet éé
+alphabet ép
+alphabet eq
+alphabet f
+alphabet g
+alphabet gz             ; <g><z>
+alphabet gy             ; <gy>
+alphabet gyg            ; <gy><g>
+alphabet ggy            ; <gy><gy>
+alphabet gygy           ; <gy><gy>
+alphabet ggya           ; <gy><gy><a>
+alphabet gygya          ; <gy><gy><a>
+alphabet gyh            ; <gy><h>
+alphabet h
+alphabet i
+alphabet í
+alphabet ii
+alphabet ií
+alphabet íi
+alphabet íí
+alphabet íp
+alphabet iq
+alphabet j
+alphabet k
+alphabet l
+alphabet lz             ; <l><z>
+alphabet ly             ; <ly>
+alphabet lyl            ; <ly><l>
+alphabet lly            ; <ly><ly>
+alphabet lyly           ; <ly><ly>
+alphabet llya           ; <ly><ly><a>
+alphabet lylya          ; <ly><ly><a>
+alphabet lym            ; <ly><m>
+alphabet m
+alphabet n
+alphabet nz             ; <n><z>
+alphabet ny             ; <ny>
+alphabet nyn            ; <ny><n>
+alphabet nny            ; <ny><ny>
+alphabet nyny           ; <ny><ny>
+alphabet nnya           ; <ny><ny><a>
+alphabet nynya          ; <ny><ny><a>
+alphabet nyo            ; <ny><o>
+alphabet o
+alphabet ó
+alphabet oo
+alphabet oó
+alphabet óo
+alphabet óó
+alphabet óp
+alphabet oq
+alphabet ö              ; ö = ő (unless that's the only difference), but these come strictly after o and ó.
+alphabet ő
+alphabet öö
+alphabet öő
+alphabet őö
+alphabet őő
+alphabet őp
+alphabet öq
+alphabet p
+alphabet q
+alphabet r
+alphabet s
+alphabet sz             ; <sz>
+alphabet szs            ; <sz><s>
+alphabet ssz            ; <sz><sz>
+alphabet szsz           ; <sz><sz>
+alphabet ssza           ; <sz><sz><a>
+alphabet szsza          ; <sz><sz><a>
+alphabet szt            ; <sz><t>
+alphabet t
+alphabet tz             ; <t><z>
+alphabet ty             ; <ty>
+alphabet tyt            ; <ty><t>
+alphabet tty            ; <ty><ty>
+alphabet tyty           ; <ty><ty>
+alphabet ttya           ; <ty><ty><a>
+alphabet tytya          ; <ty><ty><a>
+alphabet tyu            ; <ty><u>
+alphabet u
+alphabet ú
+alphabet úp
+alphabet uq
+alphabet uu
+alphabet uú
+alphabet úu
+alphabet úú
+alphabet ü              ; ü = ű (unless that's the only difference), but these come strictly after u and ú.
+alphabet ű
+alphabet űp
+alphabet üq
+alphabet üü
+alphabet üű
+alphabet űü
+alphabet űű
+alphabet v
+alphabet w
+alphabet x
+alphabet y
+alphabet z
+alphabet zz             ; <z><z>
+alphabet zs             ; <zs>
+alphabet zsz            ; <zs><z>
+alphabet zzs            ; <zs><zs>
+alphabet zszs           ; <zs><zs>
+alphabet zzsa           ; <zs><zs><a>
+alphabet zszsa          ; <zs><zs><a>
+case a                  ; #14-a2 specifies that if the same word appears in lowercase as well as with
+case A                  ; uppercase initial, the lowercase one is to be sorted first.
+case á                  ; Arbitrarily extend this to all other weird combinations of upper- and lowercases in compound letters.
+case Á
+case cs                 ; <cs>
+case cS
+case Cs
+case CS
+case ccs                ; <cs><cs>
+case ccS
+case cCs
+case cCS
+case Ccs
+case CcS
+case CCs
+case CCS
+case dz                 ; <dz>
+case dZ
+case Dz
+case DZ
+case ddz                ; <dz><dz>
+case ddZ
+case dDz
+case dDZ
+case Ddz
+case DdZ
+case DDz
+case DDZ
+case dzs                ; <dzs>
+case dzS
+case dZs
+case dZS
+case Dzs
+case DzS
+case DZs
+case DZS
+case ddzs               ; <dzs><dzs>
+case ddzS
+case ddZs
+case ddZS
+case dDzs
+case dDzS
+case dDZs
+case dDZS
+case Ddzs
+case DdzS
+case DdZs
+case DdZS
+case DDzs
+case DDzS
+case DDZs
+case DDZS
+case e
+case E
+case é
+case É
+case gy                 ; <gy>
+case gY
+case Gy
+case GY
+case ggy                ; <gy><gy>
+case ggY
+case gGy
+case gGY
+case Ggy
+case GgY
+case GGy
+case GGY
+case i
+case I
+case í
+case Í
+case ly                 ; <ly>
+case lY
+case Ly
+case LY
+case lly                ; <ly><ly>
+case llY
+case lLy
+case lLY
+case Lly
+case LlY
+case LLy
+case LLY
+case ny                 ; <ny>
+case nY
+case Ny
+case NY
+case nny                ; <ny><ny>
+case nnY
+case nNy
+case nNY
+case Nny
+case NnY
+case NNy
+case NNY
+case o
+case O
+case ó
+case Ó
+case ö
+case Ö
+case ő
+case Ő
+case sz                 ; <sz>
+case sZ
+case Sz
+case SZ
+case ssz                ; <sz><sz>
+case ssZ
+case sSz
+case sSZ
+case Ssz
+case SsZ
+case SSz
+case SSZ
+case ty                 ; <ty>
+case tY
+case Ty
+case TY
+case tty                ; <ty><ty>
+case ttY
+case tTy
+case tTY
+case Tty
+case TtY
+case TTy
+case TTY
+case u
+case U
+case ú
+case Ú
+case ü
+case Ü
+case ű
+case Ű
+case zs                 ; <zs>
+case zS
+case Zs
+case ZS
+case zzs                ; <zs><zs>
+case zzS
+case zZs
+case zZS
+case Zzs
+case ZzS
+case ZZs
+case ZZS
+foreign-a1 á            ; More thorough tests for foreign accents (#15).
+foreign-a1 à            ; Each test consists of 4 lines. The foreign accent is in the middle two.
+foreign-a1 àp           ; That is, on their own they come after the Hungarian accent, but a
+foreign-a1 áq           ; subsequent difference (p and q) overrides this.
+foreign-a2 á
+foreign-a2 â
+foreign-a2 âp
+foreign-a2 áq
+foreign-a3 á
+foreign-a3 ã
+foreign-a3 ãp
+foreign-a3 áq
+foreign-a4 á
+foreign-a4 ä
+foreign-a4 äp
+foreign-a4 áq
+foreign-a5 á
+foreign-a5 å
+foreign-a5 åp
+foreign-a5 áq
+foreign-a6 á
+foreign-a6 ă
+foreign-a6 ăp
+foreign-a6 áq
+foreign-c1 c
+foreign-c1 ç
+foreign-c1 çp
+foreign-c1 cq
+foreign-d1 d
+foreign-d1 đ
+foreign-d1 đp
+foreign-d1 dq
+foreign-e1 é
+foreign-e1 è
+foreign-e1 èp
+foreign-e1 éq
+foreign-e2 é
+foreign-e2 ê
+foreign-e2 êp
+foreign-e2 éq
+foreign-e3 é
+foreign-e3 ë
+foreign-e3 ëp
+foreign-e3 éq
+foreign-e4 é
+foreign-e4 ě
+foreign-e4 ěp
+foreign-e4 éq
+foreign-i1 í
+foreign-i1 ì
+foreign-i1 ìp
+foreign-i1 íq
+foreign-i2 í
+foreign-i2 î
+foreign-i2 îp
+foreign-i2 íq
+foreign-i3 í
+foreign-i3 ï
+foreign-i3 ïp
+foreign-i3 íq
+foreign-l1 l
+foreign-l1 ł
+foreign-l1 łp
+foreign-l1 lq
+foreign-n1 n
+foreign-n1 ñ
+foreign-n1 ñp
+foreign-n1 nq
+foreign-n2 n
+foreign-n2 ň
+foreign-n2 ňp
+foreign-n2 nq
+foreign-o1 ó            ; The rules are not explicit whether foreign accents on top of o or u
+foreign-o1 ò            ; should be sorted among o-ó and u-ú, or among ö-ő and ü-ű, but the
+foreign-o1 òp           ; AkH #15 example with Møsstrand implicitly shows that it's the former.
+foreign-o1 óq
+foreign-o2 ó
+foreign-o2 ô
+foreign-o2 ôp
+foreign-o2 óq
+foreign-o3 ó
+foreign-o3 õ
+foreign-o3 õp
+foreign-o3 óq
+foreign-o4 ó
+foreign-o4 ø
+foreign-o4 øp
+foreign-o4 óq
+foreign-r1 r
+foreign-r1 ř
+foreign-r1 řp
+foreign-r1 rq
+foreign-s1 s
+foreign-s1 š
+foreign-s1 šp
+foreign-s1 sq
+foreign-u1 ú
+foreign-u1 ù
+foreign-u1 ùp
+foreign-u1 úq
+foreign-u2 ú
+foreign-u2 û
+foreign-u2 ûp
+foreign-u2 úq
+foreign-u3 ú
+foreign-u3 ũ
+foreign-u3 ũp
+foreign-u3 úq
+foreign-u4 ú
+foreign-u4 ů
+foreign-u4 ůp
+foreign-u4 úq
+foreign-y1 y
+foreign-y1 ÿ
+foreign-y1 ÿp
+foreign-y1 yq
diff --git a/localedata/locales/hu_HU b/localedata/locales/hu_HU
index 898d293..8d508e3 100644
--- a/localedata/locales/hu_HU
+++ b/localedata/locales/hu_HU
@@ -22,7 +22,7 @@ escape_char /
 % - made all days abbreviations same lenght by appending spaces
 % Email: srtxg@chanae.alphanet.ch
 %
-% Further changes by Egmont Koblinger, 2002/Jan/06, 2012/Jan/03, 2015/Sep/03
+% Further changes by Egmont Koblinger, 2002/Jan/06, 2012/Jan/03, 2015/Sep/03, 2017/Mar/21
 % - fixed tons of remaining bugs in alphabetical order
 % - turned month names and similar stuff to lowercase
 % - other small bugfixes
@@ -64,6 +64,7 @@ category "i18n:2012";LC_MEASUREMENT
 END LC_IDENTIFICATION
 
 LC_COLLATE
+define DIACRIT_FORWARD
 copy "iso14651_t1"
 
 %% a b c cs d dz dzs e f g gy h i j k l ly m n ny o o: p q
@@ -77,15 +78,18 @@ copy "iso14651_t1"
 %% dzs+dzs becomes ddzs, and so on.
 %% However, c+cs is also spelled as ccs, you need to speak
 %% the language to tell which one is the case.
-%% Tokenize ccs as <c_or_cs><cs>, and sort the tokens as
-%% a b c c_or_cs cs d... This effectively assumes cs+cs
-%% which is more frequent than c+cs, but guarantees that the
-%% strings ccs and cscs don't collate as equal.
+%% Tokenize ccs as <cs><cs> since this is much more frequent
+%% than <c><cs>, but apply SINGLE-OR-COMPOUND and COMPOUND
+%% to the tokens so that the strings ccs and cscs don't collate
+%% as equal.
+%% The same goes for all other compound consonants.
 
 collating-symbol  <odouble>
 collating-symbol  <udouble>
 
-collating-symbol  <c_or_cs>
+collating-symbol  <SINGLE-OR-COMPOUND>
+collating-symbol  <COMPOUND>
+
 collating-symbol  <cs>
 collating-element <C-S> from "<U0043><U0053>"
 collating-element <C-s> from "<U0043><U0073>"
@@ -100,7 +104,6 @@ collating-element <c-C-s> from "<U0063><U0043><U0073>"
 collating-element <c-c-S> from "<U0063><U0063><U0053>"
 collating-element <c-c-s> from "<U0063><U0063><U0073>"
 
-collating-symbol  <d_or_dz>
 collating-symbol  <dz>
 collating-element <D-Z> from "<U0044><U005A>"
 collating-element <D-z> from "<U0044><U007A>"
@@ -115,7 +118,6 @@ collating-element <d-D-z> from "<U0064><U0044><U007A>"
 collating-element <d-d-Z> from "<U0064><U0064><U005A>"
 collating-element <d-d-z> from "<U0064><U0064><U007A>"
 
-collating-symbol  <d_or_dzs>
 collating-symbol  <dzs>
 collating-element <D-Z-S> from "<U0044><U005A><U0053>"
 collating-element <D-Z-s> from "<U0044><U005A><U0073>"
@@ -142,7 +144,6 @@ collating-element <d-d-Z-s> from "<U0064><U0064><U005A><U0073>"
 collating-element <d-d-z-S> from "<U0064><U0064><U007A><U0053>"
 collating-element <d-d-z-s> from "<U0064><U0064><U007A><U0073>"
 
-collating-symbol  <g_or_gy>
 collating-symbol  <gy>
 collating-element <G-Y> from "<U0047><U0059>"
 collating-element <G-y> from "<U0047><U0079>"
@@ -157,7 +158,6 @@ collating-element <g-G-y> from "<U0067><U0047><U0079>"
 collating-element <g-g-Y> from "<U0067><U0067><U0059>"
 collating-element <g-g-y> from "<U0067><U0067><U0079>"
 
-collating-symbol  <l_or_ly>
 collating-symbol  <ly>
 collating-element <L-Y> from "<U004C><U0059>"
 collating-element <L-y> from "<U004C><U0079>"
@@ -172,7 +172,6 @@ collating-element <l-L-y> from "<U006C><U004C><U0079>"
 collating-element <l-l-Y> from "<U006C><U006C><U0059>"
 collating-element <l-l-y> from "<U006C><U006C><U0079>"
 
-collating-symbol  <n_or_ny>
 collating-symbol  <ny>
 collating-element <N-Y> from "<U004E><U0059>"
 collating-element <N-y> from "<U004E><U0079>"
@@ -187,7 +186,6 @@ collating-element <n-N-y> from "<U006E><U004E><U0079>"
 collating-element <n-n-Y> from "<U006E><U006E><U0059>"
 collating-element <n-n-y> from "<U006E><U006E><U0079>"
 
-collating-symbol  <s_or_sz>
 collating-symbol  <sz>
 collating-element <S-Z> from "<U0053><U005A>"
 collating-element <S-z> from "<U0053><U007A>"
@@ -202,7 +200,6 @@ collating-element <s-S-z> from "<U0073><U0053><U007A>"
 collating-element <s-s-Z> from "<U0073><U0073><U005A>"
 collating-element <s-s-z> from "<U0073><U0073><U007A>"
 
-collating-symbol  <t_or_ty>
 collating-symbol  <ty>
 collating-element <T-Y> from "<U0054><U0059>"
 collating-element <T-y> from "<U0054><U0079>"
@@ -217,7 +214,6 @@ collating-element <t-T-y> from "<U0074><U0054><U0079>"
 collating-element <t-t-Y> from "<U0074><U0074><U0059>"
 collating-element <t-t-y> from "<U0074><U0074><U0079>"
 
-collating-symbol  <z_or_zs>
 collating-symbol  <zs>
 collating-element <Z-S> from "<U005A><U0053>"
 collating-element <Z-s> from "<U005A><U0073>"
@@ -232,8 +228,10 @@ collating-element <z-Z-s> from "<U007A><U005A><U0073>"
 collating-element <z-z-S> from "<U007A><U007A><U0053>"
 collating-element <z-z-s> from "<U007A><U007A><U0073>"
 
+collating-symbol <CAP-CAP>
 collating-symbol <CAP-MIN>
 collating-symbol <MIN-CAP>
+collating-symbol <MIN-MIN>
 collating-symbol <CAP-CAP-CAP>
 collating-symbol <CAP-CAP-MIN>
 collating-symbol <CAP-MIN-CAP>
@@ -244,6 +242,7 @@ collating-symbol <MIN-MIN-CAP>
 collating-symbol <MIN-MIN-MIN>
 
 reorder-after <MIN>
+<MIN-MIN>
 <MIN-CAP>
 <MIN-MIN-MIN>
 <MIN-MIN-CAP>
@@ -252,42 +251,38 @@ reorder-after <MIN>
 
 reorder-after <CAP>
 <CAP-MIN>
+<CAP-CAP>
 <CAP-MIN-MIN>
 <CAP-MIN-CAP>
 <CAP-CAP-MIN>
 <CAP-CAP-CAP>
 
 reorder-after <c>
-<c_or_cs>
 <cs>
 reorder-after <d>
-<d_or_dz>
-<d_or_dzs>
 <dz>
 <dzs>
 reorder-after <g>
-<g_or_gy>
 <gy>
 reorder-after <l>
-<l_or_ly>
 <ly>
 reorder-after <n>
-<n_or_ny>
 <ny>
 reorder-after <o>
 <odouble>
 reorder-after <s>
-<s_or_sz>
 <sz>
 reorder-after <t>
-<t_or_ty>
 <ty>
 reorder-after <u>
 <udouble>
 reorder-after <z>
-<z_or_zs>
 <zs>
 
+reorder-after <BAS>
+<SINGLE-OR-COMPOUND>
+<COMPOUND>
+
 reorder-after <o>
 <U00F6>	<odouble>;<REU>;<MIN>;IGNORE
 <U0151>	<odouble>;<DAC>;<MIN>;IGNORE
@@ -300,152 +295,157 @@ reorder-after <u>
 <U00DC>	<udouble>;<REU>;<CAP>;IGNORE
 <U0170>	<udouble>;<DAC>;<CAP>;IGNORE
 
+reorder-after <BAS>
+<ACA>
+<REU>
+<DAC>
+
 reorder-after <U0043>
-<C-S>		<cs>;<BAS>;<CAP>;IGNORE
-<C-s>		<cs>;<BAS>;<CAP-MIN>;IGNORE
-<C-C-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<C-C-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<C-c-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<C-c-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<C-S>		<cs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<C-s>		<cs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<C-C-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<C-C-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<C-c-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<C-c-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0063>
-<c-S>		<cs>;<BAS>;<MIN-CAP>;IGNORE
-<c-s>		<cs>;<BAS>;<MIN>;IGNORE
-<c-C-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<c-C-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<c-c-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<c-c-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<c-S>		<cs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<c-s>		<cs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<c-C-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<c-C-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<c-c-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<c-c-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0044>
-<D-Z>		<dz>;<BAS>;<CAP>;IGNORE
-<D-z>		<dz>;<BAS>;<CAP-MIN>;IGNORE
-<D-D-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<D-D-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<D-d-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<D-d-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<D-Z>		<dz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<D-z>		<dz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<D-D-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<D-D-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<D-d-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<D-d-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z>		<dz>;<BAS>;<MIN-CAP>;IGNORE
-<d-z>		<dz>;<BAS>;<MIN>;IGNORE
-<d-D-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<d-D-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<d-d-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<d-d-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<d-Z>		<dz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<d-z>		<dz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<d-D-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<d-D-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<d-d-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<d-d-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0044>
-<D-Z-S>		<dzs>;<BAS>;<CAP-CAP-CAP>;IGNORE
-<D-Z-s>		<dzs>;<BAS>;<CAP-CAP-MIN>;IGNORE
-<D-z-S>		<dzs>;<BAS>;<CAP-MIN-CAP>;IGNORE
-<D-z-s>		<dzs>;<BAS>;<CAP-MIN-MIN>;IGNORE
-<D-D-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-D-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-D-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-D-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
-<D-d-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
-<D-d-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
-<D-d-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
-<D-d-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-Z-S>		<dzs>;<COMPOUND>;<CAP-CAP-CAP>;IGNORE
+<D-Z-s>		<dzs>;<COMPOUND>;<CAP-CAP-MIN>;IGNORE
+<D-z-S>		<dzs>;<COMPOUND>;<CAP-MIN-CAP>;IGNORE
+<D-z-s>		<dzs>;<COMPOUND>;<CAP-MIN-MIN>;IGNORE
+<D-D-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-CAP>";IGNORE
+<D-D-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-MIN>";IGNORE
+<D-D-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-CAP>";IGNORE
+<D-D-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-MIN>";IGNORE
+<D-d-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-CAP>";IGNORE
+<D-d-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-MIN>";IGNORE
+<D-d-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-CAP>";IGNORE
+<D-d-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-MIN>";IGNORE
 reorder-after <U0064>
-<d-Z-S>		<dzs>;<BAS>;<MIN-CAP-CAP>;IGNORE
-<d-Z-s>		<dzs>;<BAS>;<MIN-CAP-MIN>;IGNORE
-<d-z-S>		<dzs>;<BAS>;<MIN-MIN-CAP>;IGNORE
-<d-z-s>		<dzs>;<BAS>;<MIN-MIN-MIN>;IGNORE
-<d-D-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-D-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-D-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-D-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
-<d-d-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
-<d-d-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
-<d-d-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
-<d-d-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-Z-S>		<dzs>;<COMPOUND>;<MIN-CAP-CAP>;IGNORE
+<d-Z-s>		<dzs>;<COMPOUND>;<MIN-CAP-MIN>;IGNORE
+<d-z-S>		<dzs>;<COMPOUND>;<MIN-MIN-CAP>;IGNORE
+<d-z-s>		<dzs>;<COMPOUND>;<MIN-MIN-MIN>;IGNORE
+<d-D-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-CAP>";IGNORE
+<d-D-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-MIN>";IGNORE
+<d-D-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-CAP>";IGNORE
+<d-D-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-MIN>";IGNORE
+<d-d-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-CAP>";IGNORE
+<d-d-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-MIN>";IGNORE
+<d-d-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-CAP>";IGNORE
+<d-d-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-MIN>";IGNORE
 
 reorder-after <U0047>
-<G-Y>		<gy>;<BAS>;<CAP>;IGNORE
-<G-y>		<gy>;<BAS>;<CAP-MIN>;IGNORE
-<G-G-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<G-G-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<G-g-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<G-g-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<G-Y>		<gy>;<COMPOUND>;<CAP-CAP>;IGNORE
+<G-y>		<gy>;<COMPOUND>;<CAP-MIN>;IGNORE
+<G-G-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<G-G-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<G-g-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<G-g-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0067>
-<g-y>		<gy>;<BAS>;<MIN>;IGNORE
-<g-Y>		<gy>;<BAS>;<MIN-CAP>;IGNORE
-<g-G-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<g-G-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<g-g-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<g-g-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<g-Y>		<gy>;<COMPOUND>;<MIN-CAP>;IGNORE
+<g-y>		<gy>;<COMPOUND>;<MIN-MIN>;IGNORE
+<g-G-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<g-G-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<g-g-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<g-g-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U004C>
-<L-Y>		<ly>;<BAS>;<CAP>;IGNORE
-<L-y>		<ly>;<BAS>;<CAP-MIN>;IGNORE
-<L-L-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<L-L-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<L-l-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<L-l-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<L-Y>		<ly>;<COMPOUND>;<CAP-CAP>;IGNORE
+<L-y>		<ly>;<COMPOUND>;<CAP-MIN>;IGNORE
+<L-L-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<L-L-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<L-l-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<L-l-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006C>
-<l-y>		<ly>;<BAS>;<MIN>;IGNORE
-<l-Y>		<ly>;<BAS>;<MIN-CAP>;IGNORE
-<l-L-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<l-L-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<l-l-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<l-l-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<l-Y>		<ly>;<COMPOUND>;<MIN-CAP>;IGNORE
+<l-y>		<ly>;<COMPOUND>;<MIN-MIN>;IGNORE
+<l-L-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<l-L-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<l-l-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<l-l-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U004E>
-<N-Y>		<ny>;<BAS>;<CAP>;IGNORE
-<N-y>		<ny>;<BAS>;<CAP-MIN>;IGNORE
-<N-N-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<N-N-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<N-n-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<N-n-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<N-Y>		<ny>;<COMPOUND>;<CAP-CAP>;IGNORE
+<N-y>		<ny>;<COMPOUND>;<CAP-MIN>;IGNORE
+<N-N-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<N-N-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<N-n-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<N-n-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U006E>
-<n-y>		<ny>;<BAS>;<MIN>;IGNORE
-<n-Y>		<ny>;<BAS>;<MIN-CAP>;IGNORE
-<n-N-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<n-N-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<n-n-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<n-n-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<n-Y>		<ny>;<COMPOUND>;<MIN-CAP>;IGNORE
+<n-y>		<ny>;<COMPOUND>;<MIN-MIN>;IGNORE
+<n-N-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<n-N-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<n-n-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<n-n-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0053>
-<S-Z>		<sz>;<BAS>;<CAP>;IGNORE
-<S-z>		<sz>;<BAS>;<CAP-MIN>;IGNORE
-<S-S-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<S-S-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<S-s-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<S-s-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<S-Z>		<sz>;<COMPOUND>;<CAP-CAP>;IGNORE
+<S-z>		<sz>;<COMPOUND>;<CAP-MIN>;IGNORE
+<S-S-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<S-S-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<S-s-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<S-s-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0073>
-<s-Z>		<sz>;<BAS>;<MIN-CAP>;IGNORE
-<s-z>		<sz>;<BAS>;<MIN>;IGNORE
-<s-S-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<s-S-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<s-s-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<s-s-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<s-Z>		<sz>;<COMPOUND>;<MIN-CAP>;IGNORE
+<s-z>		<sz>;<COMPOUND>;<MIN-MIN>;IGNORE
+<s-S-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<s-S-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<s-s-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<s-s-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U0054>
-<T-Y>		<ty>;<BAS>;<CAP>;IGNORE
-<T-y>		<ty>;<BAS>;<CAP-MIN>;IGNORE
-<T-T-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<T-T-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<T-t-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<T-t-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<T-Y>		<ty>;<COMPOUND>;<CAP-CAP>;IGNORE
+<T-y>		<ty>;<COMPOUND>;<CAP-MIN>;IGNORE
+<T-T-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<T-T-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<T-t-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<T-t-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U0074>
-<t-Y>		<ty>;<BAS>;<MIN-CAP>;IGNORE
-<t-y>		<ty>;<BAS>;<MIN>;IGNORE
-<t-T-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<t-T-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<t-t-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<t-t-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<t-Y>		<ty>;<COMPOUND>;<MIN-CAP>;IGNORE
+<t-y>		<ty>;<COMPOUND>;<MIN-MIN>;IGNORE
+<t-T-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<t-T-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<t-t-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<t-t-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-after <U005A>
-<Z-S>		<zs>;<BAS>;<CAP>;IGNORE
-<Z-s>		<zs>;<BAS>;<CAP-MIN>;IGNORE
-<Z-Z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
-<Z-Z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
-<Z-z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
-<Z-z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
+<Z-S>		<zs>;<COMPOUND>;<CAP-CAP>;IGNORE
+<Z-s>		<zs>;<COMPOUND>;<CAP-MIN>;IGNORE
+<Z-Z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
+<Z-Z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
+<Z-z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
+<Z-z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
 reorder-after <U007A>
-<z-S>		<zs>;<BAS>;<MIN-CAP>;IGNORE
-<z-s>		<zs>;<BAS>;<MIN>;IGNORE
-<z-Z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
-<z-Z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
-<z-z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
-<z-z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
+<z-S>		<zs>;<COMPOUND>;<MIN-CAP>;IGNORE
+<z-s>		<zs>;<COMPOUND>;<MIN-MIN>;IGNORE
+<z-Z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
+<z-Z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
+<z-z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
+<z-z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
 
 reorder-end
 

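The tokenization described in the locale comments and exercised by the
AkH-14-b2 tests can be illustrated with a rough sketch. This is not how
glibc implements collation; it only models the first (letter) level, and
the names tokenize() and primary_key() are made up for this illustration.
Accent, case and the ccs-vs-cscs tie-breaks from the later levels are
deliberately left out, and foreign letters are not handled at all.

```python
# Hungarian alphabet at the first (letter) level. Long vowels are folded
# into their short pair, as in rule 14-c1 (a-á, e-é, i-í, o-ó, ö-ő, u-ú, ü-ű).
ALPHABET = ["a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy", "h",
            "i", "j", "k", "l", "ly", "m", "n", "ny", "o", "ö", "p", "q", "r",
            "s", "sz", "t", "ty", "u", "ü", "v", "w", "x", "y", "z", "zs"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}
FOLD = str.maketrans("áéíóőúű", "aeioöuü")

COMPOUNDS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]
# Written form -> tokens. "ccs" is assumed to be shorthand for <cs><cs>,
# "ddzs" for <dzs><dzs>, and so on, matching the assumption in the patch.
SPELLINGS = {c: [c] for c in COMPOUNDS}
SPELLINGS.update({c[0] + c: [c, c] for c in COMPOUNDS})


def tokenize(word):
    """Split a word into Hungarian letters, longest written form first."""
    text = word.lower().translate(FOLD)
    tokens = []
    i = 0
    while i < len(text):
        for length in (4, 3, 2):
            chunk = text[i:i + length]
            if chunk in SPELLINGS:
                tokens.extend(SPELLINGS[chunk])
                i += len(chunk)
                break
        else:
            # No compound spelling starts here: take a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def primary_key(word):
    """First-level sort key. Spaces, hyphens and unknown characters are skipped."""
    return [RANK[t] for t in tokenize(word) if t in RANK]


words = ["kaszt", "kassza", "kaszinó", "kasza"]
print(sorted(words, key=primary_key))
```

With this key, kassza (<k><a><sz><sz><a>) lands between kaszinó and kaszt,
as in the test file, whereas a plain code point sort would put it before
kasza.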
^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-21 22:40                                         ` Egmont Koblinger
@ 2017-03-22  1:03                                           ` Carlos O'Donell
  2017-03-22  7:20                                             ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-03-22  1:03 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Luis Javier Merino, libc-locales

On 03/21/2017 06:39 PM, Egmont Koblinger wrote:
> Hi Carlos,
> 
> I'm not sure I properly understood or addressed all your concerns, so
> please double check :)
> 
>> +alphabet a              ; These tests were created by egmont@gmail.com.
>>
>> Exactly which tests were created by you?
>>
>> Normally we don't include attribution in source files like this
> It just would have felt weird to me not to say anything about where
> these tests come from, especially after the "AkH" section that points
> to the source. I've reworded as "All the remaining tests were added by
> glibc", is this okay?

That is OK.

>> , instead
>> I'll add an item to NEWS about the additional coverage for Hungarian
>> and credit you there with the various tests you created.
> I'm not sure if we're on the same page with the NEWS file. It seems to
> highlight the bigger changes, and have an automatically generated
> 1-liner for smaller ones. I don't think the Hungarian sorting order
> deserves to be among the bigger ones, I'm perfectly fine with the
> 1-liner.

Deciding what is "big" or "small" is somewhat arbitrary, and so the
NEWS can contain anything news-worthy, and include attributions.

I'm OK with a one-liner in NEWS, and a more complete git commit message.

> Instead, I see the git log entries contain a pretty detailed
> description of the changes, so I tried to create a quite verbose one.

Perfect.

>> In summary:
>> - Remove this comment.
>> - Add a bigger NEWS entry to the patch and I'll help review that.
>>   You'll also get attribution in the git commit author.
>>   Is that sufficient for you?
> Absolutely! By the way, I'm also mentioned in the locale definition
> file, I've added the current date there as well. We might squeeze that
> a bit, e.g. keep the years only, what do you think? (I don't care
> about crediting too much, I just think it's a big enough change within
> this file not to forget mentioning its date).

I'll review that below.

>> I assume the Møsstrand example refers to: [...]
>> If so then please clarify in the comment that you are referring
>> to test AkH-15.
> Yup, clarified.

Thanks.

>> Did you also design the foreign accent tests?
> Yes.

OK.

>> If you did create them then you should mention the new tests in the NEWS entry also.
> As said, I went for the git changelog entry rather than NEWS. Let me
> know if you think that I made it generic/specific enough.

OK.

> Note that I've also changed a tiny bit of wording in the unittest's
> comments, most notably added the word "arbitrarily" twice at locations
> where Luis was kind enough to point out that I indeed picked one
> possible behavior arbitrarily where it's unspecified.
> 
> Please let me know if there's anything you'd still like to improve.
> Enhancements in phrasing English sentences in comments, git log etc.
> are also truly welcome, I'm sure you've already noticed that I'm not a
> native English speaker. (If you feel like just editing the last few
> tiny bits yourself, I'm happy if you do that.)
> 
> 
> thanks a lot,
> egmont
> 
> 
> glibc-18934-hu-collate-v6.patch
 
In summary:
- This v6 looks ready to check in.
- Suggest
  "Further changes by Egmont Koblinger between 2002-2017:"
  in the locale file. Are you OK with that? See below.
- NEWS item:
"* Extensive new collation tests for Hungarian locales based on 
   `The Rules of Hungarian Orthography, 12th edition` and the
   work of Egmont Koblinger <egmont@gmail.com> (Bug 18934)."

If you're OK with that then I'll commit.
 
> commit 61adba52927edfc966115177e5dee7559bb6ed87
> Author: Egmont Koblinger <egmont@gmail.com>
> Date:   Tue Mar 21 22:29:12 2017 +0100
> 
>     localedata: hu_HU: fix multiple sorting bugs [BZ #18934]
>     
>     Fix the incorrect sorting order of a digraph and its geminated variant,
>     regression introduced by a faulty fix to bug 13547 in commit
>     b008d4c85619a753e441d7f473ba8af0db400bd6.
>     
>     Fix two inconsistencies in sorting unusual capitalization of digraphs
>     (bug #18587).
>     
>     Enable DIACRIT_FORWARD to work around bug #17750.
>     
>     Sort foreign accents after the Hungarian ones.
>     
>     Add extensive unittests containing all the examples from The Rules of
>     Hungarian Orthography and many more, including explanatory comments.

OK.
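
For anyone reading along, here is a minimal sketch of the idea behind the
geminated-digraph fix, written in Python purely for illustration. It is not
how glibc implements collation. The names tokenize, sort_key, RANK and the
numeric marks are made up for this example; only the SINGLE-OR-COMPOUND and
COMPOUND labels come from the patch. It assumes lowercase, unaccented input.

```python
# Illustrative two-level key for Hungarian compound letters.
# Hypothetical model only; lowercase, unaccented input is assumed.
COMPOUNDS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]
ALPHABET = ("a b c cs d dz dzs e f g gy h i j k l ly m n ny o p q r s sz "
            "t ty u v w x y z zs").split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Second-level marks, in the relative order used by the patch.
BAS, SINGLE_OR_COMPOUND, COMPOUND = 0, 1, 2


def tokenize(word):
    """Split a word into (letter, second-level mark) pairs."""
    tokens = []
    i = 0
    while i < len(word):
        for comp in COMPOUNDS:
            geminated = comp[0] + comp  # e.g. "ccs" for "cs", "ddzs" for "dzs"
            if word.startswith(geminated, i):
                # Shorthand long form: treat as the compound letter twice.
                tokens += [(comp, SINGLE_OR_COMPOUND), (comp, COMPOUND)]
                i += len(geminated)
                break
            if word.startswith(comp, i):
                tokens.append((comp, COMPOUND))
                i += len(comp)
                break
        else:
            # No compound letter starts here: plain single letter.
            tokens.append((word[i], BAS))
            i += 1
    return tokens


def sort_key(word):
    tokens = tokenize(word)
    primary = [RANK[letter] for letter, _ in tokens]
    secondary = [mark for _, mark in tokens]
    return (primary, secondary)


words = ["csd", "cscsa", "cscs", "ccsa", "csc", "ccs", "cs", "cz"]
print(sorted(words, key=sort_key))
# ['cz', 'cs', 'csc', 'ccs', 'cscs', 'ccsa', 'cscsa', 'csd']
```

Both "ccs" and "cscs" get the same first-level key (<cs><cs>), and the
second level breaks the tie, so the two strings do not collate as equal.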

> diff --git a/localedata/ChangeLog b/localedata/ChangeLog
> index 3b257a2..68987f1 100644
> --- a/localedata/ChangeLog
> +++ b/localedata/ChangeLog
> @@ -1,3 +1,10 @@
> +2017-03-21  Egmont Koblinger  <egmont@gmail.com>
> +
> +	[BZ #18934]
> +	* locales/hu_HU: Fix multiple collate bugs.
> +	* hu_HU.in: New file.
> +	* Makefile (test-input): Add hu_HU.UTF-8.
> +
>  2017-02-20  Mike FABIAN  <mfabian@redhat.com>
>  
>  	[BZ #20313]
> diff --git a/localedata/Makefile b/localedata/Makefile
> index f6a70a3..47ca39d 100644
> --- a/localedata/Makefile
> +++ b/localedata/Makefile
> @@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
>  	     tst-ctype tst-langinfo tst-langinfo-static tst-numeric
>  test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
>  	      hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
> -	      si_LK.UTF-8 uk_UA.UTF-8
> +	      si_LK.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
>  test-input-data = $(addsuffix .in, $(basename $(test-input)))
>  test-output := $(foreach s, .out .xout, \
>  			 $(addsuffix $s, $(basename $(test-input))))
> @@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
>  	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
>  	   nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
>  	   zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
> -	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8
> +	   tr_TR.ISO-8859-9 en_GB.UTF-8 uk_UA.UTF-8 hu_HU.UTF-8
>  include ../gen-locales.mk
>  endif
>  
> diff --git a/localedata/hu_HU.in b/localedata/hu_HU.in
> new file mode 100644
> index 0000000..7736ac0
> --- /dev/null
> +++ b/localedata/hu_HU.in
> @@ -0,0 +1,560 @@
> +AkH-14-a1 acél          ; The "AkH" tests are from:
> +AkH-14-a1 cukor         ;
> +AkH-14-a1 csók          ; A magyar helyesírás szabályai, 12. kiadás
> +AkH-14-a1 gép           ; [The Rules of Hungarian Orthography, 12th edition]
> +AkH-14-a1 hideg         ;
> +AkH-14-a1 kettő         ; often referred to as akadémiai helyesírás (AkH.) [academic orthography]
> +AkH-14-a1 Nagy          ;
> +AkH-14-a1 nyúl          ; http://helyesiras.mta.hu/helyesiras/default/akh12
> +AkH-14-a1 olasz         ;
> +AkH-14-a1 öröm          ; Alphabetical ordering described in #14-16.
> +AkH-14-a1 remény
> +AkH-14-a1 sokáig        ; #14-a1: Sort based on first letter.
> +AkH-14-a1 szabad
> +AkH-14-a1 Tamás
> +AkH-14-a1 vásárol
> +AkH-14-a2 jácint        ; #14-a2: If no other difference, lowercase initial precedes uppercase.
> +AkH-14-a2 Jácint
> +AkH-14-a2 opera
> +AkH-14-a2 Opera
> +AkH-14-a2 szűcs
> +AkH-14-a2 Szűcs
> +AkH-14-a2 viola
> +AkH-14-a2 Viola
> +AkH-14-a3 cudar         ; #14-a3: Compound letters (cs, dz, dzs, gy, ly, ny, sz, ty, zs)
> +AkH-14-a3 cukor         ; are sorted separately, after their first letter:
> +AkH-14-a3 cuppant       ; a b c cs d dz dzs e f g gy h ... l ly m n ny o ... s sz t ty u ... z zs
> +AkH-14-a3 csalit
> +AkH-14-a3 csata
> +AkH-14-a3 Csepel
> +AkH-14-a3 Zoltán
> +AkH-14-a3 zongora
> +AkH-14-a3 zúdul
> +AkH-14-a3 zsalu
> +AkH-14-a3 zseni
> +AkH-14-a3 Zsigmond
> +AkH-14-b1 lom           ; #14-b1: The first difference matters.
> +AkH-14-b1 lomb
> +AkH-14-b1 lombik
> +AkH-14-b1 Lontay
> +AkH-14-b1 lovagol
> +AkH-14-b1 pirinkó
> +AkH-14-b1 pirinyó
> +AkH-14-b1 pirít
> +AkH-14-b1 pirkad
> +AkH-14-b1 Piroska
> +AkH-14-b1 tükör
> +AkH-14-b1 Tünde
> +AkH-14-b1 tünemény
> +AkH-14-b1 tüntet
> +AkH-14-b1 tüzér
> +AkH-14-b2 kas           ; #14-b2: If a compound letter is pronounced long, only the first letter
> +AkH-14-b2 Kasmír        ; is duplicated in writing: <cs><cs> becomes ccs, <dzs><dzs> is ddzs etc.
> +AkH-14-b2 Kassák        ; (unless it's at the boundary of a compound word where it's written out twice).
> +AkH-14-b2 kastély       ; Sort according to the actual tokens, not the shorthand written form.
> +AkH-14-b2 kasza         ; <k><a><sz><a>
> +AkH-14-b2 kaszinó       ; <k><a><sz><i><n><ó>
> +AkH-14-b2 kassza        ; <k><a><sz><sz><a>
> +AkH-14-b2 kaszt         ; <k><a><sz><t>
> +AkH-14-b2 mennek
> +AkH-14-b2 mennének
> +AkH-14-b2 menü
> +AkH-14-b2 menza
> +AkH-14-b2 meny          ; <m><e><ny>
> +AkH-14-b2 Menyhért      ; <M><e><ny><h><é><r><t>
> +AkH-14-b2 mennybolt     ; <m><e><ny><ny><b><o><l><t>
> +AkH-14-b2 mennyi        ; <m><e><ny><ny><i>
> +AkH-14-b2 nagy          ; <n><a><gy>
> +AkH-14-b2 naggyá        ; <n><a><gy><gy><á>
> +AkH-14-b2 nagygyakorlat ; <n><a><gy><gy><a><k><o><r><l><a><t> (compound word: nagy+gyakorlat)
> +AkH-14-b2 naggyal       ; <n><a><gy><gy><a><l>
> +AkH-14-b2 nagyít        ; <n><a><gy><í><t>
> +AkH-14-b2 nagyobb
> +AkH-14-b2 nagyol
> +AkH-14-b2 nagyoll
> +AkH-14-c1 ír            ; #14-c1: Vowels collate equally in pairs: a-á, e-é, i-í, o-ó, ö-ő, u-ú, ü-ű.
> +AkH-14-c1 Irak
> +AkH-14-c1 iram
> +AkH-14-c1 Irán
> +AkH-14-c1 írandó
> +AkH-14-c1 iránt
> +AkH-14-c1 író
> +AkH-14-c1 iroda
> +AkH-14-c1 irónia
> +AkH-14-c2 Eger          ; #14-c2: Short vowel (unaccented, or with diaeresis) comes first if that's the only difference.
> +AkH-14-c2 egér
> +AkH-14-c2 egyfelé
> +AkH-14-c2 egyféle
> +AkH-14-c2 elöl
> +AkH-14-c2 elől
> +AkH-14-c2 kerek
> +AkH-14-c2 kerék
> +AkH-14-c2 keres
> +AkH-14-c2 kérés
> +AkH-14-c2 koros
> +AkH-14-c2 kóros
> +AkH-14-c2 szel
> +AkH-14-c2 szél
> +AkH-14-c2 szeles
> +AkH-14-c2 széles
> +AkH-14-c2 szüret
> +AkH-14-c2 szűret
> +AkH-14-d1 kis részben   ; #14-d1: Spaces, hyphens are ignored.
> +AkH-14-d1 kissé
> +AkH-14-d1 Kiss Ernő
> +AkH-14-d1 kis sorozat
> +AkH-14-d1 kissorozat-gyártás
> +AkH-14-d1 kis számban
> +AkH-14-d1 kistányér
> +AkH-14-d1 kis virág
> +AkH-14-d1 márvány
> +AkH-14-d1 márványkő
> +AkH-14-d1 márvány sírkő
> +AkH-14-d1 Márvány-tenger
> +AkH-14-d1 márványtömb
> +AkH-14-d1 Márvány Zsolt
> +AkH-14-d1 másféle
> +AkH-14-d1 másol
> +AkH-14-d1 tiszafa
> +AkH-14-d1 Tiszahát
> +AkH-14-d1 Tisza Kálmán
> +AkH-14-d1 Tisza menti
> +AkH-14-d1 Tiszántúl
> +AkH-14-d1 Tisza-part
> +AkH-14-d1 tiszavirág
> +AkH-14-d1 tiszt
> +AkH-15 cérna            ; #15: Foreign accents are ignored, unless they're the only difference,
> +AkH-15 Černý            ; in which case they are sorted after the Hungarian ones (in unspecified order).
> +AkH-15 Champagne
> +AkH-15 Cholnoky
> +AkH-15 címez
> +AkH-15 cukor
> +AkH-15 Czuczor
> +AkH-15 csapat
> +AkH-15 Gaal
> +AkH-15 galamb
> +AkH-15 Gärtner
> +AkH-15 gáz
> +AkH-15 geodézia
> +AkH-15 Georges
> +AkH-15 góc
> +AkH-15 Goethe
> +AkH-15 moshat
> +AkH-15 mosna
> +AkH-15 Mošna
> +AkH-15 mosópor
> +AkH-15 Møsstrand
> +AkH-15 mostan
> +AkH-15 munka
> +AkH-15 Muñoz
> +alphabet a              ; All the remaining tests were added by glibc.

OK.
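
As a reading aid for the accent rules quoted above (#14-c and #15), here is
a rough Python sketch. It is only a model under my own assumptions, not the
glibc implementation: the names classify, sort_key, ORDER and the weights
PLAIN, HUNGARIAN_LONG and FOREIGN are invented for illustration. It ignores
case and compound letters, and it only handles letters that either appear
in ORDER or decompose to one of them under NFD (so not ø, đ or ł).

```python
import unicodedata

# Base letters at the first level; ö and ü sort strictly after o and u.
ORDER = "abcdefghijklmnoöpqrstuüvwxyz"
RANK = {ch: i for i, ch in enumerate(ORDER)}

# Hungarian long vowels share the first-level weight of their short pair.
LONG_TO_SHORT = {"á": "a", "é": "e", "í": "i", "ó": "o",
                 "ő": "ö", "ú": "u", "ű": "ü"}

# Second-level weights: short vowel < Hungarian long vowel < foreign accent.
PLAIN, HUNGARIAN_LONG, FOREIGN = 0, 1, 2


def classify(ch):
    """Return (base letter, accent weight) for one character."""
    if ch in RANK:
        return ch, PLAIN
    if ch in LONG_TO_SHORT:
        return LONG_TO_SHORT[ch], HUNGARIAN_LONG
    # Anything else is treated as a foreign accent on its NFD base letter.
    base = unicodedata.normalize("NFD", ch)[0]
    return base, FOREIGN


def sort_key(word):
    word = unicodedata.normalize("NFC", word.lower())
    pairs = [classify(ch) for ch in word]
    primary = [RANK[base] for base, _ in pairs]
    accents = [mark for _, mark in pairs]  # compared left to right
    return (primary, accents)


print(sorted(["áq", "àp", "à", "á"], key=sort_key))
# ['á', 'à', 'àp', 'áq']
```

The accent level is only consulted when the base letters are identical,
which is why "àp" still sorts before "áq".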

> +alphabet á
> +alphabet aa             ; a = á unless that's the only difference in which case a < á.
> +alphabet aá             ; (Same for e = é, i = í, o = ó, ö = ő, u = ú, ü = ű below.)
> +alphabet áa             ; Differences in accents matter from left to right.
> +alphabet áá
> +alphabet áp
> +alphabet aq
> +alphabet b
> +alphabet c
> +alphabet cz             ; <c><z>
> +alphabet cs             ; <cs>        -- or rarely <c><s>, can't tell for sure, assume <cs>.
> +alphabet csc            ; <cs><c>
> +alphabet ccs            ; <cs><cs>    -- or rarely <c><cs>, can't tell for sure, assume <cs><cs>.
> +alphabet cscs           ; <cs><cs>    -- Make sure ccs and cscs don't collate as equal, see bug 13547.
> +alphabet ccsa           ; <cs><cs><a> -- The order of ccs and cscs is not specified in the rules and is arbitrarily chosen by glibc.
> +alphabet cscsa          ; <cs><cs><a>
> +alphabet csd            ; <cs><d>     -- (These comments also apply to all other compound letters below.)
> +alphabet d
> +alphabet dz             ; <dz>
> +alphabet dzd            ; <dz><d>
> +alphabet ddz            ; <dz><dz>
> +alphabet dzdz           ; <dz><dz>
> +alphabet ddza           ; <dz><dz><a>
> +alphabet dzdza          ; <dz><dz><a>
> +alphabet dzdzs          ; <dz><dzs>
> +alphabet dze            ; <dz><e>
> +alphabet dzz            ; <dz><z>
> +alphabet dzs            ; <dzs>
> +alphabet dzsdz          ; <dzs><dz>
> +alphabet ddzs           ; <dzs><dzs>
> +alphabet dzsdzs         ; <dzs><dzs>
> +alphabet ddzsa          ; <dzs><dzs><a>
> +alphabet dzsdzsa        ; <dzs><dzs><a>
> +alphabet dzse           ; <dzs><e>
> +alphabet e
> +alphabet é
> +alphabet ee
> +alphabet eé
> +alphabet ée
> +alphabet éé
> +alphabet ép
> +alphabet eq
> +alphabet f
> +alphabet g
> +alphabet gz             ; <g><z>
> +alphabet gy             ; <gy>
> +alphabet gyg            ; <gy><g>
> +alphabet ggy            ; <gy><gy>
> +alphabet gygy           ; <gy><gy>
> +alphabet ggya           ; <gy><gy><a>
> +alphabet gygya          ; <gy><gy><a>
> +alphabet gyh            ; <gy><h>
> +alphabet h
> +alphabet i
> +alphabet í
> +alphabet ii
> +alphabet ií
> +alphabet íi
> +alphabet íí
> +alphabet íp
> +alphabet iq
> +alphabet j
> +alphabet k
> +alphabet l
> +alphabet lz             ; <l><z>
> +alphabet ly             ; <ly>
> +alphabet lyl            ; <ly><l>
> +alphabet lly            ; <ly><ly>
> +alphabet lyly           ; <ly><ly>
> +alphabet llya           ; <ly><ly><a>
> +alphabet lylya          ; <ly><ly><a>
> +alphabet lym            ; <ly><m>
> +alphabet m
> +alphabet n
> +alphabet nz             ; <n><z>
> +alphabet ny             ; <ny>
> +alphabet nyn            ; <ny><n>
> +alphabet nny            ; <ny><ny>
> +alphabet nyny           ; <ny><ny>
> +alphabet nnya           ; <ny><ny><a>
> +alphabet nynya          ; <ny><ny><a>
> +alphabet nyo            ; <ny><o>
> +alphabet o
> +alphabet ó
> +alphabet oo
> +alphabet oó
> +alphabet óo
> +alphabet óó
> +alphabet óp
> +alphabet oq
> +alphabet ö              ; ö = ő (unless that's the only difference), but these come strictly after o and ó.
> +alphabet ő
> +alphabet öö
> +alphabet öő
> +alphabet őö
> +alphabet őő
> +alphabet őp
> +alphabet öq
> +alphabet p
> +alphabet q
> +alphabet r
> +alphabet s
> +alphabet sz             ; <sz>
> +alphabet szs            ; <sz><s>
> +alphabet ssz            ; <sz><sz>
> +alphabet szsz           ; <sz><sz>
> +alphabet ssza           ; <sz><sz><a>
> +alphabet szsza          ; <sz><sz><a>
> +alphabet szt            ; <sz><t>
> +alphabet t
> +alphabet tz             ; <t><z>
> +alphabet ty             ; <ty>
> +alphabet tyt            ; <ty><t>
> +alphabet tty            ; <ty><ty>
> +alphabet tyty           ; <ty><ty>
> +alphabet ttya           ; <ty><ty><a>
> +alphabet tytya          ; <ty><ty><a>
> +alphabet tyu            ; <ty><u>
> +alphabet u
> +alphabet ú
> +alphabet úp
> +alphabet uq
> +alphabet uu
> +alphabet uú
> +alphabet úu
> +alphabet úú
> +alphabet ü              ; ü = ű (unless that's the only difference), but these come strictly after u and ú.
> +alphabet ű
> +alphabet űp
> +alphabet üq
> +alphabet üü
> +alphabet üű
> +alphabet űü
> +alphabet űű
> +alphabet v
> +alphabet w
> +alphabet x
> +alphabet y
> +alphabet z
> +alphabet zz             ; <z><z>
> +alphabet zs             ; <zs>
> +alphabet zsz            ; <zs><z>
> +alphabet zzs            ; <zs><zs>
> +alphabet zszs           ; <zs><zs>
> +alphabet zzsa           ; <zs><zs><a>
> +alphabet zszsa          ; <zs><zs><a>
> +case a                  ; #14-a2 specifies that if the same word appears in lowercase as well as with
> +case A                  ; uppercase initial, the lowercase one is to be sorted first.
> +case á                  ; Arbitrarily extend this to all other weird combinations of upper- and lowercases in compound letters.
> +case Á
> +case cs                 ; <cs>
> +case cS
> +case Cs
> +case CS
> +case ccs                ; <cs><cs>
> +case ccS
> +case cCs
> +case cCS
> +case Ccs
> +case CcS
> +case CCs
> +case CCS
> +case dz                 ; <dz>
> +case dZ
> +case Dz
> +case DZ
> +case ddz                ; <dz><dz>
> +case ddZ
> +case dDz
> +case dDZ
> +case Ddz
> +case DdZ
> +case DDz
> +case DDZ
> +case dzs                ; <dzs>
> +case dzS
> +case dZs
> +case dZS
> +case Dzs
> +case DzS
> +case DZs
> +case DZS
> +case ddzs               ; <dzs><dzs>
> +case ddzS
> +case ddZs
> +case ddZS
> +case dDzs
> +case dDzS
> +case dDZs
> +case dDZS
> +case Ddzs
> +case DdzS
> +case DdZs
> +case DdZS
> +case DDzs
> +case DDzS
> +case DDZs
> +case DDZS
> +case e
> +case E
> +case é
> +case É
> +case gy                 ; <gy>
> +case gY
> +case Gy
> +case GY
> +case ggy                ; <gy><gy>
> +case ggY
> +case gGy
> +case gGY
> +case Ggy
> +case GgY
> +case GGy
> +case GGY
> +case i
> +case I
> +case í
> +case Í
> +case ly                 ; <ly>
> +case lY
> +case Ly
> +case LY
> +case lly                ; <ly><ly>
> +case llY
> +case lLy
> +case lLY
> +case Lly
> +case LlY
> +case LLy
> +case LLY
> +case ny                 ; <ny>
> +case nY
> +case Ny
> +case NY
> +case nny                ; <ny><ny>
> +case nnY
> +case nNy
> +case nNY
> +case Nny
> +case NnY
> +case NNy
> +case NNY
> +case o
> +case O
> +case ó
> +case Ó
> +case ö
> +case Ö
> +case ő
> +case Ő
> +case sz                 ; <sz>
> +case sZ
> +case Sz
> +case SZ
> +case ssz                ; <sz><sz>
> +case ssZ
> +case sSz
> +case sSZ
> +case Ssz
> +case SsZ
> +case SSz
> +case SSZ
> +case ty                 ; <ty>
> +case tY
> +case Ty
> +case TY
> +case tty                ; <ty><ty>
> +case ttY
> +case tTy
> +case tTY
> +case Tty
> +case TtY
> +case TTy
> +case TTY
> +case u
> +case U
> +case ú
> +case Ú
> +case ü
> +case Ü
> +case ű
> +case Ű
> +case zs                 ; <zs>
> +case zS
> +case Zs
> +case ZS
> +case zzs                ; <zs><zs>
> +case zzS
> +case zZs
> +case zZS
> +case Zzs
> +case ZzS
> +case ZZs
> +case ZZS
> +foreign-a1 á            ; More thorough tests for foreign accents (#15).
> +foreign-a1 à            ; Each test consists of 4 lines. The foreign accent is in the middle two.
> +foreign-a1 àp           ; That is, on their own they come after the Hungarian accent, but a
> +foreign-a1 áq           ; subsequent difference (p and q) overrides this.
> +foreign-a2 á
> +foreign-a2 â
> +foreign-a2 âp
> +foreign-a2 áq
> +foreign-a3 á
> +foreign-a3 ã
> +foreign-a3 ãp
> +foreign-a3 áq
> +foreign-a4 á
> +foreign-a4 ä
> +foreign-a4 äp
> +foreign-a4 áq
> +foreign-a5 á
> +foreign-a5 å
> +foreign-a5 åp
> +foreign-a5 áq
> +foreign-a6 á
> +foreign-a6 ă
> +foreign-a6 ăp
> +foreign-a6 áq
> +foreign-c1 c
> +foreign-c1 ç
> +foreign-c1 çp
> +foreign-c1 cq
> +foreign-d1 d
> +foreign-d1 đ
> +foreign-d1 đp
> +foreign-d1 dq
> +foreign-e1 é
> +foreign-e1 è
> +foreign-e1 èp
> +foreign-e1 éq
> +foreign-e2 é
> +foreign-e2 ê
> +foreign-e2 êp
> +foreign-e2 éq
> +foreign-e3 é
> +foreign-e3 ë
> +foreign-e3 ëp
> +foreign-e3 éq
> +foreign-e4 é
> +foreign-e4 ě
> +foreign-e4 ěp
> +foreign-e4 éq
> +foreign-i1 í
> +foreign-i1 ì
> +foreign-i1 ìp
> +foreign-i1 íq
> +foreign-i2 í
> +foreign-i2 î
> +foreign-i2 îp
> +foreign-i2 íq
> +foreign-i3 í
> +foreign-i3 ï
> +foreign-i3 ïp
> +foreign-i3 íq
> +foreign-l1 l
> +foreign-l1 ł
> +foreign-l1 łp
> +foreign-l1 lq
> +foreign-n1 n
> +foreign-n1 ñ
> +foreign-n1 ñp
> +foreign-n1 nq
> +foreign-n2 n
> +foreign-n2 ň
> +foreign-n2 ňp
> +foreign-n2 nq
> +foreign-o1 ó            ; The rules are not explicit whether foreign accents on top of o or u
> +foreign-o1 ò            ; should be sorted among o-ó and u-ú, or among ö-ő and ü-ű, but the
> +foreign-o1 òp           ; AkH #15 example with Møsstrand implicitly shows that it's the former.
> +foreign-o1 óq
> +foreign-o2 ó
> +foreign-o2 ô
> +foreign-o2 ôp
> +foreign-o2 óq
> +foreign-o3 ó
> +foreign-o3 õ
> +foreign-o3 õp
> +foreign-o3 óq
> +foreign-o4 ó
> +foreign-o4 ø
> +foreign-o4 øp
> +foreign-o4 óq
> +foreign-r1 r
> +foreign-r1 ř
> +foreign-r1 řp
> +foreign-r1 rq
> +foreign-s1 s
> +foreign-s1 š
> +foreign-s1 šp
> +foreign-s1 sq
> +foreign-u1 ú
> +foreign-u1 ù
> +foreign-u1 ùp
> +foreign-u1 úq
> +foreign-u2 ú
> +foreign-u2 û
> +foreign-u2 ûp
> +foreign-u2 úq
> +foreign-u3 ú
> +foreign-u3 ũ
> +foreign-u3 ũp
> +foreign-u3 úq
> +foreign-u4 ú
> +foreign-u4 ů
> +foreign-u4 ůp
> +foreign-u4 úq
> +foreign-y1 y
> +foreign-y1 ÿ
> +foreign-y1 ÿp
> +foreign-y1 yq
> diff --git a/localedata/locales/hu_HU b/localedata/locales/hu_HU
> index 898d293..8d508e3 100644
> --- a/localedata/locales/hu_HU
> +++ b/localedata/locales/hu_HU
> @@ -22,7 +22,7 @@ escape_char /
>  % - made all days abbreviations same lenght by appending spaces
>  % Email: srtxg@chanae.alphanet.ch
>  %
> -% Further changes by Egmont Koblinger, 2002/Jan/06, 2012/Jan/03, 2015/Sep/03
> +% Further changes by Egmont Koblinger, 2002/Jan/06, 2012/Jan/03, 2015/Sep/03, 2017/Mar/21

Suggest:
Further changes by Egmont Koblinger between 2002-2017:

>  % - fixed tons of remaining bugs in alphabetical order
>  % - turned month names and similar stuff to lowercase
>  % - other small bugfixes
> @@ -64,6 +64,7 @@ category "i18n:2012";LC_MEASUREMENT
>  END LC_IDENTIFICATION
>  
>  LC_COLLATE
> +define DIACRIT_FORWARD
>  copy "iso14651_t1"
>  
>  %% a b c cs d dz dzs e f g gy h i j k l ly m n ny o o: p q
> @@ -77,15 +78,18 @@ copy "iso14651_t1"
>  %% dzs+dzs becomes ddzs, and so on.
>  %% However, c+cs is also spelled as ccs, you need to speak
>  %% the language to tell which one is the case.
> -%% Tokenize ccs as <c_or_cs><cs>, and sort the tokens as
> -%% a b c c_or_cs cs d... This effectively assumes cs+cs
> -%% which is more frequent than c+cs, but guarantees that the
> -%% strings ccs and cscs don't collate as equal.
> +%% Tokenize ccs as <cs><cs> since this is much more frequent
> +%% than <c><cs>, but apply SINGLE-OR-COMPOUND and COMPOUND
> +%% to the tokens so that the strings ccs and cscs don't collate
> +%% as equal.
> +%% The same goes for all other compound consonants.
>  
>  collating-symbol  <odouble>
>  collating-symbol  <udouble>
>  
> -collating-symbol  <c_or_cs>
> +collating-symbol  <SINGLE-OR-COMPOUND>
> +collating-symbol  <COMPOUND>
> +
>  collating-symbol  <cs>
>  collating-element <C-S> from "<U0043><U0053>"
>  collating-element <C-s> from "<U0043><U0073>"
> @@ -100,7 +104,6 @@ collating-element <c-C-s> from "<U0063><U0043><U0073>"
>  collating-element <c-c-S> from "<U0063><U0063><U0053>"
>  collating-element <c-c-s> from "<U0063><U0063><U0073>"
>  
> -collating-symbol  <d_or_dz>
>  collating-symbol  <dz>
>  collating-element <D-Z> from "<U0044><U005A>"
>  collating-element <D-z> from "<U0044><U007A>"
> @@ -115,7 +118,6 @@ collating-element <d-D-z> from "<U0064><U0044><U007A>"
>  collating-element <d-d-Z> from "<U0064><U0064><U005A>"
>  collating-element <d-d-z> from "<U0064><U0064><U007A>"
>  
> -collating-symbol  <d_or_dzs>
>  collating-symbol  <dzs>
>  collating-element <D-Z-S> from "<U0044><U005A><U0053>"
>  collating-element <D-Z-s> from "<U0044><U005A><U0073>"
> @@ -142,7 +144,6 @@ collating-element <d-d-Z-s> from "<U0064><U0064><U005A><U0073>"
>  collating-element <d-d-z-S> from "<U0064><U0064><U007A><U0053>"
>  collating-element <d-d-z-s> from "<U0064><U0064><U007A><U0073>"
>  
> -collating-symbol  <g_or_gy>
>  collating-symbol  <gy>
>  collating-element <G-Y> from "<U0047><U0059>"
>  collating-element <G-y> from "<U0047><U0079>"
> @@ -157,7 +158,6 @@ collating-element <g-G-y> from "<U0067><U0047><U0079>"
>  collating-element <g-g-Y> from "<U0067><U0067><U0059>"
>  collating-element <g-g-y> from "<U0067><U0067><U0079>"
>  
> -collating-symbol  <l_or_ly>
>  collating-symbol  <ly>
>  collating-element <L-Y> from "<U004C><U0059>"
>  collating-element <L-y> from "<U004C><U0079>"
> @@ -172,7 +172,6 @@ collating-element <l-L-y> from "<U006C><U004C><U0079>"
>  collating-element <l-l-Y> from "<U006C><U006C><U0059>"
>  collating-element <l-l-y> from "<U006C><U006C><U0079>"
>  
> -collating-symbol  <n_or_ny>
>  collating-symbol  <ny>
>  collating-element <N-Y> from "<U004E><U0059>"
>  collating-element <N-y> from "<U004E><U0079>"
> @@ -187,7 +186,6 @@ collating-element <n-N-y> from "<U006E><U004E><U0079>"
>  collating-element <n-n-Y> from "<U006E><U006E><U0059>"
>  collating-element <n-n-y> from "<U006E><U006E><U0079>"
>  
> -collating-symbol  <s_or_sz>
>  collating-symbol  <sz>
>  collating-element <S-Z> from "<U0053><U005A>"
>  collating-element <S-z> from "<U0053><U007A>"
> @@ -202,7 +200,6 @@ collating-element <s-S-z> from "<U0073><U0053><U007A>"
>  collating-element <s-s-Z> from "<U0073><U0073><U005A>"
>  collating-element <s-s-z> from "<U0073><U0073><U007A>"
>  
> -collating-symbol  <t_or_ty>
>  collating-symbol  <ty>
>  collating-element <T-Y> from "<U0054><U0059>"
>  collating-element <T-y> from "<U0054><U0079>"
> @@ -217,7 +214,6 @@ collating-element <t-T-y> from "<U0074><U0054><U0079>"
>  collating-element <t-t-Y> from "<U0074><U0074><U0059>"
>  collating-element <t-t-y> from "<U0074><U0074><U0079>"
>  
> -collating-symbol  <z_or_zs>
>  collating-symbol  <zs>
>  collating-element <Z-S> from "<U005A><U0053>"
>  collating-element <Z-s> from "<U005A><U0073>"
> @@ -232,8 +228,10 @@ collating-element <z-Z-s> from "<U007A><U005A><U0073>"
>  collating-element <z-z-S> from "<U007A><U007A><U0053>"
>  collating-element <z-z-s> from "<U007A><U007A><U0073>"
>  
> +collating-symbol <CAP-CAP>
>  collating-symbol <CAP-MIN>
>  collating-symbol <MIN-CAP>
> +collating-symbol <MIN-MIN>
>  collating-symbol <CAP-CAP-CAP>
>  collating-symbol <CAP-CAP-MIN>
>  collating-symbol <CAP-MIN-CAP>
> @@ -244,6 +242,7 @@ collating-symbol <MIN-MIN-CAP>
>  collating-symbol <MIN-MIN-MIN>
>  
>  reorder-after <MIN>
> +<MIN-MIN>
>  <MIN-CAP>
>  <MIN-MIN-MIN>
>  <MIN-MIN-CAP>
> @@ -252,42 +251,38 @@ reorder-after <MIN>
>  
>  reorder-after <CAP>
>  <CAP-MIN>
> +<CAP-CAP>
>  <CAP-MIN-MIN>
>  <CAP-MIN-CAP>
>  <CAP-CAP-MIN>
>  <CAP-CAP-CAP>
>  
>  reorder-after <c>
> -<c_or_cs>
>  <cs>
>  reorder-after <d>
> -<d_or_dz>
> -<d_or_dzs>
>  <dz>
>  <dzs>
>  reorder-after <g>
> -<g_or_gy>
>  <gy>
>  reorder-after <l>
> -<l_or_ly>
>  <ly>
>  reorder-after <n>
> -<n_or_ny>
>  <ny>
>  reorder-after <o>
>  <odouble>
>  reorder-after <s>
> -<s_or_sz>
>  <sz>
>  reorder-after <t>
> -<t_or_ty>
>  <ty>
>  reorder-after <u>
>  <udouble>
>  reorder-after <z>
> -<z_or_zs>
>  <zs>
>  
> +reorder-after <BAS>
> +<SINGLE-OR-COMPOUND>
> +<COMPOUND>
> +
>  reorder-after <o>
>  <U00F6>	<odouble>;<REU>;<MIN>;IGNORE
>  <U0151>	<odouble>;<DAC>;<MIN>;IGNORE
> @@ -300,152 +295,157 @@ reorder-after <u>
>  <U00DC>	<udouble>;<REU>;<CAP>;IGNORE
>  <U0170>	<udouble>;<DAC>;<CAP>;IGNORE
>  
> +reorder-after <BAS>
> +<ACA>
> +<REU>
> +<DAC>
> +
>  reorder-after <U0043>
> -<C-S>		<cs>;<BAS>;<CAP>;IGNORE
> -<C-s>		<cs>;<BAS>;<CAP-MIN>;IGNORE
> -<C-C-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<C-C-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<C-c-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<C-c-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<C-S>		<cs>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<C-s>		<cs>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<C-C-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<C-C-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<C-c-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<C-c-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U0063>
> -<c-S>		<cs>;<BAS>;<MIN-CAP>;IGNORE
> -<c-s>		<cs>;<BAS>;<MIN>;IGNORE
> -<c-C-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<c-C-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<c-c-S>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<c-c-s>		"<c_or_cs><cs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<c-S>		<cs>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<c-s>		<cs>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<c-C-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<c-C-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<c-c-S>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<c-c-s>		"<cs><cs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U0044>
> -<D-Z>		<dz>;<BAS>;<CAP>;IGNORE
> -<D-z>		<dz>;<BAS>;<CAP-MIN>;IGNORE
> -<D-D-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<D-D-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<D-d-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<D-d-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<D-Z>		<dz>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<D-z>		<dz>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<D-D-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<D-D-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<D-d-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<D-d-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U0064>
> -<d-Z>		<dz>;<BAS>;<MIN-CAP>;IGNORE
> -<d-z>		<dz>;<BAS>;<MIN>;IGNORE
> -<d-D-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<d-D-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<d-d-Z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<d-d-z>		"<d_or_dz><dz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<d-Z>		<dz>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<d-z>		<dz>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<d-D-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<d-D-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<d-d-Z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<d-d-z>		"<dz><dz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U0044>
> -<D-Z-S>		<dzs>;<BAS>;<CAP-CAP-CAP>;IGNORE
> -<D-Z-s>		<dzs>;<BAS>;<CAP-CAP-MIN>;IGNORE
> -<D-z-S>		<dzs>;<BAS>;<CAP-MIN-CAP>;IGNORE
> -<D-z-s>		<dzs>;<BAS>;<CAP-MIN-MIN>;IGNORE
> -<D-D-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
> -<D-D-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
> -<D-D-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
> -<D-D-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
> -<D-d-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-CAP>";IGNORE
> -<D-d-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-CAP-MIN>";IGNORE
> -<D-d-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-CAP>";IGNORE
> -<D-d-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<CAP><CAP-MIN-MIN>";IGNORE
> +<D-Z-S>		<dzs>;<COMPOUND>;<CAP-CAP-CAP>;IGNORE
> +<D-Z-s>		<dzs>;<COMPOUND>;<CAP-CAP-MIN>;IGNORE
> +<D-z-S>		<dzs>;<COMPOUND>;<CAP-MIN-CAP>;IGNORE
> +<D-z-s>		<dzs>;<COMPOUND>;<CAP-MIN-MIN>;IGNORE
> +<D-D-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-CAP>";IGNORE
> +<D-D-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP-MIN>";IGNORE
> +<D-D-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-CAP>";IGNORE
> +<D-D-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN-MIN>";IGNORE
> +<D-d-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-CAP>";IGNORE
> +<D-d-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP-MIN>";IGNORE
> +<D-d-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-CAP>";IGNORE
> +<D-d-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN-MIN>";IGNORE
>  reorder-after <U0064>
> -<d-Z-S>		<dzs>;<BAS>;<MIN-CAP-CAP>;IGNORE
> -<d-Z-s>		<dzs>;<BAS>;<MIN-CAP-MIN>;IGNORE
> -<d-z-S>		<dzs>;<BAS>;<MIN-MIN-CAP>;IGNORE
> -<d-z-s>		<dzs>;<BAS>;<MIN-MIN-MIN>;IGNORE
> -<d-D-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
> -<d-D-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
> -<d-D-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
> -<d-D-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
> -<d-d-Z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-CAP>";IGNORE
> -<d-d-Z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-CAP-MIN>";IGNORE
> -<d-d-z-S>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-CAP>";IGNORE
> -<d-d-z-s>	"<d_or_dzs><dzs>";"<BAS><BAS>";"<MIN><CAP-MIN-MIN>";IGNORE
> +<d-Z-S>		<dzs>;<COMPOUND>;<MIN-CAP-CAP>;IGNORE
> +<d-Z-s>		<dzs>;<COMPOUND>;<MIN-CAP-MIN>;IGNORE
> +<d-z-S>		<dzs>;<COMPOUND>;<MIN-MIN-CAP>;IGNORE
> +<d-z-s>		<dzs>;<COMPOUND>;<MIN-MIN-MIN>;IGNORE
> +<d-D-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-CAP>";IGNORE
> +<d-D-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP-MIN>";IGNORE
> +<d-D-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-CAP>";IGNORE
> +<d-D-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN-MIN>";IGNORE
> +<d-d-Z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-CAP>";IGNORE
> +<d-d-Z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP-MIN>";IGNORE
> +<d-d-z-S>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-CAP>";IGNORE
> +<d-d-z-s>	"<dzs><dzs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN-MIN>";IGNORE
>  
>  reorder-after <U0047>
> -<G-Y>		<gy>;<BAS>;<CAP>;IGNORE
> -<G-y>		<gy>;<BAS>;<CAP-MIN>;IGNORE
> -<G-G-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<G-G-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<G-g-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<G-g-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<G-Y>		<gy>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<G-y>		<gy>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<G-G-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<G-G-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<G-g-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<G-g-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U0067>
> -<g-y>		<gy>;<BAS>;<MIN>;IGNORE
> -<g-Y>		<gy>;<BAS>;<MIN-CAP>;IGNORE
> -<g-G-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<g-G-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<g-g-Y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<g-g-y>		"<g_or_gy><gy>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<g-Y>		<gy>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<g-y>		<gy>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<g-G-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<g-G-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<g-g-Y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<g-g-y>		"<gy><gy>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U004C>
> -<L-Y>		<ly>;<BAS>;<CAP>;IGNORE
> -<L-y>		<ly>;<BAS>;<CAP-MIN>;IGNORE
> -<L-L-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<L-L-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<L-l-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<L-l-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<L-Y>		<ly>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<L-y>		<ly>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<L-L-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<L-L-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<L-l-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<L-l-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U006C>
> -<l-y>		<ly>;<BAS>;<MIN>;IGNORE
> -<l-Y>		<ly>;<BAS>;<MIN-CAP>;IGNORE
> -<l-L-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<l-L-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<l-l-Y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<l-l-y>		"<l_or_ly><ly>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<l-Y>		<ly>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<l-y>		<ly>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<l-L-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<l-L-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<l-l-Y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<l-l-y>		"<ly><ly>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U004E>
> -<N-Y>		<ny>;<BAS>;<CAP>;IGNORE
> -<N-y>		<ny>;<BAS>;<CAP-MIN>;IGNORE
> -<N-N-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<N-N-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<N-n-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<N-n-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<N-Y>		<ny>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<N-y>		<ny>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<N-N-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<N-N-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<N-n-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<N-n-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U006E>
> -<n-y>		<ny>;<BAS>;<MIN>;IGNORE
> -<n-Y>		<ny>;<BAS>;<MIN-CAP>;IGNORE
> -<n-N-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<n-N-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<n-n-Y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<n-n-y>		"<n_or_ny><ny>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<n-Y>		<ny>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<n-y>		<ny>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<n-N-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<n-N-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<n-n-Y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<n-n-y>		"<ny><ny>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U0053>
> -<S-Z>		<sz>;<BAS>;<CAP>;IGNORE
> -<S-z>		<sz>;<BAS>;<CAP-MIN>;IGNORE
> -<S-S-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<S-S-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<S-s-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<S-s-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<S-Z>		<sz>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<S-z>		<sz>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<S-S-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<S-S-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<S-s-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<S-s-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U0073>
> -<s-Z>		<sz>;<BAS>;<MIN-CAP>;IGNORE
> -<s-z>		<sz>;<BAS>;<MIN>;IGNORE
> -<s-S-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<s-S-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<s-s-Z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<s-s-z>		"<s_or_sz><sz>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<s-Z>		<sz>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<s-z>		<sz>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<s-S-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<s-S-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<s-s-Z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<s-s-z>		"<sz><sz>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U0054>
> -<T-Y>		<ty>;<BAS>;<CAP>;IGNORE
> -<T-y>		<ty>;<BAS>;<CAP-MIN>;IGNORE
> -<T-T-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<T-T-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<T-t-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<T-t-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<T-Y>		<ty>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<T-y>		<ty>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<T-T-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<T-T-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<T-t-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<T-t-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U0074>
> -<t-Y>		<ty>;<BAS>;<MIN-CAP>;IGNORE
> -<t-y>		<ty>;<BAS>;<MIN>;IGNORE
> -<t-T-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<t-T-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<t-t-Y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<t-t-y>		"<t_or_ty><ty>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<t-Y>		<ty>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<t-y>		<ty>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<t-T-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<t-T-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<t-t-Y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<t-t-y>		"<ty><ty>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-after <U005A>
> -<Z-S>		<zs>;<BAS>;<CAP>;IGNORE
> -<Z-s>		<zs>;<BAS>;<CAP-MIN>;IGNORE
> -<Z-Z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP>";IGNORE
> -<Z-Z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><CAP-MIN>";IGNORE
> -<Z-z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN-CAP>";IGNORE
> -<Z-z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<CAP><MIN>";IGNORE
> +<Z-S>		<zs>;<COMPOUND>;<CAP-CAP>;IGNORE
> +<Z-s>		<zs>;<COMPOUND>;<CAP-MIN>;IGNORE
> +<Z-Z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-CAP>";IGNORE
> +<Z-Z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><CAP-MIN>";IGNORE
> +<Z-z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-CAP>";IGNORE
> +<Z-z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<CAP><MIN-MIN>";IGNORE
>  reorder-after <U007A>
> -<z-S>		<zs>;<BAS>;<MIN-CAP>;IGNORE
> -<z-s>		<zs>;<BAS>;<MIN>;IGNORE
> -<z-Z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP>";IGNORE
> -<z-Z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><CAP-MIN>";IGNORE
> -<z-z-S>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN-CAP>";IGNORE
> -<z-z-s>		"<z_or_zs><zs>";"<BAS><BAS>";"<MIN><MIN>";IGNORE
> +<z-S>		<zs>;<COMPOUND>;<MIN-CAP>;IGNORE
> +<z-s>		<zs>;<COMPOUND>;<MIN-MIN>;IGNORE
> +<z-Z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-CAP>";IGNORE
> +<z-Z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><CAP-MIN>";IGNORE
> +<z-z-S>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-CAP>";IGNORE
> +<z-z-s>		"<zs><zs>";"<SINGLE-OR-COMPOUND><COMPOUND>";"<MIN><MIN-MIN>";IGNORE
>  
>  reorder-end
>  


-- 
Cheers,
Carlos.

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-22  1:03                                           ` Carlos O'Donell
@ 2017-03-22  7:20                                             ` Egmont Koblinger
  2017-03-28 14:37                                               ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-03-22  7:20 UTC
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

Hi,

On Wed, Mar 22, 2017 at 2:02 AM, Carlos O'Donell <carlos@redhat.com> wrote:

> In summary:
> - This v6 looks ready to check in.
> - Suggest
>   "Further changes by Egmont Koblinger between 2002-2017:"
>   in the locale file. Are you OK with that? See below.

Either this, or perhaps listing each year (2002, 2012, 2015, 2017)?
Either one is absolutely fine for me.

> - NEWS item:
> "* Extensive new collation tests for Hungarian locales based on
>    `The Rules of Hungarian Orthography, 12th edition` and the
>    work of Egmont Koblinger <egmont@gmail.com> (Bug 18934)."

Amazing, thanks! :)

> If you're OK with that then I'll commit.

Yes please, I can hardly wait for it :)))

(You should also bump the date in localedata/ChangeLog, although being
a day behind is probably not a big deal.)


Thanks a lot,
egmont

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-22  7:20                                             ` Egmont Koblinger
@ 2017-03-28 14:37                                               ` Carlos O'Donell
  2017-03-28 22:52                                                 ` Egmont Koblinger
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2017-03-28 14:37 UTC
  To: Egmont Koblinger; +Cc: Luis Javier Merino, libc-locales

On 03/22/2017 03:19 AM, Egmont Koblinger wrote:
> Hi,
> 
> On Wed, Mar 22, 2017 at 2:02 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> In summary:
>> - This v6 looks ready to check in.
>> - Suggest
>>   "Further changes by Egmont Koblinger between 2002-2017:"
>>   in the locale file. Are you OK with that? See below.
> 
> Either this, or perhaps listing each year (2002, 2012, 2015, 2017)?
> Either one is absolutely fine for me.
> 
>> - NEWS item:
>> "* Extensive new collation tests for Hungarian locales based on
>>    `The Rules of Hungarian Orthography, 12th edition` and the
>>    work of Egmont Koblinger <egmont@gmail.com> (Bug 18934)."
> 
> Amazing, thanks! :)
> 
>> If you're OK with that then I'll commit.
> 
> Yes please, I can hardly wait for it :)))
> 
> (You should also bump the date in localedata/ChangeLog, although being
> a day behind is probably not a big deal.)
> 
> 
> Thanks a lot,
> egmont
> 

Pushed.

commit ea1898dded26316e2e73adfb409224e864ffaa8b
Author: Egmont Koblinger <egmont@gmail.com>
Date:   Wed Mar 22 21:27:30 2017 -0400

    localedata: hu_HU: fix multiple sorting bugs (bug 18934)
    
    Fix the incorrect sorting order of a digraph and its geminated variant,
    regression introduced by a faulty fix to bug 13547 in commit
    b008d4c85619a753e441d7f473ba8af0db400bd6.
    
    Fix two inconsistencies in sorting unusual capitalization of digraphs
    (bug #18587).
    
    Enable DIACRIT_FORWARD to work around bug #17750.
    
    Sort foreign accents after the Hungarian ones.
    
    Add extensive unittests containing all the examples from The Rules of
    Hungarian Orthography and many more, including explanatory comments.
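
The digraph handling described in the commit message can be illustrated
with a small sketch. This is not glibc's collation code: it is a
simplified Python model that assumes only the digraph "ly" and its
geminated form "lly" exist, and the numeric ranks (PLAIN,
SINGLE_OR_COMPOUND, COMPOUND) are hypothetical placeholders for the
second-level symbols used in the patch. It shows that "lly" receives the
same first-level weights as "lyly" and differs from it only at the second
level.

```python
# Simplified two-level model of the hu_HU digraph weights (illustration only).
# Assumption: the rank values below are placeholders, not the real ordering
# of the symbols in locales/hu_HU.
PLAIN, SINGLE_OR_COMPOUND, COMPOUND = 0, 1, 2

def sort_key(word):
    """Return (first-level weights, second-level weights) for a lowercase word."""
    primary, secondary = [], []
    i = 0
    while i < len(word):
        if word.startswith("lly", i):
            # Geminated digraph: weighted as <ly><ly> at the first level.
            primary += [("l", 1), ("l", 1)]
            secondary += [SINGLE_OR_COMPOUND, COMPOUND]
            i += 3
        elif word.startswith("ly", i):
            # Plain digraph: one <ly> element, sorted after every "l" + letter.
            primary.append(("l", 1))
            secondary.append(COMPOUND)
            i += 2
        else:
            primary.append((word[i], 0))
            secondary.append(PLAIN)
            i += 1
    return primary, secondary

if __name__ == "__main__":
    for w in sorted(["lyuk", "lzz", "luxus", "mollyal", "molylyal"], key=sort_key):
        print(w, sort_key(w))
```

Which of "lly" and "lyly" sorts first depends entirely on the relative
rank of the two second-level symbols; the values chosen here are
arbitrary, and the authoritative order is the one defined in the locale
file itself.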

-- 
Cheers,
Carlos.

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-28 14:37                                               ` Carlos O'Donell
@ 2017-03-28 22:52                                                 ` Egmont Koblinger
  2017-03-28 23:03                                                   ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Egmont Koblinger @ 2017-03-28 22:52 UTC
  To: Carlos O'Donell; +Cc: Luis Javier Merino, libc-locales

Yay! :)


e.

* Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
  2017-03-28 22:52                                                 ` Egmont Koblinger
@ 2017-03-28 23:03                                                   ` Carlos O'Donell
  0 siblings, 0 replies; 33+ messages in thread
From: Carlos O'Donell @ 2017-03-28 23:03 UTC
  To: Egmont Koblinger; +Cc: Luis Javier Merino, libc-locales

On 03/28/2017 06:51 PM, Egmont Koblinger wrote:
> Yay! :)

My apologies for the delay, and my many thanks for your hard work.

-- 
Cheers,
Carlos.

Thread overview: 33+ messages
2015-10-13 21:57 [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs Egmont Koblinger
2015-10-13 22:37 ` Egmont Koblinger
2015-10-26 15:25   ` Egmont Koblinger
2015-11-15 21:34     ` Egmont Koblinger
2016-01-14 12:54       ` Egmont Koblinger
2016-04-16  8:50         ` Egmont Koblinger
2016-04-21  6:13   ` Mike Frysinger
2016-04-21 11:15     ` Egmont Koblinger
2016-04-21 15:18       ` Mike Frysinger
2016-06-29 21:01     ` Egmont Koblinger
2017-01-31 23:17       ` Egmont Koblinger
2017-02-01  0:48         ` Carlos O'Donell
2017-02-01  1:56           ` Egmont Koblinger
2017-02-01 16:01             ` Luis Javier Merino
2017-02-02  0:04               ` Egmont Koblinger
2017-02-02 13:28                 ` Carlos O'Donell
2017-02-02 19:00                   ` Egmont Koblinger
2017-02-05 12:16                     ` Luis Javier Merino
2017-02-05 16:30                       ` Egmont Koblinger
2017-02-09 22:20                         ` Egmont Koblinger
2017-02-10 15:06                           ` Carlos O'Donell
2017-02-15 18:03                             ` Egmont Koblinger
2017-02-16  2:36                               ` Carlos O'Donell
2017-02-21 14:55                                 ` Egmont Koblinger
2017-02-22 17:36                                   ` Carlos O'Donell
2017-03-15 20:37                                     ` Egmont Koblinger
2017-03-16 17:41                                       ` Carlos O'Donell
2017-03-21 22:40                                         ` Egmont Koblinger
2017-03-22  1:03                                           ` Carlos O'Donell
2017-03-22  7:20                                             ` Egmont Koblinger
2017-03-28 14:37                                               ` Carlos O'Donell
2017-03-28 22:52                                                 ` Egmont Koblinger
2017-03-28 23:03                                                   ` Carlos O'Donell