public inbox for libc-locales@sourceware.org
* [Bug localedata/18587] New: Minor collate issues in Hungarian locale
@ 2015-06-23 23:11 egmont at gmail dot com
  2015-09-08  8:39 ` [Bug localedata/18587] " egmont at gmail dot com
  2017-03-28 21:34 ` cvs-commit at gcc dot gnu.org
From: egmont at gmail dot com @ 2015-06-23 23:11 UTC
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18587

            Bug ID: 18587
           Summary: Minor collate issues in Hungarian locale
           Product: glibc
           Version: 2.21
            Status: NEW
          Severity: minor
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: egmont at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 8385
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8385&action=edit
Fix

There are two minor issues with the Hungarian locale when sorting strings that
differ only in case. Please apply the attached patch to fix them.

Issue 1:

Most of the time the lowercase variant is sorted before the uppercase one;
however, this does not hold for "Cs" and "CS": "CS" is currently sorted before
"Cs". The same applies to all the other digraphs (dz, gy, ..., there are 8 of
them in total).

To test:

LC_ALL=hu_HU.UTF-8 sort -k 1,1 -s << END
cs 1
cS 2
Cs 3
CS 4
END

Expected output: the lines in the order of their numbers. Current output: the
order 1 2 4 3.
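
To make the expected order concrete, here is a toy model (hypothetical, not
glibc's actual collation algorithm): words that differ only in case are ordered
by a tuple of per-letter case flags, compared left to right, with lowercase (0)
before uppercase (1). The name case_key is made up for illustration.

```python
def case_key(word: str) -> tuple[int, ...]:
    """Toy tie-breaker: one flag per letter, 0 = lowercase, 1 = uppercase.

    Tuples compare element by element, so the case of the first letter
    decides first, then the case of the second letter, and so on.
    """
    return tuple(1 if ch.isupper() else 0 for ch in word)


# The four spellings of the digraph "cs" used in the test above.
words = ["CS", "Cs", "cS", "cs"]
print(sorted(words, key=case_key))  # ['cs', 'cS', 'Cs', 'CS']
```

Under this model the order is cs, cS, Cs, CS, i.e. 1 2 3 4; the unfixed locale
instead puts "CS" before "Cs".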

The fix follows the pattern already used for the only trigraph, "dzs": it uses
the new <MIN-MIN> or <CAP-CAP> instead of <MIN> or <CAP> to explicitly denote
the case of both codepoints in the compound letter. This also makes the file's
layout more neatly tabulated and easier to read.
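
The naming idea can be sketched as follows: instead of a single <MIN> or <CAP>,
a compound letter gets a symbol that lists the case of each of its codepoints.
The helper below is hypothetical and only mirrors the <MIN-MIN>/<CAP-CAP> names
mentioned above; mixed-case names such as <CAP-MIN> are assumed by analogy, and
the real definitions in localedata/locales/hu_HU may differ in detail.

```python
def case_symbol(letters: str) -> str:
    """Hypothetical: build a symbol like <CAP-MIN> recording the case of
    every codepoint of a compound letter (MIN = lowercase, CAP = uppercase)."""
    parts = ["CAP" if ch.isupper() else "MIN" for ch in letters]
    return "<" + "-".join(parts) + ">"


for spelling in ["cs", "cS", "Cs", "CS", "Dzs"]:
    print(spelling, case_symbol(spelling))
```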

Issue 2:

When the only trigraph, "dzs", is pronounced long, it is spelled "ddzs".
However, because of obvious typos (<CAP-x-y> used instead of <MIN-x-y>; I may
have introduced this mistake myself a long time ago, I can't remember), the case
of the second "d" is ignored, rather than lowercase being sorted before
uppercase.

To test:

LC_ALL=hu_HU.UTF-8 sort -k 1,1 -s << END
DDzs 2
Ddzs 1
DDzs 3
END

Expected output: the lines in the order of their numbers. Current output: the
input order unchanged, showing that all three lines compare equal.
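
A toy model (hypothetical, not glibc's implementation) reproduces both outputs:
comparing the case of every letter, lowercase first, gives the expected order,
while skipping the second letter, which is effectively what the typo does, makes
all three keys equal, so a stable sort (like sort -s) leaves the input as is.

```python
def toy_case_key(word: str, ignore_second: bool = False) -> tuple[int, ...]:
    """One flag per letter (0 = lowercase, 1 = uppercase), compared left to
    right. ignore_second=True drops the second letter's flag, modelling the
    reported bug."""
    flags = tuple(1 if ch.isupper() else 0 for ch in word)
    if ignore_second:
        flags = flags[:1] + flags[2:]
    return flags


lines = [("DDzs", 2), ("Ddzs", 1), ("DDzs", 3)]

# sorted() is stable: items with equal keys keep their input order.
fixed = [n for _, n in sorted(lines, key=lambda item: toy_case_key(item[0]))]
buggy = [n for _, n in sorted(lines, key=lambda item: toy_case_key(item[0], True))]
print(fixed)  # [1, 2, 3]
print(buggy)  # [2, 1, 3]
```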

On a slightly related note: a new edition of the Hungarian spelling rules is
scheduled for release this September [1], replacing the current 30-year-old
edition. The old edition's section on alphabetical sorting doesn't say what to
do when strings differ only in case. Reportedly, the new edition will specify
that lowercase is sorted first, followed by uppercase [2], e.g. "arany, Arany",
which is what the current locale already implements, apart from these bugs. So
this patch also prepares the locale for the new rules.
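
To check the ordering an installed C library actually produces for such a pair,
Python's locale.strcoll can be used. This is a minimal sketch, assuming the
hu_HU.UTF-8 locale has been generated on the machine; the hypothetical helper
hu_lowercase_first returns None when it is not available. Since setlocale
changes process-wide state, the previous LC_COLLATE setting is restored.

```python
import locale
from typing import Optional


def hu_lowercase_first(lower: str, upper: str,
                       name: str = "hu_HU.UTF-8") -> Optional[bool]:
    """Return True if `lower` collates before `upper` in locale `name`,
    False otherwise, or None if that locale is not installed."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # locale not available; global state is unchanged
    try:
        return locale.strcoll(lower, upper) < 0
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # undo the global change


print(hu_lowercase_first("arany", "Arany"))
```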

[1] http://mta.hu/mta_hirei/szeptemberben-jelenik-meg-a-magyar-helyesiras-szabalyai-tizenkettedik-kiadasa-136386/
[2] http://www.nyest.hu/hirek/mi-ujsag-a-helyesirasban

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug localedata/18587] Minor collate issues in Hungarian locale
From: egmont at gmail dot com @ 2015-09-08  8:39 UTC
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18587

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Egmont Koblinger <egmont at gmail dot com> ---
I discovered other bugs as well, and created a patch that not only addresses all
of them but also adds extensive test coverage. Rather than squeezing the new
issues into this bug, I created a new one.

Let's mark this bug as superseded by bug 18934.

*** This bug has been marked as a duplicate of bug 18934 ***

* [Bug localedata/18587] Minor collate issues in Hungarian locale
From: cvs-commit at gcc dot gnu.org @ 2017-03-28 21:34 UTC
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18587

--- Comment #2 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  ea1898dded26316e2e73adfb409224e864ffaa8b (commit)
      from  78c05814320cdc3377347f8e5fdbaa7cf5abf5b5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ea1898dded26316e2e73adfb409224e864ffaa8b

commit ea1898dded26316e2e73adfb409224e864ffaa8b
Author: Egmont Koblinger <egmont@gmail.com>
Date:   Wed Mar 22 21:27:30 2017 -0400

    localedata: hu_HU: fix multiple sorting bugs (bug 18934)

    Fix the incorrect sorting order of a digraph and its geminated variant,
    a regression introduced by a faulty fix to bug 13547 in commit
    b008d4c85619a753e441d7f473ba8af0db400bd6.

    Fix two inconsistencies in sorting unusual capitalization of digraphs
    (bug #18587).

    Enable DIACRIT_FORWARD to work around bug #17750.

    Sort foreign accents after the Hungarian ones.

    Add extensive unit tests containing all the examples from The Rules of
    Hungarian Orthography and many more, including explanatory comments.

-----------------------------------------------------------------------

Summary of changes:
 NEWS                     |    4 +
 localedata/ChangeLog     |    7 +
 localedata/Makefile      |    4 +-
 localedata/hu_HU.in      |  560 ++++++++++++++++++++++++++++++++++++++++++++++
 localedata/locales/hu_HU |  286 ++++++++++++------------
 5 files changed, 716 insertions(+), 145 deletions(-)
 create mode 100644 localedata/hu_HU.in
