public inbox for libc-hacker@sourceware.org
* merge done
@ 1999-08-31  0:20 Ulrich Drepper
  1999-08-31  2:52 ` Andreas Schwab
  0 siblings, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 1999-08-31  0:20 UTC (permalink / raw)
  To: GNU libc hacker

I've merged in my locale changes.  Hopefully it still compiles.  It
did before the merge but I changed a bit while doing it.  Anyhow, I'll
start the compiler now but will go to bed.  I will fix any
compilation problems tomorrow, so there is no need to send patches.

What you can send patches for are functional problems.

What should I mention?  Well, first, LC_COLLATE is for now disabled.
The normal string functions are used for now.  The new implementation
follows ISO 14652 quite closely.  The only deliberate change I've made
is not supporting stateful character sets with the strange charmap
definitions.  I don't think it would ever work since the definition
format is not good enough.

You will also note that the localedef program takes quite a long time
when used on large charsets, say, UTF-8.  Well, that's how it will be.
Most of the time is spent in the optimizing phase.

Also the output files are quite big.  This I actually regard as a
problem.  It might be worthwhile going away from the byte order
independent format since then the file size will shrink by almost 50%.

OK, now give it a try.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: merge done
  1999-08-31  0:20 merge done Ulrich Drepper
@ 1999-08-31  2:52 ` Andreas Schwab
  1999-08-31  4:04   ` Geoff Keating
  1999-08-31  8:51   ` Ulrich Drepper
  0 siblings, 2 replies; 11+ messages in thread
From: Andreas Schwab @ 1999-08-31  2:52 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: GNU libc hacker


Ulrich Drepper <drepper@cygnus.com> writes:

|> Also the output files are quite big.  This I actually regard as a
|> problem.  It might be worthwhile going away from the byte order
|> independent format since then the file size will shrink by almost 50%.

What's the point in storing both the little and big endian value for
single word entries?

Andreas.

-- 
Andreas Schwab                                  "And now for something
schwab@suse.de                                   completely different."
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg


* Re: merge done
  1999-08-31  2:52 ` Andreas Schwab
@ 1999-08-31  4:04   ` Geoff Keating
  1999-08-31  4:35     ` Andreas Schwab
  1999-08-31  8:44     ` Ulrich Drepper
  1999-08-31  8:51   ` Ulrich Drepper
  1 sibling, 2 replies; 11+ messages in thread
From: Geoff Keating @ 1999-08-31  4:04 UTC (permalink / raw)
  To: schwab; +Cc: drepper, libc-hacker

Ulrich Drepper <drepper@cygnus.com> writes:
|> Also the output files are quite big.  This I actually regard as a
|> problem.  It might be worthwhile going away from the byte order
|> independent format since then the file size will shrink by almost 50%.

It would help a lot if we could use symlinks or something to avoid
having dozens of identical copies of LC_COLLATE and LC_CTYPE.

On my system I think I have 35 copies of the en_DK LC_CTYPE, for
instance, which come up to nearly 400k; and 22 copies of its
LC_COLLATE, which is nearly 700k.  There's rarely any reason for the
CTYPE files to differ if the charsets are the same, and the COLLATE
files should at least be the same if the language and charset are the
same.

This would also speed up the build, if we could generate these files
just once.

-- 
Geoffrey Keating <geoffk@cygnus.com>


* Re: merge done
  1999-08-31  4:04   ` Geoff Keating
@ 1999-08-31  4:35     ` Andreas Schwab
  1999-08-31  8:44     ` Ulrich Drepper
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 1999-08-31  4:35 UTC (permalink / raw)
  To: Geoff Keating; +Cc: drepper, libc-hacker


Geoff Keating <geoffk@ozemail.com.au> writes:

|> Ulrich Drepper <drepper@cygnus.com> writes:
|> |> Also the output files are quite big.  This I actually regard as a
|> |> problem.  It might be worthwhile going away from the byte order
|> |> independent format since then the file size will shrink by almost 50%.
|> 
|> It would help a lot if we could use symlinks or something to avoid
|> having dozens of identical copies of LC_COLLATE and LC_CTYPE.
|> 
|> On my system I think I have 35 copies of the en_DK LC_CTYPE, for
|> instance, which come up to nearly 400k; and 22 copies of its
|> LC_COLLATE, which is nearly 700k.  There's rarely any reason for the
|> CTYPE files to differ if the charsets are the same, and the COLLATE
|> files should at least be the same if the language and charset are the
|> same.

How do you handle the case when the en_DK locale is not installed or
removed?  And if you make hardlinks you have to make sure that the locale
files are made in the correct order, so that you don't link to files that
are out of date.  I don't think this is all worth the trouble.

Andreas.

-- 
Andreas Schwab                                  "And now for something
schwab@suse.de                                   completely different."
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg


* Re: merge done
  1999-08-31  4:04   ` Geoff Keating
  1999-08-31  4:35     ` Andreas Schwab
@ 1999-08-31  8:44     ` Ulrich Drepper
  1999-08-31 22:24       ` Geoff Keating
  1 sibling, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 1999-08-31  8:44 UTC (permalink / raw)
  To: Geoff Keating; +Cc: schwab, libc-hacker

Geoff Keating <geoffk@ozemail.com.au> writes:

> On my system I think I have 35 copies of the en_DK LC_CTYPE, for
> instance, which come up to nearly 400k; and 22 copies of its
> LC_COLLATE, which is nearly 700k.

Well, if you look at the files you find that many are different.  And
with the new possibilities most of the LC_COLLATE definitions will
differ in one form or another.  This hasn't happened so far
because it is so difficult to describe without a terrible amount of
duplication.

As for the LC_CTYPE stuff, I can see a way to avoid it but it is
not very clean.  The biggest part of the data is the table for the
isw*() and tow*() functions.  But since the encoding is almost always
UCS4 (there will be a few differences in future) it means the tables
are ideally the same, all filled with the information from the Unicode
tables.  But there are ways they can differ:

- there can be non-standard character classes or conversions

- there can be intentional differences for whatever reason

- the wchar_t encoding is different


What I'd like to have is a test in localedef whether the Unicode
tables are ok or not.  If they are ok to use then no tables for these
functions are emitted.  Maybe make the test more or less strict
depending on POSIXLY_CORRECT (e.g., if the envvar is not set
simply test that no bit cleared in the builtin tables would be set in
the loaded tables).
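
The relaxed test described above could look roughly like the sketch
below.  It is only an illustration: the function names, the use of
32-bit class words and the idea that the strict mode means exact
equality are all assumptions, not localedef code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of localedef.  Relaxed check: true
   when no bit that is clear in the builtin table is set in the
   loaded table, i.e. the loaded bits are a subset of the builtin
   ones for every entry.  */
static bool
tables_compatible_relaxed (const uint32_t *builtin, const uint32_t *loaded,
                           size_t n)
{
  assert (builtin != NULL && loaded != NULL);
  for (size_t i = 0; i < n; ++i)
    if ((loaded[i] & ~builtin[i]) != 0)
      return false;
  return true;
}

/* Hypothetical strict variant (assumed to mean exact equality), as
   might be used when POSIXLY_CORRECT is set.  */
static bool
tables_compatible_strict (const uint32_t *builtin, const uint32_t *loaded,
                          size_t n)
{
  assert (builtin != NULL && loaded != NULL);
  return memcmp (builtin, loaded, n * sizeof *builtin) == 0;
}
```

If the chosen check succeeds, the tables for these functions would not
need to be written to the output file.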

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: merge done
  1999-08-31  2:52 ` Andreas Schwab
  1999-08-31  4:04   ` Geoff Keating
@ 1999-08-31  8:51   ` Ulrich Drepper
  1999-08-31  9:00     ` Andreas Schwab
  1 sibling, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 1999-08-31  8:51 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: GNU libc hacker

Andreas Schwab <schwab@suse.de> writes:

> What's the point in storing both the little and big endian value for
> single word entries?

It would not require any work at load time.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: merge done
  1999-08-31  8:51   ` Ulrich Drepper
@ 1999-08-31  9:00     ` Andreas Schwab
  1999-08-31  9:04       ` Ulrich Drepper
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Schwab @ 1999-08-31  9:00 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: GNU libc hacker


Ulrich Drepper <drepper@cygnus.com> writes:

|> Andreas Schwab <schwab@suse.de> writes:
|> 
|> > What's the point in storing both the little and big endian value for
|> > single word entries?
|> 
|> It would not require any work at load time.

But the work is being done anyway:

loadlocale.c:
      if (_nl_value_types[category][cnt] == word)
	newdata->values[cnt].word = W (*((u_int32_t *) (newdata->filedata
							+ idx)));
      else
	newdata->values[cnt].string = newdata->filedata + idx;

So currently the other endian entries only waste space.
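
To make the trade-off concrete, here is a small sketch of the two
options.  The layout is an assumption made up for illustration (a
little-endian copy followed by a big-endian copy), and the function
names are hypothetical; none of this is taken from loadlocale.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes of one word entry in each layout.  */
enum { SINGLE_WORD_SIZE = 4, DUAL_WORD_SIZE = 8 };

/* Single-copy layout: the value is stored once, little-endian, and
   assembled at load time.  Works on any host byte order.  */
static uint32_t
read_le32 (const unsigned char *p)
{
  assert (p != NULL);
  return ((uint32_t) p[0] | ((uint32_t) p[1] << 8)
          | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24));
}

/* Detect the host byte order by looking at the first byte of a
   16-bit value of 1.  */
static int
host_is_little_endian (void)
{
  const uint16_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);
  return first == 1;
}

/* Dual-copy layout (hypothetical): little-endian copy at offset 0,
   big-endian copy at offset 4.  The loader picks the copy matching
   the host, so no conversion is needed, at twice the file size.  */
static uint32_t
read_dual (const unsigned char *entry)
{
  uint32_t value;
  assert (entry != NULL);
  memcpy (&value, entry + (host_is_little_endian () ? 0 : 4), sizeof value);
  return value;
}
```

Both readers return the same value; the dual layout only pays off if
the loader really skips the conversion.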

Andreas.

-- 
Andreas Schwab                                  "And now for something
schwab@suse.de                                   completely different."
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg


* Re: merge done
  1999-08-31  9:00     ` Andreas Schwab
@ 1999-08-31  9:04       ` Ulrich Drepper
  1999-08-31  9:07         ` Andreas Schwab
  0 siblings, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 1999-08-31  9:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: GNU libc hacker

Andreas Schwab <schwab@suse.de> writes:

> But the work is being done anyway:

I know.  I am trying to get rid of it since it does not really save
anything.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: merge done
  1999-08-31  9:04       ` Ulrich Drepper
@ 1999-08-31  9:07         ` Andreas Schwab
  0 siblings, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 1999-08-31  9:07 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: GNU libc hacker


Ulrich Drepper <drepper@cygnus.com> writes:

|> Andreas Schwab <schwab@suse.de> writes:
|> 
|> > But the work is being done anyway:
|> 
|> I know.  I am trying to get rid of it since it does not really save
|> anything.

Ok, good to know.

Andreas.

-- 
Andreas Schwab                                  "And now for something
schwab@suse.de                                   completely different."
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg


* Re: merge done
  1999-08-31  8:44     ` Ulrich Drepper
@ 1999-08-31 22:24       ` Geoff Keating
  1999-08-31 23:01         ` Ulrich Drepper
  0 siblings, 1 reply; 11+ messages in thread
From: Geoff Keating @ 1999-08-31 22:24 UTC (permalink / raw)
  To: drepper; +Cc: schwab, libc-hacker

> Cc: schwab@suse.de, libc-hacker@sourceware.cygnus.com
> Reply-To: drepper@cygnus.com (Ulrich Drepper)
> From: Ulrich Drepper <drepper@cygnus.com>
> Date: 31 Aug 1999 08:39:57 -0700
> 
> Geoff Keating <geoffk@ozemail.com.au> writes:
> 
> > On my system I think I have 35 copies of the en_DK LC_CTYPE, for
> > instance, which come up to nearly 400k; and 22 copies of its
> > LC_COLLATE, which is nearly 700k.
> 
> Well, if you look at the files you find that many are different. 

Yes, but many are the same.  For instance:

[geoffk@geoffk locale]$ md5sum */LC_CTYPE
2ddcbc6c352b4734e7d4ab020ec1c07f  cs_CZ/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  da_DK/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_AT/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_BE/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_CH/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_DE/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_LU/LC_CTYPE
86ab85d84ef4e7cf25c0f2f4a8bb227c  el_GR.ISO8859-7/LC_CTYPE
86ab85d84ef4e7cf25c0f2f4a8bb227c  el_GR/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  en_AU/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  en_CA/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  en_DK/LC_CTYPE
...

[geoffk@geoffk locale]$ md5sum */LC_COLLATE
0b12c2bf93730c1911a6adb66da13cf6  cs_CZ/LC_COLLATE
af476468a5848376d2baa4fe24e0dc31  da_DK/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_AT/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_BE/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_CH/LC_COLLATE
fd032782430ad9fcf67b6aa91c251a2f  de_DE/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_LU/LC_COLLATE
53fce17204ea0403e236acfb47c537e9  el_GR.ISO8859-7/LC_COLLATE
89a594a0a067b957f8c6a8469856d2ca  el_GR/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  en_AU/LC_COLLATE
088130adb3c3dae4aa9e7f778fe7e019  en_CA/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  en_DK/LC_COLLATE
...

> And with the new possibilities most of the LC_COLLATE definitions
> will differ in one form or another.  This hasn't happened so
> far because it is so difficult to describe without a terrible amount
> of duplication.
> 
> As for the LC_CTYPE stuff, I can see a way to avoid it but it is
> not very clean.  The biggest part of the data is the table for the
> isw*() and tow*() functions.  But since the encoding is almost always
> UCS4 (there will be a few differences in future) it means the tables
> are ideally the same, all filled with the information from the Unicode
> tables.
...

Yes, that would help.

The point I'm trying to make is that many of the collating orders
etc. are common between different countries and languages.  For
instance, I'd be very surprised to find that en_AU, en_US, en_GB, en_IE
and en_CA have different definitions for LC_CTYPE, and the first four
should be the same for LC_COLLATE (fr_CA and en_CA are the same and
different from en_AU, which sort of makes sense).

I don't mean that _every_ LC_COLLATE will be the same; that would be
silly.  Just that there are a relatively small number of alternatives
compared to the total number of locales.
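
A build or install step that wanted to exploit this would first have to
detect which generated files are byte-for-byte identical, much as the
md5sum listings above do by hand.  A minimal sketch of such a
comparison follows; the function name is hypothetical and it assumes
both paths refer to regular files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true only if both files can be opened
   and have exactly the same contents.  Assumes regular files, so
   fread only returns a short count at end of file or on error.  */
static bool
files_identical (const char *path_a, const char *path_b)
{
  assert (path_a != NULL && path_b != NULL);
  FILE *a = fopen (path_a, "rb");
  FILE *b = fopen (path_b, "rb");
  bool same = (a != NULL && b != NULL);

  while (same)
    {
      unsigned char buf_a[4096], buf_b[4096];
      size_t n_a = fread (buf_a, 1, sizeof buf_a, a);
      size_t n_b = fread (buf_b, 1, sizeof buf_b, b);

      if (n_a != n_b || memcmp (buf_a, buf_b, n_a) != 0)
        same = false;           /* Lengths or contents differ.  */
      else if (n_a == 0)
        break;                  /* Both files ended together.  */
    }

  /* Treat a read error as "not identical" to stay on the safe side.  */
  if (same && (ferror (a) || ferror (b)))
    same = false;

  if (a != NULL)
    fclose (a);
  if (b != NULL)
    fclose (b);
  return same;
}
```

Files that compare equal are only candidates for sharing; what to do
when one of the locales is removed is a separate question.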

-- 
Geoffrey Keating <geoffk@cygnus.com>


* Re: merge done
  1999-08-31 22:24       ` Geoff Keating
@ 1999-08-31 23:01         ` Ulrich Drepper
  0 siblings, 0 replies; 11+ messages in thread
From: Ulrich Drepper @ 1999-08-31 23:01 UTC (permalink / raw)
  To: Geoff Keating; +Cc: schwab, libc-hacker

Geoff Keating <geoffk@ozemail.com.au> writes:

> I don't mean that _every_ LC_COLLATE will be the same; that would be
> silly.  Just that there are a relatively small number of alternatives
> compared to the total number of locales.

I don't want to get into trying clever things with symlinks and so on.
If you care about disk space don't install all locales.  The localedef
program is standardized to allow the user to generate the necessary files.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------

