From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-67704-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 21411 invoked by alias); 29 Feb 2016 07:53:37 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 21367 invoked by uid 89); 29 Feb 2016 07:53:36 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: =?ISO-8859-1?Q?Yes, score=5.6 required=5.0 tests=BAYES_50,BODY_8BITS,FREEMAIL_FROM,GARBLED_BODY,KAM_STOCKGEN,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2 spammy=8:=b8=b4, 8:=b8=a3, 8:=b1=e0, 8:=b9?=
X-HELO: mout.web.de
From: Leonhard Holz <leonhard.holz@web.de>
Subject: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper
 locale
To: GNU C Library <libc-alpha@sourceware.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Message-ID: <56D3F8F0.8070401@web.de>
Date: Mon, 29 Feb 2016 10:29:00 -0000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.6.0
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------030900000606020402020804"
X-UI-Out-Filterresults: notjunk:1;V01:K0:ErXvnSjD38g=:ldO1ZAQVBzhnR/ufp2PnJV
 Fxetd9kwldcYcWxZNlaR8Uw27XRsNm3kDNU639ZIUMN388nCVlQ0rSyR2e9N/CAqi5UN0hGwI
 gTl2QoCDFeKQHI3MwUUnAgz/KqAydMBkFPAc7x6BvVjOTI8xlKdRdlkV/dJ8v2l/enD7J3MsB
 qG0TOJ+Sd6bI871iNA8eoRS6d9rh4cIPNxecYFLPl1/rfxum50wi7amLZBZuNi5yxKcYCuw57
 A5yj7+A42GMfbzYAfdKggItYI4NnrBcL+mD02Vm7wBeTfdxVS8hsqFaauk9l63ayIdExfCnvT
 +/OGU37QS/L7Gydrn1rNsZ68lt9LEFpWVipKKYfruJEQJbW60fAmmZhEjcs6FFvvQ7YfKMM/z
 u9c2li7r+irRnNyoWIT84XzXyx4wz8LJDEcbgSTWWVbYo8VHlECJBuJtXhUvXOYNCeWJzYU/v
 GOqG/piYK12lrHIsICZs5jRRqHOzyekK3RHbnKZIvl71chMOaf6QNx0MSLRIy+InVg2OO8rad
 xhHSsLtHRw7OzATbuj8MAnLEl0XBk/6+OdxuqU10WZbAQcrDP/AvBkhEgaXpnp/3cN05yMbL6
 dPe1+VjL+wO2XFv3OXx8TqDzkb4lctuujCrH+f9K+nqhf9Brgd0Z//BVDyDnqm6KM/z8cPrXo
 K1NwkNP8IvE3DQxa8rkSYyKIItGTbFF+PMMkuyMR+/gOeyO9pqCOlCkicUngEDIwtOR8Uekpz
 DZXR4t+8H5IX8FgMhCSFlvkWfTDh6FUW+GqnGAW6Ol4WFiKjCQyzCOF2CJw=
X-SW-Source: 2016-02/txt/msg00875.txt.bz2

This is a multi-part message in MIME format.
--------------030900000606020402020804
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Content-length: 29420

In BZ #18441, sorting a Thai text with the en_US.UTF-8 locale causes a performance
regression. The cause of the problem is that

a) en_US.UTF-8 has no information for Thai chars and so always reports a zero
sort weight, which causes the comparison to check the whole string instead of
breaking off early, and

b) the sequence-to-weight list is partitioned by the first byte of the first
character (TABLEMB); this generates long lists for multibyte UTF-8 characters as
they tend to share a starting byte (e.g. all Thai chars start with E0).

The approach of the patch is to interpret TABLEMB as a hash table and find a
better hash key. My first try was to somehow "fold" a multibyte character into one
byte, but that worsened the overall performance a lot. Enlarging the table to
2-byte keys works much better while needing a reasonable amount of extra memory.
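
As a rough illustration of such a 2-byte key, here is a minimal sketch modeled on
the utf8index function this patch adds to locale/weight.h. The name two_byte_key
is made up for the example, and unlike the patch it rejects a stray continuation
byte in the lead position. For characters in the Basic Multilingual Plane the key
is simply the code point, so Thai chars that all start with E0 land in different
slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, modeled on the patch's utf8index: map the UTF-8
   sequence at CP (LEN bytes available) to a 16-bit table key.  Returns 0
   for truncated input or an invalid lead byte.  */
static uint16_t
two_byte_key (const unsigned char *cp, size_t len)
{
  assert (cp != NULL && len > 0);
  uint32_t b0 = cp[0];

  if (b0 < 0x80)
    /* ASCII: the key is the byte itself.  */
    return (uint16_t) b0;

  if (b0 < 0xC0)
    /* Stray continuation byte.  Assumption of this sketch only: the
       patch does not special-case this range.  */
    return 0;

  if (b0 < 0xE0)
    {
      /* Two-byte sequence: subtracting 0x3080 strips the 0xC0 and 0x80
	 marker bits, leaving the code point.  */
      if (len < 2)
	return 0;
      return (uint16_t) ((b0 << 6) + cp[1] - 0x3080);
    }

  if (b0 < 0xF0)
    {
      /* Three-byte sequence: again the result is the code point.  */
      if (len < 3)
	return 0;
      return (uint16_t) ((b0 << 12) + ((uint32_t) cp[1] << 6)
			 + cp[2] - 0xE2080);
    }

  if (b0 < 0xF8)
    {
      /* Four-byte sequence: the lead byte is dropped and the rest is
	 truncated to 16 bits, so this is a hash, not the code point.  */
      if (len < 4)
	return 0;
      uint32_t v = ((uint32_t) cp[1] << 12) + ((uint32_t) cp[2] << 6)
		   + cp[3] - 0x82080;
      return (uint16_t) (v & 0xFFFF);
    }

  return 0;
}
```

With this, "\xE0\xB8\x81" (U+0E01) and "\xE0\xB8\x82" (U+0E02) map to the keys
0x0E01 and 0x0E02. Four-byte sequences are truncated to 16 bits, so U+10E01 also
maps to 0x0E01; distinct characters can share a slot, and a lookup still has to
compare the stored byte sequence.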

The patch vastly improves the performance for languages with multibyte chars (see
zh_CN, hi_IN and ja_JP below). A side effect is that some languages with one-byte chars
get a bit slower because of the extra check for the first byte while finding the right
sequence in the sequence list. It cannot be avoided since the hash key is no
longer equal to the first byte of the sequence. Tests are ok.

benchmark                   change          before         after
filelist#C                   1.75%      23,396,200    23,805,700
filelist#en_US.UTF-8         1.42%      77,186,200    78,285,200
lorem_ipsum#vi_VN.UTF-8     -1.70%       1,680,740     1,652,110
lorem_ipsum#ar_SA.UTF-8     -7.71%       2,134,780     1,970,170
lorem_ipsum#en_US.UTF-8      2.61%       1,685,120     1,729,160
lorem_ipsum#zh_CN.UTF-8    -88.66%         806,176        91,423
lorem_ipsum#cs_CZ.UTF-8     -4.89%       2,150,120     2,045,030
lorem_ipsum#en_GB.UTF-8     -1.47%       2,061,960     2,031,620
lorem_ipsum#da_DK.UTF-8      3.15%       1,703,710     1,757,390
lorem_ipsum#pl_PL.UTF-8      0.86%       1,634,890     1,648,870
lorem_ipsum#fr_FR.UTF-8     -2.06%       2,232,030     2,186,030
lorem_ipsum#pt_PT.UTF-8     -2.60%       2,238,410     2,180,210
lorem_ipsum#el_GR.UTF-8    -34.52%       3,413,330     2,235,010
lorem_ipsum#ru_RU.UTF-8     -9.88%       2,403,370     2,165,950
lorem_ipsum#iw_IL.UTF-8     -9.56%       2,209,740     1,998,500
lorem_ipsum#es_ES.UTF-8      4.92%       1,983,470     2,081,050
lorem_ipsum#hi_IN.UTF-8    -98.88%     220,453,000     2,458,620
lorem_ipsum#sv_SE.UTF-8      1.79%       1,645,370     1,674,760
lorem_ipsum#hu_HU.UTF-8      4.86%       3,179,620     3,334,290
lorem_ipsum#tr_TR.UTF-8    -23.59%       2,473,330     1,889,870
lorem_ipsum#is_IS.UTF-8      2.49%       1,620,370     1,660,680
lorem_ipsum#it_IT.UTF-8     -2.67%       2,186,160     2,127,710
lorem_ipsum#sr_RS.UTF-8      2.70%       1,930,520     1,982,720
lorem_ipsum#ja_JP.UTF-8    -97.43%         958,411        24,664
wikipedia-th#en_US.UTF-8   -99.61%  10,511,700,000    40,577,100

The performance numbers and the size of the patch changed due to the removal of the strdiff optimization (#18589) and
the included Thai test. Performance degradation for locales in the ASCII plane is still minor. The patch greatly increases
the speed of strcoll for all languages that mostly use multibyte UTF-8 encoding. Note that it should affect the regex
performance of these languages too, though there is no benchmark for that.

Regarding Carlos's comments:

>> +  struct element_t *mbheads[256 * 256];
>
> Use #define MBHEADS_SZ or something similar.

Ok.

>> +  bool is_utf8 = strcmp (charmap->code_set_name, "UTF-8") == 0;
>
> OK.
>
> Will this always work? I'm just wondering about a user generated charmap that they
> call 'utf8', which is the other common alias for instance where the dash is not valid
> syntax. Probably not since the official name is UTF-8, and that's what you should use.

Well, if it does not work it's just a speed penalty. But there is no problem in adding a check for "utf8".
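
Such a check could look roughly like the sketch below. The helper name
charmap_is_utf8 is hypothetical and not part of this patch; it only shows the
idea of accepting both the official name and the "utf8" alias.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if CODE_SET_NAME is the official "UTF-8"
   name or the common "utf8" alias.  The comparison is case-sensitive.  */
static bool
charmap_is_utf8 (const char *code_set_name)
{
  assert (code_set_name != NULL);
  return (strcmp (code_set_name, "UTF-8") == 0
	  || strcmp (code_set_name, "utf8") == 0);
}
```

A charmap with any other name is simply not treated as UTF-8, which as said above
only costs speed.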

>> +	  /* Special handling of UTF-8: Generate a 2-byte index to mbheads.
>> +	     Also check the UTF-8 encoding.  Keep locale/weight.h in sync.  */
>
> Not OK. Can we refactor to avoid keeing the two in sync?

Ok, there is a new function utf8index in locale/weight.h that does the job.

>> @@ -2239,7 +2281,7 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
>>
>>  		/* Compute how much space we will need.  */
>>  		added = LOCFILE_ALIGN_UP (sizeof (int32_t) + 1
>> -					  + 2 * (runp->nmbs - 1));
>> +					  + 2 * runp->nmbs);
>
> Doesn't the change to zero indexing make the conditional in the code above this wrong?
>
> e.g.
> 2230             if (runp->mbnext != NULL
> 2231                 && runp->nmbs == runp->mbnext->nmbs
> 2232                 && memcmp (runp->mbs, runp->mbnext->mbs, runp->nmbs - 1) == 0
> 2233                 && (runp->mbs[runp->nmbs - 1]
> 2234                     == runp->mbnext->mbs[runp->nmbs - 1] + 1))

No. runp traverses through the input / locale definition file and this is not affected by the change. What happens here
is a check whether the next Unicode literal has the same byte sequence as the current one except for the last byte, which
must differ by exactly one (in the quoted code, the current literal's last byte equals the next one's plus 1) -> beginning
of a sequence.
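
The quoted condition can be read as a small predicate. The sketch below is only
an illustration: the name is_run_neighbor is made up, and it takes plain byte
arrays instead of struct element_t. It mirrors the quoted lines 2230-2234.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if CUR and NEXT have the same length, agree
   in all bytes but the last, and the last byte of CUR is exactly one
   higher than the last byte of NEXT.  */
static bool
is_run_neighbor (const unsigned char *cur, size_t ncur,
		 const unsigned char *next, size_t nnext)
{
  assert (cur != NULL && next != NULL);
  return (ncur > 0
	  && ncur == nnext
	  && memcmp (cur, next, ncur - 1) == 0
	  && cur[ncur - 1] == next[ncur - 1] + 1);
}
```

For example, E0 B8 82 followed by E0 B8 81 satisfies the predicate, while the same
pair in the opposite order does not.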


	* benchtests/bench-strcoll.c: Add thai text with en_US.UTF-8 locale.
	* benchtests/strcoll-inputs/wikipedia-th#en_US.UTF-8: New file.
	* locale/categories.def: Define _NL_COLLATE_ENCODING_TYPE.
	* locale/langinfo.h: Add _NL_COLLATE_ENCODING_TYPE to attribute list.
	* locale/localeinfo.h: Add enum collation_encoding_type.
	* locale/C-collate.c: Set _NL_COLLATE_ENCODING_TYPE to 8bit.
	* locale/programs/ld-collate.c (struct locale_collate_t):
	Expand mbheads array from 256 to 65536 (256 * 256) entries.
	(collate_finish): Generate 2-byte key for mbheads if UTF-8 locale.
	(collate_output): Output larger table and sequences including first byte.
	(collate_output): Add encoding type info.
	* locale/weight.h (utf8index): New function to calculate 2 byte index.
	(findidx): Use 2-byte index for table if UTF-8 locale.
	* locale/weightwc.h (findidx): Accept encoding parameter, not used.
	* posix/fnmatch_loop.c (FCT): Call findidx with encoding parameter.
	* posix/regcomp.c (build_equiv_class): Likewise.
	* posix/regex_internal.h (re_string_elem_size_at): Likewise.
	* posix/regexec.c (check_node_accept_bytes): Likewise.
	* string/strcoll_l.c (get_next_seq): Likewise.
	(STRCOLL): Call get_next_seq with encoding parameter.
	* string/strxfrm_l.c (find_idx): Call findidx with encoding parameter.
	(STRXFRM): Call find_idx with encoding parameter.


diff --git a/benchtests/bench-strcoll.c b/benchtests/bench-strcoll.c
index 22ae87c..6ce5b2a 100644
--- a/benchtests/bench-strcoll.c
+++ b/benchtests/bench-strcoll.c
@@ -53,7 +53,8 @@ static const char *const input_files[] = {
   "lorem_ipsum#is_IS.UTF-8",
   "lorem_ipsum#it_IT.UTF-8",
   "lorem_ipsum#sr_RS.UTF-8",
-  "lorem_ipsum#ja_JP.UTF-8"
+  "lorem_ipsum#ja_JP.UTF-8",
+  "wikipedia-th#en_US.UTF-8"
 };

 #define TEXTFILE_DELIMITER " \n\r\t.,?!"
diff --git a/locale/C-collate.c b/locale/C-collate.c
index 8214ff5..5a9ed6a 100644
--- a/locale/C-collate.c
+++ b/locale/C-collate.c
@@ -144,6 +144,8 @@ const struct __locale_data _nl_C_LC_COLLATE attribute_hidden =
     /* _NL_COLLATE_COLLSEQWC */
     { .string = (const char *) collseqwc },
     /* _NL_COLLATE_CODESET */
-    { .string = _nl_C_codeset }
+    { .string = _nl_C_codeset },
+    /* _NL_COLLATE_ENCODING_TYPE */
+    { .word = __cet_8bit }
   }
 };
diff --git a/locale/categories.def b/locale/categories.def
index d8a3ab8..cb57eae 100644
--- a/locale/categories.def
+++ b/locale/categories.def
@@ -58,6 +58,7 @@ DEFINE_CATEGORY
   DEFINE_ELEMENT (_NL_COLLATE_COLLSEQMB,        "collate-collseqmb",        std, wstring)
   DEFINE_ELEMENT (_NL_COLLATE_COLLSEQWC,        "collate-collseqwc",        std, wstring)
   DEFINE_ELEMENT (_NL_COLLATE_CODESET,		"collate-codeset",	    std, string)
+  DEFINE_ELEMENT (_NL_COLLATE_ENCODING_TYPE,	"collate-encoding-type",    std, word)
   ), NO_POSTLOAD)


diff --git a/locale/langinfo.h b/locale/langinfo.h
index 481e226..0906a6a 100644
--- a/locale/langinfo.h
+++ b/locale/langinfo.h
@@ -255,6 +255,7 @@ enum
   _NL_COLLATE_COLLSEQMB,
   _NL_COLLATE_COLLSEQWC,
   _NL_COLLATE_CODESET,
+  _NL_COLLATE_ENCODING_TYPE,
   _NL_NUM_LC_COLLATE,

   /* LC_CTYPE category: character classification.
diff --git a/locale/localeinfo.h b/locale/localeinfo.h
index 5c4e6ef..bd284df 100644
--- a/locale/localeinfo.h
+++ b/locale/localeinfo.h
@@ -110,6 +110,14 @@ enum coll_sort_rule
   sort_mask
 };

+/* Collation encoding type.  */
+enum collation_encoding_type
+{
+  __cet_other,
+  __cet_8bit,
+  __cet_utf8
+};
+
 /* We can map the types of the entries into a few categories.  */
 enum value_type
 {
diff --git a/locale/programs/ld-collate.c b/locale/programs/ld-collate.c
index 1e125f6..efaacf6 100644
--- a/locale/programs/ld-collate.c
+++ b/locale/programs/ld-collate.c
@@ -32,6 +32,8 @@
 #include "linereader.h"
 #include "locfile.h"
 #include "elem-hash.h"
+#include "../localeinfo.h"
+#include "../locale/weight.h"

 /* Uncomment the following line in the production version.  */
 /* #define NDEBUG 1 */
@@ -243,9 +245,10 @@ struct locale_collate_t
      Therefore we keep all relevant input in a list.  */
   struct locale_collate_t *next;

-  /* Arrays with heads of the list for each of the leading bytes in
+  /* Arrays with heads of the list for the leading bytes in
      the multibyte sequences.  */
-  struct element_t *mbheads[256];
+  #define MBHEADS_SZ (256 * 256)
+  struct element_t *mbheads[MBHEADS_SZ];

   /* Arrays with heads of the list for each of the leading bytes in
      the multibyte sequences.  */
@@ -1557,6 +1560,7 @@ collate_finish (struct localedef_t *locale, const struct charmap_t *charmap)
   struct section_list *sect;
   int ruleidx;
   int nr_wide_elems = 0;
+  bool is_utf8 = strcmp (charmap->code_set_name, "UTF-8") == 0;

   if (collate == NULL)
     {
@@ -1663,7 +1667,22 @@ collate_finish (struct localedef_t *locale, const struct charmap_t *charmap)
 	  struct element_t *lastp = NULL;

 	  /* Find the point where to insert in the list.  */
-	  eptr = &collate->mbheads[((unsigned char *) runp->mbs)[0]];
+	  uint16_t index = ((unsigned char *) runp->mbs)[0];
+
+	  /* Special handling of UTF-8: Generate a 2-byte index to mbheads.  */
+	  if (is_utf8 && index > 0)
+	    {
+	      index = utf8index((unsigned char *) runp->mbs, runp->nmbs);
+	      if (index == 0)
+		{
+		  WITH_CUR_LOCALE (error_at_line (0, 0, runp->file, runp->line,
+						  _("\
+malformed UTF-8 character in `%s'"), runp->name););
+		  goto dont_insert;
+		}
+	    }
+
+	  eptr = &collate->mbheads[index];
 	  while (*eptr != NULL)
 	    {
 	      if ((*eptr)->nmbs < runp->nmbs)
@@ -1734,7 +1753,7 @@ symbol `%s' has the same encoding as"), (*eptr)->name);

   /* Find out whether any of the `mbheads' entries is unset.  In this
      case we use the UNDEFINED entry.  */
-  for (i = 1; i < 256; ++i)
+  for (i = 1; i < MBHEADS_SZ; ++i)
     if (collate->mbheads[i] == NULL)
       {
 	need_undefined = 1;
@@ -2107,7 +2126,7 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
   const size_t nelems = _NL_ITEM_INDEX (_NL_NUM_LC_COLLATE);
   struct locale_file file;
   size_t ch;
-  int32_t tablemb[256];
+  int32_t tablemb[MBHEADS_SZ];
   struct obstack weightpool;
   struct obstack extrapool;
   struct obstack indirectpool;
@@ -2130,6 +2149,8 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
 	  /* The words have to be handled specially.  */
 	  if (idx == _NL_ITEM_INDEX (_NL_COLLATE_SYMB_HASH_SIZEMB))
 	    add_locale_uint32 (&file, 0);
+	  else if (idx == _NL_ITEM_INDEX (_NL_COLLATE_ENCODING_TYPE))
+	    add_locale_uint32 (&file, __cet_other);
 	  else
 	    add_locale_empty (&file);
 	}
@@ -2183,7 +2204,7 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
   if (collate->undefined.used_in_level != 0)
     output_weight (&weightpool, collate, &collate->undefined);

-  for (ch = 1; ch < 256; ++ch)
+  for (ch = 1; ch < MBHEADS_SZ; ++ch)
     if (collate->mbheads[ch]->mbnext == NULL
 	&& collate->mbheads[ch]->nmbs <= 1)
       {
@@ -2208,7 +2229,6 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
 	   and add only one index into the weight table.  We can find the
 	   consecutive entries since they are also consecutive in the list.  */
 	struct element_t *runp = collate->mbheads[ch];
-	struct element_t *lastp;

 	assert (LOCFILE_ALIGNED_P (obstack_object_size (&extrapool)));

@@ -2236,7 +2256,7 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,

 		/* Compute how much space we will need.  */
 		added = LOCFILE_ALIGN_UP (sizeof (int32_t) + 1
-					  + 2 * (runp->nmbs - 1));
+					  + 2 * runp->nmbs);
 		assert (LOCFILE_ALIGNED_P (obstack_object_size (&extrapool)));
 		obstack_make_room (&extrapool, added);

@@ -2259,9 +2279,9 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
 		/* Now walk backward from here to the beginning.  */
 		curp = runp;

-		assert (runp->nmbs <= 256);
-		obstack_1grow_fast (&extrapool, curp->nmbs - 1);
-		for (i = 1; i < curp->nmbs; ++i)
+		assert (runp->nmbs <= 255);
+		obstack_1grow_fast (&extrapool, curp->nmbs);
+		for (i = 0; i < curp->nmbs; ++i)
 		  obstack_1grow_fast (&extrapool, curp->mbs[i]);

 		/* Now find the end of the consecutive sequence and
@@ -2281,7 +2301,7 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,

 		/* And add the end byte sequence.  Without length this
 		   time.  */
-		for (i = 1; i < curp->nmbs; ++i)
+		for (i = 0; i < curp->nmbs; ++i)
 		  obstack_1grow_fast (&extrapool, curp->mbs[i]);
 	      }
 	    else
@@ -2295,15 +2315,15 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
 		weightidx = output_weight (&weightpool, collate, runp);

 		added = LOCFILE_ALIGN_UP (sizeof (int32_t) + 1
-					  + runp->nmbs - 1);
+					  + runp->nmbs);
 		assert (LOCFILE_ALIGNED_P (obstack_object_size (&extrapool)));
 		obstack_make_room (&extrapool, added);

 		obstack_int32_grow_fast (&extrapool, weightidx);
-		assert (runp->nmbs <= 256);
-		obstack_1grow_fast (&extrapool, runp->nmbs - 1);
+		assert (runp->nmbs <= 255);
+		obstack_1grow_fast (&extrapool, runp->nmbs);

-		for (i = 1; i < runp->nmbs; ++i)
+		for (i = 0; i < runp->nmbs; ++i)
 		  obstack_1grow_fast (&extrapool, runp->mbs[i]);
 	      }

@@ -2312,30 +2332,25 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
 	      obstack_1grow_fast (&extrapool, '\0');

 	    /* Next entry.  */
-	    lastp = runp;
 	    runp = runp->mbnext;
 	  }
 	while (runp != NULL);

 	assert (LOCFILE_ALIGNED_P (obstack_object_size (&extrapool)));

-	/* If the final entry in the list is not a single character we
-	   add an UNDEFINED entry here.  */
-	if (lastp->nmbs != 1)
-	  {
-	    int added = LOCFILE_ALIGN_UP (sizeof (int32_t) + 1 + 1);
-	    obstack_make_room (&extrapool, added);
+	/* Add an UNDEFINED entry at the end of the list.  */
+	int added = LOCFILE_ALIGN_UP (sizeof (int32_t) + 1 + 1);
+	obstack_make_room (&extrapool, added);

-	    obstack_int32_grow_fast (&extrapool, 0);
-	    /* XXX What rule? We just pick the first.  */
-	    obstack_1grow_fast (&extrapool, 0);
-	    /* Length is zero.  */
-	    obstack_1grow_fast (&extrapool, 0);
+	obstack_int32_grow_fast (&extrapool, 0);
+	/* XXX What rule? We just pick the first.  */
+	obstack_1grow_fast (&extrapool, 0);
+	/* Length is zero.  */
+	obstack_1grow_fast (&extrapool, 0);

-	    /* Add alignment bytes if necessary.  */
-	    while (!LOCFILE_ALIGNED_P (obstack_object_size (&extrapool)))
-	      obstack_1grow_fast (&extrapool, '\0');
-	  }
+	/* Add alignment bytes if necessary.  */
+	while (!LOCFILE_ALIGNED_P (obstack_object_size (&extrapool)))
+	  obstack_1grow_fast (&extrapool, '\0');
       }

   /* Add padding to the tables if necessary.  */
@@ -2343,7 +2358,7 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
     obstack_1grow (&weightpool, 0);

   /* Now add the four tables.  */
-  add_locale_uint32_array (&file, (const uint32_t *) tablemb, 256);
+  add_locale_uint32_array (&file, (const uint32_t *) tablemb, MBHEADS_SZ);
   add_locale_raw_obstack (&file, &weightpool);
   add_locale_raw_obstack (&file, &extrapool);
   add_locale_raw_obstack (&file, &indirectpool);
@@ -2493,6 +2508,12 @@ collate_output (struct localedef_t *locale, const struct charmap_t *charmap,
   add_locale_raw_data (&file, collate->mbseqorder, 256);
   add_locale_collseq_table (&file, &collate->wcseqorder);
   add_locale_string (&file, charmap->code_set_name);
+  if (strcmp (charmap->code_set_name, "UTF-8") == 0)
+    add_locale_uint32 (&file, __cet_utf8);
+  else if (charmap->mb_cur_max == 1)
+    add_locale_uint32 (&file, __cet_8bit);
+  else
+    add_locale_uint32 (&file, __cet_other);
   write_locale_data (output_path, LC_COLLATE, "LC_COLLATE", &file);

   obstack_free (&weightpool, NULL);
diff --git a/locale/weight.h b/locale/weight.h
index c99730c..5b4103b 100644
--- a/locale/weight.h
+++ b/locale/weight.h
@@ -19,26 +19,81 @@
 #ifndef _WEIGHT_H_
 #define _WEIGHT_H_	1

+/* Generate 2 byte code for the next UTF-8 encoded char.
+   Returns zero on UTF-8 encoding errors.  */
+static __always_inline uint16_t
+utf8index (const unsigned char *cp, size_t len)
+{
+  uint16_t index = cp[0];
+
+  if (index >= 0x80)
+    {
+      if (index < 0xE0)
+	{
+	  if (len < 2)
+	    return 0;
+	  uint16_t byte2 = cp[1];
+	  index = (index << 6) + byte2 - 0x3080;
+	}
+      else if (index < 0xF0)
+	{
+	  if (len < 3)
+	    return 0;
+	  uint16_t byte2 = cp[1];
+	  uint16_t byte3 = cp[2];
+	  index = (index << 12) + (byte2 << 6) + byte3 - 0xE2080;
+	}
+      else if (index < 0xF8)
+	{
+	  if (len < 4)
+	    return 0;
+	  uint16_t byte2 = cp[1];
+	  uint16_t byte3 = cp[2];
+	  uint16_t byte4 = cp[3];
+	  index = (byte2 << 12) + (byte3 << 6) + byte4 - 0x82080;
+	}
+      else
+	return 0;
+    }
+
+  return index;
+}
+
 /* Find index of weight.  */
 static inline int32_t __attribute__ ((always_inline))
-findidx (const int32_t *table,
+findidx (uint_fast32_t locale_encoding,
+	 const int32_t *table,
 	 const int32_t *indirect,
 	 const unsigned char *extra,
 	 const unsigned char **cpp, size_t len)
 {
-  int_fast32_t i = table[*(*cpp)++];
   const unsigned char *cp;
   const unsigned char *usrc;
+  uint16_t index = (*cpp)[0];
+
+  /* Special handling of UTF-8: Generate a 2-byte index for table.  */
+  if (index >= 0x80 && locale_encoding == __cet_utf8)
+    {
+      index = utf8index(*cpp, len);
+      if (index == 0)
+	{
+	  *cpp += 1;
+	  return 0;
+	}
+    }

+  int_fast32_t i = table[index];
   if (i >= 0)
-    /* This is an index into the weight table.  Cool.  */
-    return i;
+    {
+      /* This is an index into the weight table.  Cool.  */
+      *cpp += 1;
+      return i;
+    }

   /* Oh well, more than one sequence starting with this byte.
      Search for the correct one.  */
   cp = &extra[-i];
   usrc = *cpp;
-  --len;
   while (1)
     {
       size_t nhere;
@@ -57,8 +112,7 @@ findidx (const int32_t *table,
 	  /* It is a single character.  If it matches we found our
 	     index.  Note that at the end of each list there is an
 	     entry of length zero which represents the single byte
-	     sequence.  The first (and here only) byte was tested
-	     already.  */
+	     sequence.  */
 	  size_t cnt;

 	  for (cnt = 0; cnt < nhere && cnt < len; ++cnt)
@@ -68,7 +122,7 @@ findidx (const int32_t *table,
 	  if (cnt == nhere)
 	    {
 	      /* Found it.  */
-	      *cpp += nhere;
+	      *cpp += nhere > 0 ? nhere : 1;
 	      return i;
 	    }

@@ -127,7 +181,7 @@ findidx (const int32_t *table,
 	      while (++cnt < nhere);
 	    }

-	  *cpp += nhere;
+	  *cpp += nhere > 0 ? nhere : 1;
 	  return indirect[-i + offset];
 	}
     }
diff --git a/locale/weightwc.h b/locale/weightwc.h
index ab26482..4101dc8 100644
--- a/locale/weightwc.h
+++ b/locale/weightwc.h
@@ -21,7 +21,8 @@

 /* Find index of weight.  */
 static inline int32_t __attribute__ ((always_inline))
-findidx (const int32_t *table,
+findidx (uint_fast32_t encoding,
+	 const int32_t *table,
 	 const int32_t *indirect,
 	 const wint_t *extra,
 	 const wint_t **cpp, size_t len)
diff --git a/posix/fnmatch_loop.c b/posix/fnmatch_loop.c
index 229904e..07b60fb 100644
--- a/posix/fnmatch_loop.c
+++ b/posix/fnmatch_loop.c
@@ -383,6 +383,8 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
 			const int32_t *indirect;
 			int32_t idx;
 			const UCHAR *cp = (const UCHAR *) &str;
+			uint_fast32_t encoding =
+			  _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_ENCODING_TYPE);

 # if WIDE_CHAR_VERSION
 			table = (const int32_t *)
@@ -404,7 +406,7 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
 			  _NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB);
 # endif

-			idx = FINDIDX (table, indirect, extra, &cp, 1);
+			idx = FINDIDX (encoding, table, indirect, extra, &cp, 1);
 			if (idx != 0)
 			  {
 			    /* We found a table entry.  Now see whether the
@@ -414,7 +416,7 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
 			    int32_t idx2;
 			    const UCHAR *np = (const UCHAR *) n;

-			    idx2 = FINDIDX (table, indirect, extra,
+			    idx2 = FINDIDX (encoding, table, indirect, extra,
 					    &np, string_end - n);
 			    if (idx2 != 0
 				&& (idx >> 24) == (idx2 >> 24)
diff --git a/posix/regcomp.c b/posix/regcomp.c
index b6126b7..011ef92 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -3414,6 +3414,7 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
   uint32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
   if (nrules != 0)
     {
+      uint_fast32_t encoding;
       const int32_t *table, *indirect;
       const unsigned char *weights, *extra, *cp;
       unsigned char char_buf[2];
@@ -3422,6 +3423,7 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
       size_t len;
       /* Calculate the index for equivalence class.  */
       cp = name;
+      encoding = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_ENCODING_TYPE);
       table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB);
       weights = (const unsigned char *) _NL_CURRENT (LC_COLLATE,
 					       _NL_COLLATE_WEIGHTMB);
@@ -3429,7 +3431,7 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
 						   _NL_COLLATE_EXTRAMB);
       indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE,
 						_NL_COLLATE_INDIRECTMB);
-      idx1 = findidx (table, indirect, extra, &cp, -1);
+      idx1 = findidx (encoding, table, indirect, extra, &cp, -1);
       if (BE (idx1 == 0 || *cp != '\0', 0))
 	/* This isn't a valid character.  */
 	return REG_ECOLLATE;
@@ -3440,7 +3442,7 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
 	{
 	  char_buf[0] = ch;
 	  cp = char_buf;
-	  idx2 = findidx (table, indirect, extra, &cp, 1);
+	  idx2 = findidx (encoding, table, indirect, extra, &cp, 1);
 /*
 	  idx2 = table[ch];
 */
diff --git a/posix/regex_internal.h b/posix/regex_internal.h
index 02e040b..993c7c3 100644
--- a/posix/regex_internal.h
+++ b/posix/regex_internal.h
@@ -743,17 +743,19 @@ re_string_elem_size_at (const re_string_t *pstr, int idx)
 #  ifdef _LIBC
   const unsigned char *p, *extra;
   const int32_t *table, *indirect;
+  uint_fast32_t encoding;
   uint_fast32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);

   if (nrules != 0)
     {
+      encoding = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_ENCODING_TYPE);
       table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB);
       extra = (const unsigned char *)
 	_NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB);
       indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE,
 						_NL_COLLATE_INDIRECTMB);
       p = pstr->mbs + idx;
-      findidx (table, indirect, extra, &p, pstr->len - idx);
+      findidx (encoding, table, indirect, extra, &p, pstr->len - idx);
       return p - pstr->mbs - idx;
     }
   else
diff --git a/posix/regexec.c b/posix/regexec.c
index ec46c3a..3d3ad9a 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -3843,6 +3843,7 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
       if (nrules != 0)
 	{
 	  unsigned int in_collseq = 0;
+	  uint_fast32_t encoding;
 	  const int32_t *table, *indirect;
 	  const unsigned char *weights, *extra;
 	  const char *collseqwc;
@@ -3893,6 +3894,8 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
 	  if (cset->nequiv_classes)
 	    {
 	      const unsigned char *cp = pin;
+	      encoding =
+		_NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_ENCODING_TYPE);
 	      table = (const int32_t *)
 		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB);
 	      weights = (const unsigned char *)
@@ -3901,7 +3904,8 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
 		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB);
 	      indirect = (const int32_t *)
 		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB);
-	      int32_t idx = findidx (table, indirect, extra, &cp, elem_len);
+	      int32_t idx = findidx (encoding, table, indirect, extra, &cp,
+				     elem_len);
 	      if (idx > 0)
 		for (i = 0; i < cset->nequiv_classes; ++i)
 		  {
diff --git a/string/strcoll_l.c b/string/strcoll_l.c
index 4d1e3ab..2c2cab0 100644
--- a/string/strcoll_l.c
+++ b/string/strcoll_l.c
@@ -63,9 +63,9 @@ typedef struct
 /* Get next sequence.  Traverse the string as required.  */
 static __always_inline void
 get_next_seq (coll_seq *seq, int nrules, const unsigned char *rulesets,
-	      const USTRING_TYPE *weights, const int32_t *table,
-	      const USTRING_TYPE *extra, const int32_t *indirect,
-	      int pass)
+	      const USTRING_TYPE *weights, uint_fast32_t encoding,
+	      const int32_t *table, const USTRING_TYPE *extra,
+	      const int32_t *indirect, int pass)
 {
   size_t val = seq->val = 0;
   int len = seq->len;
@@ -109,7 +109,7 @@ get_next_seq (coll_seq *seq, int nrules, const unsigned char *rulesets,
 	      us = seq->back_us;
 	      while (i < backw)
 		{
-		  int32_t tmp = findidx (table, indirect, extra, &us, -1);
+		  int32_t tmp = findidx (encoding, table, indirect, extra, &us, -1);
 		  idx = tmp & 0xffffff;
 		  i++;
 		}
@@ -124,7 +124,7 @@ get_next_seq (coll_seq *seq, int nrules, const unsigned char *rulesets,

 	  while (*us != L('\0'))
 	    {
-	      int32_t tmp = findidx (table, indirect, extra, &us, -1);
+	      int32_t tmp = findidx (encoding, table, indirect, extra, &us, -1);
 	      unsigned char rule = tmp >> 24;
 	      prev_idx = idx;
 	      idx = tmp & 0xffffff;
@@ -253,6 +253,7 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
   const USTRING_TYPE *weights;
   const USTRING_TYPE *extra;
   const int32_t *indirect;
+  uint_fast32_t encoding;

   if (nrules == 0)
     return STRCMP (s1, s2);
@@ -271,6 +272,8 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
     current->values[_NL_ITEM_INDEX (CONCAT(_NL_COLLATE_EXTRA,SUFFIX))].string;
   indirect = (const int32_t *)
     current->values[_NL_ITEM_INDEX (CONCAT(_NL_COLLATE_INDIRECT,SUFFIX))].string;
+  encoding = current->values[_NL_ITEM_INDEX (_NL_COLLATE_ENCODING_TYPE)].word;
+

   assert (((uintptr_t) table) % __alignof__ (table[0]) == 0);
   assert (((uintptr_t) weights) % __alignof__ (weights[0]) == 0);
@@ -310,9 +313,9 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)

       while (1)
 	{
-	  get_next_seq (&seq1, nrules, rulesets, weights, table,
+	  get_next_seq (&seq1, nrules, rulesets, weights, encoding, table,
 				    extra, indirect, pass);
-	  get_next_seq (&seq2, nrules, rulesets, weights, table,
+	  get_next_seq (&seq2, nrules, rulesets, weights, encoding, table,
 				    extra, indirect, pass);
 	  /* See whether any or both strings are empty.  */
 	  if (seq1.len == 0 || seq2.len == 0)
diff --git a/string/strxfrm_l.c b/string/strxfrm_l.c
index 22e24d3..5c89b15 100644
--- a/string/strxfrm_l.c
+++ b/string/strxfrm_l.c
@@ -53,6 +53,7 @@ typedef struct
   uint_fast32_t nrules;
   unsigned char *rulesets;
   USTRING_TYPE *weights;
+  uint_fast32_t encoding;
   int32_t *table;
   USTRING_TYPE *extra;
   int32_t *indirect;
@@ -100,8 +101,8 @@ static __always_inline size_t
 find_idx (const USTRING_TYPE **us, int32_t *weight_idx,
 	  unsigned char *rule_idx, const locale_data_t *l_data, const int pass)
 {
-  int32_t tmp = findidx (l_data->table, l_data->indirect, l_data->extra, us,
-			 -1);
+  int32_t tmp = findidx (l_data->encoding, l_data->table, l_data->indirect,
+			 l_data->extra, us, -1);
   *rule_idx = tmp >> 24;
   int32_t idx = tmp & 0xffffff;
   size_t len = l_data->weights[idx++];
@@ -693,6 +694,8 @@ STRXFRM (STRING_TYPE *dest, const STRING_TYPE *src, size_t n, __locale_t l)
   /* Get the locale data.  */
   l_data.rulesets = (unsigned char *)
     current->values[_NL_ITEM_INDEX (_NL_COLLATE_RULESETS)].string;
+  l_data.encoding =
+    current->values[_NL_ITEM_INDEX (_NL_COLLATE_ENCODING_TYPE)].word;
   l_data.table = (int32_t *)
     current->values[_NL_ITEM_INDEX (CONCAT(_NL_COLLATE_TABLE,SUFFIX))].string;
   l_data.weights = (USTRING_TYPE *)
@@ -721,8 +724,8 @@ STRXFRM (STRING_TYPE *dest, const STRING_TYPE *src, size_t n, __locale_t l)

   do
     {
-      int32_t tmp = findidx (l_data.table, l_data.indirect, l_data.extra, &cur,
-			     -1);
+      int32_t tmp = findidx (l_data.encoding, l_data.table, l_data.indirect,
+			     l_data.extra, &cur, -1);
       rulearr[idxmax] = tmp >> 24;
       idxarr[idxmax] = tmp & 0xffffff;


--------------030900000606020402020804
Content-Type: text/plain; charset=UTF-8;
 name="wikipedia-th#en_US.UTF-8"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="wikipedia-th#en_US.UTF-8"
Content-length: 8988

4LmA4LiZ4Lia4Li04Lin4Lil4Liy4Lib4Li5IOC5gOC4m+C5h+C4meC4i+C4
suC4geC4i+C4ueC5gOC4m+C4reC4o+C5jOC5guC4meC4p+C4suC5geC4peC4
sOC5gOC4meC4muC4tOC4p+C4peC4suC4peC4oeC4nuC4seC4peC4i+C4suC4
o+C5jOC5g+C4meC4geC4peC4uOC5iOC4oeC4lOC4suC4p+C4p+C4seC4pwrg
uYDguJnguJrguLTguKfguKXguLLguJnguLXguYnguYTguJTguYnguKPguLHg
uJrguIHguLLguKPguKrguLHguIfguYDguIHguJXguYLguJTguKLguIjguK3g
uKvguYzguJkg4LmA4Lia4Lin4Li04LiqIOC5g+C4meC4m+C4tSDguJ4u4Lio
LiAyMjc0CuC4i+C4tuC5iOC4h+C4quC4reC4lOC4hOC4peC5ieC4reC4h+C4
geC4seC4muC4geC4suC4o+C4muC4seC4meC4l+C4tuC4geC5gOC4q+C4leC4
uOC4geC4suC4o+C4k+C5jOC4i+C4ueC5gOC4m+C4reC4o+C5jOC5guC4meC4
p+C4suC4quC4p+C5iOC4suC4h+C5guC4lOC4ouC4meC4seC4geC4lOC4suC4
o+C4suC4qOC4suC4quC4leC4o+C5jOC4iuC4suC4p+C4iOC4teC4meC5geC4
peC4sOC4iuC4suC4p+C4reC4suC4q+C4o+C4seC4muC5g+C4mQrguJ4u4Lio
LiAxNTk3IOC4l+C4teC5iOC4o+C4sOC4lOC4seC4muC4o+C4seC4h+C4quC4
teC5gOC4reC4geC4i+C5jOC5geC4peC4sOC4o+C4seC4h+C4quC4teC5geC4
geC4oeC4oeC4suC4quC4ueC4h+C4geC4p+C5iOC4siAzMCDguIHguLTguYLg
uKXguK3guLTguYDguKXguYfguIHguJXguKPguK3guJnguYLguKfguKXguJXg
uYwK4LmA4LiZ4Lia4Li04Lin4Lil4Liy4Lib4Li54LmA4Lib4LmH4LiZ4LmB
4Lir4Lil4LmI4LiH4Lie4Lil4Lix4LiH4LiH4Liy4LiZ4LiX4Li14LmI4LmA
4LiC4LmJ4Lih4LiX4Li14LmI4Liq4Li44LiU4Lia4LiZ4LiX4LmJ4Lit4LiH
4Lif4LmJ4Liy4Lih4Liy4Lit4Lii4LmI4Liy4LiH4LiV4LmI4Lit4LmA4LiZ
4Li34LmI4Lit4LiHIOC5guC4lOC4ouC4quC4suC4oeC4suC4o+C4luC4p+C4
seC4lOC4n+C4peC4seC4geC4i+C5jOC5hOC4lOC5ieC4luC4tuC4h+C4quC4
ueC4h+C4geC4p+C5iOC4sgoxMDEyIOC4reC4tOC5gOC4peC5h+C4geC4leC4
o+C4reC4meC5guC4p+C4peC4leC5jCDguYDguJnguJrguLTguKfguKXguLLg
uJvguLnguJXguLHguYnguIfguK3guKLguLnguYjguKvguYjguLLguIfguIjg
uLLguIHguYLguKXguIEgNiw1MDAg4Lib4Li14LmB4Liq4LiHICgyIOC4geC4
tOC5guC4peC4nuC4suC4o+C5jOC5gOC4i+C4gSkK4Lih4Li14LmA4Liq4LmJ
4LiZ4Lic4LmI4Liy4LiZ4Lio4Li54LiZ4Lii4LmM4LiB4Lil4Liy4LiHIDEx
IOC4m+C4teC5geC4quC4hyAoMy40IOC4nuC4suC4o+C5jOC5gOC4i+C4gSkg
4LmB4Lil4Liw4LiC4Lii4Liy4Lii4LiV4Lix4Lin4LmD4LiZ4Lit4Lix4LiV
4Lij4LiyIDEsNTAwIOC4geC4tOC5guC4peC5gOC4oeC4leC4o+C4leC5iOC4
reC4p+C4tOC4meC4suC4l+C4tQrguJMg4LmD4LiI4LiB4Lil4Liy4LiH4LmA
4LiZ4Lia4Li04Lin4Lil4Liy4Lib4Li54LmA4Lib4LmH4LiZ4LiX4Li14LmI
4Lit4Lii4Li54LmI4LiC4Lit4LiH4Lie4Lix4Lil4LiL4Liy4Lij4LmM4Lib
4Li5IOC4lOC4suC4p+C4meC4tOC4p+C4leC4o+C4reC4meC4guC4meC4suC4
lOC5gOC4quC5ieC4meC4nOC5iOC4suC4meC4qOC4ueC4meC4ouC5jOC4geC4
peC4suC4hyAyOC0zMCDguIHguLTguYLguKXguYDguKHguJXguKMK4LiL4Li2
4LmI4LiH4Lib4Lil4LiU4Lib4Lil4LmI4Lit4Lii4Lij4Lix4LiH4Liq4Li1
4LiV4Lix4LmJ4LiH4LmB4LiV4LmI4Lij4Lix4LiH4Liq4Li14LmB4LiB4Lih
4Lih4Liy4LmE4Lib4LiI4LiZ4LiW4Li24LiH4LiE4Lil4Li34LmI4LiZ4Lin
4Li04LiX4Lii4Li44LiU4LmJ4Lin4Lii4Lit4Lix4LiV4Lij4Liy4LiB4Liy
4Lij4Lir4Lih4Li44LiZIDMwLjIg4Lij4Lit4Lia4LiV4LmI4Lit4Lin4Li0
4LiZ4Liy4LiX4Li1CuC5gOC4meC4muC4tOC4p+C4peC4suC4m+C4ueC5gOC4
m+C5h+C4meC4p+C4seC4leC4luC4uOC4l+C4suC4h+C4lOC4suC4o+C4suC4
qOC4suC4quC4leC4o+C5jOC4p+C4seC4leC4luC4uOC5geC4o+C4geC4l+C4
teC5iOC4quC4suC4oeC4suC4o+C4luC4o+C4sOC4muC4uOC5hOC4lOC5ieC4
iOC4suC4geC4geC4suC4o+C4o+C4sOC5gOC4muC4tOC4lOC4i+C4ueC5gOC4
m+C4reC4o+C5jOC5guC4meC4p+C4suC5g+C4meC4m+C4o+C4sOC4p+C4seC4
leC4tOC4qOC4suC4quC4leC4o+C5jArguYDguJnguJrguLTguKfguKXguLLg
uJnguLXguYnguJfguLPguJXguLHguKfguYDguKrguKHguLfguK3guJnguKvg
uJnguLbguYjguIfguYHguKvguKXguYjguIfguIHguLPguYDguJnguLTguJTg
uKPguLHguIfguKrguLXguKrguLPguKvguKPguLHguJrguIHguLLguKPguKjg
uLbguIHguKnguLLguYDguJfguKvguYzguJ/guLLguIHguJ/guYnguLLguJfg
uLXguYjguYDguITguKXguLfguYjguK3guJnguJzguYjguLLguJnguJXguLHg
uKfguKHguLHguJkK4LmD4LiZ4LiK4LmI4Lin4LiH4Lib4Li14LieLuC4qC4g
MjQ5MyDguYHguKXguLAgMjUxMgrguKHguLXguIHguLLguKPguJfguLPguYHg
uJzguJnguKDguLnguKHguLTguYLguITguYLguKPguJnguLLguILguK3guIfg
uJTguKfguIfguK3guLLguJfguLTguJXguKLguYzguILguLbguYnguJnguIjg
uLLguIHguIHguLLguKPguYDguJ3guYnguLLguKrguLHguIfguYDguIHguJXg
uITguKXguLfguYjguJnguKfguLTguJfguKLguLjguIjguLLguIHguYDguJng
uJrguLTguKfguKXguLLguJvguLnguJfguLXguYjguJzguYjguLLguJnguIrg
uLHguYnguJnguYLguITguYLguKPguJnguLLguYTguJsK4LmB4Lil4Liw4LmD
4LiZ4Lib4Li1IOC4ni7guKguIDI1NDYg4LmA4Lij4Liy4Liq4Liy4Lih4Liy
4Lij4LiW4Lin4Lix4LiU4LiE4Lin4Liy4Lih4Lir4LiZ4Liy4LiC4Lit4LiH
4LiK4Lix4LmJ4LiZ4Lia4Lij4Lij4Lii4Liy4LiB4Liy4Lio4LiC4Lit4LiH
4LiU4Lin4LiH4LiI4Lix4LiZ4LiX4Lij4LmM4LmE4LiX4LiX4Lix4LiZCuC4
lOC4suC4p+C4muC4o+C4tOC4p+C4suC4o+C4guC4reC4h+C4lOC4suC4p+C5
gOC4quC4suC4o+C5jOC5hOC4lOC5ieC4iOC4suC4geC4geC4suC4o+C4l+C4
teC5iOC4iuC4seC5ieC4meC4muC4o+C4o+C4ouC4suC4geC4suC4qOC4meC4
teC5ieC4geC4teC4lOC4guC4p+C4suC4h+C4o+C4seC4h+C4quC4teC5gOC4
reC4geC4i+C5jOC4iOC4suC4geC5gOC4meC4muC4tOC4p+C4peC4siAo4Lit
4LmI4Liy4LiZ4LiV4LmI4LitLi4uKQrguIzguK3guKPguYzguIwg4LmA4Lil
4Lit4LmB4Lih4LmH4LiX4Lij4LmMIOC4meC4seC4geC4p+C4tOC4l+C4ouC4
suC4qOC4suC4quC4leC4o+C5jOC5geC4peC4sOC4nuC4o+C4sOC5guC4o+C4
oeC4seC4meC4hOC4suC4l+C4reC4peC4tOC4gSDguYDguJvguYfguJnguJzg
uLnguYnguYDguKrguJnguK3guYHguJnguKfguITguLTguJTguIHguLLguKPg
uIHguLPguYDguJnguLTguJTguILguK3guIfguYDguK3guIHguKDguJ4K4LiL
4Li24LmI4LiH4LiV4LmI4Lit4Lih4Liy4Lij4Li54LmJ4LiI4Lix4LiB4LiB
4Lix4LiZ4LmD4LiZ4LiK4Li34LmI4LitIOC4l+C4pOC4qeC4juC4teC4muC4
tOC4geC5geC4muC4hyDguYPguJnguYDguJrguLfguYnguK3guIfguYHguKPg
uIHguYDguILguLLguYDguKPguLXguKLguIHguJfguKTguKnguI7guLXguJng
uLXguYnguKfguYjguLIK4Liq4Lih4Lih4LiV4Li04LiQ4Liy4LiZ4LmA4LiB
4Li14LmI4Lii4Lin4LiB4Lix4Lia4Lit4Liw4LiV4Lit4Lih4LmB4Lij4LiB
4LmA4Lij4Li04LmI4LihIChoeXBvdGhlc2lzIG9mIHRoZSBwcmltZXZhbCBh
dG9tKSDguK3guYDguKXguYfguIHguIvguLLguJnguYDguJTguK3guKPguYwK
4Lif4Lij4Li14LiU4LmB4Lih4LiZCuC4l+C4s+C4geC4suC4o+C4hOC4s+C4
meC4p+C4k+C5geC4muC4muC4iOC4s+C4peC4reC4h+C5guC4lOC4ouC4oeC4
teC4geC4o+C4reC4muC4geC4suC4o+C4nuC4tOC4iOC4suC4o+C4k+C4suC4
reC4ouC4ueC5iOC4muC4meC4nuC4t+C5ieC4meC4kOC4suC4meC4guC4reC4
h+C4l+C4pOC4qeC4juC4teC4quC4seC4oeC4nuC4seC4l+C4mOC4oOC4suC4
nuC4l+C4seC5iOC4p+C5hOC4m+C4guC4reC4h+C4reC4seC4peC5gOC4muC4
tOC4o+C5jOC4lQrguYTguK3guJnguYzguKrguYTguJXguJnguYwg4LiV4LmI
4Lit4Lih4Liy4LmD4LiZ4Lib4Li1IOC4hC7guKguIDE5Mjkg4LmA4Lit4LmH
4LiU4Lin4Li04LiZIOC4ruC4seC4muC5gOC4muC4tOC4peC4hOC5ieC4meC4
nuC4muC4p+C5iOC4sgrguKPguLDguKLguLDguKvguYjguLLguIfguILguK3g
uIfguJTguLLguKPguLLguIjguLHguIHguKPguKHguLXguKrguLHguJTguKrg
uYjguKfguJnguJfguLXguYjguYDguJvguKXguLXguYjguKLguJnguYHguJvg
uKXguIfguKrguLHguKHguJ7guLHguJnguJjguYzguIHguLHguJrguIHguLLg
uKPguYDguITguKXguLfguYjguK3guJnguYTguJvguJfguLLguIfguYHguJTg
uIcK4LiB4Liy4Lij4Liq4Lix4LiH4LmA4LiB4LiV4LiB4Liy4Lij4LiT4LmM
4LiZ4Li14LmJ4Lia4LmI4LiH4LiK4Li14LmJ4Lin4LmI4LiyIOC4lOC4suC4
o+C4suC4iOC4seC4geC4o+C5geC4peC4sOC4geC4o+C4sOC4iOC4uOC4geC4
lOC4suC4p+C4reC4seC4meC4q+C5iOC4suC4h+C5hOC4geC4peC4geC4s+C4
peC4seC4h+C5gOC4hOC4peC4t+C5iOC4reC4meC4l+C4teC5iOC4reC4reC4
geC4iOC4suC4geC4iOC4uOC4lOC4quC4seC4h+C5gOC4geC4lQrguIvguLbg
uYjguIfguKvguKHguLLguKLguITguKfguLLguKHguKfguYjguLLguYDguK3g
uIHguKDguJ7guIHguLPguKXguLHguIfguILguKLguLLguKLguJXguLHguKcg
4Lii4Li04LmI4LiH4LiV4Liz4LmB4Lir4LiZ4LmI4LiH4LiU4Liy4Lij4Liy
4LiI4Lix4LiB4Lij4LmE4LiB4Lil4Lii4Li04LmI4LiH4LiC4Li24LmJ4LiZ
CuC4hOC4p+C4suC4oeC5gOC4o+C5h+C4p+C4m+C4o+C4suC4geC4j+C4geC5
h+C4ouC4tOC5iOC4h+C5gOC4nuC4tOC5iOC4oeC4oeC4suC4geC4guC4tuC5
ieC4mSDguKvguLLguIHguYDguK3guIHguKDguJ7guYPguJnguJvguLHguIjg
uIjguLjguJrguLHguJnguIHguLPguKXguLHguIfguILguKLguLLguKLguJXg
uLHguKcg4LmB4Liq4LiU4LiH4Lin4LmI4Liy4LiB4LmI4Lit4LiZ4Lir4LiZ
4LmJ4Liy4LiZ4Li14LmJCuC5gOC4reC4geC4oOC4nuC4ouC5iOC4reC4oeC4
oeC4teC4guC4meC4suC4lOC5gOC4peC5h+C4geC4geC4p+C5iOC4siDguKvg
uJnguLLguYHguJnguYjguJnguIHguKfguYjguLIg4LmB4Lil4Liw4Lij4LmJ
4Lit4LiZ4LiB4Lin4LmI4Liy4LiX4Li14LmI4LmA4Lib4LmH4LiZ4Lit4Lii
4Li54LmICuC5geC4meC4p+C4hOC4tOC4lOC4meC4teC5ieC4oeC4teC4geC4
suC4o+C4nuC4tOC4iOC4suC4o+C4k+C4suC4reC4ouC5iOC4suC4h+C4peC4
sOC5gOC4reC4teC4ouC4lOC4ouC5ieC4reC4meC5hOC4m+C4iOC4meC4luC4
tuC4h+C4o+C4sOC4lOC4seC4muC4hOC4p+C4suC4oeC4q+C4meC4suC5geC4
meC5iOC4meC5geC4peC4sOC4reC4uOC4k+C4q+C4oOC4ueC4oeC4tOC4l+C4
teC5iOC4iOC4uOC4lOC4quC4ueC4h+C4quC4uOC4lArguYHguKXguLDguJzg
uKXguKrguKPguLjguJvguJfguLXguYjguYTguJTguYnguIHguYfguKrguK3g
uJTguITguKXguYnguK3guIfguK3guKLguYjguLLguIfguKLguLTguYjguIfg
uIHguLHguJrguJzguKXguIjguLLguIHguIHguLLguKPguKrguLHguIfguYDg
uIHguJXguIHguLLguKPguJPguYwK4LiX4Lin4LmI4Liy4LiB4Liy4Lij4LmA
4Lie4Li04LmI4Lih4LiC4Lit4LiH4Lit4Lix4LiV4Lij4Liy4LmA4Lij4LmI
4LiH4Lih4Li14LiC4LmJ4Lit4LiI4Liz4LiB4Lix4LiU4LmD4LiZ4LiB4Liy
4Lij4LiV4Lij4Lin4LiI4Liq4Lit4Lia4Liq4Lig4Liy4Lin4Liw4Lie4Lil
4Lix4LiH4LiH4Liy4LiZ4LiX4Li14LmI4Liq4Li54LiH4LiC4LiZ4Liy4LiU
4LiZ4Lix4LmJ4LiZCuC4q+C4suC4geC5hOC4oeC5iOC4oeC4teC4guC5ieC4
reC4oeC4ueC4peC4reC4t+C5iOC4meC4l+C4teC5iOC4iuC5iOC4p+C4ouC4
ouC4t+C4meC4ouC4seC4meC4quC4oOC4suC4p+C4sOC5gOC4o+C4tOC5iOC4
oeC4leC5ieC4meC4iuC4seC5iOC4p+C4guC4k+C4sOC4geC5iOC4reC4meC4
geC4suC4o+C4o+C4sOC5gOC4muC4tOC4lArguKXguLPguJ7guLHguIfguJfg
uKTguKnguI7guLXguJrguLTguIHguYHguJrguIfguIHguYfguKLguLHguIfg
uYTguKHguYjguKrguLLguKHguLLguKPguJbguYPguIrguYnguK3guJjguLTg
uJrguLLguKLguKrguKDguLLguKfguLDguYDguKPguLTguYjguKHguJXguYng
uJnguYTguJTguYkK4Lih4Lix4LiZ4LmA4Lie4Li14Lii4LiH4Lit4LiY4Li0
4Lia4Liy4Lii4LiB4Lij4Liw4Lia4Lin4LiZ4LiB4Liy4Lij4LmA4Lib4Lil
4Li14LmI4Lii4LiZ4LmB4Lib4Lil4LiH4LiC4Lit4LiH4LmA4Lit4LiB4Lig
4Lie4LiX4Li14LmI4LmA4LiB4Li04LiU4LiC4Li24LmJ4LiZ4Lir4Lil4Lix
4LiH4LiI4Liy4LiB4Liq4Lig4Liy4Lin4Liw4LmA4Lij4Li04LmI4Lih4LiV
4LmJ4LiZ4LmA4LiX4LmI4Liy4LiZ4Lix4LmJ4LiZCijguK3guYjguLLguJng
uJXguYjguK0uLi4pCgo=

--------------030900000606020402020804--