From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aurelien@aurel32.net>
Received: from hall.aurel32.net (hall.aurel32.net [IPv6:2001:bc8:30d7:100::1])
 by sourceware.org (Postfix) with ESMTPS id 6AFD03857815
 for <libc-stable@sourceware.org>; Sat, 10 Oct 2020 12:53:26 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6AFD03857815
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=aurel32.net
Authentication-Results: sourceware.org;
 spf=none smtp.mailfrom=aurelien@aurel32.net
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=aurel32.net
 ; s=202004.hall;
 h=Content-Transfer-Encoding:MIME-Version:Message-Id:Date:
 Subject:Cc:To:From:Content-Type:From:Reply-To:Subject:Content-ID:
 Content-Description:In-Reply-To:References:X-Debbugs-Cc;
 bh=ZL1pAI43NsNIH+wg56KzntaTVt5QEJYfgxPcFYPKnqY=; b=rJUkcxiRWe9Ysv3zFkUF6wDW8p
 qZ2PikCbdaj9Wtcga4A1FEVmFSR/ypVnUMct6HQPZ11ZIBg91mvByDUOFHrqnOMTlmfPoCrn4WqiR
 3OHMcSCTi9Q7h/V6tanu0q1T35Qpw8Za8DQIlDLcZtGy5S4rDSvLUlHOy9hRWaxnoKbfvm3m6HXWs
 9Eg9A5AbKZVFWQ57ytEtMs7R1NyoEPfPRKxxLrPrT/De+r7pceYXRsF/RaqAiKmSARrrwwCcGtpam
 pYjxAKTmsuNi9IwuI1BHo0JotLEpVqis751ouiBXNnyxeKlttp5q4y7CEhMNjS/d5beXtTrTlEUek
 0unNpJCA==;
Received: from [2a01:e35:2fdd:a4e1:fe91:fc89:bc43:b814] (helo=ohm.rr44.fr)
 by hall.aurel32.net with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from <aurelien@aurel32.net>)
 id 1kRENJ-0004SB-Cy; Sat, 10 Oct 2020 14:53:25 +0200
Received: from aurel32 by ohm.rr44.fr with local (Exim 4.94)
 (envelope-from <aurelien@aurel32.net>)
 id 1kRENI-00CYkO-NS; Sat, 10 Oct 2020 14:53:24 +0200
From: Aurelien Jarno <aurelien@aurel32.net>
To: libc-stable@sourceware.org
Cc: Arjun Shankar <arjun@redhat.com>
Subject: [2.31 COMMITTED] intl: Handle translation output codesets with
 suffixes [BZ #26383]
Date: Sat, 10 Oct 2020 14:53:16 +0200
Message-Id: <20201010125316.2993445-1-aurelien@aurel32.net>
X-Mailer: git-send-email 2.28.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_PASS, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: libc-stable@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-stable mailing list <libc-stable.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-stable>,
 <mailto:libc-stable-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-stable/>
List-Help: <mailto:libc-stable-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-stable>,
 <mailto:libc-stable-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Sat, 10 Oct 2020 12:53:29 -0000

From: Arjun Shankar <arjun@redhat.com>

Commit 91927b7c7643 (Rewrite iconv option parsing [BZ #19519]) did not
handle cases where the output codeset for translations (via the `gettext'
family of functions) might have a caller specified encoding suffix such as
TRANSLIT or IGNORE.  This led to a regression where translations did not
work when the codeset had a suffix.

This commit fixes the above issue by parsing any suffixes passed to
__dcigettext and adds two new test-cases to intl/tst-codeset.c to
verify correct behaviour.  The iconv-internal function __gconv_create_spec
and the static iconv-internal function gconv_destroy_spec are now visible
internally within glibc and used in intl/dcigettext.c.

(cherry picked from commit 7d4ec75e111291851620c6aa2c4460647b7fd50d)
---
 NEWS                  |  1 +
 iconv/Versions        |  4 +++-
 iconv/gconv_charset.c | 10 ++++++++++
 iconv/gconv_charset.h | 27 ---------------------------
 iconv/gconv_int.h     | 21 +++++++++++++++++++++
 iconv/iconv_open.c    |  2 +-
 iconv/iconv_prog.c    |  2 +-
 intl/dcigettext.c     | 17 ++++++++++-------
 intl/tst-codeset.c    | 34 ++++++++++++++--------------------
 9 files changed, 61 insertions(+), 57 deletions(-)

diff --git a/NEWS b/NEWS
index d3ffb82294..c3901287d0 100644
--- a/NEWS
+++ b/NEWS
@@ -28,6 +28,7 @@ The following bugs are resolved with this release:
   [25976] nss_compat: internal_end*ent may clobber errno, hiding ERANGE
   [26248] Incorrect argument types for INLINE_SETXID_SYSCALL
   [26332] Incorrect cache line size load causes memory corruption in memset
+  [26383] bind_textdomain_codeset doesn't accept //TRANSLIT anymore
 
 Security related changes:
 
diff --git a/iconv/Versions b/iconv/Versions
index 8a5f4cf780..d51af52fa3 100644
--- a/iconv/Versions
+++ b/iconv/Versions
@@ -6,7 +6,9 @@ libc {
   GLIBC_PRIVATE {
     # functions shared with iconv program
     __gconv_get_alias_db; __gconv_get_cache; __gconv_get_modules_db;
-    __gconv_open; __gconv_create_spec;
+
+    # functions used elsewhere in glibc
+    __gconv_open; __gconv_create_spec; __gconv_destroy_spec;
 
     # function used by the gconv modules
     __gconv_transliterate;
diff --git a/iconv/gconv_charset.c b/iconv/gconv_charset.c
index 6ccd0773cc..4ba0aa99f5 100644
--- a/iconv/gconv_charset.c
+++ b/iconv/gconv_charset.c
@@ -216,3 +216,13 @@ out:
   return ret;
 }
 libc_hidden_def (__gconv_create_spec)
+
+
+void
+__gconv_destroy_spec (struct gconv_spec *conv_spec)
+{
+  free (conv_spec->fromcode);
+  free (conv_spec->tocode);
+  return;
+}
+libc_hidden_def (__gconv_destroy_spec)
diff --git a/iconv/gconv_charset.h b/iconv/gconv_charset.h
index b39b09aea1..e9c122cf7e 100644
--- a/iconv/gconv_charset.h
+++ b/iconv/gconv_charset.h
@@ -48,33 +48,6 @@
 #define GCONV_IGNORE_ERRORS_SUFFIX "IGNORE"
 
 
-/* This function accepts the charset names of the source and destination of the
-   conversion and populates *conv_spec with an equivalent conversion
-   specification that may later be used by __gconv_open.  The charset names
-   might contain options in the form of suffixes that alter the conversion,
-   e.g. "ISO-10646/UTF-8/TRANSLIT".  It processes the charset names, ignoring
-   and truncating any suffix options in fromcode, and processing and truncating
-   any suffix options in tocode.  Supported suffix options ("TRANSLIT" or
-   "IGNORE") when found in tocode lead to the corresponding flag in *conv_spec
-   to be set to true.  Unrecognized suffix options are silently discarded.  If
-   the function succeeds, it returns conv_spec back to the caller.  It returns
-   NULL upon failure.  */
-struct gconv_spec *
-__gconv_create_spec (struct gconv_spec *conv_spec, const char *fromcode,
-                     const char *tocode);
-libc_hidden_proto (__gconv_create_spec)
-
-
-/* This function frees all heap memory allocated by __gconv_create_spec.  */
-static void __attribute__ ((unused))
-gconv_destroy_spec (struct gconv_spec *conv_spec)
-{
-  free (conv_spec->fromcode);
-  free (conv_spec->tocode);
-  return;
-}
-
-
 /* This function copies in-order, characters from the source 's' that are
    either alpha-numeric or one in one of these: "_-.,:/" - into the destination
    'wp' while dropping all other characters.  In the process, it converts all
diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h
index e86938dae7..f721ce30ff 100644
--- a/iconv/gconv_int.h
+++ b/iconv/gconv_int.h
@@ -152,6 +152,27 @@ extern int __gconv_open (struct gconv_spec *conv_spec,
                          __gconv_t *handle, int flags);
 libc_hidden_proto (__gconv_open)
 
+/* This function accepts the charset names of the source and destination of the
+   conversion and populates *conv_spec with an equivalent conversion
+   specification that may later be used by __gconv_open.  The charset names
+   might contain options in the form of suffixes that alter the conversion,
+   e.g. "ISO-10646/UTF-8/TRANSLIT".  It processes the charset names, ignoring
+   and truncating any suffix options in fromcode, and processing and truncating
+   any suffix options in tocode.  Supported suffix options ("TRANSLIT" or
+   "IGNORE") when found in tocode lead to the corresponding flag in *conv_spec
+   to be set to true.  Unrecognized suffix options are silently discarded.  If
+   the function succeeds, it returns conv_spec back to the caller.  It returns
+   NULL upon failure.  */
+extern struct gconv_spec *
+__gconv_create_spec (struct gconv_spec *conv_spec, const char *fromcode,
+                     const char *tocode);
+libc_hidden_proto (__gconv_create_spec)
+
+/* This function frees all heap memory allocated by __gconv_create_spec.  */
+extern void
+__gconv_destroy_spec (struct gconv_spec *conv_spec);
+libc_hidden_proto (__gconv_destroy_spec)
+
 /* Free resources associated with transformation descriptor CD.  */
 extern int __gconv_close (__gconv_t cd)
      attribute_hidden;
diff --git a/iconv/iconv_open.c b/iconv/iconv_open.c
index dd54bc12e0..5b30055c04 100644
--- a/iconv/iconv_open.c
+++ b/iconv/iconv_open.c
@@ -39,7 +39,7 @@ iconv_open (const char *tocode, const char *fromcode)
 
   int res = __gconv_open (&conv_spec, &cd, 0);
 
-  gconv_destroy_spec (&conv_spec);
+  __gconv_destroy_spec (&conv_spec);
 
   if (__builtin_expect (res, __GCONV_OK) != __GCONV_OK)
     {
diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c
index b4334faa57..d59979759c 100644
--- a/iconv/iconv_prog.c
+++ b/iconv/iconv_prog.c
@@ -184,7 +184,7 @@ main (int argc, char *argv[])
       /* Let's see whether we have these coded character sets.  */
       res = __gconv_open (&conv_spec, &cd, 0);
 
-      gconv_destroy_spec (&conv_spec);
+      __gconv_destroy_spec (&conv_spec);
 
       if (res != __GCONV_OK)
 	{
diff --git a/intl/dcigettext.c b/intl/dcigettext.c
index 2e7c662bc7..bd332e71da 100644
--- a/intl/dcigettext.c
+++ b/intl/dcigettext.c
@@ -1120,15 +1120,18 @@ _nl_find_msg (struct loaded_l10nfile *domain_file,
 
 # ifdef _LIBC
 
-		      struct gconv_spec conv_spec
-		        = { .fromcode = norm_add_slashes (charset, ""),
-		            .tocode = norm_add_slashes (outcharset, ""),
-		            /* We always want to use transliteration.  */
-		            .translit = true,
-		            .ignore = false
-		          };
+		      struct gconv_spec conv_spec;
+
+                      __gconv_create_spec (&conv_spec, charset, outcharset);
+
+		      /* We always want to use transliteration.  */
+                      conv_spec.translit = true;
+
 		      int r = __gconv_open (&conv_spec, &convd->conv,
 		                            GCONV_AVOID_NOCONV);
+
+                      __gconv_destroy_spec (&conv_spec);
+
 		      if (__builtin_expect (r != __GCONV_OK, 0))
 			{
 			  /* If the output encoding is the same there is
diff --git a/intl/tst-codeset.c b/intl/tst-codeset.c
index fd70432eca..e9f6e5e09f 100644
--- a/intl/tst-codeset.c
+++ b/intl/tst-codeset.c
@@ -22,13 +22,11 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <support/check.h>
 
 static int
 do_test (void)
 {
-  char *s;
-  int result = 0;
-
   unsetenv ("LANGUAGE");
   unsetenv ("OUTPUT_CHARSET");
   setlocale (LC_ALL, "de_DE.ISO-8859-1");
@@ -36,25 +34,21 @@ do_test (void)
   bindtextdomain ("codeset", OBJPFX "domaindir");
 
   /* Here we expect output in ISO-8859-1.  */
-  s = gettext ("cheese");
-  if (strcmp (s, "K\344se"))
-    {
-      printf ("call 1 returned: %s\n", s);
-      result = 1;
-    }
+  TEST_COMPARE_STRING (gettext ("cheese"), "K\344se");
 
+  /* Here we expect output in UTF-8.  */
   bind_textdomain_codeset ("codeset", "UTF-8");
+  TEST_COMPARE_STRING (gettext ("cheese"), "K\303\244se");
 
-  /* Here we expect output in UTF-8.  */
-  s = gettext ("cheese");
-  if (strcmp (s, "K\303\244se"))
-    {
-      printf ("call 2 returned: %s\n", s);
-      result = 1;
-    }
-
-  return result;
+  /* `a with umlaut' is transliterated to `ae'.  */
+  bind_textdomain_codeset ("codeset", "ASCII//TRANSLIT");
+  TEST_COMPARE_STRING (gettext ("cheese"), "Kaese");
+
+  /* Transliteration also works by default even if not set.  */
+  bind_textdomain_codeset ("codeset", "ASCII");
+  TEST_COMPARE_STRING (gettext ("cheese"), "Kaese");
+
+  return 0;
 }
 
-#define TEST_FUNCTION do_test ()
-#include "../test-skeleton.c"
+#include <support/test-driver.c>
-- 
2.28.0