public inbox for libc-alpha@sourceware.org
* [PATCH 13/14] Fix ucs4le_internal_loop in error case.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
  2016-02-23  9:21 ` [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules Stefan Liebler
@ 2016-02-23  9:21 ` Stefan Liebler
  2016-02-23 17:42   ` Joseph Myers
  2016-02-23  9:21 ` [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones Stefan Liebler
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:21 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

When converting from UCS4LE to INTERNAL, each input value is checked for being
too large; if it is, the iconv() call sets errno to EILSEQ. In this case the
inbuf argument of the iconv() call should point to the invalid character, but
it still points to the beginning of the input buffer.
This patch therefore updates the pointers inptrp and outptrp before returning
in this error case.
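
For illustration only, here is a minimal standalone sketch of the behaviour
described above. It is not the glibc code: the names sketch_ucs4le_loop and
enum sketch_status are hypothetical, and the 0x7fffffff limit is an assumption
of this sketch. It only shows why the caller-visible pointers have to be
written back before the early return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; glibc uses __GCONV_* values instead.  */
enum sketch_status { SKETCH_OK, SKETCH_ILLEGAL_INPUT };

/* Convert little-endian UCS4 units to host-order uint32_t values.
   On return *inptrp and *outptrp describe how far the conversion got,
   including on the error path.  */
static enum sketch_status
sketch_ucs4le_loop (const unsigned char **inptrp, const unsigned char *inend,
                    uint32_t **outptrp, uint32_t *outend)
{
  const unsigned char *inptr = *inptrp;
  uint32_t *outptr = *outptrp;

  assert (inptr <= inend && outptr <= outend);

  while (inend - inptr >= 4 && outptr < outend)
    {
      uint32_t inval = (uint32_t) inptr[0]
                       | ((uint32_t) inptr[1] << 8)
                       | ((uint32_t) inptr[2] << 16)
                       | ((uint32_t) inptr[3] << 24);

      /* Assumed limit for this sketch.  */
      if (inval > 0x7fffffff)
        {
          /* Without these two stores the caller would still see the
             start of the buffers.  */
          *inptrp = inptr;
          *outptrp = outptr;
          return SKETCH_ILLEGAL_INPUT;
        }

      *outptr++ = inval;
      inptr += 4;
    }

  *inptrp = inptr;
  *outptrp = outptr;
  return SKETCH_OK;
}
```

With the input 0x41, 0xffffffff, 0x42 the sketch stops with the input pointer
four bytes into the buffer and one value already stored in the output.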

ChangeLog:

	* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
	outptrp in case of an illegal input.
---
 iconv/gconv_simple.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index 5412bd6..f66bf34 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -638,6 +638,8 @@ ucs4le_internal_loop (struct __gconv_step *step,
 	      continue;
 	    }
 
+	  *inptrp = inptr;
+	  *outptrp = outptr;
 	  return __GCONV_ILLEGAL_INPUT;
 	}
 
-- 
2.3.0


* [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
  2016-02-23  9:21 ` [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules Stefan Liebler
  2016-02-23  9:21 ` [PATCH 13/14] Fix ucs4le_internal_loop in error case Stefan Liebler
@ 2016-02-23  9:21 ` Stefan Liebler
  2016-04-15 10:27   ` Florian Weimer
  2016-02-23  9:22 ` [PATCH 04/14] S390: Optimize 8bit-generic iconv modules Stefan Liebler
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:21 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch changes the order in gconv-modules. The s390-specific modules
are now mentioned before the common ones, because otherwise they are not
used in all possible conversions. E.g. the conversion step from
INTERNAL to UTF-16 used the common UTF-16.so module instead of
UTF16_UTF32_Z9.so.

The awk script parses the source gconv-modules file and copies it
line by line. The s390 modules are emitted between the header comments
and the first common-code module.
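
As a side note, the column padding that the awk helper print_val performs can
be sketched in C as follows. This is only an illustration under the assumption
of 8-column tab stops; the names sketch_nr_tabs and sketch_print_val are
hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of tabs to emit after a value of length LEN so that the next
   field starts at column WIDTH (assuming 8-column tab stops and WIDTH
   being a multiple of 8).  A value that fills or overflows the column
   still gets one tab as separator.  */
static size_t
sketch_nr_tabs (size_t len, size_t width)
{
  if (len >= width)
    return 1;
  size_t rest = width - len;
  return rest / 8 + (rest % 8 != 0);
}

/* Write VAL followed by its padding tabs into OUT (of size OUTSZ) and
   return the number of characters written.  */
static size_t
sketch_print_val (char *out, size_t outsz, const char *val, size_t width)
{
  size_t len = strlen (val);
  size_t tabs = sketch_nr_tabs (len, width);

  assert (len + tabs < outsz);
  memcpy (out, val, len);
  memset (out + len, '\t', tabs);
  out[len + tabs] = '\0';
  return len + tabs;
}
```

For example, "ISO-8859-1//" (12 characters) in a 24-column field is followed by
two tabs, and "module" in an 8-column field by one.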

ChangeLog:

	* sysdeps/s390/s390-64/Makefile ($(objpfx)gconv-modules-s390):
	Mention s390-specific gconv-modules before common ones.
---
 sysdeps/s390/s390-64/Makefile | 82 ++++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 33 deletions(-)

diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index de249a7..d1ee59d 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -40,39 +40,55 @@ $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
 	$(do-install-program)
 
 $(objpfx)gconv-modules-s390: gconv-modules
-	cp $< $@
-	echo >> $@
-	echo "# S/390 hardware accelerated modules" >> $@
-	echo -n "module	ISO-8859-1//		IBM037//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	IBM037//		ISO-8859-1//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32BE//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		INTERNAL	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16BE//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
+	${AWK} 'BEGIN { emitted = 0 } \
+	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
+	!emitted { emit_s390_modules(); emitted = 1; print; } \
+	function emit_s390_modules() { \
+	  # Emit header line. \
+	  print "# S/390 hardware accelerated modules"; \
+	  print_val("#", 8); \
+	  print_val("from", 24); \
+	  print_val("to", 24); \
+	  print_val("module", 24); \
+	  printf "cost\n"; \
+	  # Emit s390-specific modules. \
+	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
+	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
+	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
+	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
+	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
+	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
+	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
+	  printf "\n# Default glibc modules\n"; \
+	} \
+	function modul(from, to, file, cost) { \
+	  print_val("module", 8); \
+	  print_val(from, 24); \
+	  print_val(to, 24); \
+	  print_val(file, 24); \
+	  if (cost == 0) cost = 1; \
+	  printf "%d\n", cost; \
+	} \
+	function print_val(val, width) { \
+	  # Emit value followed by tabs. \
+	  printf "%s", val; \
+	  len = length(val); \
+	  if (len < width) { \
+	    len = width - len; \
+	    nr_tabs = len / 8; \
+	    if (len % 8 != 0) nr_tabs++; \
+	  } \
+	  else nr_tabs = 1; \
+	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
+	}' < $< > $@
 
 GCONV_MODULES = gconv-modules-s390
 
-- 
2.3.0


* [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
@ 2016-02-23  9:21 ` Stefan Liebler
  2016-04-14 14:16   ` Stefan Liebler
  2016-02-23  9:21 ` [PATCH 13/14] Fix ucs4le_internal_loop in error case Stefan Liebler
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:21 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch introduces a way to provide an architecture-dependent gconv-modules
file. Before this patch, the gconv-modules file was normally installed from
src-dir/iconvdata/gconv-modules. The S390 Makefile overrode the installation
recipe (with a make warning) in order to install the gconv-modules-s390 file
from the build directory.
The iconvdata/Makefile provides another recipe, which copies the gconv-modules
file from the source to the build directory, where it is used by the testcases.
Thus the testcases did not use the newly built s390 modules.

This patch uses build-dir/iconvdata/gconv-modules for installation.
If the makefile variable GCONV_MODULES is not defined, the gconv-modules file
is copied from the source to the build directory.
If an architecture wants to create its own gconv-modules file, the variable
GCONV_MODULES is set to the name of the architecture-dependent gconv-modules
file in the build directory, which has to be created by a recipe in
sysdeps/.../Makefile. The iconvdata/Makefile then copies this file to
build-dir/iconvdata/gconv-modules, which is used for installation and tests.

This way, the s390 Makefile does not need to override the recipe for
gconv-modules, and the warning is no longer emitted.

ChangeLog:

    * iconvdata/Makefile (GCONV_MODULES): New variable, which can
    be set by sysdeps Makefile.
    ($(inst_gconvdir)/gconv-modules):
    Install file from $(objpfx)gconv-modules.
    ($(objpfx)gconv-modules): Copy File from src-dir or from
    build-dir with file-name specified by GCONV_MODULES.
    * sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
    Deleted.
    (GCONV_MODULES): New variable.
---
 iconvdata/Makefile            | 15 +++++++++++++--
 sysdeps/s390/s390-64/Makefile | 17 ++---------------
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index 357530b..1ac1a5c 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx), $(generated-modules:=.h))
 $(addprefix $(inst_gconvdir)/, $(modules.so)): \
     $(inst_gconvdir)/%: $(objpfx)% $(+force)
 	$(do-install-program)
-$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
+$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
 	$(do-install)
 ifeq (no,$(cross-compiling))
 # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
@@ -332,6 +332,17 @@ tst-tables-clean:
 	-rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
 
 ifdef objpfx
+# Override GCONV_MODULES file name and provide a Makefile recipe,
+# if you want to create your own version.
+ifndef GCONV_MODULES
+# Copy gconv-modules from src-tree for tests and installation.
 $(objpfx)gconv-modules: gconv-modules
-	cp $^ $@
+	cp $< $@
+else
+generated += $(GCONV_MODULES)
+
+# Copy overridden GCONV_MODULES file to gconv-modules for tests and installation.
+$(objpfx)gconv-modules: $(objpfx)$(GCONV_MODULES)
+	cp $< $@
+endif
 endif
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index ce4f0c5..de249a7 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -39,7 +39,7 @@ $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
 $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
 	$(do-install-program)
 
-$(objpfx)gconv-modules-s390: gconv-modules $(+force)
+$(objpfx)gconv-modules-s390: gconv-modules
 	cp $< $@
 	echo >> $@
 	echo "# S/390 hardware accelerated modules" >> $@
@@ -74,19 +74,6 @@ $(objpfx)gconv-modules-s390: gconv-modules $(+force)
 	echo -n "module	ISO-10646/UTF8/		UTF-16BE//	" >> $@
 	echo "	UTF8_UTF16_Z9		1" >> $@
 
-$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
-	$(do-install)
-ifeq (no,$(cross-compiling))
-# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
-# if this libc has more gconv modules than the previously installed one.
-	if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
-	   LC_ALL=C \
-	   $(rtld-prefix) \
-	   $(common-objpfx)iconv/iconvconfig \
-	     $(addprefix --prefix=,$(install_root)); \
-	fi
-else
-	@echo '*@*@*@ You should recreate $(inst_gconvdir)/gconv-modules.cache'
-endif
+GCONV_MODULES = gconv-modules-s390
 
 endif
-- 
2.3.0


* [PATCH 04/14] S390: Optimize 8bit-generic iconv modules.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (2 preceding siblings ...)
  2016-02-23  9:21 ` [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-15 13:05   ` Florian Weimer
  2016-02-23  9:22 ` [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module Stefan Liebler
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch introduces an s390-specific 8bit-generic.c file which provides an
optimized version for z13 with translate and vector instructions, which is
chosen at runtime via ifunc.
If the build environment lacks vector support, iconvdata/8bit-generic.c
is used without any change. Otherwise iconvdata/8bit-generic.c is used to
create conversion loop routines without vector instructions as a fallback in
case vector instructions aren't available at runtime.

The vector routines can only be used with charsets where the maximum UCS4 value
fits in 1 byte. Then the hardware translate instruction is used
to translate between up to 256 generic characters and "1 byte UCS4"
characters at once. The vector instructions are used to convert between
the "1 byte UCS4" and UCS4.
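
To make the idea easier to follow, here is a portable scalar sketch of the two
stages (byte-to-byte translation, then widening to 4 bytes). It is not the
patch code and uses no s390 instructions; the function names are hypothetical.
The hole check mirrors the one described in the patch: a table entry of zero
for a non-zero input byte is treated as an invalid character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow a 256-entry table of UCS4 values to one byte per entry.
   Only valid when every entry is below 256.  */
static void
sketch_make_to_ucs1 (const uint32_t to_ucs4[256], uint8_t to_ucs1[256])
{
  for (size_t i = 0; i < 256; i++)
    {
      assert (to_ucs4[i] < 256);
      to_ucs1[i] = (uint8_t) to_ucs4[i];
    }
}

/* Scalar stand-in for the translate step followed by the widening step:
   look each input byte up in the byte table, then store the result as a
   4-byte value.  Returns the number of characters converted; stops at
   the first input byte without a mapping.  */
static size_t
sketch_from_8bit (const uint8_t *in, size_t n, const uint8_t to_ucs1[256],
                  uint32_t *out)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint8_t ch = to_ucs1[in[i]];

      /* Zero result for a non-zero input marks a hole in the charset.  */
      if (ch == 0 && in[i] != 0)
        break;
      out[i] = ch;
    }
  return i;
}
```

The vector variant does the same work on 16 to 256 input bytes per iteration
and falls back to the per-character C code for the remainder.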

ChangeLog:

	* sysdeps/s390/multiarch/8bit-generic.c: New File.
	* sysdeps/s390/multiarch/iconv/skeleton.c: Likewise.
---
 sysdeps/s390/multiarch/8bit-generic.c   | 485 ++++++++++++++++++++++++++++++++
 sysdeps/s390/multiarch/iconv/skeleton.c |  21 ++
 2 files changed, 506 insertions(+)
 create mode 100644 sysdeps/s390/multiarch/8bit-generic.c
 create mode 100644 sysdeps/s390/multiarch/iconv/skeleton.c

diff --git a/sysdeps/s390/multiarch/8bit-generic.c b/sysdeps/s390/multiarch/8bit-generic.c
new file mode 100644
index 0000000..66a0537
--- /dev/null
+++ b/sysdeps/s390/multiarch/8bit-generic.c
@@ -0,0 +1,485 @@
+/* Generic conversion to and from 8bit charsets - S390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+/* Generate the conversion loop routines without vector instructions as
+   fallback, if vector instructions aren't available at runtime.  */
+# define IGNORE_ICONV_SKELETON
+# define from_generic __from_generic_c
+# define to_generic __to_generic_c
+# include "iconvdata/8bit-generic.c"
+# undef IGNORE_ICONV_SKELETON
+# undef from_generic
+# undef to_generic
+
+/* Generate the conversion routines with vector instructions. The vector
+   routines can only be used with charsets where the maximum UCS4 value
+   fits in 1 byte size. Then the hardware translate-instruction is used
+   to translate between multiple generic characters and "1 byte UCS4"
+   characters at once. The vector instructions are used to convert between
+   the "1 byte UCS4" and UCS4.  */
+# include <unistd.h>
+# include <dl-procinfo.h>
+
+static uint8_t to_ucs1[256];
+
+__attribute__ ((constructor)) void
+create_1byte_table (void)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256)
+    {
+      /* The translation-table to_ucs4 translates a 1 byte character
+	 to the corresponding internal 4 byte UCS4 value. The from_ucs4
+	 table contains all possible translations from an UCS4 character
+	 to the 1 byte generic character. If this table contains only up
+	 to 256 entries, then the highest UCS4 value can be stored in 1 byte
+	 and the translation for up to 256 characters can be done with one
+	 tr-instruction. Afterwards the translated characters are enlarged
+	 to 4 bytes. The tr-instruction needs a translation table from one
+	 to one byte. Thus "to_ucs4" with its 4 byte entries is converted to
+	 "to_ucs1" with 1 byte entries. from_ucs4 already contains 1 byte
+	 entries.  */
+
+      if (GLRO (dl_hwcap) & HWCAP_S390_VX)
+	{
+	  /* Convert to_ucs4 */
+	  uint8_t *dstptr = to_ucs1;
+	  const uint32_t *srcptr = to_ucs4;
+
+	  size_t blocks;
+	  /* Convert in blocks of 256 bytes to ones with 64 bytes.  */
+	  __asm__ volatile (".machine push\n\t"
+			    ".machine \"z13\"\n\t"
+			    ".machinemode \"zarch_nohighgprs\"\n\t"
+			    "lghi %[R_I],4\n\t"
+			    "0:\n\t"
+			    /* Load 256 bytes.  */
+			    "vlm %%v16,%%v31,0(%[R_SRC])\n\t"
+			    "la %[R_SRC],256(%[R_SRC])\n\t"
+			    /* Shorten to byte values.  */
+			    "vpkf %%v16,%%v16,%%v17\n\t"
+			    "vpkf %%v18,%%v18,%%v19\n\t"
+			    "vpkh %%v16,%%v16,%%v18\n\t"
+			    "vpkf %%v20,%%v20,%%v21\n\t"
+			    "vpkf %%v22,%%v22,%%v23\n\t"
+			    "vpkh %%v17,%%v20,%%v22\n\t"
+			    "vpkf %%v24,%%v24,%%v25\n\t"
+			    "vpkf %%v26,%%v26,%%v27\n\t"
+			    "vpkh %%v18,%%v24,%%v26\n\t"
+			    "vpkf %%v28,%%v28,%%v29\n\t"
+			    "vpkf %%v30,%%v30,%%v31\n\t"
+			    "vpkh %%v19,%%v28,%%v30\n\t"
+			    /* Store 64 bytes to buf.  */
+			    "vstm %%v16,%%v19,0(%[R_DST])\n\t"
+			    "la %[R_DST],64(%[R_DST])\n\t"
+			    "brct %[R_I],0b\n\t"
+			    ".machine pop"
+			    : /* outputs */ [R_DST] "+a" (dstptr)
+			      , [R_SRC] "+a" (srcptr) , [R_I] "=d" (blocks)
+			    : /* inputs */
+			    : /* clobber list */ "memory", "cc"
+			      ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
+			      ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
+			      ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")
+			      ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")
+			      ASM_CLOBBER_VR ("v24") ASM_CLOBBER_VR ("v25")
+			      ASM_CLOBBER_VR ("v26") ASM_CLOBBER_VR ("v27")
+			      ASM_CLOBBER_VR ("v28") ASM_CLOBBER_VR ("v29")
+			      ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")
+			    );
+	}
+    }
+}
+
+# undef FROM_LOOP
+# undef TO_LOOP
+# define FROM_LOOP		__from_generic_vx
+# define TO_LOOP		__to_generic_vx
+
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define ONE_DIRECTION		0
+
+/* First define the conversion function from the 8bit charset to UCS4.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_FROM_ORIG \
+  {									      \
+    uint32_t ch = to_ucs4[*inptr];					      \
+									      \
+    if (HAS_HOLES && __builtin_expect (ch == L'\0', 0) && *inptr != '\0')     \
+      {									      \
+	/* This is an illegal character.  */				      \
+	STANDARD_FROM_LOOP_ERR_HANDLER (1);				      \
+      }									      \
+									      \
+    put32 (outptr, ch);							      \
+    outptr += 4;							      \
+    ++inptr;								      \
+  }
+
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 16, 1)			\
+	|| outend - outptr < 64)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_FROM_ORIG							\
+    else								\
+       {								\
+	 /* Convert 16 ... 256 bytes at once with tr-instruction.  */	\
+	 size_t index;							\
+	 char buf[256];							\
+	 size_t loop_count = (inend - inptr) / 16;			\
+	 if (loop_count > (outend - outptr) / 64)			\
+	   loop_count = (outend - outptr) / 64;				\
+	 if (loop_count > 16)						\
+	   loop_count = 16;						\
+	 __asm__ volatile (".machine push\n\t"				\
+			   ".machine \"z13\"\n\t"			\
+			   ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			   "sllk %[R_I],%[R_LI],4\n\t"			\
+			   "ahi %[R_I],-1\n\t"				\
+			   /* Execute mvc and tr with correct len.  */	\
+			   "exrl %[R_I],21f\n\t"			\
+			   "exrl %[R_I],22f\n\t"			\
+			   /* Post-processing.  */			\
+			   "lghi %[R_I],0\n\t"				\
+			   "vzero %%v0\n\t"				\
+			   "0:\n\t"					\
+			   /* Find invalid character - value is zero.  */ \
+			   "vl %%v16,0(%[R_I],%[R_BUF])\n\t"		\
+			   "vceqbs %%v23,%%v0,%%v16\n\t"		\
+			   "jle 10f\n\t"				\
+			   "1:\n\t"					\
+			   /* Enlarge to UCS4.  */			\
+			   "vuplhb %%v17,%%v16\n\t"			\
+			   "vupllb %%v18,%%v16\n\t"			\
+			   "vuplhh %%v19,%%v17\n\t"			\
+			   "vupllh %%v20,%%v17\n\t"			\
+			   "vuplhh %%v21,%%v18\n\t"			\
+			   "vupllh %%v22,%%v18\n\t"			\
+			   /* Store 64bytes to buf_out.  */		\
+			   "vstm %%v19,%%v22,0(%[R_OUT])\n\t"		\
+			   "aghi %[R_I],16\n\t"				\
+			   "la %[R_OUT],64(%[R_OUT])\n\t"		\
+			   "brct %[R_LI],0b\n\t"			\
+			   "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			   "j 20f\n\t"					\
+			   "21: mvc 0(1,%[R_BUF]),0(%[R_IN])\n\t"	\
+			   "22: tr 0(1,%[R_BUF]),0(%[R_TBL])\n\t"	\
+			   /* Possibly invalid character found.  */	\
+			   "10:\n\t"					\
+			   /* Test if input was zero, too.  */		\
+			   "vl %%v24,0(%[R_I],%[R_IN])\n\t"		\
+			   "vceqb %%v24,%%v0,%%v24\n\t"			\
+			   /* Zeros in buf (v23) and inptr (v24) are marked \
+			      with one bits. After xor, invalid characters \
+			      are marked as one bits. Proceed, if no	\
+			      invalid characters are found.  */		\
+			   "vx %%v24,%%v23,%%v24\n\t"			\
+			   "vfenebs %%v24,%%v24,%%v0\n\t"		\
+			   "jo 1b\n\t"					\
+			   /* Found an invalid translation.		\
+			      Store the preceding chars.  */		\
+			   "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			   "vlgvb %[R_I],%%v24,7\n\t"			\
+			   "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			   "sll %[R_I],2\n\t"				\
+			   "ahi %[R_I],-1\n\t"				\
+			   "jl 20f\n\t"					\
+			   "lgr %[R_LI],%[R_I]\n\t"			\
+			   "vuplhb %%v17,%%v16\n\t"			\
+			   "vuplhh %%v19,%%v17\n\t"			\
+			   "vstl %%v19,%[R_I],0(%[R_OUT])\n\t"		\
+			   "ahi %[R_I],-16\n\t"				\
+			   "jl 11f\n\t"					\
+			   "vupllh %%v20,%%v17\n\t"			\
+			   "vstl %%v20,%[R_I],16(%[R_OUT])\n\t"		\
+			   "ahi %[R_I],-16\n\t"				\
+			   "jl 11f\n\t"					\
+			   "vupllb %%v18,%%v16\n\t"			\
+			   "vuplhh %%v21,%%v18\n\t"			\
+			   "vstl %%v21,%[R_I],32(%[R_OUT])\n\t"		\
+			   "ahi %[R_I],-16\n\t"				\
+			   "jl 11f\n\t"					\
+			   "vupllh %%v22,%%v18\n\t"			\
+			   "vstl %%v22,%[R_I],48(%[R_OUT])\n\t"		\
+			   "11:\n\t"					\
+			   "la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"	\
+			   "20:\n\t"					\
+			   ".machine pop"				\
+			   : /* outputs */ [R_IN] "+a" (inptr)		\
+			     , [R_OUT] "+a" (outptr), [R_I] "=&a" (index) \
+			     , [R_LI] "+a" (loop_count)			\
+			   : /* inputs */ [R_BUF] "a" (buf)		\
+			     , [R_TBL] "a" (to_ucs1)			\
+			   : /* clobber list*/ "memory", "cc"		\
+			     ASM_CLOBBER_VR ("v0")  ASM_CLOBBER_VR ("v16") \
+			     ASM_CLOBBER_VR ("v17") ASM_CLOBBER_VR ("v18") \
+			     ASM_CLOBBER_VR ("v19") ASM_CLOBBER_VR ("v20") \
+			     ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22") \
+			     ASM_CLOBBER_VR ("v23") ASM_CLOBBER_VR ("v24") \
+			   );						\
+	 /* Error occurred?  */						\
+	 if (loop_count != 0)						\
+	   {								\
+	     /* Found an invalid character!  */				\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (1);				\
+	  }								\
+      }									\
+    }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Next, define the other direction - from UCS4 to 8bit charset.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define LOOPFCT		TO_LOOP
+# define BODY_TO_ORIG \
+  {									      \
+    uint32_t ch = get32 (inptr);					      \
+									      \
+    if (__builtin_expect (ch >= sizeof (from_ucs4) / sizeof (from_ucs4[0]), 0)\
+	|| (__builtin_expect (from_ucs4[ch], '\1') == '\0' && ch != 0))	      \
+      {									      \
+	UNICODE_TAG_HANDLER (ch, 4);					      \
+									      \
+	/* This is an illegal character.  */				      \
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				      \
+      }									      \
+									      \
+    *outptr++ = from_ucs4[ch];						      \
+    inptr += 4;								      \
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 64, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_TO_ORIG							\
+    else								\
+      {									\
+	/* Convert 64 ... 1024 bytes at once with tr-instruction.  */	\
+	size_t index, tmp;						\
+	char buf[256];							\
+	size_t loop_count = (inend - inptr) / 64;			\
+	uint32_t max = sizeof (from_ucs4) / sizeof (from_ucs4[0]);	\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	if (loop_count > 16)						\
+	  loop_count = 16;						\
+	size_t remaining_loop_count = loop_count;			\
+	/* Step 1: Check for ch>=max, ch == 0 and shorten to bytes.	\
+	   (ch == 0 is no error, but is handled differently)  */	\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  /* Setup to check for ch >= max.  */		\
+			  "vzero %%v21\n\t"				\
+			  "vleih %%v21,-24576,0\n\t" /* element 0:   >  */ \
+			  "vleih %%v21,-8192,2\n\t"  /* element 1: =<>  */ \
+			  "vlvgf %%v20,%[R_MAX],0\n\t" /* element 0: val  */ \
+			  /* Process in 64byte - 16 characters blocks.  */ \
+			  "lghi %[R_I],0\n\t"				\
+			  "lghi %[R_TMP],0\n\t"				\
+			  "0:\n\t"					\
+			  "vlm %%v16,%%v19,0(%[R_IN])\n\t"		\
+			  /* Test for ch >= max and ch == 0.  */	\
+			  "vstrczfs %%v22,%%v16,%%v20,%%v21\n\t"	\
+			  "jno 10f\n\t"					\
+			  "vstrczfs %%v22,%%v17,%%v20,%%v21\n\t"	\
+			  "jno 11f\n\t"					\
+			  "vstrczfs %%v22,%%v18,%%v20,%%v21\n\t"	\
+			  "jno 12f\n\t"					\
+			  "vstrczfs %%v22,%%v19,%%v20,%%v21\n\t"	\
+			  "jno 13f\n\t"					\
+			  /* Shorten to byte values.  */		\
+			  "vpkf %%v16,%%v16,%%v17\n\t"			\
+			  "vpkf %%v18,%%v18,%%v19\n\t"			\
+			  "vpkh %%v16,%%v16,%%v18\n\t"			\
+			  /* Store 16bytes to buf.  */			\
+			  "vst %%v16,0(%[R_I],%[R_BUF])\n\t"		\
+			  /* Loop until all blocks are processed.  */	\
+			  "la %[R_IN],64(%[R_IN])\n\t"			\
+			  "aghi %[R_I],16\n\t"				\
+			  "brct %[R_LI],0b\n\t"				\
+			  "j 20f\n\t"					\
+			  /* Found error ch >= max or ch == 0. */	\
+			  "13: aghi %[R_TMP],4\n\t"			\
+			  "12: aghi %[R_TMP],4\n\t"			\
+			  "11: aghi %[R_TMP],4\n\t"			\
+			  "10: vlgvb %[R_I],%%v22,7\n\t"		\
+			  "srlg %[R_I],%[R_I],2\n\t"			\
+			  "agr %[R_I],%[R_TMP]\n\t"			\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_IN] "+a" (inptr)		\
+			    , [R_I] "=&a" (index)			\
+			    , [R_TMP] "=d" (tmp)			\
+			    , [R_LI] "+d" (remaining_loop_count)	\
+			  : /* inputs */ [R_BUF] "a" (buf)		\
+			    , [R_MAX] "d" (max)				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22")			\
+			  );						\
+	/* Error occurred in step 1? An error (ch >= max || ch == 0)	\
+	   occurred, if remaining_loop_count > 0. The error occurred	\
+	   at character-index (index) after already processed blocks.  */ \
+	loop_count -= remaining_loop_count;				\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Step 2: Translate already processed blocks in buf and	\
+	       check for errors (from_ucs4[ch] == 0).  */		\
+	    __asm__ volatile (".machine push\n\t"			\
+			      ".machine \"z13\"\n\t"			\
+			      ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			      "sllk %[R_I],%[R_LI],4\n\t"		\
+			      "ahi %[R_I],-1\n\t"			\
+			      /* Execute tr with correct len.  */	\
+			      "exrl %[R_I],21f\n\t"			\
+			      /* Post-processing.  */			\
+			      "lghi %[R_I],0\n\t"			\
+			      "0:\n\t"					\
+			      /* Find invalid character - value == 0.  */ \
+			      "vl %%v16,0(%[R_I],%[R_BUF])\n\t"		\
+			      "vfenezbs %%v17,%%v16,%%v16\n\t"		\
+			      "je 10f\n\t"				\
+			      /* Store 16bytes to buf_out.  */		\
+			      "vst %%v16,0(%[R_I],%[R_OUT])\n\t"	\
+			      "aghi %[R_I],16\n\t"			\
+			      "brct %[R_LI],0b\n\t"			\
+			      "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"	\
+			      "j 20f\n\t"				\
+			      "21: tr 0(1,%[R_BUF]),0(%[R_TBL])\n\t"	\
+			      /* Found an error: from_ucs4[ch] == 0.  */ \
+			      "10: la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"	\
+			      "vlgvb %[R_I],%%v17,7\n\t"		\
+			      "20:\n\t"					\
+			      ".machine pop"				\
+			      : /* outputs */ [R_OUT] "+a" (outptr)	\
+				, [R_I] "=&a" (tmp)			\
+				, [R_LI] "+d" (loop_count)		\
+			      : /* inputs */ [R_BUF] "a" (buf)		\
+				, [R_TBL] "a" (from_ucs4)		\
+			      : /* clobber list*/ "memory", "cc"	\
+				ASM_CLOBBER_VR ("v16")			\
+				ASM_CLOBBER_VR ("v17")			\
+			      );					\
+	    /* Error occurred in processed bytes of step 2?		\
+	       Thus possible error in step 1 is obsolete.*/		\
+	    if (tmp < 16)						\
+	      {								\
+		index = tmp;						\
+		inptr -= loop_count * 64;				\
+	      }								\
+	  }								\
+	/* Error occurred in step 1/2?  */				\
+	if (index < 16)							\
+	  {								\
+	    /* Found an invalid character (see step 2) or zero		\
+	       (see step 1) at index! Convert the chars before index	\
+	       manually. If there is a zero at index detected by step 1, \
+	       there could be invalid characters before this zero.  */	\
+	    int i;							\
+	    uint32_t ch;						\
+	    for (i = 0; i < index; i++)					\
+	      {								\
+		ch = get32 (inptr);					\
+		if (__builtin_expect (from_ucs4[ch], '\1') == '\0')     \
+		  break;						\
+		*outptr++ = from_ucs4[ch];				\
+		inptr += 4;						\
+	      }								\
+	    if (i == index)						\
+	      {								\
+		ch = get32 (inptr);					\
+		if (ch == 0)						\
+		  {							\
+		    /* This is no error, but handled differently.  */	\
+		    *outptr++ = from_ucs4[ch];				\
+		    inptr += 4;						\
+		    continue;						\
+		  }							\
+	      }								\
+									\
+	    UNICODE_TAG_HANDLER (ch, 4);				\
+									\
+	    /* This is an illegal character.  */			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+	  }								\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_generic_c)
+__attribute__ ((ifunc ("__from_generic_resolver")))
+__from_generic;
+
+static void *
+__from_generic_resolver (unsigned long int dl_hwcap)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256
+      && dl_hwcap & HWCAP_S390_VX)
+    return &__from_generic_vx;
+  else
+    return &__from_generic_c;
+}
+
+__typeof(__to_generic_c)
+__attribute__ ((ifunc ("__to_generic_resolver")))
+__to_generic;
+
+static void *
+__to_generic_resolver (unsigned long int dl_hwcap)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256
+      && dl_hwcap & HWCAP_S390_VX)
+    return &__to_generic_vx;
+  else
+    return &__to_generic_c;
+}
+
+strong_alias (__to_generic_c_single, __to_generic_single)
+
+# undef FROM_LOOP
+# undef TO_LOOP
+# define FROM_LOOP		__from_generic
+# define TO_LOOP		__to_generic
+# include <iconv/skeleton.c>
+
+#else
+/* Generate this module without ifunc if build environment lacks vector
+   support. Instead the common 8bit-generic.c is used.  */
+# include "iconvdata/8bit-generic.c"
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/multiarch/iconv/skeleton.c b/sysdeps/s390/multiarch/iconv/skeleton.c
new file mode 100644
index 0000000..3a90031
--- /dev/null
+++ b/sysdeps/s390/multiarch/iconv/skeleton.c
@@ -0,0 +1,21 @@
+/* Skeleton for a conversion module - S390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef IGNORE_ICONV_SKELETON
+# include_next <iconv/skeleton.c>
+#endif
-- 
2.3.0


* [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (7 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 05/14] S390: Optimize builtin iconv-modules Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-21 15:30   ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 09/14] S390: Optimize utf16-utf32 module Stefan Liebler
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16
surrogate (0xd800..0xdfff).
See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu42 instruction, which converts from UTF-32 to UTF-16, has to be
disabled because it does not report an error for a value in the range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed, and the c
and vector variants are adjusted to handle a value in the range of a UTF-16
low surrogate correctly.
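
For illustration only, the surrogate rule described above can be sketched in
plain C. This is not the code from the patch: the function name
utf32_to_utf16 and the CONV_* result codes are hypothetical stand-ins for the
loop body and the __GCONV_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; glibc uses __GCONV_* values.  */
enum conv_result { CONV_OK = 0, CONV_ILLEGAL_INPUT = -1 };

/* Encode one UTF-32 value as one or two UTF-16 code units.
   The number of units written is stored in *nunits.
   Values in 0xd800..0xdfff (high and low surrogates) and values above
   0x10ffff are rejected.  */
static int
utf32_to_utf16 (uint32_t c, uint16_t out[2], size_t *nunits)
{
  assert (out != NULL && nunits != NULL);
  *nunits = 0;

  if (c <= 0xd7ff || (c > 0xdfff && c <= 0xffff))
    {
      /* One UTF-16 code unit.  */
      out[0] = (uint16_t) c;
      *nunits = 1;
      return CONV_OK;
    }

  if (c >= 0x10000 && c <= 0x10ffff)
    {
      /* Two UTF-16 code units: high surrogate, then low surrogate.  */
      uint32_t v = c - 0x10000;
      out[0] = (uint16_t) (0xd800 | (v >> 10));
      out[1] = (uint16_t) (0xdc00 | (v & 0x3ff));
      *nunits = 2;
      return CONV_OK;
    }

  /* Surrogate value or value above 0x10ffff.  */
  return CONV_ILLEGAL_INPUT;
}
```

With the old check (c >= 0xdc00 && c <= 0xffff), an input such as 0xdc00
would have been stored as a single code unit instead of being reported.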

ChangeLog:

	* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
	an error in case of a value in range of an utf16 low surrogate.
---
 sysdeps/s390/utf16-utf32-z9.c | 155 +++++++++++++++++-------------------------
 1 file changed, 62 insertions(+), 93 deletions(-)

diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
index ecf06bd..70aa640 100644
--- a/sysdeps/s390/utf16-utf32-z9.c
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -145,42 +145,6 @@ gconv_end (struct __gconv_step *data)
   free (data->__data);
 }
 
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register size_t inlen __asm__ ("9") = inend - inptr;		\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register size_t outlen __asm__("11") = outend - outptr;		\
-    unsigned long cc = 0;						\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
 #define PREPARE_LOOP							\
   enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
   int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
@@ -310,7 +274,7 @@ gconv_end (struct __gconv_step *data)
 		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
 		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
 		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
+		  "slgr %[R_TMP2],%[R_TMP]\n\t"				\
 		  "srl %[R_TMP2],1\n\t"					\
 		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
 		  "aghi %[R_OUTLEN],-4\n\t"				\
@@ -437,7 +401,7 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
     uint32_t c = get32 (inptr);						\
 									\
     if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
+	|| (c > 0xdfff && c <= 0xffff))					\
       {									\
 	/* Two UTF-16 chars.  */					\
 	put16 (outptr, c);						\
@@ -475,29 +439,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
     inptr += 4;								\
   }
 
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
 #define BODY_TO_VX							\
   {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
     unsigned long tmp, tmp2, tmp3;					\
     asm volatile (".machine push\n\t"					\
 		  ".machine \"z13\"\n\t"				\
@@ -509,8 +454,8 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-16 chars			\
 		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
 		  "lghi %[R_TMP2],0\n\t"				\
 		  /* Shorten to UTF-16.  */				\
@@ -526,9 +471,15 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  "aghi %[R_INLEN],-32\n\t"				\
 		  "aghi %[R_OUTLEN],-16\n\t"				\
 		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],32,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "clgijl %[R_INLEN],32,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "j 1b\n\t"						\
+		  /* Calculate remaining uint32_t values in inptr.  */	\
+		  "2:\n\t"						\
+		  "clgije %[R_INLEN],0,99f\n\t"				\
+		  "clgijl %[R_INLEN],4,92f\n\t"				\
+		  "srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
+		  "j 20f\n\t"						\
 		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
 		     and check for ch >= 0x10000. (v30, v31)  */	\
 		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
@@ -540,21 +491,59 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
 		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
 		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 20f\n\t"						\
+		  "jl 12f\n\t"						\
 		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
 		  /* Update pointers.  */				\
 		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
 		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
 		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
 		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  /* Calculate remaining uint32_t values in vrs.  */	\
+		  "12: lghi %[R_TMP2],8\n\t"				\
+		  "srlg %[R_TMP3],%[R_TMP3],1\n\t"			\
+		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  /* Handle remaining UTF-32 characters.  */		\
+		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "aghi %[R_INLEN],-4\n\t"				\
+		  /* Test if ch is 2byte UTF-16 char. */		\
+		  "clfi %[R_TMP],0xffff\n\t"				\
+		  "jh 21f\n\t"						\
+		  /* Handle 2 byte UTF16 char.  */			\
+		  "lgr %[R_TMP3],%[R_TMP]\n\t"				\
+		  "nilf %[R_TMP],0xf800\n\t"				\
+		  "clfi %[R_TMP],0xd800\n\t"				\
+		  "je 91f\n\t" /* Do not accept UTF-16 surrogates.  */	\
+		  "slgfi %[R_OUTLEN],2\n\t"				\
+		  "jl 90f \n\t"						\
+		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 4byte UTF-16 char. */		\
+		  "21: clfi %[R_TMP],0x10ffff\n\t"			\
+		  "jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
+		  /* Handle 4 byte UTF16 char.  */			\
+		  "slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 90f \n\t"						\
+		  "slfi %[R_TMP],0x10000\n\t" /* zabcd = uvwxy - 1.  */	\
+		  "llilf %[R_TMP3],0xd800dc00\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],38,47,6\n\t" /* High surrogate.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],54,63,0\n\t" /* Low surrogate.  */ \
+		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "j 99f\n\t"						\
+		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99:\n\t"						\
 		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
 		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
 		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
 		    , [R_RES] "+d" (result)				\
@@ -567,17 +556,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
 		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
 		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
     if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
       break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
+									\
     STANDARD_TO_LOOP_ERR_HANDLER (4);					\
   }
 
@@ -590,15 +572,6 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 #define BODY			BODY_TO_C
 #include <iconv/loop.c>
 
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
 #if defined HAVE_S390_VX_ASM_SUPPORT
 /* Generate loop-function with hardware vector instructions.  */
 # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
@@ -623,10 +596,6 @@ __to_utf16_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf16_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
-      && dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
     return __to_utf16_loop_c;
 }
 
-- 
2.3.0


* [PATCH 00/14] S390: Optimize iconv modules.
@ 2016-02-23  9:22 Stefan Liebler
  2016-02-23  9:21 ` [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules Stefan Liebler
                   ` (14 more replies)
  0 siblings, 15 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

Hi,

this patch set introduces optimized iconv modules for S390/S390x.

The first patches prepare for the later optimizations.
A make warning is eliminated, the order in the gconv-modules file is
changed for the s390-specific modules, and a new configure check
is introduced.

The next patches optimize the current s390-specific iconv modules
and the generic or built-in ones. The optimizations use e.g.
vector instructions, if gcc and binutils can handle those.
At compile time, the relevant functions are built with or without the
vector instructions. At runtime, the appropriate function is chosen
with an ifunc resolver.
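
As a rough sketch of that runtime selection (not the code from the patches),
the following portable C uses a plain function pointer instead of the ifunc
attribute. SKETCH_HWCAP_VX, resolve_loop and the two count_ascii_* variants
are hypothetical names; the real resolvers test HWCAP_S390_VX in dl_hwcap and
return the _vx or _c loop function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bit for this sketch; the real resolvers test
   HWCAP_S390_VX in the dl_hwcap argument.  */
#define SKETCH_HWCAP_VX (1UL << 0)

typedef size_t (*loop_fn) (const unsigned char *in, size_t len);

/* Plain C variant: count the bytes below 0x80.  */
static size_t
count_ascii_c (const unsigned char *in, size_t len)
{
  assert (in != NULL || len == 0);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    n += in[i] < 0x80;
  return n;
}

/* Stand-in for a vector variant: same result, but the input is walked
   in blocks of 16 bytes with a scalar tail.  */
static size_t
count_ascii_blocked (const unsigned char *in, size_t len)
{
  assert (in != NULL || len == 0);
  size_t n = 0, i = 0;
  for (; i + 16 <= len; i += 16)
    for (size_t j = 0; j < 16; j++)
      n += in[i + j] < 0x80;
  for (; i < len; i++)
    n += in[i] < 0x80;
  return n;
}

/* Choose the variant once, based on the capability bits.  */
static loop_fn
resolve_loop (unsigned long hwcap)
{
  return (hwcap & SKETCH_HWCAP_VX) ? count_ascii_blocked : count_ascii_c;
}
```

With a real ifunc, the dynamic loader calls the resolver once at relocation
time, so callers pay no per-call dispatch cost beyond the indirect call.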

The current s390-specific iconv-modules are used on 64bit only.
These modules are reworked to run on S390 31bit, too.

The last patches fix some errors. Unfortunately, some of the s390
convert instructions do not report errors on UTF-16 low surrogates,
thus those failing instructions have to be disabled. Perhaps those
instructions can be reenabled in the future. Some common-code modules
have similar problems, which are fixed, too.

The testsuite runs without new test failures. Tests were executed for 31/64bit
with binutils that do/don't support the z13 vector instructions.

Please review.
Ok to commit?

Stefan Liebler (14):
  S390: Get rid of make warning: overriding recipe for target
    gconv-modules.
  S390: Mention s390-specific gconv-modues before common ones.
  S390: Configure check for vector support in gcc.
  S390: Optimize 8bit-generic iconv modules.
  S390: Optimize builtin iconv-modules.
  S390: Optimize iso-8859-1 to ibm037 iconv-module.
  S390: Optimize utf8-utf32 module.
  S390: Optimize utf8-utf16 module.
  S390: Optimize utf16-utf32 module.
  S390: Use s390-64 specific ionv-modules on s390-32, too.
  S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
  S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
  Fix ucs4le_internal_loop in error case.
  Fix UTF-16 surrogate handling.

 config.h.in                                  |    4 +
 iconv/gconv_simple.c                         |    5 +-
 iconvdata/Makefile                           |   15 +-
 iconvdata/utf-16.c                           |   12 +
 iconvdata/utf-32.c                           |    2 +-
 sysdeps/s390/Makefile                        |   83 ++
 sysdeps/s390/configure                       |   32 +
 sysdeps/s390/configure.ac                    |   21 +
 sysdeps/s390/iso-8859-1_cp037_z900.c         |  262 ++++++
 sysdeps/s390/multiarch/8bit-generic.c        |  485 ++++++++++
 sysdeps/s390/multiarch/Makefile              |    4 +
 sysdeps/s390/multiarch/gconv_simple.c        | 1266 ++++++++++++++++++++++++++
 sysdeps/s390/multiarch/iconv/skeleton.c      |   21 +
 sysdeps/s390/s390-64/Makefile                |   81 --
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c |  237 -----
 sysdeps/s390/s390-64/utf16-utf32-z9.c        |  337 -------
 sysdeps/s390/s390-64/utf8-utf16-z9.c         |  471 ----------
 sysdeps/s390/s390-64/utf8-utf32-z9.c         |  511 -----------
 sysdeps/s390/utf16-utf32-z9.c                |  605 ++++++++++++
 sysdeps/s390/utf8-utf16-z9.c                 |  818 +++++++++++++++++
 sysdeps/s390/utf8-utf32-z9.c                 |  862 ++++++++++++++++++
 21 files changed, 4493 insertions(+), 1641 deletions(-)
 create mode 100644 sysdeps/s390/Makefile
 create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
 create mode 100644 sysdeps/s390/multiarch/8bit-generic.c
 create mode 100644 sysdeps/s390/multiarch/gconv_simple.c
 create mode 100644 sysdeps/s390/multiarch/iconv/skeleton.c
 delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
 create mode 100644 sysdeps/s390/utf16-utf32-z9.c
 create mode 100644 sysdeps/s390/utf8-utf16-z9.c
 create mode 100644 sysdeps/s390/utf8-utf32-z9.c

-- 
2.3.0


* [PATCH 07/14] S390: Optimize utf8-utf32 module.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (9 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 09/14] S390: Optimize utf16-utf32 module Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-21 15:15   ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 08/14] S390: Optimize utf8-utf16 module Stefan Liebler
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch reworks the s390-specific module to convert between utf8 and utf32.
Now ifunc is used to choose either the c or etf3eh (with convert utf
instruction) variants at runtime.
Furthermore a new vector variant for z13 is introduced which will be built
and chosen if vector support is available at build time and runtime.
The vector variants optimize input of 1-byte utf8 characters. The convert utf
instruction is used if a multibyte utf8 character is found.

This patch also fixes some whitespace errors. The c variants now reject
UTF-16 surrogates and values above 0x10ffff.
Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
Previously they ignored the //IGNORE flag and always stopped at an error.
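
The idea behind the 1-byte fast path can be sketched in scalar C as follows.
This is only an illustration under assumptions: widen_ascii_run is a
hypothetical helper, and the vector variant does the equivalent work on
16 input bytes at a time before handing multibyte characters to the convert
utf instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen the leading run of 1-byte UTF-8 characters (values below 0x80)
   to UTF-32.  Returns the number of input bytes consumed, which equals
   the number of uint32_t values written.  The run stops at the first
   byte >= 0x80, at the end of the input, or when the output is full;
   the caller would then continue with the multibyte path.  */
static size_t
widen_ascii_run (const unsigned char *in, size_t inlen,
		 uint32_t *out, size_t outcap)
{
  assert (in != NULL && out != NULL);
  size_t i = 0;
  while (i < inlen && i < outcap && in[i] < 0x80)
    {
      out[i] = in[i];
      i++;
    }
  return i;
}
```

For the input bytes 'a', 'b', 0xc3, 0xa4, 'c' the helper consumes the first
two bytes and stops at 0xc3, the lead byte of a 2-byte character.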

ChangeLog:

	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Use ifunc to select c, etf3eh
	or new vector loop-variant.
---
 sysdeps/s390/s390-64/utf8-utf32-z9.c | 664 +++++++++++++++++++++++++----------
 1 file changed, 480 insertions(+), 184 deletions(-)

diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
index defd47d..e89dc70 100644
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ b/sysdeps/s390/s390-64/utf8-utf32-z9.c
@@ -30,35 +30,25 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
-/* UTF-32 big endian byte order mark.  */
-#define BOM	                0x0000feffu
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
 
+/* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		6
 #define MIN_NEEDED_TO		4
-#define FROM_LOOP		from_utf8_loop
-#define TO_LOOP			to_utf8_loop
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
 #define FROM_DIRECTION		(dir == from_utf8)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			      \
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
 
 /* Direction of the transformation.  */
 enum direction
@@ -155,16 +145,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-		      "+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -173,49 +163,150 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
 /* Conversion function from UTF-8 to UTF-32 internal/BE.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY								\
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu14 %0, %1, 1");				\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
 									\
-	if (inptr != inend)						\
-	  {								\
-	    int i;							\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-	continue;							\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
       }									\
-									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
     /* Next input byte.  */						\
     uint32_t ch = *inptr;						\
 									\
-    if (__glibc_likely (ch < 0x80))					      \
+    if (__glibc_likely (ch < 0x80))					\
       {									\
 	/* One byte sequence.  */					\
 	++inptr;							\
@@ -233,30 +324,18 @@ gconv_end (struct __gconv_step *data)
 	    cnt = 2;							\
 	    ch &= 0x1f;							\
 	  }								\
-        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
 	  {								\
 	    /* We expect three bytes.  */				\
 	    cnt = 3;							\
 	    ch &= 0x0f;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
 	  {								\
 	    /* We expect four bytes.  */				\
 	    cnt = 4;							\
 	    ch &= 0x07;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-	  {								\
-	    /* We expect five bytes.  */				\
-	    cnt = 5;							\
-	    ch &= 0x03;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xfe) == 0xfc))			      \
-	  {								\
-	    /* We expect six bytes.  */					\
-	    cnt = 6;							\
-	    ch &= 0x01;							\
-	  }								\
 	else								\
 	  {								\
 	    /* Search the end of this ill-formed UTF-8 character.  This	\
@@ -272,7 +351,7 @@ gconv_end (struct __gconv_step *data)
 	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
 	  }								\
 									\
-	if (__glibc_unlikely (inptr + cnt > inend))			      \
+	if (__glibc_unlikely (inptr + cnt > inend))			\
 	  {								\
 	    /* We don't have enough input.  But before we report	\
 	       that check that all the bytes are correct.  */		\
@@ -280,7 +359,7 @@ gconv_end (struct __gconv_step *data)
 	      if ((inptr[i] & 0xc0) != 0x80)				\
 		break;							\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
+	    if (__glibc_likely (inptr + i == inend))			\
 	      {								\
 		result = __GCONV_INCOMPLETE_INPUT;			\
 		break;							\
@@ -305,7 +384,10 @@ gconv_end (struct __gconv_step *data)
 	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
 	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
 	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0))		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
 	  {								\
 	    /* This is an illegal encoding.  */				\
 	    goto errout;						\
@@ -318,137 +400,212 @@ gconv_end (struct __gconv_step *data)
     *((uint32_t *) outptr) = ch;					\
     outptr += sizeof (uint32_t);					\
   }
-#define LOOP_NEED_FLAGS
 
-#define STORE_REST							\
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
+		  "vrepib %%v31,0x20\n\t"				\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vuplhh %%v20,%%v18\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  "vupllh %%v21,%%v18\n\t"				\
+		  "aghi %[R_OUTLEN],-64\n\t"				\
+		  "vuplhh %%v22,%%v19\n\t"				\
+		  "vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
+		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
+						     index to store. */ \
+		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "ahi %[R_TMP2],-1\n\t"				\
+		  "jl 20f\n\t"						\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vuplhh %%v20,%%v18\n\t"				\
+		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllh %%v21,%%v18\n\t"				\
+		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "vuplhh %%v22,%%v19\n\t"				\
+		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllh %%v23,%%v19\n\t"				\
+		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11:\n\t"						\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+		    ASM_CLOBBER_VR ("v31")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
   }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
 
-#define UNPACK_BYTES \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
 
-#define CLEAR_STATE \
-  state->__count = 0
+#define LOOP_NEED_FLAGS
 
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
 #include <iconv/loop.c>
 
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
 /* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY							\
+#define BODY_TO_C						\
   {								\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)			\
-      {								\
-	HARDWARE_CONVERT ("cu41 %0, %1");			\
-								\
-	if (inptr != inend)					\
-	  {							\
-	    result = __GCONV_INCOMPLETE_INPUT;			\
-	    break;						\
-	  }							\
-	continue;						\
-      }								\
-								\
     uint32_t wc = *((const uint32_t *) inptr);			\
 								\
-    if (__glibc_likely (wc <= 0x7f))					      \
+    if (__glibc_likely (wc <= 0x7f))				\
       {								\
-        /* Single UTF-8 char.  */				\
-        *outptr = (uint8_t)wc;					\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
 	outptr++;						\
       }								\
     else if (wc <= 0x7ff)					\
       {								\
-        /* Two UTF-8 chars.  */					\
-        if (__glibc_unlikely (outptr + 2 > outend))			      \
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
 	  {							\
 	    /* Overflow in the output buffer.  */		\
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
 								\
-        outptr[0] = 0xc0;					\
+	outptr[0] = 0xc0;					\
 	outptr[0] |= wc >> 6;					\
 								\
 	outptr[1] = 0x80;					\
@@ -459,12 +616,18 @@ gconv_end (struct __gconv_step *data)
     else if (wc <= 0xffff)					\
       {								\
 	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))			      \
+	if (__glibc_unlikely (outptr + 3 > outend))		\
 	  {							\
 	    /* Overflow in the output buffer.  */		\
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
 	outptr[0] = 0xe0;					\
 	outptr[0] |= wc >> 12;					\
 								\
@@ -479,7 +642,7 @@ gconv_end (struct __gconv_step *data)
       else if (wc <= 0x10ffff)					\
 	{							\
 	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))			      \
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
 	    {							\
 	      /* Overflow in the output buffer.  */		\
 	      result = __GCONV_FULL_OUTPUT;			\
@@ -505,7 +668,140 @@ gconv_end (struct __gconv_step *data)
 	}							\
     inptr += 4;							\
   }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "vzero %%v21\n\t"					\
+		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP],0\n\t"					\
+		  /* Shorten to byte values.  */			\
+		  "vpkf %%v23,%%v16,%%v17\n\t"				\
+		  "vpkf %%v24,%%v18,%%v19\n\t"				\
+		  "vpkh %%v23,%%v23,%%v24\n\t"				\
+		  /* Checking for values > 0x7f.  */			\
+		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
+		  "jno 10f\n\t"						\
+		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
+		  "jno 11f\n\t"						\
+		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
+		  "jno 12f\n\t"						\
+		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
+		  "jno 13f\n\t"						\
+		  /* Store 16bytes to outptr.  */			\
+		  "vst %%v23,0(%[R_OUT])\n\t"				\
+		  "aghi %[R_INLEN],-64\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_IN],64(%[R_IN])\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],64,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "srlg %[R_I],%[R_I],2\n\t"				\
+		  "agr %[R_I],%[R_TMP]\n\t"				\
+		  "je 20f\n\t"						\
+		  /* Store characters before invalid one...  */		\
+		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
+		  /* ... and update pointers.  */			\
+		  "aghi %[R_I],1\n\t"					\
+		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
+		  "sllg %[R_I],%[R_I],2\n\t"				\
+		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_I]\n\t"				\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
 #include <iconv/skeleton.c>
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 08/14] S390: Optimize utf8-utf16 module.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (10 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 07/14] S390: Optimize utf8-utf32 module Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-21 15:20   ` Stefan Liebler
  2016-02-23  9:23 ` [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too Stefan Liebler
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch reworks the s390 specific module to convert between utf8 and utf16.
Now ifunc is used to choose either the c or etf3eh (with convert utf instruction)
variant at runtime. Furthermore a new vector variant for z13 is introduced,
which will be built and chosen if vector support is available at build time and
at runtime.
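
The selection order described above can be sketched in plain C as follows.
This is only an illustration, not the code from the patch: the capability
bits SKETCH_HWCAP_* and the have_vx_build parameter are hypothetical
stand-ins for the real HWCAP_S390_* values and the build-time
HAVE_S390_VX_ASM_SUPPORT check, and an ordinary function pointer is used
instead of the ifunc attribute.

```c
#include <stddef.h>

/* Hypothetical capability bits; the real HWCAP_S390_* values come from
   the platform headers and are not reproduced here.  */
#define SKETCH_HWCAP_ETF3EH (1UL << 0)
#define SKETCH_HWCAP_VX     (1UL << 1)

typedef const char *(*loop_fn) (void);

/* Stand-ins for the three loop variants; each just reports its name.  */
static const char *loop_c (void)      { return "c"; }
static const char *loop_etf3eh (void) { return "etf3eh"; }
static const char *loop_vx (void)     { return "vx"; }

/* Pick the most capable variant: vector first (only if it was built),
   then the convert-utf instruction, then the plain C fallback.  */
static loop_fn
select_loop (unsigned long hwcap, int have_vx_build)
{
  if (have_vx_build && (hwcap & SKETCH_HWCAP_VX))
    return loop_vx;
  if (hwcap & SKETCH_HWCAP_ETF3EH)
    return loop_etf3eh;
  return loop_c;
}
```

With ifunc the choice is made once when the symbol is resolved, so the
conversion loop itself no longer has to test the hwcap bits on every call.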

When converting utf8 to utf16, the vector variant optimizes input of
1byte utf8 characters. The convert utf instruction is used if a multibyte utf8
character is found.

For the other direction, utf16 to utf8, the cu21 instruction can't be re-enabled
because it does not report an error if the input-stream consists of a single
low surrogate utf16 char (e.g. 0xdc00). This applies to the newest z13, too.
Thus there is only the c or the new vector variant, which can handle 1..4 byte
utf8 characters.

The c variant from utf16 to utf8 has been fixed. If a high surrogate was at the
end of the input-buffer, then errno was set to EINVAL and the input-pointer
pointed just after the high surrogate. Now it points to the beginning of the
high surrogate.
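
The pointer behaviour described above can be illustrated with a small
standalone decoder. This is a sketch under assumptions, not the glibc code:
the names decode_utf16be, get16be and the SKETCH_* status values are
hypothetical, and the input is assumed to be big-endian UTF-16. The point is
that the input pointer is advanced only after the complete surrogate pair
has been seen.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_status { SKETCH_OK, SKETCH_INCOMPLETE, SKETCH_ILLEGAL };

/* Read one big-endian UTF-16 code unit.  */
static uint16_t
get16be (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Decode one code point starting at *inptr.  *inptr is advanced only on
   success, so on SKETCH_INCOMPLETE it still points at the beginning of
   the (possibly high surrogate) code unit.  */
static enum sketch_status
decode_utf16be (const unsigned char **inptr, const unsigned char *inend,
		uint32_t *cp)
{
  const unsigned char *p = *inptr;

  if (inend - p < 2)
    return SKETCH_INCOMPLETE;

  uint16_t c = get16be (p);
  if (c >= 0xd800 && c <= 0xdbff)
    {
      /* High surrogate: both code units must be available before any
	 pointer is moved.  */
      if (inend - p < 4)
	return SKETCH_INCOMPLETE;

      uint16_t low = get16be (p + 2);
      if ((low & 0xfc00) != 0xdc00)
	return SKETCH_ILLEGAL;

      *cp = 0x10000 + (((uint32_t) (c & 0x3ff) << 10) | (low & 0x3ff));
      *inptr = p + 4;
      return SKETCH_OK;
    }

  /* An isolated low surrogate is an error.  */
  if (c >= 0xdc00 && c <= 0xdfff)
    return SKETCH_ILLEGAL;

  *cp = c;
  *inptr = p + 2;
  return SKETCH_OK;
}
```

A caller that receives more input later can then restart at the unchanged
pointer and see the whole surrogate pair.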

This patch also fixes some whitespace errors. The c variant from utf8 to utf16
now checks that tail-bytes start with 0b10... and that the value is not in the
range of a utf16 surrogate.
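
The two checks mentioned above can be sketched as a standalone helper. This
is a simplified illustration, not the BODY_FROM_C macro itself: the function
name utf8_seq_ok is hypothetical, and the caller is assumed to have already
derived cnt (2, 3 or 4) from the lead byte and to have cnt bytes available.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the cnt-byte UTF-8 sequence at s may be converted to
   UTF-16, 0 if it is an illegal encoding.  */
static int
utf8_seq_ok (const unsigned char *s, size_t cnt)
{
  uint32_t ch;
  size_t i;

  /* Payload bits of the lead byte.  */
  if (cnt == 2)
    ch = s[0] & 0x1f;
  else if (cnt == 3)
    ch = s[0] & 0x0f;
  else if (cnt == 4)
    ch = s[0] & 0x07;
  else
    return 0;

  for (i = 1; i < cnt; ++i)
    {
      /* Every tail byte must look like 0b10xxxxxx.  */
      if ((s[i] & 0xc0) != 0x80)
	return 0;
      ch = (ch << 6) | (s[i] & 0x3f);
    }

  /* If ch < 2^(5*cnt-4), fewer than cnt bytes would have sufficed.  */
  if (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)
    return 0;

  /* Do not accept UTF-16 surrogates encoded as UTF-8.  */
  if (ch >= 0xd800 && ch <= 0xdfff)
    return 0;

  return 1;
}
```

For example, the bytes ed a0 80 would decode to 0xd800 and are rejected,
while e2 82 ac (U+20AC) passes both checks.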

Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
Previously they disregarded the ignore flag and always stopped at an error.

ChangeLog:

	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Use ifunc to select c,
	etf3eh or new vector loop-variant.
---
 sysdeps/s390/s390-64/utf8-utf16-z9.c | 547 ++++++++++++++++++++++++++++-------
 1 file changed, 441 insertions(+), 106 deletions(-)

diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
index 4148ed7..76625d0 100644
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ b/sysdeps/s390/s390-64/utf8-utf16-z9.c
@@ -30,33 +30,27 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
 
+/* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		4
 #define MIN_NEEDED_TO		2
 #define MAX_NEEDED_TO		4
-#define FROM_LOOP		from_utf8_loop
-#define TO_LOOP			to_utf8_loop
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
 #define FROM_DIRECTION		(dir == from_utf8)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			      \
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
 
 /* Direction of the transformation.  */
 enum direction
@@ -151,16 +145,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			"+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -169,50 +163,135 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
 /* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
+		  "vrepib %%v31,0x20\n\t"				\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
+		  "aghi %[R_OUTLEN],-32\n\t"				\
+		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
+		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
+						     index to store. */ \
+		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "ahi %[R_TMP2],-1\n\t"				\
+		  "jl 20f\n\t"						\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11:\n\t" /* Update pointers.  */			\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
 /* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY								\
+#define BODY_FROM_C							\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu12 %0, %1, 1");				\
-									\
-	if (inptr != inend)						\
-	  {								\
-	    int i;							\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-								\
-	    if (__glibc_likely (inptr + i == inend))			      \
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-	continue;							\
-    }									\
-									\
     /* Next input byte.  */						\
     uint16_t ch = *inptr;						\
 									\
-    if (__glibc_likely (ch < 0x80))					      \
+    if (__glibc_likely (ch < 0x80))					\
       {									\
 	/* One byte sequence.  */					\
 	++inptr;							\
@@ -230,13 +309,13 @@ gconv_end (struct __gconv_step *data)
 	    cnt = 2;							\
 	    ch &= 0x1f;							\
 	  }								\
-        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
 	  {								\
 	    /* We expect three bytes.  */				\
 	    cnt = 3;							\
 	    ch &= 0x0f;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
 	  {								\
 	    /* We expect four bytes.  */				\
 	    cnt = 4;							\
@@ -257,7 +336,7 @@ gconv_end (struct __gconv_step *data)
 	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
 	  }								\
 									\
-	if (__glibc_unlikely (inptr + cnt > inend))			      \
+	if (__glibc_unlikely (inptr + cnt > inend))			\
 	  {								\
 	    /* We don't have enough input.  But before we report	\
 	       that check that all the bytes are correct.  */		\
@@ -265,7 +344,7 @@ gconv_end (struct __gconv_step *data)
 	      if ((inptr[i] & 0xc0) != 0x80)				\
 		break;							\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
+	    if (__glibc_likely (inptr + i == inend))			\
 	      {								\
 		result = __GCONV_INCOMPLETE_INPUT;			\
 		break;							\
@@ -280,23 +359,31 @@ gconv_end (struct __gconv_step *data)
 	       low) are needed.  */					\
 	    uint16_t zabcd, high, low;					\
 									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			      \
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
 	      {								\
 		/* Overflow in the output buffer.  */			\
 		result = __GCONV_FULL_OUTPUT;				\
 		break;							\
 	      }								\
 									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
 	    /* See Principles of Operations cu12.  */			\
 	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-                     ((inptr[1] & 0x30) >> 4)) - 1;			\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
 									\
 	    /* z-bit must be zero after subtracting 1.  */		\
 	    if (zabcd & 0x10)						\
 	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
 									\
 	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;	                        /* abcd bits */	\
+	    high |= zabcd << 6;                         /* abcd bits */	\
 	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
 	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
 									\
@@ -326,8 +413,19 @@ gconv_end (struct __gconv_step *data)
 		ch <<= 6;						\
 		ch |= byte & 0x3f;					\
 	      }								\
-	    inptr += cnt;						\
 									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
 	  }								\
       }									\
     /* Now adjust the pointers and store the result.  */		\
@@ -335,43 +433,70 @@ gconv_end (struct __gconv_step *data)
     outptr += sizeof (uint16_t);					\
   }
 
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
 /* Conversion from UTF-16 to UTF-8.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine is based on the functionality of the S/390
    hardware instruction (cu21) as described in the Principles of
    Operation.  */
-#define BODY								\
+#define BODY_TO_C							\
   {									\
-    /* The hardware instruction currently fails to report an error for	\
-       isolated low surrogates so we have to disable the instruction	\
-       until this gets resolved.  */					\
-    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
-      {									\
-	HARDWARE_CONVERT ("cu21 %0, %1, 1");				\
-	if (inptr != inend)						\
-	  {								\
-	    /* Check if the third byte is				\
-	       a valid start of a UTF-16 surrogate.  */			\
-	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
-	      STANDARD_TO_LOOP_ERR_HANDLER (3);				\
-									\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint16_t c = get16 (inptr);						\
 									\
-    if (__glibc_likely (c <= 0x007f))					      \
+    if (__glibc_likely (c <= 0x007f))					\
       {									\
 	/* Single byte UTF-8 char.  */					\
 	*outptr = c & 0xff;						\
@@ -379,20 +504,20 @@ gconv_end (struct __gconv_step *data)
       }									\
     else if (c >= 0x0080 && c <= 0x07ff)				\
       {									\
-        /* Two byte UTF-8 char.  */					\
+	/* Two byte UTF-8 char.  */					\
 									\
-	if (__glibc_unlikely (outptr + 2 > outend))			      \
+	if (__glibc_unlikely (outptr + 2 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
 	    break;							\
 	  }								\
 									\
-        outptr[0] = 0xc0;						\
-        outptr[0] |= c >> 6;						\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
 									\
-        outptr[1] = 0x80;						\
-        outptr[1] |= c & 0x3f;						\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
 									\
 	outptr += 2;							\
       }									\
@@ -400,7 +525,7 @@ gconv_end (struct __gconv_step *data)
       {									\
 	/* Three byte UTF-8 char.  */					\
 									\
-	if (__glibc_unlikely (outptr + 3 > outend))			      \
+	if (__glibc_unlikely (outptr + 3 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
@@ -419,22 +544,22 @@ gconv_end (struct __gconv_step *data)
       }									\
     else if (c >= 0xd800 && c <= 0xdbff)				\
       {									\
-        /* Four byte UTF-8 char.  */					\
+	/* Four byte UTF-8 char.  */					\
 	uint16_t low, uvwxy;						\
 									\
-	if (__glibc_unlikely (outptr + 4 > outend))			      \
+	if (__glibc_unlikely (outptr + 4 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
 	    break;							\
 	  }								\
-	inptr += 2;							\
-	if (__glibc_unlikely (inptr + 2 > inend))			      \
+	if (__glibc_unlikely (inptr + 4 > inend))			\
 	  {								\
 	    result = __GCONV_INCOMPLETE_INPUT;				\
 	    break;							\
 	  }								\
 									\
+	inptr += 2;							\
 	low = get16 (inptr);						\
 									\
 	if ((low & 0xfc00) != 0xdc00)					\
@@ -461,11 +586,221 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
       }									\
     inptr += 2;								\
   }
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
+		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "vpkh %%v18,%%v16,%%v17\n\t"				\
+		  "la %[R_IN],32(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "vst %%v18,0(%[R_OUT])\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],32,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
+		  /* Shorten to UTF-8.  */				\
+		  "vpkh %%v18,%%v16,%%v17\n\t"				\
+		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 13f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13:\n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "lghi %[R_TMP2],16\n\t"				\
+		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  "j 22f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2:\n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "clgfi %[R_INLEN],1\n\t"				\
+		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "jl 90f \n\t"						\
+		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "jh 23f\n\t"						\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llill %[R_TMP3],0xc080\n\t"				\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "jh 24f\n\t"						\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llilf %[R_TMP3],0xe08080\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
+		  "clfi %[R_TMP],0xdbff\n\t"				\
+		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
+				  without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 90f \n\t"						\
+		  "slgfi %[R_INLEN],2\n\t"				\
+		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
+		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "aghi %[R_TMP],0x40\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
+		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "nilf %[R_TMP],0xfc00\n\t"				\
+		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "aghi %[R_TMP2],-2\n\t"				\
+		  "jh 20b\n\t"						\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
 
 #include <iconv/skeleton.c>
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (4 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-21 15:25   ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 03/14] S390: Configure check for vector support in gcc Stefan Liebler
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

According to the latest Unicode standard, a conversion from or to any UTF-xx
encoding has to report an error if the character value lies in the range of
the UTF-16 surrogates (0xd800..0xdfff).
See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

The cu41 instruction, which converts from UTF-32 to UTF-8, therefore has to
be disabled, because it does not report an error for a value in the range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed, and the C
and vector variants are adjusted to report an error for values in the range
of a UTF-16 low surrogate.
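
The range check described above can be sketched in plain C. This is only an
illustration under stated assumptions, not the code from the patch: the
function name utf32_to_utf8 and its return convention (0 for illegal input)
are hypothetical and do not exist in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: encode one UTF-32 value as
   UTF-8.  Returns the number of bytes written to OUT (1..4), or 0 if WC
   is a UTF-16 surrogate or lies beyond 0x10ffff.  OUT must have room
   for four bytes.  */
size_t
utf32_to_utf8 (uint32_t wc, unsigned char *out)
{
  assert (out != NULL);

  if (wc <= 0x7f)
    {
      out[0] = (unsigned char) wc;
      return 1;
    }
  if (wc <= 0x7ff)
    {
      out[0] = (unsigned char) (0xc0 | (wc >> 6));
      out[1] = (unsigned char) (0x80 | (wc & 0x3f));
      return 2;
    }
  if (wc <= 0xffff)
    {
      /* Reject high (0xd800..0xdbff) and low (0xdc00..0xdfff)
	 surrogates alike.  */
      if (wc >= 0xd800 && wc <= 0xdfff)
	return 0;
      out[0] = (unsigned char) (0xe0 | (wc >> 12));
      out[1] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (wc & 0x3f));
      return 3;
    }
  if (wc <= 0x10ffff)
    {
      out[0] = (unsigned char) (0xf0 | (wc >> 18));
      out[1] = (unsigned char) (0x80 | ((wc >> 12) & 0x3f));
      out[2] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
      out[3] = (unsigned char) (0x80 | (wc & 0x3f));
      return 4;
    }
  /* Values above 0x10ffff are not valid Unicode.  */
  return 0;
}
```

With this sketch, 0xd7ff and 0xe000 still encode as three bytes, while every
value from 0xd800 through 0xdfff is rejected.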

ChangeLog:

	* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
	an error in case of a value in range of a UTF-16 low surrogate.
---
 sysdeps/s390/utf8-utf32-z9.c | 188 ++++++++++++++++++++++++++-----------------
 1 file changed, 115 insertions(+), 73 deletions(-)

diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
index 1b2d6a2..b378823 100644
--- a/sysdeps/s390/utf8-utf32-z9.c
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -572,28 +572,6 @@ __from_utf8_loop_resolver (unsigned long int dl_hwcap)
 
 strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
 /* The software routine mimics the S/390 cu41 instruction.  */
 #define BODY_TO_C						\
   {								\
@@ -632,7 +610,7 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
+	if (wc >= 0xd800 && wc <= 0xdfff)			\
 	  {							\
 	    /* Do not accept UTF-16 surrogates.   */		\
 	    result = __GCONV_ILLEGAL_INPUT;			\
@@ -679,13 +657,12 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
     inptr += 4;							\
   }
 
-#define HW_TO_VX							\
+/* The hardware routine uses the S/390 vector instructions.  */
+#define BODY_TO_VX							\
   {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
     asm volatile (".machine push\n\t"					\
 		  ".machine \"z13\"\n\t"				\
 		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
@@ -696,10 +673,10 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
 		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "0: clgijl %[R_INLEN],64,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP],0\n\t"					\
+		  "lghi %[R_TMP2],0\n\t"				\
 		  /* Shorten to byte values.  */			\
 		  "vpkf %%v23,%%v16,%%v17\n\t"				\
 		  "vpkf %%v24,%%v18,%%v19\n\t"				\
@@ -719,41 +696,116 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  "aghi %[R_OUTLEN],-16\n\t"				\
 		  "la %[R_IN],64(%[R_IN])\n\t"				\
 		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],64,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "clgijl %[R_INLEN],64,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "j 1b\n\t"						\
 		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "srlg %[R_I],%[R_I],2\n\t"				\
-		  "agr %[R_I],%[R_TMP]\n\t"				\
-		  "je 20f\n\t"						\
+		  "13: ahi %[R_TMP2],4\n\t"				\
+		  "12: ahi %[R_TMP2],4\n\t"				\
+		  "11: ahi %[R_TMP2],4\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v22,7\n\t"			\
+		  "srlg %[R_TMP],%[R_TMP],2\n\t"			\
+		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
+		  "je 16f\n\t"						\
 		  /* Store characters before invalid one...  */		\
-		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
+		  "slgr %[R_OUTLEN],%[R_TMP]\n\t"			\
+		  "15: aghi %[R_TMP],-1\n\t"				\
+		  "vstl %%v23,%[R_TMP],0(%[R_OUT])\n\t"			\
 		  /* ... and update pointers.  */			\
-		  "aghi %[R_I],1\n\t"					\
-		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
-		  "sllg %[R_I],%[R_I],2\n\t"				\
-		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_I]\n\t"				\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  "aghi %[R_TMP],1\n\t"					\
+		  "la %[R_OUT],0(%[R_TMP],%[R_OUT])\n\t"		\
+		  "sllg %[R_TMP2],%[R_TMP],2\n\t"			\
+		  "la %[R_IN],0(%[R_TMP2],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP2]\n\t"			\
+		  /* Calculate remaining uint32_t values in loaded vrs.  */ \
+		  "16: lghi %[R_TMP2],16\n\t"				\
+		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
+		  "l %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-4\n\t"				\
+		  "j 22f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2: clgije %[R_INLEN],0,99f\n\t"			\
+		  "clgijl %[R_INLEN],4,92f\n\t"				\
+		  /* Calculate remaining uint32_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "aghi %[R_INLEN],-4\n\t"				\
+		  /* Test if ch is 1byte UTF-8 char. */			\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "jl 90f \n\t"						\
+		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 2byte UTF-8 char. */			\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "jh 23f\n\t"						\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llill %[R_TMP3],0xc080\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xffff\n\t"			\
+		  "jh 24f\n\t"						\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llilf %[R_TMP3],0xe08080\n\t"			\
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  /* Test if ch is a UTF-16 surrogate: ch & 0xf800 == 0xd800  */ \
+		  "nilf %[R_TMP],0xf800\n\t"				\
+		  "clfi %[R_TMP],0xd800\n\t"				\
+		  "je 91f\n\t" /* Do not accept UTF-16 surrogates.  */	\
+		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0x10ffff\n\t"			\
+		  "jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "risbgn %[R_TMP3],%[R_TMP],37,39,6\n\t" /* 1. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],42,47,4\n\t" /* 2. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 3. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte.  */ \
+		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "j 99f\n\t"						\
+		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99:\n\t"						\
 		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=a" (tmp2), [R_TMP3] "=d" (tmp3)	\
 		    , [R_RES] "+d" (result)				\
 		  : /* inputs */					\
 		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
 		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
 		  : /* clobber list */ "memory", "cc"			\
 		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
 		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
@@ -761,8 +813,11 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
 		    ASM_CLOBBER_VR ("v24")				\
 		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
   }
 
 /* Generate loop-function with software routing.  */
@@ -774,15 +829,6 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 #define LOOP_NEED_FLAGS
 #include <iconv/loop.c>
 
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
 #if defined HAVE_S390_VX_ASM_SUPPORT
 /* Generate loop-function with hardware vector and utf-convert instructions.  */
 # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
@@ -807,10 +853,6 @@ __to_utf8_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf8_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
-      && dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
     return __to_utf8_loop_c;
 }
 
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 05/14] S390: Optimize builtin iconv-modules.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (6 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 03/14] S390: Configure check for vector support in gcc Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-03-18 12:58   ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates (disable cu42) Stefan Liebler
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch introduces an s390-specific gconv_simple.c file which provides
versions optimized for z13 with vector instructions. These are chosen at
runtime via ifunc.
The optimized conversions convert between internal and ascii, ucs4, ucs4le,
ucs2, ucs2le.
If the build environment lacks vector support, iconv/gconv_simple.c is used
without any change. Otherwise iconv/gconv_simple.c is also used to create
conversion loop routines without vector instructions, which serve as the
fallback if vector instructions aren't available at runtime.
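
The runtime selection described above can be approximated in portable C with
a function pointer. This is a rough sketch only: a real ifunc is resolved by
the dynamic linker, whereas here the caller picks the routine explicitly. All
names (DEMO_HWCAP_VX, ascii_prefix_c, ascii_prefix_blocked, select_loop) are
hypothetical, and the "blocked" routine is an ordinary C stand-in for a
vector loop, not vector code.

```c
#include <stddef.h>

/* Hypothetical capability bit, standing in for a hwcap flag.  */
#define DEMO_HWCAP_VX (1UL << 0)

typedef size_t (*prefix_fn) (const unsigned char *in, size_t n);

/* Fallback: count the leading bytes <= 0x7f one at a time.  */
size_t
ascii_prefix_c (const unsigned char *in, size_t n)
{
  size_t i = 0;
  while (i < n && in[i] <= 0x7f)
    i++;
  return i;
}

/* Stand-in for the optimized variant: test 16 bytes per step, then
   finish the remainder (or the block holding the first byte > 0x7f)
   byte by byte.  */
size_t
ascii_prefix_blocked (const unsigned char *in, size_t n)
{
  size_t i = 0;
  while (n - i >= 16)
    {
      unsigned char acc = 0;
      for (size_t j = 0; j < 16; j++)
	acc |= in[i + j];
      if (acc & 0x80)
	break;
      i += 16;
    }
  while (i < n && in[i] <= 0x7f)
    i++;
  return i;
}

/* Choose the routine from the capability bits, as a resolver would.  */
prefix_fn
select_loop (unsigned long hwcap)
{
  return (hwcap & DEMO_HWCAP_VX) ? ascii_prefix_blocked : ascii_prefix_c;
}
```

Both routines have to return the same result for every input, so that the
choice made at runtime affects only speed and never the conversion result.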

ChangeLog:

	* sysdeps/s390/multiarch/gconv_simple.c: New File.
	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
---
 sysdeps/s390/multiarch/Makefile       |    4 +
 sysdeps/s390/multiarch/gconv_simple.c | 1266 +++++++++++++++++++++++++++++++++
 2 files changed, 1270 insertions(+)
 create mode 100644 sysdeps/s390/multiarch/gconv_simple.c

diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
index 0805b07..5067b6f 100644
--- a/sysdeps/s390/multiarch/Makefile
+++ b/sysdeps/s390/multiarch/Makefile
@@ -42,3 +42,7 @@ sysdep_routines += wcslen wcslen-vx wcslen-c \
 		   wmemset wmemset-vx wmemset-c \
 		   wmemcmp wmemcmp-vx wmemcmp-c
 endif
+
+ifeq ($(subdir),iconv)
+sysdep_routines += gconv_simple
+endif
diff --git a/sysdeps/s390/multiarch/gconv_simple.c b/sysdeps/s390/multiarch/gconv_simple.c
new file mode 100644
index 0000000..0e59422
--- /dev/null
+++ b/sysdeps/s390/multiarch/gconv_simple.c
@@ -0,0 +1,1266 @@
+/* Simple transformations functions - s390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# include <ifunc-resolve.h>
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+# define ICONV_C_NAME(NAME) __##NAME##_c
+# define ICONV_VX_NAME(NAME) __##NAME##_vx
+# define ICONV_VX_IFUNC(FUNC)						\
+  extern __typeof (ICONV_C_NAME (FUNC)) __##FUNC;			\
+  s390_vx_libc_ifunc (__##FUNC)						\
+  int FUNC (struct __gconv_step *step, struct __gconv_step_data *data,	\
+	    const unsigned char **inptrp, const unsigned char *inend,	\
+	    unsigned char **outbufstart, size_t *irreversible,		\
+	    int do_flush, int consume_incomplete)			\
+  {									\
+    return __##FUNC (step, data, inptrp, inend,outbufstart,		\
+		     irreversible, do_flush, consume_incomplete);	\
+  }
+# define ICONV_VX_SINGLE(NAME)						\
+  static __typeof (NAME##_single) __##NAME##_vx_single __attribute__((alias(#NAME "_single")));
+
+/* Generate the transformations which are used, if the target machine does not
+   support vector instructions.  */
+# define __gconv_transform_ascii_internal		\
+  ICONV_C_NAME (__gconv_transform_ascii_internal)
+# define __gconv_transform_internal_ascii		\
+  ICONV_C_NAME (__gconv_transform_internal_ascii)
+# define __gconv_transform_internal_ucs4le		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs4le)
+# define __gconv_transform_ucs4_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4_internal)
+# define __gconv_transform_ucs4le_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4le_internal)
+# define __gconv_transform_ucs2_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2_internal)
+# define __gconv_transform_ucs2reverse_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2reverse_internal)
+# define __gconv_transform_internal_ucs2		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2)
+# define __gconv_transform_internal_ucs2reverse		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2reverse)
+
+
+# include <iconv/gconv_simple.c>
+
+# undef __gconv_transform_ascii_internal
+# undef __gconv_transform_internal_ascii
+# undef __gconv_transform_internal_ucs4le
+# undef __gconv_transform_ucs4_internal
+# undef __gconv_transform_ucs4le_internal
+# undef __gconv_transform_ucs2_internal
+# undef __gconv_transform_ucs2reverse_internal
+# undef __gconv_transform_internal_ucs2
+# undef __gconv_transform_internal_ucs2reverse
+
+/* Now define the functions with vector support.  */
+# if defined __s390x__
+#  define CONVERT_32BIT_SIZE_T(REG)
+# else
+#  define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+# endif
+
+/* Convert from ISO 646-IRV to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ascii_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ascii_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ascii_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+    /* The value is too large.  We don't try transliteration here since \
+       this is not an error because of the lack of possibilities to	\
+       represent the result.  This is a genuine bug in the input since	\
+       ASCII does not allow such values.  */				\
+    STANDARD_FROM_LOOP_ERR_HANDLER (1);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*inptr > '\x7f'))				\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */				\
+	*((uint32_t *) outptr) = *inptr++;				\
+	outptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = inend - inptr;						\
+    if (len > (outend - outptr) / 4)					\
+      len = (outend - outptr) / 4;					\
+    size_t loop_count, tmp;						\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		      "srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "vrepib %%v31,0x20\n\t"				\
+		      "clgije %[R_LI],0,1f\n\t"				\
+		      "0:\n\t" /* Handle 16-byte blocks.  */		\
+		      "vl %%v16,0(%[R_IN])\n\t"				\
+		      /* Checking for values > 0x7f.  */		\
+		      "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "jno 10f\n\t"					\
+		      /* Enlarge to UCS4.  */				\
+		      "vuplhb %%v17,%%v16\n\t"				\
+		      "vupllb %%v18,%%v16\n\t"				\
+		      "vuplhh %%v19,%%v17\n\t"				\
+		      "vupllh %%v20,%%v17\n\t"				\
+		      "vuplhh %%v21,%%v18\n\t"				\
+		      "vupllh %%v22,%%v18\n\t"				\
+		      /* Store 64bytes to buf_out.  */			\
+		      "vstm %%v19,%%v22,0(%[R_OUT])\n\t"		\
+		      "la %[R_IN],16(%[R_IN])\n\t"			\
+		      "la %[R_OUT],64(%[R_OUT])\n\t"			\
+		      "brctg %[R_LI],0b\n\t"				\
+		      "lghi %[R_LI],15\n\t"				\
+		      "ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: aghik %[R_LI],%[R_LEN],-1\n\t"		\
+		      "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      "vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      /* Checking for values > 0x7f.  */		\
+		      "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "clr %[R_TMP],%[R_LI]\n\t"			\
+		      "locrh %[R_TMP],%[R_LEN]\n\t"			\
+		      "locghih %[R_LEN],0\n\t"				\
+		      "j 12f\n\t"					\
+		      "10:\n\t"						\
+		      /* Found a value > 0x7f.				\
+			 Store the preceding chars.  */			\
+		      "vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "sllk %[R_TMP],%[R_TMP],2\n\t"			\
+		      "ahi %[R_TMP],-1\n\t"				\
+		      "jl 20f\n\t"					\
+		      "lgr %[R_LI],%[R_TMP]\n\t"			\
+		      "vuplhb %%v17,%%v16\n\t"				\
+		      "vuplhh %%v19,%%v17\n\t"				\
+		      "vstl %%v19,%[R_LI],0(%[R_OUT])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 11f\n\t"					\
+		      "vupllh %%v20,%%v17\n\t"				\
+		      "vstl %%v20,%[R_LI],16(%[R_OUT])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 11f\n\t"					\
+		      "vupllb %%v18,%%v16\n\t"				\
+		      "vuplhh %%v21,%%v18\n\t"				\
+		      "vstl %%v21,%[R_LI],32(%[R_OUT])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 11f\n\t"					\
+		      "vupllh %%v22,%%v18\n\t"				\
+		      "vstl %%v22,%[R_LI],48(%[R_OUT])\n\t"		\
+		      "11:\n\t"						\
+		      "la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t"		\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_TMP] "=a" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+			ASM_CLOBBER_VR ("v31")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at the next input byte.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
+
+/* Convert from the internal (UCS4-like) format to ISO 646-IRV.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		1
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ascii_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ascii_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ascii)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4);			\
+  STANDARD_TO_LOOP_ERR_HANDLER (4);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*((const uint32_t *) inptr) > 0x7f))		\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */				\
+	*outptr++ = *((const uint32_t *) inptr);			\
+	inptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = (inend - inptr) / 4;					\
+    if (len > outend - outptr)						\
+      len = outend - outptr;						\
+    size_t loop_count, tmp, tmp2;					\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch > 0x7f.  */		\
+		      "vzero %%v21\n\t"					\
+		      "srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		      "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		      "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		      "lghi %[R_TMP],0\n\t"				\
+		      "clgije %[R_LI],0,1f\n\t"				\
+		      "0:\n\t"						\
+		      "vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		      /* Shorten to byte values.  */			\
+		      "vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "vpkh %%v23,%%v23,%%v24\n\t"			\
+		      /* Checking for values > 0x7f.  */		\
+		      "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "jno 10f\n\t"					\
+		      "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "jno 11f\n\t"					\
+		      "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "jno 12f\n\t"					\
+		      "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "jno 13f\n\t"					\
+		      /* Store 16bytes to outptr.  */			\
+		      "vst %%v23,0(%[R_OUT])\n\t"			\
+		      "la %[R_IN],64(%[R_IN])\n\t"			\
+		      "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		      "brctg %[R_LI],0b\n\t"				\
+		      "lghi %[R_LI],15\n\t"				\
+		      "ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "aghi %[R_LI],-1\n\t"				\
+		      "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Load remaining 1...63 bytes.  */		\
+		      "vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 2f\n\t"					\
+		      "vll %%v17,%[R_LI],16(%[R_IN])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 2f\n\t"					\
+		      "vll %%v18,%[R_LI],32(%[R_IN])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 2f\n\t"					\
+		      "vll %%v19,%[R_LI],48(%[R_IN])\n\t"		\
+		      "2:\n\t"						\
+		      /* Shorten to byte values.  */			\
+		      "vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "vpkh %%v23,%%v23,%%v24\n\t"			\
+		      "sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "aghi %[R_LI],-16\n\t"				\
+		      "jl 3f\n\t" /* v16 is not fully loaded.  */	\
+		      "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "jno 10f\n\t"					\
+		      "aghi %[R_LI],-16\n\t"				\
+		      "jl 4f\n\t" /* v17 is not fully loaded.  */	\
+		      "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "jno 11f\n\t"					\
+		      "aghi %[R_LI],-16\n\t"				\
+		      "jl 5f\n\t" /* v18 is not fully loaded.  */	\
+		      "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "jno 12f\n\t"					\
+		      "aghi %[R_LI],-16\n\t"				\
+		      /* v19 is not fully loaded. */			\
+		      "lghi %[R_TMP],12\n\t"				\
+		      "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "6: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "aghi %[R_LI],16\n\t"				\
+		      "clrjl %[R_I],%[R_LI],14f\n\t"			\
+		      "lgr %[R_I],%[R_LEN]\n\t"				\
+		      "lghi %[R_LEN],0\n\t"				\
+		      "j 15f\n\t"					\
+		      "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "j 6b\n\t"					\
+		      "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "lghi %[R_TMP],4\n\t"				\
+		      "j 6b\n\t"					\
+		      "5: vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "lghi %[R_TMP],8\n\t"				\
+		      "j 6b\n\t"					\
+		      /* Found a value > 0x7f.  */			\
+		      "13: ahi %[R_TMP],4\n\t"				\
+		      "12: ahi %[R_TMP],4\n\t"				\
+		      "11: ahi %[R_TMP],4\n\t"				\
+		      "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "14: srlg %[R_I],%[R_I],2\n\t"			\
+		      "agr %[R_I],%[R_TMP]\n\t"				\
+		      "je 20f\n\t"					\
+		      /* Store characters before invalid one...  */	\
+		      "15: aghi %[R_I],-1\n\t"				\
+		      "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		      /* ... and update pointers.  */			\
+		      "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+		      "sllg %[R_I],%[R_I],2\n\t"			\
+		      "la %[R_IN],4(%[R_I],%[R_IN])\n\t"		\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_I] "=a" (tmp2)				\
+			, [R_TMP] "=d" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+			ASM_CLOBBER_VR ("v24")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character > 0x7f at next character.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
+
+
+/* Convert from internal UCS4 to UCS4 little endian form.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ucs4le_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ucs4le_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs4le)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (internal_ucs4le_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len = MIN (inend - inptr, outend - outptr) / 4;
+  size_t loop_count;
+  __asm__ volatile (".machine push\n\t"
+		    ".machine \"z13\"\n\t"
+		    ".machinemode \"zarch_nohighgprs\"\n\t"
+		    CONVERT_32BIT_SIZE_T ([R_LEN])
+		    "bras %[R_LI],1f\n\t"
+		    /* Vector permute mask:  */
+		    ".long 0x03020100,0x07060504,0x0B0A0908,0x0F0E0D0C\n\t"
+		    "1: vl %%v20,0(%[R_LI])\n\t"
+		    /* Process 64byte (16char) blocks.  */
+		    "srlg %[R_LI],%[R_LEN],4\n\t"
+		    "clgije %[R_LI],0,10f\n\t"
+		    "0: vlm %%v16,%%v19,0(%[R_IN])\n\t"
+		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "vperm %%v17,%%v17,%%v17,%%v20\n\t"
+		    "vperm %%v18,%%v18,%%v18,%%v20\n\t"
+		    "vperm %%v19,%%v19,%%v19,%%v20\n\t"
+		    "vstm %%v16,%%v19,0(%[R_OUT])\n\t"
+		    "la %[R_IN],64(%[R_IN])\n\t"
+		    "la %[R_OUT],64(%[R_OUT])\n\t"
+		    "brctg %[R_LI],0b\n\t"
+		    "llgfr %[R_LEN],%[R_LEN]\n\t"
+		    "nilf %[R_LEN],15\n\t"
+		    /* Process 16byte (4char) blocks.  */
+		    "10: srlg %[R_LI],%[R_LEN],2\n\t"
+		    "clgije %[R_LI],0,20f\n\t"
+		    "11: vl %%v16,0(%[R_IN])\n\t"
+		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "vst %%v16,0(%[R_OUT])\n\t"
+		    "la %[R_IN],16(%[R_IN])\n\t"
+		    "la %[R_OUT],16(%[R_OUT])\n\t"
+		    "brctg %[R_LI],11b\n\t"
+		    "nill %[R_LEN],3\n\t"
+		    /* Process <16bytes.  */
+		    "20: sll %[R_LEN],2\n\t"
+		    "ahi %[R_LEN],-1\n\t"
+		    "jl 30f\n\t"
+		    "vll %%v16,%[R_LEN],0(%[R_IN])\n\t"
+		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "vstl %%v16,%[R_LEN],0(%[R_OUT])\n\t"
+		    "la %[R_IN],1(%[R_LEN],%[R_IN])\n\t"
+		    "la %[R_OUT],1(%[R_LEN],%[R_OUT])\n\t"
+		    "30: \n\t"
+		    ".machine pop"
+		    : /* outputs */ [R_OUT] "+a" (outptr)
+		      , [R_IN] "+a" (inptr)
+		      , [R_LI] "=a" (loop_count)
+		      , [R_LEN] "+a" (len)
+		    : /* inputs */
+		    : /* clobber list*/ "memory", "cc"
+		      ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
+		      ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
+		      ASM_CLOBBER_VR ("v20")
+		    );
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (internal_ucs4le_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs4le)
+
+
+/* Transform from UCS4 to the internal, UCS4-like format.  Unlike
+   for the other direction we have to check for correct values here.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4_internal)
+# define ONE_DIRECTION		0
+
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4_internal_loop) (struct __gconv_step *step,
+				    struct __gconv_step_data *step_data,
+				    const unsigned char **inptrp,
+				    const unsigned char *inend,
+				    unsigned char **outptrp,
+				    unsigned char *outend,
+				    size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"larl %[R_LI],9f\n\t"
+			"vlm %%v20,%%v21,0(%[R_LI])\n\t"
+			"srlg %[R_LI],%[R_LEN],2\n\t"
+			"clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0: vl %%v16,0(%[R_IN])\n\t"
+			"vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"jno 10f\n\t"
+			"vst %%v16,0(%[R_OUT])\n\t"
+			"la %[R_IN],16(%[R_IN])\n\t"
+			"la %[R_OUT],16(%[R_OUT])\n\t"
+			"brctg %[R_LI],0b\n\t"
+			"llgfr %[R_LEN],%[R_LEN]\n\t"
+			"nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1: sll %[R_LEN],2\n\t"
+			"ahik %[R_LI],%[R_LEN],-1\n\t"
+			"jl 20f\n\t" /* No further bytes available.  */
+			"vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"vlgvb %[R_LI],%%v22,7\n\t"
+			"clr %[R_LI],%[R_LEN]\n\t"
+			"locgrhe %[R_LI],%[R_LEN]\n\t"
+			"locghihe %[R_LEN],0\n\t"
+			"j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			".long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v22,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"jl 20f\n\t"
+			"vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (ucs4_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4_internal)
+
+
+/* Transform from UCS4-LE to the internal encoding.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4le_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4le_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4le_internal)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4le_internal_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"larl %[R_LI],9f\n\t"
+			"vlm %%v20,%%v22,0(%[R_LI])\n\t"
+			"srlg %[R_LI],%[R_LEN],2\n\t"
+			"clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0: vl %%v16,0(%[R_IN])\n\t"
+			"vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"jno 10f\n\t"
+			"vst %%v16,0(%[R_OUT])\n\t"
+			"la %[R_IN],16(%[R_IN])\n\t"
+			"la %[R_OUT],16(%[R_OUT])\n\t"
+			"brctg %[R_LI],0b\n\t"
+			"llgfr %[R_LEN],%[R_LEN]\n\t"
+			"nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1: sll %[R_LEN],2\n\t"
+			"ahik %[R_LI],%[R_LEN],-1\n\t"
+			"jl 20f\n\t" /* No further bytes available.  */
+			"vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"vlgvb %[R_LI],%%v23,7\n\t"
+			"clr %[R_LI],%[R_LEN]\n\t"
+			"locgrhe %[R_LI],%[R_LEN]\n\t"
+			"locghihe %[R_LEN],0\n\t"
+			"j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			".long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* v22: Vector permute mask.  */
+			".long 0x03020100,0x07060504,0x0B0A0908,0x0F0E0D0C\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v23,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"jl 20f\n\t"
+			"vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			  ASM_CLOBBER_VR ("v23")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*inptrp + 4 > inend)
+    result = __GCONV_INCOMPLETE_INPUT;
+  else
+    {
+      assert (*outptrp + 4 > outend);
+      result = __GCONV_FULL_OUTPUT;
+    }
+
+  return result;
+}
+ICONV_VX_SINGLE (ucs4le_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4le_internal)
+
+/* Convert from UCS2 to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  STANDARD_FROM_LOOP_ERR_HANDLER (2);
+# define BODY_ORIG							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "larl %[R_TMP],9f\n\t"				\
+		      "vlm %%v20,%%v21,0(%[R_TMP])\n\t"			\
+		      "srlg %[R_TMP],%[R_LEN],3\n\t"			\
+		      "clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0: vl %%v16,0(%[R_IN])\n\t"			\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "jno 10f\n\t"					\
+		      /* Store 32bytes to outptr.  */			\
+		      "vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "la %[R_IN],16(%[R_IN])\n\t"			\
+		      "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		      "brctg %[R_TMP],0b\n\t"				\
+		      "llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1: sll %[R_LEN],1\n\t"				\
+		      "ahik %[R_TMP],%[R_LEN],-1\n\t"			\
+		      "jl 20f\n\t" /* No further bytes available.  */	\
+		      "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "locgrhe %[R_TMP],%[R_LEN]\n\t"			\
+		      "locghihe %[R_LEN],0\n\t"				\
+		      "j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: >=; element 1: <  */		\
+		      ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "sll %[R_TMP],1\n\t"				\
+		      "lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "ahi %[R_TMP],-1\n\t"				\
+		      "jl 20f\n\t"					\
+		      "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "ahi %[R_TMP],-16\n\t"				\
+		      "jl 19f\n\t"					\
+		      "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"		\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2_internal)
+
+/* Convert from UCS2 in other endianness to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2reverse_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  if (! ignore_errors_p ())						\
+    {									\
+      result = __GCONV_ILLEGAL_INPUT;					\
+      break;								\
+    }									\
+  inptr += 2;								\
+  ++*irreversible;							\
+  continue;
+
+# define BODY_ORIG \
+  {									\
+    uint16_t u1 = bswap_16 (get16 (inptr));				\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "larl %[R_TMP],9f\n\t"				\
+		      "vlm %%v20,%%v22,0(%[R_TMP])\n\t"			\
+		      "srlg %[R_TMP],%[R_LEN],3\n\t"			\
+		      "clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0: vl %%v16,0(%[R_IN])\n\t"			\
+		      "vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "jno 10f\n\t"					\
+		      /* Store 32bytes to outptr.  */			\
+		      "vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "la %[R_IN],16(%[R_IN])\n\t"			\
+		      "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		      "brctg %[R_TMP],0b\n\t"				\
+		      "llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1: sll %[R_LEN],1\n\t"				\
+		      "ahik %[R_TMP],%[R_LEN],-1\n\t"			\
+		      "jl 20f\n\t" /* No further bytes available.  */	\
+		      "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "locgrhe %[R_TMP],%[R_LEN]\n\t"			\
+		      "locghihe %[R_LEN],0\n\t"				\
+		      "j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: >=; element 1: <  */		\
+		      ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v22: Vector permute mask.  */			\
+		      ".short 0x0100,0x0302,0x0504,0x0706\n\t"		\
+		      ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"		\
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "sll %[R_TMP],1\n\t"				\
+		      "lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "ahi %[R_TMP],-1\n\t"				\
+		      "jl 20f\n\t"					\
+		      "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "ahi %[R_TMP],-16\n\t"				\
+		      "jl 19f\n\t"					\
+		      "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"		\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2reverse_internal)
+
+/* Convert from the internal (UCS4-like) format to UCS2.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2_loop) /* This is not used.  */
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+									\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could exploit this as a	\
+	   security hole by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	result = __GCONV_ILLEGAL_INPUT;					\
+	if (! ignore_errors_p ())					\
+	  break;							\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, val);						\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "larl %[R_I],3f\n\t"				\
+			  "vlm %%v20,%%v23,0(%[R_I])\n\t"		\
+			  "0:\n\t"					\
+			  "vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2.  */			\
+			  "vpkf %%v18,%%v16,%%v17\n\t"			\
+			  "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"		\
+			  "jno 11f\n\t"					\
+			  "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "jno 10f\n\t"					\
+			  /* Store 16bytes to outptr.  */		\
+			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
+			  "la %[R_IN],32(%[R_IN])\n\t"			\
+			  "la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "brctg %[R_LI],0b\n\t"			\
+			  "j 20f\n\t"					\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  ".long 0xe000,0x10000,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "jo 2b\n\t" /* All ch's in this range, proceed.   */ \
+			  "lhi %[R_TMP],16\n\t"				\
+			  "j 12f\n\t"					\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "jo 1b\n\t" /* All ch's in this range, proceed.   */ \
+			  "lhi %[R_TMP],0\n\t"				\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "agr %[R_I],%[R_TMP]\n\t"			\
+			  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			  "srl %[R_I],1\n\t"				\
+			  "ahi %[R_I],-1\n\t"				\
+			  "jl 20f\n\t"					\
+			  "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"		\
+			  "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2)
+
+/* Convert from the internal (UCS4-like) format to UCS2 in other endianness. */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2reverse_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2reverse_loop)/* This is not used.*/
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2reverse)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could make a security	\
+	   hole exploit by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	if (! ignore_errors_p ())					\
+	  {								\
+	    result = __GCONV_ILLEGAL_INPUT;				\
+	    break;							\
+	  }								\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, bswap_16 (val));					\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "larl %[R_I],3f\n\t"				\
+			  "vlm %%v20,%%v24,0(%[R_I])\n\t"		\
+			  "0:\n\t"					\
+			  "vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2 and byteswap.  */	\
+			  "vpkf %%v18,%%v16,%%v17\n\t"			\
+			  "vperm %%v18,%%v18,%%v18,%%v24\n\t"		\
+			  "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"		\
+			  "jno 11f\n\t"					\
+			  "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "jno 10f\n\t"					\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
+			  "la %[R_IN],32(%[R_IN])\n\t"			\
+			  "la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "brctg %[R_LI],0b\n\t"			\
+			  "j 20f\n\t"					\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  ".long 0xe000,0x10000,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* Vector permute mask (v24)  */		\
+			  ".short 0x0100,0x0302,0x0504,0x0706\n\t"	\
+			  ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "jo 2b\n\t" /* All ch's in this range, proceed.  */ \
+			  "lhi %[R_TMP],16\n\t"				\
+			  "j 12f\n\t"					\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "jo 1b\n\t" /* All ch's in this range, proceed.  */ \
+			  "lhi %[R_TMP],0\n\t"				\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "agr %[R_I],%[R_TMP]\n\t"			\
+			  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			  "srl %[R_I],1\n\t"				\
+			  "ahi %[R_I],-1\n\t"				\
+			  "jl 20f\n\t"					\
+			  "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"		\
+			  "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			    ASM_CLOBBER_VR ("v24")			\
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2reverse)
+
+
+#else
+/* Generate the internal transformations without ifunc if build environment
+   lacks vector support. Instead simply include the common version.  */
+# include <iconv/gconv_simple.c>
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (3 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 04/14] S390: Optimize 8bit-generic iconv modules Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-21 15:05   ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates (disable cu41) Stefan Liebler
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch reworks the s390-specific module, which used the z900 translate
one to one (troo) instruction. The g5 translate (tr) instruction is now used
instead, because it outperforms the troo instruction.
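
For readers without the hardware at hand, the staging used by the new loop
(256-byte blocks, then 8-byte blocks, then a 0...7 byte tail) can be sketched
in portable C. This is only an illustration: translate_bytes is a hypothetical
helper that is not part of the patch, and the plain loop in stage 1 stands in
for the mvc + tr pair.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: translate len bytes from
   in to out through a 256-entry table, in the same three stages as the
   reworked loop.  Returns the number of bytes translated.  */
static size_t
translate_bytes (const unsigned char table[256], const unsigned char *in,
		 unsigned char *out, size_t len)
{
  size_t done = 0;

  /* Stage 1: 256-byte blocks.  The patch uses mvc + tr here; this
     loop is only a portable stand-in.  */
  for (; len - done >= 256; done += 256)
    for (size_t i = 0; i < 256; i++)
      out[done + i] = table[in[done + i]];

  /* Stage 2: remaining 0...248 bytes in 8-byte blocks.  */
  for (; len - done >= 8; done += 8)
    for (size_t i = 0; i < 8; i++)
      out[done + i] = table[in[done + i]];

  /* Stage 3: remaining 0...7 bytes.  */
  for (; done < len; done++)
    out[done] = table[in[done]];

  return done;
}
```

Because every stage reads from in and writes to out at the same offset, the
sketch assumes the two buffers do not overlap.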

ChangeLog:

	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
	Rename to TR_LOOP and use tr instead of troo instruction.
---
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 93 +++++++++++++++++-----------
 1 file changed, 56 insertions(+), 37 deletions(-)

diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
index c59f87f..4d79bbf 100644
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
@@ -1,7 +1,6 @@
 /* Conversion between ISO 8859-1 and IBM037.
 
-   This module uses the Z900 variant of the Translate One To One
-   instruction.
+   This module uses the translate instruction.
    Copyright (C) 1997-2016 Free Software Foundation, Inc.
 
    Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
@@ -176,50 +175,70 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_FROM		1
 #define MIN_NEEDED_TO		1
 
-/* The Z900 variant of troo forces us to always specify a test
-   character which ends the translation.  So if we run into the
-   situation where the translation has been interrupted due to the
-   test character we translate the character by hand and jump back
-   into the instruction.  */
-
-#define TROO_LOOP(TABLE)						\
+#define TR_LOOP(TABLE)							\
   {									\
-    register const unsigned char test __asm__ ("0") = 0;		\
-    register const unsigned char *pTable __asm__ ("1") = TABLE;		\
-    register unsigned char *pOutput __asm__ ("2") = outptr;		\
-    register uint64_t length __asm__ ("3");				\
-    const unsigned char* pInput = inptr;				\
-    uint64_t tmp;							\
-									\
-    length = (inend - inptr < outend - outptr				\
-	      ? inend - inptr : outend - outptr);			\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
 									\
-    __asm__ volatile ("0:                        \n\t"			\
-		      "  troo    %0,%1           \n\t"			\
-		      "  jz      1f              \n\t"			\
-		      "  jo      0b              \n\t"			\
-		      "  llgc    %3,0(%1)        \n\t"			\
-		      "  la      %3,0(%3,%4)     \n\t"			\
-		      "  mvc     0(1,%0),0(%3)   \n\t"			\
-		      "  aghi    %1,1            \n\t"			\
-		      "  aghi    %0,1            \n\t"			\
-		      "  aghi    %2,-1           \n\t"			\
-		      "  j       0b              \n\t"			\
-		      "1:                        \n"			\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "la %[R_IN],256(%[R_IN])\n\t"		\
+			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     "brctg %[R_LI],0b\n\t"			\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
 									\
-     : "+a" (pOutput), "+a" (pInput), "+d" (length), "=&a" (tmp)        \
-     : "a" (pTable), "d" (test)						\
-     : "cc");								\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
 									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
   }
 
+
 /* First define the conversion function from ISO 8859-1 to CP037.  */
 #define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
 #define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
 #define LOOPFCT			FROM_LOOP
-#define BODY TROO_LOOP (table_iso8859_1_to_cp037)
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
 
 #include <iconv/loop.c>
 
@@ -228,7 +247,7 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_INPUT	MIN_NEEDED_TO
 #define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
 #define LOOPFCT			TO_LOOP
-#define BODY TROO_LOOP (table_cp037_iso8859_1);
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
 
 #include <iconv/loop.c>
 
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 09/14] S390: Optimize utf16-utf32 module.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (8 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates (disable cu42) Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-04-21 14:55   ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 07/14] S390: Optimize utf8-utf32 module Stefan Liebler
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch reworks the s390-specific module that converts between utf16 and
utf32. ifunc is now used to choose either the c or the etf3eh (with convert
utf instruction) variant at runtime.
Furthermore, a new vector variant for z13 is introduced, which is built if
vector support is available at build time and chosen if it is available at
runtime.
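
The selection order can be sketched without ifunc as a plain function that
returns a function pointer. The names below (CAP_VX, CAP_ETF3EH, select_loop,
have_vx_asm) are hypothetical stand-ins for this illustration; the patch itself
tests HWCAP_S390_VX and HWCAP_S390_ETF3EH in dl_hwcap inside an ifunc resolver,
and uses a preprocessor check rather than a runtime flag for build-time vector
support.

```c
/* Hypothetical capability bits, used only in this sketch.  */
enum { CAP_ETF3EH = 1 << 0, CAP_VX = 1 << 1 };

enum variant { VARIANT_C, VARIANT_ETF3EH, VARIANT_VX };
typedef enum variant (*loop_fn) (void);

/* Stand-ins for the three loop variants; each just reports its identity.  */
static enum variant loop_c (void) { return VARIANT_C; }
static enum variant loop_etf3eh (void) { return VARIANT_ETF3EH; }
static enum variant loop_vx (void) { return VARIANT_VX; }

/* Prefer the vector variant, then the convert-utf variant, then plain C.
   have_vx_asm models whether the vector variant was built at all.  */
static loop_fn
select_loop (unsigned long hwcap, int have_vx_asm)
{
  if (have_vx_asm && (hwcap & CAP_VX))
    return loop_vx;
  if (hwcap & CAP_ETF3EH)
    return loop_etf3eh;
  return loop_c;
}
```

With ifunc the equivalent decision is made once by the dynamic linker, so
callers pay no per-call dispatch cost.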

When converting utf32 to utf16, the vector variant optimizes input that maps
to 2-byte utf16 characters. The convert utf instruction (cu42) is used if the
input contains a value in the surrogate range or a character >= 0x10000.
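
As a scalar reference for what the fast and slow paths have to produce, here
is a minimal sketch of the utf32 to utf16 mapping. utf32_to_utf16 is a
hypothetical helper, not code from the patch, and it assumes the strict rule
that every value in 0xd800...0xdfff is rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.  Encodes one utf32 value
   and returns the number of uint16_t units written to out (1 or 2),
   or 0 if c is not a valid character.  */
static size_t
utf32_to_utf16 (uint32_t c, uint16_t out[2])
{
  if (c <= 0xd7ff || (c >= 0xe000 && c <= 0xffff))
    {
      /* Fast path: one 2-byte utf16 unit.  */
      out[0] = (uint16_t) c;
      return 1;
    }
  if (c >= 0x10000 && c <= 0x10ffff)
    {
      /* Slow path: build a high/low surrogate pair.  */
      c -= 0x10000;
      out[0] = (uint16_t) (0xd800 | (c >> 10));
      out[1] = (uint16_t) (0xdc00 | (c & 0x3ff));
      return 2;
    }
  /* Surrogate range or beyond 0x10ffff.  */
  return 0;
}
```

For example, 0x1f600 encodes as the pair 0xd83d, 0xde00.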

For the other direction, utf16 to utf32, the cu24 instruction can't be
re-enabled, because it does not report an error if the input stream consists
of a single low surrogate utf16 char (e.g. 0xdc00). This applies to the newest
z13, too. Thus there are only the c and the new vector variants, both of which
can handle utf16 surrogate characters.
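
The check that cu24 misses can be sketched in scalar C as follows.
utf16_to_utf32 and the decode_status values are hypothetical names for this
illustration; the arithmetic follows the same idea as the C and vector
variants (subtract 0xd7c0 from the high surrogate, shift by 10, insert the low
10 bits of the low surrogate).

```c
#include <stddef.h>
#include <stdint.h>

enum decode_status { DECODE_OK, DECODE_INCOMPLETE, DECODE_ILLEGAL };

/* Hypothetical helper, not part of the patch.  Decodes one character
   from in (avail uint16_t units), stores it in *out and the number of
   consumed units in *used.  */
static enum decode_status
utf16_to_utf32 (const uint16_t *in, size_t avail, uint32_t *out,
		size_t *used)
{
  if (avail == 0)
    return DECODE_INCOMPLETE;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      *out = u1;
      *used = 1;
      return DECODE_OK;
    }

  /* An isolated low surrogate is ill-formed.  This is the case the
     commit message says cu24 fails to report.  */
  if (u1 >= 0xdc00)
    return DECODE_ILLEGAL;

  /* High surrogate: a second unit is needed.  */
  if (avail < 2)
    return DECODE_INCOMPLETE;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return DECODE_ILLEGAL;

  *out = ((uint32_t) (u1 - 0xd7c0) << 10) + (uint32_t) (u2 - 0xdc00);
  *used = 2;
  return DECODE_OK;
}
```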

This patch also fixes some whitespace errors. Furthermore, the etf3eh variant
now handles the "UTF-xx//IGNORE" case. Previously it ignored the //IGNORE
request and always stopped at an error.

ChangeLog:

	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
	etf3eh or new vector loop-variant.
---
 sysdeps/s390/s390-64/utf16-utf32-z9.c | 471 +++++++++++++++++++++++++++-------
 1 file changed, 379 insertions(+), 92 deletions(-)

diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
index a3863ee..4c2c548 100644
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ b/sysdeps/s390/s390-64/utf16-utf32-z9.c
@@ -30,47 +30,27 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
 /* UTF-32 big endian byte order mark.  */
 #define BOM_UTF32               0x0000feffu
 
 /* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	        0xfeff
+#define BOM_UTF16               0xfeff
 
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
 #define MIN_NEEDED_FROM		2
 #define MAX_NEEDED_FROM		4
 #define MIN_NEEDED_TO		4
-#define FROM_LOOP		from_utf16_loop
-#define TO_LOOP			to_utf16_loop
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
 #define FROM_DIRECTION		(dir == from_utf16)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-          /* Emit the UTF-16 Byte Order Mark.  */			\
-          if (__glibc_unlikely (outbuf + 2 > outend))			      \
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-          /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			      \
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
 
 /* Direction of the transformation.  */
 enum direction
@@ -169,16 +149,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-		      "+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -187,44 +167,46 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
 /* Conversion function from UTF-16 to UTF-32 internal/BE.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
 /* The software routine is copied from utf-16.c (minus bytes
    swapping).  */
-#define BODY								\
+#define BODY_FROM_C							\
   {									\
-    /* The hardware instruction currently fails to report an error for	\
-       isolated low surrogates so we have to disable the instruction	\
-       until this gets resolved.  */					\
-    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
-      {									\
-	HARDWARE_CONVERT ("cu24 %0, %1, 1");				\
-	if (inptr != inend)						\
-	  {								\
-	    /* Check if the third byte is				\
-	       a valid start of a UTF-16 surrogate.  */			\
-	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (3);			\
-									\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint16_t u1 = get16 (inptr);					\
 									\
     if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
@@ -235,15 +217,15 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        /* An isolated low-surrogate was found.  This has to be         \
+	/* An isolated low-surrogate was found.  This has to be         \
 	   considered ill-formed.  */					\
-        if (__glibc_unlikely (u1 >= 0xdc00))				      \
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
 	  {								\
 	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
 	  }								\
 	/* It's a surrogate character.  At least the first word says	\
 	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			      \
+	if (__glibc_unlikely (inptr + 4 > inend))			\
 	  {								\
 	    /* We don't have enough input for another complete input	\
 	       character.  */						\
@@ -266,48 +248,200 @@ gconv_end (struct __gconv_step *data)
       }									\
     outptr += 4;							\
   }
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t"						\
+		  /* Enlarge to UTF-32.  */				\
+		  "vuplhh %%v17,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vupllh %%v18,%%v16\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
+		  "aghi %[R_OUTLEN],-32\n\t"				\
+		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "vuplhh %%v17,%%v16\n\t"				\
+		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 12f\n\t"						\
+		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "vupllh %%v18,%%v16\n\t"				\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
+		  "srl %[R_TMP2],1\n\t"					\
+		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_OUTLEN],-4\n\t"				\
+		  "j 16f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2:\n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "clgfi %[R_INLEN],1\n\t"				\
+		  "je 97f\n\t" /* Only one byte available.  */		\
+		  "jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 96f \n\t"						\
+		  "clfi %[R_TMP],0xd800\n\t"				\
+		  "jhe 15f\n\t"						\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],13b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "slgfi %[R_INLEN],4\n\t"				\
+		  "jl 97f\n\t" /* Big enough input?  */			\
+		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "slfi %[R_TMP],0xd7c0\n\t"				\
+		  "sll %[R_TMP],10\n\t"					\
+		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "nilf %[R_TMP3],0xfc00\n\t"				\
+		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "jne 98f\n\t"						\
+		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "aghi %[R_TMP2],-2\n\t"				\
+		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  "96:\n\t" /* Return full output.  */			\
+		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "97:\n\t" /* Return incomplete input.  */		\
+		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
 
 /* Conversion from UTF-32 internal/BE to UTF-16.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine is copied from utf-16.c (minus bytes
    swapping).  */
-#define BODY								\
+#define BODY_TO_C							\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu42 %0, %1");				\
-									\
-	if (inptr != inend)						\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint32_t c = get32 (inptr);						\
 									\
     if (__builtin_expect (c <= 0xd7ff, 1)				\
 	|| (c >=0xdc00 && c <= 0xffff))					\
       {									\
-        /* Two UTF-16 chars.  */					\
-        put16 (outptr, c);						\
+	/* Two UTF-16 chars.  */					\
+	put16 (outptr, c);						\
       }									\
     else if (__builtin_expect (c >= 0x10000, 1)				\
 	     && __builtin_expect (c <= 0x10ffff, 1))			\
       {									\
 	/* Four UTF-16 chars.  */					\
-        uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
 	uint16_t out;							\
 									\
 	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			      \
+	if (__glibc_unlikely (outptr + 4 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
@@ -326,12 +460,165 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
       }									\
     outptr += 2;							\
     inptr += 4;								\
   }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "vpkf %%v18,%%v16,%%v17\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t"						\
+		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
+		  "jno 11f\n\t"						\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "vst %%v18,0(%[R_OUT])\n\t"				\
+		  "la %[R_IN],32(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-32\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],32,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
+		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 20f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
 #include <iconv/skeleton.c>
-- 
2.3.0

* [PATCH 03/14] S390: Configure check for vector support in gcc.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (5 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates (disable cu41) Stefan Liebler
@ 2016-02-23  9:22 ` Stefan Liebler
  2016-02-23  9:22 ` [PATCH 05/14] S390: Optimize builtin iconv-modules Stefan Liebler
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:22 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This S390-specific configure test checks whether gcc supports vector registers
in the clobber list of an inline assembly by compiling a small test function
that clobbers one. On success, the macro HAVE_S390_VX_GCC_SUPPORT is defined.
Code can use this macro to decide whether it may list vector registers as
clobbers.
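
As a rough illustration of how the macro might be consumed, the sketch below
defines a helper in the style of the ASM_CLOBBER_VR uses that appear in the
vector loops of this series. The definition shown here is an assumption, not
taken from this patch, and sketch_clobber_demo is a hypothetical name.

```c
/* Sketch only: HAVE_S390_VX_GCC_SUPPORT would normally come from config.h.
   The ASM_CLOBBER_VR definition below is an assumed shape: it appends a
   vector register to a clobber list only if gcc accepts it.  */
#ifdef HAVE_S390_VX_GCC_SUPPORT
# define ASM_CLOBBER_VR(NR) , NR
#else
# define ASM_CLOBBER_VR(NR)
#endif

/* Hypothetical demo function: the asm body is empty, so only the
   clobber list is of interest.  */
static int
sketch_clobber_demo (int x)
{
  __asm__ __volatile__ ("" : : : "memory" ASM_CLOBBER_VR ("v16"));
  return x;
}
```

If the macro is not defined, the clobber expands to nothing and the asm
statement still compiles; the vector registers are then simply not announced
to the compiler.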

ChangeLog:

	* config.h.in (HAVE_S390_VX_GCC_SUPPORT): New macro undefine.
	* sysdeps/s390/configure.ac: Add test for S390 vector register
	support in gcc.
	* sysdeps/s390/configure: Regenerated.
---
 config.h.in               |  4 ++++
 sysdeps/s390/configure    | 32 ++++++++++++++++++++++++++++++++
 sysdeps/s390/configure.ac | 21 +++++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/config.h.in b/config.h.in
index 13c0044..0143ef4 100644
--- a/config.h.in
+++ b/config.h.in
@@ -73,6 +73,10 @@
 /* Define if assembler supports vector instructions on S390.  */
 #undef  HAVE_S390_VX_ASM_SUPPORT
 
+/* Define if gcc supports vector registers as clobbers in inline assembly
+   on S390.  */
+#undef  HAVE_S390_VX_GCC_SUPPORT
+
 /* Define if assembler supports Intel MPX.  */
 #undef  HAVE_MPX_SUPPORT
 
diff --git a/sysdeps/s390/configure b/sysdeps/s390/configure
index 0fa54c3..c9fb69c 100644
--- a/sysdeps/s390/configure
+++ b/sysdeps/s390/configure
@@ -144,6 +144,38 @@ else
 $as_echo "$as_me: WARNING: Use binutils with vector-support in order to use optimized implementations." >&2;}
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for S390 vector support in gcc" >&5
+$as_echo_n "checking for S390 vector support in gcc... " >&6; }
+if ${libc_cv_gcc_s390_vx+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat > conftest.c <<\EOF
+void testvecclobber ()
+{
+  __asm__ ("" : : : "v16");
+}
+EOF
+if { ac_try='${CC-cc} --shared conftest.c -o conftest.o &> /dev/null'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; } ;
+then
+  libc_cv_gcc_s390_vx=yes
+else
+  libc_cv_gcc_s390_vx=no
+fi
+rm -f conftest*
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_gcc_s390_vx" >&5
+$as_echo "$libc_cv_gcc_s390_vx" >&6; }
+
+if test "$libc_cv_gcc_s390_vx" = yes ;
+then
+  $as_echo "#define HAVE_S390_VX_GCC_SUPPORT 1" >>confdefs.h
+
+fi
 
 test -n "$critic_missing" && as_fn_error $? "
 *** $critic_missing" "$LINENO" 5
diff --git a/sysdeps/s390/configure.ac b/sysdeps/s390/configure.ac
index 4da134e..1db6d84 100644
--- a/sysdeps/s390/configure.ac
+++ b/sysdeps/s390/configure.ac
@@ -64,6 +64,27 @@ else
   AC_MSG_WARN([Use binutils with vector-support in order to use optimized implementations.])
 fi
 
+AC_CACHE_CHECK(for S390 vector support in gcc, libc_cv_gcc_s390_vx, [dnl
+cat > conftest.c <<\EOF
+void testvecclobber ()
+{
+  __asm__ ("" : : : "v16");
+}
+EOF
+dnl
+dnl test, if gcc supports S390 vector registers as clobber in inline assembly
+if AC_TRY_COMMAND([${CC-cc} --shared conftest.c -o conftest.o &> /dev/null]) ;
+then
+  libc_cv_gcc_s390_vx=yes
+else
+  libc_cv_gcc_s390_vx=no
+fi
+rm -f conftest* ])
+
+if test "$libc_cv_gcc_s390_vx" = yes ;
+then
+  AC_DEFINE(HAVE_S390_VX_GCC_SUPPORT)
+fi
 
 test -n "$critic_missing" && AC_MSG_ERROR([
 *** $critic_missing])
-- 
2.3.0

* [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (12 preceding siblings ...)
  2016-02-23  9:23 ` [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too Stefan Liebler
@ 2016-02-23  9:23 ` Stefan Liebler
  2016-02-23 17:57   ` Joseph Myers
  2016-03-01 15:01 ` [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:23 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

According to the latest Unicode standard, a conversion from or to a UTF-xx
encoding has to report an error if the character value lies in the range of
the UTF-16 surrogates (0xd800..0xdfff).
See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
This patch therefore fixes the conversions from utf32 to internal and from
internal to utf8 so that they report such an error.
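
The range check described above can be sketched in isolation as follows. This
is a minimal standalone illustration, not the code of the patch; the function
name sketch_is_valid_ucs4 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: return 1 if CH may be converted, 0 if it must be
   rejected because it is above U+10FFFF or in the surrogate range.  */
static int
sketch_is_valid_ucs4 (uint32_t ch)
{
  if (ch >= 0x110000)
    return 0;
  if (ch >= 0xd800 && ch < 0xe000)
    return 0;
  return 1;
}
```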

Furthermore, the conversion from utf16 to internal does not report an error if
the input stream consists of two low-surrogate values. If a uint16_t value is
in the range 0xd800 .. 0xdfff, the next uint16_t value is checked to see whether
it is in the low-surrogate range (0xdc00 .. 0xdfff). The two uint16_t values
are then interpreted as a high/low surrogate pair. However, nothing tests
whether the first uint16_t value really is in the high-surrogate range
(0xd800 .. 0xdbff), so two consecutive low-surrogate values are treated as a
valid surrogate pair. This patch adds the missing test.
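
The decoding order described above, including the added test on the first
word, could look roughly like this in standalone form. It is a sketch under
the assumption of host-endian input; sketch_decode_utf16 and its return
convention are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for one character.  Returns the number of
   uint16_t units consumed (1 or 2), 0 if the input is incomplete,
   or -1 if the input is illegal.  */
static int
sketch_decode_utf16 (const uint16_t *in, size_t n, uint32_t *out)
{
  if (n == 0)
    return 0;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the unit is the character itself.  */
      *out = u1;
      return 1;
    }

  /* The added test: a low surrogate is no valid first word.  */
  if (u1 >= 0xdc00)
    return -1;

  if (n < 2)
    return 0;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return -1;

  /* Combine the high and low surrogate into one code point.  */
  *out = ((uint32_t) (u1 - 0xd7c0) << 10) + (uint32_t) (u2 - 0xdc00);
  return 2;
}
```

Without the test on u1, the input 0xdc00 0xdc00 would pass the check on the
second word and be combined into a bogus code point.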

ChangeLog:

	* iconvdata/utf-16.c (BODY): Report an error if first word is not a
	valid high surrogate.
	* iconvdata/utf-32.c (BODY): Report an error if the value is in range
	of an utf16 surrogate.
	* iconv/gconv_simple.c (BODY): Likewise.
---
 iconv/gconv_simple.c |  3 ++-
 iconvdata/utf-16.c   | 12 ++++++++++++
 iconvdata/utf-32.c   |  2 +-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index f66bf34..e5284e4 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -892,7 +892,8 @@ ucs4le_internal_loop_single (struct __gconv_step *step,
     if (__glibc_likely (wc < 0x80))					      \
       /* It's an one byte sequence.  */					      \
       *outptr++ = (unsigned char) wc;					      \
-    else if (__glibc_likely (wc <= 0x7fffffff))				      \
+    else if (__glibc_likely (wc <= 0x7fffffff				      \
+			     && (wc < 0xd800 || wc > 0xdfff)))		      \
       {									      \
 	size_t step;							      \
 	unsigned char *start;						      \
diff --git a/iconvdata/utf-16.c b/iconvdata/utf-16.c
index 2d74a13..dbbcd6d 100644
--- a/iconvdata/utf-16.c
+++ b/iconvdata/utf-16.c
@@ -295,6 +295,12 @@ gconv_end (struct __gconv_step *data)
 	  {								      \
 	    uint16_t u2;						      \
 									      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
@@ -329,6 +335,12 @@ gconv_end (struct __gconv_step *data)
 	  }								      \
 	else								      \
 	  {								      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
diff --git a/iconvdata/utf-32.c b/iconvdata/utf-32.c
index 0d6fe30..25f6fc6 100644
--- a/iconvdata/utf-32.c
+++ b/iconvdata/utf-32.c
@@ -239,7 +239,7 @@ gconv_end (struct __gconv_step *data)
     if (swap)								      \
       u1 = bswap_32 (u1);						      \
 									      \
-    if (__glibc_unlikely (u1 >= 0x110000))				      \
+    if (__glibc_unlikely (u1 >= 0x110000 || (u1 >= 0xd800 && u1 < 0xe000)))   \
       {									      \
 	/* This is illegal.  */						      \
 	STANDARD_FROM_LOOP_ERR_HANDLER (4);				      \
-- 
2.3.0

* [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (11 preceding siblings ...)
  2016-02-23  9:22 ` [PATCH 08/14] S390: Optimize utf8-utf16 module Stefan Liebler
@ 2016-02-23  9:23 ` Stefan Liebler
  2016-02-23 12:06   ` Stefan Liebler
  2016-04-21 15:10   ` Stefan Liebler
  2016-02-23  9:23 ` [PATCH 14/14] Fix UTF-16 surrogate handling Stefan Liebler
  2016-03-01 15:01 ` [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
  14 siblings, 2 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23  9:23 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch reworks the existing s390 64bit specific iconv modules so that they
can be used on s390 31bit, too.

The iconvdata parts of sysdeps/s390/s390-64/Makefile are moved to
sysdeps/s390/Makefile so that they apply on 31bit as well, and the modules
themselves are moved from the sysdeps/s390/s390-64 directory to sysdeps/s390.

The iso-8859-1 to/from cp037 module is adjusted to use the brct (branch relative
on count) instruction on 31bit s390 instead of brctg, because brctg is a
zarch instruction and is not available on a 31bit kernel.

The utf modules use zarch instructions, so the directive machinemode
zarch_nohighgprs is added to the inline assemblies to omit the high-gprs flag
in the shared libraries. Otherwise they could not be loaded on a 31bit kernel.
The ifunc resolvers are adjusted to select the etf3eh or vector variants
only if zarch instructions are available (64bit kernel in 31bit compat-mode).
Furthermore, some variable types are changed. For example, an unsigned long
long would occupy a register pair on s390 31bit, but only a single register is
wanted. For variables of type size_t, the register contents have to be
extended from a 32bit to a 64bit value on 31bit, because the inline assemblies
use 64bit values in such cases.
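
The resolver adjustment can be pictured with the following standalone sketch.
It only illustrates the selection order; the capability bits and all names
prefixed with SKETCH_ or sketch_ are placeholders for this example and are not
the HWCAP_S390_* constants or the resolver code of the patch.

```c
/* Placeholder capability bits for this sketch only; real code would use
   the HWCAP_S390_* constants from the system headers.  */
#define SKETCH_HWCAP_ZARCH   (1UL << 0)
#define SKETCH_HWCAP_ETF3EH  (1UL << 1)
#define SKETCH_HWCAP_VX      (1UL << 2)

enum sketch_variant { SKETCH_C, SKETCH_ETF3EH, SKETCH_VX };

/* Hypothetical selection logic: accelerated variants are only eligible
   if zarch instructions are available at all.  */
static enum sketch_variant
sketch_select_variant (unsigned long int hwcap)
{
  if (!(hwcap & SKETCH_HWCAP_ZARCH))
    return SKETCH_C;
  if (hwcap & SKETCH_HWCAP_VX)
    return SKETCH_VX;
  if (hwcap & SKETCH_HWCAP_ETF3EH)
    return SKETCH_ETF3EH;
  return SKETCH_C;
}
```

On a 31bit kernel the zarch bit would be absent, so the plain C variant is
chosen even if the hardware itself offers etf3eh or vector support.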

ChangeLog:

	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
	Move to ...
	* sysdeps/s390/Makefile: ... here.
	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
	(BRANCH_ON_COUNT): New define.
	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
---
 sysdeps/s390/Makefile                        |  83 +++
 sysdeps/s390/iso-8859-1_cp037_z900.c         | 262 +++++++++
 sysdeps/s390/s390-64/Makefile                |  84 ---
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 256 ---------
 sysdeps/s390/s390-64/utf16-utf32-z9.c        | 624 --------------------
 sysdeps/s390/s390-64/utf8-utf16-z9.c         | 806 --------------------------
 sysdeps/s390/s390-64/utf8-utf32-z9.c         | 807 --------------------------
 sysdeps/s390/utf16-utf32-z9.c                | 636 +++++++++++++++++++++
 sysdeps/s390/utf8-utf16-z9.c                 | 818 ++++++++++++++++++++++++++
 sysdeps/s390/utf8-utf32-z9.c                 | 820 +++++++++++++++++++++++++++
 10 files changed, 2619 insertions(+), 2577 deletions(-)
 create mode 100644 sysdeps/s390/Makefile
 create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
 create mode 100644 sysdeps/s390/utf16-utf32-z9.c
 create mode 100644 sysdeps/s390/utf8-utf16-z9.c
 create mode 100644 sysdeps/s390/utf8-utf32-z9.c

diff --git a/sysdeps/s390/Makefile b/sysdeps/s390/Makefile
new file mode 100644
index 0000000..9b17342
--- /dev/null
+++ b/sysdeps/s390/Makefile
@@ -0,0 +1,83 @@
+ifeq ($(subdir),iconvdata)
+ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
+ISO-8859-1_CP037_Z900-map := gconv.map
+
+UTF8_UTF32_Z9-routines := utf8-utf32-z9
+UTF8_UTF32_Z9-map := gconv.map
+
+UTF16_UTF32_Z9-routines := utf16-utf32-z9
+UTF16_UTF32_Z9-map := gconv.map
+
+UTF8_UTF16_Z9-routines := utf8-utf16-z9
+UTF8_UTF16_Z9-map := gconv.map
+
+s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
+
+extra-modules-left += $(s390x-iconv-modules)
+include extra-module.mk
+
+cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
+lib := iconvdata
+include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
+
+extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
+install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
+
+$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
+$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
+	$(do-install-program)
+
+$(objpfx)gconv-modules-s390: gconv-modules
+	${AWK} 'BEGIN { emitted = 0 } \
+	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
+	!emitted { emit_s390_modules(); emitted = 1; print; } \
+	function emit_s390_modules() { \
+	  # Emit header line. \
+	  print "# S/390 hardware accelerated modules"; \
+	  print_val("#", 8); \
+	  print_val("from", 24); \
+	  print_val("to", 24); \
+	  print_val("module", 24); \
+	  printf "cost\n"; \
+	  # Emit s390-specific modules. \
+	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
+	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
+	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
+	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
+	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
+	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
+	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
+	  printf "\n# Default glibc modules\n"; \
+	} \
+	function modul(from, to, file, cost) { \
+	  print_val("module", 8); \
+	  print_val(from, 24); \
+	  print_val(to, 24); \
+	  print_val(file, 24); \
+	  if (cost == 0) cost = 1; \
+	  printf "%d\n", cost; \
+	} \
+	function print_val(val, width) { \
+	  # Emit value followed by tabs. \
+	  printf "%s", val; \
+	  len = length(val); \
+	  if (len < width) { \
+	    len = width - len; \
+	    nr_tabs = len / 8; \
+	    if (len % 8 != 0) nr_tabs++; \
+	  } \
+	  else nr_tabs = 1; \
+	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
+	}' < $< > $@
+
+GCONV_MODULES = gconv-modules-s390
+
+endif
diff --git a/sysdeps/s390/iso-8859-1_cp037_z900.c b/sysdeps/s390/iso-8859-1_cp037_z900.c
new file mode 100644
index 0000000..5c19218
--- /dev/null
+++ b/sysdeps/s390/iso-8859-1_cp037_z900.c
@@ -0,0 +1,262 @@
+/* Conversion between ISO 8859-1 and IBM037.
+
+   This module uses the translate instruction.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+
+// conversion table from ISO-8859-1 to IBM037
+static const unsigned char table_iso8859_1_to_cp037[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
+  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
+  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
+  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
+  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
+  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
+  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
+  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
+  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
+  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
+  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
+  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
+  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
+  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
+  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
+  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
+  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
+  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
+  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
+  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
+  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
+  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
+  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
+  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
+  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
+  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
+  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
+  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
+  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
+  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
+  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
+  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
+  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
+  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
+  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
+  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
+  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
+  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
+  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
+  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
+  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
+  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
+  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
+  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
+  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
+  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
+  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
+  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
+  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
+  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
+  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
+  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
+  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
+  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
+  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
+};
+
+// conversion table from IBM037 to ISO-8859-1
+static const unsigned char table_cp037_iso8859_1[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
+  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
+  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
+  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
+  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
+  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
+  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
+  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
+  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
+  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
+  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
+  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
+  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
+  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
+  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
+  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
+  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
+  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
+  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
+  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
+  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
+  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
+  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
+  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
+  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
+  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
+  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
+  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
+  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
+  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
+  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
+  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
+  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
+  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
+  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
+  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
+  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
+  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
+  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
+  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
+  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
+  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
+  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
+  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
+  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
+  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
+  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
+  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
+  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
+  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
+  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
+  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
+  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
+  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
+  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
+};
+
+/* Definitions used in the body of the `gconv' function.  */
+#define CHARSET_NAME		"ISO-8859-1//"
+#define FROM_LOOP		iso8859_1_to_cp037_z900
+#define TO_LOOP			cp037_to_iso8859_1_z900
+#define DEFINE_INIT		1
+#define DEFINE_FINI		1
+#define MIN_NEEDED_FROM		1
+#define MIN_NEEDED_TO		1
+
+# if defined __s390x__
+#  define BRANCH_ON_COUNT(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
+# else
+#  define BRANCH_ON_COUNT(REG,LBL) "brct %" #REG "," #LBL "\n\t"
+# endif
+
+#define TR_LOOP(TABLE)							\
+  {									\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
+									\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "la %[R_IN],256(%[R_IN])\n\t"		\
+			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     BRANCH_ON_COUNT ([R_LI], 0b)		\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
+									\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
+									\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
+  }
+
+
+/* First define the conversion function from ISO 8859-1 to CP037.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
+
+#include <iconv/loop.c>
+
+
+/* Next, define the conversion function from CP037 to ISO 8859-1.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define LOOPFCT			TO_LOOP
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
+
+#include <iconv/loop.c>
+
+
+/* Now define the toplevel functions.  */
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index d1ee59d..0a50514 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -9,87 +9,3 @@ CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
 CFLAGS-dl-load.c += -Wno-unused
 CFLAGS-dl-reloc.c += -Wno-unused
 endif
-
-ifeq ($(subdir),iconvdata)
-ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
-ISO-8859-1_CP037_Z900-map := gconv.map
-
-UTF8_UTF32_Z9-routines := utf8-utf32-z9
-UTF8_UTF32_Z9-map := gconv.map
-
-UTF16_UTF32_Z9-routines := utf16-utf32-z9
-UTF16_UTF32_Z9-map := gconv.map
-
-UTF8_UTF16_Z9-routines := utf8-utf16-z9
-UTF8_UTF16_Z9-map := gconv.map
-
-s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
-
-extra-modules-left += $(s390x-iconv-modules)
-include extra-module.mk
-
-cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
-lib := iconvdata
-include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
-
-extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
-install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
-
-$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
-$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
-	$(do-install-program)
-
-$(objpfx)gconv-modules-s390: gconv-modules
-	${AWK} 'BEGIN { emitted = 0 } \
-	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
-	!emitted { emit_s390_modules(); emitted = 1; print; } \
-	function emit_s390_modules() { \
-	  # Emit header line. \
-	  print "# S/390 hardware accelerated modules"; \
-	  print_val("#", 8); \
-	  print_val("from", 24); \
-	  print_val("to", 24); \
-	  print_val("module", 24); \
-	  printf "cost\n"; \
-	  # Emit s390-specific modules. \
-	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
-	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
-	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
-	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
-	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
-	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
-	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
-	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
-	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
-	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
-	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
-	  printf "\n# Default glibc modules\n"; \
-	} \
-	function modul(from, to, file, cost) { \
-	  print_val("module", 8); \
-	  print_val(from, 24); \
-	  print_val(to, 24); \
-	  print_val(file, 24); \
-	  if (cost == 0) cost = 1; \
-	  printf "%d\n", cost; \
-	} \
-	function print_val(val, width) { \
-	  # Emit value followed by tabs. \
-	  printf "%s", val; \
-	  len = length(val); \
-	  if (len < width) { \
-	    len = width - len; \
-	    nr_tabs = len / 8; \
-	    if (len % 8 != 0) nr_tabs++; \
-	  } \
-	  else nr_tabs = 1; \
-	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
-	}' < $< > $@
-
-GCONV_MODULES = gconv-modules-s390
-
-endif
diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
deleted file mode 100644
index 4d79bbf..0000000
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Conversion between ISO 8859-1 and IBM037.
-
-   This module uses the translate instruction.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-
-// conversion table from ISO-8859-1 to IBM037
-static const unsigned char table_iso8859_1_to_cp037[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
-  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
-  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
-  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
-  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
-  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
-  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
-  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
-  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
-  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
-  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
-  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
-  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
-  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
-  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
-  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
-  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
-  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
-  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
-  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
-  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
-  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
-  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
-  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
-  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
-  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
-  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
-  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
-  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
-  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
-  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
-  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
-  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
-  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
-  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
-  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
-  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
-  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
-  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
-  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
-  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
-  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
-  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
-  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
-  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
-  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
-  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
-  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
-  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
-  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
-  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
-  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
-  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
-  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
-  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
-};
-
-// conversion table from IBM037 to ISO-8859-1
-static const unsigned char table_cp037_iso8859_1[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
-  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
-  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
-  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
-  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
-  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
-  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
-  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
-  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
-  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
-  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
-  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
-  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
-  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
-  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
-  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
-  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
-  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
-  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
-  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
-  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
-  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
-  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
-  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
-  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
-  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
-  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
-  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
-  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
-  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
-  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
-  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
-  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
-  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
-  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
-  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
-  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
-  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
-  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
-  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
-  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
-  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
-  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
-  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
-  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
-  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
-  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
-  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
-  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
-  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
-  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
-  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
-  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
-  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
-  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
-};
-
-/* Definitions used in the body of the `gconv' function.  */
-#define CHARSET_NAME		"ISO-8859-1//"
-#define FROM_LOOP		iso8859_1_to_cp037_z900
-#define TO_LOOP			cp037_to_iso8859_1_z900
-#define DEFINE_INIT		1
-#define DEFINE_FINI		1
-#define MIN_NEEDED_FROM		1
-#define MIN_NEEDED_TO		1
-
-#define TR_LOOP(TABLE)							\
-  {									\
-    size_t length = (inend - inptr < outend - outptr			\
-		     ? inend - inptr : outend - outptr);		\
-									\
-    /* Process in 256 byte blocks.  */					\
-    if (__builtin_expect (length >= 256, 0))				\
-      {									\
-	size_t blocks = length / 256;					\
-	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
-			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
-			     "la %[R_IN],256(%[R_IN])\n\t"		\
-			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
-			     "brctg %[R_LI],0b\n\t"			\
-			     : /* outputs */ [R_IN] "+a" (inptr)	\
-			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
-			     : /* inputs */ [R_TBL] "a" (TABLE)		\
-			     : /* clobber list */ "memory"		\
-			     );						\
-	length = length % 256;						\
-      }									\
-									\
-    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
-    if (length >= 8)							\
-      {									\
-	size_t blocks = length / 8;					\
-	for (int i = 0; i < blocks; i++)				\
-	  {								\
-	    outptr[0] = TABLE[inptr[0]];				\
-	    outptr[1] = TABLE[inptr[1]];				\
-	    outptr[2] = TABLE[inptr[2]];				\
-	    outptr[3] = TABLE[inptr[3]];				\
-	    outptr[4] = TABLE[inptr[4]];				\
-	    outptr[5] = TABLE[inptr[5]];				\
-	    outptr[6] = TABLE[inptr[6]];				\
-	    outptr[7] = TABLE[inptr[7]];				\
-	    inptr += 8;							\
-	    outptr += 8;						\
-	  }								\
-	length = length % 8;						\
-      }									\
-									\
-    /* Process remaining 0...7 bytes.  */				\
-    switch (length)							\
-      {									\
-      case 7: outptr[6] = TABLE[inptr[6]];				\
-      case 6: outptr[5] = TABLE[inptr[5]];				\
-      case 5: outptr[4] = TABLE[inptr[4]];				\
-      case 4: outptr[3] = TABLE[inptr[3]];				\
-      case 3: outptr[2] = TABLE[inptr[2]];				\
-      case 2: outptr[1] = TABLE[inptr[1]];				\
-      case 1: outptr[0] = TABLE[inptr[0]];				\
-      case 0: break;							\
-      }									\
-    inptr += length;							\
-    outptr += length;							\
-  }
-
-
-/* First define the conversion function from ISO 8859-1 to CP037.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
-
-#include <iconv/loop.c>
-
-
-/* Next, define the conversion function from CP037 to ISO 8859-1.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
-#define BODY			TR_LOOP (table_cp037_iso8859_1);
-
-#include <iconv/loop.c>
-
-
-/* Now define the toplevel functions.  */
-#include <iconv/skeleton.c>
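
Note on the TR_LOOP macro in the file above: it converts
min (inend - inptr, outend - outptr) bytes by looking each input byte up
in a 256-entry table. The mvc/tr pair does this 256 bytes at a time, and
the C code handles the remainder. A minimal portable sketch of the same
lookup follows. The names translate_bytes and make_demo_table are
hypothetical and not part of the patch, and the demo table is an identity
map with only four entries copied from table_iso8859_1_to_cp037.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the patch): translate
   min (inlen, outlen) bytes through a 256-entry table and return the
   number of bytes processed.  This mirrors what TR_LOOP computes,
   without the 256-byte "tr" blocks or the 8-byte unrolling.  */
static size_t
translate_bytes (const unsigned char table[256],
                 const unsigned char *in, size_t inlen,
                 unsigned char *out, size_t outlen)
{
  assert (table != NULL && in != NULL && out != NULL);
  size_t length = inlen < outlen ? inlen : outlen;
  for (size_t i = 0; i < length; i++)
    out[i] = table[in[i]];
  return length;
}

/* Demo table (assumption for illustration): identity mapping, except
   for four ISO-8859-1 -> IBM037 entries taken from
   table_iso8859_1_to_cp037 in the patch.  */
static void
make_demo_table (unsigned char table[256])
{
  for (int i = 0; i < 256; i++)
    table[i] = (unsigned char) i;
  table[0x20] = 0x40;	/* space */
  table[0x30] = 0xF0;	/* '0' */
  table[0x41] = 0xC1;	/* 'A' */
  table[0x61] = 0x81;	/* 'a' */
}
```

A caller would advance inptr and outptr by the returned count, as
TR_LOOP does with "inptr += length; outptr += length;".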
diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
deleted file mode 100644
index 4c2c548..0000000
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ /dev/null
@@ -1,624 +0,0 @@
-/* Conversion between UTF-16 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM_UTF32               0x0000feffu
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16               0xfeff
-
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		2
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf16_loop
-#define TO_LOOP			__to_utf16_loop
-#define FROM_DIRECTION		(dir == from_utf16)
-#define ONE_DIRECTION           0
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf16,
-  from_utf16
-};
-
-struct utf16_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf16_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf16;
-    }
-  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
-	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf16;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf16)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-	  /* Emit the UTF-16 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 2 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-	  /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
-
-/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_FROM_C							\
-  {									\
-    uint16_t u1 = get16 (inptr);					\
-									\
-    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
-      {									\
-	/* No surrogate.  */						\
-	put32 (outptr, u1);						\
-	inptr += 2;							\
-      }									\
-    else								\
-      {									\
-	/* An isolated low-surrogate was found.  This has to be         \
-	   considered ill-formed.  */					\
-	if (__glibc_unlikely (u1 >= 0xdc00))				\
-	  {								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	/* It's a surrogate character.  At least the first word says	\
-	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    /* We don't have enough input for another complete input	\
-	       character.  */						\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	uint16_t u2 = get16 (inptr);					\
-	if (__builtin_expect (u2 < 0xdc00, 0)				\
-	    || __builtin_expect (u2 > 0xdfff, 0))			\
-	  {								\
-	    /* This is no valid second word for a surrogate.  */	\
-	    inptr -= 2;							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-									\
-	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
-	inptr += 2;							\
-      }									\
-    outptr += 4;							\
-  }
-
-#define BODY_FROM_VX							\
-  {									\
-    size_t inlen = inend - inptr;					\
-    size_t outlen = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "larl %[R_TMP],9f\n\t"				\
-		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
-		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
-		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  /* Check for surrogate chars.  */			\
-		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t"						\
-		  /* Enlarge to UTF-32.  */				\
-		  "vuplhh %%v17,%%v16\n\t"				\
-		  "la %[R_IN],16(%[R_IN])\n\t"				\
-		  "vupllh %%v18,%%v16\n\t"				\
-		  "aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
-		  "aghi %[R_OUTLEN],-32\n\t"				\
-		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],16,2f\n\t"				\
-		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
-		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  /* At least on uint16_t is in range of surrogates.	\
-		     Store the preceding chars.  */			\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "vuplhh %%v17,%%v16\n\t"				\
-		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 12f\n\t"						\
-		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "vupllh %%v18,%%v16\n\t"				\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11: \n\t" /* Update pointers.  */			\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
-		  "srl %[R_TMP2],1\n\t"					\
-		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
-		  "aghi %[R_OUTLEN],-4\n\t"				\
-		  "j 16f\n\t"						\
-		  /* Handle remaining bytes.  */			\
-		  "2:\n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "clgfi %[R_INLEN],1\n\t"				\
-		  "je 97f\n\t" /* Only one byte available.  */		\
-		  "jl 99f\n\t" /* End if no bytes available.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle remaining uint16_t values.  */		\
-		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "slgfi %[R_OUTLEN],4\n\t"				\
-		  "jl 96f \n\t"						\
-		  "clfi %[R_TMP],0xd800\n\t"				\
-		  "jhe 15f\n\t"						\
-		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-2\n\t"				\
-		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],13b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Handle UTF-16 surrogate pair.  */			\
-		  "15: clfi %[R_TMP],0xdfff\n\t"			\
-		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
-		  "16: clfi %[R_TMP],0xdc00\n\t"			\
-		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
-		  "slgfi %[R_INLEN],4\n\t"				\
-		  "jl 97f\n\t" /* Big enough input?  */			\
-		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "slfi %[R_TMP],0xd7c0\n\t"				\
-		  "sll %[R_TMP],10\n\t"					\
-		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
-		  "nilf %[R_TMP3],0xfc00\n\t"				\
-		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "jne 98f\n\t"						\
-		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
-		  "la %[R_IN],4(%[R_IN])\n\t"				\
-		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
-		  "aghi %[R_TMP2],-2\n\t"				\
-		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  "96:\n\t" /* Return full output.  */			\
-		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
-		  "j 99f\n\t"						\
-		  "97:\n\t" /* Return incomplete input.  */		\
-		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
-		  "j 99f\n\t"						\
-		  "98:\n\t" /* Return Illegal character.  */		\
-		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "99:\n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
-  }
-
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__from_utf16_loop_c
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf16_loop_c)
-__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
-__from_utf16_loop;
-
-static void *
-__from_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf16_loop_vx;
-  else
-    return __from_utf16_loop_c;
-}
-
-strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
-#else
-# define LOOPFCT		FROM_LOOP
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-#endif
-
-/* Conversion from UTF-32 internal/BE to UTF-16.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_TO_C							\
-  {									\
-    uint32_t c = get32 (inptr);						\
-									\
-    if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
-      {									\
-	/* Two UTF-16 chars.  */					\
-	put16 (outptr, c);						\
-      }									\
-    else if (__builtin_expect (c >= 0x10000, 1)				\
-	     && __builtin_expect (c <= 0x10ffff, 1))			\
-      {									\
-	/* Four UTF-16 chars.  */					\
-	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
-	uint16_t out;							\
-									\
-	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	out = 0xd800;							\
-	out |= (zabcd & 0xff) << 6;					\
-	out |= (c >> 10) & 0x3f;					\
-	put16 (outptr, out);						\
-	outptr += 2;							\
-									\
-	out = 0xdc00;							\
-	out |= c & 0x3ff;						\
-	put16 (outptr, out);						\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
-      }									\
-    outptr += 2;							\
-    inptr += 4;								\
-  }
-
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "larl %[R_TMP],9f\n\t"				\
-		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars			\
-		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP2],0\n\t"				\
-		  /* Shorten to UTF-16.  */				\
-		  "vpkf %%v18,%%v16,%%v17\n\t"				\
-		  /* Check for surrogate chars.  */			\
-		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t"						\
-		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
-		  "jno 11f\n\t"						\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "vst %%v18,0(%[R_OUT])\n\t"				\
-		  "la %[R_IN],32(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-32\n\t"				\
-		  "aghi %[R_OUTLEN],-16\n\t"				\
-		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],32,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
-		     and check for ch >= 0x10000. (v30, v31)  */	\
-		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
-		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
-		  /* At least on UTF32 char is in range of surrogates.	\
-		     Store the preceding characters.  */		\
-		  "11: ahi %[R_TMP2],16\n\t"				\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
-		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 20f\n\t"						\
-		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "jo 0b\n\t" /* Try vector implementation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* Generate loop-function with software routine.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_TO_VX
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf16_loop_c)
-__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
-__to_utf16_loop;
-
-static void *
-__to_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf16_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
-    return __to_utf16_loop_c;
-}
-
-strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
-
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
deleted file mode 100644
index 76625d0..0000000
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ /dev/null
@@ -1,806 +0,0 @@
-/* Conversion between UTF-8 and UTF-16 BE.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		2
-#define MAX_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-16.  */
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
-		  "vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  /* Enlarge to UTF-16.  */				\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "la %[R_IN],16(%[R_IN])\n\t"				\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
-		  "aghi %[R_OUTLEN],-32\n\t"				\
-		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  "10:\n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
-		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
-						     index to store. */ \
-		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "ahi %[R_TMP2],-1\n\t"				\
-		  "jl 20f\n\t"						\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11:\n\t" /* Update pointers.  */			\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
-		  "jo 0b\n\t" /* Try vector implementation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-
-/* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint16_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0	\
-	       or 0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	if (cnt == 4)							\
-	  {								\
-	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
-	       low) are needed.  */					\
-	    uint16_t zabcd, high, low;					\
-									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			\
-	      {								\
-		/* Overflow in the output buffer.  */			\
-		result = __GCONV_FULL_OUTPUT;				\
-		break;							\
-	      }								\
-									\
-	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		if ((inptr[i] & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  goto errout;						\
-	      }								\
-									\
-	    /* See Principles of Operations cu12.  */			\
-	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-		     ((inptr[1] & 0x30) >> 4)) - 1;			\
-									\
-	    /* z-bit must be zero after subtracting 1.  */		\
-	    if (zabcd & 0x10)						\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
-									\
-	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;                         /* abcd bits */	\
-	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
-	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
-									\
-	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
-	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
-	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
-	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
-									\
-	    put16 (outptr, high);					\
-	    outptr += 2;						\
-	    put16 (outptr, low);					\
-	    outptr += 2;						\
-	    inptr += 4;							\
-	    continue;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Read the possible remaining bytes.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		uint16_t byte = inptr[i];				\
-									\
-		if ((byte & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  break;						\
-									\
-		ch <<= 6;						\
-		ch |= byte & 0x3f;					\
-	      }								\
-									\
-	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
-	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
-	       have been represented with fewer than cnt bytes.  */	\
-	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
-		/* Do not accept UTF-16 surrogates.  */			\
-		|| (ch >= 0xd800 && ch <= 0xdfff))			\
-	      {								\
-		/* This is an illegal encoding.  */			\
-		goto errout;						\
-	      }								\
-									\
-	    inptr += cnt;						\
-	  }								\
-      }									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint16_t *) outptr) = ch;					\
-    outptr += sizeof (uint16_t);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-/* Conversion from UTF-16 to UTF-8.  */
-
-/* The software routine is based on the functionality of the S/390
-   hardware instruction (cu21) as described in the Principles of
-   Operation.  */
-#define BODY_TO_C							\
-  {									\
-    uint16_t c = get16 (inptr);						\
-									\
-    if (__glibc_likely (c <= 0x007f))					\
-      {									\
-	/* Single byte UTF-8 char.  */					\
-	*outptr = c & 0xff;						\
-	outptr++;							\
-      }									\
-    else if (c >= 0x0080 && c <= 0x07ff)				\
-      {									\
-	/* Two byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 2 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	outptr[0] = 0xc0;						\
-	outptr[0] |= c >> 6;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= c & 0x3f;						\
-									\
-	outptr += 2;							\
-      }									\
-    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
-      {									\
-	/* Three byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 3 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	outptr[0] = 0xe0;						\
-	outptr[0] |= c >> 12;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (c >> 6) & 0x3f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= c & 0x3f;						\
-									\
-	outptr += 3;							\
-      }									\
-    else if (c >= 0xd800 && c <= 0xdbff)				\
-      {									\
-	/* Four byte UTF-8 char.  */					\
-	uint16_t low, uvwxy;						\
-									\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	low = get16 (inptr);						\
-									\
-	if ((low & 0xfc00) != 0xdc00)					\
-	  {								\
-	    inptr -= 2;							\
-	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	uvwxy = ((c >> 6) & 0xf) + 1;					\
-	outptr[0] = 0xf0;						\
-	outptr[0] |= uvwxy >> 2;					\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (uvwxy << 4) & 0x30;				\
-	outptr[1] |= (c >> 2) & 0x0f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= (c & 0x03) << 4;					\
-	outptr[2] |= (low >> 6) & 0x0f;					\
-									\
-	outptr[3] = 0x80;						\
-	outptr[3] |= low & 0x3f;					\
-									\
-	outptr += 4;							\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-      }									\
-    inptr += 2;								\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    size_t inlen  = inend - inptr;					\
-    size_t outlen  = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for values <= 0x7f.  */		\
-		  "larl %[R_TMP],9f\n\t"				\
-		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
-		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP2],0\n\t"				\
-		  /* Check for > 1byte UTF-8 chars.  */			\
-		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
-		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  /* Shorten to UTF-8.  */				\
-		  "vpkh %%v18,%%v16,%%v17\n\t"				\
-		  "la %[R_IN],32(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-32\n\t"				\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "vst %%v18,0(%[R_OUT])\n\t"				\
-		  "aghi %[R_OUTLEN],-16\n\t"				\
-		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],32,2f\n\t"				\
-		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
-		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
-		  "10:\n\t"						\
-		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
-		  /* Shorten to UTF-8.  */				\
-		  "vpkh %%v18,%%v16,%%v17\n\t"				\
-		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
-		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 13f\n\t"						\
-		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  "13:\n\t"						\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "lghi %[R_TMP2],16\n\t"				\
-		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
-		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-2\n\t"				\
-		  "j 22f\n\t"						\
-		  /* Handle remaining bytes.  */			\
-		  "2:\n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "clgfi %[R_INLEN],1\n\t"				\
-		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
-		  "jle 99f\n\t" /* End if less than two bytes.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle multibyte utf8-char. */			\
-		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "aghi %[R_INLEN],-2\n\t"				\
-		  /* Test if ch is 1-byte UTF-8 char.  */		\
-		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
-		  /* Handle 1-byte UTF-8 char.  */			\
-		  "31: slgfi %[R_OUTLEN],1\n\t"				\
-		  "jl 90f \n\t"						\
-		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],20b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Test if ch is 2-byte UTF-8 char.  */		\
-		  "22: clfi %[R_TMP],0x7ff\n\t"				\
-		  "jh 23f\n\t"						\
-		  /* Handle 2-byte UTF-8 char.  */			\
-		  "32: slgfi %[R_OUTLEN],2\n\t"				\
-		  "jl 90f \n\t"						\
-		  "llill %[R_TMP3],0xc080\n\t"				\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
-		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
-		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],20b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Test if ch is 3-byte UTF-8 char.  */		\
-		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
-		  "jh 24f\n\t"						\
-		  /* Handle 3-byte UTF-8 char.  */			\
-		  "33: slgfi %[R_OUTLEN],3\n\t"				\
-		  "jl 90f \n\t"						\
-		  "llilf %[R_TMP3],0xe08080\n\t"			\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
-		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
-		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],20b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Test if ch is 4-byte UTF-8 char.  */		\
-		  "24: clfi %[R_TMP],0xdfff\n\t"			\
-		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
-		  "clfi %[R_TMP],0xdbff\n\t"				\
-		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
-				  without a preceding high surrogate.  */ \
-		  /* Handle 4-byte UTF-8 char.  */			\
-		  "34: slgfi %[R_OUTLEN],4\n\t"				\
-		  "jl 90f \n\t"						\
-		  "slgfi %[R_INLEN],2\n\t"				\
-		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
-		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
-		  "llilf %[R_TMP3],0xf0808080\n\t"			\
-		  "aghi %[R_TMP],0x40\n\t"				\
-		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
-		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
-		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
-		  "nilf %[R_TMP],0xfc00\n\t"				\
-		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
-		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "la %[R_IN],4(%[R_IN])\n\t"				\
-		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
-		  "aghi %[R_TMP2],-2\n\t"				\
-		  "jh 20b\n\t"						\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
-		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
-		  "99:\n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__to_utf8_loop_c
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY                   BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-#else
-# define LOOPFCT		TO_LOOP
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
deleted file mode 100644
index e89dc70..0000000
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ /dev/null
@@ -1,807 +0,0 @@
-/* Conversion between UTF-8 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		6
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM			0x0000feffu
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
-
-#define STORE_REST_COMMON						      \
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    assert (ch != 0xc0 && ch != 0xc1);					      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
-  }
-
-#define UNPACK_BYTES_COMMON \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
-
-#define CLEAR_STATE_COMMON \
-  state->__count = 0
-
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
-
-
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint32_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
-	       0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	/* Read the possible remaining bytes.  */			\
-	for (i = 1; i < cnt; ++i)					\
-	  {								\
-	    uint32_t byte = inptr[i];					\
-									\
-	    if ((byte & 0xc0) != 0x80)					\
-	      /* This is an illegal encoding.  */			\
-	      break;							\
-									\
-	    ch <<= 6;							\
-	    ch |= byte & 0x3f;						\
-	  }								\
-									\
-	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
-	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
-	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
-	    /* Do not accept UTF-16 surrogates.  */			\
-	    || (ch >= 0xd800 && ch <= 0xdfff)				\
-	    || (ch > 0x10ffff))						\
-	  {								\
-	    /* This is an illegal encoding.  */				\
-	    goto errout;						\
-	  }								\
-									\
-	inptr += cnt;							\
-      }									\
-									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint32_t *) outptr) = ch;					\
-    outptr += sizeof (uint32_t);					\
-  }
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
-		  "vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  /* Enlarge to UCS4.  */				\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "la %[R_IN],16(%[R_IN])\n\t"				\
-		  "vuplhh %%v20,%%v18\n\t"				\
-		  "aghi %[R_INLEN],-16\n\t"				\
-		  "vupllh %%v21,%%v18\n\t"				\
-		  "aghi %[R_OUTLEN],-64\n\t"				\
-		  "vuplhh %%v22,%%v19\n\t"				\
-		  "vupllh %%v23,%%v19\n\t"				\
-		  /* Store 64 bytes to buf_out.  */			\
-		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
-		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  "10:\n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
-		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
-						     index to store. */ \
-		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "ahi %[R_TMP2],-1\n\t"				\
-		  "jl 20f\n\t"						\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "vuplhh %%v20,%%v18\n\t"				\
-		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllh %%v21,%%v18\n\t"				\
-		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "vuplhh %%v22,%%v19\n\t"				\
-		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllh %%v23,%%v19\n\t"				\
-		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
-		  "11:\n\t"						\
-		  /* Update pointers.  */				\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
-		    ASM_CLOBBER_VR ("v31")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-
-# define LOOP_NEED_FLAGS
-
-# define STORE_REST		STORE_REST_COMMON
-# define UNPACK_BYTES		UNPACK_BYTES_COMMON
-# define CLEAR_STATE		CLEAR_STATE_COMMON
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
-/* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY_TO_C						\
-  {								\
-    uint32_t wc = *((const uint32_t *) inptr);			\
-								\
-    if (__glibc_likely (wc <= 0x7f))				\
-      {								\
-	/* Single UTF-8 char.  */				\
-	*outptr = (uint8_t)wc;					\
-	outptr++;						\
-      }								\
-    else if (wc <= 0x7ff)					\
-      {								\
-	/* Two UTF-8 chars.  */					\
-	if (__glibc_unlikely (outptr + 2 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-								\
-	outptr[0] = 0xc0;					\
-	outptr[0] |= wc >> 6;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= wc & 0x3f;					\
-								\
-	outptr += 2;						\
-      }								\
-    else if (wc <= 0xffff)					\
-      {								\
-	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
-	  {							\
-	    /* Do not accept UTF-16 surrogates.   */		\
-	    result = __GCONV_ILLEGAL_INPUT;			\
-	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	  }							\
-	outptr[0] = 0xe0;					\
-	outptr[0] |= wc >> 12;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= (wc >> 6) & 0x3f;				\
-								\
-	outptr[2] = 0x80;					\
-	outptr[2] |= wc & 0x3f;					\
-								\
-	outptr += 3;						\
-      }								\
-      else if (wc <= 0x10ffff)					\
-	{							\
-	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))		\
-	    {							\
-	      /* Overflow in the output buffer.  */		\
-	      result = __GCONV_FULL_OUTPUT;			\
-	      break;						\
-	    }							\
-	  outptr[0] = 0xf0;					\
-	  outptr[0] |= wc >> 18;				\
-								\
-	  outptr[1] = 0x80;					\
-	  outptr[1] |= (wc >> 12) & 0x3f;			\
-								\
-	  outptr[2] = 0x80;					\
-	  outptr[2] |= (wc >> 6) & 0x3f;			\
-								\
-	  outptr[3] = 0x80;					\
-	  outptr[3] |= wc & 0x3f;				\
-								\
-	  outptr += 4;						\
-	}							\
-      else							\
-	{							\
-	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	}							\
-    inptr += 4;							\
-  }
-
-#define HW_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
-		  "vzero %%v21\n\t"					\
-		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
-		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
-		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP],0\n\t"					\
-		  /* Shorten to byte values.  */			\
-		  "vpkf %%v23,%%v16,%%v17\n\t"				\
-		  "vpkf %%v24,%%v18,%%v19\n\t"				\
-		  "vpkh %%v23,%%v23,%%v24\n\t"				\
-		  /* Checking for values > 0x7f.  */			\
-		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
-		  "jno 10f\n\t"						\
-		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
-		  "jno 11f\n\t"						\
-		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
-		  "jno 12f\n\t"						\
-		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
-		  "jno 13f\n\t"						\
-		  /* Store 16bytes to outptr.  */			\
-		  "vst %%v23,0(%[R_OUT])\n\t"				\
-		  "aghi %[R_INLEN],-64\n\t"				\
-		  "aghi %[R_OUTLEN],-16\n\t"				\
-		  "la %[R_IN],64(%[R_IN])\n\t"				\
-		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],64,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "srlg %[R_I],%[R_I],2\n\t"				\
-		  "agr %[R_I],%[R_TMP]\n\t"				\
-		  "je 20f\n\t"						\
-		  /* Store characters before invalid one...  */		\
-		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
-		  /* ... and update pointers.  */			\
-		  "aghi %[R_I],1\n\t"					\
-		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
-		  "sllg %[R_I],%[R_I],2\n\t"				\
-		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_I]\n\t"				\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
-		    ASM_CLOBBER_VR ("v24")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_c
-#define BODY			BODY_TO_C
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY			BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
new file mode 100644
index 0000000..ecf06bd
--- /dev/null
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -0,0 +1,636 @@
+/* Conversion between UTF-16 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM_UTF32               0x0000feffu
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16               0xfeff
+
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		2
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
+#define FROM_DIRECTION		(dir == from_utf16)
+#define ONE_DIRECTION           0
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf16,
+  from_utf16
+};
+
+struct utf16_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf16_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf16;
+    }
+  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
+	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf16;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf16)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
+/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
+
+/* The software routine is copied from utf-16.c (minus byte
+   swapping).  */
+#define BODY_FROM_C							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
+      {									\
+	/* No surrogate.  */						\
+	put32 (outptr, u1);						\
+	inptr += 2;							\
+      }									\
+    else								\
+      {									\
+	/* An isolated low-surrogate was found.  This has to be         \
+	   considered ill-formed.  */					\
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
+	  {								\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	/* It's a surrogate character.  At least the first word says	\
+	   it is.  */							\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    /* We don't have enough input for another complete input	\
+	       character.  */						\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	uint16_t u2 = get16 (inptr);					\
+	if (__builtin_expect (u2 < 0xdc00, 0)				\
+	    || __builtin_expect (u2 > 0xdfff, 0))			\
+	  {								\
+	    /* This is no valid second word for a surrogate.  */	\
+	    inptr -= 2;							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+									\
+	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
+	inptr += 2;							\
+      }									\
+    outptr += 4;							\
+  }
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t"						\
+		  /* Enlarge to UTF-32.  */				\
+		  "vuplhh %%v17,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vupllh %%v18,%%v16\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
+		  "aghi %[R_OUTLEN],-32\n\t"				\
+		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "vuplhh %%v17,%%v16\n\t"				\
+		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 12f\n\t"						\
+		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "vupllh %%v18,%%v16\n\t"				\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
+		  "srl %[R_TMP2],1\n\t"					\
+		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_OUTLEN],-4\n\t"				\
+		  "j 16f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2:\n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "clgfi %[R_INLEN],1\n\t"				\
+		  "je 97f\n\t" /* Only one byte available.  */		\
+		  "jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 96f \n\t"						\
+		  "clfi %[R_TMP],0xd800\n\t"				\
+		  "jhe 15f\n\t"						\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],13b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "slgfi %[R_INLEN],4\n\t"				\
+		  "jl 97f\n\t" /* Big enough input?  */			\
+		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "slfi %[R_TMP],0xd7c0\n\t"				\
+		  "sll %[R_TMP],10\n\t"					\
+		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "nilf %[R_TMP3],0xfc00\n\t"				\
+		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "jne 98f\n\t"						\
+		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "aghi %[R_TMP2],-2\n\t"				\
+		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  "96:\n\t" /* Return full output.  */			\
+		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "97:\n\t" /* Return incomplete input.  */		\
+		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
+
+/* Conversion from UTF-32 internal/BE to UTF-16.  */
+
+/* The software routine is copied from utf-16.c (minus byte
+   swapping).  */
+#define BODY_TO_C							\
+  {									\
+    uint32_t c = get32 (inptr);						\
+									\
+    if (__builtin_expect (c <= 0xd7ff, 1)				\
+	|| (c >=0xdc00 && c <= 0xffff))					\
+      {									\
+	/* One UTF-16 code unit (two bytes).  */			\
+	put16 (outptr, c);						\
+      }									\
+    else if (__builtin_expect (c >= 0x10000, 1)				\
+	     && __builtin_expect (c <= 0x10ffff, 1))			\
+      {									\
+	/* A surrogate pair (four bytes).  */				\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t out;							\
+									\
+	/* Generate a surrogate character.  */				\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	out = 0xd800;							\
+	out |= (zabcd & 0xff) << 6;					\
+	out |= (c >> 10) & 0x3f;					\
+	put16 (outptr, out);						\
+	outptr += 2;							\
+									\
+	out = 0xdc00;							\
+	out |= c & 0x3ff;						\
+	put16 (outptr, out);						\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    outptr += 2;							\
+    inptr += 4;								\
+  }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-32 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "vpkf %%v18,%%v16,%%v17\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t"						\
+		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
+		  "jno 11f\n\t"						\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "vst %%v18,0(%[R_OUT])\n\t"				\
+		  "la %[R_IN],32(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-32\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],32,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
+		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 20f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf8-utf16-z9.c b/sysdeps/s390/utf8-utf16-z9.c
new file mode 100644
index 0000000..29a0bf9
--- /dev/null
+++ b/sysdeps/s390/utf8-utf16-z9.c
@@ -0,0 +1,818 @@
+/* Conversion between UTF-8 and UTF-16.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define MAX_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
+		  "vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
+		  "aghi %[R_OUTLEN],-32\n\t"				\
+		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
+		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
+						     index to store. */ \
+		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "ahi %[R_TMP2],-1\n\t"				\
+		  "jl 20f\n\t"						\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11:\n\t" /* Update pointers.  */			\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+
+/* The software implementation is based on the code in gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint16_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0	\
+	       or 0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search for the end of this ill-formed UTF-8 character.	\
+	       This is the next byte with (x & 0xc0) != 0x80.  */	\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that, check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	if (cnt == 4)							\
+	  {								\
+	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
+	       low) are needed.  */					\
+	    uint16_t zabcd, high, low;					\
+									\
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
+	      {								\
+		/* Overflow in the output buffer.  */			\
+		result = __GCONV_FULL_OUTPUT;				\
+		break;							\
+	      }								\
+									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
+	    /* See Principles of Operation cu12.  */			\
+	    zabcd = (((inptr[0] & 0x7) << 2) |				\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
+									\
+	    /* z-bit must be zero after subtracting 1.  */		\
+	    if (zabcd & 0x10)						\
+	      STANDARD_FROM_LOOP_ERR_HANDLER (4);			\
+									\
+	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
+	    high |= zabcd << 6;                         /* abcd bits */	\
+	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
+	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
+									\
+	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
+	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
+	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
+	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
+									\
+	    put16 (outptr, high);					\
+	    outptr += 2;						\
+	    put16 (outptr, low);					\
+	    outptr += 2;						\
+	    inptr += 4;							\
+	    continue;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Read the possible remaining bytes.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		uint16_t byte = inptr[i];				\
+									\
+		if ((byte & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  break;						\
+									\
+		ch <<= 6;						\
+		ch |= byte & 0x3f;					\
+	      }								\
+									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
+	  }								\
+      }									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint16_t *) outptr) = ch;					\
+    outptr += sizeof (uint16_t);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+/* Conversion from UTF-16 to UTF-8.  */
+
+/* The software routine is based on the functionality of the S/390
+   hardware instruction (cu21) as described in the Principles of
+   Operation.  */
+#define BODY_TO_C							\
+  {									\
+    uint16_t c = get16 (inptr);						\
+									\
+    if (__glibc_likely (c <= 0x007f))					\
+      {									\
+	/* Single byte UTF-8 char.  */					\
+	*outptr = c & 0xff;						\
+	outptr++;							\
+      }									\
+    else if (c >= 0x0080 && c <= 0x07ff)				\
+      {									\
+	/* Two byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 2 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
+									\
+	outptr += 2;							\
+      }									\
+    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
+      {									\
+	/* Three byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 3 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	outptr[0] = 0xe0;						\
+	outptr[0] |= c >> 12;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (c >> 6) & 0x3f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= c & 0x3f;						\
+									\
+	outptr += 3;							\
+      }									\
+    else if (c >= 0xd800 && c <= 0xdbff)				\
+      {									\
+	/* Four byte UTF-8 char.  */					\
+	uint16_t low, uvwxy;						\
+									\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	low = get16 (inptr);						\
+									\
+	if ((low & 0xfc00) != 0xdc00)					\
+	  {								\
+	    inptr -= 2;							\
+	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	uvwxy = ((c >> 6) & 0xf) + 1;					\
+	outptr[0] = 0xf0;						\
+	outptr[0] |= uvwxy >> 2;					\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (uvwxy << 4) & 0x30;				\
+	outptr[1] |= (c >> 2) & 0x0f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= (c & 0x03) << 4;					\
+	outptr[2] |= (low >> 6) & 0x0f;					\
+									\
+	outptr[3] = 0x80;						\
+	outptr[3] |= low & 0x3f;					\
+									\
+	outptr += 4;							\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+      }									\
+    inptr += 2;								\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
+		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "vpkh %%v18,%%v16,%%v17\n\t"				\
+		  "la %[R_IN],32(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "vst %%v18,0(%[R_OUT])\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],32,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
+		  /* Shorten to UTF-8.  */				\
+		  "vpkh %%v18,%%v16,%%v17\n\t"				\
+		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 13f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13:\n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "lghi %[R_TMP2],16\n\t"				\
+		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  "j 22f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2:\n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "clgfi %[R_INLEN],1\n\t"				\
+		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "jl 90f \n\t"						\
+		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "jh 23f\n\t"						\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llill %[R_TMP3],0xc080\n\t"				\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "jh 24f\n\t"						\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llilf %[R_TMP3],0xe08080\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
+		  "clfi %[R_TMP],0xdbff\n\t"				\
+		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
+				  without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 90f \n\t"						\
+		  "slgfi %[R_INLEN],2\n\t"				\
+		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
+		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "aghi %[R_TMP],0x40\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
+		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "nilf %[R_TMP],0xfc00\n\t"				\
+		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "aghi %[R_TMP2],-2\n\t"				\
+		  "jh 20b\n\t"						\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
+
+#include <iconv/skeleton.c>
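
Note for readers (not part of the patch): the software path BODY_TO_C above can be hard to follow inside the macro, so here is a minimal standalone sketch of the same UTF-16 to UTF-8 mapping for a single character. The function name utf16be_to_utf8_one and the conv_status enum are hypothetical stand-ins for the loop body and the __GCONV_* codes. The sketch also assumes big-endian input, which is what get16 yields on s390. Pointers are advanced only on success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for the __GCONV_* values.  */
enum conv_status
{
  CONV_OK,
  CONV_FULL_OUTPUT,
  CONV_INCOMPLETE_INPUT,
  CONV_ILLEGAL_INPUT
};

/* Convert one character from big-endian UTF-16 to UTF-8, following the
   case split of BODY_TO_C.  *inptrp and *outptrp move only on success.  */
enum conv_status
utf16be_to_utf8_one (const unsigned char **inptrp, const unsigned char *inend,
		     unsigned char **outptrp, unsigned char *outend)
{
  assert (inptrp != NULL && outptrp != NULL);
  const unsigned char *in = *inptrp;
  unsigned char *out = *outptrp;
  size_t inused = 2, outused;

  if (inend - in < 2)
    return CONV_INCOMPLETE_INPUT;
  uint16_t c = (uint16_t) ((in[0] << 8) | in[1]);

  if (c <= 0x7f)
    outused = 1;
  else if (c <= 0x7ff)
    outused = 2;
  else if (c < 0xd800 || c > 0xdfff)
    outused = 3;
  else if (c <= 0xdbff)
    {
      /* High surrogate: a low surrogate has to follow.  */
      outused = 4;
      inused = 4;
    }
  else
    /* Low surrogate without a preceding high surrogate.  */
    return CONV_ILLEGAL_INPUT;

  /* Same order as the patch: output space first, then input length.  */
  if ((size_t) (outend - out) < outused)
    return CONV_FULL_OUTPUT;
  if ((size_t) (inend - in) < inused)
    return CONV_INCOMPLETE_INPUT;

  if (outused == 1)
    out[0] = (unsigned char) c;
  else if (outused == 2)
    {
      out[0] = (unsigned char) (0xc0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3f));
    }
  else if (outused == 3)
    {
      out[0] = (unsigned char) (0xe0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (c & 0x3f));
    }
  else
    {
      uint16_t low = (uint16_t) ((in[2] << 8) | in[3]);
      if ((low & 0xfc00) != 0xdc00)
	return CONV_ILLEGAL_INPUT;

      /* The high surrogate stores the plane number minus one.  */
      uint16_t uvwxy = (uint16_t) (((c >> 6) & 0xf) + 1);
      out[0] = (unsigned char) (0xf0 | (uvwxy >> 2));
      out[1] = (unsigned char) (0x80 | ((uvwxy << 4) & 0x30)
				| ((c >> 2) & 0x0f));
      out[2] = (unsigned char) (0x80 | ((c & 0x03) << 4)
				| ((low >> 6) & 0x0f));
      out[3] = (unsigned char) (0x80 | (low & 0x3f));
    }

  *inptrp = in + inused;
  *outptrp = out + outused;
  return CONV_OK;
}
```

For example, the surrogate pair D83D DE00 (U+1F600) yields uvwxy = 1 and the four bytes F0 9F 98 80. The vector variant BODY_TO_VX is meant to produce the same bytes and only adds a fast path for runs of characters <= 0x7f.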
diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
new file mode 100644
index 0000000..1b2d6a2
--- /dev/null
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -0,0 +1,820 @@
+/* Conversion between UTF-8 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		6
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
+
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint32_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
+	       0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search the end of this ill-formed UTF-8 character.  This	\
+	       is the next byte with (x & 0xc0) != 0x80.  */		\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	/* Read the possible remaining bytes.  */			\
+	for (i = 1; i < cnt; ++i)					\
+	  {								\
+	    uint32_t byte = inptr[i];					\
+									\
+	    if ((byte & 0xc0) != 0x80)					\
+	      /* This is an illegal encoding.  */			\
+	      break;							\
+									\
+	    ch <<= 6;							\
+	    ch |= byte & 0x3f;						\
+	  }								\
+									\
+	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
+	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
+	   have been represented with fewer than cnt bytes.  */		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
+	  {								\
+	    /* This is an illegal encoding.  */				\
+	    goto errout;						\
+	  }								\
+									\
+	inptr += cnt;							\
+      }									\
+									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint32_t *) outptr) = ch;					\
+    outptr += sizeof (uint32_t);					\
+  }
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
+		  "vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vuplhh %%v20,%%v18\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  "vupllh %%v21,%%v18\n\t"				\
+		  "aghi %[R_OUTLEN],-64\n\t"				\
+		  "vuplhh %%v22,%%v19\n\t"				\
+		  "vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
+		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
+						     index to store. */ \
+		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "ahi %[R_TMP2],-1\n\t"				\
+		  "jl 20f\n\t"						\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vuplhh %%v20,%%v18\n\t"				\
+		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllh %%v21,%%v18\n\t"				\
+		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "vuplhh %%v22,%%v19\n\t"				\
+		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllh %%v23,%%v19\n\t"				\
+		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11:\n\t"						\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+		    ASM_CLOBBER_VR ("v31")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
+/* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
+
+/* The software routine mimics the S/390 cu41 instruction.  */
+#define BODY_TO_C						\
+  {								\
+    uint32_t wc = *((const uint32_t *) inptr);			\
+								\
+    if (__glibc_likely (wc <= 0x7f))				\
+      {								\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
+	outptr++;						\
+      }								\
+    else if (wc <= 0x7ff)					\
+      {								\
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+								\
+	outptr[0] = 0xc0;					\
+	outptr[0] |= wc >> 6;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= wc & 0x3f;					\
+								\
+	outptr += 2;						\
+      }								\
+    else if (wc <= 0xffff)					\
+      {								\
+	/* Three UTF-8 chars.  */				\
+	if (__glibc_unlikely (outptr + 3 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
+	outptr[0] = 0xe0;					\
+	outptr[0] |= wc >> 12;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= (wc >> 6) & 0x3f;				\
+								\
+	outptr[2] = 0x80;					\
+	outptr[2] |= wc & 0x3f;					\
+								\
+	outptr += 3;						\
+      }								\
+      else if (wc <= 0x10ffff)					\
+	{							\
+	  /* Four UTF-8 chars.  */				\
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
+	    {							\
+	      /* Overflow in the output buffer.  */		\
+	      result = __GCONV_FULL_OUTPUT;			\
+	      break;						\
+	    }							\
+	  outptr[0] = 0xf0;					\
+	  outptr[0] |= wc >> 18;				\
+								\
+	  outptr[1] = 0x80;					\
+	  outptr[1] |= (wc >> 12) & 0x3f;			\
+								\
+	  outptr[2] = 0x80;					\
+	  outptr[2] |= (wc >> 6) & 0x3f;			\
+								\
+	  outptr[3] = 0x80;					\
+	  outptr[3] |= wc & 0x3f;				\
+								\
+	  outptr += 4;						\
+	}							\
+      else							\
+	{							\
+	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	}							\
+    inptr += 4;							\
+  }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "vzero %%v21\n\t"					\
+		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP],0\n\t"					\
+		  /* Shorten to byte values.  */			\
+		  "vpkf %%v23,%%v16,%%v17\n\t"				\
+		  "vpkf %%v24,%%v18,%%v19\n\t"				\
+		  "vpkh %%v23,%%v23,%%v24\n\t"				\
+		  /* Checking for values > 0x7f.  */			\
+		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
+		  "jno 10f\n\t"						\
+		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
+		  "jno 11f\n\t"						\
+		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
+		  "jno 12f\n\t"						\
+		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
+		  "jno 13f\n\t"						\
+		  /* Store 16bytes to outptr.  */			\
+		  "vst %%v23,0(%[R_OUT])\n\t"				\
+		  "aghi %[R_INLEN],-64\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_IN],64(%[R_IN])\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],64,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "srlg %[R_I],%[R_I],2\n\t"				\
+		  "agr %[R_I],%[R_TMP]\n\t"				\
+		  "je 20f\n\t"						\
+		  /* Store characters before invalid one...  */		\
+		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
+		  /* ... and update pointers.  */			\
+		  "aghi %[R_I],1\n\t"					\
+		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
+		  "sllg %[R_I],%[R_I],2\n\t"				\
+		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_I]\n\t"				\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
+#include <iconv/skeleton.c>
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too.
  2016-02-23  9:23 ` [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too Stefan Liebler
@ 2016-02-23 12:06   ` Stefan Liebler
  2016-04-21 15:10   ` Stefan Liebler
  1 sibling, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-02-23 12:06 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stefan Liebler

This patch reworks the existing s390 64bit specific iconv modules in order
to use them on s390 31bit, too.

Thus the parts for the iconvdata subdirectory in sysdeps/s390/s390-64/Makefile
were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
All those modules were moved from the sysdeps/s390/s390-64 directory to
sysdeps/s390.

The iso-8859-1 to/from cp037 module was adjusted to use the brct (branch
relative on count) instruction on 31bit s390 instead of brctg, because brctg
is a zarch instruction and is not available on a 31bit kernel.
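
The instruction selection described above can be sketched as a compile-time
choice of the mnemonic that is pasted into the inline-assembly template. This
is only an illustration: the macro follows the BRANCH_ON_COUNT define from the
diff below, while the helper function sketch_loop_tail is a hypothetical name
that merely exposes the resulting string.

```c
/* Pick the branch-on-count mnemonic at compile time.  On a 64bit
   s390x build the 64bit form brctg is used; everywhere else the
   32bit form brct is used.  */
#if defined __s390x__
# define BRANCH_ON_COUNT(REG, LBL) "brctg %" #REG "," #LBL "\n\t"
#else
# define BRANCH_ON_COUNT(REG, LBL) "brct %" #REG "," #LBL "\n\t"
#endif

/* Hypothetical helper (not part of the patch): return the assembler
   text produced for the loop counter operand [R_LI] and the local
   backward label 0b.  */
static const char *
sketch_loop_tail (void)
{
  return BRANCH_ON_COUNT ([R_LI], 0b);
}
```

Because the macro expands to a string literal, it can be placed between the
other string literals of an asm template and is concatenated by the compiler.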

The utf modules use zarch instructions, thus the directive machinemode
zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
The ifunc resolvers were adjusted in order to call the etf3eh or vector variants
only if zarch instructions are available (64bit kernel in 31bit compat-mode).
Furthermore, some variable types were changed. E.g. unsigned long long would be
a register pair on s390 31bit, but we want only one single register.
For variables of type size_t, the register contents have to be extended from a
32bit to a 64bit value on 31bit, because the inline assemblies use 64bit values
in such cases.
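
The resolver logic described above can be sketched as a plain selection
function over the hwcap bits. This is only an illustration modeled on the
__to_utf8_loop_resolver that appears elsewhere in this series. The SKETCH_*
bit values, the enum and the function name are hypothetical placeholders,
not the kernel's HWCAP_S390_* definitions.

```c
/* Placeholder capability bits (assumed values for this sketch only).  */
#define SKETCH_HWCAP_ZARCH	(1UL << 0)
#define SKETCH_HWCAP_HIGH_GPRS	(1UL << 1)
#define SKETCH_HWCAP_ETF3EH	(1UL << 2)
#define SKETCH_HWCAP_VX		(1UL << 3)

enum sketch_loop_variant
{
  SKETCH_LOOP_C,	/* Plain C loop.  */
  SKETCH_LOOP_ETF3EH,	/* Loop using the utf-convert instructions.  */
  SKETCH_LOOP_VX	/* Loop using vector and utf-convert instructions.  */
};

/* Return the variant a resolver would pick for the given hwcap value.
   The etf3eh variant is chosen only if zarch, high-gprs and etf3eh are
   all reported, which is not the case on a 31bit kernel.  */
static enum sketch_loop_variant
sketch_select_loop (unsigned long int hwcap)
{
  if (hwcap & SKETCH_HWCAP_VX)
    return SKETCH_LOOP_VX;

  if ((hwcap & SKETCH_HWCAP_ZARCH)
      && (hwcap & SKETCH_HWCAP_HIGH_GPRS)
      && (hwcap & SKETCH_HWCAP_ETF3EH))
    return SKETCH_LOOP_ETF3EH;

  return SKETCH_LOOP_C;
}
```

A real ifunc resolver returns a function pointer instead of an enum value;
the enum is used here only to keep the sketch self-contained.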

ChangeLog:

	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
	Move to ...
	* sysdeps/s390/Makefile: ... here.
	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
	(BRANCH_ON_COUNT): New define.
	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
---
 sysdeps/s390/Makefile                        |  83 +++
 sysdeps/s390/iso-8859-1_cp037_z900.c         | 262 +++++++++
 sysdeps/s390/s390-64/Makefile                |  84 ---
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 256 ---------
 sysdeps/s390/s390-64/utf16-utf32-z9.c        | 624 --------------------
 sysdeps/s390/s390-64/utf8-utf16-z9.c         | 806 --------------------------
 sysdeps/s390/s390-64/utf8-utf32-z9.c         | 807 --------------------------
 sysdeps/s390/utf16-utf32-z9.c                | 636 +++++++++++++++++++++
 sysdeps/s390/utf8-utf16-z9.c                 | 818 ++++++++++++++++++++++++++
 sysdeps/s390/utf8-utf32-z9.c                 | 820 +++++++++++++++++++++++++++
 10 files changed, 2619 insertions(+), 2577 deletions(-)
 create mode 100644 sysdeps/s390/Makefile
 create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
 create mode 100644 sysdeps/s390/utf16-utf32-z9.c
 create mode 100644 sysdeps/s390/utf8-utf16-z9.c
 create mode 100644 sysdeps/s390/utf8-utf32-z9.c

diff --git a/sysdeps/s390/Makefile b/sysdeps/s390/Makefile
new file mode 100644
index 0000000..9b17342
--- /dev/null
+++ b/sysdeps/s390/Makefile
@@ -0,0 +1,83 @@
+ifeq ($(subdir),iconvdata)
+ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
+ISO-8859-1_CP037_Z900-map := gconv.map
+
+UTF8_UTF32_Z9-routines := utf8-utf32-z9
+UTF8_UTF32_Z9-map := gconv.map
+
+UTF16_UTF32_Z9-routines := utf16-utf32-z9
+UTF16_UTF32_Z9-map := gconv.map
+
+UTF8_UTF16_Z9-routines := utf8-utf16-z9
+UTF8_UTF16_Z9-map := gconv.map
+
+s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
+
+extra-modules-left += $(s390x-iconv-modules)
+include extra-module.mk
+
+cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
+lib := iconvdata
+include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
+
+extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
+install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
+
+$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
+$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
+	$(do-install-program)
+
+$(objpfx)gconv-modules-s390: gconv-modules
+	${AWK} 'BEGIN { emitted = 0 } \
+	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
+	!emitted { emit_s390_modules(); emitted = 1; print; } \
+	function emit_s390_modules() { \
+	  # Emit header line. \
+	  print "# S/390 hardware accelerated modules"; \
+	  print_val("#", 8); \
+	  print_val("from", 24); \
+	  print_val("to", 24); \
+	  print_val("module", 24); \
+	  printf "cost\n"; \
+	  # Emit s390-specific modules. \
+	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
+	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
+	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
+	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
+	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
+	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
+	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
+	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
+	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
+	  printf "\n# Default glibc modules\n"; \
+	} \
+	function modul(from, to, file, cost) { \
+	  print_val("module", 8); \
+	  print_val(from, 24); \
+	  print_val(to, 24); \
+	  print_val(file, 24); \
+	  if (cost == 0) cost = 1; \
+	  printf "%d\n", cost; \
+	} \
+	function print_val(val, width) { \
+	  # Emit value followed by tabs. \
+	  printf "%s", val; \
+	  len = length(val); \
+	  if (len < width) { \
+	    len = width - len; \
+	    nr_tabs = len / 8; \
+	    if (len % 8 != 0) nr_tabs++; \
+	  } \
+	  else nr_tabs = 1; \
+	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
+	}' < $< > $@
+
+GCONV_MODULES = gconv-modules-s390
+
+endif
diff --git a/sysdeps/s390/iso-8859-1_cp037_z900.c b/sysdeps/s390/iso-8859-1_cp037_z900.c
new file mode 100644
index 0000000..5c19218
--- /dev/null
+++ b/sysdeps/s390/iso-8859-1_cp037_z900.c
@@ -0,0 +1,262 @@
+/* Conversion between ISO 8859-1 and IBM037.
+
+   This module uses the translate instruction.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+
+// conversion table from ISO-8859-1 to IBM037
+static const unsigned char table_iso8859_1_to_cp037[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
+  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
+  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
+  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
+  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
+  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
+  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
+  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
+  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
+  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
+  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
+  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
+  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
+  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
+  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
+  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
+  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
+  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
+  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
+  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
+  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
+  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
+  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
+  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
+  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
+  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
+  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
+  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
+  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
+  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
+  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
+  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
+  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
+  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
+  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
+  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
+  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
+  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
+  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
+  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
+  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
+  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
+  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
+  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
+  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
+  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
+  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
+  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
+  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
+  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
+  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
+  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
+  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
+  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
+  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
+};
+
+// conversion table from IBM037 to ISO-8859-1
+static const unsigned char table_cp037_iso8859_1[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
+  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
+  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
+  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
+  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
+  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
+  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
+  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
+  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
+  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
+  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
+  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
+  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
+  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
+  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
+  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
+  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
+  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
+  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
+  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
+  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
+  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
+  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
+  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
+  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
+  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
+  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
+  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
+  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
+  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
+  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
+  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
+  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
+  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
+  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
+  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
+  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
+  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
+  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
+  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
+  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
+  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
+  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
+  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
+  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
+  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
+  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
+  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
+  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
+  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
+  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
+  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
+  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
+  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
+  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
+};
+
+/* Definitions used in the body of the `gconv' function.  */
+#define CHARSET_NAME		"ISO-8859-1//"
+#define FROM_LOOP		iso8859_1_to_cp037_z900
+#define TO_LOOP			cp037_to_iso8859_1_z900
+#define DEFINE_INIT		1
+#define DEFINE_FINI		1
+#define MIN_NEEDED_FROM		1
+#define MIN_NEEDED_TO		1
+
+# if defined __s390x__
+#  define BRANCH_ON_COUNT(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
+# else
+#  define BRANCH_ON_COUNT(REG,LBL) "brct %" #REG "," #LBL "\n\t"
+# endif
+
+#define TR_LOOP(TABLE)							\
+  {									\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
+									\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "la %[R_IN],256(%[R_IN])\n\t"		\
+			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     BRANCH_ON_COUNT ([R_LI], 0b)		\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
+									\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
+									\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
+  }
+
+
+/* First define the conversion function from ISO 8859-1 to CP037.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
+
+#include <iconv/loop.c>
+
+
+/* Next, define the conversion function from CP037 to ISO 8859-1.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define LOOPFCT			TO_LOOP
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
+
+#include <iconv/loop.c>
+
+
+/* Now define the toplevel functions.  */
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index d1ee59d..0a50514 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -9,87 +9,3 @@ CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
 CFLAGS-dl-load.c += -Wno-unused
 CFLAGS-dl-reloc.c += -Wno-unused
 endif
-
-ifeq ($(subdir),iconvdata)
-ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
-ISO-8859-1_CP037_Z900-map := gconv.map
-
-UTF8_UTF32_Z9-routines := utf8-utf32-z9
-UTF8_UTF32_Z9-map := gconv.map
-
-UTF16_UTF32_Z9-routines := utf16-utf32-z9
-UTF16_UTF32_Z9-map := gconv.map
-
-UTF8_UTF16_Z9-routines := utf8-utf16-z9
-UTF8_UTF16_Z9-map := gconv.map
-
-s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
-
-extra-modules-left += $(s390x-iconv-modules)
-include extra-module.mk
-
-cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
-lib := iconvdata
-include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
-
-extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
-install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
-
-$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
-$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
-	$(do-install-program)
-
-$(objpfx)gconv-modules-s390: gconv-modules
-	${AWK} 'BEGIN { emitted = 0 } \
-	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
-	!emitted { emit_s390_modules(); emitted = 1; print; } \
-	function emit_s390_modules() { \
-	  # Emit header line. \
-	  print "# S/390 hardware accelerated modules"; \
-	  print_val("#", 8); \
-	  print_val("from", 24); \
-	  print_val("to", 24); \
-	  print_val("module", 24); \
-	  printf "cost\n"; \
-	  # Emit s390-specific modules. \
-	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
-	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
-	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
-	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
-	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
-	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
-	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
-	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
-	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
-	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
-	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
-	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
-	  printf "\n# Default glibc modules\n"; \
-	} \
-	function modul(from, to, file, cost) { \
-	  print_val("module", 8); \
-	  print_val(from, 24); \
-	  print_val(to, 24); \
-	  print_val(file, 24); \
-	  if (cost == 0) cost = 1; \
-	  printf "%d\n", cost; \
-	} \
-	function print_val(val, width) { \
-	  # Emit value followed by tabs. \
-	  printf "%s", val; \
-	  len = length(val); \
-	  if (len < width) { \
-	    len = width - len; \
-	    nr_tabs = len / 8; \
-	    if (len % 8 != 0) nr_tabs++; \
-	  } \
-	  else nr_tabs = 1; \
-	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
-	}' < $< > $@
-
-GCONV_MODULES = gconv-modules-s390
-
-endif
diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
deleted file mode 100644
index 4d79bbf..0000000
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Conversion between ISO 8859-1 and IBM037.
-
-   This module uses the translate instruction.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-
-// conversion table from ISO-8859-1 to IBM037
-static const unsigned char table_iso8859_1_to_cp037[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
-  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
-  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
-  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
-  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
-  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
-  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
-  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
-  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
-  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
-  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
-  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
-  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
-  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
-  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
-  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
-  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
-  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
-  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
-  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
-  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
-  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
-  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
-  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
-  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
-  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
-  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
-  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
-  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
-  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
-  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
-  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
-  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
-  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
-  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
-  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
-  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
-  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
-  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
-  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
-  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
-  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
-  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
-  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
-  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
-  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
-  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
-  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
-  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
-  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
-  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
-  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
-  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
-  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
-  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
-};
-
-// conversion table from IBM037 to ISO-8859-1
-static const unsigned char table_cp037_iso8859_1[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
-  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
-  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
-  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
-  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
-  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
-  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
-  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
-  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
-  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
-  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
-  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
-  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
-  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
-  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
-  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
-  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
-  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
-  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
-  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
-  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
-  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
-  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
-  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
-  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
-  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
-  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
-  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
-  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
-  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
-  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
-  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
-  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
-  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
-  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
-  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
-  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
-  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
-  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
-  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
-  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
-  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
-  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
-  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
-  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
-  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
-  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
-  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
-  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
-  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
-  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
-  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
-  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
-  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
-  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
-};
-
-/* Definitions used in the body of the `gconv' function.  */
-#define CHARSET_NAME		"ISO-8859-1//"
-#define FROM_LOOP		iso8859_1_to_cp037_z900
-#define TO_LOOP			cp037_to_iso8859_1_z900
-#define DEFINE_INIT		1
-#define DEFINE_FINI		1
-#define MIN_NEEDED_FROM		1
-#define MIN_NEEDED_TO		1
-
-#define TR_LOOP(TABLE)							\
-  {									\
-    size_t length = (inend - inptr < outend - outptr			\
-		     ? inend - inptr : outend - outptr);		\
-									\
-    /* Process in 256 byte blocks.  */					\
-    if (__builtin_expect (length >= 256, 0))				\
-      {									\
-	size_t blocks = length / 256;					\
-	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
-			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
-			     "la %[R_IN],256(%[R_IN])\n\t"		\
-			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
-			     "brctg %[R_LI],0b\n\t"			\
-			     : /* outputs */ [R_IN] "+a" (inptr)	\
-			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
-			     : /* inputs */ [R_TBL] "a" (TABLE)		\
-			     : /* clobber list */ "memory"		\
-			     );						\
-	length = length % 256;						\
-      }									\
-									\
-    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
-    if (length >= 8)							\
-      {									\
-	size_t blocks = length / 8;					\
-	for (int i = 0; i < blocks; i++)				\
-	  {								\
-	    outptr[0] = TABLE[inptr[0]];				\
-	    outptr[1] = TABLE[inptr[1]];				\
-	    outptr[2] = TABLE[inptr[2]];				\
-	    outptr[3] = TABLE[inptr[3]];				\
-	    outptr[4] = TABLE[inptr[4]];				\
-	    outptr[5] = TABLE[inptr[5]];				\
-	    outptr[6] = TABLE[inptr[6]];				\
-	    outptr[7] = TABLE[inptr[7]];				\
-	    inptr += 8;							\
-	    outptr += 8;						\
-	  }								\
-	length = length % 8;						\
-      }									\
-									\
-    /* Process remaining 0...7 bytes.  */				\
-    switch (length)							\
-      {									\
-      case 7: outptr[6] = TABLE[inptr[6]];				\
-      case 6: outptr[5] = TABLE[inptr[5]];				\
-      case 5: outptr[4] = TABLE[inptr[4]];				\
-      case 4: outptr[3] = TABLE[inptr[3]];				\
-      case 3: outptr[2] = TABLE[inptr[2]];				\
-      case 2: outptr[1] = TABLE[inptr[1]];				\
-      case 1: outptr[0] = TABLE[inptr[0]];				\
-      case 0: break;							\
-      }									\
-    inptr += length;							\
-    outptr += length;							\
-  }
-
-
-/* First define the conversion function from ISO 8859-1 to CP037.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
-
-#include <iconv/loop.c>
-
-
-/* Next, define the conversion function from CP037 to ISO 8859-1.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
-#define BODY			TR_LOOP (table_cp037_iso8859_1);
-
-#include <iconv/loop.c>
-
-
-/* Now define the toplevel functions.  */
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
deleted file mode 100644
index 4c2c548..0000000
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ /dev/null
@@ -1,624 +0,0 @@
-/* Conversion between UTF-16 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM_UTF32               0x0000feffu
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16               0xfeff
-
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		2
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf16_loop
-#define TO_LOOP			__to_utf16_loop
-#define FROM_DIRECTION		(dir == from_utf16)
-#define ONE_DIRECTION           0
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf16,
-  from_utf16
-};
-
-struct utf16_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf16_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf16;
-    }
-  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
-	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf16;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf16)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-	  /* Emit the UTF-16 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 2 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-	  /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
-
-/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_FROM_C							\
-  {									\
-    uint16_t u1 = get16 (inptr);					\
-									\
-    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
-      {									\
-	/* No surrogate.  */						\
-	put32 (outptr, u1);						\
-	inptr += 2;							\
-      }									\
-    else								\
-      {									\
-	/* An isolated low-surrogate was found.  This has to be         \
-	   considered ill-formed.  */					\
-	if (__glibc_unlikely (u1 >= 0xdc00))				\
-	  {								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	/* It's a surrogate character.  At least the first word says	\
-	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    /* We don't have enough input for another complete input	\
-	       character.  */						\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	uint16_t u2 = get16 (inptr);					\
-	if (__builtin_expect (u2 < 0xdc00, 0)				\
-	    || __builtin_expect (u2 > 0xdfff, 0))			\
-	  {								\
-	    /* This is no valid second word for a surrogate.  */	\
-	    inptr -= 2;							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-									\
-	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
-	inptr += 2;							\
-      }									\
-    outptr += 4;							\
-  }
-
-#define BODY_FROM_VX							\
-  {									\
-    size_t inlen = inend - inptr;					\
-    size_t outlen = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "larl %[R_TMP],9f\n\t"				\
-		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
-		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
-		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  /* Check for surrogate chars.  */			\
-		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t"						\
-		  /* Enlarge to UTF-32.  */				\
-		  "vuplhh %%v17,%%v16\n\t"				\
-		  "la %[R_IN],16(%[R_IN])\n\t"				\
-		  "vupllh %%v18,%%v16\n\t"				\
-		  "aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
-		  "aghi %[R_OUTLEN],-32\n\t"				\
-		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],16,2f\n\t"				\
-		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
-		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  /* At least on uint16_t is in range of surrogates.	\
-		     Store the preceding chars.  */			\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "vuplhh %%v17,%%v16\n\t"				\
-		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 12f\n\t"						\
-		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "vupllh %%v18,%%v16\n\t"				\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11: \n\t" /* Update pointers.  */			\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
-		  "srl %[R_TMP2],1\n\t"					\
-		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
-		  "aghi %[R_OUTLEN],-4\n\t"				\
-		  "j 16f\n\t"						\
-		  /* Handle remaining bytes.  */			\
-		  "2:\n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "clgfi %[R_INLEN],1\n\t"				\
-		  "je 97f\n\t" /* Only one byte available.  */		\
-		  "jl 99f\n\t" /* End if no bytes available.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle remaining uint16_t values.  */		\
-		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "slgfi %[R_OUTLEN],4\n\t"				\
-		  "jl 96f \n\t"						\
-		  "clfi %[R_TMP],0xd800\n\t"				\
-		  "jhe 15f\n\t"						\
-		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-2\n\t"				\
-		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],13b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Handle UTF-16 surrogate pair.  */			\
-		  "15: clfi %[R_TMP],0xdfff\n\t"			\
-		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
-		  "16: clfi %[R_TMP],0xdc00\n\t"			\
-		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
-		  "slgfi %[R_INLEN],4\n\t"				\
-		  "jl 97f\n\t" /* Big enough input?  */			\
-		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "slfi %[R_TMP],0xd7c0\n\t"				\
-		  "sll %[R_TMP],10\n\t"					\
-		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
-		  "nilf %[R_TMP3],0xfc00\n\t"				\
-		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "jne 98f\n\t"						\
-		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
-		  "la %[R_IN],4(%[R_IN])\n\t"				\
-		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
-		  "aghi %[R_TMP2],-2\n\t"				\
-		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  "96:\n\t" /* Return full output.  */			\
-		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
-		  "j 99f\n\t"						\
-		  "97:\n\t" /* Return incomplete input.  */		\
-		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
-		  "j 99f\n\t"						\
-		  "98:\n\t" /* Return Illegal character.  */		\
-		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "99:\n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
-  }
-
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__from_utf16_loop_c
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf16_loop_c)
-__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
-__from_utf16_loop;
-
-static void *
-__from_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf16_loop_vx;
-  else
-    return __from_utf16_loop_c;
-}
-
-strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
-#else
-# define LOOPFCT		FROM_LOOP
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-#endif
-
-/* Conversion from UTF-32 internal/BE to UTF-16.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_TO_C							\
-  {									\
-    uint32_t c = get32 (inptr);						\
-									\
-    if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
-      {									\
-	/* Two UTF-16 chars.  */					\
-	put16 (outptr, c);						\
-      }									\
-    else if (__builtin_expect (c >= 0x10000, 1)				\
-	     && __builtin_expect (c <= 0x10ffff, 1))			\
-      {									\
-	/* Four UTF-16 chars.  */					\
-	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
-	uint16_t out;							\
-									\
-	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	out = 0xd800;							\
-	out |= (zabcd & 0xff) << 6;					\
-	out |= (c >> 10) & 0x3f;					\
-	put16 (outptr, out);						\
-	outptr += 2;							\
-									\
-	out = 0xdc00;							\
-	out |= c & 0x3ff;						\
-	put16 (outptr, out);						\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
-      }									\
-    outptr += 2;							\
-    inptr += 4;								\
-  }
-
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "larl %[R_TMP],9f\n\t"				\
-		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars			\
-		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP2],0\n\t"				\
-		  /* Shorten to UTF-16.  */				\
-		  "vpkf %%v18,%%v16,%%v17\n\t"				\
-		  /* Check for surrogate chars.  */			\
-		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t"						\
-		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
-		  "jno 11f\n\t"						\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "vst %%v18,0(%[R_OUT])\n\t"				\
-		  "la %[R_IN],32(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-32\n\t"				\
-		  "aghi %[R_OUTLEN],-16\n\t"				\
-		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],32,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
-		     and check for ch >= 0x10000. (v30, v31)  */	\
-		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
-		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
-		  /* At least on UTF32 char is in range of surrogates.	\
-		     Store the preceding characters.  */		\
-		  "11: ahi %[R_TMP2],16\n\t"				\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
-		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 20f\n\t"						\
-		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_TO_VX
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf16_loop_c)
-__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
-__to_utf16_loop;
-
-static void *
-__to_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf16_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
-    return __to_utf16_loop_c;
-}
-
-strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
-
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
deleted file mode 100644
index 76625d0..0000000
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ /dev/null
@@ -1,806 +0,0 @@
-/* Conversion between UTF-16 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		2
-#define MAX_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-16.  */
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
-		  "vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  /* Enlarge to UTF-16.  */				\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "la %[R_IN],16(%[R_IN])\n\t"				\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
-		  "aghi %[R_OUTLEN],-32\n\t"				\
-		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  "10:\n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
-		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
-						     index to store. */ \
-		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "ahi %[R_TMP2],-1\n\t"				\
-		  "jl 20f\n\t"						\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11:\n\t" /* Update pointers.  */			\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-
-/* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint16_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0	\
-	       or 0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	if (cnt == 4)							\
-	  {								\
-	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
-	       low) are needed.  */					\
-	    uint16_t zabcd, high, low;					\
-									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			\
-	      {								\
-		/* Overflow in the output buffer.  */			\
-		result = __GCONV_FULL_OUTPUT;				\
-		break;							\
-	      }								\
-									\
-	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		if ((inptr[i] & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  goto errout;						\
-	      }								\
-									\
-	    /* See Principles of Operations cu12.  */			\
-	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-		     ((inptr[1] & 0x30) >> 4)) - 1;			\
-									\
-	    /* z-bit must be zero after subtracting 1.  */		\
-	    if (zabcd & 0x10)						\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
-									\
-	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;                         /* abcd bits */	\
-	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
-	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
-									\
-	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
-	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
-	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
-	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
-									\
-	    put16 (outptr, high);					\
-	    outptr += 2;						\
-	    put16 (outptr, low);					\
-	    outptr += 2;						\
-	    inptr += 4;							\
-	    continue;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Read the possible remaining bytes.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		uint16_t byte = inptr[i];				\
-									\
-		if ((byte & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  break;						\
-									\
-		ch <<= 6;						\
-		ch |= byte & 0x3f;					\
-	      }								\
-									\
-	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
-	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
-	       have been represented with fewer than cnt bytes.  */	\
-	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
-		/* Do not accept UTF-16 surrogates.  */			\
-		|| (ch >= 0xd800 && ch <= 0xdfff))			\
-	      {								\
-		/* This is an illegal encoding.  */			\
-		goto errout;						\
-	      }								\
-									\
-	    inptr += cnt;						\
-	  }								\
-      }									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint16_t *) outptr) = ch;					\
-    outptr += sizeof (uint16_t);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-/* Conversion from UTF-16 to UTF-8.  */
-
-/* The software routine is based on the functionality of the S/390
-   hardware instruction (cu21) as described in the Principles of
-   Operation.  */
-#define BODY_TO_C							\
-  {									\
-    uint16_t c = get16 (inptr);						\
-									\
-    if (__glibc_likely (c <= 0x007f))					\
-      {									\
-	/* Single byte UTF-8 char.  */					\
-	*outptr = c & 0xff;						\
-	outptr++;							\
-      }									\
-    else if (c >= 0x0080 && c <= 0x07ff)				\
-      {									\
-	/* Two byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 2 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	outptr[0] = 0xc0;						\
-	outptr[0] |= c >> 6;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= c & 0x3f;						\
-									\
-	outptr += 2;							\
-      }									\
-    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
-      {									\
-	/* Three byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 3 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	outptr[0] = 0xe0;						\
-	outptr[0] |= c >> 12;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (c >> 6) & 0x3f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= c & 0x3f;						\
-									\
-	outptr += 3;							\
-      }									\
-    else if (c >= 0xd800 && c <= 0xdbff)				\
-      {									\
-	/* Four byte UTF-8 char.  */					\
-	uint16_t low, uvwxy;						\
-									\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	low = get16 (inptr);						\
-									\
-	if ((low & 0xfc00) != 0xdc00)					\
-	  {								\
-	    inptr -= 2;							\
-	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	uvwxy = ((c >> 6) & 0xf) + 1;					\
-	outptr[0] = 0xf0;						\
-	outptr[0] |= uvwxy >> 2;					\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (uvwxy << 4) & 0x30;				\
-	outptr[1] |= (c >> 2) & 0x0f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= (c & 0x03) << 4;					\
-	outptr[2] |= (low >> 6) & 0x0f;					\
-									\
-	outptr[3] = 0x80;						\
-	outptr[3] |= low & 0x3f;					\
-									\
-	outptr += 4;							\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-      }									\
-    inptr += 2;								\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    size_t inlen  = inend - inptr;					\
-    size_t outlen  = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for values <= 0x7f.  */		\
-		  "larl %[R_TMP],9f\n\t"				\
-		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
-		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP2],0\n\t"				\
-		  /* Check for > 1byte UTF-8 chars.  */			\
-		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
-		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  /* Shorten to UTF-8.  */				\
-		  "vpkh %%v18,%%v16,%%v17\n\t"				\
-		  "la %[R_IN],32(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-32\n\t"				\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "vst %%v18,0(%[R_OUT])\n\t"				\
-		  "aghi %[R_OUTLEN],-16\n\t"				\
-		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],32,2f\n\t"				\
-		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
-		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
-		  "10:\n\t"						\
-		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
-		  /* Shorten to UTF-8.  */				\
-		  "vpkh %%v18,%%v16,%%v17\n\t"				\
-		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
-		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "jl 13f\n\t"						\
-		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  "13:\n\t"						\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "lghi %[R_TMP2],16\n\t"				\
-		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
-		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
-		  "aghi %[R_INLEN],-2\n\t"				\
-		  "j 22f\n\t"						\
-		  /* Handle remaining bytes.  */			\
-		  "2:\n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "clgfi %[R_INLEN],1\n\t"				\
-		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
-		  "jle 99f\n\t" /* End if less than two bytes.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle multibyte utf8-char. */			\
-		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "aghi %[R_INLEN],-2\n\t"				\
-		  /* Test if ch is 1-byte UTF-8 char.  */		\
-		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
-		  /* Handle 1-byte UTF-8 char.  */			\
-		  "31: slgfi %[R_OUTLEN],1\n\t"				\
-		  "jl 90f \n\t"						\
-		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],20b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Test if ch is 2-byte UTF-8 char.  */		\
-		  "22: clfi %[R_TMP],0x7ff\n\t"				\
-		  "jh 23f\n\t"						\
-		  /* Handle 2-byte UTF-8 char.  */			\
-		  "32: slgfi %[R_OUTLEN],2\n\t"				\
-		  "jl 90f \n\t"						\
-		  "llill %[R_TMP3],0xc080\n\t"				\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
-		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
-		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],20b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Test if ch is 3-byte UTF-8 char.  */		\
-		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
-		  "jh 24f\n\t"						\
-		  /* Handle 3-byte UTF-8 char.  */			\
-		  "33: slgfi %[R_OUTLEN],3\n\t"				\
-		  "jl 90f \n\t"						\
-		  "llilf %[R_TMP3],0xe08080\n\t"			\
-		  "la %[R_IN],2(%[R_IN])\n\t"				\
-		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
-		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
-		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
-		  "brctg %[R_TMP2],20b\n\t"				\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Test if ch is 4-byte UTF-8 char.  */		\
-		  "24: clfi %[R_TMP],0xdfff\n\t"			\
-		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
-		  "clfi %[R_TMP],0xdbff\n\t"				\
-		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
-				  without a preceding high surrogate.  */ \
-		  /* Handle 4-byte UTF-8 char.  */			\
-		  "34: slgfi %[R_OUTLEN],4\n\t"				\
-		  "jl 90f \n\t"						\
-		  "slgfi %[R_INLEN],2\n\t"				\
-		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
-		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
-		  "llilf %[R_TMP3],0xf0808080\n\t"			\
-		  "aghi %[R_TMP],0x40\n\t"				\
-		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
-		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
-		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
-		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
-		  "nilf %[R_TMP],0xfc00\n\t"				\
-		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
-		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "la %[R_IN],4(%[R_IN])\n\t"				\
-		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
-		  "aghi %[R_TMP2],-2\n\t"				\
-		  "jh 20b\n\t"						\
-		  "j 0b\n\t" /* Switch to vx-loop.  */			\
-		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
-		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
-		  "99:\n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__to_utf8_loop_c
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate loop-function with software implementation.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY                   BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-#else
-# define LOOPFCT		TO_LOOP
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
deleted file mode 100644
index e89dc70..0000000
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ /dev/null
@@ -1,807 +0,0 @@
-/* Conversion between UTF-8 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		6
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM			0x0000feffu
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
-
-#define STORE_REST_COMMON						      \
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    assert (ch != 0xc0 && ch != 0xc1);					      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
-  }
-
-#define UNPACK_BYTES_COMMON \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
-
-#define CLEAR_STATE_COMMON \
-  state->__count = 0
-
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
-
-
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint32_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
-	       0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	/* Read the possible remaining bytes.  */			\
-	for (i = 1; i < cnt; ++i)					\
-	  {								\
-	    uint32_t byte = inptr[i];					\
-									\
-	    if ((byte & 0xc0) != 0x80)					\
-	      /* This is an illegal encoding.  */			\
-	      break;							\
-									\
-	    ch <<= 6;							\
-	    ch |= byte & 0x3f;						\
-	  }								\
-									\
-	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
-	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
-	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
-	    /* Do not accept UTF-16 surrogates.  */			\
-	    || (ch >= 0xd800 && ch <= 0xdfff)				\
-	    || (ch > 0x10ffff))						\
-	  {								\
-	    /* This is an illegal encoding.  */				\
-	    goto errout;						\
-	  }								\
-									\
-	inptr += cnt;							\
-      }									\
-									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint32_t *) outptr) = ch;					\
-    outptr += sizeof (uint32_t);					\
-  }
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
-		  "vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
-		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
-				   UTF8 chars.  */			\
-		  /* Enlarge to UCS4.  */				\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "la %[R_IN],16(%[R_IN])\n\t"				\
-		  "vuplhh %%v20,%%v18\n\t"				\
-		  "aghi %[R_INLEN],-16\n\t"				\
-		  "vupllh %%v21,%%v18\n\t"				\
-		  "aghi %[R_OUTLEN],-64\n\t"				\
-		  "vuplhh %%v22,%%v19\n\t"				\
-		  "vupllh %%v23,%%v19\n\t"				\
-		  /* Store 64 bytes to buf_out.  */			\
-		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
-		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],16,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  "10:\n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
-		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
-						     index to store. */ \
-		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "ahi %[R_TMP2],-1\n\t"				\
-		  "jl 20f\n\t"						\
-		  "vuplhb %%v18,%%v16\n\t"				\
-		  "vuplhh %%v20,%%v18\n\t"				\
-		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllh %%v21,%%v18\n\t"				\
-		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllb %%v19,%%v16\n\t"				\
-		  "vuplhh %%v22,%%v19\n\t"				\
-		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
-		  "ahi %[R_TMP2],-16\n\t"				\
-		  "jl 11f\n\t"						\
-		  "vupllh %%v23,%%v19\n\t"				\
-		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
-		  "11:\n\t"						\
-		  /* Update pointers.  */				\
-		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
-		    ASM_CLOBBER_VR ("v31")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-
-# define LOOP_NEED_FLAGS
-
-# define STORE_REST		STORE_REST_COMMON
-# define UNPACK_BYTES		UNPACK_BYTES_COMMON
-# define CLEAR_STATE		CLEAR_STATE_COMMON
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
-/* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY_TO_C						\
-  {								\
-    uint32_t wc = *((const uint32_t *) inptr);			\
-								\
-    if (__glibc_likely (wc <= 0x7f))				\
-      {								\
-	/* Single UTF-8 char.  */				\
-	*outptr = (uint8_t)wc;					\
-	outptr++;						\
-      }								\
-    else if (wc <= 0x7ff)					\
-      {								\
-	/* Two UTF-8 chars.  */					\
-	if (__glibc_unlikely (outptr + 2 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-								\
-	outptr[0] = 0xc0;					\
-	outptr[0] |= wc >> 6;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= wc & 0x3f;					\
-								\
-	outptr += 2;						\
-      }								\
-    else if (wc <= 0xffff)					\
-      {								\
-	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
-	  {							\
-	    /* Do not accept UTF-16 surrogates.   */		\
-	    result = __GCONV_ILLEGAL_INPUT;			\
-	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	  }							\
-	outptr[0] = 0xe0;					\
-	outptr[0] |= wc >> 12;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= (wc >> 6) & 0x3f;				\
-								\
-	outptr[2] = 0x80;					\
-	outptr[2] |= wc & 0x3f;					\
-								\
-	outptr += 3;						\
-      }								\
-      else if (wc <= 0x10ffff)					\
-	{							\
-	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))		\
-	    {							\
-	      /* Overflow in the output buffer.  */		\
-	      result = __GCONV_FULL_OUTPUT;			\
-	      break;						\
-	    }							\
-	  outptr[0] = 0xf0;					\
-	  outptr[0] |= wc >> 18;				\
-								\
-	  outptr[1] = 0x80;					\
-	  outptr[1] |= (wc >> 12) & 0x3f;			\
-								\
-	  outptr[2] = 0x80;					\
-	  outptr[2] |= (wc >> 6) & 0x3f;			\
-								\
-	  outptr[3] = 0x80;					\
-	  outptr[3] |= wc & 0x3f;				\
-								\
-	  outptr += 4;						\
-	}							\
-      else							\
-	{							\
-	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	}							\
-    inptr += 4;							\
-  }
-
-#define HW_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
-		  "vzero %%v21\n\t"					\
-		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
-		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
-		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "lghi %[R_TMP],0\n\t"					\
-		  /* Shorten to byte values.  */			\
-		  "vpkf %%v23,%%v16,%%v17\n\t"				\
-		  "vpkf %%v24,%%v18,%%v19\n\t"				\
-		  "vpkh %%v23,%%v23,%%v24\n\t"				\
-		  /* Checking for values > 0x7f.  */			\
-		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
-		  "jno 10f\n\t"						\
-		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
-		  "jno 11f\n\t"						\
-		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
-		  "jno 12f\n\t"						\
-		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
-		  "jno 13f\n\t"						\
-		  /* Store 16bytes to outptr.  */			\
-		  "vst %%v23,0(%[R_OUT])\n\t"				\
-		  "aghi %[R_INLEN],-64\n\t"				\
-		  "aghi %[R_OUTLEN],-16\n\t"				\
-		  "la %[R_IN],64(%[R_IN])\n\t"				\
-		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "clgijl %[R_INLEN],64,20f\n\t"			\
-		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "j 1b\n\t"						\
-		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "srlg %[R_I],%[R_I],2\n\t"				\
-		  "agr %[R_I],%[R_TMP]\n\t"				\
-		  "je 20f\n\t"						\
-		  /* Store characters before invalid one...  */		\
-		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
-		  /* ... and update pointers.  */			\
-		  "aghi %[R_I],1\n\t"					\
-		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
-		  "sllg %[R_I],%[R_I],2\n\t"				\
-		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
-		  "slgr %[R_INLEN],%[R_I]\n\t"				\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
-		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
-		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
-		    ASM_CLOBBER_VR ("v24")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_c
-#define BODY			BODY_TO_C
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY			BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-
-#include <iconv/skeleton.c>
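
Reviewer note: the well-formedness checks in the BODY_FROM_C routine of the file removed above (lead-byte classification, trail-byte test, the overlong test `(ch >> (5 * cnt - 4)) == 0`, and the surrogate and range checks) can be tried in isolation. The following is a minimal sketch and not glibc code. The helper name `decode_utf8` and its return convention are assumptions made for illustration, and unlike the gconv loop it does not distinguish incomplete input from illegal input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of glibc): decode one UTF-8 sequence of
   1 to 4 bytes from IN (LEN bytes available) into *OUT.
   Returns the number of bytes consumed, or 0 if the input is ill-formed
   or truncated.  */
static size_t
decode_utf8 (const unsigned char *in, size_t len, uint32_t *out)
{
  assert (in != NULL && out != NULL);

  if (len == 0)
    return 0;

  uint32_t ch = in[0];
  size_t cnt;

  if (ch < 0x80)
    {
      /* One byte sequence.  */
      *out = ch;
      return 1;
    }
  else if (ch >= 0xc2 && ch < 0xe0)
    {
      /* Two bytes.  0xc0 and 0xc1 would only encode values < 0x80.  */
      cnt = 2;
      ch &= 0x1f;
    }
  else if ((ch & 0xf0) == 0xe0)
    {
      cnt = 3;
      ch &= 0x0f;
    }
  else if ((ch & 0xf8) == 0xf0)
    {
      cnt = 4;
      ch &= 0x07;
    }
  else
    return 0;	/* Stray trail byte or unsupported lead byte.  */

  if (cnt > len)
    return 0;	/* Truncated input.  */

  for (size_t i = 1; i < cnt; ++i)
    {
      if ((in[i] & 0xc0) != 0x80)
	return 0;	/* Not a trail byte.  */
      ch = (ch << 6) | (in[i] & 0x3f);
    }

  /* If cnt > 2 and ch < 2^(5*cnt-4), fewer bytes would have sufficed.  */
  if ((cnt > 2 && (ch >> (5 * cnt - 4)) == 0)
      || (ch >= 0xd800 && ch <= 0xdfff)	/* UTF-16 surrogates.  */
      || ch > 0x10ffff)
    return 0;

  *out = ch;
  return cnt;
}
```

For cnt == 3 the shift is 11, so any value below 0x800 (which fits in two bytes) is rejected. For cnt == 4 the shift is 16, so any value below 0x10000 is rejected.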
diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
new file mode 100644
index 0000000..ecf06bd
--- /dev/null
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -0,0 +1,636 @@
+/* Conversion between UTF-16 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM_UTF32               0x0000feffu
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16               0xfeff
+
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		2
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
+#define FROM_DIRECTION		(dir == from_utf16)
+#define ONE_DIRECTION           0
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf16,
+  from_utf16
+};
+
+struct utf16_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf16_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf16;
+    }
+  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
+	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf16;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf16)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
+/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
+
+/* The software routine is copied from utf-16.c (minus byte
+   swapping).  */
+#define BODY_FROM_C							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
+      {									\
+	/* No surrogate.  */						\
+	put32 (outptr, u1);						\
+	inptr += 2;							\
+      }									\
+    else								\
+      {									\
+	/* An isolated low-surrogate was found.  This has to be         \
+	   considered ill-formed.  */					\
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
+	  {								\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	/* It's a surrogate character.  At least the first word says	\
+	   it is.  */							\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    /* We don't have enough input for another complete input	\
+	       character.  */						\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	uint16_t u2 = get16 (inptr);					\
+	if (__builtin_expect (u2 < 0xdc00, 0)				\
+	    || __builtin_expect (u2 > 0xdfff, 0))			\
+	  {								\
+	    /* This is no valid second word for a surrogate.  */	\
+	    inptr -= 2;							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+									\
+	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
+	inptr += 2;							\
+      }									\
+    outptr += 4;							\
+  }
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t"						\
+		  /* Enlarge to UTF-32.  */				\
+		  "vuplhh %%v17,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vupllh %%v18,%%v16\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
+		  "aghi %[R_OUTLEN],-32\n\t"				\
+		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "vuplhh %%v17,%%v16\n\t"				\
+		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 12f\n\t"						\
+		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "vupllh %%v18,%%v16\n\t"				\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
+		  "srl %[R_TMP2],1\n\t"					\
+		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_OUTLEN],-4\n\t"				\
+		  "j 16f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2:\n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "clgfi %[R_INLEN],1\n\t"				\
+		  "je 97f\n\t" /* Only one byte available.  */		\
+		  "jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 96f \n\t"						\
+		  "clfi %[R_TMP],0xd800\n\t"				\
+		  "jhe 15f\n\t"						\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],13b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "slgfi %[R_INLEN],4\n\t"				\
+		  "jl 97f\n\t" /* Big enough input?  */			\
+		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "slfi %[R_TMP],0xd7c0\n\t"				\
+		  "sll %[R_TMP],10\n\t"					\
+		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "nilf %[R_TMP3],0xfc00\n\t"				\
+		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "jne 98f\n\t"						\
+		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "aghi %[R_TMP2],-2\n\t"				\
+		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  "96:\n\t" /* Return full output.  */			\
+		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "97:\n\t" /* Return incomplete input.  */		\
+		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
+		  "j 99f\n\t"						\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
+
+/* Conversion from UTF-32 internal/BE to UTF-16.  */
+
+/* The software routine is copied from utf-16.c (minus byte
+   swapping).  */
+#define BODY_TO_C							\
+  {									\
+    uint32_t c = get32 (inptr);						\
+									\
+    if (__builtin_expect (c <= 0xd7ff, 1)				\
+	|| (c >=0xdc00 && c <= 0xffff))					\
+      {									\
+	/* Two UTF-16 chars.  */					\
+	put16 (outptr, c);						\
+      }									\
+    else if (__builtin_expect (c >= 0x10000, 1)				\
+	     && __builtin_expect (c <= 0x10ffff, 1))			\
+      {									\
+	/* Four UTF-16 chars.  */					\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t out;							\
+									\
+	/* Generate a surrogate character.  */				\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	out = 0xd800;							\
+	out |= (zabcd & 0xff) << 6;					\
+	out |= (c >> 10) & 0x3f;					\
+	put16 (outptr, out);						\
+	outptr += 2;							\
+									\
+	out = 0xdc00;							\
+	out |= c & 0x3ff;						\
+	put16 (outptr, out);						\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    outptr += 2;							\
+    inptr += 4;								\
+  }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "vpkf %%v18,%%v16,%%v17\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t"						\
+		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
+		  "jno 11f\n\t"						\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "vst %%v18,0(%[R_OUT])\n\t"				\
+		  "la %[R_IN],32(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-32\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],32,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
+		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 20f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf8-utf16-z9.c b/sysdeps/s390/utf8-utf16-z9.c
new file mode 100644
index 0000000..29a0bf9
--- /dev/null
+++ b/sysdeps/s390/utf8-utf16-z9.c
@@ -0,0 +1,818 @@
+/* Conversion between UTF-8 and UTF-16.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define MAX_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
+		  "vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
+		  "aghi %[R_OUTLEN],-32\n\t"				\
+		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
+		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
+						     index to store. */ \
+		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "ahi %[R_TMP2],-1\n\t"				\
+		  "jl 20f\n\t"						\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11:\n\t" /* Update pointers.  */			\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+
+/* The software implementation is based on the code in gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint16_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0	\
+	       or 0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search the end of this ill-formed UTF-8 character.  This	\
+	       is the next byte with (x & 0xc0) != 0x80.  */		\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	if (cnt == 4)							\
+	  {								\
+	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
+	       low) are needed.  */					\
+	    uint16_t zabcd, high, low;					\
+									\
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
+	      {								\
+		/* Overflow in the output buffer.  */			\
+		result = __GCONV_FULL_OUTPUT;				\
+		break;							\
+	      }								\
+									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
+	    /* See Principles of Operation cu12.  */			\
+	    zabcd = (((inptr[0] & 0x7) << 2) |				\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
+									\
+	    /* z-bit must be zero after subtracting 1.  */		\
+	    if (zabcd & 0x10)						\
+	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
+									\
+	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
+	    high |= zabcd << 6;                         /* abcd bits */	\
+	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
+	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
+									\
+	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
+	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
+	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
+	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
+									\
+	    put16 (outptr, high);					\
+	    outptr += 2;						\
+	    put16 (outptr, low);					\
+	    outptr += 2;						\
+	    inptr += 4;							\
+	    continue;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Read the possible remaining bytes.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		uint16_t byte = inptr[i];				\
+									\
+		if ((byte & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  break;						\
+									\
+		ch <<= 6;						\
+		ch |= byte & 0x3f;					\
+	      }								\
+									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
+	  }								\
+      }									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint16_t *) outptr) = ch;					\
+    outptr += sizeof (uint16_t);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+/* Conversion from UTF-16 to UTF-8.  */
+
+/* The software routine is based on the functionality of the S/390
+   hardware instruction (cu21) as described in the Principles of
+   Operation.  */
+#define BODY_TO_C							\
+  {									\
+    uint16_t c = get16 (inptr);						\
+									\
+    if (__glibc_likely (c <= 0x007f))					\
+      {									\
+	/* Single byte UTF-8 char.  */					\
+	*outptr = c & 0xff;						\
+	outptr++;							\
+      }									\
+    else if (c >= 0x0080 && c <= 0x07ff)				\
+      {									\
+	/* Two byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 2 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
+									\
+	outptr += 2;							\
+      }									\
+    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
+      {									\
+	/* Three byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 3 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	outptr[0] = 0xe0;						\
+	outptr[0] |= c >> 12;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (c >> 6) & 0x3f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= c & 0x3f;						\
+									\
+	outptr += 3;							\
+      }									\
+    else if (c >= 0xd800 && c <= 0xdbff)				\
+      {									\
+	/* Four byte UTF-8 char.  */					\
+	uint16_t low, uvwxy;						\
+									\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	low = get16 (inptr);						\
+									\
+	if ((low & 0xfc00) != 0xdc00)					\
+	  {								\
+	    inptr -= 2;							\
+	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	uvwxy = ((c >> 6) & 0xf) + 1;					\
+	outptr[0] = 0xf0;						\
+	outptr[0] |= uvwxy >> 2;					\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (uvwxy << 4) & 0x30;				\
+	outptr[1] |= (c >> 2) & 0x0f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= (c & 0x03) << 4;					\
+	outptr[2] |= (low >> 6) & 0x0f;					\
+									\
+	outptr[3] = 0x80;						\
+	outptr[3] |= low & 0x3f;					\
+									\
+	outptr += 4;							\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+      }									\
+    inptr += 2;								\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "larl %[R_TMP],9f\n\t"				\
+		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
+		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "vpkh %%v18,%%v16,%%v17\n\t"				\
+		  "la %[R_IN],32(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "vst %%v18,0(%[R_OUT])\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],32,2f\n\t"				\
+		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
+		  /* Shorten to UTF-8.  */				\
+		  "vpkh %%v18,%%v16,%%v17\n\t"				\
+		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "jl 13f\n\t"						\
+		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13:\n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "lghi %[R_TMP2],16\n\t"				\
+		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  "j 22f\n\t"						\
+		  /* Handle remaining bytes.  */			\
+		  "2:\n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "clgfi %[R_INLEN],1\n\t"				\
+		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "jl 90f \n\t"						\
+		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "jh 23f\n\t"						\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llill %[R_TMP3],0xc080\n\t"				\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "jh 24f\n\t"						\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "jl 90f \n\t"						\
+		  "llilf %[R_TMP3],0xe08080\n\t"			\
+		  "la %[R_IN],2(%[R_IN])\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
+		  "brctg %[R_TMP2],20b\n\t"				\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
+		  "clfi %[R_TMP],0xdbff\n\t"				\
+		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
+				  without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "jl 90f \n\t"						\
+		  "slgfi %[R_INLEN],2\n\t"				\
+		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
+		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "aghi %[R_TMP],0x40\n\t"				\
+		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
+		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "nilf %[R_TMP],0xfc00\n\t"				\
+		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "la %[R_IN],4(%[R_IN])\n\t"				\
+		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
+		  "aghi %[R_TMP2],-2\n\t"				\
+		  "jh 20b\n\t"						\
+		  "j 0b\n\t" /* Switch to vx-loop.  */			\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
+
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
new file mode 100644
index 0000000..1b2d6a2
--- /dev/null
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -0,0 +1,820 @@
+/* Conversion between UTF-8 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		6
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
+
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint32_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
+	       0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search the end of this ill-formed UTF-8 character.  This	\
+	       is the next byte with (x & 0xc0) != 0x80.  */		\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	/* Read the possible remaining bytes.  */			\
+	for (i = 1; i < cnt; ++i)					\
+	  {								\
+	    uint32_t byte = inptr[i];					\
+									\
+	    if ((byte & 0xc0) != 0x80)					\
+	      /* This is an illegal encoding.  */			\
+	      break;							\
+									\
+	    ch <<= 6;							\
+	    ch |= byte & 0x3f;						\
+	  }								\
+									\
+	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
+	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
+	   have been represented with fewer than cnt bytes.  */		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
+	  {								\
+	    /* This is an illegal encoding.  */				\
+	    goto errout;						\
+	  }								\
+									\
+	inptr += cnt;							\
+      }									\
+									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint32_t *) outptr) = ch;					\
+    outptr += sizeof (uint32_t);					\
+  }
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
+		  "vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
+		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "la %[R_IN],16(%[R_IN])\n\t"				\
+		  "vuplhh %%v20,%%v18\n\t"				\
+		  "aghi %[R_INLEN],-16\n\t"				\
+		  "vupllh %%v21,%%v18\n\t"				\
+		  "aghi %[R_OUTLEN],-64\n\t"				\
+		  "vuplhh %%v22,%%v19\n\t"				\
+		  "vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
+		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],16,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
+		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
+						     index to store. */ \
+		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "ahi %[R_TMP2],-1\n\t"				\
+		  "jl 20f\n\t"						\
+		  "vuplhb %%v18,%%v16\n\t"				\
+		  "vuplhh %%v20,%%v18\n\t"				\
+		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllh %%v21,%%v18\n\t"				\
+		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllb %%v19,%%v16\n\t"				\
+		  "vuplhh %%v22,%%v19\n\t"				\
+		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "ahi %[R_TMP2],-16\n\t"				\
+		  "jl 11f\n\t"						\
+		  "vupllh %%v23,%%v19\n\t"				\
+		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11:\n\t"						\
+		  /* Update pointers.  */				\
+		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+		    ASM_CLOBBER_VR ("v31")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
+/* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
+
+/* The software routine mimics the S/390 cu41 instruction.  */
+#define BODY_TO_C						\
+  {								\
+    uint32_t wc = *((const uint32_t *) inptr);			\
+								\
+    if (__glibc_likely (wc <= 0x7f))				\
+      {								\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
+	outptr++;						\
+      }								\
+    else if (wc <= 0x7ff)					\
+      {								\
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+								\
+	outptr[0] = 0xc0;					\
+	outptr[0] |= wc >> 6;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= wc & 0x3f;					\
+								\
+	outptr += 2;						\
+      }								\
+    else if (wc <= 0xffff)					\
+      {								\
+	/* Three UTF-8 chars.  */				\
+	if (__glibc_unlikely (outptr + 3 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
+	outptr[0] = 0xe0;					\
+	outptr[0] |= wc >> 12;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= (wc >> 6) & 0x3f;				\
+								\
+	outptr[2] = 0x80;					\
+	outptr[2] |= wc & 0x3f;					\
+								\
+	outptr += 3;						\
+      }								\
+      else if (wc <= 0x10ffff)					\
+	{							\
+	  /* Four UTF-8 chars.  */				\
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
+	    {							\
+	      /* Overflow in the output buffer.  */		\
+	      result = __GCONV_FULL_OUTPUT;			\
+	      break;						\
+	    }							\
+	  outptr[0] = 0xf0;					\
+	  outptr[0] |= wc >> 18;				\
+								\
+	  outptr[1] = 0x80;					\
+	  outptr[1] |= (wc >> 12) & 0x3f;			\
+								\
+	  outptr[2] = 0x80;					\
+	  outptr[2] |= (wc >> 6) & 0x3f;			\
+								\
+	  outptr[3] = 0x80;					\
+	  outptr[3] |= wc & 0x3f;				\
+								\
+	  outptr += 4;						\
+	}							\
+      else							\
+	{							\
+	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	}							\
+    inptr += 4;							\
+  }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "vzero %%v21\n\t"					\
+		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "lghi %[R_TMP],0\n\t"					\
+		  /* Shorten to byte values.  */			\
+		  "vpkf %%v23,%%v16,%%v17\n\t"				\
+		  "vpkf %%v24,%%v18,%%v19\n\t"				\
+		  "vpkh %%v23,%%v23,%%v24\n\t"				\
+		  /* Checking for values > 0x7f.  */			\
+		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
+		  "jno 10f\n\t"						\
+		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
+		  "jno 11f\n\t"						\
+		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
+		  "jno 12f\n\t"						\
+		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
+		  "jno 13f\n\t"						\
+		  /* Store 16bytes to outptr.  */			\
+		  "vst %%v23,0(%[R_OUT])\n\t"				\
+		  "aghi %[R_INLEN],-64\n\t"				\
+		  "aghi %[R_OUTLEN],-16\n\t"				\
+		  "la %[R_IN],64(%[R_IN])\n\t"				\
+		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "clgijl %[R_INLEN],64,20f\n\t"			\
+		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "j 1b\n\t"						\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "srlg %[R_I],%[R_I],2\n\t"				\
+		  "agr %[R_I],%[R_TMP]\n\t"				\
+		  "je 20f\n\t"						\
+		  /* Store characters before invalid one...  */		\
+		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
+		  /* ... and update pointers.  */			\
+		  "aghi %[R_I],1\n\t"					\
+		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
+		  "sllg %[R_I],%[R_I],2\n\t"				\
+		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
+		  "slgr %[R_INLEN],%[R_I]\n\t"				\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "jo 0b\n\t" /* Try vector implementation again.  */	\
+		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
+		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
+#include <iconv/skeleton.c>
-- 
2.3.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 13/14] Fix ucs4le_internal_loop in error case.
  2016-02-23  9:21 ` [PATCH 13/14] Fix ucs4le_internal_loop in error case Stefan Liebler
@ 2016-02-23 17:42   ` Joseph Myers
  2016-02-25  9:00     ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Joseph Myers @ 2016-02-23 17:42 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

If this is user-visible in a release, there should be a bug filed in 
Bugzilla (if there isn't one already open), and a testcase added to the 
testsuite.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-02-23  9:23 ` [PATCH 14/14] Fix UTF-16 surrogate handling Stefan Liebler
@ 2016-02-23 17:57   ` Joseph Myers
  2016-02-25 12:57     ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Joseph Myers @ 2016-02-23 17:57 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

If this is user-visible in a release, there should be a bug filed in 
Bugzilla (if there isn't one already open), and a testcase added to the 
testsuite.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 13/14] Fix ucs4le_internal_loop in error case.
  2016-02-23 17:42   ` Joseph Myers
@ 2016-02-25  9:00     ` Stefan Liebler
  2016-03-18 13:04       ` Stefan Liebler
  2016-03-31  9:45       ` Andreas Schwab
  0 siblings, 2 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-02-25  9:00 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

On 02/23/2016 06:41 PM, Joseph Myers wrote:
> If this is user-visible in a release, there should be a bug filed in
> Bugzilla (if there isn't one already open), and a testcase added to the
> testsuite.
>
okay.

I've filed the bug
"Bug 19726 - Converting UCS4LE to INTERNAL with iconv() does not update 
pointers and lengths in error-case."
(https://sourceware.org/bugzilla/show_bug.cgi?id=19726)

This patch also adds a new testcase for this issue.
The new test was run on s390, power and intel machines.
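
For illustration, the behaviour the fix restores can be modelled with a
small stand-alone loop. This is only a sketch under stated assumptions, not
the code in gconv_simple.c: the function name model_ucs4le_loop and the
CONV_* status codes are hypothetical stand-ins for ucs4le_internal_loop and
the __GCONV_* values, and the flag handling for ignoring errors is left out.
The point it shows is that the pointers are written back before the
illegal-input return, so the caller can see where conversion stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; glibc itself uses the __GCONV_* values.  */
enum { CONV_EMPTY_INPUT, CONV_FULL_OUTPUT, CONV_ILLEGAL_INPUT };

/* Simplified model of a UCS4LE -> internal conversion loop.  It reads
   4-byte little-endian values and rejects anything above 0x7fffffff.  */
int
model_ucs4le_loop (const unsigned char **inptrp, const unsigned char *inend,
		   uint32_t **outptrp, uint32_t *outend)
{
  const unsigned char *inptr = *inptrp;
  uint32_t *outptr = *outptrp;
  int result = CONV_EMPTY_INPUT;

  assert (inptr <= inend && outptr <= outend);

  while (inend - inptr >= 4)
    {
      if (outptr == outend)
	{
	  result = CONV_FULL_OUTPUT;
	  break;
	}

      uint32_t inval = (uint32_t) inptr[0]
		       | (uint32_t) inptr[1] << 8
		       | (uint32_t) inptr[2] << 16
		       | (uint32_t) inptr[3] << 24;

      if (inval > 0x7fffffff)
	{
	  /* Publish the progress made so far before reporting the
	     error.  Without these two stores the caller would still
	     see the pointers from the start of the call.  */
	  *inptrp = inptr;
	  *outptrp = outptr;
	  return CONV_ILLEGAL_INPUT;
	}

      *outptr++ = inval;
      inptr += 4;
    }

  *inptrp = inptr;
  *outptrp = outptr;
  return result;
}
```

With an input of two valid characters followed by the value 0x80000000,
the model returns CONV_ILLEGAL_INPUT with the input pointer advanced by
8 bytes and two characters stored in the output.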

ChangeLog:

	[BZ #19726]
	* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
	outptrp in case of an illegal input.
	* iconv/tst-iconv6.c: New file.
	* iconv/Makefile (tests): Add tst-iconv6.

[-- Attachment #2: Fix-ucs4le_internal_loop-in-error-case.-BZ-19726.patch --]
[-- Type: text/x-patch, Size: 4140 bytes --]

diff --git a/iconv/Makefile b/iconv/Makefile
index b008707..c2299c9 100644
--- a/iconv/Makefile
+++ b/iconv/Makefile
@@ -42,7 +42,7 @@ CFLAGS-charmap.c = -DCHARMAP_PATH='"$(i18ndir)/charmaps"' \
 CFLAGS-linereader.c = -DNO_TRANSLITERATION
 CFLAGS-simple-hash.c = -I../locale
 
-tests	= tst-iconv1 tst-iconv2 tst-iconv3 tst-iconv4 tst-iconv5
+tests	= tst-iconv1 tst-iconv2 tst-iconv3 tst-iconv4 tst-iconv5 tst-iconv6
 
 others		= iconv_prog iconvconfig
 install-others-programs	= $(inst_bindir)/iconv
diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index 5412bd6..f66bf34 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -638,6 +638,8 @@ ucs4le_internal_loop (struct __gconv_step *step,
 	      continue;
 	    }
 
+	  *inptrp = inptr;
+	  *outptrp = outptr;
 	  return __GCONV_ILLEGAL_INPUT;
 	}
 
diff --git a/iconv/tst-iconv6.c b/iconv/tst-iconv6.c
new file mode 100644
index 0000000..57d7f38
--- /dev/null
+++ b/iconv/tst-iconv6.c
@@ -0,0 +1,117 @@
+/* Testing ucs4le_internal_loop() in gconv_simple.c.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <inttypes.h>
+#include <iconv.h>
+#include <byteswap.h>
+
+static int
+do_test (void)
+{
+  iconv_t cd;
+  char *inptr;
+  size_t inlen;
+  char *outptr;
+  size_t outlen;
+  size_t n;
+  int e;
+  int result = 0;
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+  /* On big-endian machines, ucs4le_internal_loop() swaps the bytes before
+     error checking. Thus the input values have to be swapped.  */
+# define VALUE(val) bswap_32 (val)
+#else
+# define VALUE(val) val
+#endif
+  uint32_t inbuf[3] = { VALUE (0x41), VALUE (0x80000000), VALUE (0x42) };
+  uint32_t outbuf[3] = { 0, 0, 0 };
+
+  cd = iconv_open ("WCHAR_T", "UCS-4LE");
+  if (cd == (iconv_t) -1)
+    {
+      printf ("cannot convert from UCS4LE to wchar_t: %m\n");
+      return 1;
+    }
+
+  inptr = (char *) inbuf;
+  inlen = sizeof (inbuf);
+  outptr = (char *) outbuf;
+  outlen = sizeof (outbuf);
+
+  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
+  e = errno;
+
+  if (n != (size_t) -1)
+    {
+      printf ("incorrect iconv() return value: %zd, expected -1\n", n);
+      result = 1;
+    }
+
+  if (e != EILSEQ)
+    {
+      printf ("incorrect error value: %s, expected %s\n",
+	      strerror (e), strerror (EILSEQ));
+      result = 1;
+    }
+
+  if (inptr != (char *) &inbuf[1])
+    {
+      printf ("inptr=%p does not point to invalid character! Expected=%p\n"
+	      , (void *) inptr, (void *) &inbuf[1]);
+      result = 1;
+    }
+
+  if (inlen != sizeof (inbuf) - sizeof (uint32_t))
+    {
+      printf ("inlen=%zu != %zu\n"
+	      , inlen, sizeof (inbuf) - sizeof (uint32_t));
+      result = 1;
+    }
+
+  if (outptr != (char *) &outbuf[1])
+    {
+      printf ("outptr=%p does not point to the next free slot in outbuf! "
+	      "Expected=%p\n"
+	      , (void *) outptr, (void *) &outbuf[1]);
+      result = 1;
+    }
+
+  if (outlen != sizeof (outbuf) - sizeof (uint32_t))
+    {
+      printf ("outlen=%zu != %zu\n"
+	      , outlen, sizeof (outbuf) - sizeof (uint32_t));
+      result = 1;
+    }
+
+  if (outbuf[0] != 0x41 || outbuf[1] != 0 || outbuf[2] != 0)
+    {
+      puts ("Characters conversion is incorrect!");
+      result = 1;
+    }
+
+  iconv_close (cd);
+
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
-- 
2.3.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-02-23 17:57   ` Joseph Myers
@ 2016-02-25 12:57     ` Stefan Liebler
  2016-03-18 13:05       ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-02-25 12:57 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1114 bytes --]

On 02/23/2016 06:42 PM, Joseph Myers wrote:
> If this is user-visible in a release, there should be a bug filed in
> Bugzilla (if there isn't one already open), and a testcase added to the
> testsuite.
>
okay.

I've filed the bug
"Bug 19727 - Converting from/to UTF-xx with iconv() does not always 
report errors on UTF-16 surrogates values."
(https://sourceware.org/bugzilla/show_bug.cgi?id=19727)

This patch also adds a new testcase, which checks UTF conversions with
input values in range of UTF16 surrogates. The test converts from
UTF-xx to INTERNAL, INTERNAL to UTF-xx and directly between
UTF-xx to UTF-yy. The latter conversion is needed because s390 has
iconv-modules, which convert from/to UTF in one step.
The new testcase was run on s390, power, and intel machines.
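
For illustration only (not part of the patch): a minimal sketch of the
surrogate rules the new test checks. The names is_high_surrogate,
is_low_surrogate and decode_utf16 are hypothetical helpers, not glibc
functions; the return values are chosen to mirror the EINVAL (incomplete
input) and EILSEQ (illegal input) cases in the test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16 high (leading) surrogates are in range 0xD800..0xDBFF.  */
static bool
is_high_surrogate (uint32_t c)
{
  return c >= 0xD800 && c <= 0xDBFF;
}

/* UTF-16 low (trailing) surrogates are in range 0xDC00..0xDFFF.  */
static bool
is_low_surrogate (uint32_t c)
{
  return c >= 0xDC00 && c <= 0xDFFF;
}

/* Hypothetical decoder for one character from N UTF-16 code units.
   Returns the number of units consumed (1 or 2), 0 if the input ends
   after a high surrogate (iconv would report EINVAL), or -1 if the
   sequence is illegal (iconv would report EILSEQ).  */
static int
decode_utf16 (const uint16_t *units, size_t n, uint32_t *cp)
{
  if (n == 0)
    return 0;

  uint32_t u1 = units[0];
  if (!is_high_surrogate (u1))
    {
      /* A low surrogate is no valid first word.  */
      if (is_low_surrogate (u1))
        return -1;
      *cp = u1;
      return 1;
    }

  if (n < 2)
    return 0;

  uint32_t u2 = units[1];
  if (!is_low_surrogate (u2))
    return -1;

  *cp = 0x10000 + ((u1 - 0xD800) << 10) + (u2 - 0xDC00);
  return 2;
}
```

A UTF-32 or UCS4 value is rejected whenever either predicate is true,
which matches the range check (u1 >= 0xd800 && u1 < 0xe000) in the
utf-32.c hunk.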

ChangeLog:

	[BZ #19727]
	* iconvdata/utf-16.c (BODY): Report an error if first word is
	not a valid high surrogate.
	* iconvdata/utf-32.c (BODY): Report an error if the value is
	in range of an utf16 surrogate.
	* iconv/gconv_simple.c (BODY): Likewise.
	* iconv/tst-iconv7.c: New file.
	* iconv/Makefile (tests): Add tst-iconv7.

[-- Attachment #2: Fix-UTF-16-surrogate-handling.-BZ-19727.patch --]
[-- Type: text/x-patch, Size: 12996 bytes --]

diff --git a/iconv/Makefile b/iconv/Makefile
index c2299c9..30c8e83 100644
--- a/iconv/Makefile
+++ b/iconv/Makefile
@@ -42,7 +42,8 @@ CFLAGS-charmap.c = -DCHARMAP_PATH='"$(i18ndir)/charmaps"' \
 CFLAGS-linereader.c = -DNO_TRANSLITERATION
 CFLAGS-simple-hash.c = -I../locale
 
-tests	= tst-iconv1 tst-iconv2 tst-iconv3 tst-iconv4 tst-iconv5 tst-iconv6
+tests	= tst-iconv1 tst-iconv2 tst-iconv3 tst-iconv4 tst-iconv5 tst-iconv6 \
+	  tst-iconv7
 
 others		= iconv_prog iconvconfig
 install-others-programs	= $(inst_bindir)/iconv
diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index f66bf34..e5284e4 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -892,7 +892,8 @@ ucs4le_internal_loop_single (struct __gconv_step *step,
     if (__glibc_likely (wc < 0x80))					      \
       /* It's an one byte sequence.  */					      \
       *outptr++ = (unsigned char) wc;					      \
-    else if (__glibc_likely (wc <= 0x7fffffff))				      \
+    else if (__glibc_likely (wc <= 0x7fffffff				      \
+			     && (wc < 0xd800 || wc > 0xdfff)))		      \
       {									      \
 	size_t step;							      \
 	unsigned char *start;						      \
diff --git a/iconv/tst-iconv7.c b/iconv/tst-iconv7.c
new file mode 100644
index 0000000..fc2e33e
--- /dev/null
+++ b/iconv/tst-iconv7.c
@@ -0,0 +1,263 @@
+/* Testing UTF conversions with UTF16 surrogates as input.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <inttypes.h>
+#include <iconv.h>
+#include <byteswap.h>
+
+static int
+run_conversion (const char *from, const char *to, char *inbuf, size_t inbuflen
+		, int exp_errno, int line)
+{
+  char outbuf[16];
+  iconv_t cd;
+  char *inptr;
+  size_t inlen;
+  char *outptr;
+  size_t outlen;
+  size_t n;
+  int e;
+  int fails = 0;
+
+  cd = iconv_open (to, from);
+  if (cd == (iconv_t) -1)
+    {
+      printf ("line %d: cannot convert from %s to %s: %m\n", line, from, to);
+      return 1;
+    }
+
+  inptr = (char *) inbuf;
+  inlen = inbuflen;
+  outptr = outbuf;
+  outlen = sizeof (outbuf);
+
+  errno = 0;
+  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
+  e = errno;
+
+  if (exp_errno == 0)
+    {
+      if (n == (size_t) -1)
+	{
+	  puts ("n should be >= 0, but n == -1");
+	  fails ++;
+	}
+
+      if (e != 0)
+	{
+	  printf ("errno should be 0: 'Success', but errno == %d: '%s'\n"
+		  , e, strerror(e));
+	  fails ++;
+	}
+    }
+  else
+    {
+      if (n != (size_t) -1)
+	{
+	  printf ("n should be -1, but n == %zd\n", n);
+	  fails ++;
+	}
+
+      if (e != exp_errno)
+	{
+	  printf ("errno should be %d: '%s', but errno == %d: '%s'\n"
+		  , exp_errno, strerror (exp_errno), e, strerror (e));
+	  fails ++;
+	}
+    }
+
+  iconv_close (cd);
+
+  if (fails > 0)
+    {
+      printf ("Errors in line %d while converting %s to %s.\n\n"
+	      , line, from, to);
+    }
+
+  return fails;
+}
+
+static int
+do_test (void)
+{
+  int fails = 0;
+  char buf[4];
+
+  /* This test runs iconv() with UTF characters in the range of UTF-16
+     surrogates.  The UTF-16 high surrogates are in range 0xD800..0xDBFF and
+     the UTF-16 low surrogates are in range 0xDC00..0xDFFF.
+     Converting from or to UTF-xx has to report errors in those cases.
+     In UTF-16, a surrogate pair with a high surrogate in front of a low
+     surrogate is valid.  */
+
+  /* Use RUN_UCS4_UTF32_INPUT to test conversion ...
+
+     ... from INTERNAL to UTF-xx[LE|BE]:
+     Converting from UCS4 to UTF-xx[LE|BE] first converts UCS4 to INTERNAL
+     without checking for UTF-16 surrogate values
+     and then converts from INTERNAL to UTF-xx[LE|BE].
+     The latter conversion has to report an error in those cases.
+
+     ... from UTF-32[LE|BE] to INTERNAL:
+     Converting directly from UTF-32LE to UTF-8|16 is needed,
+     because e.g. s390x has iconv-modules which convert directly.  */
+#define RUN_UCS4_UTF32_INPUT(b0, b1, b2, b3, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  buf[3] = b3;								\
+  fails += run_conversion ("UCS4", "UTF-8", buf, 4, err, line);		\
+  fails += run_conversion ("UCS4", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-16BE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-32LE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-32BE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "WCHAR_T", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-8", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-16BE", buf, 4, err, line);	\
+  buf[0] = b3;								\
+  buf[1] = b2;								\
+  buf[2] = b1;								\
+  buf[3] = b0;								\
+  fails += run_conversion ("UTF-32LE", "WCHAR_T", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-8", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-16BE", buf, 4, err, line);
+
+  /* Use UCS4/UTF32 input of 0xD7FF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xD7, 0xFF, 0, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xD800.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xD8, 0x00, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDBFF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDB, 0xFF, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDC00.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDC, 0x00, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDFFF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDF, 0xFF, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xE000.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xE0, 0x00, 0, __LINE__);
+
+
+  /* Use RUN_UTF16_INPUT to test conversion from UTF16[LE|BE] to INTERNAL.
+     Converting directly from UTF-16 to UTF-8|32 is needed,
+     because e.g. s390x has iconv-modules which convert directly.
+     Use len == 2 or 4 to specify one or two UTF-16 characters.  */
+#define RUN_UTF16_INPUT(b0, b1, b2, b3, len, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  buf[3] = b3;								\
+  fails += run_conversion ("UTF-16BE", "WCHAR_T", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16BE", "UTF-8", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16BE", "UTF-32LE", buf, len, err, line); \
+  fails += run_conversion ("UTF-16BE", "UTF-32BE", buf, len, err, line); \
+  buf[0] = b1;								\
+  buf[1] = b0;								\
+  buf[2] = b3;								\
+  buf[3] = b2;								\
+  fails += run_conversion ("UTF-16LE", "WCHAR_T", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16LE", "UTF-8", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16LE", "UTF-32LE", buf, len, err, line); \
+  fails += run_conversion ("UTF-16LE", "UTF-32BE", buf, len, err, line);
+
+  /* Use UTF16 input of 0xD7FF.  */
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xD7, 0xFF, 4, 0, __LINE__);
+
+  /* Use [single] UTF16 high surrogate 0xD800 [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid low surrogate].  */
+  RUN_UTF16_INPUT (0xD8, 0x0, 0x0, 0x0, 2, EINVAL, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xD8, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xE0, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xDC, 0x0, 4, 0, __LINE__);
+
+  /* Use [single] UTF16 high surrogate 0xDBFF [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid low surrogate].  */
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0x0, 0x0, 2, EINVAL, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDB, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xE0, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDF, 0xFF, 4, 0, __LINE__);
+
+  /* Use single UTF16 low surrogate 0xDC00 [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid high surrogate].   */
+  RUN_UTF16_INPUT (0xDC, 0x0, 0x0, 0x0, 2, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDC, 0x0, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xDC, 0x0, 4, 0, __LINE__);
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDC, 0x0, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+
+  /* Use single UTF16 low surrogate 0xDFFF [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid high surrogate].   */
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0x0, 0x0, 2, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDF, 0xFF, 4, 0, __LINE__);
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+
+  /* Use UTF16 input of 0xE000.  */
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xE0, 0x0, 4, 0, __LINE__);
+
+
+  /* Use RUN_UTF8_3BYTE_INPUT to test conversion from UTF-8 to INTERNAL.
+     Converting directly from UTF-8 to UTF-16|32 is needed,
+     because e.g. s390x has iconv-modules which convert directly.  */
+#define RUN_UTF8_3BYTE_INPUT(b0, b1, b2, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  fails += run_conversion ("UTF-8", "WCHAR_T", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-16LE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-16BE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-32LE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-32BE", buf, 3, err, line);
+
+  /* Use UTF-8 input of 0xD7FF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0x9F, 0xBF, 0, __LINE__);
+
+  /* Use UTF-8 input of 0xD800.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xA0, 0x80, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDBFF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xAF, 0xBF, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDC00.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xB0, 0x80, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDFFF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xBF, 0xBF, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xF000.  */
+  RUN_UTF8_3BYTE_INPUT (0xEF, 0x80, 0x80, 0, __LINE__);
+
+  return fails > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
diff --git a/iconvdata/utf-16.c b/iconvdata/utf-16.c
index 2d74a13..dbbcd6d 100644
--- a/iconvdata/utf-16.c
+++ b/iconvdata/utf-16.c
@@ -295,6 +295,12 @@ gconv_end (struct __gconv_step *data)
 	  {								      \
 	    uint16_t u2;						      \
 									      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
@@ -329,6 +335,12 @@ gconv_end (struct __gconv_step *data)
 	  }								      \
 	else								      \
 	  {								      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
diff --git a/iconvdata/utf-32.c b/iconvdata/utf-32.c
index 0d6fe30..25f6fc6 100644
--- a/iconvdata/utf-32.c
+++ b/iconvdata/utf-32.c
@@ -239,7 +239,7 @@ gconv_end (struct __gconv_step *data)
     if (swap)								      \
       u1 = bswap_32 (u1);						      \
 									      \
-    if (__glibc_unlikely (u1 >= 0x110000))				      \
+    if (__glibc_unlikely (u1 >= 0x110000 || (u1 >= 0xd800 && u1 < 0xe000)))   \
       {									      \
 	/* This is illegal.  */						      \
 	STANDARD_FROM_LOOP_ERR_HANDLER (4);				      \
-- 
2.3.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/14] S390: Optimize iconv modules.
  2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
                   ` (13 preceding siblings ...)
  2016-02-23  9:23 ` [PATCH 14/14] Fix UTF-16 surrogate handling Stefan Liebler
@ 2016-03-01 15:01 ` Stefan Liebler
  2016-03-08 12:33   ` Stefan Liebler
  14 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-03-01 15:01 UTC (permalink / raw)
  To: libc-alpha

Ping

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> Hi,
>
> this patch set introduces optimized iconv modules for S390/S390x.
>
> The first patches prepare for the later optimizations.
> A make warning is eliminated, the order in gconv-modules file is
> changed for the s390 specific modules and a new configure check
> is introduced.
>
> The next patches optimize the current s390 specific iconv modules
> and generic or built-in ones. The optimizations are done e.g. with
> vector instructions, if gcc and binutils can handle those.
> At compile time, the relevant functions are built with/without the
> vector instructions. At runtime, the appropriate function is chosen
> with an ifunc resolver.
>
> The current s390-specific iconv-modules are used on 64bit only.
> These modules are reworked to run on S390 31bit, too.
>
> The last patches fix some errors. Unfortunately, some of the s390
> convert instructions do not report errors on UTF-16 low surrogates,
> thus those failing instructions have to be disabled. Perhaps those
> instructions can be reenabled in future. Some common-code modules
> have similar problems, which are fixed, too.
>
> The testsuite runs without new test failures. Tests were executed for 31/64bit
> with binutils that do/don't support the z13 vector instructions.
>
> Please review.
> Ok to commit?
>
> Stefan Liebler (14):
>    S390: Get rid of make warning: overriding recipe for target
>      gconv-modules.
>    S390: Mention s390-specific gconv-modues before common ones.
>    S390: Configure check for vector support in gcc.
>    S390: Optimize 8bit-generic iconv modules.
>    S390: Optimize builtin iconv-modules.
>    S390: Optimize iso-8859-1 to ibm037 iconv-module.
>    S390: Optimize utf8-utf32 module.
>    S390: Optimize utf8-utf16 module.
>    S390: Optimize utf16-utf32 module.
>    S390: Use s390-64 specific ionv-modules on s390-32, too.
>    S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
>    S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
>    Fix ucs4le_internal_loop in error case.
>    Fix UTF-16 surrogate handling.
>
>   config.h.in                                  |    4 +
>   iconv/gconv_simple.c                         |    5 +-
>   iconvdata/Makefile                           |   15 +-
>   iconvdata/utf-16.c                           |   12 +
>   iconvdata/utf-32.c                           |    2 +-
>   sysdeps/s390/Makefile                        |   83 ++
>   sysdeps/s390/configure                       |   32 +
>   sysdeps/s390/configure.ac                    |   21 +
>   sysdeps/s390/iso-8859-1_cp037_z900.c         |  262 ++++++
>   sysdeps/s390/multiarch/8bit-generic.c        |  485 ++++++++++
>   sysdeps/s390/multiarch/Makefile              |    4 +
>   sysdeps/s390/multiarch/gconv_simple.c        | 1266 ++++++++++++++++++++++++++
>   sysdeps/s390/multiarch/iconv/skeleton.c      |   21 +
>   sysdeps/s390/s390-64/Makefile                |   81 --
>   sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c |  237 -----
>   sysdeps/s390/s390-64/utf16-utf32-z9.c        |  337 -------
>   sysdeps/s390/s390-64/utf8-utf16-z9.c         |  471 ----------
>   sysdeps/s390/s390-64/utf8-utf32-z9.c         |  511 -----------
>   sysdeps/s390/utf16-utf32-z9.c                |  605 ++++++++++++
>   sysdeps/s390/utf8-utf16-z9.c                 |  818 +++++++++++++++++
>   sysdeps/s390/utf8-utf32-z9.c                 |  862 ++++++++++++++++++
>   21 files changed, 4493 insertions(+), 1641 deletions(-)
>   create mode 100644 sysdeps/s390/Makefile
>   create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
>   create mode 100644 sysdeps/s390/multiarch/8bit-generic.c
>   create mode 100644 sysdeps/s390/multiarch/gconv_simple.c
>   create mode 100644 sysdeps/s390/multiarch/iconv/skeleton.c
>   delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
>   delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
>   delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
>   delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
>   create mode 100644 sysdeps/s390/utf16-utf32-z9.c
>   create mode 100644 sysdeps/s390/utf8-utf16-z9.c
>   create mode 100644 sysdeps/s390/utf8-utf32-z9.c
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/14] S390: Optimize iconv modules.
  2016-03-01 15:01 ` [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
@ 2016-03-08 12:33   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-03-08 12:33 UTC (permalink / raw)
  To: libc-alpha

ping

On 03/01/2016 04:01 PM, Stefan Liebler wrote:
> Ping
>
> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>> Hi,
>>
>> this patch set introduces optimized iconv modules for S390/S390x.
>>
>> The first patches prepare for the later optimizations.
>> A make warning is eliminated, the order in gconv-modules file is
>> changed for the s390 specific modules and a new configure check
>> is introduced.
>>
>> The next patches optimize the current s390 specific iconv modules
>> and generic or built-in ones. The optimizations are done e.g. with
>> vector instructions, if gcc and binutils can handle those.
>> At compile time, the relevant functions are built with/without the
>> vector instructions. At runtime, the appropriate function is chosen
>> with an ifunc resolver.
>>
>> The current s390-specific iconv-modules are used on 64bit only.
>> These modules are reworked to run on S390 31bit, too.
>>
>> The last patches fix some errors. Unfortunately, some of the s390
>> convert instructions do not report errors on UTF-16 low surrogates,
>> thus those failing instructions have to be disabled. Perhaps those
>> instructions can be reenabled in future. Some common-code modules
>> have similar problems, which are fixed, too.
>>
>> The testsuite runs without new test failures. Tests were executed for
>> 31/64bit
>> with binutils that do/don't support the z13 vector instructions.
>>
>> Please review.
>> Ok to commit?
>>
>> Stefan Liebler (14):
>>    S390: Get rid of make warning: overriding recipe for target
>>      gconv-modules.
>>    S390: Mention s390-specific gconv-modues before common ones.
>>    S390: Configure check for vector support in gcc.
>>    S390: Optimize 8bit-generic iconv modules.
>>    S390: Optimize builtin iconv-modules.
>>    S390: Optimize iso-8859-1 to ibm037 iconv-module.
>>    S390: Optimize utf8-utf32 module.
>>    S390: Optimize utf8-utf16 module.
>>    S390: Optimize utf16-utf32 module.
>>    S390: Use s390-64 specific ionv-modules on s390-32, too.
>>    S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
>>    S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
>>    Fix ucs4le_internal_loop in error case.
>>    Fix UTF-16 surrogate handling.
>>
>>   config.h.in                                  |    4 +
>>   iconv/gconv_simple.c                         |    5 +-
>>   iconvdata/Makefile                           |   15 +-
>>   iconvdata/utf-16.c                           |   12 +
>>   iconvdata/utf-32.c                           |    2 +-
>>   sysdeps/s390/Makefile                        |   83 ++
>>   sysdeps/s390/configure                       |   32 +
>>   sysdeps/s390/configure.ac                    |   21 +
>>   sysdeps/s390/iso-8859-1_cp037_z900.c         |  262 ++++++
>>   sysdeps/s390/multiarch/8bit-generic.c        |  485 ++++++++++
>>   sysdeps/s390/multiarch/Makefile              |    4 +
>>   sysdeps/s390/multiarch/gconv_simple.c        | 1266 ++++++++++++++++++++++++++
>>   sysdeps/s390/multiarch/iconv/skeleton.c      |   21 +
>>   sysdeps/s390/s390-64/Makefile                |   81 --
>>   sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c |  237 -----
>>   sysdeps/s390/s390-64/utf16-utf32-z9.c        |  337 -------
>>   sysdeps/s390/s390-64/utf8-utf16-z9.c         |  471 ----------
>>   sysdeps/s390/s390-64/utf8-utf32-z9.c         |  511 -----------
>>   sysdeps/s390/utf16-utf32-z9.c                |  605 ++++++++++++
>>   sysdeps/s390/utf8-utf16-z9.c                 |  818 +++++++++++++++++
>>   sysdeps/s390/utf8-utf32-z9.c                 |  862 ++++++++++++++++++
>>   21 files changed, 4493 insertions(+), 1641 deletions(-)
>>   create mode 100644 sysdeps/s390/Makefile
>>   create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
>>   create mode 100644 sysdeps/s390/multiarch/8bit-generic.c
>>   create mode 100644 sysdeps/s390/multiarch/gconv_simple.c
>>   create mode 100644 sysdeps/s390/multiarch/iconv/skeleton.c
>>   delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
>>   delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
>>   delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
>>   delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
>>   create mode 100644 sysdeps/s390/utf16-utf32-z9.c
>>   create mode 100644 sysdeps/s390/utf8-utf16-z9.c
>>   create mode 100644 sysdeps/s390/utf8-utf32-z9.c
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/14] S390: Optimize builtin iconv-modules.
  2016-02-23  9:22 ` [PATCH 05/14] S390: Optimize builtin iconv-modules Stefan Liebler
@ 2016-03-18 12:58   ` Stefan Liebler
  2016-04-21 14:51     ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-03-18 12:58 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 50531 bytes --]

Hi,

I've updated the vector loop functions
internal_ucs2_loop and internal_ucs2reverse_loop.
The old patch contained lhi statements to initialize %[R_TMP],
which is later used to calculate an address.
This patch uses lghi statements to initialize %[R_TMP].

The ChangeLog remains the same.

Bye Stefan

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch introduces a s390 specific gconv_simple.c file which provides
> optimized versions for z13 with vector instructions, which will be chosen at
> runtime via ifunc.
> The optimized conversions can convert between internal and ascii, ucs4, ucs4le,
> ucs2, ucs2le.
> If the build-environment lacks vector support, then iconv/gconv_simple.c
> is used without any change. Otherwise iconv/gconv_simple.c is used to create
> conversion loop routines without vector instructions as fallback, if vector
> instructions aren't available at runtime.
>
> ChangeLog:
>
> 	* sysdeps/s390/multiarch/gconv_simple.c: New File.
> 	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
> ---
>   sysdeps/s390/multiarch/Makefile       |    4 +
>   sysdeps/s390/multiarch/gconv_simple.c | 1266 +++++++++++++++++++++++++++++++++
>   2 files changed, 1270 insertions(+)
>   create mode 100644 sysdeps/s390/multiarch/gconv_simple.c
>
> diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
> index 0805b07..5067b6f 100644
> --- a/sysdeps/s390/multiarch/Makefile
> +++ b/sysdeps/s390/multiarch/Makefile
> @@ -42,3 +42,7 @@ sysdep_routines += wcslen wcslen-vx wcslen-c \
>   		   wmemset wmemset-vx wmemset-c \
>   		   wmemcmp wmemcmp-vx wmemcmp-c
>   endif
> +
> +ifeq ($(subdir),iconv)
> +sysdep_routines += gconv_simple
> +endif
> diff --git a/sysdeps/s390/multiarch/gconv_simple.c b/sysdeps/s390/multiarch/gconv_simple.c
> new file mode 100644
> index 0000000..0e59422
> --- /dev/null
> +++ b/sysdeps/s390/multiarch/gconv_simple.c
> @@ -0,0 +1,1266 @@
> +/* Simple transformations functions - s390 version.
> +   Copyright (C) 2016 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +# include <ifunc-resolve.h>
> +
> +# if defined HAVE_S390_VX_GCC_SUPPORT
> +#  define ASM_CLOBBER_VR(NR) , NR
> +# else
> +#  define ASM_CLOBBER_VR(NR)
> +# endif
> +
> +# define ICONV_C_NAME(NAME) __##NAME##_c
> +# define ICONV_VX_NAME(NAME) __##NAME##_vx
> +# define ICONV_VX_IFUNC(FUNC)						\
> +  extern __typeof (ICONV_C_NAME (FUNC)) __##FUNC;			\
> +  s390_vx_libc_ifunc (__##FUNC)						\
> +  int FUNC (struct __gconv_step *step, struct __gconv_step_data *data,	\
> +	    const unsigned char **inptrp, const unsigned char *inend,	\
> +	    unsigned char **outbufstart, size_t *irreversible,		\
> +	    int do_flush, int consume_incomplete)			\
> +  {									\
> +    return __##FUNC (step, data, inptrp, inend,outbufstart,		\
> +		     irreversible, do_flush, consume_incomplete);	\
> +  }
> +# define ICONV_VX_SINGLE(NAME)						\
> +  static __typeof (NAME##_single) __##NAME##_vx_single __attribute__((alias(#NAME "_single")));
> +
> +/* Generate the transformations which are used, if the target machine does not
> +   support vector instructions.  */
> +# define __gconv_transform_ascii_internal		\
> +  ICONV_C_NAME (__gconv_transform_ascii_internal)
> +# define __gconv_transform_internal_ascii		\
> +  ICONV_C_NAME (__gconv_transform_internal_ascii)
> +# define __gconv_transform_internal_ucs4le		\
> +  ICONV_C_NAME (__gconv_transform_internal_ucs4le)
> +# define __gconv_transform_ucs4_internal		\
> +  ICONV_C_NAME (__gconv_transform_ucs4_internal)
> +# define __gconv_transform_ucs4le_internal		\
> +  ICONV_C_NAME (__gconv_transform_ucs4le_internal)
> +# define __gconv_transform_ucs2_internal		\
> +  ICONV_C_NAME (__gconv_transform_ucs2_internal)
> +# define __gconv_transform_ucs2reverse_internal		\
> +  ICONV_C_NAME (__gconv_transform_ucs2reverse_internal)
> +# define __gconv_transform_internal_ucs2		\
> +  ICONV_C_NAME (__gconv_transform_internal_ucs2)
> +# define __gconv_transform_internal_ucs2reverse		\
> +  ICONV_C_NAME (__gconv_transform_internal_ucs2reverse)
> +
> +
> +# include <iconv/gconv_simple.c>
> +
> +# undef __gconv_transform_ascii_internal
> +# undef __gconv_transform_internal_ascii
> +# undef __gconv_transform_internal_ucs4le
> +# undef __gconv_transform_ucs4_internal
> +# undef __gconv_transform_ucs4le_internal
> +# undef __gconv_transform_ucs2_internal
> +# undef __gconv_transform_ucs2reverse_internal
> +# undef __gconv_transform_internal_ucs2
> +# undef __gconv_transform_internal_ucs2reverse
> +
> +/* Now define the functions with vector support.  */
> +# if defined __s390x__
> +#  define CONVERT_32BIT_SIZE_T(REG)
> +# else
> +#  define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
> +# endif
> +
> +/* Convert from ISO 646-IRV to the internal (UCS4-like) format.  */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	1
> +# define MIN_NEEDED_TO		4
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (ascii_internal_loop)
> +# define TO_LOOP		ICONV_VX_NAME (ascii_internal_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ascii_internal)
> +# define ONE_DIRECTION		1
> +
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		FROM_LOOP
> +# define BODY_ORIG_ERROR						\
> +    /* The value is too large.  We don't try transliteration here since \
> +       this is not an error because of the lack of possibilities to	\
> +       represent the result.  This is a genuine bug in the input since	\
> +       ASCII does not allow such values.  */				\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (1);
> +
> +# define BODY_ORIG							\
> +  {									\
> +    if (__glibc_unlikely (*inptr > '\x7f'))				\
> +      {									\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +    else								\
> +      {									\
> +	/* It's a one byte sequence.  */				\
> +	*((uint32_t *) outptr) = *inptr++;				\
> +	outptr += sizeof (uint32_t);					\
> +      }									\
> +  }
> +# define BODY								\
> +  {									\
> +    size_t len = inend - inptr;						\
> +    if (len > (outend - outptr) / 4)					\
> +      len = (outend - outptr) / 4;					\
> +    size_t loop_count, tmp;						\
> +    __asm__ volatile (".machine push\n\t"				\
> +		      ".machine \"z13\"\n\t"				\
> +		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
> +		      "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
> +		      "srlg %[R_LI],%[R_LEN],4\n\t"			\
> +		      "vrepib %%v31,0x20\n\t"				\
> +		      "clgije %[R_LI],0,1f\n\t"				\
> +		      "0:\n\t" /* Handle 16-byte blocks.  */		\
> +		      "vl %%v16,0(%[R_IN])\n\t"				\
> +		      /* Checking for values > 0x7f.  */		\
> +		      "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
> +		      "jno 10f\n\t"					\
> +		      /* Enlarge to UCS4.  */				\
> +		      "vuplhb %%v17,%%v16\n\t"				\
> +		      "vupllb %%v18,%%v16\n\t"				\
> +		      "vuplhh %%v19,%%v17\n\t"				\
> +		      "vupllh %%v20,%%v17\n\t"				\
> +		      "vuplhh %%v21,%%v18\n\t"				\
> +		      "vupllh %%v22,%%v18\n\t"				\
> +		      /* Store 64bytes to buf_out.  */			\
> +		      "vstm %%v19,%%v22,0(%[R_OUT])\n\t"		\
> +		      "la %[R_IN],16(%[R_IN])\n\t"			\
> +		      "la %[R_OUT],64(%[R_OUT])\n\t"			\
> +		      "brctg %[R_LI],0b\n\t"				\
> +		      "lghi %[R_LI],15\n\t"				\
> +		      "ngr %[R_LEN],%[R_LI]\n\t"			\
> +		      "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
> +		      /* Handle remaining bytes.  */			\
> +		      "1: aghik %[R_LI],%[R_LEN],-1\n\t"		\
> +		      "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
> +		      "vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
> +		      /* Checking for values > 0x7f.  */		\
> +		      "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
> +		      "vlgvb %[R_TMP],%%v17,7\n\t"			\
> +		      "clr %[R_TMP],%[R_LI]\n\t"			\
> +		      "locrh %[R_TMP],%[R_LEN]\n\t"			\
> +		      "locghih %[R_LEN],0\n\t"				\
> +		      "j 12f\n\t"					\
> +		      "10:\n\t"						\
> +		      /* Found a value > 0x7f.				\
> +			 Store the preceding chars.  */			\
> +		      "vlgvb %[R_TMP],%%v17,7\n\t"			\
> +		      "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
> +		      "sllk %[R_TMP],%[R_TMP],2\n\t"			\
> +		      "ahi %[R_TMP],-1\n\t"				\
> +		      "jl 20f\n\t"					\
> +		      "lgr %[R_LI],%[R_TMP]\n\t"			\
> +		      "vuplhb %%v17,%%v16\n\t"				\
> +		      "vuplhh %%v19,%%v17\n\t"				\
> +		      "vstl %%v19,%[R_LI],0(%[R_OUT])\n\t"		\
> +		      "ahi %[R_LI],-16\n\t"				\
> +		      "jl 11f\n\t"					\
> +		      "vupllh %%v20,%%v17\n\t"				\
> +		      "vstl %%v20,%[R_LI],16(%[R_OUT])\n\t"		\
> +		      "ahi %[R_LI],-16\n\t"				\
> +		      "jl 11f\n\t"					\
> +		      "vupllb %%v18,%%v16\n\t"				\
> +		      "vuplhh %%v21,%%v18\n\t"				\
> +		      "vstl %%v21,%[R_LI],32(%[R_OUT])\n\t"		\
> +		      "ahi %[R_LI],-16\n\t"				\
> +		      "jl 11f\n\t"					\
> +		      "vupllh %%v22,%%v18\n\t"				\
> +		      "vstl %%v22,%[R_LI],48(%[R_OUT])\n\t"		\
> +		      "11:\n\t"						\
> +		      "la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t"		\
> +		      "20:\n\t"						\
> +		      ".machine pop"					\
> +		      : /* outputs */ [R_OUT] "+a" (outptr)		\
> +			, [R_IN] "+a" (inptr)				\
> +			, [R_LEN] "+d" (len)				\
> +			, [R_LI] "=d" (loop_count)			\
> +			, [R_TMP] "=a" (tmp)				\
> +		      : /* inputs */					\
> +		      : /* clobber list*/ "memory", "cc"		\
> +			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
> +			ASM_CLOBBER_VR ("v31")				\
> +		      );						\
> +    if (len > 0)							\
> +      {									\
> +	/* Found an invalid character at the next input byte.  */	\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +  }
> +
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +# include <iconv/skeleton.c>
> +# undef BODY_ORIG
> +# undef BODY_ORIG_ERROR
> +ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
> +
> +/* Convert from the internal (UCS4-like) format to ISO 646-IRV.  */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	4
> +# define MIN_NEEDED_TO		1
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (internal_ascii_loop)
> +# define TO_LOOP		ICONV_VX_NAME (internal_ascii_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ascii)
> +# define ONE_DIRECTION		1
> +
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		FROM_LOOP
> +# define BODY_ORIG_ERROR						\
> +  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4);			\
> +  STANDARD_TO_LOOP_ERR_HANDLER (4);
> +
> +# define BODY_ORIG							\
> +  {									\
> +    if (__glibc_unlikely (*((const uint32_t *) inptr) > 0x7f))		\
> +      {									\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +    else								\
> +      {									\
> +	/* It's a one byte sequence.  */				\
> +	*outptr++ = *((const uint32_t *) inptr);			\
> +	inptr += sizeof (uint32_t);					\
> +      }									\
> +  }
> +# define BODY								\
> +  {									\
> +    size_t len = (inend - inptr) / 4;					\
> +    if (len > outend - outptr)						\
> +      len = outend - outptr;						\
> +    size_t loop_count, tmp, tmp2;					\
> +    __asm__ volatile (".machine push\n\t"				\
> +		      ".machine \"z13\"\n\t"				\
> +		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
> +		      /* Setup to check for ch > 0x7f.  */		\
> +		      "vzero %%v21\n\t"					\
> +		      "srlg %[R_LI],%[R_LEN],4\n\t"			\
> +		      "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
> +		      "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
> +		      "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
> +		      "lghi %[R_TMP],0\n\t"				\
> +		      "clgije %[R_LI],0,1f\n\t"				\
> +		      "0:\n\t"						\
> +		      "vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
> +		      /* Shorten to byte values.  */			\
> +		      "vpkf %%v23,%%v16,%%v17\n\t"			\
> +		      "vpkf %%v24,%%v18,%%v19\n\t"			\
> +		      "vpkh %%v23,%%v23,%%v24\n\t"			\
> +		      /* Checking for values > 0x7f.  */		\
> +		      "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
> +		      "jno 10f\n\t"					\
> +		      "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
> +		      "jno 11f\n\t"					\
> +		      "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
> +		      "jno 12f\n\t"					\
> +		      "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
> +		      "jno 13f\n\t"					\
> +		      /* Store 16bytes to outptr.  */			\
> +		      "vst %%v23,0(%[R_OUT])\n\t"			\
> +		      "la %[R_IN],64(%[R_IN])\n\t"			\
> +		      "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		      "brctg %[R_LI],0b\n\t"				\
> +		      "lghi %[R_LI],15\n\t"				\
> +		      "ngr %[R_LEN],%[R_LI]\n\t"			\
> +		      "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
> +		      /* Handle remaining bytes.  */			\
> +		      "1: sllg %[R_LI],%[R_LEN],2\n\t"			\
> +		      "aghi %[R_LI],-1\n\t"				\
> +		      "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
> +		      /* Load remaining 1...63 bytes.  */		\
> +		      "vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
> +		      "ahi %[R_LI],-16\n\t"				\
> +		      "jl 2f\n\t"					\
> +		      "vll %%v17,%[R_LI],16(%[R_IN])\n\t"		\
> +		      "ahi %[R_LI],-16\n\t"				\
> +		      "jl 2f\n\t"					\
> +		      "vll %%v18,%[R_LI],32(%[R_IN])\n\t"		\
> +		      "ahi %[R_LI],-16\n\t"				\
> +		      "jl 2f\n\t"					\
> +		      "vll %%v19,%[R_LI],48(%[R_IN])\n\t"		\
> +		      "2:\n\t"						\
> +		      /* Shorten to byte values.  */			\
> +		      "vpkf %%v23,%%v16,%%v17\n\t"			\
> +		      "vpkf %%v24,%%v18,%%v19\n\t"			\
> +		      "vpkh %%v23,%%v23,%%v24\n\t"			\
> +		      "sllg %[R_LI],%[R_LEN],2\n\t"			\
> +		      "aghi %[R_LI],-16\n\t"				\
> +		      "jl 3f\n\t" /* v16 is not fully loaded.  */	\
> +		      "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
> +		      "jno 10f\n\t"					\
> +		      "aghi %[R_LI],-16\n\t"				\
> +		      "jl 4f\n\t" /* v17 is not fully loaded.  */	\
> +		      "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
> +		      "jno 11f\n\t"					\
> +		      "aghi %[R_LI],-16\n\t"				\
> +		      "jl 5f\n\t" /* v18 is not fully loaded.  */	\
> +		      "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
> +		      "jno 12f\n\t"					\
> +		      "aghi %[R_LI],-16\n\t"				\
> +		      /* v19 is not fully loaded. */			\
> +		      "lghi %[R_TMP],12\n\t"				\
> +		      "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
> +		      "6: vlgvb %[R_I],%%v22,7\n\t"			\
> +		      "aghi %[R_LI],16\n\t"				\
> +		      "clrjl %[R_I],%[R_LI],14f\n\t"			\
> +		      "lgr %[R_I],%[R_LEN]\n\t"				\
> +		      "lghi %[R_LEN],0\n\t"				\
> +		      "j 15f\n\t"					\
> +		      "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
> +		      "j 6b\n\t"					\
> +		      "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
> +		      "lghi %[R_TMP],4\n\t"				\
> +		      "j 6b\n\t"					\
> +		      "5: vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
> +		      "lghi %[R_TMP],8\n\t"				\
> +		      "j 6b\n\t"					\
> +		      /* Found a value > 0x7f.  */			\
> +		      "13: ahi %[R_TMP],4\n\t"				\
> +		      "12: ahi %[R_TMP],4\n\t"				\
> +		      "11: ahi %[R_TMP],4\n\t"				\
> +		      "10: vlgvb %[R_I],%%v22,7\n\t"			\
> +		      "14: srlg %[R_I],%[R_I],2\n\t"			\
> +		      "agr %[R_I],%[R_TMP]\n\t"				\
> +		      "je 20f\n\t"					\
> +		      /* Store characters before invalid one...  */	\
> +		      "15: aghi %[R_I],-1\n\t"				\
> +		      "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
> +		      /* ... and update pointers.  */			\
> +		      "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
> +		      "sllg %[R_I],%[R_I],2\n\t"			\
> +		      "la %[R_IN],4(%[R_I],%[R_IN])\n\t"		\
> +		      "20:\n\t"						\
> +		      ".machine pop"					\
> +		      : /* outputs */ [R_OUT] "+a" (outptr)		\
> +			, [R_IN] "+a" (inptr)				\
> +			, [R_LEN] "+d" (len)				\
> +			, [R_LI] "=d" (loop_count)			\
> +			, [R_I] "=a" (tmp2)				\
> +			, [R_TMP] "=d" (tmp)				\
> +		      : /* inputs */					\
> +		      : /* clobber list*/ "memory", "cc"		\
> +			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
> +			ASM_CLOBBER_VR ("v24")				\
> +		      );						\
> +    if (len > 0)							\
> +      {									\
> +	/* Found an invalid character > 0x7f at next character.  */	\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +  }
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +# include <iconv/skeleton.c>
> +# undef BODY_ORIG
> +# undef BODY_ORIG_ERROR
> +ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
> +
> +
> +/* Convert from internal UCS4 to UCS4 little endian form.  */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	4
> +# define MIN_NEEDED_TO		4
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (internal_ucs4le_loop)
> +# define TO_LOOP		ICONV_VX_NAME (internal_ucs4le_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs4le)
> +# define ONE_DIRECTION		0
> +
> +static inline int
> +__attribute ((always_inline))
> +ICONV_VX_NAME (internal_ucs4le_loop) (struct __gconv_step *step,
> +				      struct __gconv_step_data *step_data,
> +				      const unsigned char **inptrp,
> +				      const unsigned char *inend,
> +				      unsigned char **outptrp,
> +				      unsigned char *outend,
> +				      size_t *irreversible)
> +{
> +  const unsigned char *inptr = *inptrp;
> +  unsigned char *outptr = *outptrp;
> +  int result;
> +  size_t len = MIN (inend - inptr, outend - outptr) / 4;
> +  size_t loop_count;
> +  __asm__ volatile (".machine push\n\t"
> +		    ".machine \"z13\"\n\t"
> +		    ".machinemode \"zarch_nohighgprs\"\n\t"
> +		    CONVERT_32BIT_SIZE_T ([R_LEN])
> +		    "bras %[R_LI],1f\n\t"
> +		    /* Vector permute mask:  */
> +		    ".long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
> +		    "1: vl %%v20,0(%[R_LI])\n\t"
> +		    /* Process 64byte (16char) blocks.  */
> +		    "srlg %[R_LI],%[R_LEN],4\n\t"
> +		    "clgije %[R_LI],0,10f\n\t"
> +		    "0: vlm %%v16,%%v19,0(%[R_IN])\n\t"
> +		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
> +		    "vperm %%v17,%%v17,%%v17,%%v20\n\t"
> +		    "vperm %%v18,%%v18,%%v18,%%v20\n\t"
> +		    "vperm %%v19,%%v19,%%v19,%%v20\n\t"
> +		    "vstm %%v16,%%v19,0(%[R_OUT])\n\t"
> +		    "la %[R_IN],64(%[R_IN])\n\t"
> +		    "la %[R_OUT],64(%[R_OUT])\n\t"
> +		    "brctg %[R_LI],0b\n\t"
> +		    "llgfr %[R_LEN],%[R_LEN]\n\t"
> +		    "nilf %[R_LEN],15\n\t"
> +		    /* Process 16byte (4char) blocks.  */
> +		    "10: srlg %[R_LI],%[R_LEN],2\n\t"
> +		    "clgije %[R_LI],0,20f\n\t"
> +		    "11: vl %%v16,0(%[R_IN])\n\t"
> +		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
> +		    "vst %%v16,0(%[R_OUT])\n\t"
> +		    "la %[R_IN],16(%[R_IN])\n\t"
> +		    "la %[R_OUT],16(%[R_OUT])\n\t"
> +		    "brctg %[R_LI],11b\n\t"
> +		    "nill %[R_LEN],3\n\t"
> +		    /* Process <16bytes.  */
> +		    "20: sll %[R_LEN],2\n\t"
> +		    "ahi %[R_LEN],-1\n\t"
> +		    "jl 30f\n\t"
> +		    "vll %%v16,%[R_LEN],0(%[R_IN])\n\t"
> +		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
> +		    "vstl %%v16,%[R_LEN],0(%[R_OUT])\n\t"
> +		    "la %[R_IN],1(%[R_LEN],%[R_IN])\n\t"
> +		    "la %[R_OUT],1(%[R_LEN],%[R_OUT])\n\t"
> +		    "30: \n\t"
> +		    ".machine pop"
> +		    : /* outputs */ [R_OUT] "+a" (outptr)
> +		      , [R_IN] "+a" (inptr)
> +		      , [R_LI] "=a" (loop_count)
> +		      , [R_LEN] "+a" (len)
> +		    : /* inputs */
> +		    : /* clobber list*/ "memory", "cc"
> +		      ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
> +		      ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
> +		      ASM_CLOBBER_VR ("v20")
> +		    );
> +  *inptrp = inptr;
> +  *outptrp = outptr;
> +
> +  /* Determine the status.  */
> +  if (*inptrp == inend)
> +    result = __GCONV_EMPTY_INPUT;
> +  else if (*outptrp + 4 > outend)
> +    result = __GCONV_FULL_OUTPUT;
> +  else
> +    result = __GCONV_INCOMPLETE_INPUT;
> +
> +  return result;
> +}
> +
> +ICONV_VX_SINGLE (internal_ucs4le_loop)
> +# include <iconv/skeleton.c>
> +ICONV_VX_IFUNC (__gconv_transform_internal_ucs4le)
> +
> +
> +/* Transform from UCS4 to the internal, UCS4-like format.  Unlike
> +   for the other direction we have to check for correct values here.  */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	4
> +# define MIN_NEEDED_TO		4
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (ucs4_internal_loop)
> +# define TO_LOOP		ICONV_VX_NAME (ucs4_internal_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4_internal)
> +# define ONE_DIRECTION		0
> +
> +
> +static inline int
> +__attribute ((always_inline))
> +ICONV_VX_NAME (ucs4_internal_loop) (struct __gconv_step *step,
> +				    struct __gconv_step_data *step_data,
> +				    const unsigned char **inptrp,
> +				    const unsigned char *inend,
> +				    unsigned char **outptrp,
> +				    unsigned char *outend,
> +				    size_t *irreversible)
> +{
> +  int flags = step_data->__flags;
> +  const unsigned char *inptr = *inptrp;
> +  unsigned char *outptr = *outptrp;
> +  int result;
> +  size_t len, loop_count;
> +  do
> +    {
> +      len = MIN (inend - inptr, outend - outptr) / 4;
> +      __asm__ volatile (".machine push\n\t"
> +			".machine \"z13\"\n\t"
> +			".machinemode \"zarch_nohighgprs\"\n\t"
> +			CONVERT_32BIT_SIZE_T ([R_LEN])
> +			/* Setup to check for ch > 0x7fffffff.  */
> +			"larl %[R_LI],9f\n\t"
> +			"vlm %%v20,%%v21,0(%[R_LI])\n\t"
> +			"srlg %[R_LI],%[R_LEN],2\n\t"
> +			"clgije %[R_LI],0,1f\n\t"
> +			/* Process 16byte (4char) blocks.  */
> +			"0: vl %%v16,0(%[R_IN])\n\t"
> +			"vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
> +			"jno 10f\n\t"
> +			"vst %%v16,0(%[R_OUT])\n\t"
> +			"la %[R_IN],16(%[R_IN])\n\t"
> +			"la %[R_OUT],16(%[R_OUT])\n\t"
> +			"brctg %[R_LI],0b\n\t"
> +			"llgfr %[R_LEN],%[R_LEN]\n\t"
> +			"nilf %[R_LEN],3\n\t"
> +			/* Process <16bytes.  */
> +			"1: sll %[R_LEN],2\n\t"
> +			"ahik %[R_LI],%[R_LEN],-1\n\t"
> +			"jl 20f\n\t" /* No further bytes available.  */
> +			"vll %%v16,%[R_LI],0(%[R_IN])\n\t"
> +			"vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
> +			"vlgvb %[R_LI],%%v22,7\n\t"
> +			"clr %[R_LI],%[R_LEN]\n\t"
> +			"locgrhe %[R_LI],%[R_LEN]\n\t"
> +			"locghihe %[R_LEN],0\n\t"
> +			"j 11f\n\t"
> +			/* v20: Vector string range compare values.  */
> +			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
> +			/* v21: Vector string range compare control-bits.
> +			   element 0: >; element 1: =<> (always true)  */
> +			".long 0x20000000,0xE0000000,0x0,0x0\n\t"
> +			/* Found a value > 0x7fffffff.  */
> +			"10: vlgvb %[R_LI],%%v22,7\n\t"
> +			/* Store characters before invalid one.  */
> +			"11: aghi %[R_LI],-1\n\t"
> +			"jl 20f\n\t"
> +			"vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
> +			"la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
> +			"la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
> +			"20:\n\t"
> +			".machine pop"
> +			: /* outputs */ [R_OUT] "+a" (outptr)
> +			  , [R_IN] "+a" (inptr)
> +			  , [R_LI] "=a" (loop_count)
> +			  , [R_LEN] "+d" (len)
> +			: /* inputs */
> +			: /* clobber list*/ "memory", "cc"
> +			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
> +			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
> +			);
> +      if (len > 0)
> +	{
> +	  /* The value is too large.  We don't try transliteration here since
> +	     this is not an error because of the lack of possibilities to
> +	     represent the result.  This is a genuine bug in the input since
> +	     UCS4 does not allow such values.  */
> +	  if (irreversible == NULL)
> +	    /* We are transliterating, don't try to correct anything.  */
> +	    return __GCONV_ILLEGAL_INPUT;
> +
> +	  if (flags & __GCONV_IGNORE_ERRORS)
> +	    {
> +	      /* Just ignore this character.  */
> +	      ++*irreversible;
> +	      inptr += 4;
> +	      continue;
> +	    }
> +
> +	  *inptrp = inptr;
> +	  *outptrp = outptr;
> +	  return __GCONV_ILLEGAL_INPUT;
> +	}
> +    }
> +  while (len > 0);
> +
> +  *inptrp = inptr;
> +  *outptrp = outptr;
> +
> +  /* Determine the status.  */
> +  if (*inptrp == inend)
> +    result = __GCONV_EMPTY_INPUT;
> +  else if (*outptrp + 4 > outend)
> +    result = __GCONV_FULL_OUTPUT;
> +  else
> +    result = __GCONV_INCOMPLETE_INPUT;
> +
> +  return result;
> +}
> +
> +ICONV_VX_SINGLE (ucs4_internal_loop)
> +# include <iconv/skeleton.c>
> +ICONV_VX_IFUNC (__gconv_transform_ucs4_internal)
> +
> +
> +/* Transform from UCS4-LE to the internal encoding.  */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	4
> +# define MIN_NEEDED_TO		4
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (ucs4le_internal_loop)
> +# define TO_LOOP		ICONV_VX_NAME (ucs4le_internal_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4le_internal)
> +# define ONE_DIRECTION		0
> +
> +static inline int
> +__attribute ((always_inline))
> +ICONV_VX_NAME (ucs4le_internal_loop) (struct __gconv_step *step,
> +				      struct __gconv_step_data *step_data,
> +				      const unsigned char **inptrp,
> +				      const unsigned char *inend,
> +				      unsigned char **outptrp,
> +				      unsigned char *outend,
> +				      size_t *irreversible)
> +{
> +  int flags = step_data->__flags;
> +  const unsigned char *inptr = *inptrp;
> +  unsigned char *outptr = *outptrp;
> +  int result;
> +  size_t len, loop_count;
> +  do
> +    {
> +      len = MIN (inend - inptr, outend - outptr) / 4;
> +      __asm__ volatile (".machine push\n\t"
> +			".machine \"z13\"\n\t"
> +			".machinemode \"zarch_nohighgprs\"\n\t"
> +			CONVERT_32BIT_SIZE_T ([R_LEN])
> +			/* Setup to check for ch > 0x7fffffff.  */
> +			"larl %[R_LI],9f\n\t"
> +			"vlm %%v20,%%v22,0(%[R_LI])\n\t"
> +			"srlg %[R_LI],%[R_LEN],2\n\t"
> +			"clgije %[R_LI],0,1f\n\t"
> +			/* Process 16byte (4char) blocks.  */
> +			"0: vl %%v16,0(%[R_IN])\n\t"
> +			"vperm %%v16,%%v16,%%v16,%%v22\n\t"
> +			"vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
> +			"jno 10f\n\t"
> +			"vst %%v16,0(%[R_OUT])\n\t"
> +			"la %[R_IN],16(%[R_IN])\n\t"
> +			"la %[R_OUT],16(%[R_OUT])\n\t"
> +			"brctg %[R_LI],0b\n\t"
> +			"llgfr %[R_LEN],%[R_LEN]\n\t"
> +			"nilf %[R_LEN],3\n\t"
> +			/* Process <16bytes.  */
> +			"1: sll %[R_LEN],2\n\t"
> +			"ahik %[R_LI],%[R_LEN],-1\n\t"
> +			"jl 20f\n\t" /* No further bytes available.  */
> +			"vll %%v16,%[R_LI],0(%[R_IN])\n\t"
> +			"vperm %%v16,%%v16,%%v16,%%v22\n\t"
> +			"vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
> +			"vlgvb %[R_LI],%%v23,7\n\t"
> +			"clr %[R_LI],%[R_LEN]\n\t"
> +			"locgrhe %[R_LI],%[R_LEN]\n\t"
> +			"locghihe %[R_LEN],0\n\t"
> +			"j 11f\n\t"
> +			/* v20: Vector string range compare values.  */
> +			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
> +			/* v21: Vector string range compare control-bits.
> +			   element 0: >; element 1: =<> (always true)  */
> +			".long 0x20000000,0xE0000000,0x0,0x0\n\t"
> +			/* v22: Vector permute mask.  */
> +			".long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
> +			/* Found a value > 0x7fffffff.  */
> +			"10: vlgvb %[R_LI],%%v23,7\n\t"
> +			/* Store characters before invalid one.  */
> +			"11: aghi %[R_LI],-1\n\t"
> +			"jl 20f\n\t"
> +			"vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
> +			"la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
> +			"la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
> +			"20:\n\t"
> +			".machine pop"
> +			: /* outputs */ [R_OUT] "+a" (outptr)
> +			  , [R_IN] "+a" (inptr)
> +			  , [R_LI] "=a" (loop_count)
> +			  , [R_LEN] "+d" (len)
> +			: /* inputs */
> +			: /* clobber list*/ "memory", "cc"
> +			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
> +			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
> +			  ASM_CLOBBER_VR ("v23")
> +			);
> +      if (len > 0)
> +	{
> +	  /* The value is too large.  We don't try transliteration here since
> +	     this is not an error because of the lack of possibilities to
> +	     represent the result.  This is a genuine bug in the input since
> +	     UCS4 does not allow such values.  */
> +	  if (irreversible == NULL)
> +	    /* We are transliterating, don't try to correct anything.  */
> +	    return __GCONV_ILLEGAL_INPUT;
> +
> +	  if (flags & __GCONV_IGNORE_ERRORS)
> +	    {
> +	      /* Just ignore this character.  */
> +	      ++*irreversible;
> +	      inptr += 4;
> +	      continue;
> +	    }
> +
> +	  *inptrp = inptr;
> +	  *outptrp = outptr;
> +	  return __GCONV_ILLEGAL_INPUT;
> +	}
> +    }
> +  while (len > 0);
> +
> +  *inptrp = inptr;
> +  *outptrp = outptr;
> +
> +  /* Determine the status.  */
> +  if (*inptrp == inend)
> +    result = __GCONV_EMPTY_INPUT;
> +  else if (*inptrp + 4 > inend)
> +    result = __GCONV_INCOMPLETE_INPUT;
> +  else
> +    {
> +      assert (*outptrp + 4 > outend);
> +      result = __GCONV_FULL_OUTPUT;
> +    }
> +
> +  return result;
> +}
> +ICONV_VX_SINGLE (ucs4le_internal_loop)
> +# include <iconv/skeleton.c>
> +ICONV_VX_IFUNC (__gconv_transform_ucs4le_internal)
> +
> +/* Convert from UCS2 to the internal (UCS4-like) format.  */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	2
> +# define MIN_NEEDED_TO		4
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (ucs2_internal_loop)
> +# define TO_LOOP		ICONV_VX_NAME (ucs2_internal_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2_internal)
> +# define ONE_DIRECTION		1
> +
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		FROM_LOOP
> +# define BODY_ORIG_ERROR						\
> +  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
> +     them.  (Catching this here is not security relevant.)  */		\
> +  STANDARD_FROM_LOOP_ERR_HANDLER (2);
> +# define BODY_ORIG							\
> +  {									\
> +    uint16_t u1 = get16 (inptr);					\
> +									\
> +    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
> +      {									\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +									\
> +    *((uint32_t *) outptr) = u1;					\
> +    outptr += sizeof (uint32_t);					\
> +    inptr += 2;								\
> +  }
> +# define BODY								\
> +  {									\
> +    size_t len, tmp, tmp2;						\
> +    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
> +    __asm__ volatile (".machine push\n\t"				\
> +		      ".machine \"z13\"\n\t"				\
> +		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
> +		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
> +		      "larl %[R_TMP],9f\n\t"				\
> +		      "vlm %%v20,%%v21,0(%[R_TMP])\n\t"			\
> +		      "srlg %[R_TMP],%[R_LEN],3\n\t"			\
> +		      "clgije %[R_TMP],0,1f\n\t"			\
> +		      /* Process 16byte (8char) blocks.  */		\
> +		      "0: vl %%v16,0(%[R_IN])\n\t"			\
> +		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
> +		      /* Enlarge UCS2 to UCS4.  */			\
> +		      "vuplhh %%v17,%%v16\n\t"				\
> +		      "vupllh %%v18,%%v16\n\t"				\
> +		      "jno 10f\n\t"					\
> +		      /* Store 32bytes to buf_out.  */			\
> +		      "vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
> +		      "la %[R_IN],16(%[R_IN])\n\t"			\
> +		      "la %[R_OUT],32(%[R_OUT])\n\t"			\
> +		      "brctg %[R_TMP],0b\n\t"				\
> +		      "llgfr %[R_LEN],%[R_LEN]\n\t"			\
> +		      "nilf %[R_LEN],7\n\t"				\
> +		      /* Process <16bytes.  */				\
> +		      "1: sll %[R_LEN],1\n\t"				\
> +		      "ahik %[R_TMP],%[R_LEN],-1\n\t"			\
> +		      "jl 20f\n\t" /* No further bytes available.  */	\
> +		      "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
> +		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
> +		      /* Enlarge UCS2 to UCS4.  */			\
> +		      "vuplhh %%v17,%%v16\n\t"				\
> +		      "vupllh %%v18,%%v16\n\t"				\
> +		      "vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		      "clr %[R_TMP],%[R_LEN]\n\t"			\
> +		      "locgrhe %[R_TMP],%[R_LEN]\n\t"			\
> +		      "locghihe %[R_LEN],0\n\t"				\
> +		      "j 11f\n\t"					\
> +		      /* v20: Vector string range compare values.  */	\
> +		      "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
> +		      /* v21: Vector string range compare control-bits.	\
> +			 element 0: >=; element 1: <  */		\
> +		      ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
> +		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
> +		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
> +		      "sll %[R_TMP],1\n\t"				\
> +		      "lgr %[R_TMP2],%[R_TMP]\n\t"			\
> +		      "ahi %[R_TMP],-1\n\t"				\
> +		      "jl 20f\n\t"					\
> +		      "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
> +		      "ahi %[R_TMP],-16\n\t"				\
> +		      "jl 19f\n\t"					\
> +		      "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"		\
> +		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
> +		      "20:\n\t"						\
> +		      ".machine pop"					\
> +		      : /* outputs */ [R_OUT] "+a" (outptr)		\
> +			, [R_IN] "+a" (inptr)				\
> +			, [R_TMP] "=a" (tmp)				\
> +			, [R_TMP2] "=a" (tmp2)				\
> +			, [R_LEN] "+d" (len)				\
> +		      : /* inputs */					\
> +		      : /* clobber list*/ "memory", "cc"		\
> +			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +		      );						\
> +    if (len > 0)							\
> +      {									\
> +	/* Found an invalid character at next input-char.  */		\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +  }
> +
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +# include <iconv/skeleton.c>
> +# undef BODY_ORIG
> +# undef BODY_ORIG_ERROR
> +ICONV_VX_IFUNC (__gconv_transform_ucs2_internal)
> +
> +/* Convert from UCS2 in other endianness to the internal (UCS4-like) format. */
> +# define DEFINE_INIT		0
> +# define DEFINE_FINI		0
> +# define MIN_NEEDED_FROM	2
> +# define MIN_NEEDED_TO		4
> +# define FROM_DIRECTION		1
> +# define FROM_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop)
> +# define TO_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop) /* This is not used.  */
> +# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2reverse_internal)
> +# define ONE_DIRECTION		1
> +
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		FROM_LOOP
> +# define BODY_ORIG_ERROR						\
> +  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
> +     them.  (Catching this here is not security relevant.)  */		\
> +  if (! ignore_errors_p ())						\
> +    {									\
> +      result = __GCONV_ILLEGAL_INPUT;					\
> +      break;								\
> +    }									\
> +  inptr += 2;								\
> +  ++*irreversible;							\
> +  continue;
> +
> +# define BODY_ORIG \
> +  {									\
> +    uint16_t u1 = bswap_16 (get16 (inptr));				\
> +									\
> +    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
> +      {									\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +									\
> +    *((uint32_t *) outptr) = u1;					\
> +    outptr += sizeof (uint32_t);					\
> +    inptr += 2;								\
> +  }
> +# define BODY								\
> +  {									\
> +    size_t len, tmp, tmp2;						\
> +    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
> +    __asm__ volatile (".machine push\n\t"				\
> +		      ".machine \"z13\"\n\t"				\
> +		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
> +		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
> +		      "larl %[R_TMP],9f\n\t"				\
> +		      "vlm %%v20,%%v22,0(%[R_TMP])\n\t"			\
> +		      "srlg %[R_TMP],%[R_LEN],3\n\t"			\
> +		      "clgije %[R_TMP],0,1f\n\t"			\
> +		      /* Process 16byte (8char) blocks.  */		\
> +		      "0: vl %%v16,0(%[R_IN])\n\t"			\
> +		      "vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
> +		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
> +		      /* Enlarge UCS2 to UCS4.  */			\
> +		      "vuplhh %%v17,%%v16\n\t"				\
> +		      "vupllh %%v18,%%v16\n\t"				\
> +		      "jno 10f\n\t"					\
> +		      /* Store 32bytes to buf_out.  */			\
> +		      "vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
> +		      "la %[R_IN],16(%[R_IN])\n\t"			\
> +		      "la %[R_OUT],32(%[R_OUT])\n\t"			\
> +		      "brctg %[R_TMP],0b\n\t"				\
> +		      "llgfr %[R_LEN],%[R_LEN]\n\t"			\
> +		      "nilf %[R_LEN],7\n\t"				\
> +		      /* Process <16bytes.  */				\
> +		      "1: sll %[R_LEN],1\n\t"				\
> +		      "ahik %[R_TMP],%[R_LEN],-1\n\t"			\
> +		      "jl 20f\n\t" /* No further bytes available.  */	\
> +		      "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
> +		      "vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
> +		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
> +		      /* Enlarge UCS2 to UCS4.  */			\
> +		      "vuplhh %%v17,%%v16\n\t"				\
> +		      "vupllh %%v18,%%v16\n\t"				\
> +		      "vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		      "clr %[R_TMP],%[R_LEN]\n\t"			\
> +		      "locgrhe %[R_TMP],%[R_LEN]\n\t"			\
> +		      "locghihe %[R_LEN],0\n\t"				\
> +		      "j 11f\n\t"					\
> +		      /* v20: Vector string range compare values.  */	\
> +		      "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
> +		      /* v21: Vector string range compare control-bits.	\
> +			 element 0: =>; element 1: <  */		\
> +		      ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
> +		      /* v22: Vector permute mask.  */			\
> +		      ".short 0x0100,0x0302,0x0504,0x0706\n\t"		\
> +		      ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"		\
> +		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
> +		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
> +		      "sll %[R_TMP],1\n\t"				\
> +		      "lgr %[R_TMP2],%[R_TMP]\n\t"			\
> +		      "ahi %[R_TMP],-1\n\t"				\
> +		      "jl 20f\n\t"					\
> +		      "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
> +		      "ahi %[R_TMP],-16\n\t"				\
> +		      "jl 19f\n\t"					\
> +		      "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"		\
> +		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
> +		      "20:\n\t"						\
> +		      ".machine pop"					\
> +		      : /* outputs */ [R_OUT] "+a" (outptr)		\
> +			, [R_IN] "+a" (inptr)				\
> +			, [R_TMP] "=a" (tmp)				\
> +			, [R_TMP2] "=a" (tmp2)				\
> +			, [R_LEN] "+d" (len)				\
> +		      : /* inputs */					\
> +		      : /* clobber list*/ "memory", "cc"		\
> +			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +			ASM_CLOBBER_VR ("v22")				\
> +		      );						\
> +    if (len > 0)							\
> +      {									\
> +	/* Found an invalid character at next input-char.  */		\
> +	BODY_ORIG_ERROR							\
> +      }									\
> +  }
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +# include <iconv/skeleton.c>
> +# undef BODY_ORIG
> +# undef BODY_ORIG_ERROR
> +ICONV_VX_IFUNC (__gconv_transform_ucs2reverse_internal)
> +
> +/* Convert from the internal (UCS4-like) format to UCS2.  */
> +#define DEFINE_INIT		0
> +#define DEFINE_FINI		0
> +#define MIN_NEEDED_FROM		4
> +#define MIN_NEEDED_TO		2
> +#define FROM_DIRECTION		1
> +#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2_loop)
> +#define TO_LOOP			ICONV_VX_NAME (internal_ucs2_loop) /* This is not used.  */
> +#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2)
> +#define ONE_DIRECTION		1
> +
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			FROM_LOOP
> +#define BODY_ORIG							\
> +  {									\
> +    uint32_t val = *((const uint32_t *) inptr);				\
> +									\
> +    if (__glibc_unlikely (val >= 0x10000))				\
> +      {									\
> +	UNICODE_TAG_HANDLER (val, 4);					\
> +	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
> +      }									\
> +    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
> +      {									\
> +	/* Surrogate characters in UCS-4 input are not valid.		\
> +	   We must catch this, because the UCS-2 output might be	\
> +	   interpreted as UTF-16 by other programs.  If we let		\
> +	   surrogates pass through, attackers could make a security	\
> +	   hole exploit by synthesizing any desired plane 1-16		\
> +	   character.  */						\
> +	result = __GCONV_ILLEGAL_INPUT;					\
> +	if (! ignore_errors_p ())					\
> +	  break;							\
> +	inptr += 4;							\
> +	++*irreversible;						\
> +	continue;							\
> +      }									\
> +    else								\
> +      {									\
> +	put16 (outptr, val);						\
> +	outptr += sizeof (uint16_t);					\
> +	inptr += 4;							\
> +      }									\
> +  }
> +# define BODY								\
> +  {									\
> +    if (__builtin_expect (inend - inptr < 32, 1)			\
> +	|| outend - outptr < 16)					\
> +      /* Convert remaining bytes with c code.  */			\
> +      BODY_ORIG								\
> +    else								\
> +      {									\
> +	/* Convert in 32 byte blocks.  */				\
> +	size_t loop_count = (inend - inptr) / 32;			\
> +	size_t tmp, tmp2;						\
> +	if (loop_count > (outend - outptr) / 16)			\
> +	  loop_count = (outend - outptr) / 16;				\
> +	__asm__ volatile (".machine push\n\t"				\
> +			  ".machine \"z13\"\n\t"			\
> +			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
> +			  CONVERT_32BIT_SIZE_T ([R_LI])			\
> +			  "larl %[R_I],3f\n\t"				\
> +			  "vlm %%v20,%%v23,0(%[R_I])\n\t"		\
> +			  "0:\n\t"					\
> +			  "vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
> +			  /* Shorten UCS4 to UCS2.  */			\
> +			  "vpkf %%v18,%%v16,%%v17\n\t"			\
> +			  "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"		\
> +			  "jno 11f\n\t"					\
> +			  "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
> +			  "jno 10f\n\t"					\
> +			  /* Store 16bytes to buf_out.  */		\
> +			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
> +			  "la %[R_IN],32(%[R_IN])\n\t"			\
> +			  "la %[R_OUT],16(%[R_OUT])\n\t"		\
> +			  "brctg %[R_LI],0b\n\t"			\
> +			  "j 20f\n\t"					\
> +			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
> +			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
> +			  ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
> +			  /* Setup to check for ch >= 0xe000		\
> +			     && ch < 0x10000. (v22,v23)  */		\
> +			  ".long 0xe000,0x10000,0x0,0x0\n\t"		\
> +			  ".long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
> +			  /* v16 contains only valid chars. Check in v17: \
> +			     ch >= 0xe000 && ch <= 0xffff.  */		\
> +			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
> +			  "jo 2b\n\t" /* All ch's in this range, proceed.   */ \
> +			  "lhi %[R_TMP],16\n\t"				\
> +			  "j 12f\n\t"					\
> +			  /* Maybe v16 contains invalid chars.		\
> +			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
> +			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
> +			  "jo 1b\n\t" /* All ch's in this range, proceed.   */ \
> +			  "lhi %[R_TMP],0\n\t"				\
> +			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
> +			  "agr %[R_I],%[R_TMP]\n\t"			\
> +			  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
> +			  "srl %[R_I],1\n\t"				\
> +			  "ahi %[R_I],-1\n\t"				\
> +			  "jl 20f\n\t"					\
> +			  "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"		\
> +			  "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
> +			  "20:\n\t"					\
> +			  ".machine pop"				\
> +			  : /* outputs */ [R_OUT] "+a" (outptr)		\
> +			    , [R_IN] "+a" (inptr)			\
> +			    , [R_LI] "+d" (loop_count)			\
> +			    , [R_I] "=a" (tmp2)				\
> +			    , [R_TMP] "=d" (tmp)			\
> +			  : /* inputs */				\
> +			  : /* clobber list*/ "memory", "cc"		\
> +			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
> +			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
> +			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
> +			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
> +			  );						\
> +	if (loop_count > 0)						\
> +	  {								\
> +	    /* Found an invalid character at next character.  */	\
> +	    BODY_ORIG							\
> +	  }								\
> +      }									\
> +  }
> +#define LOOP_NEED_FLAGS
> +#include <iconv/loop.c>
> +#include <iconv/skeleton.c>
> +# undef BODY_ORIG
> +ICONV_VX_IFUNC (__gconv_transform_internal_ucs2)
> +
> +/* Convert from the internal (UCS4-like) format to UCS2 in other endianness. */
> +#define DEFINE_INIT		0
> +#define DEFINE_FINI		0
> +#define MIN_NEEDED_FROM		4
> +#define MIN_NEEDED_TO		2
> +#define FROM_DIRECTION		1
> +#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2reverse_loop)
> +#define TO_LOOP			ICONV_VX_NAME (internal_ucs2reverse_loop)/* This is not used.*/
> +#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2reverse)
> +#define ONE_DIRECTION		1
> +
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			FROM_LOOP
> +#define BODY_ORIG							\
> +  {									\
> +    uint32_t val = *((const uint32_t *) inptr);				\
> +    if (__glibc_unlikely (val >= 0x10000))				\
> +      {									\
> +	UNICODE_TAG_HANDLER (val, 4);					\
> +	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
> +      }									\
> +    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
> +      {									\
> +	/* Surrogate characters in UCS-4 input are not valid.		\
> +	   We must catch this, because the UCS-2 output might be	\
> +	   interpreted as UTF-16 by other programs.  If we let		\
> +	   surrogates pass through, attackers could make a security	\
> +	   hole exploit by synthesizing any desired plane 1-16		\
> +	   character.  */						\
> +	if (! ignore_errors_p ())					\
> +	  {								\
> +	    result = __GCONV_ILLEGAL_INPUT;				\
> +	    break;							\
> +	  }								\
> +	inptr += 4;							\
> +	++*irreversible;						\
> +	continue;							\
> +      }									\
> +    else								\
> +      {									\
> +	put16 (outptr, bswap_16 (val));					\
> +	outptr += sizeof (uint16_t);					\
> +	inptr += 4;							\
> +      }									\
> +  }
> +# define BODY								\
> +  {									\
> +    if (__builtin_expect (inend - inptr < 32, 1)			\
> +	|| outend - outptr < 16)					\
> +      /* Convert remaining bytes with c code.  */			\
> +      BODY_ORIG								\
> +    else								\
> +      {									\
> +	/* Convert in 32 byte blocks.  */				\
> +	size_t loop_count = (inend - inptr) / 32;			\
> +	size_t tmp, tmp2;						\
> +	if (loop_count > (outend - outptr) / 16)			\
> +	  loop_count = (outend - outptr) / 16;				\
> +	__asm__ volatile (".machine push\n\t"				\
> +			  ".machine \"z13\"\n\t"			\
> +			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
> +			  CONVERT_32BIT_SIZE_T ([R_LI])			\
> +			  "larl %[R_I],3f\n\t"				\
> +			  "vlm %%v20,%%v24,0(%[R_I])\n\t"		\
> +			  "0:\n\t"					\
> +			  "vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
> +			  /* Shorten UCS4 to UCS2 and byteswap.  */	\
> +			  "vpkf %%v18,%%v16,%%v17\n\t"			\
> +			  "vperm %%v18,%%v18,%%v18,%%v24\n\t"		\
> +			  "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"		\
> +			  "jno 11f\n\t"					\
> +			  "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
> +			  "jno 10f\n\t"					\
> +			  /* Store 16bytes to buf_out.  */		\
> +			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
> +			  "la %[R_IN],32(%[R_IN])\n\t"			\
> +			  "la %[R_OUT],16(%[R_OUT])\n\t"		\
> +			  "brctg %[R_LI],0b\n\t"			\
> +			  "j 20f\n\t"					\
> +			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
> +			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
> +			  ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
> +			  /* Setup to check for ch >= 0xe000		\
> +			     && ch < 0x10000. (v22,v23)  */		\
> +			  ".long 0xe000,0x10000,0x0,0x0\n\t"		\
> +			  ".long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
> +			  /* Vector permute mask (v24)  */		\
> +			  ".short 0x0100,0x0302,0x0504,0x0706\n\t"	\
> +			  ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
> +			  /* v16 contains only valid chars. Check in v17: \
> +			     ch >= 0xe000 && ch <= 0xffff.  */		\
> +			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
> +			  "jo 2b\n\t" /* All ch's in this range, proceed.  */ \
> +			  "lhi %[R_TMP],16\n\t"				\
> +			  "j 12f\n\t"					\
> +			  /* Maybe v16 contains invalid chars.		\
> +			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
> +			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
> +			  "jo 1b\n\t" /* All ch's in this range, proceed.  */ \
> +			  "lhi %[R_TMP],0\n\t"				\
> +			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
> +			  "agr %[R_I],%[R_TMP]\n\t"			\
> +			  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
> +			  "srl %[R_I],1\n\t"				\
> +			  "ahi %[R_I],-1\n\t"				\
> +			  "jl 20f\n\t"					\
> +			  "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"		\
> +			  "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
> +			  "20:\n\t"					\
> +			  ".machine pop"				\
> +			  : /* outputs */ [R_OUT] "+a" (outptr)		\
> +			    , [R_IN] "+a" (inptr)			\
> +			    , [R_LI] "+d" (loop_count)			\
> +			    , [R_I] "=a" (tmp2)				\
> +			    , [R_TMP] "=d" (tmp)			\
> +			  : /* inputs */				\
> +			  : /* clobber list*/ "memory", "cc"		\
> +			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
> +			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
> +			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
> +			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
> +			    ASM_CLOBBER_VR ("v24")			\
> +			  );						\
> +	if (loop_count > 0)						\
> +	  {								\
> +	    /* Found an invalid character at next character.  */	\
> +	    BODY_ORIG							\
> +	  }								\
> +      }									\
> +  }
> +#define LOOP_NEED_FLAGS
> +#include <iconv/loop.c>
> +#include <iconv/skeleton.c>
> +# undef BODY_ORIG
> +ICONV_VX_IFUNC (__gconv_transform_internal_ucs2reverse)
> +
> +
> +#else
> +/* Generate the internal transformations without ifunc if build environment
> +   lacks vector support. Instead simply include the common version.  */
> +# include <iconv/gconv_simple.c>
> +#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
>

[-- Attachment #2: 0005-S390-Optimize-builtin-iconv-modules.patch --]
[-- Type: text/x-patch, Size: 47798 bytes --]

From e6fc53b88ae221b591840c9d0c4849efe64cb185 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Tue, 23 Feb 2016 09:27:45 +0100
Subject: [PATCH 05/14] S390: Optimize builtin iconv-modules.

This patch introduces an s390-specific gconv_simple.c file which provides
optimized versions for z13 with vector instructions, which are chosen at
runtime via ifunc.
The optimized conversions can convert between internal and ascii, ucs4, ucs4le,
ucs2, ucs2le.
If the build environment lacks vector support, then iconv/gconv_simple.c
is used without any change. Otherwise iconv/gconv_simple.c is also included to
create conversion loop routines without vector instructions as a fallback, in
case vector instructions aren't available at runtime.

ChangeLog:

	* sysdeps/s390/multiarch/gconv_simple.c: New File.
	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
---
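Note for readers unfamiliar with the scheme: below is a minimal, portable
sketch of the selection described in the commit message, assuming a plain C
variant and a block-wise variant of the same conversion plus a resolver that
picks one of them. All names in it (ascii_to_ucs4_c, ascii_to_ucs4_blocks,
resolve_ascii_to_ucs4 and its have_vector_facility argument) are hypothetical
and not part of this patch. The patch itself uses s390_vx_libc_ifunc and
inline assembly; the block-wise loop here only imitates the vector path in
portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: widen ASCII bytes to 32-bit units and return
   the number of bytes consumed.  Conversion stops before the first
   byte > 0x7f, so the caller can report the invalid position.  */
typedef size_t (*ascii_to_ucs4_fn) (const unsigned char *in, size_t len,
				    uint32_t *out);

/* Byte-at-a-time fallback (stands in for the _c variant).  */
static size_t
ascii_to_ucs4_c (const unsigned char *in, size_t len, uint32_t *out)
{
  assert (in != NULL && out != NULL);
  size_t i;
  for (i = 0; i < len && in[i] <= 0x7f; i++)
    out[i] = in[i];
  return i;
}

/* Block-wise variant (stands in for the _vx variant): test 16 bytes at
   once, widen them if all are valid, and leave the block containing an
   invalid byte as well as the tail to the scalar loop.  */
static size_t
ascii_to_ucs4_blocks (const unsigned char *in, size_t len, uint32_t *out)
{
  assert (in != NULL && out != NULL);
  size_t done = 0;
  while (len - done >= 16)
    {
      unsigned char any_high = 0;
      for (size_t j = 0; j < 16; j++)
	any_high |= in[done + j] & 0x80;
      if (any_high)
	break;
      for (size_t j = 0; j < 16; j++)
	out[done + j] = in[done + j];
      done += 16;
    }
  return done + ascii_to_ucs4_c (in + done, len - done, out + done);
}

/* Hypothetical resolver: the flag is an assumption of this sketch; a
   real ifunc resolver inspects the hardware capabilities instead.  */
static ascii_to_ucs4_fn
resolve_ascii_to_ucs4 (int have_vector_facility)
{
  return have_vector_facility ? ascii_to_ucs4_blocks : ascii_to_ucs4_c;
}
```

Both variants consume input up to, but not including, the first byte above
0x7f, so the caller sees the same stop position whichever one the resolver
returns.
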
 sysdeps/s390/multiarch/Makefile       |    4 +
 sysdeps/s390/multiarch/gconv_simple.c | 1266 +++++++++++++++++++++++++++++++++
 2 files changed, 1270 insertions(+)
 create mode 100644 sysdeps/s390/multiarch/gconv_simple.c

diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
index 0805b07..5067b6f 100644
--- a/sysdeps/s390/multiarch/Makefile
+++ b/sysdeps/s390/multiarch/Makefile
@@ -42,3 +42,7 @@ sysdep_routines += wcslen wcslen-vx wcslen-c \
 		   wmemset wmemset-vx wmemset-c \
 		   wmemcmp wmemcmp-vx wmemcmp-c
 endif
+
+ifeq ($(subdir),iconv)
+sysdep_routines += gconv_simple
+endif
diff --git a/sysdeps/s390/multiarch/gconv_simple.c b/sysdeps/s390/multiarch/gconv_simple.c
new file mode 100644
index 0000000..ab692f7
--- /dev/null
+++ b/sysdeps/s390/multiarch/gconv_simple.c
@@ -0,0 +1,1266 @@
+/* Simple transformations functions - s390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# include <ifunc-resolve.h>
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+# define ICONV_C_NAME(NAME) __##NAME##_c
+# define ICONV_VX_NAME(NAME) __##NAME##_vx
+# define ICONV_VX_IFUNC(FUNC)						\
+  extern __typeof (ICONV_C_NAME (FUNC)) __##FUNC;			\
+  s390_vx_libc_ifunc (__##FUNC)						\
+  int FUNC (struct __gconv_step *step, struct __gconv_step_data *data,	\
+	    const unsigned char **inptrp, const unsigned char *inend,	\
+	    unsigned char **outbufstart, size_t *irreversible,		\
+	    int do_flush, int consume_incomplete)			\
+  {									\
+    return __##FUNC (step, data, inptrp, inend,outbufstart,		\
+		     irreversible, do_flush, consume_incomplete);	\
+  }
+# define ICONV_VX_SINGLE(NAME)						\
+  static __typeof (NAME##_single) __##NAME##_vx_single __attribute__((alias(#NAME "_single")));
+
+/* Generate the transformations which are used, if the target machine does not
+   support vector instructions.  */
+# define __gconv_transform_ascii_internal		\
+  ICONV_C_NAME (__gconv_transform_ascii_internal)
+# define __gconv_transform_internal_ascii		\
+  ICONV_C_NAME (__gconv_transform_internal_ascii)
+# define __gconv_transform_internal_ucs4le		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs4le)
+# define __gconv_transform_ucs4_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4_internal)
+# define __gconv_transform_ucs4le_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4le_internal)
+# define __gconv_transform_ucs2_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2_internal)
+# define __gconv_transform_ucs2reverse_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2reverse_internal)
+# define __gconv_transform_internal_ucs2		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2)
+# define __gconv_transform_internal_ucs2reverse		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2reverse)
+
+
+# include <iconv/gconv_simple.c>
+
+# undef __gconv_transform_ascii_internal
+# undef __gconv_transform_internal_ascii
+# undef __gconv_transform_internal_ucs4le
+# undef __gconv_transform_ucs4_internal
+# undef __gconv_transform_ucs4le_internal
+# undef __gconv_transform_ucs2_internal
+# undef __gconv_transform_ucs2reverse_internal
+# undef __gconv_transform_internal_ucs2
+# undef __gconv_transform_internal_ucs2reverse
+
+/* Now define the functions with vector support.  */
+# if defined __s390x__
+#  define CONVERT_32BIT_SIZE_T(REG)
+# else
+#  define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+# endif
+
+/* Convert from ISO 646-IRV to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ascii_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ascii_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ascii_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+    /* The value is too large.  We don't try transliteration here since \
+       this is not an error because of the lack of possibilities to	\
+       represent the result.  This is a genuine bug in the input since	\
+       ASCII does not allow such values.  */				\
+    STANDARD_FROM_LOOP_ERR_HANDLER (1);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*inptr > '\x7f'))				\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */				\
+	*((uint32_t *) outptr) = *inptr++;				\
+	outptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = inend - inptr;						\
+    if (len > (outend - outptr) / 4)					\
+      len = (outend - outptr) / 4;					\
+    size_t loop_count, tmp;						\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		      "srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "vrepib %%v31,0x20\n\t"				\
+		      "clgije %[R_LI],0,1f\n\t"				\
+		      "0:\n\t" /* Handle 16-byte blocks.  */		\
+		      "vl %%v16,0(%[R_IN])\n\t"				\
+		      /* Checking for values > 0x7f.  */		\
+		      "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "jno 10f\n\t"					\
+		      /* Enlarge to UCS4.  */				\
+		      "vuplhb %%v17,%%v16\n\t"				\
+		      "vupllb %%v18,%%v16\n\t"				\
+		      "vuplhh %%v19,%%v17\n\t"				\
+		      "vupllh %%v20,%%v17\n\t"				\
+		      "vuplhh %%v21,%%v18\n\t"				\
+		      "vupllh %%v22,%%v18\n\t"				\
+		      /* Store 64bytes to buf_out.  */			\
+		      "vstm %%v19,%%v22,0(%[R_OUT])\n\t"		\
+		      "la %[R_IN],16(%[R_IN])\n\t"			\
+		      "la %[R_OUT],64(%[R_OUT])\n\t"			\
+		      "brctg %[R_LI],0b\n\t"				\
+		      "lghi %[R_LI],15\n\t"				\
+		      "ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: aghik %[R_LI],%[R_LEN],-1\n\t"		\
+		      "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      "vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      /* Checking for values > 0x7f.  */		\
+		      "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "clr %[R_TMP],%[R_LI]\n\t"			\
+		      "locrh %[R_TMP],%[R_LEN]\n\t"			\
+		      "locghih %[R_LEN],0\n\t"				\
+		      "j 12f\n\t"					\
+		      "10:\n\t"						\
+		      /* Found a value > 0x7f.				\
+			 Store the preceding chars.  */			\
+		      "vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "sllk %[R_TMP],%[R_TMP],2\n\t"			\
+		      "ahi %[R_TMP],-1\n\t"				\
+		      "jl 20f\n\t"					\
+		      "lgr %[R_LI],%[R_TMP]\n\t"			\
+		      "vuplhb %%v17,%%v16\n\t"				\
+		      "vuplhh %%v19,%%v17\n\t"				\
+		      "vstl %%v19,%[R_LI],0(%[R_OUT])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 11f\n\t"					\
+		      "vupllh %%v20,%%v17\n\t"				\
+		      "vstl %%v20,%[R_LI],16(%[R_OUT])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 11f\n\t"					\
+		      "vupllb %%v18,%%v16\n\t"				\
+		      "vuplhh %%v21,%%v18\n\t"				\
+		      "vstl %%v21,%[R_LI],32(%[R_OUT])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 11f\n\t"					\
+		      "vupllh %%v22,%%v18\n\t"				\
+		      "vstl %%v22,%[R_LI],48(%[R_OUT])\n\t"		\
+		      "11:\n\t"						\
+		      "la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t"		\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_TMP] "=a" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+			ASM_CLOBBER_VR ("v31")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at the next input byte.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
+
+/* Convert from the internal (UCS4-like) format to ISO 646-IRV.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		1
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ascii_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ascii_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ascii)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4);			\
+  STANDARD_TO_LOOP_ERR_HANDLER (4);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*((const uint32_t *) inptr) > 0x7f))		\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */				\
+	*outptr++ = *((const uint32_t *) inptr);			\
+	inptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = (inend - inptr) / 4;					\
+    if (len > outend - outptr)						\
+      len = outend - outptr;						\
+    size_t loop_count, tmp, tmp2;					\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch > 0x7f.  */		\
+		      "vzero %%v21\n\t"					\
+		      "srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		      "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		      "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		      "lghi %[R_TMP],0\n\t"				\
+		      "clgije %[R_LI],0,1f\n\t"				\
+		      "0:\n\t"						\
+		      "vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		      /* Shorten to byte values.  */			\
+		      "vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "vpkh %%v23,%%v23,%%v24\n\t"			\
+		      /* Checking for values > 0x7f.  */		\
+		      "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "jno 10f\n\t"					\
+		      "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "jno 11f\n\t"					\
+		      "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "jno 12f\n\t"					\
+		      "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "jno 13f\n\t"					\
+		      /* Store 16bytes to outptr.  */			\
+		      "vst %%v23,0(%[R_OUT])\n\t"			\
+		      "la %[R_IN],64(%[R_IN])\n\t"			\
+		      "la %[R_OUT],16(%[R_OUT])\n\t"			\
+		      "brctg %[R_LI],0b\n\t"				\
+		      "lghi %[R_LI],15\n\t"				\
+		      "ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "aghi %[R_LI],-1\n\t"				\
+		      "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Load remaining 1...63 bytes.  */		\
+		      "vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 2f\n\t"					\
+		      "vll %%v17,%[R_LI],16(%[R_IN])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 2f\n\t"					\
+		      "vll %%v18,%[R_LI],32(%[R_IN])\n\t"		\
+		      "ahi %[R_LI],-16\n\t"				\
+		      "jl 2f\n\t"					\
+		      "vll %%v19,%[R_LI],48(%[R_IN])\n\t"		\
+		      "2:\n\t"						\
+		      /* Shorten to byte values.  */			\
+		      "vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "vpkh %%v23,%%v23,%%v24\n\t"			\
+		      "sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "aghi %[R_LI],-16\n\t"				\
+		      "jl 3f\n\t" /* v16 is not fully loaded.  */	\
+		      "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "jno 10f\n\t"					\
+		      "aghi %[R_LI],-16\n\t"				\
+		      "jl 4f\n\t" /* v17 is not fully loaded.  */	\
+		      "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "jno 11f\n\t"					\
+		      "aghi %[R_LI],-16\n\t"				\
+		      "jl 5f\n\t" /* v18 is not fully loaded.  */	\
+		      "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "jno 12f\n\t"					\
+		      "aghi %[R_LI],-16\n\t"				\
+		      /* v19 is not fully loaded. */			\
+		      "lghi %[R_TMP],12\n\t"				\
+		      "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "6: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "aghi %[R_LI],16\n\t"				\
+		      "clrjl %[R_I],%[R_LI],14f\n\t"			\
+		      "lgr %[R_I],%[R_LEN]\n\t"				\
+		      "lghi %[R_LEN],0\n\t"				\
+		      "j 15f\n\t"					\
+		      "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "j 6b\n\t"					\
+		      "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "lghi %[R_TMP],4\n\t"				\
+		      "j 6b\n\t"					\
+		      "5: vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "lghi %[R_TMP],8\n\t"				\
+		      "j 6b\n\t"					\
+		      /* Found a value > 0x7f.  */			\
+		      "13: ahi %[R_TMP],4\n\t"				\
+		      "12: ahi %[R_TMP],4\n\t"				\
+		      "11: ahi %[R_TMP],4\n\t"				\
+		      "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "14: srlg %[R_I],%[R_I],2\n\t"			\
+		      "agr %[R_I],%[R_TMP]\n\t"				\
+		      "je 20f\n\t"					\
+		      /* Store characters before invalid one...  */	\
+		      "15: aghi %[R_I],-1\n\t"				\
+		      "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		      /* ... and update pointers.  */			\
+		      "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+		      "sllg %[R_I],%[R_I],2\n\t"			\
+		      "la %[R_IN],4(%[R_I],%[R_IN])\n\t"		\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_I] "=a" (tmp2)				\
+			, [R_TMP] "=d" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+			ASM_CLOBBER_VR ("v24")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character > 0x7f at next character.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
+
+
+/* Convert from internal UCS4 to UCS4 little endian form.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ucs4le_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ucs4le_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs4le)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (internal_ucs4le_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len = MIN (inend - inptr, outend - outptr) / 4;
+  size_t loop_count;
+  __asm__ volatile (".machine push\n\t"
+		    ".machine \"z13\"\n\t"
+		    ".machinemode \"zarch_nohighgprs\"\n\t"
+		    CONVERT_32BIT_SIZE_T ([R_LEN])
+		    "bras %[R_LI],1f\n\t"
+		    /* Vector permute mask:  */
+		    ".long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
+		    "1: vl %%v20,0(%[R_LI])\n\t"
+		    /* Process 64byte (16char) blocks.  */
+		    "srlg %[R_LI],%[R_LEN],4\n\t"
+		    "clgije %[R_LI],0,10f\n\t"
+		    "0: vlm %%v16,%%v19,0(%[R_IN])\n\t"
+		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "vperm %%v17,%%v17,%%v17,%%v20\n\t"
+		    "vperm %%v18,%%v18,%%v18,%%v20\n\t"
+		    "vperm %%v19,%%v19,%%v19,%%v20\n\t"
+		    "vstm %%v16,%%v19,0(%[R_OUT])\n\t"
+		    "la %[R_IN],64(%[R_IN])\n\t"
+		    "la %[R_OUT],64(%[R_OUT])\n\t"
+		    "brctg %[R_LI],0b\n\t"
+		    "llgfr %[R_LEN],%[R_LEN]\n\t"
+		    "nilf %[R_LEN],15\n\t"
+		    /* Process 16byte (4char) blocks.  */
+		    "10: srlg %[R_LI],%[R_LEN],2\n\t"
+		    "clgije %[R_LI],0,20f\n\t"
+		    "11: vl %%v16,0(%[R_IN])\n\t"
+		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "vst %%v16,0(%[R_OUT])\n\t"
+		    "la %[R_IN],16(%[R_IN])\n\t"
+		    "la %[R_OUT],16(%[R_OUT])\n\t"
+		    "brctg %[R_LI],11b\n\t"
+		    "nill %[R_LEN],3\n\t"
+		    /* Process <16bytes.  */
+		    "20: sll %[R_LEN],2\n\t"
+		    "ahi %[R_LEN],-1\n\t"
+		    "jl 30f\n\t"
+		    "vll %%v16,%[R_LEN],0(%[R_IN])\n\t"
+		    "vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "vstl %%v16,%[R_LEN],0(%[R_OUT])\n\t"
+		    "la %[R_IN],1(%[R_LEN],%[R_IN])\n\t"
+		    "la %[R_OUT],1(%[R_LEN],%[R_OUT])\n\t"
+		    "30: \n\t"
+		    ".machine pop"
+		    : /* outputs */ [R_OUT] "+a" (outptr)
+		      , [R_IN] "+a" (inptr)
+		      , [R_LI] "=a" (loop_count)
+		      , [R_LEN] "+a" (len)
+		    : /* inputs */
+		    : /* clobber list*/ "memory", "cc"
+		      ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
+		      ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
+		      ASM_CLOBBER_VR ("v20")
+		    );
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (internal_ucs4le_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs4le)
+
+
+/* Transform from UCS4 to the internal, UCS4-like format.  Unlike
+   for the other direction we have to check for correct values here.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4_internal)
+# define ONE_DIRECTION		0
+
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4_internal_loop) (struct __gconv_step *step,
+				    struct __gconv_step_data *step_data,
+				    const unsigned char **inptrp,
+				    const unsigned char *inend,
+				    unsigned char **outptrp,
+				    unsigned char *outend,
+				    size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"larl %[R_LI],9f\n\t"
+			"vlm %%v20,%%v21,0(%[R_LI])\n\t"
+			"srlg %[R_LI],%[R_LEN],2\n\t"
+			"clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0: vl %%v16,0(%[R_IN])\n\t"
+			"vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"jno 10f\n\t"
+			"vst %%v16,0(%[R_OUT])\n\t"
+			"la %[R_IN],16(%[R_IN])\n\t"
+			"la %[R_OUT],16(%[R_OUT])\n\t"
+			"brctg %[R_LI],0b\n\t"
+			"llgfr %[R_LEN],%[R_LEN]\n\t"
+			"nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1: sll %[R_LEN],2\n\t"
+			"ahik %[R_LI],%[R_LEN],-1\n\t"
+			"jl 20f\n\t" /* No further bytes available.  */
+			"vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"vlgvb %[R_LI],%%v22,7\n\t"
+			"clr %[R_LI],%[R_LEN]\n\t"
+			"locgrhe %[R_LI],%[R_LEN]\n\t"
+			"locghihe %[R_LEN],0\n\t"
+			"j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			".long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v22,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"jl 20f\n\t"
+			"vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (ucs4_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4_internal)
+
+
+/* Transform from UCS4-LE to the internal encoding.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4le_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4le_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4le_internal)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4le_internal_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"larl %[R_LI],9f\n\t"
+			"vlm %%v20,%%v22,0(%[R_LI])\n\t"
+			"srlg %[R_LI],%[R_LEN],2\n\t"
+			"clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0: vl %%v16,0(%[R_IN])\n\t"
+			"vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"jno 10f\n\t"
+			"vst %%v16,0(%[R_OUT])\n\t"
+			"la %[R_IN],16(%[R_IN])\n\t"
+			"la %[R_OUT],16(%[R_OUT])\n\t"
+			"brctg %[R_LI],0b\n\t"
+			"llgfr %[R_LEN],%[R_LEN]\n\t"
+			"nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1: sll %[R_LEN],2\n\t"
+			"ahik %[R_LI],%[R_LEN],-1\n\t"
+			"jl 20f\n\t" /* No further bytes available.  */
+			"vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"vlgvb %[R_LI],%%v23,7\n\t"
+			"clr %[R_LI],%[R_LEN]\n\t"
+			"locgrhe %[R_LI],%[R_LEN]\n\t"
+			"locghihe %[R_LEN],0\n\t"
+			"j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			".long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* v22: Vector permute mask.  */
+			".long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v23,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"jl 20f\n\t"
+			"vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			  ASM_CLOBBER_VR ("v23")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*inptrp + 4 > inend)
+    result = __GCONV_INCOMPLETE_INPUT;
+  else
+    {
+      assert (*outptrp + 4 > outend);
+      result = __GCONV_FULL_OUTPUT;
+    }
+
+  return result;
+}
+ICONV_VX_SINGLE (ucs4le_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4le_internal)
+
+/* Convert from UCS2 to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  STANDARD_FROM_LOOP_ERR_HANDLER (2);
+# define BODY_ORIG							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "larl %[R_TMP],9f\n\t"				\
+		      "vlm %%v20,%%v21,0(%[R_TMP])\n\t"			\
+		      "srlg %[R_TMP],%[R_LEN],3\n\t"			\
+		      "clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0: vl %%v16,0(%[R_IN])\n\t"			\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "jno 10f\n\t"					\
+		      /* Store 32bytes to buf_out.  */			\
+		      "vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "la %[R_IN],16(%[R_IN])\n\t"			\
+		      "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		      "brctg %[R_TMP],0b\n\t"				\
+		      "llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1: sll %[R_LEN],1\n\t"				\
+		      "ahik %[R_TMP],%[R_LEN],-1\n\t"			\
+		      "jl 20f\n\t" /* No further bytes available.  */	\
+		      "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "locgrhe %[R_TMP],%[R_LEN]\n\t"			\
+		      "locghihe %[R_LEN],0\n\t"				\
+		      "j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: =>; element 1: <  */		\
+		      ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "sll %[R_TMP],1\n\t"				\
+		      "lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "ahi %[R_TMP],-1\n\t"				\
+		      "jl 20f\n\t"					\
+		      "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "ahi %[R_TMP],-16\n\t"				\
+		      "jl 19f\n\t"					\
+		      "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"		\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2_internal)
+
+/* Convert from UCS2 in other endianness to the internal (UCS4-like) format. */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop) /* This is not used.*/
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2reverse_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  if (! ignore_errors_p ())						\
+    {									\
+      result = __GCONV_ILLEGAL_INPUT;					\
+      break;								\
+    }									\
+  inptr += 2;								\
+  ++*irreversible;							\
+  continue;
+
+# define BODY_ORIG \
+  {									\
+    uint16_t u1 = bswap_16 (get16 (inptr));				\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "larl %[R_TMP],9f\n\t"				\
+		      "vlm %%v20,%%v22,0(%[R_TMP])\n\t"			\
+		      "srlg %[R_TMP],%[R_LEN],3\n\t"			\
+		      "clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0: vl %%v16,0(%[R_IN])\n\t"			\
+		      "vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "jno 10f\n\t"					\
+		      /* Store 32bytes to buf_out.  */			\
+		      "vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "la %[R_IN],16(%[R_IN])\n\t"			\
+		      "la %[R_OUT],32(%[R_OUT])\n\t"			\
+		      "brctg %[R_TMP],0b\n\t"				\
+		      "llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1: sll %[R_LEN],1\n\t"				\
+		      "ahik %[R_TMP],%[R_LEN],-1\n\t"			\
+		      "jl 20f\n\t" /* No further bytes available.  */	\
+		      "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "vuplhh %%v17,%%v16\n\t"				\
+		      "vupllh %%v18,%%v16\n\t"				\
+		      "vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "locgrhe %[R_TMP],%[R_LEN]\n\t"			\
+		      "locghihe %[R_LEN],0\n\t"				\
+		      "j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: =>; element 1: <  */		\
+		      ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v22: Vector permute mask.  */			\
+		      ".short 0x0100,0x0302,0x0504,0x0706\n\t"		\
+		      ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"		\
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "sll %[R_TMP],1\n\t"				\
+		      "lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "ahi %[R_TMP],-1\n\t"				\
+		      "jl 20f\n\t"					\
+		      "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "ahi %[R_TMP],-16\n\t"				\
+		      "jl 19f\n\t"					\
+		      "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"		\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2reverse_internal)
+
+/* Convert from the internal (UCS4-like) format to UCS2.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2_loop) /* This is not used.  */
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+									\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could make a security	\
+	   hole exploit by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	result = __GCONV_ILLEGAL_INPUT;					\
+	if (! ignore_errors_p ())					\
+	  break;							\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, val);						\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with C code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "larl %[R_I],3f\n\t"				\
+			  "vlm %%v20,%%v23,0(%[R_I])\n\t"		\
+			  "0:\n\t"					\
+			  "vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2.  */			\
+			  "vpkf %%v18,%%v16,%%v17\n\t"			\
+			  "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"		\
+			  "jno 11f\n\t"					\
+			  "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "jno 10f\n\t"					\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
+			  "la %[R_IN],32(%[R_IN])\n\t"			\
+			  "la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "brctg %[R_LI],0b\n\t"			\
+			  "j 20f\n\t"					\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  ".long 0xe000,0x10000,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "jo 2b\n\t" /* All ch's in this range, proceed.   */ \
+			  "lghi %[R_TMP],16\n\t"			\
+			  "j 12f\n\t"					\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "jo 1b\n\t" /* All ch's in this range, proceed.   */ \
+			  "lghi %[R_TMP],0\n\t"				\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "agr %[R_I],%[R_TMP]\n\t"			\
+			  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			  "srl %[R_I],1\n\t"				\
+			  "ahi %[R_I],-1\n\t"				\
+			  "jl 20f\n\t"					\
+			  "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"		\
+			  "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2)
+
+/* Convert from the internal (UCS4-like) format to UCS2 in other endianness. */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2reverse_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2reverse_loop)/* This is not used.*/
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2reverse)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could make a security	\
+	   hole exploit by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	if (! ignore_errors_p ())					\
+	  {								\
+	    result = __GCONV_ILLEGAL_INPUT;				\
+	    break;							\
+	  }								\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, bswap_16 (val));					\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with C code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "larl %[R_I],3f\n\t"				\
+			  "vlm %%v20,%%v24,0(%[R_I])\n\t"		\
+			  "0:\n\t"					\
+			  "vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2 and byteswap.  */	\
+			  "vpkf %%v18,%%v16,%%v17\n\t"			\
+			  "vperm %%v18,%%v18,%%v18,%%v24\n\t"		\
+			  "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"		\
+			  "jno 11f\n\t"					\
+			  "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "jno 10f\n\t"					\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
+			  "la %[R_IN],32(%[R_IN])\n\t"			\
+			  "la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "brctg %[R_LI],0b\n\t"			\
+			  "j 20f\n\t"					\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  ".long 0xe000,0x10000,0x0,0x0\n\t"		\
+			  ".long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* Vector permute mask (v24)  */		\
+			  ".short 0x0100,0x0302,0x0504,0x0706\n\t"	\
+			  ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "jo 2b\n\t" /* All ch's in this range, proceed.  */ \
+			  "lghi %[R_TMP],16\n\t"			\
+			  "j 12f\n\t"					\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "jo 1b\n\t" /* All ch's in this range, proceed.  */ \
+			  "lghi %[R_TMP],0\n\t"				\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "agr %[R_I],%[R_TMP]\n\t"			\
+			  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+			  "srl %[R_I],1\n\t"				\
+			  "ahi %[R_I],-1\n\t"				\
+			  "jl 20f\n\t"					\
+			  "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"		\
+			  "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			    ASM_CLOBBER_VR ("v24")			\
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2reverse)
+
+
+#else
+/* Generate the internal transformations without ifunc if build environment
+   lacks vector support. Instead simply include the common version.  */
+# include <iconv/gconv_simple.c>
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
-- 
2.3.0



* Re: [PATCH 13/14] Fix ucs4le_internal_loop in error case.
  2016-02-25  9:00     ` Stefan Liebler
@ 2016-03-18 13:04       ` Stefan Liebler
  2016-03-31  9:20         ` Stefan Liebler
  2016-03-31  9:45       ` Andreas Schwab
  1 sibling, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-03-18 13:04 UTC (permalink / raw)
  To: libc-alpha; +Cc: Joseph S. Myers

On 02/25/2016 09:58 AM, Stefan Liebler wrote:
> On 02/23/2016 06:41 PM, Joseph Myers wrote:
>> If this is user-visible in a release, there should be a bug filed in
>> Bugzilla (if there isn't one already open), and a testcase added to the
>> testsuite.
>>
> okay.
>
> I've filed the bug
> "Bug 19726 - Converting UCS4LE to INTERNAL with iconv() does not update
> pointers and lengths in error-case."
> (https://sourceware.org/bugzilla/show_bug.cgi?id=19726)
>
> This patch also adds a new testcase for this issue.
> The new test was tested on a s390, power, intel machine.
Is the previously attached test-case okay?
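
For readers following the thread, the behaviour described in bug 19726 can
be checked with a small sketch along the following lines. This is not the
test attached to the patch; the helper name ucs4le_consumed and the choice
of UTF-8 as the target charset are assumptions made for illustration only.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): convert LEN bytes of
   UCS-4LE input to UTF-8 and return how many input bytes iconv() reports
   as consumed, i.e. how far the inbuf pointer was advanced.  *ERR is set
   to the errno of the failing call, or to 0 on success.  Returns
   (size_t) -1 if the conversion descriptor cannot be opened.  */
static size_t
ucs4le_consumed (const char *in, size_t len, int *err)
{
  char out[64];
  char *inptr = (char *) in;
  size_t inlen = len;
  char *outptr = out;
  size_t outlen = sizeof out;

  iconv_t cd = iconv_open ("UTF-8", "UCS-4LE");
  if (cd == (iconv_t) -1)
    {
      *err = errno;
      return (size_t) -1;
    }

  errno = 0;
  size_t n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
  *err = (n == (size_t) -1) ? errno : 0;
  iconv_close (cd);

  /* After an EILSEQ error, inptr is expected to point at the invalid
     character, not at the beginning of the input buffer.  */
  return (size_t) (inptr - in);
}
```

With the fix applied, one valid character followed by the out-of-range
value 0x80000000 should yield EILSEQ with 4 bytes consumed, so that inbuf
points at the invalid character.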

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-02-25 12:57     ` Stefan Liebler
@ 2016-03-18 13:05       ` Stefan Liebler
  2016-03-22 14:39         ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-03-18 13:05 UTC (permalink / raw)
  To: libc-alpha; +Cc: Joseph S. Myers

On 02/25/2016 09:58 AM, Stefan Liebler wrote:
> On 02/23/2016 06:42 PM, Joseph Myers wrote:
>> If this is user-visible in a release, there should be a bug filed in
>> Bugzilla (if there isn't one already open), and a testcase added to the
>> testsuite.
>>
> okay.
>
> I've filed the bug
> "Bug 19727 - Converting from/to UTF-xx with iconv() does not always
> report errors on UTF-16 surrogates values."
> (https://sourceware.org/bugzilla/show_bug.cgi?id=19727)
>
> This patch also adds a new testcase, which checks UTF conversions with
> input values in range of UTF16 surrogates. The test converts from
> UTF-xx to INTERNAL, INTERNAL to UTF-xx and directly between
> UTF-xx to UTF-yy. The latter conversion is needed because s390 has
> iconv-modules, which converts from/to UTF in one step.
> The new testcase was tested on a s390, power and intel machine.
Is the previously attached test-case okay?
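
The surrogate rules that the testcase exercises can be summarized in a
short sketch. It is a standalone illustration, not the code in
iconvdata/utf-16.c; the names utf16_decode_one and enum utf16_status are
hypothetical. A truncated pair is reported separately from an illegal
one, which corresponds to the EINVAL versus EILSEQ expectations in the
test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch.  */
enum utf16_status { UTF16_OK, UTF16_INCOMPLETE, UTF16_ILLEGAL };

/* Decode one code point from N UTF-16 code units (already in host byte
   order).  On success store the code point in *CP and the number of
   units used in *USED.  */
static enum utf16_status
utf16_decode_one (const uint16_t *in, size_t n, uint32_t *cp, size_t *used)
{
  if (n == 0)
    return UTF16_INCOMPLETE;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the unit is the code point itself.  */
      *cp = u1;
      *used = 1;
      return UTF16_OK;
    }

  /* A low surrogate (0xdc00..0xdfff) is no valid first word.  */
  if (u1 >= 0xdc00)
    return UTF16_ILLEGAL;

  /* High surrogate: a second unit is required.  */
  if (n < 2)
    return UTF16_INCOMPLETE;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return UTF16_ILLEGAL;

  *cp = 0x10000 + (((uint32_t) u1 - 0xd800) << 10) + (u2 - 0xdc00);
  *used = 2;
  return UTF16_OK;
}
```

The check on u1 before looking at the second unit is the part that the
patch adds to the UTF-16 module; without it, two low surrogates in a row
would be combined into a bogus code point.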

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-03-18 13:05       ` Stefan Liebler
@ 2016-03-22 14:39         ` Stefan Liebler
  2016-03-31  9:18           ` Stefan Liebler
  2016-04-07 15:18           ` Andreas Schwab
  0 siblings, 2 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-03-22 14:39 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

On 03/18/2016 02:04 PM, Stefan Liebler wrote:
> Is the previously attached test-case okay?

Hi,
here is one further update regarding the new test "iconv/tst-iconv7.c".
If "make check" is executed directly after building without the 
installation step, then <build-dir>/iconvdata/gconv-modules hasn't been 
generated yet and the system gconv-modules-file/iconv-modules are used 
instead - if available.
The test needs gconv-modules and the UTF-16|32 modules as prerequisites.
Thus I've moved it to iconvdata/bug-iconv12.c and added
the prerequisites in iconvdata/Makefile.

Ok, to commit?

ChangeLog:

	[BZ #19727]
	* iconvdata/utf-16.c (BODY):
	Report an error if first word is not a
	valid high surrogate.
	* iconvdata/utf-32.c (BODY):
	Report an error if the value is in range
	of an utf16 surrogate.
	* iconv/gconv_simple.c (BODY): Likewise.
	* iconvdata/bug-iconv12.c: New file.
	* iconvdata/Makefile (tests): Add bug-iconv12.

[-- Attachment #2: 0014-Fix-UTF-16-surrogate-handling.-BZ-19727.patch --]
[-- Type: text/x-patch, Size: 15675 bytes --]

From d3b5f8fe14a719dbf13e83ee3946395c73c55766 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Tue, 23 Feb 2016 09:27:46 +0100
Subject: [PATCH 14/14] Fix UTF-16 surrogate handling. [BZ #19727]

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in range of an utf16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
Thus this patch fixes this behaviour for converting from utf32 to internal and
from internal to utf8.

Furthermore, the conversion from utf16 to internal does not report an error if
the input stream consists of two low-surrogate values. If a uint16_t value is in
the range 0xd800 .. 0xdfff, the next uint16_t value is checked to see whether it
is in the range of a low surrogate (0xdc00 .. 0xdfff). Afterwards these two
uint16_t values are interpreted as a high- and low-surrogate pair. But there is
no test whether the first uint16_t value is really in the range of a high
surrogate (0xd800 .. 0xdbff). If both uint16_t values are in the range of a low
surrogate, they are treated as a valid high- and low-surrogate pair.
This patch adds this test.

This patch also adds a new testcase, which checks UTF conversions with input
values in range of UTF16 surrogates. The test converts from UTF-xx to INTERNAL,
INTERNAL to UTF-xx and directly between UTF-xx to UTF-yy. The latter conversion
is needed because s390 has iconv-modules, which converts from/to UTF in one step.
The new testcase was tested on a s390, power and intel machine.

ChangeLog:

	[BZ #19727]
	* iconvdata/utf-16.c (BODY): Report an error if first word is not a
	valid high surrogate.
	* iconvdata/utf-32.c (BODY): Report an error if the value is in range
	of an utf16 surrogate.
	* iconv/gconv_simple.c (BODY): Likewise.
	* iconvdata/bug-iconv12.c: New file.
	* iconvdata/Makefile (tests): Add bug-iconv12.

rename test
---
 iconv/gconv_simple.c    |   3 +-
 iconvdata/Makefile      |   4 +-
 iconvdata/bug-iconv12.c | 263 ++++++++++++++++++++++++++++++++++++++++++++++++
 iconvdata/utf-16.c      |  12 +++
 iconvdata/utf-32.c      |   2 +-
 5 files changed, 281 insertions(+), 3 deletions(-)
 create mode 100644 iconvdata/bug-iconv12.c

diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index f66bf34..e5284e4 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -892,7 +892,8 @@ ucs4le_internal_loop_single (struct __gconv_step *step,
     if (__glibc_likely (wc < 0x80))					      \
       /* It's an one byte sequence.  */					      \
       *outptr++ = (unsigned char) wc;					      \
-    else if (__glibc_likely (wc <= 0x7fffffff))				      \
+    else if (__glibc_likely (wc <= 0x7fffffff				      \
+			     && (wc < 0xd800 || wc > 0xdfff)))		      \
       {									      \
 	size_t step;							      \
 	unsigned char *start;						      \
diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index 1ac1a5c..8e59dd6 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -68,7 +68,7 @@ modules.so := $(addsuffix .so, $(modules))
 ifeq (yes,$(build-shared))
 tests = bug-iconv1 bug-iconv2 tst-loading tst-e2big tst-iconv4 bug-iconv4 \
 	tst-iconv6 bug-iconv5 bug-iconv6 tst-iconv7 bug-iconv8 bug-iconv9 \
-	bug-iconv10 bug-iconv11
+	bug-iconv10 bug-iconv11 bug-iconv12
 ifeq ($(have-thread-library),yes)
 tests += bug-iconv3
 endif
@@ -309,6 +309,8 @@ $(objpfx)tst-iconv7.out: $(objpfx)gconv-modules \
 			 $(addprefix $(objpfx),$(modules.so))
 $(objpfx)bug-iconv10.out: $(objpfx)gconv-modules \
 			  $(addprefix $(objpfx),$(modules.so))
+$(objpfx)bug-iconv12.out: $(objpfx)gconv-modules \
+			  $(addprefix $(objpfx),$(modules.so))
 
 $(objpfx)iconv-test.out: run-iconv-test.sh $(objpfx)gconv-modules \
 			 $(addprefix $(objpfx),$(modules.so)) \
diff --git a/iconvdata/bug-iconv12.c b/iconvdata/bug-iconv12.c
new file mode 100644
index 0000000..8c748e8
--- /dev/null
+++ b/iconvdata/bug-iconv12.c
@@ -0,0 +1,263 @@
+/* bug 19727: Testing UTF conversions with UTF16 surrogates as input.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <inttypes.h>
+#include <iconv.h>
+#include <byteswap.h>
+
+static int
+run_conversion (const char *from, const char *to, char *inbuf, size_t inbuflen
+		, int exp_errno, int line)
+{
+  char outbuf[16];
+  iconv_t cd;
+  char *inptr;
+  size_t inlen;
+  char *outptr;
+  size_t outlen;
+  size_t n;
+  int e;
+  int fails = 0;
+
+  cd = iconv_open (to, from);
+  if (cd == (iconv_t) -1)
+    {
+      printf ("line %d: cannot convert from %s to %s: %m\n", line, from, to);
+      return 1;
+    }
+
+  inptr = (char *) inbuf;
+  inlen = inbuflen;
+  outptr = outbuf;
+  outlen = sizeof (outbuf);
+
+  errno = 0;
+  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
+  e = errno;
+
+  if (exp_errno == 0)
+    {
+      if (n == (size_t) -1)
+	{
+	  puts ("n should be >= 0, but n == -1");
+	  fails ++;
+	}
+
+      if (e != 0)
+	{
+	  printf ("errno should be 0: 'Success', but errno == %d: '%s'\n"
+		  , e, strerror(e));
+	  fails ++;
+	}
+    }
+  else
+    {
+      if (n != (size_t) -1)
+	{
+	  printf ("n should be -1, but n == %zu\n", n);
+	  fails ++;
+	}
+
+      if (e != exp_errno)
+	{
+	  printf ("errno should be %d: '%s', but errno == %d: '%s'\n"
+		  , exp_errno, strerror (exp_errno), e, strerror (e));
+	  fails ++;
+	}
+    }
+
+  iconv_close (cd);
+
+  if (fails > 0)
+    {
+      printf ("Errors in line %d while converting %s to %s.\n\n"
+	      , line, from, to);
+    }
+
+  return fails;
+}
+
+static int
+do_test (void)
+{
+  int fails = 0;
+  char buf[4];
+
+  /* This test runs iconv() with characters in the range of UTF-16 surrogates.
+     A UTF-16 high surrogate is in the range 0xD800..0xDBFF and
+     a UTF-16 low surrogate is in the range 0xDC00..0xDFFF.
+     Converting from or to UTF-xx has to report errors in those cases.
+     In UTF-16, a surrogate pair with a high surrogate in front of a low
+     surrogate is valid.  */
+
+  /* Use RUN_UCS4_UTF32_INPUT to test conversion ...
+
+     ... from INTERNAL to UTF-xx[LE|BE]:
+     Converting from UCS4 to UTF-xx[LE|BE] first converts UCS4 to INTERNAL
+     without checking for UTF-16 surrogate values
+     and then converts from INTERNAL to UTF-xx[LE|BE].
+     The latter conversion has to report an error in those cases.
+
+     ... from UTF-32[LE|BE] to INTERNAL:
+     Converting directly from UTF-32LE to UTF-8|16 is needed,
+     because e.g. s390x has iconv-modules which converts directly.  */
+#define RUN_UCS4_UTF32_INPUT(b0, b1, b2, b3, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  buf[3] = b3;								\
+  fails += run_conversion ("UCS4", "UTF-8", buf, 4, err, line);		\
+  fails += run_conversion ("UCS4", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-16BE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-32LE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-32BE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "WCHAR_T", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-8", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-16BE", buf, 4, err, line);	\
+  buf[0] = b3;								\
+  buf[1] = b2;								\
+  buf[2] = b1;								\
+  buf[3] = b0;								\
+  fails += run_conversion ("UTF-32LE", "WCHAR_T", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-8", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-16BE", buf, 4, err, line);
+
+  /* Use UCS4/UTF32 input of 0xD7FF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xD7, 0xFF, 0, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xD800.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xD8, 0x00, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDBFF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDB, 0xFF, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDC00.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDC, 0x00, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDFFF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDF, 0xFF, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xE000.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xE0, 0x00, 0, __LINE__);
+
+
+  /* Use RUN_UTF16_INPUT to test conversion from UTF16[LE|BE] to INTERNAL.
+     Converting directly from UTF-16 to UTF-8|32 is needed,
+     because e.g. s390x has iconv-modules which converts directly.
+     Use len == 2 or 4 to specify one or two UTF-16 characters.  */
+#define RUN_UTF16_INPUT(b0, b1, b2, b3, len, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  buf[3] = b3;								\
+  fails += run_conversion ("UTF-16BE", "WCHAR_T", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16BE", "UTF-8", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16BE", "UTF-32LE", buf, len, err, line); \
+  fails += run_conversion ("UTF-16BE", "UTF-32BE", buf, len, err, line); \
+  buf[0] = b1;								\
+  buf[1] = b0;								\
+  buf[2] = b3;								\
+  buf[3] = b2;								\
+  fails += run_conversion ("UTF-16LE", "WCHAR_T", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16LE", "UTF-8", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16LE", "UTF-32LE", buf, len, err, line); \
+  fails += run_conversion ("UTF-16LE", "UTF-32BE", buf, len, err, line);
+
+  /* Use UTF16 input of 0xD7FF.  */
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xD7, 0xFF, 4, 0, __LINE__);
+
+  /* Use [single] UTF16 high surrogate 0xD800 [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid low surrogate].  */
+  RUN_UTF16_INPUT (0xD8, 0x0, 0x0, 0x0, 2, EINVAL, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xD8, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xE0, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xDC, 0x0, 4, 0, __LINE__);
+
+  /* Use [single] UTF16 high surrogate 0xDBFF [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid low surrogate].  */
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0x0, 0x0, 2, EINVAL, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDB, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xE0, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDF, 0xFF, 4, 0, __LINE__);
+
+  /* Use single UTF16 low surrogate 0xDC00 [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid high surrogate].   */
+  RUN_UTF16_INPUT (0xDC, 0x0, 0x0, 0x0, 2, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDC, 0x0, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xDC, 0x0, 4, 0, __LINE__);
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDC, 0x0, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+
+  /* Use single UTF16 low surrogate 0xDFFF [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid high surrogate].   */
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0x0, 0x0, 2, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDF, 0xFF, 4, 0, __LINE__);
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+
+  /* Use UTF16 input of 0xE000.  */
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xE0, 0x0, 4, 0, __LINE__);
+
+
+  /* Use RUN_UTF8_3BYTE_INPUT to test conversion from UTF-8 to INTERNAL.
+     Converting directly from UTF-8 to UTF-16|32 is needed,
+     because e.g. s390x has iconv-modules which converts directly.  */
+#define RUN_UTF8_3BYTE_INPUT(b0, b1, b2, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  fails += run_conversion ("UTF-8", "WCHAR_T", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-16LE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-16BE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-32LE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-32BE", buf, 3, err, line);
+
+  /* Use UTF-8 input of 0xD7FF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0x9F, 0xBF, 0, __LINE__);
+
+  /* Use UTF-8 input of 0xD800.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xA0, 0x80, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDBFF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xAF, 0xBF, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDC00.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xB0, 0x80, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDFFF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xBF, 0xBF, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xF000.  */
+  RUN_UTF8_3BYTE_INPUT (0xEF, 0x80, 0x80, 0, __LINE__);
+
+  return fails > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
diff --git a/iconvdata/utf-16.c b/iconvdata/utf-16.c
index 2d74a13..dbbcd6d 100644
--- a/iconvdata/utf-16.c
+++ b/iconvdata/utf-16.c
@@ -295,6 +295,12 @@ gconv_end (struct __gconv_step *data)
 	  {								      \
 	    uint16_t u2;						      \
 									      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
@@ -329,6 +335,12 @@ gconv_end (struct __gconv_step *data)
 	  }								      \
 	else								      \
 	  {								      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
diff --git a/iconvdata/utf-32.c b/iconvdata/utf-32.c
index 0d6fe30..25f6fc6 100644
--- a/iconvdata/utf-32.c
+++ b/iconvdata/utf-32.c
@@ -239,7 +239,7 @@ gconv_end (struct __gconv_step *data)
     if (swap)								      \
       u1 = bswap_32 (u1);						      \
 									      \
-    if (__glibc_unlikely (u1 >= 0x110000))				      \
+    if (__glibc_unlikely (u1 >= 0x110000 || (u1 >= 0xd800 && u1 < 0xe000)))   \
       {									      \
 	/* This is illegal.  */						      \
 	STANDARD_FROM_LOOP_ERR_HANDLER (4);				      \
-- 
2.3.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-03-22 14:39         ` Stefan Liebler
@ 2016-03-31  9:18           ` Stefan Liebler
  2016-04-07 14:35             ` Stefan Liebler
  2016-04-07 15:18           ` Andreas Schwab
  1 sibling, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-03-31  9:18 UTC (permalink / raw)
  To: libc-alpha

ping

On 03/22/2016 03:38 PM, Stefan Liebler wrote:
> On 03/18/2016 02:04 PM, Stefan Liebler wrote:
>> Is the previously attached test-case okay?
>
> Hi,
> here is one further update regarding the new test "iconv/tst-iconv7.c".
> If "make check" is executed directly after building without the
> installation step, then <build-dir>/iconvdata/gconv-modules hasn't been
> generated yet and the system gconv-modules-file/iconv-modules are used
> instead - if available.
> The test needs gconv-modules and the UTF-16|32 modules as prerequisites.
> Thus I've moved it to iconvdata/bug-iconv12.c and added
> the prerequisites in iconvdata/Makefile.
>
> Ok, to commit?
>
> ChangeLog:
>
>      [BZ #19727]
>      * iconvdata/utf-16.c (BODY):
>      Report an error if first word is not a
>      valid high surrogate.
>      * iconvdata/utf-32.c (BODY):
>      Report an error if the value is in range
>      of an utf16 surrogate.
>      * iconv/gconv_simple.c (BODY): Likewise.
>      * iconvdata/bug-iconv12.c: New file.
>      * iconvdata/Makefile (tests): Add bug-iconv12.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 13/14] Fix ucs4le_internal_loop in error case.
  2016-03-18 13:04       ` Stefan Liebler
@ 2016-03-31  9:20         ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-03-31  9:20 UTC (permalink / raw)
  To: libc-alpha

ping

On 03/18/2016 02:04 PM, Stefan Liebler wrote:
> On 02/25/2016 09:58 AM, Stefan Liebler wrote:
>> On 02/23/2016 06:41 PM, Joseph Myers wrote:
>>> If this is user-visible in a release, there should be a bug filed in
>>> Bugzilla (if there isn't one already open), and a testcase added to the
>>> testsuite.
>>>
>> okay.
>>
>> I've filed the bug
>> "Bug 19726 - Converting UCS4LE to INTERNAL with iconv() does not update
>> pointers and lengths in error-case."
>> (https://sourceware.org/bugzilla/show_bug.cgi?id=19726)
>>
>> This patch also adds a new testcase for this issue.
>> The new test was tested on a s390, power, intel machine.
> Is the previously attached test-case okay?
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 13/14] Fix ucs4le_internal_loop in error case.
  2016-02-25  9:00     ` Stefan Liebler
  2016-03-18 13:04       ` Stefan Liebler
@ 2016-03-31  9:45       ` Andreas Schwab
  1 sibling, 0 replies; 55+ messages in thread
From: Andreas Schwab @ 2016-03-31  9:45 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

Stefan Liebler <stli@linux.vnet.ibm.com> writes:

> 	[BZ #19726]
> 	* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
> 	outptrp in case of an illegal input.
> 	* iconv/tst-iconv6.c: New file.
> 	* iconv/Makefile (tests): Add tst-iconv6.

Ok.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-03-31  9:18           ` Stefan Liebler
@ 2016-04-07 14:35             ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-07 14:35 UTC (permalink / raw)
  To: libc-alpha

ping

On 03/31/2016 11:18 AM, Stefan Liebler wrote:
> ping
>
> On 03/22/2016 03:38 PM, Stefan Liebler wrote:
>> On 03/18/2016 02:04 PM, Stefan Liebler wrote:
>>> Is the previously attached test-case okay?
>>
>> Hi,
>> here is one further update regarding the new test "iconv/tst-iconv7.c".
>> If "make check" is executed directly after building without the
>> installation step, then <build-dir>/iconvdata/gconv-modules hasn't been
>> generated yet and the system gconv-modules-file/iconv-modules are used
>> instead - if available.
>> The test needs gconv-modules and the UTF-16|32 modules as prerequisites.
>> Thus I've moved it to iconvdata/bug-iconv12.c and added
>> the prerequisites in iconvdata/Makefile.
>>
>> Ok, to commit?
>>
>> ChangeLog:
>>
>>      [BZ #19727]
>>      * iconvdata/utf-16.c (BODY):
>>      Report an error if first word is not a
>>      valid high surrogate.
>>      * iconvdata/utf-32.c (BODY):
>>      Report an error if the value is in range
>>      of an utf16 surrogate.
>>      * iconv/gconv_simple.c (BODY): Likewise.
>>      * iconvdata/bug-iconv12.c: New file.
>>      * iconvdata/Makefile (tests): Add bug-iconv12.
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 14/14] Fix UTF-16 surrogate handling.
  2016-03-22 14:39         ` Stefan Liebler
  2016-03-31  9:18           ` Stefan Liebler
@ 2016-04-07 15:18           ` Andreas Schwab
  1 sibling, 0 replies; 55+ messages in thread
From: Andreas Schwab @ 2016-04-07 15:18 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

Stefan Liebler <stli@linux.vnet.ibm.com> writes:

> +static int
> +run_conversion (const char *from, const char *to, char *inbuf, size_t inbuflen
> +		, int exp_errno, int line)

Linebreak after the comma, not before.  Ok with that change.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-02-23  9:21 ` [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules Stefan Liebler
@ 2016-04-14 14:16   ` Stefan Liebler
  2016-04-21 15:00     ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-04-14 14:16 UTC (permalink / raw)
  To: libc-alpha; +Cc: Joseph S. Myers

Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to 
commit?

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch introduces a way to provide an architecture dependent gconv-modules
> file. Before this patch, the gconv-modules file was normally installed from
> src-dir/iconvdata/gconv-modules. The S390 Makefile had overridden the
> installation recipe (with a make warning) in order to install the
> gconv-modules-s390 file from build-dir.
> The iconvdata/Makefile provides another recipe, which copies the gconv-modules
> file from the source to the build directory, where it is used by the testcases.
> Thus the testcases do not use the currently built s390-modules.
>
> This patch uses build-dir/iconvdata/gconv-modules for installation.
> If the makefile variable GCONV_MODULES is not defined, then the gconv-modules
> file is copied from the source to the build directory.
> If an architecture wants to create its own gconv-modules file, then the variable
> GCONV_MODULES is set to the name of the architecture-dependent gconv-modules file
> in the build directory, which has to be created by a recipe in
> sysdeps/.../Makefile. Then the iconvdata/Makefile copies this file to
> build-dir/iconvdata/gconv-modules, which will be used for installation and tests.
>
> This way, the s390 Makefile does not need to override the recipe for
> gconv-modules and no warning is emitted anymore.
>
> ChangeLog:
>
>      * iconvdata/Makefile (GCONV_MODULES): New variable, which can
>      be set by sysdeps Makefile.
>      ($(inst_gconvdir)/gconv-modules):
>      Install file from $(objpfx)gconv-modules.
>      ($(objpfx)gconv-modules): Copy File from src-dir or from
>      build-dir with file-name specified by GCONV_MODULES.
>      * sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
>      Deleted.
>      (GCONV_MODULES): New variable.
> ---
>   iconvdata/Makefile            | 15 +++++++++++++--
>   sysdeps/s390/s390-64/Makefile | 17 ++---------------
>   2 files changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/iconvdata/Makefile b/iconvdata/Makefile
> index 357530b..1ac1a5c 100644
> --- a/iconvdata/Makefile
> +++ b/iconvdata/Makefile
> @@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx), $(generated-modules:=.h))
>   $(addprefix $(inst_gconvdir)/, $(modules.so)): \
>       $(inst_gconvdir)/%: $(objpfx)% $(+force)
>   	$(do-install-program)
> -$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
> +$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
>   	$(do-install)
>   ifeq (no,$(cross-compiling))
>   # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
> @@ -332,6 +332,17 @@ tst-tables-clean:
>   	-rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
>
>   ifdef objpfx
> +# Override GCONV_MODULES file name and provide a Makefile recipe,
> +# if you want to create your own version.
> +ifndef GCONV_MODULES
> +# Copy gconv-modules from src-tree for tests and installation.
>   $(objpfx)gconv-modules: gconv-modules
> -	cp $^ $@
> +	cp $< $@
> +else
> +generated += $(GCONV_MODULES)
> +
> +# Copy overrided GCONV_MODULES file to gconv-modules for tests and installation.
> +$(objpfx)gconv-modules: $(objpfx)$(GCONV_MODULES)
> +	cp $< $@
> +endif
>   endif
> diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
> index ce4f0c5..de249a7 100644
> --- a/sysdeps/s390/s390-64/Makefile
> +++ b/sysdeps/s390/s390-64/Makefile
> @@ -39,7 +39,7 @@ $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
>   $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
>   	$(do-install-program)
>
> -$(objpfx)gconv-modules-s390: gconv-modules $(+force)
> +$(objpfx)gconv-modules-s390: gconv-modules
>   	cp $< $@
>   	echo >> $@
>   	echo "# S/390 hardware accelerated modules" >> $@
> @@ -74,19 +74,6 @@ $(objpfx)gconv-modules-s390: gconv-modules $(+force)
>   	echo -n "module	ISO-10646/UTF8/		UTF-16BE//	" >> $@
>   	echo "	UTF8_UTF16_Z9		1" >> $@
>
> -$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
> -	$(do-install)
> -ifeq (no,$(cross-compiling))
> -# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
> -# if this libc has more gconv modules than the previously installed one.
> -	if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
> -	   LC_ALL=C \
> -	   $(rtld-prefix) \
> -	   $(common-objpfx)iconv/iconvconfig \
> -	     $(addprefix --prefix=,$(install_root)); \
> -	fi
> -else
> -	@echo '*@*@*@ You should recreate $(inst_gconvdir)/gconv-modules.cache'
> -endif
> +GCONV_MODULES = gconv-modules-s390
>
>   endif
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones.
  2016-02-23  9:21 ` [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones Stefan Liebler
@ 2016-04-15 10:27   ` Florian Weimer
  2016-04-21 14:50     ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Florian Weimer @ 2016-04-15 10:27 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> +	${AWK} 'BEGIN { emitted = 0 } \
> +	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
> +	!emitted { emit_s390_modules(); emitted = 1; print; } \

I would like to suggest to put the awk script into a separate file (and 
reflect it in the makefile dependencies).

Florian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] S390: Optimize 8bit-generic iconv modules.
  2016-02-23  9:22 ` [PATCH 04/14] S390: Optimize 8bit-generic iconv modules Stefan Liebler
@ 2016-04-15 13:05   ` Florian Weimer
  2016-04-21 15:35     ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Florian Weimer @ 2016-04-15 13:05 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> +	 to the 1 byte generic character. If this table contains only up
> +	 to 256 entry, then the highest UCS4 value can be stored in 1 byte

“256 entries”? (spelling)

Why don't you compute the required table at compile time?  Then it can 
live in .rodata and does not have to end up in .bss.
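
A minimal sketch of what a table computed at compile time could look
like, assuming a purely hypothetical 8-bit charset (the mapping values
and the names from_ucs4 and lookup_byte below are made up for
illustration and are not taken from the patch):

```c
#include <stdint.h>

/* Hypothetical reverse table: index is a UCS4 value below 256, the
   element is the byte in the 8-bit charset (0 means "no mapping" in
   this sketch).  Because the object is const and fully initialized at
   compile time, a typical toolchain can place it in a read-only section
   instead of zero-initialized writable storage filled in at run time.  */
static const unsigned char from_ucs4[256] = {
  [0x41] = 0x41,	/* 'A' maps to itself (assumed).  */
  [0xe4] = 0x7b,	/* Made-up mapping for U+00E4.  */
  [0xf6] = 0x7c,	/* Made-up mapping for U+00F6.  */
};

/* Look up WC; values outside the table have no mapping.  */
static unsigned char
lookup_byte (uint32_t wc)
{
  return wc < 256 ? from_ucs4[wc] : 0;
}
```

Where the object actually ends up depends on the compiler and linker, so
the section placement should be checked on the target rather than taken
from this sketch.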

In the inline assembly, I would suggest to out-dent the labels.  There 
is a typo in a comment, “blcocks”.  You could reduce the amount of 
inline assembly by falling back on the C code for error handling, I think.

I can't comment on the technical accuracy of the inline assembly.

Florian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones.
  2016-04-15 10:27   ` Florian Weimer
@ 2016-04-21 14:50     ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 14:50 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

[-- Attachment #1: Type: text/plain, Size: 608 bytes --]

On 04/15/2016 12:27 PM, Florian Weimer wrote:
> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>> +    ${AWK} 'BEGIN { emitted = 0 } \
>> +    emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
>> +    !emitted { emit_s390_modules(); emitted = 1; print; } \
>
> I would suggest putting the awk script into a separate file (and
> reflecting it in the makefile dependencies).
>
> Florian
>
Okay. Done.

Here is the updated ChangeLog:

	* sysdeps/s390/s390-64/Makefile ($(objpfx)gconv-modules-s390):
	Mention s390-specific gconv-modules before common ones.
	* sysdeps/s390/gconv-modules-s390.awk: New file.
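
For review purposes, here is a rough C model of what the awk script in
the attached patch does: it copies the input line by line and inserts
one block directly before the first line that is neither blank nor a
comment. This is only a sketch. The names is_header_line, append_line
and copy_with_block are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero for lines that belong to the leading header: blank lines
   and lines whose first field starts with '#'.  */
static int
is_header_line (const char *line)
{
  while (*line != '\0' && isspace ((unsigned char) *line))
    line++;
  return *line == '\0' || *line == '#';
}

/* Append LINE plus a newline to OUT at offset USED; return new offset.  */
static size_t
append_line (char *out, size_t used, size_t outsize, const char *line)
{
  size_t len = strlen (line);
  assert (used + len + 2 <= outsize);	/* line, newline, terminator */
  memcpy (out + used, line, len);
  out[used + len] = '\n';
  out[used + len + 1] = '\0';
  return used + len + 1;
}

/* Copy LINES to OUT, inserting BLOCK once before the first non-header
   line.  Returns the index of that line, or N if there is none.  */
size_t
copy_with_block (const char *const *lines, size_t n, const char *block,
		 char *out, size_t outsize)
{
  size_t used = 0;
  size_t insert_at = n;
  int emitted = 0;

  assert (outsize > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      if (!emitted && !is_header_line (lines[i]))
	{
	  used = append_line (out, used, outsize, block);
	  emitted = 1;
	  insert_at = i;
	}
      used = append_line (out, used, outsize, lines[i]);
    }
  return insert_at;
}
```

As in the awk version, comment lines that follow the first module line
are copied unchanged, and nothing is inserted if the input contains no
module line at all.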

[-- Attachment #2: 0002-S390-Mention-s390-specific-gconv-modues-before-commo.patch --]
[-- Type: text/x-patch, Size: 5940 bytes --]

From d72d7ed0480722727350c29b2dcd5a202d532c19 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:48 +0200
Subject: [PATCH 02/14] S390: Mention s390-specific gconv-modues before common
 ones.

This patch changes the order in gconv-modules. The s390-specific
modules are now mentioned before the common ones, because otherwise
they aren't used in all possible conversions. E.g. the conversion step
from INTERNAL to UTF-16 used the common UTF-16.so module instead of
UTF16_UTF32_Z9.so.

The awk script parses the source gconv-modules file and copies it
line by line. The s390 modules are emitted between the header comments
and the first common-code module.

ChangeLog:

	* sysdeps/s390/s390-64/Makefile ($(objpfx)gconv-modules-s390):
	Mention s390-specific gconv-modules before common ones.
	* sysdeps/s390/gconv-modules-s390.awk: New file.
---
 sysdeps/s390/gconv-modules-s390.awk | 81 +++++++++++++++++++++++++++++++++++++
 sysdeps/s390/s390-64/Makefile       | 36 +----------------
 2 files changed, 83 insertions(+), 34 deletions(-)
 create mode 100644 sysdeps/s390/gconv-modules-s390.awk

diff --git a/sysdeps/s390/gconv-modules-s390.awk b/sysdeps/s390/gconv-modules-s390.awk
new file mode 100644
index 0000000..344c7b3
--- /dev/null
+++ b/sysdeps/s390/gconv-modules-s390.awk
@@ -0,0 +1,81 @@
+# Emit s390 modules at top of gconv-modules file.
+# Copyright (C) 2016 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+BEGIN {
+    emitted = 0
+}
+
+emitted || NF == 0 || $1 ~ /^#/ {
+    print;
+    next;
+}
+
+!emitted {
+    emit_s390_modules();
+    emitted = 1;
+    print;
+}
+
+function emit_s390_modules() {
+    # Emit header line.
+    print "# S/390 hardware accelerated modules";
+    print_val("#", 8);
+    print_val("from", 24);
+    print_val("to", 24);
+    print_val("module", 24);
+    printf "cost\n";
+    # Emit s390-specific modules.
+    modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900");
+    modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900");
+    modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9");
+    modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9");
+    modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9");
+    modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9");
+    modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9");
+    modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9");
+    modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9");
+    modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9");
+    modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9");
+    modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9");
+    modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9");
+    modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9");
+    modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9");
+    printf "\n# Default glibc modules\n";
+}
+
+function modul(from, to, file, cost) {
+    print_val("module", 8);
+    print_val(from, 24);
+    print_val(to, 24);
+    print_val(file, 24);
+    if (cost == 0) cost = 1;
+    printf "%d\n", cost;
+}
+
+function print_val(val, width) {
+    # Emit value followed by tabs.
+    printf "%s", val;
+    len = length(val);
+    if (len < width) {
+	len = width - len;
+	nr_tabs = len / 8;
+	if (len % 8 != 0) nr_tabs++;
+    }
+    else nr_tabs = 1;
+    for (i = 1; i <= nr_tabs; i++) printf "\t";
+}
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index de249a7..094b1e9 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -39,40 +39,8 @@ $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
 $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
 	$(do-install-program)
 
-$(objpfx)gconv-modules-s390: gconv-modules
-	cp $< $@
-	echo >> $@
-	echo "# S/390 hardware accelerated modules" >> $@
-	echo -n "module	ISO-8859-1//		IBM037//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	IBM037//		ISO-8859-1//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32BE//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		INTERNAL	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16BE//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
+$(objpfx)gconv-modules-s390: ../sysdeps/s390/gconv-modules-s390.awk gconv-modules
+	${AWK} -f $^ > $@
 
 GCONV_MODULES = gconv-modules-s390
 
-- 
2.5.5


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/14] S390: Optimize builtin iconv-modules.
  2016-03-18 12:58   ` Stefan Liebler
@ 2016-04-21 14:51     ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 14:51 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 65852 bytes --]

Here is an updated patch, where the labels in inline assemblies are 
out-dented as suggested by Florian.
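
Regarding the lhi/lghi change mentioned in the quoted message below:
lhi writes only the low 32 bits of a 64-bit register and leaves the
high half untouched, whereas lghi sign-extends the immediate into all
64 bits. The following is a rough C model of that difference, not code
from the patch; sim_lhi, sim_lghi and demo_stale_high_half are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of lhi: the sign-extended 16-bit immediate replaces the low
   32 bits of the register; the high 32 bits keep their old value.  */
uint64_t
sim_lhi (uint64_t reg, int16_t imm)
{
  uint32_t low = (uint32_t) (int32_t) imm;
  return (reg & 0xFFFFFFFF00000000ULL) | low;
}

/* Model of lghi: the sign-extended 16-bit immediate replaces the
   whole 64-bit register.  */
uint64_t
sim_lghi (uint64_t reg, int16_t imm)
{
  (void) reg;			/* Old contents are discarded.  */
  return (uint64_t) (int64_t) imm;
}

/* With stale data in the high half, only the lghi model yields a
   value that is safe to use in a 64-bit address calculation.  */
void
demo_stale_high_half (void)
{
  uint64_t stale = 0xDEADBEEF12345678ULL;
  assert (sim_lhi (stale, 0) == 0xDEADBEEF00000000ULL);
  assert (sim_lghi (stale, 0) == 0);
}
```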

On 03/18/2016 01:57 PM, Stefan Liebler wrote:
> Hi,
>
> I've updated the vector loop functions
> internal_ucs2_loop and internal_ucs2reverse_loop.
> The old patch contained lhi statements to initialize %[R_TMP],
> which is later used to calculate an address.
> This patch uses lghi statements to initialize %[R_TMP].
>
> The ChangeLog remains the same.
>
> Bye Stefan
>
> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>> This patch introduces a s390 specific gconv_simple.c file which provides
>> optimized versions for z13 with vector instructions, which will be
>> chosen at
>> runtime via ifunc.
>> The optimized conversions can convert between internal and ascii,
>> ucs4, ucs4le,
>> ucs2, ucs2le.
>> If the build-environment lacks vector support, then iconv/gconv_simple.c
>> is used without any change. Otherwise iconvdata/gconv_simple.c is used
>> to create
>> conversion loop routines without vector instructions as fallback, if
>> vector
>> instructions aren't available at runtime.
>>
>> ChangeLog:
>>
>>     * sysdeps/s390/multiarch/gconv_simple.c: New File.
>>     * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add
>> gconv_simple.
>> ---
>>   sysdeps/s390/multiarch/Makefile       |    4 +
>>   sysdeps/s390/multiarch/gconv_simple.c | 1266
>> +++++++++++++++++++++++++++++++++
>>   2 files changed, 1270 insertions(+)
>>   create mode 100644 sysdeps/s390/multiarch/gconv_simple.c
>>
>> diff --git a/sysdeps/s390/multiarch/Makefile
>> b/sysdeps/s390/multiarch/Makefile
>> index 0805b07..5067b6f 100644
>> --- a/sysdeps/s390/multiarch/Makefile
>> +++ b/sysdeps/s390/multiarch/Makefile
>> @@ -42,3 +42,7 @@ sysdep_routines += wcslen wcslen-vx wcslen-c \
>>              wmemset wmemset-vx wmemset-c \
>>              wmemcmp wmemcmp-vx wmemcmp-c
>>   endif
>> +
>> +ifeq ($(subdir),iconv)
>> +sysdep_routines += gconv_simple
>> +endif
>> diff --git a/sysdeps/s390/multiarch/gconv_simple.c
>> b/sysdeps/s390/multiarch/gconv_simple.c
>> new file mode 100644
>> index 0000000..0e59422
>> --- /dev/null
>> +++ b/sysdeps/s390/multiarch/gconv_simple.c
>> @@ -0,0 +1,1266 @@
>> +/* Simple transformations functions - s390 version.
>> +   Copyright (C) 2016 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <http://www.gnu.org/licenses/>.  */
>> +
>> +#if defined HAVE_S390_VX_ASM_SUPPORT
>> +# include <ifunc-resolve.h>
>> +
>> +# if defined HAVE_S390_VX_GCC_SUPPORT
>> +#  define ASM_CLOBBER_VR(NR) , NR
>> +# else
>> +#  define ASM_CLOBBER_VR(NR)
>> +# endif
>> +
>> +# define ICONV_C_NAME(NAME) __##NAME##_c
>> +# define ICONV_VX_NAME(NAME) __##NAME##_vx
>> +# define ICONV_VX_IFUNC(FUNC)                        \
>> +  extern __typeof (ICONV_C_NAME (FUNC)) __##FUNC;            \
>> +  s390_vx_libc_ifunc (__##FUNC)                        \
>> +  int FUNC (struct __gconv_step *step, struct __gconv_step_data
>> *data,    \
>> +        const unsigned char **inptrp, const unsigned char *inend,    \
>> +        unsigned char **outbufstart, size_t *irreversible,        \
>> +        int do_flush, int consume_incomplete)            \
>> +  {                                    \
>> +    return __##FUNC (step, data, inptrp, inend,outbufstart,        \
>> +             irreversible, do_flush, consume_incomplete);    \
>> +  }
>> +# define ICONV_VX_SINGLE(NAME)                        \
>> +  static __typeof (NAME##_single) __##NAME##_vx_single
>> __attribute__((alias(#NAME "_single")));
>> +
>> +/* Generate the transformations which are used, if the target machine
>> does not
>> +   support vector instructions.  */
>> +# define __gconv_transform_ascii_internal        \
>> +  ICONV_C_NAME (__gconv_transform_ascii_internal)
>> +# define __gconv_transform_internal_ascii        \
>> +  ICONV_C_NAME (__gconv_transform_internal_ascii)
>> +# define __gconv_transform_internal_ucs4le        \
>> +  ICONV_C_NAME (__gconv_transform_internal_ucs4le)
>> +# define __gconv_transform_ucs4_internal        \
>> +  ICONV_C_NAME (__gconv_transform_ucs4_internal)
>> +# define __gconv_transform_ucs4le_internal        \
>> +  ICONV_C_NAME (__gconv_transform_ucs4le_internal)
>> +# define __gconv_transform_ucs2_internal        \
>> +  ICONV_C_NAME (__gconv_transform_ucs2_internal)
>> +# define __gconv_transform_ucs2reverse_internal        \
>> +  ICONV_C_NAME (__gconv_transform_ucs2reverse_internal)
>> +# define __gconv_transform_internal_ucs2        \
>> +  ICONV_C_NAME (__gconv_transform_internal_ucs2)
>> +# define __gconv_transform_internal_ucs2reverse        \
>> +  ICONV_C_NAME (__gconv_transform_internal_ucs2reverse)
>> +
>> +
>> +# include <iconv/gconv_simple.c>
>> +
>> +# undef __gconv_transform_ascii_internal
>> +# undef __gconv_transform_internal_ascii
>> +# undef __gconv_transform_internal_ucs4le
>> +# undef __gconv_transform_ucs4_internal
>> +# undef __gconv_transform_ucs4le_internal
>> +# undef __gconv_transform_ucs2_internal
>> +# undef __gconv_transform_ucs2reverse_internal
>> +# undef __gconv_transform_internal_ucs2
>> +# undef __gconv_transform_internal_ucs2reverse
>> +
>> +/* Now define the functions with vector support.  */
>> +# if defined __s390x__
>> +#  define CONVERT_32BIT_SIZE_T(REG)
>> +# else
>> +#  define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
>> +# endif
>> +
>> +/* Convert from ISO 646-IRV to the internal (UCS4-like) format.  */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    1
>> +# define MIN_NEEDED_TO        4
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (ascii_internal_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (ascii_internal_loop) /* This
>> is not used.  */
>> +# define FUNCTION_NAME        ICONV_VX_NAME
>> (__gconv_transform_ascii_internal)
>> +# define ONE_DIRECTION        1
>> +
>> +# define MIN_NEEDED_INPUT    MIN_NEEDED_FROM
>> +# define MIN_NEEDED_OUTPUT    MIN_NEEDED_TO
>> +# define LOOPFCT        FROM_LOOP
>> +# define BODY_ORIG_ERROR                        \
>> +    /* The value is too large.  We don't try transliteration here
>> since \
>> +       this is not an error because of the lack of possibilities to    \
>> +       represent the result.  This is a genuine bug in the input
>> since    \
>> +       ASCII does not allow such values.  */                \
>> +    STANDARD_FROM_LOOP_ERR_HANDLER (1);
>> +
>> +# define BODY_ORIG                            \
>> +  {                                    \
>> +    if (__glibc_unlikely (*inptr > '\x7f'))                \
>> +      {                                    \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +    else                                \
>> +      {                                    \
>> +    /* It's an one byte sequence.  */                \
>> +    *((uint32_t *) outptr) = *inptr++;                \
>> +    outptr += sizeof (uint32_t);                    \
>> +      }                                    \
>> +  }
>> +# define BODY                                \
>> +  {                                    \
>> +    size_t len = inend - inptr;                        \
>> +    if (len > (outend - outptr) / 4)                    \
>> +      len = (outend - outptr) / 4;                    \
>> +    size_t loop_count, tmp;                        \
>> +    __asm__ volatile (".machine push\n\t"                \
>> +              ".machine \"z13\"\n\t"                \
>> +              ".machinemode \"zarch_nohighgprs\"\n\t"        \
>> +              CONVERT_32BIT_SIZE_T ([R_LEN])            \
>> +              "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
>> +              "srlg %[R_LI],%[R_LEN],4\n\t"            \
>> +              "vrepib %%v31,0x20\n\t"                \
>> +              "clgije %[R_LI],0,1f\n\t"                \
>> +              "0:\n\t" /* Handle 16-byte blocks.  */        \
>> +              "vl %%v16,0(%[R_IN])\n\t"                \
>> +              /* Checking for values > 0x7f.  */        \
>> +              "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"        \
>> +              "jno 10f\n\t"                    \
>> +              /* Enlarge to UCS4.  */                \
>> +              "vuplhb %%v17,%%v16\n\t"                \
>> +              "vupllb %%v18,%%v16\n\t"                \
>> +              "vuplhh %%v19,%%v17\n\t"                \
>> +              "vupllh %%v20,%%v17\n\t"                \
>> +              "vuplhh %%v21,%%v18\n\t"                \
>> +              "vupllh %%v22,%%v18\n\t"                \
>> +              /* Store 64bytes to buf_out.  */            \
>> +              "vstm %%v19,%%v22,0(%[R_OUT])\n\t"        \
>> +              "la %[R_IN],16(%[R_IN])\n\t"            \
>> +              "la %[R_OUT],64(%[R_OUT])\n\t"            \
>> +              "brctg %[R_LI],0b\n\t"                \
>> +              "lghi %[R_LI],15\n\t"                \
>> +              "ngr %[R_LEN],%[R_LI]\n\t"            \
>> +              "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
>> +              /* Handle remaining bytes.  */            \
>> +              "1: aghik %[R_LI],%[R_LEN],-1\n\t"        \
>> +              "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
>> +              "vll %%v16,%[R_LI],0(%[R_IN])\n\t"        \
>> +              /* Checking for values > 0x7f.  */        \
>> +              "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"        \
>> +              "vlgvb %[R_TMP],%%v17,7\n\t"            \
>> +              "clr %[R_TMP],%[R_LI]\n\t"            \
>> +              "locrh %[R_TMP],%[R_LEN]\n\t"            \
>> +              "locghih %[R_LEN],0\n\t"                \
>> +              "j 12f\n\t"                    \
>> +              "10:\n\t"                        \
>> +              /* Found a value > 0x7f.                \
>> +             Store the preceding chars.  */            \
>> +              "vlgvb %[R_TMP],%%v17,7\n\t"            \
>> +              "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"        \
>> +              "sllk %[R_TMP],%[R_TMP],2\n\t"            \
>> +              "ahi %[R_TMP],-1\n\t"                \
>> +              "jl 20f\n\t"                    \
>> +              "lgr %[R_LI],%[R_TMP]\n\t"            \
>> +              "vuplhb %%v17,%%v16\n\t"                \
>> +              "vuplhh %%v19,%%v17\n\t"                \
>> +              "vstl %%v19,%[R_LI],0(%[R_OUT])\n\t"        \
>> +              "ahi %[R_LI],-16\n\t"                \
>> +              "jl 11f\n\t"                    \
>> +              "vupllh %%v20,%%v17\n\t"                \
>> +              "vstl %%v20,%[R_LI],16(%[R_OUT])\n\t"        \
>> +              "ahi %[R_LI],-16\n\t"                \
>> +              "jl 11f\n\t"                    \
>> +              "vupllb %%v18,%%v16\n\t"                \
>> +              "vuplhh %%v21,%%v18\n\t"                \
>> +              "vstl %%v21,%[R_LI],32(%[R_OUT])\n\t"        \
>> +              "ahi %[R_LI],-16\n\t"                \
>> +              "jl 11f\n\t"                    \
>> +              "vupllh %%v22,%%v18\n\t"                \
>> +              "vstl %%v22,%[R_LI],48(%[R_OUT])\n\t"        \
>> +              "11:\n\t"                        \
>> +              "la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t"        \
>> +              "20:\n\t"                        \
>> +              ".machine pop"                    \
>> +              : /* outputs */ [R_OUT] "+a" (outptr)        \
>> +            , [R_IN] "+a" (inptr)                \
>> +            , [R_LEN] "+d" (len)                \
>> +            , [R_LI] "=d" (loop_count)            \
>> +            , [R_TMP] "=a" (tmp)                \
>> +              : /* inputs */                    \
>> +              : /* clobber list*/ "memory", "cc"        \
>> +            ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")    \
>> +            ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")    \
>> +            ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")    \
>> +            ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")    \
>> +            ASM_CLOBBER_VR ("v31")                \
>> +              );                        \
>> +    if (len > 0)                            \
>> +      {                                    \
>> +    /* Found an invalid character at the next input byte.  */    \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +  }
>> +
>> +# define LOOP_NEED_FLAGS
>> +# include <iconv/loop.c>
>> +# include <iconv/skeleton.c>
>> +# undef BODY_ORIG
>> +# undef BODY_ORIG_ERROR
>> +ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
>> +
>> +/* Convert from the internal (UCS4-like) format to ISO 646-IRV.  */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    4
>> +# define MIN_NEEDED_TO        1
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (internal_ascii_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (internal_ascii_loop) /* This
>> is not used.  */
>> +# define FUNCTION_NAME        ICONV_VX_NAME
>> (__gconv_transform_internal_ascii)
>> +# define ONE_DIRECTION        1
>> +
>> +# define MIN_NEEDED_INPUT    MIN_NEEDED_FROM
>> +# define MIN_NEEDED_OUTPUT    MIN_NEEDED_TO
>> +# define LOOPFCT        FROM_LOOP
>> +# define BODY_ORIG_ERROR                        \
>> +  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4);            \
>> +  STANDARD_TO_LOOP_ERR_HANDLER (4);
>> +
>> +# define BODY_ORIG                            \
>> +  {                                    \
>> +    if (__glibc_unlikely (*((const uint32_t *) inptr) > 0x7f))        \
>> +      {                                    \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +    else                                \
>> +      {                                    \
>> +    /* It's an one byte sequence.  */                \
>> +    *outptr++ = *((const uint32_t *) inptr);            \
>> +    inptr += sizeof (uint32_t);                    \
>> +      }                                    \
>> +  }
>> +# define BODY                                \
>> +  {                                    \
>> +    size_t len = (inend - inptr) / 4;                    \
>> +    if (len > outend - outptr)                        \
>> +      len = outend - outptr;                        \
>> +    size_t loop_count, tmp, tmp2;                    \
>> +    __asm__ volatile (".machine push\n\t"                \
>> +              ".machine \"z13\"\n\t"                \
>> +              ".machinemode \"zarch_nohighgprs\"\n\t"        \
>> +              CONVERT_32BIT_SIZE_T ([R_LEN])            \
>> +              /* Setup to check for ch > 0x7f.  */        \
>> +              "vzero %%v21\n\t"                    \
>> +              "srlg %[R_LI],%[R_LEN],4\n\t"            \
>> +              "vleih %%v21,8192,0\n\t"  /* element 0:   >  */    \
>> +              "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */    \
>> +              "vleif %%v20,127,0\n\t"   /* element 0: 127  */    \
>> +              "lghi %[R_TMP],0\n\t"                \
>> +              "clgije %[R_LI],0,1f\n\t"                \
>> +              "0:\n\t"                        \
>> +              "vlm %%v16,%%v19,0(%[R_IN])\n\t"            \
>> +              /* Shorten to byte values.  */            \
>> +              "vpkf %%v23,%%v16,%%v17\n\t"            \
>> +              "vpkf %%v24,%%v18,%%v19\n\t"            \
>> +              "vpkh %%v23,%%v23,%%v24\n\t"            \
>> +              /* Checking for values > 0x7f.  */        \
>> +              "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"        \
>> +              "jno 10f\n\t"                    \
>> +              "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"        \
>> +              "jno 11f\n\t"                    \
>> +              "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"        \
>> +              "jno 12f\n\t"                    \
>> +              "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"        \
>> +              "jno 13f\n\t"                    \
>> +              /* Store 16bytes to outptr.  */            \
>> +              "vst %%v23,0(%[R_OUT])\n\t"            \
>> +              "la %[R_IN],64(%[R_IN])\n\t"            \
>> +              "la %[R_OUT],16(%[R_OUT])\n\t"            \
>> +              "brctg %[R_LI],0b\n\t"                \
>> +              "lghi %[R_LI],15\n\t"                \
>> +              "ngr %[R_LEN],%[R_LI]\n\t"            \
>> +              "je 20f\n\t" /* Jump away if no remaining bytes.  */ \
>> +              /* Handle remaining bytes.  */            \
>> +              "1: sllg %[R_LI],%[R_LEN],2\n\t"            \
>> +              "aghi %[R_LI],-1\n\t"                \
>> +              "jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
>> +              /* Load remaining 1...63 bytes.  */        \
>> +              "vll %%v16,%[R_LI],0(%[R_IN])\n\t"        \
>> +              "ahi %[R_LI],-16\n\t"                \
>> +              "jl 2f\n\t"                    \
>> +              "vll %%v17,%[R_LI],16(%[R_IN])\n\t"        \
>> +              "ahi %[R_LI],-16\n\t"                \
>> +              "jl 2f\n\t"                    \
>> +              "vll %%v18,%[R_LI],32(%[R_IN])\n\t"        \
>> +              "ahi %[R_LI],-16\n\t"                \
>> +              "jl 2f\n\t"                    \
>> +              "vll %%v19,%[R_LI],48(%[R_IN])\n\t"        \
>> +              "2:\n\t"                        \
>> +              /* Shorten to byte values.  */            \
>> +              "vpkf %%v23,%%v16,%%v17\n\t"            \
>> +              "vpkf %%v24,%%v18,%%v19\n\t"            \
>> +              "vpkh %%v23,%%v23,%%v24\n\t"            \
>> +              "sllg %[R_LI],%[R_LEN],2\n\t"            \
>> +              "aghi %[R_LI],-16\n\t"                \
>> +              "jl 3f\n\t" /* v16 is not fully loaded.  */    \
>> +              "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"        \
>> +              "jno 10f\n\t"                    \
>> +              "aghi %[R_LI],-16\n\t"                \
>> +              "jl 4f\n\t" /* v17 is not fully loaded.  */    \
>> +              "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"        \
>> +              "jno 11f\n\t"                    \
>> +              "aghi %[R_LI],-16\n\t"                \
>> +              "jl 5f\n\t" /* v18 is not fully loaded.  */    \
>> +              "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"        \
>> +              "jno 12f\n\t"                    \
>> +              "aghi %[R_LI],-16\n\t"                \
>> +              /* v19 is not fully loaded. */            \
>> +              "lghi %[R_TMP],12\n\t"                \
>> +              "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"        \
>> +              "6: vlgvb %[R_I],%%v22,7\n\t"            \
>> +              "aghi %[R_LI],16\n\t"                \
>> +              "clrjl %[R_I],%[R_LI],14f\n\t"            \
>> +              "lgr %[R_I],%[R_LEN]\n\t"                \
>> +              "lghi %[R_LEN],0\n\t"                \
>> +              "j 15f\n\t"                    \
>> +              "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"        \
>> +              "j 6b\n\t"                    \
>> +              "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"        \
>> +              "lghi %[R_TMP],4\n\t"                \
>> +              "j 6b\n\t"                    \
>> +              "5: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"        \
>> +              "lghi %[R_TMP],8\n\t"                \
>> +              "j 6b\n\t"                    \
>> +              /* Found a value > 0x7f.  */            \
>> +              "13: ahi %[R_TMP],4\n\t"                \
>> +              "12: ahi %[R_TMP],4\n\t"                \
>> +              "11: ahi %[R_TMP],4\n\t"                \
>> +              "10: vlgvb %[R_I],%%v22,7\n\t"            \
>> +              "14: srlg %[R_I],%[R_I],2\n\t"            \
>> +              "agr %[R_I],%[R_TMP]\n\t"                \
>> +              "je 20f\n\t"                    \
>> +              /* Store characters before invalid one...  */    \
>> +              "15: aghi %[R_I],-1\n\t"                \
>> +              "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"        \
>> +              /* ... and update pointers.  */            \
>> +              "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"        \
>> +              "sllg %[R_I],%[R_I],2\n\t"            \
>> +              "la %[R_IN],4(%[R_I],%[R_IN])\n\t"        \
>> +              "20:\n\t"                        \
>> +              ".machine pop"                    \
>> +              : /* outputs */ [R_OUT] "+a" (outptr)        \
>> +            , [R_IN] "+a" (inptr)                \
>> +            , [R_LEN] "+d" (len)                \
>> +            , [R_LI] "=d" (loop_count)            \
>> +            , [R_I] "=a" (tmp2)                \
>> +            , [R_TMP] "=d" (tmp)                \
>> +              : /* inputs */                    \
>> +              : /* clobber list*/ "memory", "cc"        \
>> +            ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")    \
>> +            ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")    \
>> +            ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")    \
>> +            ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")    \
>> +            ASM_CLOBBER_VR ("v24")                \
>> +              );                        \
>> +    if (len > 0)                            \
>> +      {                                    \
>> +    /* Found an invalid character > 0x7f at next character.  */    \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +  }
>> +# define LOOP_NEED_FLAGS
>> +# include <iconv/loop.c>
>> +# include <iconv/skeleton.c>
>> +# undef BODY_ORIG
>> +# undef BODY_ORIG_ERROR
>> +ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
>> +
>> +
>> +/* Convert from internal UCS4 to UCS4 little endian form.  */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    4
>> +# define MIN_NEEDED_TO        4
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (internal_ucs4le_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (internal_ucs4le_loop) /* This
>> is not used.  */
>> +# define FUNCTION_NAME        ICONV_VX_NAME
>> (__gconv_transform_internal_ucs4le)
>> +# define ONE_DIRECTION        0
>> +
>> +static inline int
>> +__attribute ((always_inline))
>> +ICONV_VX_NAME (internal_ucs4le_loop) (struct __gconv_step *step,
>> +                      struct __gconv_step_data *step_data,
>> +                      const unsigned char **inptrp,
>> +                      const unsigned char *inend,
>> +                      unsigned char **outptrp,
>> +                      unsigned char *outend,
>> +                      size_t *irreversible)
>> +{
>> +  const unsigned char *inptr = *inptrp;
>> +  unsigned char *outptr = *outptrp;
>> +  int result;
>> +  size_t len = MIN (inend - inptr, outend - outptr) / 4;
>> +  size_t loop_count;
>> +  __asm__ volatile (".machine push\n\t"
>> +            ".machine \"z13\"\n\t"
>> +            ".machinemode \"zarch_nohighgprs\"\n\t"
>> +            CONVERT_32BIT_SIZE_T ([R_LEN])
>> +            "bras %[R_LI],1f\n\t"
>> +            /* Vector permute mask:  */
>> +            ".long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
>> +            "1: vl %%v20,0(%[R_LI])\n\t"
>> +            /* Process 64byte (16char) blocks.  */
>> +            "srlg %[R_LI],%[R_LEN],4\n\t"
>> +            "clgije %[R_LI],0,10f\n\t"
>> +            "0: vlm %%v16,%%v19,0(%[R_IN])\n\t"
>> +            "vperm %%v16,%%v16,%%v16,%%v20\n\t"
>> +            "vperm %%v17,%%v17,%%v17,%%v20\n\t"
>> +            "vperm %%v18,%%v18,%%v18,%%v20\n\t"
>> +            "vperm %%v19,%%v19,%%v19,%%v20\n\t"
>> +            "vstm %%v16,%%v19,0(%[R_OUT])\n\t"
>> +            "la %[R_IN],64(%[R_IN])\n\t"
>> +            "la %[R_OUT],64(%[R_OUT])\n\t"
>> +            "brctg %[R_LI],0b\n\t"
>> +            "llgfr %[R_LEN],%[R_LEN]\n\t"
>> +            "nilf %[R_LEN],15\n\t"
>> +            /* Process 16byte (4char) blocks.  */
>> +            "10: srlg %[R_LI],%[R_LEN],2\n\t"
>> +            "clgije %[R_LI],0,20f\n\t"
>> +            "11: vl %%v16,0(%[R_IN])\n\t"
>> +            "vperm %%v16,%%v16,%%v16,%%v20\n\t"
>> +            "vst %%v16,0(%[R_OUT])\n\t"
>> +            "la %[R_IN],16(%[R_IN])\n\t"
>> +            "la %[R_OUT],16(%[R_OUT])\n\t"
>> +            "brctg %[R_LI],11b\n\t"
>> +            "nill %[R_LEN],3\n\t"
>> +            /* Process <16bytes.  */
>> +            "20: sll %[R_LEN],2\n\t"
>> +            "ahi %[R_LEN],-1\n\t"
>> +            "jl 30f\n\t"
>> +            "vll %%v16,%[R_LEN],0(%[R_IN])\n\t"
>> +            "vperm %%v16,%%v16,%%v16,%%v20\n\t"
>> +            "vstl %%v16,%[R_LEN],0(%[R_OUT])\n\t"
>> +            "la %[R_IN],1(%[R_LEN],%[R_IN])\n\t"
>> +            "la %[R_OUT],1(%[R_LEN],%[R_OUT])\n\t"
>> +            "30: \n\t"
>> +            ".machine pop"
>> +            : /* outputs */ [R_OUT] "+a" (outptr)
>> +              , [R_IN] "+a" (inptr)
>> +              , [R_LI] "=a" (loop_count)
>> +              , [R_LEN] "+a" (len)
>> +            : /* inputs */
>> +            : /* clobber list*/ "memory", "cc"
>> +              ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
>> +              ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
>> +              ASM_CLOBBER_VR ("v20")
>> +            );
>> +  *inptrp = inptr;
>> +  *outptrp = outptr;
>> +
>> +  /* Determine the status.  */
>> +  if (*inptrp == inend)
>> +    result = __GCONV_EMPTY_INPUT;
>> +  else if (*outptrp + 4 > outend)
>> +    result = __GCONV_FULL_OUTPUT;
>> +  else
>> +    result = __GCONV_INCOMPLETE_INPUT;
>> +
>> +  return result;
>> +}
>> +
>> +ICONV_VX_SINGLE (internal_ucs4le_loop)
>> +# include <iconv/skeleton.c>
>> +ICONV_VX_IFUNC (__gconv_transform_internal_ucs4le)
>> +
>> +
>> +/* Transform from UCS4 to the internal, UCS4-like format.  Unlike
>> +   for the other direction we have to check for correct values here.  */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    4
>> +# define MIN_NEEDED_TO        4
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (ucs4_internal_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (ucs4_internal_loop) /* This is not used.  */
>> +# define FUNCTION_NAME        ICONV_VX_NAME (__gconv_transform_ucs4_internal)
>> +# define ONE_DIRECTION        0
>> +
>> +
>> +static inline int
>> +__attribute ((always_inline))
>> +ICONV_VX_NAME (ucs4_internal_loop) (struct __gconv_step *step,
>> +                    struct __gconv_step_data *step_data,
>> +                    const unsigned char **inptrp,
>> +                    const unsigned char *inend,
>> +                    unsigned char **outptrp,
>> +                    unsigned char *outend,
>> +                    size_t *irreversible)
>> +{
>> +  int flags = step_data->__flags;
>> +  const unsigned char *inptr = *inptrp;
>> +  unsigned char *outptr = *outptrp;
>> +  int result;
>> +  size_t len, loop_count;
>> +  do
>> +    {
>> +      len = MIN (inend - inptr, outend - outptr) / 4;
>> +      __asm__ volatile (".machine push\n\t"
>> +            ".machine \"z13\"\n\t"
>> +            ".machinemode \"zarch_nohighgprs\"\n\t"
>> +            CONVERT_32BIT_SIZE_T ([R_LEN])
>> +            /* Setup to check for ch > 0x7fffffff.  */
>> +            "larl %[R_LI],9f\n\t"
>> +            "vlm %%v20,%%v21,0(%[R_LI])\n\t"
>> +            "srlg %[R_LI],%[R_LEN],2\n\t"
>> +            "clgije %[R_LI],0,1f\n\t"
>> +            /* Process 16byte (4char) blocks.  */
>> +            "0: vl %%v16,0(%[R_IN])\n\t"
>> +            "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
>> +            "jno 10f\n\t"
>> +            "vst %%v16,0(%[R_OUT])\n\t"
>> +            "la %[R_IN],16(%[R_IN])\n\t"
>> +            "la %[R_OUT],16(%[R_OUT])\n\t"
>> +            "brctg %[R_LI],0b\n\t"
>> +            "llgfr %[R_LEN],%[R_LEN]\n\t"
>> +            "nilf %[R_LEN],3\n\t"
>> +            /* Process <16bytes.  */
>> +            "1: sll %[R_LEN],2\n\t"
>> +            "ahik %[R_LI],%[R_LEN],-1\n\t"
>> +            "jl 20f\n\t" /* No further bytes available.  */
>> +            "vll %%v16,%[R_LI],0(%[R_IN])\n\t"
>> +            "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
>> +            "vlgvb %[R_LI],%%v22,7\n\t"
>> +            "clr %[R_LI],%[R_LEN]\n\t"
>> +            "locgrhe %[R_LI],%[R_LEN]\n\t"
>> +            "locghihe %[R_LEN],0\n\t"
>> +            "j 11f\n\t"
>> +            /* v20: Vector string range compare values.  */
>> +            "9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
>> +            /* v21: Vector string range compare control-bits.
>> +               element 0: >; element 1: =<> (always true)  */
>> +            ".long 0x20000000,0xE0000000,0x0,0x0\n\t"
>> +            /* Found a value > 0x7fffffff.  */
>> +            "10: vlgvb %[R_LI],%%v22,7\n\t"
>> +            /* Store characters before invalid one.  */
>> +            "11: aghi %[R_LI],-1\n\t"
>> +            "jl 20f\n\t"
>> +            "vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
>> +            "la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
>> +            "la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
>> +            "20:\n\t"
>> +            ".machine pop"
>> +            : /* outputs */ [R_OUT] "+a" (outptr)
>> +              , [R_IN] "+a" (inptr)
>> +              , [R_LI] "=a" (loop_count)
>> +              , [R_LEN] "+d" (len)
>> +            : /* inputs */
>> +            : /* clobber list*/ "memory", "cc"
>> +              ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
>> +              ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
>> +            );
>> +      if (len > 0)
>> +    {
>> +      /* The value is too large.  We don't try transliteration here since
>> +         this is not an error because of the lack of possibilities to
>> +         represent the result.  This is a genuine bug in the input since
>> +         UCS4 does not allow such values.  */
>> +      if (irreversible == NULL)
>> +        /* We are transliterating, don't try to correct anything.  */
>> +        return __GCONV_ILLEGAL_INPUT;
>> +
>> +      if (flags & __GCONV_IGNORE_ERRORS)
>> +        {
>> +          /* Just ignore this character.  */
>> +          ++*irreversible;
>> +          inptr += 4;
>> +          continue;
>> +        }
>> +
>> +      *inptrp = inptr;
>> +      *outptrp = outptr;
>> +      return __GCONV_ILLEGAL_INPUT;
>> +    }
>> +    }
>> +  while (len > 0);
>> +
>> +  *inptrp = inptr;
>> +  *outptrp = outptr;
>> +
>> +  /* Determine the status.  */
>> +  if (*inptrp == inend)
>> +    result = __GCONV_EMPTY_INPUT;
>> +  else if (*outptrp + 4 > outend)
>> +    result = __GCONV_FULL_OUTPUT;
>> +  else
>> +    result = __GCONV_INCOMPLETE_INPUT;
>> +
>> +  return result;
>> +}
>> +
>> +ICONV_VX_SINGLE (ucs4_internal_loop)
>> +# include <iconv/skeleton.c>
>> +ICONV_VX_IFUNC (__gconv_transform_ucs4_internal)
>> +
>> +
>> +/* Transform from UCS4-LE to the internal encoding.  */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    4
>> +# define MIN_NEEDED_TO        4
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (ucs4le_internal_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (ucs4le_internal_loop) /* This is not used.  */
>> +# define FUNCTION_NAME        ICONV_VX_NAME (__gconv_transform_ucs4le_internal)
>> +# define ONE_DIRECTION        0
>> +
>> +static inline int
>> +__attribute ((always_inline))
>> +ICONV_VX_NAME (ucs4le_internal_loop) (struct __gconv_step *step,
>> +                      struct __gconv_step_data *step_data,
>> +                      const unsigned char **inptrp,
>> +                      const unsigned char *inend,
>> +                      unsigned char **outptrp,
>> +                      unsigned char *outend,
>> +                      size_t *irreversible)
>> +{
>> +  int flags = step_data->__flags;
>> +  const unsigned char *inptr = *inptrp;
>> +  unsigned char *outptr = *outptrp;
>> +  int result;
>> +  size_t len, loop_count;
>> +  do
>> +    {
>> +      len = MIN (inend - inptr, outend - outptr) / 4;
>> +      __asm__ volatile (".machine push\n\t"
>> +            ".machine \"z13\"\n\t"
>> +            ".machinemode \"zarch_nohighgprs\"\n\t"
>> +            CONVERT_32BIT_SIZE_T ([R_LEN])
>> +            /* Setup to check for ch > 0x7fffffff.  */
>> +            "larl %[R_LI],9f\n\t"
>> +            "vlm %%v20,%%v22,0(%[R_LI])\n\t"
>> +            "srlg %[R_LI],%[R_LEN],2\n\t"
>> +            "clgije %[R_LI],0,1f\n\t"
>> +            /* Process 16byte (4char) blocks.  */
>> +            "0: vl %%v16,0(%[R_IN])\n\t"
>> +            "vperm %%v16,%%v16,%%v16,%%v22\n\t"
>> +            "vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
>> +            "jno 10f\n\t"
>> +            "vst %%v16,0(%[R_OUT])\n\t"
>> +            "la %[R_IN],16(%[R_IN])\n\t"
>> +            "la %[R_OUT],16(%[R_OUT])\n\t"
>> +            "brctg %[R_LI],0b\n\t"
>> +            "llgfr %[R_LEN],%[R_LEN]\n\t"
>> +            "nilf %[R_LEN],3\n\t"
>> +            /* Process <16bytes.  */
>> +            "1: sll %[R_LEN],2\n\t"
>> +            "ahik %[R_LI],%[R_LEN],-1\n\t"
>> +            "jl 20f\n\t" /* No further bytes available.  */
>> +            "vll %%v16,%[R_LI],0(%[R_IN])\n\t"
>> +            "vperm %%v16,%%v16,%%v16,%%v22\n\t"
>> +            "vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
>> +            "vlgvb %[R_LI],%%v23,7\n\t"
>> +            "clr %[R_LI],%[R_LEN]\n\t"
>> +            "locgrhe %[R_LI],%[R_LEN]\n\t"
>> +            "locghihe %[R_LEN],0\n\t"
>> +            "j 11f\n\t"
>> +            /* v20: Vector string range compare values.  */
>> +            "9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
>> +            /* v21: Vector string range compare control-bits.
>> +               element 0: >; element 1: =<> (always true)  */
>> +            ".long 0x20000000,0xE0000000,0x0,0x0\n\t"
>> +            /* v22: Vector permute mask.  */
>> +            ".long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
>> +            /* Found a value > 0x7fffffff.  */
>> +            "10: vlgvb %[R_LI],%%v23,7\n\t"
>> +            /* Store characters before invalid one.  */
>> +            "11: aghi %[R_LI],-1\n\t"
>> +            "jl 20f\n\t"
>> +            "vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
>> +            "la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
>> +            "la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
>> +            "20:\n\t"
>> +            ".machine pop"
>> +            : /* outputs */ [R_OUT] "+a" (outptr)
>> +              , [R_IN] "+a" (inptr)
>> +              , [R_LI] "=a" (loop_count)
>> +              , [R_LEN] "+d" (len)
>> +            : /* inputs */
>> +            : /* clobber list*/ "memory", "cc"
>> +              ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
>> +              ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
>> +              ASM_CLOBBER_VR ("v23")
>> +            );
>> +      if (len > 0)
>> +    {
>> +      /* The value is too large.  We don't try transliteration here since
>> +         this is not an error because of the lack of possibilities to
>> +         represent the result.  This is a genuine bug in the input since
>> +         UCS4 does not allow such values.  */
>> +      if (irreversible == NULL)
>> +        /* We are transliterating, don't try to correct anything.  */
>> +        return __GCONV_ILLEGAL_INPUT;
>> +
>> +      if (flags & __GCONV_IGNORE_ERRORS)
>> +        {
>> +          /* Just ignore this character.  */
>> +          ++*irreversible;
>> +          inptr += 4;
>> +          continue;
>> +        }
>> +
>> +      *inptrp = inptr;
>> +      *outptrp = outptr;
>> +      return __GCONV_ILLEGAL_INPUT;
>> +    }
>> +    }
>> +  while (len > 0);
>> +
>> +  *inptrp = inptr;
>> +  *outptrp = outptr;
>> +
>> +  /* Determine the status.  */
>> +  if (*inptrp == inend)
>> +    result = __GCONV_EMPTY_INPUT;
>> +  else if (*inptrp + 4 > inend)
>> +    result = __GCONV_INCOMPLETE_INPUT;
>> +  else
>> +    {
>> +      assert (*outptrp + 4 > outend);
>> +      result = __GCONV_FULL_OUTPUT;
>> +    }
>> +
>> +  return result;
>> +}
>> +ICONV_VX_SINGLE (ucs4le_internal_loop)
>> +# include <iconv/skeleton.c>
>> +ICONV_VX_IFUNC (__gconv_transform_ucs4le_internal)
>> +
>> +/* Convert from UCS2 to the internal (UCS4-like) format.  */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    2
>> +# define MIN_NEEDED_TO        4
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (ucs2_internal_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (ucs2_internal_loop) /* This is not used.  */
>> +# define FUNCTION_NAME        ICONV_VX_NAME (__gconv_transform_ucs2_internal)
>> +# define ONE_DIRECTION        1
>> +
>> +# define MIN_NEEDED_INPUT    MIN_NEEDED_FROM
>> +# define MIN_NEEDED_OUTPUT    MIN_NEEDED_TO
>> +# define LOOPFCT        FROM_LOOP
>> +# define BODY_ORIG_ERROR                        \
>> +  /* Surrogate characters in UCS-2 input are not valid.  Reject        \
>> +     them.  (Catching this here is not security relevant.)  */        \
>> +  STANDARD_FROM_LOOP_ERR_HANDLER (2);
>> +# define BODY_ORIG                            \
>> +  {                                    \
>> +    uint16_t u1 = get16 (inptr);                    \
>> +                                    \
>> +    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))            \
>> +      {                                    \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +                                    \
>> +    *((uint32_t *) outptr) = u1;                    \
>> +    outptr += sizeof (uint32_t);                    \
>> +    inptr += 2;                                \
>> +  }
>> +# define BODY                                \
>> +  {                                    \
>> +    size_t len, tmp, tmp2;                        \
>> +    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);        \
>> +    __asm__ volatile (".machine push\n\t"                \
>> +              ".machine \"z13\"\n\t"                \
>> +              ".machinemode \"zarch_nohighgprs\"\n\t"        \
>> +              CONVERT_32BIT_SIZE_T ([R_LEN])            \
>> +              /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
>> +              "larl %[R_TMP],9f\n\t"                \
>> +              "vlm %%v20,%%v21,0(%[R_TMP])\n\t"            \
>> +              "srlg %[R_TMP],%[R_LEN],3\n\t"            \
>> +              "clgije %[R_TMP],0,1f\n\t"            \
>> +              /* Process 16byte (8char) blocks.  */        \
>> +              "0: vl %%v16,0(%[R_IN])\n\t"            \
>> +              "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"        \
>> +              /* Enlarge UCS2 to UCS4.  */            \
>> +              "vuplhh %%v17,%%v16\n\t"                \
>> +              "vupllh %%v18,%%v16\n\t"                \
>> +              "jno 10f\n\t"                    \
>> +              /* Store 32bytes to buf_out.  */            \
>> +              "vstm %%v17,%%v18,0(%[R_OUT])\n\t"        \
>> +              "la %[R_IN],16(%[R_IN])\n\t"            \
>> +              "la %[R_OUT],32(%[R_OUT])\n\t"            \
>> +              "brctg %[R_TMP],0b\n\t"                \
>> +              "llgfr %[R_LEN],%[R_LEN]\n\t"            \
>> +              "nilf %[R_LEN],7\n\t"                \
>> +              /* Process <16bytes.  */                \
>> +              "1: sll %[R_LEN],1\n\t"                \
>> +              "ahik %[R_TMP],%[R_LEN],-1\n\t"            \
>> +              "jl 20f\n\t" /* No further bytes available.  */    \
>> +              "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"        \
>> +              "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"        \
>> +              /* Enlarge UCS2 to UCS4.  */            \
>> +              "vuplhh %%v17,%%v16\n\t"                \
>> +              "vupllh %%v18,%%v16\n\t"                \
>> +              "vlgvb %[R_TMP],%%v19,7\n\t"            \
>> +              "clr %[R_TMP],%[R_LEN]\n\t"            \
>> +              "locgrhe %[R_TMP],%[R_LEN]\n\t"            \
>> +              "locghihe %[R_LEN],0\n\t"                \
>> +              "j 11f\n\t"                    \
>> +              /* v20: Vector string range compare values.  */    \
>> +              "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
>> +              /* v21: Vector string range compare control-bits.    \
>> +             element 0: =>; element 1: <  */        \
>> +              ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
>> +              /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
>> +              "10: vlgvb %[R_TMP],%%v19,7\n\t"            \
>> +              "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"        \
>> +              "sll %[R_TMP],1\n\t"                \
>> +              "lgr %[R_TMP2],%[R_TMP]\n\t"            \
>> +              "ahi %[R_TMP],-1\n\t"                \
>> +              "jl 20f\n\t"                    \
>> +              "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"        \
>> +              "ahi %[R_TMP],-16\n\t"                \
>> +              "jl 19f\n\t"                    \
>> +              "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"        \
>> +              "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"    \
>> +              "20:\n\t"                        \
>> +              ".machine pop"                    \
>> +              : /* outputs */ [R_OUT] "+a" (outptr)        \
>> +            , [R_IN] "+a" (inptr)                \
>> +            , [R_TMP] "=a" (tmp)                \
>> +            , [R_TMP2] "=a" (tmp2)                \
>> +            , [R_LEN] "+d" (len)                \
>> +              : /* inputs */                    \
>> +              : /* clobber list*/ "memory", "cc"        \
>> +            ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")    \
>> +            ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")    \
>> +            ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")    \
>> +              );                        \
>> +    if (len > 0)                            \
>> +      {                                    \
>> +    /* Found an invalid character at next input-char.  */        \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +  }
>> +
>> +# define LOOP_NEED_FLAGS
>> +# include <iconv/loop.c>
>> +# include <iconv/skeleton.c>
>> +# undef BODY_ORIG
>> +# undef BODY_ORIG_ERROR
>> +ICONV_VX_IFUNC (__gconv_transform_ucs2_internal)
>> +
>> +/* Convert from UCS2 in other endianness to the internal (UCS4-like) format. */
>> +# define DEFINE_INIT        0
>> +# define DEFINE_FINI        0
>> +# define MIN_NEEDED_FROM    2
>> +# define MIN_NEEDED_TO        4
>> +# define FROM_DIRECTION        1
>> +# define FROM_LOOP        ICONV_VX_NAME (ucs2reverse_internal_loop)
>> +# define TO_LOOP        ICONV_VX_NAME (ucs2reverse_internal_loop) /* This is not used.*/
>> +# define FUNCTION_NAME        ICONV_VX_NAME (__gconv_transform_ucs2reverse_internal)
>> +# define ONE_DIRECTION        1
>> +
>> +# define MIN_NEEDED_INPUT    MIN_NEEDED_FROM
>> +# define MIN_NEEDED_OUTPUT    MIN_NEEDED_TO
>> +# define LOOPFCT        FROM_LOOP
>> +# define BODY_ORIG_ERROR                        \
>> +  /* Surrogate characters in UCS-2 input are not valid.  Reject        \
>> +     them.  (Catching this here is not security relevant.)  */        \
>> +  if (! ignore_errors_p ())                        \
>> +    {                                    \
>> +      result = __GCONV_ILLEGAL_INPUT;                    \
>> +      break;                                \
>> +    }                                    \
>> +  inptr += 2;                                \
>> +  ++*irreversible;                            \
>> +  continue;
>> +
>> +# define BODY_ORIG \
>> +  {                                    \
>> +    uint16_t u1 = bswap_16 (get16 (inptr));                \
>> +                                    \
>> +    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))            \
>> +      {                                    \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +                                    \
>> +    *((uint32_t *) outptr) = u1;                    \
>> +    outptr += sizeof (uint32_t);                    \
>> +    inptr += 2;                                \
>> +  }
>> +# define BODY                                \
>> +  {                                    \
>> +    size_t len, tmp, tmp2;                        \
>> +    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);        \
>> +    __asm__ volatile (".machine push\n\t"                \
>> +              ".machine \"z13\"\n\t"                \
>> +              ".machinemode \"zarch_nohighgprs\"\n\t"        \
>> +              CONVERT_32BIT_SIZE_T ([R_LEN])            \
>> +              /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
>> +              "larl %[R_TMP],9f\n\t"                \
>> +              "vlm %%v20,%%v22,0(%[R_TMP])\n\t"            \
>> +              "srlg %[R_TMP],%[R_LEN],3\n\t"            \
>> +              "clgije %[R_TMP],0,1f\n\t"            \
>> +              /* Process 16byte (8char) blocks.  */        \
>> +              "0: vl %%v16,0(%[R_IN])\n\t"            \
>> +              "vperm %%v16,%%v16,%%v16,%%v22\n\t"        \
>> +              "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"        \
>> +              /* Enlarge UCS2 to UCS4.  */            \
>> +              "vuplhh %%v17,%%v16\n\t"                \
>> +              "vupllh %%v18,%%v16\n\t"                \
>> +              "jno 10f\n\t"                    \
>> +              /* Store 32bytes to buf_out.  */            \
>> +              "vstm %%v17,%%v18,0(%[R_OUT])\n\t"        \
>> +              "la %[R_IN],16(%[R_IN])\n\t"            \
>> +              "la %[R_OUT],32(%[R_OUT])\n\t"            \
>> +              "brctg %[R_TMP],0b\n\t"                \
>> +              "llgfr %[R_LEN],%[R_LEN]\n\t"            \
>> +              "nilf %[R_LEN],7\n\t"                \
>> +              /* Process <16bytes.  */                \
>> +              "1: sll %[R_LEN],1\n\t"                \
>> +              "ahik %[R_TMP],%[R_LEN],-1\n\t"            \
>> +              "jl 20f\n\t" /* No further bytes available.  */    \
>> +              "vll %%v16,%[R_TMP],0(%[R_IN])\n\t"        \
>> +              "vperm %%v16,%%v16,%%v16,%%v22\n\t"        \
>> +              "vstrchs %%v19,%%v16,%%v20,%%v21\n\t"        \
>> +              /* Enlarge UCS2 to UCS4.  */            \
>> +              "vuplhh %%v17,%%v16\n\t"                \
>> +              "vupllh %%v18,%%v16\n\t"                \
>> +              "vlgvb %[R_TMP],%%v19,7\n\t"            \
>> +              "clr %[R_TMP],%[R_LEN]\n\t"            \
>> +              "locgrhe %[R_TMP],%[R_LEN]\n\t"            \
>> +              "locghihe %[R_LEN],0\n\t"                \
>> +              "j 11f\n\t"                    \
>> +              /* v20: Vector string range compare values.  */    \
>> +              "9: .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
>> +              /* v21: Vector string range compare control-bits.    \
>> +             element 0: =>; element 1: <  */        \
>> +              ".short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
>> +              /* v22: Vector permute mask.  */            \
>> +              ".short 0x0100,0x0302,0x0504,0x0706\n\t"        \
>> +              ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"        \
>> +              /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
>> +              "10: vlgvb %[R_TMP],%%v19,7\n\t"            \
>> +              "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"        \
>> +              "sll %[R_TMP],1\n\t"                \
>> +              "lgr %[R_TMP2],%[R_TMP]\n\t"            \
>> +              "ahi %[R_TMP],-1\n\t"                \
>> +              "jl 20f\n\t"                    \
>> +              "vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"        \
>> +              "ahi %[R_TMP],-16\n\t"                \
>> +              "jl 19f\n\t"                    \
>> +              "vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"        \
>> +              "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"    \
>> +              "20:\n\t"                        \
>> +              ".machine pop"                    \
>> +              : /* outputs */ [R_OUT] "+a" (outptr)        \
>> +            , [R_IN] "+a" (inptr)                \
>> +            , [R_TMP] "=a" (tmp)                \
>> +            , [R_TMP2] "=a" (tmp2)                \
>> +            , [R_LEN] "+d" (len)                \
>> +              : /* inputs */                    \
>> +              : /* clobber list*/ "memory", "cc"        \
>> +            ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")    \
>> +            ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")    \
>> +            ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")    \
>> +            ASM_CLOBBER_VR ("v22")                \
>> +              );                        \
>> +    if (len > 0)                            \
>> +      {                                    \
>> +    /* Found an invalid character at next input-char.  */        \
>> +    BODY_ORIG_ERROR                            \
>> +      }                                    \
>> +  }
>> +# define LOOP_NEED_FLAGS
>> +# include <iconv/loop.c>
>> +# include <iconv/skeleton.c>
>> +# undef BODY_ORIG
>> +# undef BODY_ORIG_ERROR
>> +ICONV_VX_IFUNC (__gconv_transform_ucs2reverse_internal)
>> +
>> +/* Convert from the internal (UCS4-like) format to UCS2.  */
>> +#define DEFINE_INIT        0
>> +#define DEFINE_FINI        0
>> +#define MIN_NEEDED_FROM        4
>> +#define MIN_NEEDED_TO        2
>> +#define FROM_DIRECTION        1
>> +#define FROM_LOOP        ICONV_VX_NAME (internal_ucs2_loop)
>> +#define TO_LOOP            ICONV_VX_NAME (internal_ucs2_loop) /* This is not used.  */
>> +#define FUNCTION_NAME        ICONV_VX_NAME (__gconv_transform_internal_ucs2)
>> +#define ONE_DIRECTION        1
>> +
>> +#define MIN_NEEDED_INPUT    MIN_NEEDED_FROM
>> +#define MIN_NEEDED_OUTPUT    MIN_NEEDED_TO
>> +#define LOOPFCT            FROM_LOOP
>> +#define BODY_ORIG                            \
>> +  {                                    \
>> +    uint32_t val = *((const uint32_t *) inptr);                \
>> +                                    \
>> +    if (__glibc_unlikely (val >= 0x10000))                \
>> +      {                                    \
>> +    UNICODE_TAG_HANDLER (val, 4);                    \
>> +    STANDARD_TO_LOOP_ERR_HANDLER (4);                \
>> +      }                                    \
>> +    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))        \
>> +      {                                    \
>> +    /* Surrogate characters in UCS-4 input are not valid.        \
>> +       We must catch this, because the UCS-2 output might be    \
>> +       interpreted as UTF-16 by other programs.  If we let        \
>> +       surrogates pass through, attackers could make a security    \
>> +       hole exploit by synthesizing any desired plane 1-16        \
>> +       character.  */                        \
>> +    result = __GCONV_ILLEGAL_INPUT;                    \
>> +    if (! ignore_errors_p ())                    \
>> +      break;                            \
>> +    inptr += 4;                            \
>> +    ++*irreversible;                        \
>> +    continue;                            \
>> +      }                                    \
>> +    else                                \
>> +      {                                    \
>> +    put16 (outptr, val);                        \
>> +    outptr += sizeof (uint16_t);                    \
>> +    inptr += 4;                            \
>> +      }                                    \
>> +  }
>> +# define BODY                                \
>> +  {                                    \
>> +    if (__builtin_expect (inend - inptr < 32, 1)            \
>> +    || outend - outptr < 16)                    \
>> +      /* Convert remaining bytes with c code.  */            \
>> +      BODY_ORIG                                \
>> +    else                                \
>> +      {                                    \
>> +    /* Convert in 32 byte blocks.  */                \
>> +    size_t loop_count = (inend - inptr) / 32;            \
>> +    size_t tmp, tmp2;                        \
>> +    if (loop_count > (outend - outptr) / 16)            \
>> +      loop_count = (outend - outptr) / 16;                \
>> +    __asm__ volatile (".machine push\n\t"                \
>> +              ".machine \"z13\"\n\t"            \
>> +              ".machinemode \"zarch_nohighgprs\"\n\t"    \
>> +              CONVERT_32BIT_SIZE_T ([R_LI])            \
>> +              "larl %[R_I],3f\n\t"                \
>> +              "vlm %%v20,%%v23,0(%[R_I])\n\t"        \
>> +              "0:\n\t"                    \
>> +              "vlm %%v16,%%v17,0(%[R_IN])\n\t"        \
>> +              /* Shorten UCS4 to UCS2.  */            \
>> +              "vpkf %%v18,%%v16,%%v17\n\t"            \
>> +              "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"        \
>> +              "jno 11f\n\t"                    \
>> +              "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"    \
>> +              "jno 10f\n\t"                    \
>> +              /* Store 16bytes to buf_out.  */        \
>> +              "2: vst %%v18,0(%[R_OUT])\n\t"        \
>> +              "la %[R_IN],32(%[R_IN])\n\t"            \
>> +              "la %[R_OUT],16(%[R_OUT])\n\t"        \
>> +              "brctg %[R_LI],0b\n\t"            \
>> +              "j 20f\n\t"                    \
>> +              /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
>> +              "3: .long 0xd800,0xd800,0x0,0x0\n\t"        \
>> +              ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"    \
>> +              /* Setup to check for ch >= 0xe000        \
>> +                 && ch < 0x10000. (v22,v23)  */        \
>> +              ".long 0xe000,0x10000,0x0,0x0\n\t"        \
>> +              ".long 0xa0000000,0x40000000,0x0,0x0\n\t"    \
>> +              /* v16 contains only valid chars. Check in v17: \
>> +                 ch >= 0xe000 && ch <= 0xffff.  */        \
>> +              "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"    \
>> +              "jo 2b\n\t" /* All ch's in this range, proceed.   */ \
>> +              "lhi %[R_TMP],16\n\t"                \
>> +              "j 12f\n\t"                    \
>> +              /* Maybe v16 contains invalid chars.        \
>> +                 Check ch >= 0xe000 && ch <= 0xffff.  */    \
>> +              "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"    \
>> +              "jo 1b\n\t" /* All ch's in this range, proceed.   */ \
>> +              "lhi %[R_TMP],0\n\t"                \
>> +              "12: vlgvb %[R_I],%%v19,7\n\t"        \
>> +              "agr %[R_I],%[R_TMP]\n\t"            \
>> +              "la %[R_IN],0(%[R_I],%[R_IN])\n\t"        \
>> +              "srl %[R_I],1\n\t"                \
>> +              "ahi %[R_I],-1\n\t"                \
>> +              "jl 20f\n\t"                    \
>> +              "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"        \
>> +              "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"        \
>> +              "20:\n\t"                    \
>> +              ".machine pop"                \
>> +              : /* outputs */ [R_OUT] "+a" (outptr)        \
>> +                , [R_IN] "+a" (inptr)            \
>> +                , [R_LI] "+d" (loop_count)            \
>> +                , [R_I] "=a" (tmp2)                \
>> +                , [R_TMP] "=d" (tmp)            \
>> +              : /* inputs */                \
>> +              : /* clobber list*/ "memory", "cc"        \
>> +                ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
>> +                ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
>> +                ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
>> +                ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
>> +              );                        \
>> +    if (loop_count > 0)                        \
>> +      {                                \
>> +        /* Found an invalid character at next character.  */    \
>> +        BODY_ORIG                            \
>> +      }                                \
>> +      }                                    \
>> +  }
>> +#define LOOP_NEED_FLAGS
>> +#include <iconv/loop.c>
>> +#include <iconv/skeleton.c>
>> +# undef BODY_ORIG
>> +ICONV_VX_IFUNC (__gconv_transform_internal_ucs2)
>> +
>> +/* Convert from the internal (UCS4-like) format to UCS2 in other
>> endianness. */
>> +#define DEFINE_INIT        0
>> +#define DEFINE_FINI        0
>> +#define MIN_NEEDED_FROM        4
>> +#define MIN_NEEDED_TO        2
>> +#define FROM_DIRECTION        1
>> +#define FROM_LOOP        ICONV_VX_NAME (internal_ucs2reverse_loop)
>> +#define TO_LOOP            ICONV_VX_NAME
>> (internal_ucs2reverse_loop)/* This is not used.*/
>> +#define FUNCTION_NAME        ICONV_VX_NAME
>> (__gconv_transform_internal_ucs2reverse)
>> +#define ONE_DIRECTION        1
>> +
>> +#define MIN_NEEDED_INPUT    MIN_NEEDED_FROM
>> +#define MIN_NEEDED_OUTPUT    MIN_NEEDED_TO
>> +#define LOOPFCT            FROM_LOOP
>> +#define BODY_ORIG                            \
>> +  {                                    \
>> +    uint32_t val = *((const uint32_t *) inptr);                \
>> +    if (__glibc_unlikely (val >= 0x10000))                \
>> +      {                                    \
>> +    UNICODE_TAG_HANDLER (val, 4);                    \
>> +    STANDARD_TO_LOOP_ERR_HANDLER (4);                \
>> +      }                                    \
>> +    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))        \
>> +      {                                    \
>> +    /* Surrogate characters in UCS-4 input are not valid.        \
>> +       We must catch this, because the UCS-2 output might be    \
>> +       interpreted as UTF-16 by other programs.  If we let        \
>> +       surrogates pass through, attackers could make a security    \
>> +       hole exploit by synthesizing any desired plane 1-16        \
>> +       character.  */                        \
>> +    if (! ignore_errors_p ())                    \
>> +      {                                \
>> +        result = __GCONV_ILLEGAL_INPUT;                \
>> +        break;                            \
>> +      }                                \
>> +    inptr += 4;                            \
>> +    ++*irreversible;                        \
>> +    continue;                            \
>> +      }                                    \
>> +    else                                \
>> +      {                                    \
>> +    put16 (outptr, bswap_16 (val));                    \
>> +    outptr += sizeof (uint16_t);                    \
>> +    inptr += 4;                            \
>> +      }                                    \
>> +  }
>> +# define BODY                                \
>> +  {                                    \
>> +    if (__builtin_expect (inend - inptr < 32, 1)            \
>> +    || outend - outptr < 16)                    \
>> +      /* Convert remaining bytes with c code.  */            \
>> +      BODY_ORIG                                \
>> +    else                                \
>> +      {                                    \
>> +    /* Convert in 32 byte blocks.  */                \
>> +    size_t loop_count = (inend - inptr) / 32;            \
>> +    size_t tmp, tmp2;                        \
>> +    if (loop_count > (outend - outptr) / 16)            \
>> +      loop_count = (outend - outptr) / 16;                \
>> +    __asm__ volatile (".machine push\n\t"                \
>> +              ".machine \"z13\"\n\t"            \
>> +              ".machinemode \"zarch_nohighgprs\"\n\t"    \
>> +              CONVERT_32BIT_SIZE_T ([R_LI])            \
>> +              "larl %[R_I],3f\n\t"                \
>> +              "vlm %%v20,%%v24,0(%[R_I])\n\t"        \
>> +              "0:\n\t"                    \
>> +              "vlm %%v16,%%v17,0(%[R_IN])\n\t"        \
>> +              /* Shorten UCS4 to UCS2 and byteswap.  */    \
>> +              "vpkf %%v18,%%v16,%%v17\n\t"            \
>> +              "vperm %%v18,%%v18,%%v18,%%v24\n\t"        \
>> +              "vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"        \
>> +              "jno 11f\n\t"                    \
>> +              "1: vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"    \
>> +              "jno 10f\n\t"                    \
>> +              /* Store 16bytes to buf_out.  */        \
>> +              "2: vst %%v18,0(%[R_OUT])\n\t"        \
>> +              "la %[R_IN],32(%[R_IN])\n\t"            \
>> +              "la %[R_OUT],16(%[R_OUT])\n\t"        \
>> +              "brctg %[R_LI],0b\n\t"            \
>> +              "j 20f\n\t"                    \
>> +              /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
>> +              "3: .long 0xd800,0xd800,0x0,0x0\n\t"        \
>> +              ".long 0xa0000000,0xa0000000,0x0,0x0\n\t"    \
>> +              /* Setup to check for ch >= 0xe000        \
>> +                 && ch < 0x10000. (v22,v23)  */        \
>> +              ".long 0xe000,0x10000,0x0,0x0\n\t"        \
>> +              ".long 0xa0000000,0x40000000,0x0,0x0\n\t"    \
>> +              /* Vector permute mask (v24)  */        \
>> +              ".short 0x0100,0x0302,0x0504,0x0706\n\t"    \
>> +              ".short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"    \
>> +              /* v16 contains only valid chars. Check in v17: \
>> +                 ch >= 0xe000 && ch <= 0xffff.  */        \
>> +              "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"    \
>> +              "jo 2b\n\t" /* All ch's in this range, proceed.  */ \
>> +              "lhi %[R_TMP],16\n\t"                \
>> +              "j 12f\n\t"                    \
>> +              /* Maybe v16 contains invalid chars.        \
>> +                 Check ch >= 0xe000 && ch <= 0xffff.  */    \
>> +              "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"    \
>> +              "jo 1b\n\t" /* All ch's in this range, proceed.  */ \
>> +              "lhi %[R_TMP],0\n\t"                \
>> +              "12: vlgvb %[R_I],%%v19,7\n\t"        \
>> +              "agr %[R_I],%[R_TMP]\n\t"            \
>> +              "la %[R_IN],0(%[R_I],%[R_IN])\n\t"        \
>> +              "srl %[R_I],1\n\t"                \
>> +              "ahi %[R_I],-1\n\t"                \
>> +              "jl 20f\n\t"                    \
>> +              "vstl %%v18,%[R_I],0(%[R_OUT])\n\t"        \
>> +              "la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"        \
>> +              "20:\n\t"                    \
>> +              ".machine pop"                \
>> +              : /* outputs */ [R_OUT] "+a" (outptr)        \
>> +                , [R_IN] "+a" (inptr)            \
>> +                , [R_LI] "+d" (loop_count)            \
>> +                , [R_I] "=a" (tmp2)                \
>> +                , [R_TMP] "=d" (tmp)            \
>> +              : /* inputs */                \
>> +              : /* clobber list*/ "memory", "cc"        \
>> +                ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
>> +                ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
>> +                ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
>> +                ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
>> +                ASM_CLOBBER_VR ("v24")            \
>> +              );                        \
>> +    if (loop_count > 0)                        \
>> +      {                                \
>> +        /* Found an invalid character at next character.  */    \
>> +        BODY_ORIG                            \
>> +      }                                \
>> +      }                                    \
>> +  }
>> +#define LOOP_NEED_FLAGS
>> +#include <iconv/loop.c>
>> +#include <iconv/skeleton.c>
>> +# undef BODY_ORIG
>> +ICONV_VX_IFUNC (__gconv_transform_internal_ucs2reverse)
>> +
>> +
>> +#else
>> +/* Generate the internal transformations without ifunc if build
>> environment
>> +   lacks vector support. Instead simply include the common version.  */
>> +# include <iconv/gconv_simple.c>
>> +#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
>>

[-- Attachment #2: 0005-S390-Optimize-builtin-iconv-modules.patch --]
[-- Type: text/x-patch, Size: 48997 bytes --]

From 524552ddbd06e87deb3381383950620010e51a78 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 05/14] S390: Optimize builtin iconv-modules.

This patch introduces an s390-specific gconv_simple.c file which provides
optimized versions for z13 with vector instructions. The optimized versions
are chosen at runtime via ifunc.
The optimized conversions can convert between internal and ascii, ucs4, ucs4le,
ucs2, ucs2le.
If the build environment lacks vector support, then iconv/gconv_simple.c
is used without any change. Otherwise iconv/gconv_simple.c is used to create
conversion loop routines without vector instructions as a fallback, in case
vector instructions aren't available at runtime.
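
As a rough illustration of the runtime selection described above, the
following C sketch picks between a plain C loop and a stand-in for the vector
variant. All names here (ascii_to_ucs4_c, ascii_to_ucs4_vx,
select_ascii_to_ucs4) and the simplified signature are assumptions made for
this sketch only. The patch itself uses s390_vx_libc_ifunc with the gconv step
signature and resolves the choice via ifunc, not via a function pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified signature: convert up to LEN ASCII bytes to
   UCS4 and return the number of characters converted.  */
typedef size_t (*ascii_to_ucs4_fn) (const unsigned char *in, size_t len,
                                    uint32_t *out);

/* Plain C fallback: widen each byte and stop at the first value above
   0x7f, which is not valid ASCII.  */
static size_t
ascii_to_ucs4_c (const unsigned char *in, size_t len, uint32_t *out)
{
  size_t i = 0;
  while (i < len && in[i] <= 0x7f)
    {
      out[i] = in[i];
      i++;
    }
  return i;
}

/* Stand-in for the vector variant.  A real implementation would process
   16-byte blocks; here it only forwards to the C loop so that the sketch
   runs on any machine.  */
static size_t
ascii_to_ucs4_vx (const unsigned char *in, size_t len, uint32_t *out)
{
  return ascii_to_ucs4_c (in, len, out);
}

/* Hypothetical resolver: HAVE_VECTOR stands for the hardware capability
   check that an ifunc resolver would perform once at load time.  */
static ascii_to_ucs4_fn
select_ascii_to_ucs4 (int have_vector)
{
  return have_vector ? ascii_to_ucs4_vx : ascii_to_ucs4_c;
}
```

Both variants stop at the first byte above 0x7f and report how many
characters were converted. This mirrors how the loops in the patch store the
preceding characters and advance inptr and outptr up to the invalid one
before the error handling runs.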

ChangeLog:

	* sysdeps/s390/multiarch/gconv_simple.c: New File.
	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
---
 sysdeps/s390/multiarch/Makefile       |    4 +
 sysdeps/s390/multiarch/gconv_simple.c | 1266 +++++++++++++++++++++++++++++++++
 2 files changed, 1270 insertions(+)
 create mode 100644 sysdeps/s390/multiarch/gconv_simple.c

diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
index 11ad2b9..24949cd 100644
--- a/sysdeps/s390/multiarch/Makefile
+++ b/sysdeps/s390/multiarch/Makefile
@@ -52,3 +52,7 @@ $(move-if-change) $(@:stmp=T) $(@:stmp=h)
 touch $@
 endef
 endif
+
+ifeq ($(subdir),iconv)
+sysdep_routines += gconv_simple
+endif
diff --git a/sysdeps/s390/multiarch/gconv_simple.c b/sysdeps/s390/multiarch/gconv_simple.c
new file mode 100644
index 0000000..dc53a48
--- /dev/null
+++ b/sysdeps/s390/multiarch/gconv_simple.c
@@ -0,0 +1,1266 @@
+/* Simple transformations functions - s390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# include <ifunc-resolve.h>
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+# define ICONV_C_NAME(NAME) __##NAME##_c
+# define ICONV_VX_NAME(NAME) __##NAME##_vx
+# define ICONV_VX_IFUNC(FUNC)						\
+  extern __typeof (ICONV_C_NAME (FUNC)) __##FUNC;			\
+  s390_vx_libc_ifunc (__##FUNC)						\
+  int FUNC (struct __gconv_step *step, struct __gconv_step_data *data,	\
+	    const unsigned char **inptrp, const unsigned char *inend,	\
+	    unsigned char **outbufstart, size_t *irreversible,		\
+	    int do_flush, int consume_incomplete)			\
+  {									\
+    return __##FUNC (step, data, inptrp, inend,outbufstart,		\
+		     irreversible, do_flush, consume_incomplete);	\
+  }
+# define ICONV_VX_SINGLE(NAME)						\
+  static __typeof (NAME##_single) __##NAME##_vx_single __attribute__((alias(#NAME "_single")));
+
+/* Generate the transformations which are used, if the target machine does not
+   support vector instructions.  */
+# define __gconv_transform_ascii_internal		\
+  ICONV_C_NAME (__gconv_transform_ascii_internal)
+# define __gconv_transform_internal_ascii		\
+  ICONV_C_NAME (__gconv_transform_internal_ascii)
+# define __gconv_transform_internal_ucs4le		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs4le)
+# define __gconv_transform_ucs4_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4_internal)
+# define __gconv_transform_ucs4le_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4le_internal)
+# define __gconv_transform_ucs2_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2_internal)
+# define __gconv_transform_ucs2reverse_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2reverse_internal)
+# define __gconv_transform_internal_ucs2		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2)
+# define __gconv_transform_internal_ucs2reverse		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2reverse)
+
+
+# include <iconv/gconv_simple.c>
+
+# undef __gconv_transform_ascii_internal
+# undef __gconv_transform_internal_ascii
+# undef __gconv_transform_internal_ucs4le
+# undef __gconv_transform_ucs4_internal
+# undef __gconv_transform_ucs4le_internal
+# undef __gconv_transform_ucs2_internal
+# undef __gconv_transform_ucs2reverse_internal
+# undef __gconv_transform_internal_ucs2
+# undef __gconv_transform_internal_ucs2reverse
+
+/* Now define the functions with vector support.  */
+# if defined __s390x__
+#  define CONVERT_32BIT_SIZE_T(REG)
+# else
+#  define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+# endif
+
+/* Convert from ISO 646-IRV to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ascii_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ascii_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ascii_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+    /* The value is too large.  We don't try transliteration here since \
+       this is not an error because of the lack of possibilities to	\
+       represent the result.  This is a genuine bug in the input since	\
+       ASCII does not allow such values.  */				\
+    STANDARD_FROM_LOOP_ERR_HANDLER (1);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*inptr > '\x7f'))				\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */					\
+	*((uint32_t *) outptr) = *inptr++;				\
+	outptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = inend - inptr;						\
+    if (len > (outend - outptr) / 4)					\
+      len = (outend - outptr) / 4;					\
+    size_t loop_count, tmp;						\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		      "    srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "    vrepib %%v31,0x20\n\t"			\
+		      "    clgije %[R_LI],0,1f\n\t"			\
+		      "0:  \n\t" /* Handle 16-byte blocks.  */		\
+		      "    vl %%v16,0(%[R_IN])\n\t"			\
+		      /* Checking for values > 0x7f.  */		\
+		      "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "    jno 10f\n\t"					\
+		      /* Enlarge to UCS4.  */				\
+		      "    vuplhb %%v17,%%v16\n\t"			\
+		      "    vupllb %%v18,%%v16\n\t"			\
+		      "    vuplhh %%v19,%%v17\n\t"			\
+		      "    vupllh %%v20,%%v17\n\t"			\
+		      "    vuplhh %%v21,%%v18\n\t"			\
+		      "    vupllh %%v22,%%v18\n\t"			\
+		      /* Store 64bytes to buf_out.  */			\
+		      "    vstm %%v19,%%v22,0(%[R_OUT])\n\t"		\
+		      "    la %[R_IN],16(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],64(%[R_OUT])\n\t"		\
+		      "    brctg %[R_LI],0b\n\t"			\
+		      "    lghi %[R_LI],15\n\t"				\
+		      "    ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "    je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: aghik %[R_LI],%[R_LEN],-1\n\t"		\
+		      "    jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      "    vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      /* Checking for values > 0x7f.  */		\
+		      "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "    clr %[R_TMP],%[R_LI]\n\t"			\
+		      "    locrh %[R_TMP],%[R_LEN]\n\t"			\
+		      "    locghih %[R_LEN],0\n\t"			\
+		      "    j 12f\n\t"					\
+		      "10:\n\t"						\
+		      /* Found a value > 0x7f.				\
+			 Store the preceding chars.  */			\
+		      "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "    sllk %[R_TMP],%[R_TMP],2\n\t"		\
+		      "    ahi %[R_TMP],-1\n\t"				\
+		      "    jl 20f\n\t"					\
+		      "    lgr %[R_LI],%[R_TMP]\n\t"			\
+		      "    vuplhb %%v17,%%v16\n\t"			\
+		      "    vuplhh %%v19,%%v17\n\t"			\
+		      "    vstl %%v19,%[R_LI],0(%[R_OUT])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 11f\n\t"					\
+		      "    vupllh %%v20,%%v17\n\t"			\
+		      "    vstl %%v20,%[R_LI],16(%[R_OUT])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 11f\n\t"					\
+		      "    vupllb %%v18,%%v16\n\t"			\
+		      "    vuplhh %%v21,%%v18\n\t"			\
+		      "    vstl %%v21,%[R_LI],32(%[R_OUT])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 11f\n\t"					\
+		      "    vupllh %%v22,%%v18\n\t"			\
+		      "    vstl %%v22,%[R_LI],48(%[R_OUT])\n\t"		\
+		      "11:\n\t"						\
+		      "    la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t"	\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_TMP] "=a" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+			ASM_CLOBBER_VR ("v31")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at the next input byte.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
+
+/* Convert from the internal (UCS4-like) format to ISO 646-IRV.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		1
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ascii_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ascii_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ascii)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4);			\
+  STANDARD_TO_LOOP_ERR_HANDLER (4);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*((const uint32_t *) inptr) > 0x7f))		\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */					\
+	*outptr++ = *((const uint32_t *) inptr);			\
+	inptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = (inend - inptr) / 4;					\
+    if (len > outend - outptr)						\
+      len = outend - outptr;						\
+    size_t loop_count, tmp, tmp2;					\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch > 0x7f.  */		\
+		      "    vzero %%v21\n\t"				\
+		      "    srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */ \
+		      "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */ \
+		      "    vleif %%v20,127,0\n\t"   /* element 0: 127  */ \
+		      "    lghi %[R_TMP],0\n\t"				\
+		      "    clgije %[R_LI],0,1f\n\t"			\
+		      "0:\n\t"						\
+		      "    vlm %%v16,%%v19,0(%[R_IN])\n\t"		\
+		      /* Shorten to byte values.  */			\
+		      "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		      /* Checking for values > 0x7f.  */		\
+		      "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "    jno 10f\n\t"					\
+		      "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "    jno 11f\n\t"					\
+		      "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "    jno 12f\n\t"					\
+		      "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "    jno 13f\n\t"					\
+		      /* Store 16bytes to outptr.  */			\
+		      "    vst %%v23,0(%[R_OUT])\n\t"			\
+		      "    la %[R_IN],64(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],16(%[R_OUT])\n\t"		\
+		      "    brctg %[R_LI],0b\n\t"			\
+		      "    lghi %[R_LI],15\n\t"				\
+		      "    ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "    je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "    aghi %[R_LI],-1\n\t"				\
+		      "    jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Load remaining 1...63 bytes.  */		\
+		      "    vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 2f\n\t"					\
+		      "    vll %%v17,%[R_LI],16(%[R_IN])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 2f\n\t"					\
+		      "    vll %%v18,%[R_LI],32(%[R_IN])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 2f\n\t"					\
+		      "    vll %%v19,%[R_LI],48(%[R_IN])\n\t"		\
+		      "2:\n\t"						\
+		      /* Shorten to byte values.  */			\
+		      "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		      "    sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      "    jl 3f\n\t" /* v16 is not fully loaded.  */	\
+		      "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "    jno 10f\n\t"					\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      "    jl 4f\n\t" /* v17 is not fully loaded.  */	\
+		      "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "    jno 11f\n\t"					\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      "    jl 5f\n\t" /* v18 is not fully loaded.  */	\
+		      "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "    jno 12f\n\t"					\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      /* v19 is not fully loaded. */			\
+		      "    lghi %[R_TMP],12\n\t"			\
+		      "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "6: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "    aghi %[R_LI],16\n\t"				\
+		      "    clrjl %[R_I],%[R_LI],14f\n\t"		\
+		      "    lgr %[R_I],%[R_LEN]\n\t"			\
+		      "    lghi %[R_LEN],0\n\t"				\
+		      "    j 15f\n\t"					\
+		      "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "    j 6b\n\t"					\
+		      "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "    lghi %[R_TMP],4\n\t"				\
+		      "    j 6b\n\t"					\
+		      "5: vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "    lghi %[R_TMP],8\n\t"				\
+		      "    j 6b\n\t"					\
+		      /* Found a value > 0x7f.  */			\
+		      "13: ahi %[R_TMP],4\n\t"				\
+		      "12: ahi %[R_TMP],4\n\t"				\
+		      "11: ahi %[R_TMP],4\n\t"				\
+		      "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "14: srlg %[R_I],%[R_I],2\n\t"			\
+		      "    agr %[R_I],%[R_TMP]\n\t"			\
+		      "    je 20f\n\t"					\
+		      /* Store characters before invalid one...  */	\
+		      "15: aghi %[R_I],-1\n\t"				\
+		      "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		      /* ... and update pointers.  */			\
+		      "    la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+		      "    sllg %[R_I],%[R_I],2\n\t"			\
+		      "    la %[R_IN],4(%[R_I],%[R_IN])\n\t"		\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_I] "=a" (tmp2)				\
+			, [R_TMP] "=d" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+			ASM_CLOBBER_VR ("v24")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character > 0x7f at next character.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
+
+
+/* Convert from internal UCS4 to UCS4 little endian form.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ucs4le_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ucs4le_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs4le)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (internal_ucs4le_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len = MIN (inend - inptr, outend - outptr) / 4;
+  size_t loop_count;
+  __asm__ volatile (".machine push\n\t"
+		    ".machine \"z13\"\n\t"
+		    ".machinemode \"zarch_nohighgprs\"\n\t"
+		    CONVERT_32BIT_SIZE_T ([R_LEN])
+		    "    bras %[R_LI],1f\n\t"
+		    /* Vector permute mask:  */
+		    "    .long 0x03020100,0x07060504,0x0B0A0908,0x0F0E0D0C\n\t"
+		    "1:  vl %%v20,0(%[R_LI])\n\t"
+		    /* Process 64byte (16char) blocks.  */
+		    "    srlg %[R_LI],%[R_LEN],4\n\t"
+		    "    clgije %[R_LI],0,10f\n\t"
+		    "0:  vlm %%v16,%%v19,0(%[R_IN])\n\t"
+		    "    vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "    vperm %%v17,%%v17,%%v17,%%v20\n\t"
+		    "    vperm %%v18,%%v18,%%v18,%%v20\n\t"
+		    "    vperm %%v19,%%v19,%%v19,%%v20\n\t"
+		    "    vstm %%v16,%%v19,0(%[R_OUT])\n\t"
+		    "    la %[R_IN],64(%[R_IN])\n\t"
+		    "    la %[R_OUT],64(%[R_OUT])\n\t"
+		    "    brctg %[R_LI],0b\n\t"
+		    "    llgfr %[R_LEN],%[R_LEN]\n\t"
+		    "    nilf %[R_LEN],15\n\t"
+		    /* Process 16byte (4char) blocks.  */
+		    "10: srlg %[R_LI],%[R_LEN],2\n\t"
+		    "    clgije %[R_LI],0,20f\n\t"
+		    "11: vl %%v16,0(%[R_IN])\n\t"
+		    "    vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "    vst %%v16,0(%[R_OUT])\n\t"
+		    "    la %[R_IN],16(%[R_IN])\n\t"
+		    "    la %[R_OUT],16(%[R_OUT])\n\t"
+		    "    brctg %[R_LI],11b\n\t"
+		    "    nill %[R_LEN],3\n\t"
+		    /* Process <16bytes.  */
+		    "20: sll %[R_LEN],2\n\t"
+		    "    ahi %[R_LEN],-1\n\t"
+		    "    jl 30f\n\t"
+		    "    vll %%v16,%[R_LEN],0(%[R_IN])\n\t"
+		    "    vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "    vstl %%v16,%[R_LEN],0(%[R_OUT])\n\t"
+		    "    la %[R_IN],1(%[R_LEN],%[R_IN])\n\t"
+		    "    la %[R_OUT],1(%[R_LEN],%[R_OUT])\n\t"
+		    "30: \n\t"
+		    ".machine pop"
+		    : /* outputs */ [R_OUT] "+a" (outptr)
+		      , [R_IN] "+a" (inptr)
+		      , [R_LI] "=a" (loop_count)
+		      , [R_LEN] "+a" (len)
+		    : /* inputs */
+		    : /* clobber list*/ "memory", "cc"
+		      ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
+		      ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
+		      ASM_CLOBBER_VR ("v20")
+		    );
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (internal_ucs4le_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs4le)
+
+
+/* Transform from UCS4 to the internal, UCS4-like format.  Unlike
+   for the other direction we have to check for correct values here.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4_internal)
+# define ONE_DIRECTION		0
+
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4_internal_loop) (struct __gconv_step *step,
+				    struct __gconv_step_data *step_data,
+				    const unsigned char **inptrp,
+				    const unsigned char *inend,
+				    unsigned char **outptrp,
+				    unsigned char *outend,
+				    size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"    larl %[R_LI],9f\n\t"
+			"    vlm %%v20,%%v21,0(%[R_LI])\n\t"
+			"    srlg %[R_LI],%[R_LEN],2\n\t"
+			"    clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0:  vl %%v16,0(%[R_IN])\n\t"
+			"    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"    jno 10f\n\t"
+			"    vst %%v16,0(%[R_OUT])\n\t"
+			"    la %[R_IN],16(%[R_IN])\n\t"
+			"    la %[R_OUT],16(%[R_OUT])\n\t"
+			"    brctg %[R_LI],0b\n\t"
+			"    llgfr %[R_LEN],%[R_LEN]\n\t"
+			"    nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1:  sll %[R_LEN],2\n\t"
+			"    ahik %[R_LI],%[R_LEN],-1\n\t"
+			"    jl 20f\n\t" /* No further bytes available.  */
+			"    vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"    vlgvb %[R_LI],%%v22,7\n\t"
+			"    clr %[R_LI],%[R_LEN]\n\t"
+			"    locgrhe %[R_LI],%[R_LEN]\n\t"
+			"    locghihe %[R_LEN],0\n\t"
+			"    j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9:  .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			"    .long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v22,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"    jl 20f\n\t"
+			"    vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"    la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"    la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (ucs4_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4_internal)
+
+
+/* Transform from UCS4-LE to the internal encoding.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4le_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4le_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4le_internal)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4le_internal_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"    larl %[R_LI],9f\n\t"
+			"    vlm %%v20,%%v22,0(%[R_LI])\n\t"
+			"    srlg %[R_LI],%[R_LEN],2\n\t"
+			"    clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0:  vl %%v16,0(%[R_IN])\n\t"
+			"    vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"    vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"    jno 10f\n\t"
+			"    vst %%v16,0(%[R_OUT])\n\t"
+			"    la %[R_IN],16(%[R_IN])\n\t"
+			"    la %[R_OUT],16(%[R_OUT])\n\t"
+			"    brctg %[R_LI],0b\n\t"
+			"    llgfr %[R_LEN],%[R_LEN]\n\t"
+			"    nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1:  sll %[R_LEN],2\n\t"
+			"    ahik %[R_LI],%[R_LEN],-1\n\t"
+			"    jl 20f\n\t" /* No further bytes available.  */
+			"    vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"    vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"    vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"    vlgvb %[R_LI],%%v23,7\n\t"
+			"    clr %[R_LI],%[R_LEN]\n\t"
+			"    locgrhe %[R_LI],%[R_LEN]\n\t"
+			"    locghihe %[R_LEN],0\n\t"
+			"    j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			"    .long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* v22: Vector permute mask.  */
+			"    .long 0x03020100,0x7060504,0x0B0A0908,0x0F0E0D0C\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v23,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"    jl 20f\n\t"
+			"    vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"    la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"    la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			  ASM_CLOBBER_VR ("v23")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*inptrp + 4 > inend)
+    result = __GCONV_INCOMPLETE_INPUT;
+  else
+    {
+      assert (*outptrp + 4 > outend);
+      result = __GCONV_FULL_OUTPUT;
+    }
+
+  return result;
+}
+ICONV_VX_SINGLE (ucs4le_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4le_internal)
+
+/* Convert from UCS2 to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  STANDARD_FROM_LOOP_ERR_HANDLER (2);
+# define BODY_ORIG							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "    larl %[R_TMP],9f\n\t"			\
+		      "    vlm %%v20,%%v21,0(%[R_TMP])\n\t"		\
+		      "    srlg %[R_TMP],%[R_LEN],3\n\t"		\
+		      "    clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0:  vl %%v16,0(%[R_IN])\n\t"			\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    jno 10f\n\t"					\
+		      /* Store 32bytes to buf_out.  */			\
+		      "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "    la %[R_IN],16(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],32(%[R_OUT])\n\t"		\
+		      "    brctg %[R_TMP],0b\n\t"			\
+		      "    llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "    nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1:  sll %[R_LEN],1\n\t"				\
+		      "    ahik %[R_TMP],%[R_LEN],-1\n\t"		\
+		      "    jl 20f\n\t" /* No further bytes available.  */ \
+		      "    vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "    clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "    locgrhe %[R_TMP],%[R_LEN]\n\t"		\
+		      "    locghihe %[R_LEN],0\n\t"			\
+		      "    j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9:  .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: =>; element 1: <  */		\
+		      "    .short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "    sll %[R_TMP],1\n\t"				\
+		      "    lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "    ahi %[R_TMP],-1\n\t"				\
+		      "    jl 20f\n\t"					\
+		      "    vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "    ahi %[R_TMP],-16\n\t"			\
+		      "    jl 19f\n\t"					\
+		      "    vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"	\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20: \n\t"					\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2_internal)
+
+/* Convert from UCS2 in other endianness to the internal (UCS4-like) format. */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop) /* This is not used.*/
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2reverse_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  if (! ignore_errors_p ())						\
+    {									\
+      result = __GCONV_ILLEGAL_INPUT;					\
+      break;								\
+    }									\
+  inptr += 2;								\
+  ++*irreversible;							\
+  continue;
+
+# define BODY_ORIG \
+  {									\
+    uint16_t u1 = bswap_16 (get16 (inptr));				\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "    larl %[R_TMP],9f\n\t"			\
+		      "    vlm %%v20,%%v22,0(%[R_TMP])\n\t"		\
+		      "    srlg %[R_TMP],%[R_LEN],3\n\t"		\
+		      "    clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0:  vl %%v16,0(%[R_IN])\n\t"			\
+		      "    vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    jno 10f\n\t"					\
+		      /* Store 32bytes to buf_out.  */			\
+		      "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "    la %[R_IN],16(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],32(%[R_OUT])\n\t"		\
+		      "    brctg %[R_TMP],0b\n\t"			\
+		      "    llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "    nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1:  sll %[R_LEN],1\n\t"				\
+		      "    ahik %[R_TMP],%[R_LEN],-1\n\t"		\
+		      "    jl 20f\n\t" /* No further bytes available.  */ \
+		      "    vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "    vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "    clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "    locgrhe %[R_TMP],%[R_LEN]\n\t"		\
+		      "    locghihe %[R_LEN],0\n\t"			\
+		      "    j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9:  .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: =>; element 1: <  */		\
+		      "    .short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v22: Vector permute mask.  */			\
+		      "    .short 0x0100,0x0302,0x0504,0x0706\n\t"	\
+		      "    .short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "    sll %[R_TMP],1\n\t"				\
+		      "    lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "    ahi %[R_TMP],-1\n\t"				\
+		      "    jl 20f\n\t"					\
+		      "    vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "    ahi %[R_TMP],-16\n\t"			\
+		      "    jl 19f\n\t"					\
+		      "    vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"	\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20: \n\t"					\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2reverse_internal)
+
+/* Convert from the internal (UCS4-like) format to UCS2.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2_loop) /* This is not used.  */
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+									\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could make a security	\
+	   hole exploit by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	result = __GCONV_ILLEGAL_INPUT;					\
+	if (! ignore_errors_p ())					\
+	  break;							\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, val);						\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "    larl %[R_I],3f\n\t"			\
+			  "    vlm %%v20,%%v23,0(%[R_I])\n\t"		\
+			  "0:  \n\t"					\
+			  "    vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2.  */			\
+			  "    vpkf %%v18,%%v16,%%v17\n\t"		\
+			  "    vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"	\
+			  "    jno 11f\n\t"				\
+			  "1:  vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "    jno 10f\n\t"				\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2:  vst %%v18,0(%[R_OUT])\n\t"		\
+			  "    la %[R_IN],32(%[R_IN])\n\t"		\
+			  "    la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "    brctg %[R_LI],0b\n\t"			\
+			  "    j 20f\n\t"				\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3:  .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  "    .long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  "    .long 0xe000,0x10000,0x0,0x0\n\t"	\
+			  "    .long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "    jo 2b\n\t" /* All ch's in this range, proceed.   */ \
+			  "    lghi %[R_TMP],16\n\t"			\
+			  "    j 12f\n\t"				\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "    jo 1b\n\t" /* All ch's in this range, proceed.   */ \
+			  "    lghi %[R_TMP],0\n\t"			\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "    agr %[R_I],%[R_TMP]\n\t"			\
+			  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			  "    srl %[R_I],1\n\t"			\
+			  "    ahi %[R_I],-1\n\t"			\
+			  "    jl 20f\n\t"				\
+			  "    vstl %%v18,%[R_I],0(%[R_OUT])\n\t"	\
+			  "    la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"	\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2)
+
+/* Convert from the internal (UCS4-like) format to UCS2 in other endianness. */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2reverse_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2reverse_loop)/* This is not used.*/
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2reverse)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could make a security	\
+	   hole exploit by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	if (! ignore_errors_p ())					\
+	  {								\
+	    result = __GCONV_ILLEGAL_INPUT;				\
+	    break;							\
+	  }								\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, bswap_16 (val));					\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "    larl %[R_I],3f\n\t"			\
+			  "    vlm %%v20,%%v24,0(%[R_I])\n\t"		\
+			  "0:  \n\t"					\
+			  "    vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2 and byteswap.  */	\
+			  "    vpkf %%v18,%%v16,%%v17\n\t"		\
+			  "    vperm %%v18,%%v18,%%v18,%%v24\n\t"	\
+			  "    vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"	\
+			  "    jno 11f\n\t"				\
+			  "1:  vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "    jno 10f\n\t"				\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
+			  "    la %[R_IN],32(%[R_IN])\n\t"		\
+			  "    la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "    brctg %[R_LI],0b\n\t"			\
+			  "    j 20f\n\t"				\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  "    .long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  "    .long 0xe000,0x10000,0x0,0x0\n\t"	\
+			  "    .long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* Vector permute mask (v24)  */		\
+			  "    .short 0x0100,0x0302,0x0504,0x0706\n\t"	\
+			  "    .short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "    jo 2b\n\t" /* All ch's in this range, proceed.  */ \
+			  "    lghi %[R_TMP],16\n\t"			\
+			  "    j 12f\n\t"				\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "    jo 1b\n\t" /* All ch's in this range, proceed.  */ \
+			  "    lghi %[R_TMP],0\n\t"			\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "    agr %[R_I],%[R_TMP]\n\t"			\
+			  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			  "    srl %[R_I],1\n\t"			\
+			  "    ahi %[R_I],-1\n\t"			\
+			  "    jl 20f\n\t"				\
+			  "    vstl %%v18,%[R_I],0(%[R_OUT])\n\t"	\
+			  "    la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"	\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			    ASM_CLOBBER_VR ("v24")			\
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2reverse)
+
+
+#else
+/* Generate the internal transformations without ifunc if build environment
+   lacks vector support. Instead simply include the common version.  */
+# include <iconv/gconv_simple.c>
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
-- 
2.5.5



* Re: [PATCH 09/14] S390: Optimize utf16-utf32 module.
  2016-02-23  9:22 ` [PATCH 09/14] S390: Optimize utf16-utf32 module Stefan Liebler
@ 2016-04-21 14:55   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 14:55 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 22833 bytes --]

Here is an updated patch, where the labels in inline assemblies are 
out-dented as suggested by Florian.

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch reworks the s390-specific module to convert between utf16 and utf32.
> Now ifunc is used to choose either the c or the etf3eh (with convert utf
> instruction) variant at runtime.
> Furthermore, a new vector variant for z13 is introduced, which will be built
> and chosen if vector support is available at build time and at runtime.
>
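
To illustrate the selection described above, here is a minimal sketch that
uses a plain function-pointer resolver as a portable stand-in for ifunc. The
capability bits, the variant ids and the priority order (vector first, then
etf3eh, then c) are assumptions made for illustration; the patch itself relies
on the glibc ifunc machinery and the hardware capability word.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, standing in for the real hwcap flags.  */
#define SKETCH_CAP_ETF3EH 0x1u
#define SKETCH_CAP_VX     0x2u

/* Hypothetical ids for the three loop variants.  */
enum sketch_variant { SKETCH_C, SKETCH_ETF3EH, SKETCH_VX };

typedef enum sketch_variant (*sketch_loop_fn) (void);

static enum sketch_variant sketch_loop_c (void) { return SKETCH_C; }
static enum sketch_variant sketch_loop_etf3eh (void) { return SKETCH_ETF3EH; }
static enum sketch_variant sketch_loop_vx (void) { return SKETCH_VX; }

/* Pick a loop variant once, based on build-time and runtime support.
   The vector variant needs both; otherwise fall back step by step.  */
static sketch_loop_fn
sketch_select_loop (unsigned int caps, int built_with_vx)
{
  if (built_with_vx && (caps & SKETCH_CAP_VX) != 0)
    return sketch_loop_vx;
  if ((caps & SKETCH_CAP_ETF3EH) != 0)
    return sketch_loop_etf3eh;
  return sketch_loop_c;
}
```

A caller would resolve the pointer once and then invoke it for every
conversion call, which is roughly what ifunc does at symbol resolution time.
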
> In case of converting utf32 to utf16, the vector variant optimizes input of
> 2-byte utf16 characters. The convert utf instruction is used if a utf16
> surrogate is found.
>
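
As a scalar reference for the utf32 to utf16 direction described above, the
following sketch separates the common case (one 2-byte utf16 code unit) from
the surrogate case. The function name and the error convention are
hypothetical and not taken from the patch; the vector code handles many
characters per iteration, this only shows the per-character logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert INLEN UTF-32 code points to UTF-16 code
   units.  Returns the number of code units written, or (size_t) -1 if a
   code point is invalid or the output buffer is too small.  */
static size_t
sketch_utf32_to_utf16 (const uint32_t *in, size_t inlen,
                       uint16_t *out, size_t outlen)
{
  size_t o = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < inlen; i++)
    {
      uint32_t c = in[i];

      if (c < 0xd800 || (c >= 0xe000 && c < 0x10000))
        {
          /* Fast path: the character fits in one 2-byte code unit.  */
          if (o + 1 > outlen)
            return (size_t) -1;
          out[o++] = (uint16_t) c;
        }
      else if (c >= 0x10000 && c <= 0x10ffff)
        {
          /* Slow path: encode the character as a surrogate pair.  */
          if (o + 2 > outlen)
            return (size_t) -1;
          c -= 0x10000;
          out[o++] = (uint16_t) (0xd800 + (c >> 10));
          out[o++] = (uint16_t) (0xdc00 + (c & 0x3ff));
        }
      else
        /* Surrogate code point or value above U+10FFFF.  */
        return (size_t) -1;
    }
  return o;
}
```
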
> For the other direction utf16 to utf32, the cu24 instruction can't be
> re-enabled, because it does not report an error if the input stream consists
> of a single low surrogate utf16 char (e.g. 0xdc00). This applies to the newest
> z13, too. Thus there is only the c or the new vector variant, which can handle
> utf16 surrogate characters.
>
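
The check that cu24 misses can be sketched in scalar C as follows. This
mirrors the logic of the software routine (reject a low surrogate that is not
preceded by a high surrogate), but the function name and the status codes are
hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch.  */
enum { SKETCH_OK = 0, SKETCH_ILLEGAL = -1, SKETCH_INCOMPLETE = -2 };

/* Hypothetical helper: decode one character from the UTF-16 code units
   IN[0..INLEN).  On success, store the code point in *CP and the number of
   consumed code units in *USED.  */
static int
sketch_utf16_decode_one (const uint16_t *in, size_t inlen,
                         uint32_t *cp, size_t *used)
{
  assert (in != NULL && cp != NULL && used != NULL);
  if (inlen == 0)
    return SKETCH_INCOMPLETE;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the code unit is the code point.  */
      *cp = u1;
      *used = 1;
      return SKETCH_OK;
    }

  /* An isolated low surrogate (e.g. 0xdc00) is ill-formed.  */
  if (u1 >= 0xdc00)
    return SKETCH_ILLEGAL;

  /* High surrogate: a second code unit is required.  */
  if (inlen < 2)
    return SKETCH_INCOMPLETE;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return SKETCH_ILLEGAL;

  *cp = 0x10000 + ((uint32_t) (u1 - 0xd800) << 10) + (uint32_t) (u2 - 0xdc00);
  *used = 2;
  return SKETCH_OK;
}
```
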
> This patch also fixes some whitespace errors. Furthermore, the etf3eh variant
> now handles the "UTF-xx//IGNORE" case. Previously it disregarded the ignore
> flag and always stopped at an error.
>
> ChangeLog:
>
> 	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
> 	etf3eh or new vector loop-variant.
> ---
>   sysdeps/s390/s390-64/utf16-utf32-z9.c | 471 +++++++++++++++++++++++++++-------
>   1 file changed, 379 insertions(+), 92 deletions(-)
>
> diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
> index a3863ee..4c2c548 100644
> --- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
> +++ b/sysdeps/s390/s390-64/utf16-utf32-z9.c
> @@ -30,47 +30,27 @@
>   #include <dl-procinfo.h>
>   #include <gconv.h>
>
> +#if defined HAVE_S390_VX_GCC_SUPPORT
> +# define ASM_CLOBBER_VR(NR) , NR
> +#else
> +# define ASM_CLOBBER_VR(NR)
> +#endif
> +
>   /* UTF-32 big endian byte order mark.  */
>   #define BOM_UTF32               0x0000feffu
>
>   /* UTF-16 big endian byte order mark.  */
> -#define BOM_UTF16	        0xfeff
> +#define BOM_UTF16               0xfeff
>
>   #define DEFINE_INIT		0
>   #define DEFINE_FINI		0
>   #define MIN_NEEDED_FROM		2
>   #define MAX_NEEDED_FROM		4
>   #define MIN_NEEDED_TO		4
> -#define FROM_LOOP		from_utf16_loop
> -#define TO_LOOP			to_utf16_loop
> +#define FROM_LOOP		__from_utf16_loop
> +#define TO_LOOP			__to_utf16_loop
>   #define FROM_DIRECTION		(dir == from_utf16)
>   #define ONE_DIRECTION           0
> -#define PREPARE_LOOP							\
> -  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
> -  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
> -									\
> -  if (emit_bom && !data->__internal_use					\
> -      && data->__invocation_counter == 0)				\
> -    {									\
> -      if (dir == to_utf16)						\
> -	{								\
> -          /* Emit the UTF-16 Byte Order Mark.  */			\
> -          if (__glibc_unlikely (outbuf + 2 > outend))			      \
> -	    return __GCONV_FULL_OUTPUT;					\
> -									\
> -	  put16u (outbuf, BOM_UTF16);					\
> -	  outbuf += 2;							\
> -	}								\
> -      else								\
> -	{								\
> -          /* Emit the UTF-32 Byte Order Mark.  */			\
> -	  if (__glibc_unlikely (outbuf + 4 > outend))			      \
> -	    return __GCONV_FULL_OUTPUT;					\
> -									\
> -	  put32u (outbuf, BOM_UTF32);					\
> -	  outbuf += 4;							\
> -	}								\
> -    }
>
>   /* Direction of the transformation.  */
>   enum direction
> @@ -169,16 +149,16 @@ gconv_end (struct __gconv_step *data)
>       register unsigned long long outlen __asm__("11") = outend - outptr;	\
>       uint64_t cc = 0;							\
>   									\
> -    __asm__ volatile (".machine push       \n\t"			\
> -		      ".machine \"z9-109\" \n\t"			\
> -		      "0: " INSTRUCTION "  \n\t"			\
> -		      ".machine pop        \n\t"			\
> -		      "   jo     0b        \n\t"			\
> -		      "   ipm    %2        \n"				\
> -		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -		      "+d" (outlen), "+d" (inlen)			\
> -		      :							\
> -		      : "cc", "memory");				\
> +    __asm__ __volatile__ (".machine push       \n\t"			\
> +			  ".machine \"z9-109\" \n\t"			\
> +			  "0: " INSTRUCTION "  \n\t"			\
> +			  ".machine pop        \n\t"			\
> +			  "   jo     0b        \n\t"			\
> +			  "   ipm    %2        \n"			\
> +			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> +			    "+d" (outlen), "+d" (inlen)			\
> +			  :						\
> +			  : "cc", "memory");				\
>   									\
>       inptr = pInput;							\
>       outptr = pOutput;							\
> @@ -187,44 +167,46 @@ gconv_end (struct __gconv_step *data)
>       if (cc == 1)							\
>         {									\
>   	result = __GCONV_FULL_OUTPUT;					\
> -	break;								\
>         }									\
>       else if (cc == 2)							\
>         {									\
>   	result = __GCONV_ILLEGAL_INPUT;					\
> -	break;								\
>         }									\
>     }
>
> +#define PREPARE_LOOP							\
> +  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
> +  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
> +									\
> +  if (emit_bom && !data->__internal_use					\
> +      && data->__invocation_counter == 0)				\
> +    {									\
> +      if (dir == to_utf16)						\
> +	{								\
> +	  /* Emit the UTF-16 Byte Order Mark.  */			\
> +	  if (__glibc_unlikely (outbuf + 2 > outend))			\
> +	    return __GCONV_FULL_OUTPUT;					\
> +									\
> +	  put16u (outbuf, BOM_UTF16);					\
> +	  outbuf += 2;							\
> +	}								\
> +      else								\
> +	{								\
> +	  /* Emit the UTF-32 Byte Order Mark.  */			\
> +	  if (__glibc_unlikely (outbuf + 4 > outend))			\
> +	    return __GCONV_FULL_OUTPUT;					\
> +									\
> +	  put32u (outbuf, BOM_UTF32);					\
> +	  outbuf += 4;							\
> +	}								\
> +    }
> +
>   /* Conversion function from UTF-16 to UTF-32 internal/BE.  */
>
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define LOOPFCT			FROM_LOOP
>   /* The software routine is copied from utf-16.c (minus bytes
>      swapping).  */
> -#define BODY								\
> +#define BODY_FROM_C							\
>     {									\
> -    /* The hardware instruction currently fails to report an error for	\
> -       isolated low surrogates so we have to disable the instruction	\
> -       until this gets resolved.  */					\
> -    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
> -      {									\
> -	HARDWARE_CONVERT ("cu24 %0, %1, 1");				\
> -	if (inptr != inend)						\
> -	  {								\
> -	    /* Check if the third byte is				\
> -	       a valid start of a UTF-16 surrogate.  */			\
> -	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
> -	      STANDARD_FROM_LOOP_ERR_HANDLER (3);			\
> -									\
> -	    result = __GCONV_INCOMPLETE_INPUT;				\
> -	    break;							\
> -	  }								\
> -	continue;							\
> -      }									\
> -									\
>       uint16_t u1 = get16 (inptr);					\
>   									\
>       if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
> @@ -235,15 +217,15 @@ gconv_end (struct __gconv_step *data)
>         }									\
>       else								\
>         {									\
> -        /* An isolated low-surrogate was found.  This has to be         \
> +	/* An isolated low-surrogate was found.  This has to be         \
>   	   considered ill-formed.  */					\
> -        if (__glibc_unlikely (u1 >= 0xdc00))				      \
> +	if (__glibc_unlikely (u1 >= 0xdc00))				\
>   	  {								\
>   	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
>   	  }								\
>   	/* It's a surrogate character.  At least the first word says	\
>   	   it is.  */							\
> -	if (__glibc_unlikely (inptr + 4 > inend))			      \
> +	if (__glibc_unlikely (inptr + 4 > inend))			\
>   	  {								\
>   	    /* We don't have enough input for another complete input	\
>   	       character.  */						\
> @@ -266,48 +248,200 @@ gconv_end (struct __gconv_step *data)
>         }									\
>       outptr += 4;							\
>     }
> -#define LOOP_NEED_FLAGS
> -#include <iconv/loop.c>
> +
> +#define BODY_FROM_VX							\
> +  {									\
> +    size_t inlen = inend - inptr;					\
> +    size_t outlen = outend - outptr;					\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  /* Setup to check for surrogates.  */			\
> +		  "larl %[R_TMP],9f\n\t"				\
> +		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> +		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
> +		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
> +		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
> +		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> +		  /* Check for surrogate chars.  */			\
> +		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t"						\
> +		  /* Enlarge to UTF-32.  */				\
> +		  "vuplhh %%v17,%%v16\n\t"				\
> +		  "la %[R_IN],16(%[R_IN])\n\t"				\
> +		  "vupllh %%v18,%%v16\n\t"				\
> +		  "aghi %[R_INLEN],-16\n\t"				\
> +		  /* Store 32 bytes to buf_out.  */			\
> +		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
> +		  "aghi %[R_OUTLEN],-32\n\t"				\
> +		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],16,2f\n\t"				\
> +		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
> +		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  /* At least one uint16_t is in range of surrogates.	\
> +		     Store the preceding chars.  */			\
> +		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		  "vuplhh %%v17,%%v16\n\t"				\
> +		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> +		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> +		  "jl 12f\n\t"						\
> +		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  "vupllh %%v18,%%v16\n\t"				\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
> +		  "11: \n\t" /* Update pointers.  */			\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
> +		  "12: lghi %[R_TMP2],16\n\t"				\
> +		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
> +		  "srl %[R_TMP2],1\n\t"					\
> +		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
> +		  "aghi %[R_OUTLEN],-4\n\t"				\
> +		  "j 16f\n\t"						\
> +		  /* Handle remaining bytes.  */			\
> +		  "2:\n\t"						\
> +		  /* Zero, one or more bytes available?  */		\
> +		  "clgfi %[R_INLEN],1\n\t"				\
> +		  "je 97f\n\t" /* Only one byte available.  */		\
> +		  "jl 99f\n\t" /* End if no bytes available.  */	\
> +		  /* Calculate remaining uint16_t values in inptr.  */	\
> +		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
> +		  /* Handle remaining uint16_t values.  */		\
> +		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
> +		  "slgfi %[R_OUTLEN],4\n\t"				\
> +		  "jl 96f \n\t"						\
> +		  "clfi %[R_TMP],0xd800\n\t"				\
> +		  "jhe 15f\n\t"						\
> +		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-2\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],13b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Handle UTF-16 surrogate pair.  */			\
> +		  "15: clfi %[R_TMP],0xdfff\n\t"			\
> +		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
> +		  "16: clfi %[R_TMP],0xdc00\n\t"			\
> +		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
> +		  "slgfi %[R_INLEN],4\n\t"				\
> +		  "jl 97f\n\t" /* Big enough input?  */			\
> +		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
> +		  "slfi %[R_TMP],0xd7c0\n\t"				\
> +		  "sll %[R_TMP],10\n\t"					\
> +		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
> +		  "nilf %[R_TMP3],0xfc00\n\t"				\
> +		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
> +		  "jne 98f\n\t"						\
> +		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "aghi %[R_TMP2],-2\n\t"				\
> +		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  "96:\n\t" /* Return full output.  */			\
> +		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
> +		  "j 99f\n\t"						\
> +		  "97:\n\t" /* Return incomplete input.  */		\
> +		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
> +		  "j 99f\n\t"						\
> +		  "98:\n\t" /* Return Illegal character.  */		\
> +		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "99:\n\t"						\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (inptr)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result != __GCONV_ILLEGAL_INPUT)				\
> +      break;								\
> +									\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
> +  }
> +
> +
> +/* Generate loop-function with software routine.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +# define LOOPFCT		__from_utf16_loop_c
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_C
> +# include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		__from_utf16_loop_vx
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_VX
> +# include <iconv/loop.c>
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__from_utf16_loop_c)
> +__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
> +__from_utf16_loop;
> +
> +static void *
> +__from_utf16_loop_resolver (unsigned long int dl_hwcap)
> +{
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __from_utf16_loop_vx;
> +  else
> +    return __from_utf16_loop_c;
> +}
> +
> +strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
> +#else
> +# define LOOPFCT		FROM_LOOP
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_C
> +# include <iconv/loop.c>
> +#endif
>
>   /* Conversion from UTF-32 internal/BE to UTF-16.  */
>
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			TO_LOOP
>   /* The software routine is copied from utf-16.c (minus bytes
>      swapping).  */
> -#define BODY								\
> +#define BODY_TO_C							\
>     {									\
> -    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
> -      {									\
> -	HARDWARE_CONVERT ("cu42 %0, %1");				\
> -									\
> -	if (inptr != inend)						\
> -	  {								\
> -	    result = __GCONV_INCOMPLETE_INPUT;				\
> -	    break;							\
> -	  }								\
> -	continue;							\
> -      }									\
> -									\
>       uint32_t c = get32 (inptr);						\
>   									\
>       if (__builtin_expect (c <= 0xd7ff, 1)				\
>   	|| (c >=0xdc00 && c <= 0xffff))					\
>         {									\
> -        /* Two UTF-16 chars.  */					\
> -        put16 (outptr, c);						\
> +	/* Two UTF-16 chars.  */					\
> +	put16 (outptr, c);						\
>         }									\
>       else if (__builtin_expect (c >= 0x10000, 1)				\
>   	     && __builtin_expect (c <= 0x10ffff, 1))			\
>         {									\
>   	/* Four UTF-16 chars.  */					\
> -        uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
> +	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
>   	uint16_t out;							\
>   									\
>   	/* Generate a surrogate character.  */				\
> -	if (__glibc_unlikely (outptr + 4 > outend))			      \
> +	if (__glibc_unlikely (outptr + 4 > outend))			\
>   	  {								\
>   	    /* Overflow in the output buffer.  */			\
>   	    result = __GCONV_FULL_OUTPUT;				\
> @@ -326,12 +460,165 @@ gconv_end (struct __gconv_step *data)
>         }									\
>       else								\
>         {									\
> -        STANDARD_TO_LOOP_ERR_HANDLER (4);				\
> +	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
>         }									\
>       outptr += 2;							\
>       inptr += 4;								\
>     }
> +
> +#define BODY_TO_ETF3EH							\
> +  {									\
> +    HARDWARE_CONVERT ("cu42 %0, %1");					\
> +									\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +									\
> +    if (inptr + 4 > inend)						\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> +  }
> +
> +#define BODY_TO_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  /* Setup to check for surrogates.  */			\
> +		  "larl %[R_TMP],9f\n\t"				\
> +		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> +		  /* Loop which handles UTF-16 chars			\
> +		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
> +		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
> +		  "lghi %[R_TMP2],0\n\t"				\
> +		  /* Shorten to UTF-16.  */				\
> +		  "vpkf %%v18,%%v16,%%v17\n\t"				\
> +		  /* Check for surrogate chars.  */			\
> +		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t"						\
> +		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
> +		  "jno 11f\n\t"						\
> +		  /* Store 16 bytes to buf_out.  */			\
> +		  "vst %%v18,0(%[R_OUT])\n\t"				\
> +		  "la %[R_IN],32(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-32\n\t"				\
> +		  "aghi %[R_OUTLEN],-16\n\t"				\
> +		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],32,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
> +		     and check for ch >= 0x10000. (v30, v31)  */	\
> +		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
> +		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
> +		  /* At least one UTF32 char is in range of surrogates.	\
> +		     Store the preceding characters.  */		\
> +		  "11: ahi %[R_TMP2],16\n\t"				\
> +		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
> +		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> +		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> +		  "jl 20f\n\t"						\
> +		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  /* Update pointers.  */				\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Handles UTF16 surrogates with convert instruction.  */ \
> +		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +									\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +    if (inptr + 4 > inend)						\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> +  }
> +
> +/* Generate loop-function with software routine.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf16_loop_c
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_TO_C
> +#include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf16_loop_etf3eh
>   #define LOOP_NEED_FLAGS
> +#define BODY			BODY_TO_ETF3EH
>   #include <iconv/loop.c>
>
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +# define LOOPFCT		__to_utf16_loop_vx
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_TO_VX
> +# include <iconv/loop.c>
> +#endif
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__to_utf16_loop_c)
> +__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
> +__to_utf16_loop;
> +
> +static void *
> +__to_utf16_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __to_utf16_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __to_utf16_loop_etf3eh;
> +  else
> +    return __to_utf16_loop_c;
> +}
> +
> +strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
> +
> +
>   #include <iconv/skeleton.c>
>

[-- Attachment #2: 0009-S390-Optimize-utf16-utf32-module.patch --]
[-- Type: text/x-patch, Size: 22007 bytes --]

From 787cdbd9a241bbf00fe7bde954797838054c4c54 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 09/14] S390: Optimize utf16-utf32 module.

This patch reworks the s390-specific module that converts between UTF-16 and
UTF-32. ifunc is now used to choose between the C and the etf3eh (convert-utf
instruction) variants at runtime.
Furthermore, a new vector variant for z13 is introduced, which is built and
chosen if vector support is available at build time and at runtime.
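
The selection order can be sketched in plain C as follows. This is only an
illustration, assuming hypothetical capability bits (DEMO_HWCAP_*) and stub
loop functions (demo_loop_*); it uses an ordinary function pointer instead of
the ifunc attribute that the patch itself uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits for this sketch only; the real
   HWCAP_S390_* values are not reproduced here.  */
#define DEMO_HWCAP_ETF3EH 0x1UL
#define DEMO_HWCAP_VX     0x2UL

enum demo_variant { DEMO_VARIANT_C, DEMO_VARIANT_ETF3EH, DEMO_VARIANT_VX };

typedef enum demo_variant (*demo_loop_fn) (void);

/* Stub loop functions that only report which variant was selected.  */
static enum demo_variant demo_loop_c (void) { return DEMO_VARIANT_C; }
static enum demo_variant demo_loop_etf3eh (void) { return DEMO_VARIANT_ETF3EH; }
static enum demo_variant demo_loop_vx (void) { return DEMO_VARIANT_VX; }

/* Prefer the vector variant, then etf3eh, then the C fallback.  */
static demo_loop_fn
demo_resolver (unsigned long int hwcap)
{
  if (hwcap & DEMO_HWCAP_VX)
    return demo_loop_vx;
  if (hwcap & DEMO_HWCAP_ETF3EH)
    return demo_loop_etf3eh;
  return demo_loop_c;
}

/* Resolve once and call through the returned pointer.  */
static enum demo_variant
demo_run (unsigned long int hwcap)
{
  demo_loop_fn fn = demo_resolver (hwcap);
  assert (fn != NULL);
  return fn ();
}
```

With ifunc the resolver runs once at relocation time, so callers do not pay
for the capability check on every conversion call.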

When converting from UTF-32 to UTF-16, the vector variant optimizes input that
maps to 2-byte UTF-16 characters. The convert-utf instruction (cu42) takes over
if a character in the surrogate range or above 0xffff is found.
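
For reference, the split between the 2-byte fast path and the surrogate-pair
path can be sketched in scalar C. This is a minimal sketch that follows the
Unicode definition of UTF-16; demo_utf32_to_utf16 is a hypothetical helper and
is not a copy of BODY_TO_C or of the vector code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UTF-32 value as UTF-16.  Returns the number of 16-bit units
   written (1 or 2), or 0 if C is a surrogate value or above 0x10ffff.  */
static size_t
demo_utf32_to_utf16 (uint32_t c, uint16_t out[2])
{
  assert (out != NULL);

  /* Fast path: one 16-bit unit, the case a vector loop can handle.  */
  if (c <= 0xd7ff || (c >= 0xe000 && c <= 0xffff))
    {
      out[0] = (uint16_t) c;
      return 1;
    }

  /* Slow path: the character needs a high and a low surrogate.  */
  if (c >= 0x10000 && c <= 0x10ffff)
    {
      uint32_t v = c - 0x10000;
      out[0] = (uint16_t) (0xd800 | (v >> 10));
      out[1] = (uint16_t) (0xdc00 | (v & 0x3ff));
      return 2;
    }

  return 0;
}
```

For example, U+1F600 takes the slow path and is encoded as the pair
0xd83d 0xde00.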

For the other direction, UTF-16 to UTF-32, the cu24 instruction cannot be
re-enabled, because it does not report an error if the input stream consists of
a single low-surrogate UTF-16 character (e.g. 0xdc00). This also applies to the
newest z13. Thus only the C variant and the new vector variant, both of which
handle UTF-16 surrogate characters, are available for this direction.
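
The check that cu24 is missing can be sketched in scalar C. This is only an
illustration with hypothetical names (demo_status, demo_utf16_to_utf32); it
works on host-order 16-bit units and leaves out the buffer and error-handler
logic of the real loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_status { DEMO_OK, DEMO_ILLEGAL_INPUT, DEMO_INCOMPLETE_INPUT };

/* Decode one character from IN (N units available).  On DEMO_OK, *OUT holds
   the UTF-32 value and *USED the number of 16-bit units consumed.  */
static enum demo_status
demo_utf16_to_utf32 (const uint16_t *in, size_t n, size_t *used,
		     uint32_t *out)
{
  assert (in != NULL && used != NULL && out != NULL);
  *used = 0;

  if (n == 0)
    return DEMO_INCOMPLETE_INPUT;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the value is the code point.  */
      *out = u1;
      *used = 1;
      return DEMO_OK;
    }

  /* An isolated low surrogate is ill-formed.  */
  if (u1 >= 0xdc00)
    return DEMO_ILLEGAL_INPUT;

  /* High surrogate: a second unit is required.  */
  if (n < 2)
    return DEMO_INCOMPLETE_INPUT;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return DEMO_ILLEGAL_INPUT;

  /* Subtracting 0xd7c0 removes the 0xd800 tag and adds the 0x10000 offset
     in one step once the result is shifted left by 10.  */
  *out = ((uint32_t) (u1 - 0xd7c0) << 10) + (uint32_t) (u2 - 0xdc00);
  *used = 2;
  return DEMO_OK;
}
```

An input consisting only of 0xdc00 yields DEMO_ILLEGAL_INPUT here, which is
the error the hardware instruction does not report.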

This patch also fixes some whitespace errors. Furthermore, the etf3eh variant
now handles the "UTF-xx//IGNORE" case. Previously it ignored the IGNORE flag
and always stopped at an error.

ChangeLog:

	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
	etf3eh or new vector loop-variant.
---
 sysdeps/s390/s390-64/utf16-utf32-z9.c | 471 +++++++++++++++++++++++++++-------
 1 file changed, 379 insertions(+), 92 deletions(-)

diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
index a3863ee..61d0a94 100644
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ b/sysdeps/s390/s390-64/utf16-utf32-z9.c
@@ -30,47 +30,27 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
 /* UTF-32 big endian byte order mark.  */
 #define BOM_UTF32               0x0000feffu
 
 /* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	        0xfeff
+#define BOM_UTF16               0xfeff
 
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
 #define MIN_NEEDED_FROM		2
 #define MAX_NEEDED_FROM		4
 #define MIN_NEEDED_TO		4
-#define FROM_LOOP		from_utf16_loop
-#define TO_LOOP			to_utf16_loop
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
 #define FROM_DIRECTION		(dir == from_utf16)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-          /* Emit the UTF-16 Byte Order Mark.  */			\
-          if (__glibc_unlikely (outbuf + 2 > outend))			      \
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-          /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			      \
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
 
 /* Direction of the transformation.  */
 enum direction
@@ -169,16 +149,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-		      "+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -187,44 +167,46 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
 /* Conversion function from UTF-16 to UTF-32 internal/BE.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
 /* The software routine is copied from utf-16.c (minus bytes
    swapping).  */
-#define BODY								\
+#define BODY_FROM_C							\
   {									\
-    /* The hardware instruction currently fails to report an error for	\
-       isolated low surrogates so we have to disable the instruction	\
-       until this gets resolved.  */					\
-    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
-      {									\
-	HARDWARE_CONVERT ("cu24 %0, %1, 1");				\
-	if (inptr != inend)						\
-	  {								\
-	    /* Check if the third byte is				\
-	       a valid start of a UTF-16 surrogate.  */			\
-	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (3);			\
-									\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint16_t u1 = get16 (inptr);					\
 									\
     if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
@@ -235,15 +217,15 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        /* An isolated low-surrogate was found.  This has to be         \
+	/* An isolated low-surrogate was found.  This has to be         \
 	   considered ill-formed.  */					\
-        if (__glibc_unlikely (u1 >= 0xdc00))				      \
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
 	  {								\
 	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
 	  }								\
 	/* It's a surrogate character.  At least the first word says	\
 	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			      \
+	if (__glibc_unlikely (inptr + 4 > inend))			\
 	  {								\
 	    /* We don't have enough input for another complete input	\
 	       character.  */						\
@@ -266,48 +248,200 @@ gconv_end (struct __gconv_step *data)
       }									\
     outptr += 4;							\
   }
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  /* Enlarge to UTF-32.  */				\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9:  .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  "    .short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 12f\n\t"					\
+		  "    vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    srl %[R_TMP2],1\n\t"				\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_OUTLEN],-4\n\t"				\
+		  "    j 16f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    je 97f\n\t" /* Only one byte available.  */	\
+		  "    jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 96f \n\t"					\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    jhe 15f\n\t"					\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],13b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 14b\n\t" /* Jump away if ch > 0xdfff.  */	\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "    jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "    slgfi %[R_INLEN],4\n\t"				\
+		  "    jl 97f\n\t" /* Big enough input?  */		\
+		  "    llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    slfi %[R_TMP],0xd7c0\n\t"			\
+		  "    sll %[R_TMP],10\n\t"				\
+		  "    risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "    nilf %[R_TMP3],0xfc00\n\t"			\
+		  "    clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    jne 98f\n\t"					\
+		  "    st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "96: \n\t" /* Return full output.  */			\
+		  "    lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "97: \n\t" /* Return incomplete input.  */		\
+		  "    lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "    lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
 
 /* Conversion from UTF-32 internal/BE to UTF-16.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine is copied from utf-16.c (minus bytes
    swapping).  */
-#define BODY								\
+#define BODY_TO_C							\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu42 %0, %1");				\
-									\
-	if (inptr != inend)						\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint32_t c = get32 (inptr);						\
 									\
     if (__builtin_expect (c <= 0xd7ff, 1)				\
 	|| (c >=0xdc00 && c <= 0xffff))					\
       {									\
-        /* Two UTF-16 chars.  */					\
-        put16 (outptr, c);						\
+	/* Two UTF-16 chars.  */					\
+	put16 (outptr, c);						\
       }									\
     else if (__builtin_expect (c >= 0x10000, 1)				\
 	     && __builtin_expect (c <= 0x10ffff, 1))			\
       {									\
 	/* Four UTF-16 chars.  */					\
-        uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
 	uint16_t out;							\
 									\
 	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			      \
+	if (__glibc_unlikely (outptr + 4 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
@@ -326,12 +460,165 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
       }									\
     outptr += 2;							\
     inptr += 4;								\
   }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "    vpkf %%v18,%%v16,%%v17\n\t"			\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t"					\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  "    .long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 20f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
 #include <iconv/skeleton.c>
-- 
2.5.5
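
For readers following the BODY_TO_C path in the patch above, here is a
minimal, portable sketch of how one UTF-32 value maps to one or two
UTF-16 code units. It is not the code from the patch: it uses the
textbook surrogate formula, and it assumes that every value in the
surrogate range 0xd800..0xdfff is rejected. The name
demo_utf32_to_utf16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc.  Encode one Unicode value as
   UTF-16 code units.  Returns the number of units written (1 or 2), or
   0 if the value is a surrogate or lies above 0x10ffff.  */
size_t
demo_utf32_to_utf16 (uint32_t c, uint16_t out[2])
{
  /* Assumption: all surrogates 0xd800..0xdfff count as illegal input.  */
  if (c <= 0xd7ff || (c >= 0xe000 && c <= 0xffff))
    {
      /* One UTF-16 code unit (two bytes).  */
      out[0] = (uint16_t) c;
      return 1;
    }

  if (c >= 0x10000 && c <= 0x10ffff)
    {
      /* Two UTF-16 code units (four bytes): split the 20-bit offset
	 into a high and a low surrogate.  */
      uint32_t v = c - 0x10000;
      out[0] = (uint16_t) (0xd800 | (v >> 10));
      out[1] = (uint16_t) (0xdc00 | (v & 0x3ff));
      return 2;
    }

  return 0;
}
```

A caller would check for at least four free output bytes before
accepting a return value of 2, which corresponds to the
__GCONV_FULL_OUTPUT check in the macro.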


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-04-14 14:16   ` Stefan Liebler
@ 2016-04-21 15:00     ` Stefan Liebler
  2016-04-28  6:55       ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:00 UTC (permalink / raw)
  To: libc-alpha

Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to 
commit?

On 04/14/2016 04:16 PM, Stefan Liebler wrote:
> Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to
> commit?
>
> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>> This patch introduces a way to provide an architecture-dependent
>> gconv-modules file. Before this patch, the gconv-modules file was
>> normally installed from src-dir/iconvdata/gconv-modules. The S390
>> Makefile overrode the installation recipe (with a make warning) in
>> order to install the gconv-modules-s390 file from build-dir.
>> The iconvdata/Makefile provides another recipe, which copies the
>> gconv-modules file from the source to the build directory, where it
>> is used by the testcases. Thus the testcases do not use the freshly
>> built s390 modules.
>>
>> This patch uses build-dir/iconvdata/gconv-modules for installation.
>> If the makefile variable GCONV_MODULES is not defined, the
>> gconv-modules file is copied from the source to the build directory.
>> If an architecture wants to create its own gconv-modules file, the
>> variable GCONV_MODULES is set to the name of the
>> architecture-dependent gconv-modules file in the build directory,
>> which has to be created by a recipe in sysdeps/.../Makefile.
>> The iconvdata/Makefile then copies this file to
>> build-dir/iconvdata/gconv-modules, which is used for installation
>> and tests.
>>
>> This way, the s390 Makefile does not need to override the recipe for
>> gconv-modules and no warning is emitted anymore.
>>
>> ChangeLog:
>>
>>      * iconvdata/Makefile (GCONV_MODULES): New variable, which can
>>      be set by sysdeps Makefile.
>>      ($(inst_gconvdir)/gconv-modules):
>>      Install file from $(objpfx)gconv-modules.
>>      ($(objpfx)gconv-modules): Copy File from src-dir or from
>>      build-dir with file-name specified by GCONV_MODULES.
>>      * sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
>>      Deleted.
>>      (GCONV_MODULES): New variable.
>> ---
>>   iconvdata/Makefile            | 15 +++++++++++++--
>>   sysdeps/s390/s390-64/Makefile | 17 ++---------------
>>   2 files changed, 15 insertions(+), 17 deletions(-)
>>
>> diff --git a/iconvdata/Makefile b/iconvdata/Makefile
>> index 357530b..1ac1a5c 100644
>> --- a/iconvdata/Makefile
>> +++ b/iconvdata/Makefile
>> @@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx), $(generated-modules:=.h))
>>   $(addprefix $(inst_gconvdir)/, $(modules.so)): \
>>       $(inst_gconvdir)/%: $(objpfx)% $(+force)
>>       $(do-install-program)
>> -$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
>> +$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
>>       $(do-install)
>>   ifeq (no,$(cross-compiling))
>>   # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
>> @@ -332,6 +332,17 @@ tst-tables-clean:
>>       -rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
>>
>>   ifdef objpfx
>> +# Override GCONV_MODULES file name and provide a Makefile recipe,
>> +# if you want to create your own version.
>> +ifndef GCONV_MODULES
>> +# Copy gconv-modules from src-tree for tests and installation.
>>   $(objpfx)gconv-modules: gconv-modules
>> -    cp $^ $@
>> +    cp $< $@
>> +else
>> +generated += $(GCONV_MODULES)
>> +
>> +# Copy overridden GCONV_MODULES file to gconv-modules for tests and installation.
>> +$(objpfx)gconv-modules: $(objpfx)$(GCONV_MODULES)
>> +    cp $< $@
>> +endif
>>   endif
>> diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
>> index ce4f0c5..de249a7 100644
>> --- a/sysdeps/s390/s390-64/Makefile
>> +++ b/sysdeps/s390/s390-64/Makefile
>> @@ -39,7 +39,7 @@ $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
>>   $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
>>       $(do-install-program)
>>
>> -$(objpfx)gconv-modules-s390: gconv-modules $(+force)
>> +$(objpfx)gconv-modules-s390: gconv-modules
>>       cp $< $@
>>       echo >> $@
>>       echo "# S/390 hardware accelerated modules" >> $@
>> @@ -74,19 +74,6 @@ $(objpfx)gconv-modules-s390: gconv-modules $(+force)
>>       echo -n "module    ISO-10646/UTF8/        UTF-16BE//    " >> $@
>>       echo "    UTF8_UTF16_Z9        1" >> $@
>>
>> -$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
>> -    $(do-install)
>> -ifeq (no,$(cross-compiling))
>> -# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
>> -# if this libc has more gconv modules than the previously installed one.
>> -    if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
>> -       LC_ALL=C \
>> -       $(rtld-prefix) \
>> -       $(common-objpfx)iconv/iconvconfig \
>> -         $(addprefix --prefix=,$(install_root)); \
>> -    fi
>> -else
>> -    @echo '*@*@*@ You should recreate $(inst_gconvdir)/gconv-modules.cache'
>> -endif
>> +GCONV_MODULES = gconv-modules-s390
>>
>>   endif
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module.
  2016-02-23  9:22 ` [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module Stefan Liebler
@ 2016-04-21 15:05   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:05 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 5719 bytes --]

Here is an updated patch, where the labels in inline assemblies are 
out-dented as suggested by Florian.
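
The TR_LOOP macro in the quoted patch below works in three stages:
whole 256-byte blocks, then 8-byte blocks, then a tail of 0 to 7 bytes.
As a rough, portable illustration of that structure only, the sketch
below replaces the mvc + tr inline assembly with a plain table lookup.
The function name demo_translate and its signature are hypothetical and
do not appear in the patch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of glibc.  Translate
   min (inlen, outlen) bytes from IN to OUT through TABLE and return
   the number of bytes translated.  */
size_t
demo_translate (const unsigned char table[256], const unsigned char *in,
		size_t inlen, unsigned char *out, size_t outlen)
{
  size_t length = inlen < outlen ? inlen : outlen;
  size_t done = length;

  /* Stage 1: whole 256-byte blocks.  The patch copies each block with
     mvc and translates it in place with tr; a lookup loop stands in
     for that here.  */
  for (size_t blocks = length / 256; blocks > 0; blocks--)
    {
      for (size_t i = 0; i < 256; i++)
	out[i] = table[in[i]];
      in += 256;
      out += 256;
    }
  length %= 256;

  /* Stage 2: the remainder in 8-byte blocks.  */
  for (size_t blocks = length / 8; blocks > 0; blocks--)
    {
      for (size_t i = 0; i < 8; i++)
	out[i] = table[in[i]];
      in += 8;
      out += 8;
    }
  length %= 8;

  /* Stage 3: the last 0 to 7 bytes.  */
  for (size_t i = 0; i < length; i++)
    out[i] = table[in[i]];

  return done;
}
```

Because the conversion is strictly one byte in, one byte out, the
shorter of the two buffers bounds the work, which is why the macro
computes a single length up front.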

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch reworks the s390 specific module which used the z900
> translate one to one instruction. Now the g5 translate instruction is used,
> because it outperforms the troo instruction.
>
> ChangeLog:
>
> 	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
> 	Rename to TR_LOOP and use tr instead of troo instruction.
> ---
>   sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 93 +++++++++++++++++-----------
>   1 file changed, 56 insertions(+), 37 deletions(-)
>
> diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> index c59f87f..4d79bbf 100644
> --- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> +++ b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> @@ -1,7 +1,6 @@
>   /* Conversion between ISO 8859-1 and IBM037.
>
> -   This module uses the Z900 variant of the Translate One To One
> -   instruction.
> +   This module uses the translate instruction.
>      Copyright (C) 1997-2016 Free Software Foundation, Inc.
>
>      Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> @@ -176,50 +175,70 @@ __attribute__ ((aligned (8))) =
>   #define MIN_NEEDED_FROM		1
>   #define MIN_NEEDED_TO		1
>
> -/* The Z900 variant of troo forces us to always specify a test
> -   character which ends the translation.  So if we run into the
> -   situation where the translation has been interrupted due to the
> -   test character we translate the character by hand and jump back
> -   into the instruction.  */
> -
> -#define TROO_LOOP(TABLE)						\
> +#define TR_LOOP(TABLE)							\
>     {									\
> -    register const unsigned char test __asm__ ("0") = 0;		\
> -    register const unsigned char *pTable __asm__ ("1") = TABLE;		\
> -    register unsigned char *pOutput __asm__ ("2") = outptr;		\
> -    register uint64_t length __asm__ ("3");				\
> -    const unsigned char* pInput = inptr;				\
> -    uint64_t tmp;							\
> -									\
> -    length = (inend - inptr < outend - outptr				\
> -	      ? inend - inptr : outend - outptr);			\
> +    size_t length = (inend - inptr < outend - outptr			\
> +		     ? inend - inptr : outend - outptr);		\
>   									\
> -    __asm__ volatile ("0:                        \n\t"			\
> -		      "  troo    %0,%1           \n\t"			\
> -		      "  jz      1f              \n\t"			\
> -		      "  jo      0b              \n\t"			\
> -		      "  llgc    %3,0(%1)        \n\t"			\
> -		      "  la      %3,0(%3,%4)     \n\t"			\
> -		      "  mvc     0(1,%0),0(%3)   \n\t"			\
> -		      "  aghi    %1,1            \n\t"			\
> -		      "  aghi    %0,1            \n\t"			\
> -		      "  aghi    %2,-1           \n\t"			\
> -		      "  j       0b              \n\t"			\
> -		      "1:                        \n"			\
> +    /* Process in 256 byte blocks.  */					\
> +    if (__builtin_expect (length >= 256, 0))				\
> +      {									\
> +	size_t blocks = length / 256;					\
> +	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
> +			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
> +			     "la %[R_IN],256(%[R_IN])\n\t"		\
> +			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
> +			     "brctg %[R_LI],0b\n\t"			\
> +			     : /* outputs */ [R_IN] "+a" (inptr)	\
> +			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
> +			     : /* inputs */ [R_TBL] "a" (TABLE)		\
> +			     : /* clobber list */ "memory"		\
> +			     );						\
> +	length = length % 256;						\
> +      }									\
>   									\
> -     : "+a" (pOutput), "+a" (pInput), "+d" (length), "=&a" (tmp)        \
> -     : "a" (pTable), "d" (test)						\
> -     : "cc");								\
> +    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
> +    if (length >= 8)							\
> +      {									\
> +	size_t blocks = length / 8;					\
> +	for (int i = 0; i < blocks; i++)				\
> +	  {								\
> +	    outptr[0] = TABLE[inptr[0]];				\
> +	    outptr[1] = TABLE[inptr[1]];				\
> +	    outptr[2] = TABLE[inptr[2]];				\
> +	    outptr[3] = TABLE[inptr[3]];				\
> +	    outptr[4] = TABLE[inptr[4]];				\
> +	    outptr[5] = TABLE[inptr[5]];				\
> +	    outptr[6] = TABLE[inptr[6]];				\
> +	    outptr[7] = TABLE[inptr[7]];				\
> +	    inptr += 8;							\
> +	    outptr += 8;						\
> +	  }								\
> +	length = length % 8;						\
> +      }									\
>   									\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> +    /* Process remaining 0...7 bytes.  */				\
> +    switch (length)							\
> +      {									\
> +      case 7: outptr[6] = TABLE[inptr[6]];				\
> +      case 6: outptr[5] = TABLE[inptr[5]];				\
> +      case 5: outptr[4] = TABLE[inptr[4]];				\
> +      case 4: outptr[3] = TABLE[inptr[3]];				\
> +      case 3: outptr[2] = TABLE[inptr[2]];				\
> +      case 2: outptr[1] = TABLE[inptr[1]];				\
> +      case 1: outptr[0] = TABLE[inptr[0]];				\
> +      case 0: break;							\
> +      }									\
> +    inptr += length;							\
> +    outptr += length;							\
>     }
>
> +
>   /* First define the conversion function from ISO 8859-1 to CP037.  */
>   #define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
>   #define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
>   #define LOOPFCT			FROM_LOOP
> -#define BODY TROO_LOOP (table_iso8859_1_to_cp037)
> +#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
>
>   #include <iconv/loop.c>
>
> @@ -228,7 +247,7 @@ __attribute__ ((aligned (8))) =
>   #define MIN_NEEDED_INPUT	MIN_NEEDED_TO
>   #define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
>   #define LOOPFCT			TO_LOOP
> -#define BODY TROO_LOOP (table_cp037_iso8859_1);
> +#define BODY			TR_LOOP (table_cp037_iso8859_1);
>
>   #include <iconv/loop.c>
>
>

[-- Attachment #2: 0006-S390-Optimize-iso-8859-1-to-ibm037-iconv-module.patch --]
[-- Type: text/x-patch, Size: 5538 bytes --]

From d489351c09c82994adb872049fcb33bf189f86af Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module.

This patch reworks the s390 specific module which used the z900
translate one to one instruction. Now the g5 translate instruction is used,
because it outperforms the troo instruction.

ChangeLog:

	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
	Rename to TR_LOOP and use tr instead of troo instruction.
---
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 93 +++++++++++++++++-----------
 1 file changed, 56 insertions(+), 37 deletions(-)

diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
index c59f87f..3b63e6a 100644
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
@@ -1,7 +1,6 @@
 /* Conversion between ISO 8859-1 and IBM037.
 
-   This module uses the Z900 variant of the Translate One To One
-   instruction.
+   This module uses the translate instruction.
    Copyright (C) 1997-2016 Free Software Foundation, Inc.
 
    Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
@@ -176,50 +175,70 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_FROM		1
 #define MIN_NEEDED_TO		1
 
-/* The Z900 variant of troo forces us to always specify a test
-   character which ends the translation.  So if we run into the
-   situation where the translation has been interrupted due to the
-   test character we translate the character by hand and jump back
-   into the instruction.  */
-
-#define TROO_LOOP(TABLE)						\
+#define TR_LOOP(TABLE)							\
   {									\
-    register const unsigned char test __asm__ ("0") = 0;		\
-    register const unsigned char *pTable __asm__ ("1") = TABLE;		\
-    register unsigned char *pOutput __asm__ ("2") = outptr;		\
-    register uint64_t length __asm__ ("3");				\
-    const unsigned char* pInput = inptr;				\
-    uint64_t tmp;							\
-									\
-    length = (inend - inptr < outend - outptr				\
-	      ? inend - inptr : outend - outptr);			\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
 									\
-    __asm__ volatile ("0:                        \n\t"			\
-		      "  troo    %0,%1           \n\t"			\
-		      "  jz      1f              \n\t"			\
-		      "  jo      0b              \n\t"			\
-		      "  llgc    %3,0(%1)        \n\t"			\
-		      "  la      %3,0(%3,%4)     \n\t"			\
-		      "  mvc     0(1,%0),0(%3)   \n\t"			\
-		      "  aghi    %1,1            \n\t"			\
-		      "  aghi    %0,1            \n\t"			\
-		      "  aghi    %2,-1           \n\t"			\
-		      "  j       0b              \n\t"			\
-		      "1:                        \n"			\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "   la %[R_IN],256(%[R_IN])\n\t"		\
+			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     "   brctg %[R_LI],0b\n\t"			\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
 									\
-     : "+a" (pOutput), "+a" (pInput), "+d" (length), "=&a" (tmp)        \
-     : "a" (pTable), "d" (test)						\
-     : "cc");								\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
 									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
   }
 
+
 /* First define the conversion function from ISO 8859-1 to CP037.  */
 #define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
 #define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
 #define LOOPFCT			FROM_LOOP
-#define BODY TROO_LOOP (table_iso8859_1_to_cp037)
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
 
 #include <iconv/loop.c>
 
@@ -228,7 +247,7 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_INPUT	MIN_NEEDED_TO
 #define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
 #define LOOPFCT			TO_LOOP
-#define BODY TROO_LOOP (table_cp037_iso8859_1);
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
 
 #include <iconv/loop.c>
 
-- 
2.5.5


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too.
  2016-02-23  9:23 ` [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too Stefan Liebler
  2016-02-23 12:06   ` Stefan Liebler
@ 2016-04-21 15:10   ` Stefan Liebler
  1 sibling, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:10 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 196000 bytes --]

Here is an updated patch, where the labels in inline assemblies are 
out-dented as suggested by Florian.
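
The quoted description below says the ifunc resolvers now pick the
etf3eh or vector variant only when zarch instructions are available.
A minimal sketch of that selection order, written as an ordinary
function-pointer dispatch rather than a real ifunc resolver, could look
like this. The DEMO_HWCAP_* bit values and all demo_* names are
placeholders chosen for illustration; they are not the kernel's HWCAP
constants.

```c
/* Placeholder capability bits, NOT the real HWCAP_S390_* values.  */
enum
{
  DEMO_HWCAP_ZARCH = 1 << 0,
  DEMO_HWCAP_ETF3EH = 1 << 1,
  DEMO_HWCAP_VX = 1 << 2
};

typedef const char *(*demo_loop_fn) (void);

/* Stand-ins for the three loop variants.  */
const char *demo_loop_c (void) { return "c"; }
const char *demo_loop_etf3eh (void) { return "etf3eh"; }
const char *demo_loop_vx (void) { return "vx"; }

/* Prefer the vector variant, then the utf-convert variant, and fall
   back to plain C.  Both hardware variants need zarch instructions.  */
demo_loop_fn
demo_resolve (unsigned long int hwcap)
{
  if (hwcap & DEMO_HWCAP_ZARCH)
    {
      if (hwcap & DEMO_HWCAP_VX)
	return demo_loop_vx;
      if (hwcap & DEMO_HWCAP_ETF3EH)
	return demo_loop_etf3eh;
    }
  return demo_loop_c;
}
```

With this ordering, a capability word that advertises the vector
facility but not zarch still falls back to the C loop, which matches
the behaviour the description asks for on a 31bit kernel.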

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch reworks the existing s390 64bit specific iconv modules in order
> to use them on s390 31bit, too.
>
> Thus the parts for subdirectory iconvdata in sysdeps/s390/s390-64/Makefile
> were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
> All those modules are moved from sysdeps/s390/s390-64 directory to sysdeps/s390.
>
> The iso-8859-1 to/from cp037 module was adjusted, to use brct (branch relative
> on count) instruction on 31bit s390 instead of brctg, because the brctg is a
> zarch instruction and is not available on a 31bit kernel.
>
> The utf modules are using zarch instructions, thus the directive machinemode
> zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
> in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
> The ifunc resolvers were adjusted in order to call the etf3eh or vector variants
> only if zarch instructions are available (64bit kernel in 31bit compat-mode).
> Furthermore some variable types were changed. E.g. unsigned long long would be
> a register pair on s390 31bit, but we want only one single register.
> For variables of type size_t the register contents have to be enlarged from a
> 32bit to a 64bit value on 31bit, because the inline assemblies use 64bit values
> in such cases.
>
> ChangeLog:
>
> 	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
> 	Move to ...
> 	* sysdeps/s390/Makefile: ... here.
> 	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
> 	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
> 	(BRANCH_ON_COUNT): New define.
> 	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
> 	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
> 	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
> 	run on s390-32, too.
> 	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
> 	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
> 	run on s390-32, too.
> 	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
> 	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
> 	run on s390-32, too.
> ---
>   sysdeps/s390/Makefile                        |  83 +++
>   sysdeps/s390/iso-8859-1_cp037_z900.c         | 262 +++++++++
>   sysdeps/s390/s390-64/Makefile                |  84 ---
>   sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 256 ---------
>   sysdeps/s390/s390-64/utf16-utf32-z9.c        | 624 --------------------
>   sysdeps/s390/s390-64/utf8-utf16-z9.c         | 806 --------------------------
>   sysdeps/s390/s390-64/utf8-utf32-z9.c         | 807 --------------------------
>   sysdeps/s390/utf16-utf32-z9.c                | 636 +++++++++++++++++++++
>   sysdeps/s390/utf8-utf16-z9.c                 | 818 ++++++++++++++++++++++++++
>   sysdeps/s390/utf8-utf32-z9.c                 | 820 +++++++++++++++++++++++++++
>   10 files changed, 2619 insertions(+), 2577 deletions(-)
>   create mode 100644 sysdeps/s390/Makefile
>   create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
>   delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
>   delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
>   delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
>   delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
>   create mode 100644 sysdeps/s390/utf16-utf32-z9.c
>   create mode 100644 sysdeps/s390/utf8-utf16-z9.c
>   create mode 100644 sysdeps/s390/utf8-utf32-z9.c
>
> diff --git a/sysdeps/s390/Makefile b/sysdeps/s390/Makefile
> new file mode 100644
> index 0000000..9b17342
> --- /dev/null
> +++ b/sysdeps/s390/Makefile
> @@ -0,0 +1,83 @@
> +ifeq ($(subdir),iconvdata)
> +ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
> +ISO-8859-1_CP037_Z900-map := gconv.map
> +
> +UTF8_UTF32_Z9-routines := utf8-utf32-z9
> +UTF8_UTF32_Z9-map := gconv.map
> +
> +UTF16_UTF32_Z9-routines := utf16-utf32-z9
> +UTF16_UTF32_Z9-map := gconv.map
> +
> +UTF8_UTF16_Z9-routines := utf8-utf16-z9
> +UTF8_UTF16_Z9-map := gconv.map
> +
> +s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
> +
> +extra-modules-left += $(s390x-iconv-modules)
> +include extra-module.mk
> +
> +cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
> +lib := iconvdata
> +include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
> +
> +extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
> +install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
> +
> +$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
> +$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
> +	$(do-install-program)
> +
> +$(objpfx)gconv-modules-s390: gconv-modules
> +	${AWK} 'BEGIN { emitted = 0 } \
> +	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
> +	!emitted { emit_s390_modules(); emitted = 1; print; } \
> +	function emit_s390_modules() { \
> +	  # Emit header line. \
> +	  print "# S/390 hardware accelerated modules"; \
> +	  print_val("#", 8); \
> +	  print_val("from", 24); \
> +	  print_val("to", 24); \
> +	  print_val("module", 24); \
> +	  printf "cost\n"; \
> +	  # Emit s390-specific modules. \
> +	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
> +	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
> +	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
> +	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
> +	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
> +	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
> +	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
> +	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
> +	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
> +	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
> +	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
> +	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
> +	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
> +	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
> +	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
> +	  printf "\n# Default glibc modules\n"; \
> +	} \
> +	function modul(from, to, file, cost) { \
> +	  print_val("module", 8); \
> +	  print_val(from, 24); \
> +	  print_val(to, 24); \
> +	  print_val(file, 24); \
> +	  if (cost == 0) cost = 1; \
> +	  printf "%d\n", cost; \
> +	} \
> +	function print_val(val, width) { \
> +	  # Emit value followed by tabs. \
> +	  printf "%s", val; \
> +	  len = length(val); \
> +	  if (len < width) { \
> +	    len = width - len; \
> +	    nr_tabs = len / 8; \
> +	    if (len % 8 != 0) nr_tabs++; \
> +	  } \
> +	  else nr_tabs = 1; \
> +	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
> +	}' < $< > $@
> +
> +GCONV_MODULES = gconv-modules-s390
> +
> +endif
> diff --git a/sysdeps/s390/iso-8859-1_cp037_z900.c b/sysdeps/s390/iso-8859-1_cp037_z900.c
> new file mode 100644
> index 0000000..5c19218
> --- /dev/null
> +++ b/sysdeps/s390/iso-8859-1_cp037_z900.c
> @@ -0,0 +1,262 @@
> +/* Conversion between ISO 8859-1 and IBM037.
> +
> +   This module uses the translate instruction.
> +   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> +
> +   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> +   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> +
> +   Thanks to Daniel Appich who covered the relevant performance work
> +   in his diploma thesis.
> +
> +   This is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   This is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <stdint.h>
> +
> +// conversion table from ISO-8859-1 to IBM037
> +static const unsigned char table_iso8859_1_to_cp037[256]
> +__attribute__ ((aligned (8))) =
> +{
> +  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
> +  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
> +  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
> +  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
> +  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
> +  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
> +  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
> +  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
> +  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
> +  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
> +  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
> +  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
> +  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
> +  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
> +  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
> +  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
> +  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
> +  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
> +  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
> +  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
> +  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
> +  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
> +  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
> +  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
> +  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
> +  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
> +  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
> +  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
> +  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
> +  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
> +  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
> +  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
> +  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
> +  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
> +  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
> +  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
> +  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
> +  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
> +  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
> +  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
> +  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
> +  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
> +  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
> +  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
> +  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
> +  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
> +  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
> +  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
> +  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
> +  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
> +  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
> +  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
> +  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
> +  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
> +  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
> +  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
> +  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
> +  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
> +  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
> +  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
> +  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
> +  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
> +  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
> +  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
> +};
> +
> +// conversion table from IBM037 to ISO-8859-1
> +static const unsigned char table_cp037_iso8859_1[256]
> +__attribute__ ((aligned (8))) =
> +{
> +  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
> +  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
> +  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
> +  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
> +  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
> +  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
> +  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
> +  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
> +  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
> +  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
> +  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
> +  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
> +  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
> +  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
> +  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
> +  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
> +  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
> +  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
> +  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
> +  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
> +  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
> +  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
> +  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
> +  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
> +  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
> +  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
> +  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
> +  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
> +  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
> +  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
> +  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
> +  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
> +  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
> +  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
> +  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
> +  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
> +  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
> +  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
> +  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
> +  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
> +  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
> +  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
> +  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
> +  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
> +  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
> +  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
> +  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
> +  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
> +  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
> +  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
> +  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
> +  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
> +  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
> +  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
> +  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
> +  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
> +  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
> +  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
> +  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
> +  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
> +  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
> +  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
> +  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
> +  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
> +};
> +
> +/* Definitions used in the body of the `gconv' function.  */
> +#define CHARSET_NAME		"ISO-8859-1//"
> +#define FROM_LOOP		iso8859_1_to_cp037_z900
> +#define TO_LOOP			cp037_to_iso8859_1_z900
> +#define DEFINE_INIT		1
> +#define DEFINE_FINI		1
> +#define MIN_NEEDED_FROM		1
> +#define MIN_NEEDED_TO		1
> +
> +# if defined __s390x__
> +#  define BRANCH_ON_COUNT(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
> +# else
> +#  define BRANCH_ON_COUNT(REG,LBL) "brct %" #REG "," #LBL "\n\t"
> +# endif
> +
> +#define TR_LOOP(TABLE)							\
> +  {									\
> +    size_t length = (inend - inptr < outend - outptr			\
> +		     ? inend - inptr : outend - outptr);		\
> +									\
> +    /* Process in 256 byte blocks.  */					\
> +    if (__builtin_expect (length >= 256, 0))				\
> +      {									\
> +	size_t blocks = length / 256;					\
> +	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
> +			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
> +			     "la %[R_IN],256(%[R_IN])\n\t"		\
> +			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
> +			     BRANCH_ON_COUNT ([R_LI], 0b)		\
> +			     : /* outputs */ [R_IN] "+a" (inptr)	\
> +			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
> +			     : /* inputs */ [R_TBL] "a" (TABLE)		\
> +			     : /* clobber list */ "memory"		\
> +			     );						\
> +	length = length % 256;						\
> +      }									\
> +									\
> +    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
> +    if (length >= 8)							\
> +      {									\
> +	size_t blocks = length / 8;					\
> +	for (int i = 0; i < blocks; i++)				\
> +	  {								\
> +	    outptr[0] = TABLE[inptr[0]];				\
> +	    outptr[1] = TABLE[inptr[1]];				\
> +	    outptr[2] = TABLE[inptr[2]];				\
> +	    outptr[3] = TABLE[inptr[3]];				\
> +	    outptr[4] = TABLE[inptr[4]];				\
> +	    outptr[5] = TABLE[inptr[5]];				\
> +	    outptr[6] = TABLE[inptr[6]];				\
> +	    outptr[7] = TABLE[inptr[7]];				\
> +	    inptr += 8;							\
> +	    outptr += 8;						\
> +	  }								\
> +	length = length % 8;						\
> +      }									\
> +									\
> +    /* Process remaining 0...7 bytes.  */				\
> +    switch (length)							\
> +      {									\
> +      case 7: outptr[6] = TABLE[inptr[6]];				\
> +      case 6: outptr[5] = TABLE[inptr[5]];				\
> +      case 5: outptr[4] = TABLE[inptr[4]];				\
> +      case 4: outptr[3] = TABLE[inptr[3]];				\
> +      case 3: outptr[2] = TABLE[inptr[2]];				\
> +      case 2: outptr[1] = TABLE[inptr[1]];				\
> +      case 1: outptr[0] = TABLE[inptr[0]];				\
> +      case 0: break;							\
> +      }									\
> +    inptr += length;							\
> +    outptr += length;							\
> +  }
> +
> +
> +/* First define the conversion function from ISO 8859-1 to CP037.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			FROM_LOOP
> +#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
> +
> +#include <iconv/loop.c>
> +
> +
> +/* Next, define the conversion function from CP037 to ISO 8859-1.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define LOOPFCT			TO_LOOP
> +#define BODY			TR_LOOP (table_cp037_iso8859_1);
> +
> +#include <iconv/loop.c>
> +
> +
> +/* Now define the toplevel functions.  */
> +#include <iconv/skeleton.c>
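
As a reading aid for TR_LOOP: stripped of the 256-byte mvc/tr blocks and the
unrolled 8-byte loop, the macro does a plain table lookup over
min (inend - inptr, outend - outptr) bytes. A minimal portable sketch follows.
It is not the code from the patch; translate_bytes and its parameters are
hypothetical names, and the real macro advances inptr/outptr in place instead
of returning a count.

```c
#include <stddef.h>

/* Hypothetical portable stand-in for TR_LOOP: translate as many bytes as
   both buffers allow through a 256-entry table and return that count.
   The caller would advance its input and output pointers by the result.  */
size_t
translate_bytes (const unsigned char table[256],
		 const unsigned char *in, size_t inlen,
		 unsigned char *out, size_t outlen)
{
  /* Same bound as in TR_LOOP: the shorter of the two buffers.  */
  size_t n = inlen < outlen ? inlen : outlen;

  /* One table lookup per byte; the tr instruction does this for up to
     256 bytes at a time.  */
  for (size_t i = 0; i < n; i++)
    out[i] = table[in[i]];

  return n;
}
```

Called with table_iso8859_1_to_cp037, an input byte 0x41 ('A') would come out
as 0xC1, matching the [0x41] = 0xC1 entry quoted above.
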
> diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
> index d1ee59d..0a50514 100644
> --- a/sysdeps/s390/s390-64/Makefile
> +++ b/sysdeps/s390/s390-64/Makefile
> @@ -9,87 +9,3 @@ CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
>   CFLAGS-dl-load.c += -Wno-unused
>   CFLAGS-dl-reloc.c += -Wno-unused
>   endif
> -
> -ifeq ($(subdir),iconvdata)
> -ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
> -ISO-8859-1_CP037_Z900-map := gconv.map
> -
> -UTF8_UTF32_Z9-routines := utf8-utf32-z9
> -UTF8_UTF32_Z9-map := gconv.map
> -
> -UTF16_UTF32_Z9-routines := utf16-utf32-z9
> -UTF16_UTF32_Z9-map := gconv.map
> -
> -UTF8_UTF16_Z9-routines := utf8-utf16-z9
> -UTF8_UTF16_Z9-map := gconv.map
> -
> -s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
> -
> -extra-modules-left += $(s390x-iconv-modules)
> -include extra-module.mk
> -
> -cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
> -lib := iconvdata
> -include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
> -
> -extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
> -install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
> -
> -$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
> -$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
> -	$(do-install-program)
> -
> -$(objpfx)gconv-modules-s390: gconv-modules
> -	${AWK} 'BEGIN { emitted = 0 } \
> -	emitted || NF == 0 || $$1 ~ /^#/ { print; next; } \
> -	!emitted { emit_s390_modules(); emitted = 1; print; } \
> -	function emit_s390_modules() { \
> -	  # Emit header line. \
> -	  print "# S/390 hardware accelerated modules"; \
> -	  print_val("#", 8); \
> -	  print_val("from", 24); \
> -	  print_val("to", 24); \
> -	  print_val("module", 24); \
> -	  printf "cost\n"; \
> -	  # Emit s390-specific modules. \
> -	  modul("ISO-8859-1//", "IBM037//", "ISO-8859-1_CP037_Z900"); \
> -	  modul("IBM037//", "ISO-8859-1//", "ISO-8859-1_CP037_Z900"); \
> -	  modul("ISO-10646/UTF8/", "UTF-32//", "UTF8_UTF32_Z9"); \
> -	  modul("UTF-32BE//", "ISO-10646/UTF8/", "UTF8_UTF32_Z9"); \
> -	  modul("ISO-10646/UTF8/", "UTF-32BE//", "UTF8_UTF32_Z9"); \
> -	  modul("UTF-16BE//", "UTF-32//", "UTF16_UTF32_Z9"); \
> -	  modul("UTF-32BE//", "UTF-16//", "UTF16_UTF32_Z9"); \
> -	  modul("INTERNAL", "UTF-16//", "UTF16_UTF32_Z9"); \
> -	  modul("UTF-32BE//", "UTF-16BE//", "UTF16_UTF32_Z9"); \
> -	  modul("INTERNAL", "UTF-16BE//", "UTF16_UTF32_Z9"); \
> -	  modul("UTF-16BE//", "UTF-32BE//", "UTF16_UTF32_Z9"); \
> -	  modul("UTF-16BE//", "INTERNAL", "UTF16_UTF32_Z9"); \
> -	  modul("UTF-16BE//", "ISO-10646/UTF8/", "UTF8_UTF16_Z9"); \
> -	  modul("ISO-10646/UTF8/", "UTF-16//", "UTF8_UTF16_Z9"); \
> -	  modul("ISO-10646/UTF8/", "UTF-16BE//", "UTF8_UTF16_Z9"); \
> -	  printf "\n# Default glibc modules\n"; \
> -	} \
> -	function modul(from, to, file, cost) { \
> -	  print_val("module", 8); \
> -	  print_val(from, 24); \
> -	  print_val(to, 24); \
> -	  print_val(file, 24); \
> -	  if (cost == 0) cost = 1; \
> -	  printf "%d\n", cost; \
> -	} \
> -	function print_val(val, width) { \
> -	  # Emit value followed by tabs. \
> -	  printf "%s", val; \
> -	  len = length(val); \
> -	  if (len < width) { \
> -	    len = width - len; \
> -	    nr_tabs = len / 8; \
> -	    if (len % 8 != 0) nr_tabs++; \
> -	  } \
> -	  else nr_tabs = 1; \
> -	  for (i = 1; i <= nr_tabs; i++) printf "\t"; \
> -	}' < $< > $@
> -
> -GCONV_MODULES = gconv-modules-s390
> -
> -endif
> diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> deleted file mode 100644
> index 4d79bbf..0000000
> --- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> +++ /dev/null
> @@ -1,256 +0,0 @@
> -/* Conversion between ISO 8859-1 and IBM037.
> -
> -   This module uses the translate instruction.
> -   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> -
> -   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> -   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> -
> -   Thanks to Daniel Appich who covered the relevant performance work
> -   in his diploma thesis.
> -
> -   This is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   This is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -#include <dlfcn.h>
> -#include <stdint.h>
> -
> -// conversion table from ISO-8859-1 to IBM037
> -static const unsigned char table_iso8859_1_to_cp037[256]
> -__attribute__ ((aligned (8))) =
> -{
> -  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
> -  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
> -  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
> -  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
> -  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
> -  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
> -  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
> -  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
> -  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
> -  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
> -  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
> -  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
> -  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
> -  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
> -  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
> -  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
> -  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
> -  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
> -  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
> -  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
> -  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
> -  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
> -  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
> -  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
> -  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
> -  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
> -  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
> -  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
> -  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
> -  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
> -  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
> -  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
> -  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
> -  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
> -  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
> -  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
> -  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
> -  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
> -  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
> -  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
> -  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
> -  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
> -  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
> -  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
> -  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
> -  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
> -  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
> -  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
> -  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
> -  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
> -  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
> -  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
> -  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
> -  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
> -  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
> -  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
> -  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
> -  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
> -  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
> -  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
> -  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
> -  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
> -  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
> -  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
> -};
> -
> -// conversion table from IBM037 to ISO-8859-1
> -static const unsigned char table_cp037_iso8859_1[256]
> -__attribute__ ((aligned (8))) =
> -{
> -  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
> -  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
> -  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
> -  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
> -  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
> -  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
> -  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
> -  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
> -  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
> -  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
> -  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
> -  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
> -  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
> -  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
> -  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
> -  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
> -  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
> -  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
> -  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
> -  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
> -  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
> -  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
> -  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
> -  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
> -  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
> -  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
> -  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
> -  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
> -  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
> -  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
> -  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
> -  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
> -  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
> -  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
> -  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
> -  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
> -  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
> -  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
> -  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
> -  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
> -  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
> -  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
> -  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
> -  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
> -  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
> -  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
> -  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
> -  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
> -  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
> -  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
> -  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
> -  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
> -  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
> -  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
> -  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
> -  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
> -  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
> -  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
> -  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
> -  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
> -  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
> -  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
> -  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
> -  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
> -};
> -
> -/* Definitions used in the body of the `gconv' function.  */
> -#define CHARSET_NAME		"ISO-8859-1//"
> -#define FROM_LOOP		iso8859_1_to_cp037_z900
> -#define TO_LOOP			cp037_to_iso8859_1_z900
> -#define DEFINE_INIT		1
> -#define DEFINE_FINI		1
> -#define MIN_NEEDED_FROM		1
> -#define MIN_NEEDED_TO		1
> -
> -#define TR_LOOP(TABLE)							\
> -  {									\
> -    size_t length = (inend - inptr < outend - outptr			\
> -		     ? inend - inptr : outend - outptr);		\
> -									\
> -    /* Process in 256 byte blocks.  */					\
> -    if (__builtin_expect (length >= 256, 0))				\
> -      {									\
> -	size_t blocks = length / 256;					\
> -	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
> -			     "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
> -			     "la %[R_IN],256(%[R_IN])\n\t"		\
> -			     "la %[R_OUT],256(%[R_OUT])\n\t"		\
> -			     "brctg %[R_LI],0b\n\t"			\
> -			     : /* outputs */ [R_IN] "+a" (inptr)	\
> -			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
> -			     : /* inputs */ [R_TBL] "a" (TABLE)		\
> -			     : /* clobber list */ "memory"		\
> -			     );						\
> -	length = length % 256;						\
> -      }									\
> -									\
> -    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
> -    if (length >= 8)							\
> -      {									\
> -	size_t blocks = length / 8;					\
> -	for (int i = 0; i < blocks; i++)				\
> -	  {								\
> -	    outptr[0] = TABLE[inptr[0]];				\
> -	    outptr[1] = TABLE[inptr[1]];				\
> -	    outptr[2] = TABLE[inptr[2]];				\
> -	    outptr[3] = TABLE[inptr[3]];				\
> -	    outptr[4] = TABLE[inptr[4]];				\
> -	    outptr[5] = TABLE[inptr[5]];				\
> -	    outptr[6] = TABLE[inptr[6]];				\
> -	    outptr[7] = TABLE[inptr[7]];				\
> -	    inptr += 8;							\
> -	    outptr += 8;						\
> -	  }								\
> -	length = length % 8;						\
> -      }									\
> -									\
> -    /* Process remaining 0...7 bytes.  */				\
> -    switch (length)							\
> -      {									\
> -      case 7: outptr[6] = TABLE[inptr[6]];				\
> -      case 6: outptr[5] = TABLE[inptr[5]];				\
> -      case 5: outptr[4] = TABLE[inptr[4]];				\
> -      case 4: outptr[3] = TABLE[inptr[3]];				\
> -      case 3: outptr[2] = TABLE[inptr[2]];				\
> -      case 2: outptr[1] = TABLE[inptr[1]];				\
> -      case 1: outptr[0] = TABLE[inptr[0]];				\
> -      case 0: break;							\
> -      }									\
> -    inptr += length;							\
> -    outptr += length;							\
> -  }
> -
> -
> -/* First define the conversion function from ISO 8859-1 to CP037.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define LOOPFCT			FROM_LOOP
> -#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
> -
> -#include <iconv/loop.c>
> -
> -
> -/* Next, define the conversion function from CP037 to ISO 8859-1.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define LOOPFCT			TO_LOOP
> -#define BODY			TR_LOOP (table_cp037_iso8859_1);
> -
> -#include <iconv/loop.c>
> -
> -
> -/* Now define the toplevel functions.  */
> -#include <iconv/skeleton.c>
> diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
> deleted file mode 100644
> index 4c2c548..0000000
> --- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
> +++ /dev/null
> @@ -1,624 +0,0 @@
> -/* Conversion between UTF-16 and UTF-32 BE/internal.
> -
> -   This module uses the Z9-109 variants of the Convert Unicode
> -   instructions.
> -   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> -
> -   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> -   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> -
> -   Thanks to Daniel Appich who covered the relevant performance work
> -   in his diploma thesis.
> -
> -   This is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   This is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -#include <dlfcn.h>
> -#include <stdint.h>
> -#include <unistd.h>
> -#include <dl-procinfo.h>
> -#include <gconv.h>
> -
> -#if defined HAVE_S390_VX_GCC_SUPPORT
> -# define ASM_CLOBBER_VR(NR) , NR
> -#else
> -# define ASM_CLOBBER_VR(NR)
> -#endif
> -
> -/* UTF-32 big endian byte order mark.  */
> -#define BOM_UTF32               0x0000feffu
> -
> -/* UTF-16 big endian byte order mark.  */
> -#define BOM_UTF16               0xfeff
> -
> -#define DEFINE_INIT		0
> -#define DEFINE_FINI		0
> -#define MIN_NEEDED_FROM		2
> -#define MAX_NEEDED_FROM		4
> -#define MIN_NEEDED_TO		4
> -#define FROM_LOOP		__from_utf16_loop
> -#define TO_LOOP			__to_utf16_loop
> -#define FROM_DIRECTION		(dir == from_utf16)
> -#define ONE_DIRECTION           0
> -
> -/* Direction of the transformation.  */
> -enum direction
> -{
> -  illegal_dir,
> -  to_utf16,
> -  from_utf16
> -};
> -
> -struct utf16_data
> -{
> -  enum direction dir;
> -  int emit_bom;
> -};
> -
> -
> -extern int gconv_init (struct __gconv_step *step);
> -int
> -gconv_init (struct __gconv_step *step)
> -{
> -  /* Determine which direction.  */
> -  struct utf16_data *new_data;
> -  enum direction dir = illegal_dir;
> -  int emit_bom;
> -  int result;
> -
> -  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
> -	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
> -
> -  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
> -      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
> -	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
> -	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
> -    {
> -      dir = from_utf16;
> -    }
> -  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
> -	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
> -	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
> -	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
> -    {
> -      dir = to_utf16;
> -    }
> -
> -  result = __GCONV_NOCONV;
> -  if (dir != illegal_dir)
> -    {
> -      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
> -
> -      result = __GCONV_NOMEM;
> -      if (new_data != NULL)
> -	{
> -	  new_data->dir = dir;
> -	  new_data->emit_bom = emit_bom;
> -	  step->__data = new_data;
> -
> -	  if (dir == from_utf16)
> -	    {
> -	      step->__min_needed_from = MIN_NEEDED_FROM;
> -	      step->__max_needed_from = MIN_NEEDED_FROM;
> -	      step->__min_needed_to = MIN_NEEDED_TO;
> -	      step->__max_needed_to = MIN_NEEDED_TO;
> -	    }
> -	  else
> -	    {
> -	      step->__min_needed_from = MIN_NEEDED_TO;
> -	      step->__max_needed_from = MIN_NEEDED_TO;
> -	      step->__min_needed_to = MIN_NEEDED_FROM;
> -	      step->__max_needed_to = MIN_NEEDED_FROM;
> -	    }
> -
> -	  step->__stateful = 0;
> -
> -	  result = __GCONV_OK;
> -	}
> -    }
> -
> -  return result;
> -}
> -
> -
> -extern void gconv_end (struct __gconv_step *data);
> -void
> -gconv_end (struct __gconv_step *data)
> -{
> -  free (data->__data);
> -}
> -
> -/* The macro for the hardware loop.  This is used for both
> -   directions.  */
> -#define HARDWARE_CONVERT(INSTRUCTION)					\
> -  {									\
> -    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> -    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
> -    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> -    register unsigned long long outlen __asm__("11") = outend - outptr;	\
> -    uint64_t cc = 0;							\
> -									\
> -    __asm__ __volatile__ (".machine push       \n\t"			\
> -			  ".machine \"z9-109\" \n\t"			\
> -			  "0: " INSTRUCTION "  \n\t"			\
> -			  ".machine pop        \n\t"			\
> -			  "   jo     0b        \n\t"			\
> -			  "   ipm    %2        \n"			\
> -			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -			    "+d" (outlen), "+d" (inlen)			\
> -			  :						\
> -			  : "cc", "memory");				\
> -									\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -    cc >>= 28;								\
> -									\
> -    if (cc == 1)							\
> -      {									\
> -	result = __GCONV_FULL_OUTPUT;					\
> -      }									\
> -    else if (cc == 2)							\
> -      {									\
> -	result = __GCONV_ILLEGAL_INPUT;					\
> -      }									\
> -  }
> -
> -#define PREPARE_LOOP							\
> -  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
> -  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
> -									\
> -  if (emit_bom && !data->__internal_use					\
> -      && data->__invocation_counter == 0)				\
> -    {									\
> -      if (dir == to_utf16)						\
> -	{								\
> -	  /* Emit the UTF-16 Byte Order Mark.  */			\
> -	  if (__glibc_unlikely (outbuf + 2 > outend))			\
> -	    return __GCONV_FULL_OUTPUT;					\
> -									\
> -	  put16u (outbuf, BOM_UTF16);					\
> -	  outbuf += 2;							\
> -	}								\
> -      else								\
> -	{								\
> -	  /* Emit the UTF-32 Byte Order Mark.  */			\
> -	  if (__glibc_unlikely (outbuf + 4 > outend))			\
> -	    return __GCONV_FULL_OUTPUT;					\
> -									\
> -	  put32u (outbuf, BOM_UTF32);					\
> -	  outbuf += 4;							\
> -	}								\
> -    }
> -
> -/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
> -
> -/* The software routine is copied from utf-16.c (minus bytes
> -   swapping).  */
> -#define BODY_FROM_C							\
> -  {									\
> -    uint16_t u1 = get16 (inptr);					\
> -									\
> -    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
> -      {									\
> -	/* No surrogate.  */						\
> -	put32 (outptr, u1);						\
> -	inptr += 2;							\
> -      }									\
> -    else								\
> -      {									\
> -	/* An isolated low-surrogate was found.  This has to be         \
> -	   considered ill-formed.  */					\
> -	if (__glibc_unlikely (u1 >= 0xdc00))				\
> -	  {								\
> -	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
> -	  }								\
> -	/* It's a surrogate character.  At least the first word says	\
> -	   it is.  */							\
> -	if (__glibc_unlikely (inptr + 4 > inend))			\
> -	  {								\
> -	    /* We don't have enough input for another complete input	\
> -	       character.  */						\
> -	    result = __GCONV_INCOMPLETE_INPUT;				\
> -	    break;							\
> -	  }								\
> -									\
> -	inptr += 2;							\
> -	uint16_t u2 = get16 (inptr);					\
> -	if (__builtin_expect (u2 < 0xdc00, 0)				\
> -	    || __builtin_expect (u2 > 0xdfff, 0))			\
> -	  {								\
> -	    /* This is no valid second word for a surrogate.  */	\
> -	    inptr -= 2;							\
> -	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
> -	  }								\
> -									\
> -	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
> -	inptr += 2;							\
> -      }									\
> -    outptr += 4;							\
> -  }
> -
> -#define BODY_FROM_VX							\
> -  {									\
> -    size_t inlen = inend - inptr;					\
> -    size_t outlen = outend - outptr;					\
> -    unsigned long tmp, tmp2, tmp3;					\
> -    asm volatile (".machine push\n\t"					\
> -		  ".machine \"z13\"\n\t"				\
> -		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> -		  /* Setup to check for surrogates.  */			\
> -		  "larl %[R_TMP],9f\n\t"				\
> -		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> -		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
> -		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
> -		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
> -		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> -		  /* Check for surrogate chars.  */			\
> -		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
> -		  "jno 10f\n\t"						\
> -		  /* Enlarge to UTF-32.  */				\
> -		  "vuplhh %%v17,%%v16\n\t"				\
> -		  "la %[R_IN],16(%[R_IN])\n\t"				\
> -		  "vupllh %%v18,%%v16\n\t"				\
> -		  "aghi %[R_INLEN],-16\n\t"				\
> -		  /* Store 32 bytes to buf_out.  */			\
> -		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
> -		  "aghi %[R_OUTLEN],-32\n\t"				\
> -		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],16,2f\n\t"				\
> -		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
> -		  "j 1b\n\t"						\
> -		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
> -		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> -		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> -		  /* At least one uint16_t is in range of surrogates.	\
> -		     Store the preceding chars.  */			\
> -		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> -		  "vuplhh %%v17,%%v16\n\t"				\
> -		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> -		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> -		  "jl 12f\n\t"						\
> -		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
> -		  "vupllh %%v18,%%v16\n\t"				\
> -		  "ahi %[R_TMP2],-16\n\t"				\
> -		  "jl 11f\n\t"						\
> -		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
> -		  "11: \n\t" /* Update pointers.  */			\
> -		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> -		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> -		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> -		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
> -		  "12: lghi %[R_TMP2],16\n\t"				\
> -		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
> -		  "srl %[R_TMP2],1\n\t"					\
> -		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
> -		  "aghi %[R_OUTLEN],-4\n\t"				\
> -		  "j 16f\n\t"						\
> -		  /* Handle remaining bytes.  */			\
> -		  "2:\n\t"						\
> -		  /* Zero, one or more bytes available?  */		\
> -		  "clgfi %[R_INLEN],1\n\t"				\
> -		  "je 97f\n\t" /* Only one byte available.  */		\
> -		  "jl 99f\n\t" /* End if no bytes available.  */	\
> -		  /* Calculate remaining uint16_t values in inptr.  */	\
> -		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
> -		  /* Handle remaining uint16_t values.  */		\
> -		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
> -		  "slgfi %[R_OUTLEN],4\n\t"				\
> -		  "jl 96f \n\t"						\
> -		  "clfi %[R_TMP],0xd800\n\t"				\
> -		  "jhe 15f\n\t"						\
> -		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
> -		  "la %[R_IN],2(%[R_IN])\n\t"				\
> -		  "aghi %[R_INLEN],-2\n\t"				\
> -		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> -		  "brctg %[R_TMP2],13b\n\t"				\
> -		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> -		  /* Handle UTF-16 surrogate pair.  */			\
> -		  "15: clfi %[R_TMP],0xdfff\n\t"			\
> -		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
> -		  "16: clfi %[R_TMP],0xdc00\n\t"			\
> -		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
> -		  "slgfi %[R_INLEN],4\n\t"				\
> -		  "jl 97f\n\t" /* Big enough input?  */			\
> -		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
> -		  "slfi %[R_TMP],0xd7c0\n\t"				\
> -		  "sll %[R_TMP],10\n\t"					\
> -		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
> -		  "nilf %[R_TMP3],0xfc00\n\t"				\
> -		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
> -		  "jne 98f\n\t"						\
> -		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
> -		  "la %[R_IN],4(%[R_IN])\n\t"				\
> -		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> -		  "aghi %[R_TMP2],-2\n\t"				\
> -		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
> -		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> -		  "96:\n\t" /* Return full output.  */			\
> -		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
> -		  "j 99f\n\t"						\
> -		  "97:\n\t" /* Return incomplete input.  */		\
> -		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
> -		  "j 99f\n\t"						\
> -		  "98:\n\t" /* Return Illegal character.  */		\
> -		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
> -		  "99:\n\t"						\
> -		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (inptr)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> -		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> -		    , [R_RES] "+d" (result)				\
> -		  : /* inputs */					\
> -		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> -		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> -		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> -		  : /* clobber list */ "memory", "cc"			\
> -		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> -		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> -		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> -		  );							\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result != __GCONV_ILLEGAL_INPUT)				\
> -      break;								\
> -									\
> -    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
> -  }
> -
> -
> -/* Generate loop-function with software routine.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -# define LOOPFCT		__from_utf16_loop_c
> -# define LOOP_NEED_FLAGS
> -# define BODY			BODY_FROM_C
> -# include <iconv/loop.c>
> -
> -/* Generate loop-function with hardware vector instructions.  */
> -# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -# define LOOPFCT		__from_utf16_loop_vx
> -# define LOOP_NEED_FLAGS
> -# define BODY			BODY_FROM_VX
> -# include <iconv/loop.c>
> -
> -/* Generate ifunc'ed loop function.  */
> -__typeof(__from_utf16_loop_c)
> -__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
> -__from_utf16_loop;
> -
> -static void *
> -__from_utf16_loop_resolver (unsigned long int dl_hwcap)
> -{
> -  if (dl_hwcap & HWCAP_S390_VX)
> -    return __from_utf16_loop_vx;
> -  else
> -    return __from_utf16_loop_c;
> -}
> -
> -strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
> -#else
> -# define LOOPFCT		FROM_LOOP
> -# define LOOP_NEED_FLAGS
> -# define BODY			BODY_FROM_C
> -# include <iconv/loop.c>
> -#endif
> -
> -/* Conversion from UTF-32 internal/BE to UTF-16.  */
> -
> -/* The software routine is copied from utf-16.c (minus bytes
> -   swapping).  */
> -#define BODY_TO_C							\
> -  {									\
> -    uint32_t c = get32 (inptr);						\
> -									\
> -    if (__builtin_expect (c <= 0xd7ff, 1)				\
> -	|| (c >=0xdc00 && c <= 0xffff))					\
> -      {									\
> -	/* Two UTF-16 chars.  */					\
> -	put16 (outptr, c);						\
> -      }									\
> -    else if (__builtin_expect (c >= 0x10000, 1)				\
> -	     && __builtin_expect (c <= 0x10ffff, 1))			\
> -      {									\
> -	/* Four UTF-16 chars.  */					\
> -	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
> -	uint16_t out;							\
> -									\
> -	/* Generate a surrogate character.  */				\
> -	if (__glibc_unlikely (outptr + 4 > outend))			\
> -	  {								\
> -	    /* Overflow in the output buffer.  */			\
> -	    result = __GCONV_FULL_OUTPUT;				\
> -	    break;							\
> -	  }								\
> -									\
> -	out = 0xd800;							\
> -	out |= (zabcd & 0xff) << 6;					\
> -	out |= (c >> 10) & 0x3f;					\
> -	put16 (outptr, out);						\
> -	outptr += 2;							\
> -									\
> -	out = 0xdc00;							\
> -	out |= c & 0x3ff;						\
> -	put16 (outptr, out);						\
> -      }									\
> -    else								\
> -      {									\
> -	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
> -      }									\
> -    outptr += 2;							\
> -    inptr += 4;								\
> -  }
> -
> -#define BODY_TO_ETF3EH							\
> -  {									\
> -    HARDWARE_CONVERT ("cu42 %0, %1");					\
> -									\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -									\
> -    if (inptr + 4 > inend)						\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -									\
> -    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> -  }
> -
> -#define BODY_TO_VX							\
> -  {									\
> -    register const unsigned char* pInput asm ("8") = inptr;		\
> -    register size_t inlen asm ("9") = inend - inptr;			\
> -    register unsigned char* pOutput asm ("10") = outptr;		\
> -    register size_t outlen asm("11") = outend - outptr;			\
> -    unsigned long tmp, tmp2, tmp3;					\
> -    asm volatile (".machine push\n\t"					\
> -		  ".machine \"z13\"\n\t"				\
> -		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> -		  /* Setup to check for surrogates.  */			\
> -		  "larl %[R_TMP],9f\n\t"				\
> -		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> -		  /* Loop which handles UTF-16 chars			\
> -		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
> -		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> -		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
> -		  "lghi %[R_TMP2],0\n\t"				\
> -		  /* Shorten to UTF-16.  */				\
> -		  "vpkf %%v18,%%v16,%%v17\n\t"				\
> -		  /* Check for surrogate chars.  */			\
> -		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
> -		  "jno 10f\n\t"						\
> -		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
> -		  "jno 11f\n\t"						\
> -		  /* Store 16 bytes to buf_out.  */			\
> -		  "vst %%v18,0(%[R_OUT])\n\t"				\
> -		  "la %[R_IN],32(%[R_IN])\n\t"				\
> -		  "aghi %[R_INLEN],-32\n\t"				\
> -		  "aghi %[R_OUTLEN],-16\n\t"				\
> -		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],32,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> -		  "j 1b\n\t"						\
> -		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
> -		     and check for ch >= 0x10000. (v30, v31)  */	\
> -		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
> -		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
> -		  /* At least one UTF32 char is in range of surrogates.	\
> -		     Store the preceding characters.  */		\
> -		  "11: ahi %[R_TMP2],16\n\t"				\
> -		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> -		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
> -		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> -		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> -		  "jl 20f\n\t"						\
> -		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> -		  /* Update pointers.  */				\
> -		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> -		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> -		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> -		  /* Handles UTF16 surrogates with convert instruction.  */ \
> -		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
> -		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> -		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> -		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> -		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (pInput)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> -		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> -		    , [R_RES] "+d" (result)				\
> -		  : /* inputs */					\
> -		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> -		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> -		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> -		  : /* clobber list */ "memory", "cc"			\
> -		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> -		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> -		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> -		  );							\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -									\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -    if (inptr + 4 > inend)						\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> -  }
> -
> -/* Generate loop-function with software routine.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			__to_utf16_loop_c
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_TO_C
> -#include <iconv/loop.c>
> -
> -/* Generate loop-function with hardware utf-convert instruction.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			__to_utf16_loop_etf3eh
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_TO_ETF3EH
> -#include <iconv/loop.c>
> -
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -/* Generate loop-function with hardware vector instructions.  */
> -# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -# define LOOPFCT		__to_utf16_loop_vx
> -# define LOOP_NEED_FLAGS
> -# define BODY			BODY_TO_VX
> -# include <iconv/loop.c>
> -#endif
> -
> -/* Generate ifunc'ed loop function.  */
> -__typeof(__to_utf16_loop_c)
> -__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
> -__to_utf16_loop;
> -
> -static void *
> -__to_utf16_loop_resolver (unsigned long int dl_hwcap)
> -{
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -  if (dl_hwcap & HWCAP_S390_VX)
> -    return __to_utf16_loop_vx;
> -  else
> -#endif
> -  if (dl_hwcap & HWCAP_S390_ETF3EH)
> -    return __to_utf16_loop_etf3eh;
> -  else
> -    return __to_utf16_loop_c;
> -}
> -
> -strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
> -
> -
> -#include <iconv/skeleton.c>
> diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
> deleted file mode 100644
> index 76625d0..0000000
> --- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
> +++ /dev/null
> @@ -1,806 +0,0 @@
> -/* Conversion between UTF-8 and UTF-16.
> -
> -   This module uses the Z9-109 variants of the Convert Unicode
> -   instructions.
> -   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> -
> -   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> -   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> -
> -   Thanks to Daniel Appich who covered the relevant performance work
> -   in his diploma thesis.
> -
> -   This is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   This is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -#include <dlfcn.h>
> -#include <stdint.h>
> -#include <unistd.h>
> -#include <dl-procinfo.h>
> -#include <gconv.h>
> -
> -#if defined HAVE_S390_VX_GCC_SUPPORT
> -# define ASM_CLOBBER_VR(NR) , NR
> -#else
> -# define ASM_CLOBBER_VR(NR)
> -#endif
> -
> -/* Defines for skeleton.c.  */
> -#define DEFINE_INIT		0
> -#define DEFINE_FINI		0
> -#define MIN_NEEDED_FROM		1
> -#define MAX_NEEDED_FROM		4
> -#define MIN_NEEDED_TO		2
> -#define MAX_NEEDED_TO		4
> -#define FROM_LOOP		__from_utf8_loop
> -#define TO_LOOP			__to_utf8_loop
> -#define FROM_DIRECTION		(dir == from_utf8)
> -#define ONE_DIRECTION           0
> -
> -
> -/* UTF-16 big endian byte order mark.  */
> -#define BOM_UTF16	0xfeff
> -
> -/* Direction of the transformation.  */
> -enum direction
> -{
> -  illegal_dir,
> -  to_utf8,
> -  from_utf8
> -};
> -
> -struct utf8_data
> -{
> -  enum direction dir;
> -  int emit_bom;
> -};
> -
> -
> -extern int gconv_init (struct __gconv_step *step);
> -int
> -gconv_init (struct __gconv_step *step)
> -{
> -  /* Determine which direction.  */
> -  struct utf8_data *new_data;
> -  enum direction dir = illegal_dir;
> -  int emit_bom;
> -  int result;
> -
> -  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
> -
> -  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
> -      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
> -	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
> -    {
> -      dir = from_utf8;
> -    }
> -  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
> -	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
> -    {
> -      dir = to_utf8;
> -    }
> -
> -  result = __GCONV_NOCONV;
> -  if (dir != illegal_dir)
> -    {
> -      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
> -
> -      result = __GCONV_NOMEM;
> -      if (new_data != NULL)
> -	{
> -	  new_data->dir = dir;
> -	  new_data->emit_bom = emit_bom;
> -	  step->__data = new_data;
> -
> -	  if (dir == from_utf8)
> -	    {
> -	      step->__min_needed_from = MIN_NEEDED_FROM;
> -	      step->__max_needed_from = MIN_NEEDED_FROM;
> -	      step->__min_needed_to = MIN_NEEDED_TO;
> -	      step->__max_needed_to = MIN_NEEDED_TO;
> -	    }
> -	  else
> -	    {
> -	      step->__min_needed_from = MIN_NEEDED_TO;
> -	      step->__max_needed_from = MIN_NEEDED_TO;
> -	      step->__min_needed_to = MIN_NEEDED_FROM;
> -	      step->__max_needed_to = MIN_NEEDED_FROM;
> -	    }
> -
> -	  step->__stateful = 0;
> -
> -	  result = __GCONV_OK;
> -	}
> -    }
> -
> -  return result;
> -}
> -
> -
> -extern void gconv_end (struct __gconv_step *data);
> -void
> -gconv_end (struct __gconv_step *data)
> -{
> -  free (data->__data);
> -}
> -
> -/* The macro for the hardware loop.  This is used for both
> -   directions.  */
> -#define HARDWARE_CONVERT(INSTRUCTION)					\
> -  {									\
> -    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> -    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
> -    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> -    register unsigned long long outlen __asm__("11") = outend - outptr;	\
> -    uint64_t cc = 0;							\
> -									\
> -    __asm__ __volatile__ (".machine push       \n\t"			\
> -			  ".machine \"z9-109\" \n\t"			\
> -			  "0: " INSTRUCTION "  \n\t"			\
> -			  ".machine pop        \n\t"			\
> -			  "   jo     0b        \n\t"			\
> -			  "   ipm    %2        \n"			\
> -			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -			    "+d" (outlen), "+d" (inlen)			\
> -			  :						\
> -			  : "cc", "memory");				\
> -									\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -    cc >>= 28;								\
> -									\
> -    if (cc == 1)							\
> -      {									\
> -	result = __GCONV_FULL_OUTPUT;					\
> -      }									\
> -    else if (cc == 2)							\
> -      {									\
> -	result = __GCONV_ILLEGAL_INPUT;					\
> -      }									\
> -  }
> -
> -#define PREPARE_LOOP							\
> -  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> -  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> -									\
> -  if (emit_bom && !data->__internal_use					\
> -      && data->__invocation_counter == 0)				\
> -    {									\
> -      /* Emit the UTF-16 Byte Order Mark.  */				\
> -      if (__glibc_unlikely (outbuf + 2 > outend))			\
> -	return __GCONV_FULL_OUTPUT;					\
> -									\
> -      put16u (outbuf, BOM_UTF16);					\
> -      outbuf += 2;							\
> -    }
> -
> -/* Conversion function from UTF-8 to UTF-16.  */
> -#define BODY_FROM_HW(ASM)						\
> -  {									\
> -    ASM;								\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -									\
> -    int i;								\
> -    for (i = 1; inptr + i < inend && i < 5; ++i)			\
> -      if ((inptr[i] & 0xc0) != 0x80)					\
> -	break;								\
> -									\
> -    if (__glibc_likely (inptr + i == inend				\
> -			&& result == __GCONV_EMPTY_INPUT))		\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
> -  }
> -
> -#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
> -
> -#define HW_FROM_VX							\
> -  {									\
> -    register const unsigned char* pInput asm ("8") = inptr;		\
> -    register size_t inlen asm ("9") = inend - inptr;			\
> -    register unsigned char* pOutput asm ("10") = outptr;		\
> -    register size_t outlen asm("11") = outend - outptr;			\
> -    unsigned long tmp, tmp2, tmp3;					\
> -    asm volatile (".machine push\n\t"					\
> -		  ".machine \"z13\"\n\t"				\
> -		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> -		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
> -		  "vrepib %%v31,0x20\n\t"				\
> -		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
> -		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
> -		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> -		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
> -		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> -				   UTF8 chars.  */			\
> -		  /* Enlarge to UTF-16.  */				\
> -		  "vuplhb %%v18,%%v16\n\t"				\
> -		  "la %[R_IN],16(%[R_IN])\n\t"				\
> -		  "vupllb %%v19,%%v16\n\t"				\
> -		  "aghi %[R_INLEN],-16\n\t"				\
> -		  /* Store 32 bytes to buf_out.  */			\
> -		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
> -		  "aghi %[R_OUTLEN],-32\n\t"				\
> -		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],16,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
> -		  "j 1b\n\t"						\
> -		  "10:\n\t"						\
> -		  /* At least one byte is > 0x7f.			\
> -		     Store the preceding 1-byte chars.  */		\
> -		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
> -		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
> -						     index to store. */ \
> -		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
> -		  "ahi %[R_TMP2],-1\n\t"				\
> -		  "jl 20f\n\t"						\
> -		  "vuplhb %%v18,%%v16\n\t"				\
> -		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> -		  "ahi %[R_TMP2],-16\n\t"				\
> -		  "jl 11f\n\t"						\
> -		  "vupllb %%v19,%%v16\n\t"				\
> -		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
> -		  "11:\n\t" /* Update pointers.  */			\
> -		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> -		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> -		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> -		  /* Handle multibyte utf8-char with convert instruction. */ \
> -		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
> -		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> -		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> -		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> -		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (pInput)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> -		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> -		    , [R_RES] "+d" (result)				\
> -		  : /* inputs */					\
> -		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> -		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> -		  : /* clobber list */ "memory", "cc"			\
> -		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> -		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> -		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> -		  );							\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -  }
> -#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
> -
> -
> -/* The software implementation is based on the code in gconv_simple.c.  */
> -#define BODY_FROM_C							\
> -  {									\
> -    /* Next input byte.  */						\
> -    uint16_t ch = *inptr;						\
> -									\
> -    if (__glibc_likely (ch < 0x80))					\
> -      {									\
> -	/* One byte sequence.  */					\
> -	++inptr;							\
> -      }									\
> -    else								\
> -      {									\
> -	uint_fast32_t cnt;						\
> -	uint_fast32_t i;						\
> -									\
> -	if (ch >= 0xc2 && ch < 0xe0)					\
> -	  {								\
> -	    /* We expect two bytes.  The first byte cannot be 0xc0	\
> -	       or 0xc1, otherwise the wide character could have been	\
> -	       represented using a single byte.  */			\
> -	    cnt = 2;							\
> -	    ch &= 0x1f;							\
> -	  }								\
> -	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
> -	  {								\
> -	    /* We expect three bytes.  */				\
> -	    cnt = 3;							\
> -	    ch &= 0x0f;							\
> -	  }								\
> -	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
> -	  {								\
> -	    /* We expect four bytes.  */				\
> -	    cnt = 4;							\
> -	    ch &= 0x07;							\
> -	  }								\
> -	else								\
> -	  {								\
> -	    /* Search the end of this ill-formed UTF-8 character.  This	\
> -	       is the next byte with (x & 0xc0) != 0x80.  */		\
> -	    i = 0;							\
> -	    do								\
> -	      ++i;							\
> -	    while (inptr + i < inend					\
> -		   && (*(inptr + i) & 0xc0) == 0x80			\
> -		   && i < 5);						\
> -									\
> -	  errout:							\
> -	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
> -	  }								\
> -									\
> -	if (__glibc_unlikely (inptr + cnt > inend))			\
> -	  {								\
> -	    /* We don't have enough input.  But before we report	\
> -	       that check that all the bytes are correct.  */		\
> -	    for (i = 1; inptr + i < inend; ++i)				\
> -	      if ((inptr[i] & 0xc0) != 0x80)				\
> -		break;							\
> -									\
> -	    if (__glibc_likely (inptr + i == inend))			\
> -	      {								\
> -		result = __GCONV_INCOMPLETE_INPUT;			\
> -		break;							\
> -	      }								\
> -									\
> -	    goto errout;						\
> -	  }								\
> -									\
> -	if (cnt == 4)							\
> -	  {								\
> -	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
> -	       low) are needed.  */					\
> -	    uint16_t zabcd, high, low;					\
> -									\
> -	    if (__glibc_unlikely (outptr + 4 > outend))			\
> -	      {								\
> -		/* Overflow in the output buffer.  */			\
> -		result = __GCONV_FULL_OUTPUT;				\
> -		break;							\
> -	      }								\
> -									\
> -	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
> -	    for (i = 1; i < cnt; ++i)					\
> -	      {								\
> -		if ((inptr[i] & 0xc0) != 0x80)				\
> -		  /* This is an illegal encoding.  */			\
> -		  goto errout;						\
> -	      }								\
> -									\
> -	    /* See Principles of Operations cu12.  */			\
> -	    zabcd = (((inptr[0] & 0x7) << 2) |				\
> -		     ((inptr[1] & 0x30) >> 4)) - 1;			\
> -									\
> -	    /* z-bit must be zero after subtracting 1.  */		\
> -	    if (zabcd & 0x10)						\
> -	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
> -									\
> -	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
> -	    high |= zabcd << 6;                         /* abcd bits */	\
> -	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
> -	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
> -									\
> -	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
> -	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
> -	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
> -	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
> -									\
> -	    put16 (outptr, high);					\
> -	    outptr += 2;						\
> -	    put16 (outptr, low);					\
> -	    outptr += 2;						\
> -	    inptr += 4;							\
> -	    continue;							\
> -	  }								\
> -	else								\
> -	  {								\
> -	    /* Read the possible remaining bytes.  */			\
> -	    for (i = 1; i < cnt; ++i)					\
> -	      {								\
> -		uint16_t byte = inptr[i];				\
> -									\
> -		if ((byte & 0xc0) != 0x80)				\
> -		  /* This is an illegal encoding.  */			\
> -		  break;						\
> -									\
> -		ch <<= 6;						\
> -		ch |= byte & 0x3f;					\
> -	      }								\
> -									\
> -	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
> -	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
> -	       have been represented with fewer than cnt bytes.  */	\
> -	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
> -		/* Do not accept UTF-16 surrogates.  */			\
> -		|| (ch >= 0xd800 && ch <= 0xdfff))			\
> -	      {								\
> -		/* This is an illegal encoding.  */			\
> -		goto errout;						\
> -	      }								\
> -									\
> -	    inptr += cnt;						\
> -	  }								\
> -      }									\
> -    /* Now adjust the pointers and store the result.  */		\
> -    *((uint16_t *) outptr) = ch;					\
> -    outptr += sizeof (uint16_t);					\
> -  }
> -
> -/* Generate loop-function with software implementation.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> -#define LOOPFCT			__from_utf8_loop_c
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_FROM_C
> -#include <iconv/loop.c>
> -
> -/* Generate loop-function with hardware utf-convert instruction.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> -#define LOOPFCT			__from_utf8_loop_etf3eh
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_FROM_ETF3EH
> -#include <iconv/loop.c>
> -
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -/* Generate loop-function with hardware vector and utf-convert instructions.  */
> -# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> -# define LOOPFCT		__from_utf8_loop_vx
> -# define LOOP_NEED_FLAGS
> -# define BODY			BODY_FROM_VX
> -# include <iconv/loop.c>
> -#endif
> -
> -
> -/* Generate ifunc'ed loop function.  */
> -__typeof(__from_utf8_loop_c)
> -__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
> -__from_utf8_loop;
> -
> -static void *
> -__from_utf8_loop_resolver (unsigned long int dl_hwcap)
> -{
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -  if (dl_hwcap & HWCAP_S390_VX)
> -    return __from_utf8_loop_vx;
> -  else
> -#endif
> -  if (dl_hwcap & HWCAP_S390_ETF3EH)
> -    return __from_utf8_loop_etf3eh;
> -  else
> -    return __from_utf8_loop_c;
> -}
> -
> -strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
> -
> -/* Conversion from UTF-16 to UTF-8.  */
> -
> -/* The software routine is based on the functionality of the S/390
> -   hardware instruction (cu21) as described in the Principles of
> -   Operation.  */
> -#define BODY_TO_C							\
> -  {									\
> -    uint16_t c = get16 (inptr);						\
> -									\
> -    if (__glibc_likely (c <= 0x007f))					\
> -      {									\
> -	/* Single byte UTF-8 char.  */					\
> -	*outptr = c & 0xff;						\
> -	outptr++;							\
> -      }									\
> -    else if (c >= 0x0080 && c <= 0x07ff)				\
> -      {									\
> -	/* Two byte UTF-8 char.  */					\
> -									\
> -	if (__glibc_unlikely (outptr + 2 > outend))			\
> -	  {								\
> -	    /* Overflow in the output buffer.  */			\
> -	    result = __GCONV_FULL_OUTPUT;				\
> -	    break;							\
> -	  }								\
> -									\
> -	outptr[0] = 0xc0;						\
> -	outptr[0] |= c >> 6;						\
> -									\
> -	outptr[1] = 0x80;						\
> -	outptr[1] |= c & 0x3f;						\
> -									\
> -	outptr += 2;							\
> -      }									\
> -    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
> -      {									\
> -	/* Three byte UTF-8 char.  */					\
> -									\
> -	if (__glibc_unlikely (outptr + 3 > outend))			\
> -	  {								\
> -	    /* Overflow in the output buffer.  */			\
> -	    result = __GCONV_FULL_OUTPUT;				\
> -	    break;							\
> -	  }								\
> -	outptr[0] = 0xe0;						\
> -	outptr[0] |= c >> 12;						\
> -									\
> -	outptr[1] = 0x80;						\
> -	outptr[1] |= (c >> 6) & 0x3f;					\
> -									\
> -	outptr[2] = 0x80;						\
> -	outptr[2] |= c & 0x3f;						\
> -									\
> -	outptr += 3;							\
> -      }									\
> -    else if (c >= 0xd800 && c <= 0xdbff)				\
> -      {									\
> -	/* Four byte UTF-8 char.  */					\
> -	uint16_t low, uvwxy;						\
> -									\
> -	if (__glibc_unlikely (outptr + 4 > outend))			\
> -	  {								\
> -	    /* Overflow in the output buffer.  */			\
> -	    result = __GCONV_FULL_OUTPUT;				\
> -	    break;							\
> -	  }								\
> -	if (__glibc_unlikely (inptr + 4 > inend))			\
> -	  {								\
> -	    result = __GCONV_INCOMPLETE_INPUT;				\
> -	    break;							\
> -	  }								\
> -									\
> -	inptr += 2;							\
> -	low = get16 (inptr);						\
> -									\
> -	if ((low & 0xfc00) != 0xdc00)					\
> -	  {								\
> -	    inptr -= 2;							\
> -	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
> -	  }								\
> -	uvwxy = ((c >> 6) & 0xf) + 1;					\
> -	outptr[0] = 0xf0;						\
> -	outptr[0] |= uvwxy >> 2;					\
> -									\
> -	outptr[1] = 0x80;						\
> -	outptr[1] |= (uvwxy << 4) & 0x30;				\
> -	outptr[1] |= (c >> 2) & 0x0f;					\
> -									\
> -	outptr[2] = 0x80;						\
> -	outptr[2] |= (c & 0x03) << 4;					\
> -	outptr[2] |= (low >> 6) & 0x0f;					\
> -									\
> -	outptr[3] = 0x80;						\
> -	outptr[3] |= low & 0x3f;					\
> -									\
> -	outptr += 4;							\
> -      }									\
> -    else								\
> -      {									\
> -	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
> -      }									\
> -    inptr += 2;								\
> -  }
> -
> -#define BODY_TO_VX							\
> -  {									\
> -    size_t inlen  = inend - inptr;					\
> -    size_t outlen  = outend - outptr;					\
> -    unsigned long tmp, tmp2, tmp3;					\
> -    asm volatile (".machine push\n\t"					\
> -		  ".machine \"z13\"\n\t"				\
> -		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> -		  /* Setup to check for values <= 0x7f.  */		\
> -		  "larl %[R_TMP],9f\n\t"				\
> -		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> -		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
> -		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
> -		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
> -		  "lghi %[R_TMP2],0\n\t"				\
> -		  /* Check for > 1byte UTF-8 chars.  */			\
> -		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
> -		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> -				   UTF8 chars.  */			\
> -		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
> -		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
> -				   UTF8 chars.  */			\
> -		  /* Shorten to UTF-8.  */				\
> -		  "vpkh %%v18,%%v16,%%v17\n\t"				\
> -		  "la %[R_IN],32(%[R_IN])\n\t"				\
> -		  "aghi %[R_INLEN],-32\n\t"				\
> -		  /* Store 16 bytes to buf_out.  */			\
> -		  "vst %%v18,0(%[R_OUT])\n\t"				\
> -		  "aghi %[R_OUTLEN],-16\n\t"				\
> -		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],32,2f\n\t"				\
> -		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
> -		  "j 1b\n\t"						\
> -		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
> -		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> -		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> -		  /* At least one byte is > 0x7f.			\
> -		     Store the preceding 1-byte chars.  */		\
> -		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
> -		  "10:\n\t"						\
> -		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
> -		  /* Shorten to UTF-8.  */				\
> -		  "vpkh %%v18,%%v16,%%v17\n\t"				\
> -		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
> -		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> -		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> -		  "jl 13f\n\t"						\
> -		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> -		  /* Update pointers.  */				\
> -		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> -		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> -		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> -		  "13:\n\t"						\
> -		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
> -		  "lghi %[R_TMP2],16\n\t"				\
> -		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
> -		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
> -		  "aghi %[R_INLEN],-2\n\t"				\
> -		  "j 22f\n\t"						\
> -		  /* Handle remaining bytes.  */			\
> -		  "2:\n\t"						\
> -		  /* Zero, one or more bytes available?  */		\
> -		  "clgfi %[R_INLEN],1\n\t"				\
> -		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
> -		  "jle 99f\n\t" /* End if less than two bytes.  */	\
> -		  /* Calculate remaining uint16_t values in inptr.  */	\
> -		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
> -		  /* Handle multibyte utf8-char. */			\
> -		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
> -		  "aghi %[R_INLEN],-2\n\t"				\
> -		  /* Test if ch is 1-byte UTF-8 char.  */		\
> -		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
> -		  /* Handle 1-byte UTF-8 char.  */			\
> -		  "31: slgfi %[R_OUTLEN],1\n\t"				\
> -		  "jl 90f \n\t"						\
> -		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
> -		  "la %[R_IN],2(%[R_IN])\n\t"				\
> -		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
> -		  "brctg %[R_TMP2],20b\n\t"				\
> -		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> -		  /* Test if ch is 2-byte UTF-8 char.  */		\
> -		  "22: clfi %[R_TMP],0x7ff\n\t"				\
> -		  "jh 23f\n\t"						\
> -		  /* Handle 2-byte UTF-8 char.  */			\
> -		  "32: slgfi %[R_OUTLEN],2\n\t"				\
> -		  "jl 90f \n\t"						\
> -		  "llill %[R_TMP3],0xc080\n\t"				\
> -		  "la %[R_IN],2(%[R_IN])\n\t"				\
> -		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
> -		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
> -		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
> -		  "brctg %[R_TMP2],20b\n\t"				\
> -		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> -		  /* Test if ch is 3-byte UTF-8 char.  */		\
> -		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
> -		  "jh 24f\n\t"						\
> -		  /* Handle 3-byte UTF-8 char.  */			\
> -		  "33: slgfi %[R_OUTLEN],3\n\t"				\
> -		  "jl 90f \n\t"						\
> -		  "llilf %[R_TMP3],0xe08080\n\t"			\
> -		  "la %[R_IN],2(%[R_IN])\n\t"				\
> -		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
> -		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
> -		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
> -		  "brctg %[R_TMP2],20b\n\t"				\
> -		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> -		  /* Test if ch is 4-byte UTF-8 char.  */		\
> -		  "24: clfi %[R_TMP],0xdfff\n\t"			\
> -		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
> -		  "clfi %[R_TMP],0xdbff\n\t"				\
> -		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
> -		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
> -				  without a preceding high surrogate.  */ \
> -		  /* Handle 4-byte UTF-8 char.  */			\
> -		  "34: slgfi %[R_OUTLEN],4\n\t"				\
> -		  "jl 90f \n\t"						\
> -		  "slgfi %[R_INLEN],2\n\t"				\
> -		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
> -		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
> -		  "llilf %[R_TMP3],0xf0808080\n\t"			\
> -		  "aghi %[R_TMP],0x40\n\t"				\
> -		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
> -		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
> -		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
> -		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
> -		  "nilf %[R_TMP],0xfc00\n\t"				\
> -		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
> -		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
> -		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
> -		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
> -		  "la %[R_IN],4(%[R_IN])\n\t"				\
> -		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> -		  "aghi %[R_TMP2],-2\n\t"				\
> -		  "jh 20b\n\t"						\
> -		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> -		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
> -		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
> -		  "99:\n\t"						\
> -		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (inptr)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> -		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> -		    , [R_RES] "+d" (result)				\
> -		  : /* inputs */					\
> -		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> -		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> -		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> -		  : /* clobber list */ "memory", "cc"			\
> -		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> -		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> -		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> -		  );							\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result != __GCONV_ILLEGAL_INPUT)				\
> -      break;								\
> -									\
> -    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
> -  }
> -
> -/* Generate loop-function with software implementation.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -# define LOOPFCT		__to_utf8_loop_c
> -# define BODY                   BODY_TO_C
> -# define LOOP_NEED_FLAGS
> -# include <iconv/loop.c>
> -
> -/* Generate loop-function with software implementation.  */
> -# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> -# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -# define LOOPFCT		__to_utf8_loop_vx
> -# define BODY                   BODY_TO_VX
> -# define LOOP_NEED_FLAGS
> -# include <iconv/loop.c>
> -
> -/* Generate ifunc'ed loop function.  */
> -__typeof(__to_utf8_loop_c)
> -__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
> -__to_utf8_loop;
> -
> -static void *
> -__to_utf8_loop_resolver (unsigned long int dl_hwcap)
> -{
> -  if (dl_hwcap & HWCAP_S390_VX)
> -    return __to_utf8_loop_vx;
> -  else
> -    return __to_utf8_loop_c;
> -}
> -
> -strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
> -
> -#else
> -# define LOOPFCT		TO_LOOP
> -# define BODY                   BODY_TO_C
> -# define LOOP_NEED_FLAGS
> -# include <iconv/loop.c>
> -#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
> -
> -#include <iconv/skeleton.c>
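
For readers following the removed BODY_TO_C above: the four-byte case is
the least obvious part, so here is a minimal standalone sketch of the same
bit layout (uvwxy = abcd + 1, as in the cu21 description). The function
name utf16_pair_to_utf8 and its return convention are hypothetical and not
part of glibc; this only illustrates the arithmetic, not the loop or the
error-handler macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not a glibc function: encode one UTF-16
   surrogate pair as four UTF-8 bytes, using the same bit layout as
   the removed BODY_TO_C.  Returns 4 on success, -1 if HIGH or LOW is
   not a valid high/low surrogate.  */
static int
utf16_pair_to_utf8 (uint16_t high, uint16_t low, unsigned char out[4])
{
  assert (out != NULL);

  /* High surrogates are 0xd800..0xdbff, low ones 0xdc00..0xdfff.  */
  if ((high & 0xfc00) != 0xd800 || (low & 0xfc00) != 0xdc00)
    return -1;

  /* The four abcd bits of the high surrogate plus one give the
     five uvwxy bits of the code point.  */
  uint16_t uvwxy = ((high >> 6) & 0xf) + 1;

  out[0] = 0xf0 | (uvwxy >> 2);                 /* 11110uvw */
  out[1] = 0x80 | ((uvwxy << 4) & 0x30)         /* 10xy....  */
		| ((high >> 2) & 0x0f);         /* ....efgh  */
  out[2] = 0x80 | ((high & 0x03) << 4)          /* 10ij....  */
		| ((low >> 6) & 0x0f);          /* ....klmn  */
  out[3] = 0x80 | (low & 0x3f);                 /* 10opqrst  */
  return 4;
}
```

Calling it with the pair 0xd83d, 0xde00 (U+1F600) should fill out with
f0 9f 98 80, and a lone or swapped surrogate is rejected, which mirrors
the STANDARD_TO_LOOP_ERR_HANDLER paths in the macro.
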
> diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
> deleted file mode 100644
> index e89dc70..0000000
> --- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
> +++ /dev/null
> @@ -1,807 +0,0 @@
> -/* Conversion between UTF-8 and UTF-32 BE/internal.
> -
> -   This module uses the Z9-109 variants of the Convert Unicode
> -   instructions.
> -   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> -
> -   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> -   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> -
> -   Thanks to Daniel Appich who covered the relevant performance work
> -   in his diploma thesis.
> -
> -   This is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   This is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -#include <dlfcn.h>
> -#include <stdint.h>
> -#include <unistd.h>
> -#include <dl-procinfo.h>
> -#include <gconv.h>
> -
> -#if defined HAVE_S390_VX_GCC_SUPPORT
> -# define ASM_CLOBBER_VR(NR) , NR
> -#else
> -# define ASM_CLOBBER_VR(NR)
> -#endif
> -
> -/* Defines for skeleton.c.  */
> -#define DEFINE_INIT		0
> -#define DEFINE_FINI		0
> -#define MIN_NEEDED_FROM		1
> -#define MAX_NEEDED_FROM		6
> -#define MIN_NEEDED_TO		4
> -#define FROM_LOOP		__from_utf8_loop
> -#define TO_LOOP			__to_utf8_loop
> -#define FROM_DIRECTION		(dir == from_utf8)
> -#define ONE_DIRECTION           0
> -
> -/* UTF-32 big endian byte order mark.  */
> -#define BOM			0x0000feffu
> -
> -/* Direction of the transformation.  */
> -enum direction
> -{
> -  illegal_dir,
> -  to_utf8,
> -  from_utf8
> -};
> -
> -struct utf8_data
> -{
> -  enum direction dir;
> -  int emit_bom;
> -};
> -
> -
> -extern int gconv_init (struct __gconv_step *step);
> -int
> -gconv_init (struct __gconv_step *step)
> -{
> -  /* Determine which direction.  */
> -  struct utf8_data *new_data;
> -  enum direction dir = illegal_dir;
> -  int emit_bom;
> -  int result;
> -
> -  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
> -
> -  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
> -      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
> -	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
> -	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
> -    {
> -      dir = from_utf8;
> -    }
> -  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
> -	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
> -	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
> -    {
> -      dir = to_utf8;
> -    }
> -
> -  result = __GCONV_NOCONV;
> -  if (dir != illegal_dir)
> -    {
> -      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
> -
> -      result = __GCONV_NOMEM;
> -      if (new_data != NULL)
> -	{
> -	  new_data->dir = dir;
> -	  new_data->emit_bom = emit_bom;
> -	  step->__data = new_data;
> -
> -	  if (dir == from_utf8)
> -	    {
> -	      step->__min_needed_from = MIN_NEEDED_FROM;
> -	      step->__max_needed_from = MIN_NEEDED_FROM;
> -	      step->__min_needed_to = MIN_NEEDED_TO;
> -	      step->__max_needed_to = MIN_NEEDED_TO;
> -	    }
> -	  else
> -	    {
> -	      step->__min_needed_from = MIN_NEEDED_TO;
> -	      step->__max_needed_from = MIN_NEEDED_TO;
> -	      step->__min_needed_to = MIN_NEEDED_FROM;
> -	      step->__max_needed_to = MIN_NEEDED_FROM;
> -	    }
> -
> -	  step->__stateful = 0;
> -
> -	  result = __GCONV_OK;
> -	}
> -    }
> -
> -  return result;
> -}
> -
> -
> -extern void gconv_end (struct __gconv_step *data);
> -void
> -gconv_end (struct __gconv_step *data)
> -{
> -  free (data->__data);
> -}
> -
> -/* The macro for the hardware loop.  This is used for both
> -   directions.  */
> -#define HARDWARE_CONVERT(INSTRUCTION)					\
> -  {									\
> -    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> -    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
> -    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> -    register unsigned long long outlen __asm__("11") = outend - outptr;	\
> -    uint64_t cc = 0;							\
> -									\
> -    __asm__ __volatile__ (".machine push       \n\t"			\
> -			  ".machine \"z9-109\" \n\t"			\
> -			  "0: " INSTRUCTION "  \n\t"			\
> -			  ".machine pop        \n\t"			\
> -			  "   jo     0b        \n\t"			\
> -			  "   ipm    %2        \n"			\
> -			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -			    "+d" (outlen), "+d" (inlen)			\
> -			  :						\
> -			  : "cc", "memory");				\
> -									\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -    cc >>= 28;								\
> -									\
> -    if (cc == 1)							\
> -      {									\
> -	result = __GCONV_FULL_OUTPUT;					\
> -      }									\
> -    else if (cc == 2)							\
> -      {									\
> -	result = __GCONV_ILLEGAL_INPUT;					\
> -      }									\
> -  }
> -
> -#define PREPARE_LOOP							\
> -  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> -  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> -									\
> -  if (emit_bom && !data->__internal_use					\
> -      && data->__invocation_counter == 0)				\
> -    {									\
> -      /* Emit the Byte Order Mark.  */					\
> -      if (__glibc_unlikely (outbuf + 4 > outend))			\
> -	return __GCONV_FULL_OUTPUT;					\
> -									\
> -      put32u (outbuf, BOM);						\
> -      outbuf += 4;							\
> -    }
> -
> -/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
> -
> -#define STORE_REST_COMMON						      \
> -  {									      \
> -    /* We store the remaining bytes while converting them into the UCS4	      \
> -       format.  We can assume that the first byte in the buffer is	      \
> -       correct and that it requires a larger number of bytes than there	      \
> -       are in the input buffer.  */					      \
> -    wint_t ch = **inptrp;						      \
> -    size_t cnt, r;							      \
> -									      \
> -    state->__count = inend - *inptrp;					      \
> -									      \
> -    assert (ch != 0xc0 && ch != 0xc1);					      \
> -    if (ch >= 0xc2 && ch < 0xe0)					      \
> -      {									      \
> -	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
> -	   0xc1, otherwise the wide character could have been		      \
> -	   represented using a single byte.  */				      \
> -	cnt = 2;							      \
> -	ch &= 0x1f;							      \
> -      }									      \
> -    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
> -      {									      \
> -	/* We expect three bytes.  */					      \
> -	cnt = 3;							      \
> -	ch &= 0x0f;							      \
> -      }									      \
> -    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
> -      {									      \
> -	/* We expect four bytes.  */					      \
> -	cnt = 4;							      \
> -	ch &= 0x07;							      \
> -      }									      \
> -    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
> -      {									      \
> -	/* We expect five bytes.  */					      \
> -	cnt = 5;							      \
> -	ch &= 0x03;							      \
> -      }									      \
> -    else								      \
> -      {									      \
> -	/* We expect six bytes.  */					      \
> -	cnt = 6;							      \
> -	ch &= 0x01;							      \
> -      }									      \
> -									      \
> -    /* The first byte is already consumed.  */				      \
> -    r = cnt - 1;							      \
> -    while (++(*inptrp) < inend)						      \
> -      {									      \
> -	ch <<= 6;							      \
> -	ch |= **inptrp & 0x3f;						      \
> -	--r;								      \
> -      }									      \
> -									      \
> -    /* Shift for the so far missing bytes.  */				      \
> -    ch <<= r * 6;							      \
> -									      \
> -    /* Store the number of bytes expected for the entire sequence.  */	      \
> -    state->__count |= cnt << 8;						      \
> -									      \
> -    /* Store the value.  */						      \
> -    state->__value.__wch = ch;						      \
> -  }
> -
> -#define UNPACK_BYTES_COMMON \
> -  {									      \
> -    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
> -    wint_t wch = state->__value.__wch;					      \
> -    size_t ntotal = state->__count >> 8;				      \
> -									      \
> -    inlen = state->__count & 255;					      \
> -									      \
> -    bytebuf[0] = inmask[ntotal - 2];					      \
> -									      \
> -    do									      \
> -      {									      \
> -	if (--ntotal < inlen)						      \
> -	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
> -	wch >>= 6;							      \
> -      }									      \
> -    while (ntotal > 1);							      \
> -									      \
> -    bytebuf[0] |= wch;							      \
> -  }
> -
> -#define CLEAR_STATE_COMMON \
> -  state->__count = 0
> -
> -#define BODY_FROM_HW(ASM)						\
> -  {									\
> -    ASM;								\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -									\
> -    int i;								\
> -    for (i = 1; inptr + i < inend && i < 5; ++i)			\
> -      if ((inptr[i] & 0xc0) != 0x80)					\
> -	break;								\
> -									\
> -    if (__glibc_likely (inptr + i == inend				\
> -			&& result == __GCONV_EMPTY_INPUT))		\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
> -  }
> -
> -/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
> -#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
> -
> -
> -/* The software routine is copied from gconv_simple.c.  */
> -#define BODY_FROM_C							\
> -  {									\
> -    /* Next input byte.  */						\
> -    uint32_t ch = *inptr;						\
> -									\
> -    if (__glibc_likely (ch < 0x80))					\
> -      {									\
> -	/* One byte sequence.  */					\
> -	++inptr;							\
> -      }									\
> -    else								\
> -      {									\
> -	uint_fast32_t cnt;						\
> -	uint_fast32_t i;						\
> -									\
> -	if (ch >= 0xc2 && ch < 0xe0)					\
> -	  {								\
> -	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
> -	       0xc1, otherwise the wide character could have been	\
> -	       represented using a single byte.  */			\
> -	    cnt = 2;							\
> -	    ch &= 0x1f;							\
> -	  }								\
> -	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
> -	  {								\
> -	    /* We expect three bytes.  */				\
> -	    cnt = 3;							\
> -	    ch &= 0x0f;							\
> -	  }								\
> -	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
> -	  {								\
> -	    /* We expect four bytes.  */				\
> -	    cnt = 4;							\
> -	    ch &= 0x07;							\
> -	  }								\
> -	else								\
> -	  {								\
> -	    /* Search the end of this ill-formed UTF-8 character.  This	\
> -	       is the next byte with (x & 0xc0) != 0x80.  */		\
> -	    i = 0;							\
> -	    do								\
> -	      ++i;							\
> -	    while (inptr + i < inend					\
> -		   && (*(inptr + i) & 0xc0) == 0x80			\
> -		   && i < 5);						\
> -									\
> -	  errout:							\
> -	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
> -	  }								\
> -									\
> -	if (__glibc_unlikely (inptr + cnt > inend))			\
> -	  {								\
> -	    /* We don't have enough input.  But before we report	\
> -	       that check that all the bytes are correct.  */		\
> -	    for (i = 1; inptr + i < inend; ++i)				\
> -	      if ((inptr[i] & 0xc0) != 0x80)				\
> -		break;							\
> -									\
> -	    if (__glibc_likely (inptr + i == inend))			\
> -	      {								\
> -		result = __GCONV_INCOMPLETE_INPUT;			\
> -		break;							\
> -	      }								\
> -									\
> -	    goto errout;						\
> -	  }								\
> -									\
> -	/* Read the possible remaining bytes.  */			\
> -	for (i = 1; i < cnt; ++i)					\
> -	  {								\
> -	    uint32_t byte = inptr[i];					\
> -									\
> -	    if ((byte & 0xc0) != 0x80)					\
> -	      /* This is an illegal encoding.  */			\
> -	      break;							\
> -									\
> -	    ch <<= 6;							\
> -	    ch |= byte & 0x3f;						\
> -	  }								\
> -									\
> -	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
> -	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
> -	   have been represented with fewer than cnt bytes.  */		\
> -	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
> -	    /* Do not accept UTF-16 surrogates.  */			\
> -	    || (ch >= 0xd800 && ch <= 0xdfff)				\
> -	    || (ch > 0x10ffff))						\
> -	  {								\
> -	    /* This is an illegal encoding.  */				\
> -	    goto errout;						\
> -	  }								\
> -									\
> -	inptr += cnt;							\
> -      }									\
> -									\
> -    /* Now adjust the pointers and store the result.  */		\
> -    *((uint32_t *) outptr) = ch;					\
> -    outptr += sizeof (uint32_t);					\
> -  }
> -
> -#define HW_FROM_VX							\
> -  {									\
> -    register const unsigned char* pInput asm ("8") = inptr;		\
> -    register size_t inlen asm ("9") = inend - inptr;			\
> -    register unsigned char* pOutput asm ("10") = outptr;		\
> -    register size_t outlen asm("11") = outend - outptr;			\
> -    unsigned long tmp, tmp2, tmp3;					\
> -    asm volatile (".machine push\n\t"					\
> -		  ".machine \"z13\"\n\t"				\
> -		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> -		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
> -		  "vrepib %%v31,0x20\n\t"				\
> -		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
> -		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
> -		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> -		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
> -		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> -				   UTF8 chars.  */			\
> -		  /* Enlarge to UCS4.  */				\
> -		  "vuplhb %%v18,%%v16\n\t"				\
> -		  "vupllb %%v19,%%v16\n\t"				\
> -		  "la %[R_IN],16(%[R_IN])\n\t"				\
> -		  "vuplhh %%v20,%%v18\n\t"				\
> -		  "aghi %[R_INLEN],-16\n\t"				\
> -		  "vupllh %%v21,%%v18\n\t"				\
> -		  "aghi %[R_OUTLEN],-64\n\t"				\
> -		  "vuplhh %%v22,%%v19\n\t"				\
> -		  "vupllh %%v23,%%v19\n\t"				\
> -		  /* Store 64 bytes to buf_out.  */			\
> -		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
> -		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],16,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
> -		  "j 1b\n\t"						\
> -		  "10:\n\t"						\
> -		  /* At least one byte is > 0x7f.			\
> -		     Store the preceding 1-byte chars.  */		\
> -		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
> -		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
> -						     index to store. */ \
> -		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
> -		  "ahi %[R_TMP2],-1\n\t"				\
> -		  "jl 20f\n\t"						\
> -		  "vuplhb %%v18,%%v16\n\t"				\
> -		  "vuplhh %%v20,%%v18\n\t"				\
> -		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
> -		  "ahi %[R_TMP2],-16\n\t"				\
> -		  "jl 11f\n\t"						\
> -		  "vupllh %%v21,%%v18\n\t"				\
> -		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
> -		  "ahi %[R_TMP2],-16\n\t"				\
> -		  "jl 11f\n\t"						\
> -		  "vupllb %%v19,%%v16\n\t"				\
> -		  "vuplhh %%v22,%%v19\n\t"				\
> -		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
> -		  "ahi %[R_TMP2],-16\n\t"				\
> -		  "jl 11f\n\t"						\
> -		  "vupllh %%v23,%%v19\n\t"				\
> -		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
> -		  "11:\n\t"						\
> -		  /* Update pointers.  */				\
> -		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> -		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> -		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> -		  /* Handle multibyte utf8-char with convert instruction. */ \
> -		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
> -		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
> -		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> -		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> -		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (pInput)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> -		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> -		    , [R_RES] "+d" (result)				\
> -		  : /* inputs */					\
> -		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> -		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> -		  : /* clobber list */ "memory", "cc"			\
> -		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> -		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> -		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> -		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
> -		    ASM_CLOBBER_VR ("v31")				\
> -		  );							\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -  }
> -#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
> -
> -/* These definitions apply to the UTF-8 to UTF-32 direction.  The
> -   software implementation for UTF-8 still supports multibyte
> -   characters up to 6 bytes whereas the hardware variant does not.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define LOOPFCT			__from_utf8_loop_c
> -
> -#define LOOP_NEED_FLAGS
> -
> -#define STORE_REST		STORE_REST_COMMON
> -#define UNPACK_BYTES		UNPACK_BYTES_COMMON
> -#define CLEAR_STATE		CLEAR_STATE_COMMON
> -#define BODY			BODY_FROM_C
> -#include <iconv/loop.c>
> -
> -
> -/* Generate loop-function with hardware utf-convert instruction.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define LOOPFCT			__from_utf8_loop_etf3eh
> -
> -#define LOOP_NEED_FLAGS
> -
> -#define STORE_REST		STORE_REST_COMMON
> -#define UNPACK_BYTES		UNPACK_BYTES_COMMON
> -#define CLEAR_STATE		CLEAR_STATE_COMMON
> -#define BODY			BODY_FROM_ETF3EH
> -#include <iconv/loop.c>
> -
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -/* Generate loop-function with hardware vector instructions.  */
> -# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -# define LOOPFCT		__from_utf8_loop_vx
> -
> -# define LOOP_NEED_FLAGS
> -
> -# define STORE_REST		STORE_REST_COMMON
> -# define UNPACK_BYTES		UNPACK_BYTES_COMMON
> -# define CLEAR_STATE		CLEAR_STATE_COMMON
> -# define BODY			BODY_FROM_VX
> -# include <iconv/loop.c>
> -#endif
> -
> -
> -/* Generate ifunc'ed loop function.  */
> -__typeof(__from_utf8_loop_c)
> -__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
> -__from_utf8_loop;
> -
> -static void *
> -__from_utf8_loop_resolver (unsigned long int dl_hwcap)
> -{
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -  if (dl_hwcap & HWCAP_S390_VX)
> -    return __from_utf8_loop_vx;
> -  else
> -#endif
> -  if (dl_hwcap & HWCAP_S390_ETF3EH)
> -    return __from_utf8_loop_etf3eh;
> -  else
> -    return __from_utf8_loop_c;
> -}
> -
> -strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
> -
> -
> -/* Conversion from UTF-32 internal/BE to UTF-8.  */
> -#define BODY_TO_HW(ASM)							\
> -  {									\
> -    ASM;								\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -    if (inptr + 4 > inend)						\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> -  }
> -
> -/* The hardware routine uses the S/390 cu41 instruction.  */
> -#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
> -
> -/* The hardware routine uses the S/390 vector and cu41 instructions.  */
> -#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
> -
> -/* The software routine mimics the S/390 cu41 instruction.  */
> -#define BODY_TO_C						\
> -  {								\
> -    uint32_t wc = *((const uint32_t *) inptr);			\
> -								\
> -    if (__glibc_likely (wc <= 0x7f))				\
> -      {								\
> -	/* Single UTF-8 char.  */				\
> -	*outptr = (uint8_t)wc;					\
> -	outptr++;						\
> -      }								\
> -    else if (wc <= 0x7ff)					\
> -      {								\
> -	/* Two UTF-8 chars.  */					\
> -	if (__glibc_unlikely (outptr + 2 > outend))		\
> -	  {							\
> -	    /* Overflow in the output buffer.  */		\
> -	    result = __GCONV_FULL_OUTPUT;			\
> -	    break;						\
> -	  }							\
> -								\
> -	outptr[0] = 0xc0;					\
> -	outptr[0] |= wc >> 6;					\
> -								\
> -	outptr[1] = 0x80;					\
> -	outptr[1] |= wc & 0x3f;					\
> -								\
> -	outptr += 2;						\
> -      }								\
> -    else if (wc <= 0xffff)					\
> -      {								\
> -	/* Three UTF-8 chars.  */				\
> -	if (__glibc_unlikely (outptr + 3 > outend))		\
> -	  {							\
> -	    /* Overflow in the output buffer.  */		\
> -	    result = __GCONV_FULL_OUTPUT;			\
> -	    break;						\
> -	  }							\
> -	if (wc >= 0xd800 && wc < 0xdc00)			\
> -	  {							\
> -	    /* Do not accept UTF-16 surrogates.   */		\
> -	    result = __GCONV_ILLEGAL_INPUT;			\
> -	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
> -	  }							\
> -	outptr[0] = 0xe0;					\
> -	outptr[0] |= wc >> 12;					\
> -								\
> -	outptr[1] = 0x80;					\
> -	outptr[1] |= (wc >> 6) & 0x3f;				\
> -								\
> -	outptr[2] = 0x80;					\
> -	outptr[2] |= wc & 0x3f;					\
> -								\
> -	outptr += 3;						\
> -      }								\
> -      else if (wc <= 0x10ffff)					\
> -	{							\
> -	  /* Four UTF-8 chars.  */				\
> -	  if (__glibc_unlikely (outptr + 4 > outend))		\
> -	    {							\
> -	      /* Overflow in the output buffer.  */		\
> -	      result = __GCONV_FULL_OUTPUT;			\
> -	      break;						\
> -	    }							\
> -	  outptr[0] = 0xf0;					\
> -	  outptr[0] |= wc >> 18;				\
> -								\
> -	  outptr[1] = 0x80;					\
> -	  outptr[1] |= (wc >> 12) & 0x3f;			\
> -								\
> -	  outptr[2] = 0x80;					\
> -	  outptr[2] |= (wc >> 6) & 0x3f;			\
> -								\
> -	  outptr[3] = 0x80;					\
> -	  outptr[3] |= wc & 0x3f;				\
> -								\
> -	  outptr += 4;						\
> -	}							\
> -      else							\
> -	{							\
> -	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
> -	}							\
> -    inptr += 4;							\
> -  }
> -
> -#define HW_TO_VX							\
> -  {									\
> -    register const unsigned char* pInput asm ("8") = inptr;		\
> -    register size_t inlen asm ("9") = inend - inptr;			\
> -    register unsigned char* pOutput asm ("10") = outptr;		\
> -    register size_t outlen asm("11") = outend - outptr;			\
> -    unsigned long tmp, tmp2;						\
> -    asm volatile (".machine push\n\t"					\
> -		  ".machine \"z13\"\n\t"				\
> -		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> -		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
> -		  "vzero %%v21\n\t"					\
> -		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
> -		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
> -		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
> -		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> -		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
> -		  "lghi %[R_TMP],0\n\t"					\
> -		  /* Shorten to byte values.  */			\
> -		  "vpkf %%v23,%%v16,%%v17\n\t"				\
> -		  "vpkf %%v24,%%v18,%%v19\n\t"				\
> -		  "vpkh %%v23,%%v23,%%v24\n\t"				\
> -		  /* Checking for values > 0x7f.  */			\
> -		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
> -		  "jno 10f\n\t"						\
> -		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
> -		  "jno 11f\n\t"						\
> -		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
> -		  "jno 12f\n\t"						\
> -		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
> -		  "jno 13f\n\t"						\
> -		  /* Store 16bytes to outptr.  */			\
> -		  "vst %%v23,0(%[R_OUT])\n\t"				\
> -		  "aghi %[R_INLEN],-64\n\t"				\
> -		  "aghi %[R_OUTLEN],-16\n\t"				\
> -		  "la %[R_IN],64(%[R_IN])\n\t"				\
> -		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],64,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> -		  "j 1b\n\t"						\
> -		  /* Found a value > 0x7f.  */				\
> -		  "13: ahi %[R_TMP],4\n\t"				\
> -		  "12: ahi %[R_TMP],4\n\t"				\
> -		  "11: ahi %[R_TMP],4\n\t"				\
> -		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
> -		  "srlg %[R_I],%[R_I],2\n\t"				\
> -		  "agr %[R_I],%[R_TMP]\n\t"				\
> -		  "je 20f\n\t"						\
> -		  /* Store characters before invalid one...  */		\
> -		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
> -		  "15: aghi %[R_I],-1\n\t"				\
> -		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
> -		  /* ... and update pointers.  */			\
> -		  "aghi %[R_I],1\n\t"					\
> -		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
> -		  "sllg %[R_I],%[R_I],2\n\t"				\
> -		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_I]\n\t"				\
> -		  /* Handle multibyte utf8-char with convert instruction. */ \
> -		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
> -		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
> -		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> -		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> -		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (pInput)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
> -		    , [R_I] "=a" (tmp2)					\
> -		    , [R_RES] "+d" (result)				\
> -		  : /* inputs */					\
> -		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> -		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> -		  : /* clobber list */ "memory", "cc"			\
> -		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> -		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> -		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> -		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
> -		    ASM_CLOBBER_VR ("v24")				\
> -		  );							\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -  }
> -
> -/* Generate loop-function with software routing.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			__to_utf8_loop_c
> -#define BODY			BODY_TO_C
> -#define LOOP_NEED_FLAGS
> -#include <iconv/loop.c>
> -
> -/* Generate loop-function with hardware utf-convert instruction.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			__to_utf8_loop_etf3eh
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_TO_ETF3EH
> -#include <iconv/loop.c>
> -
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -/* Generate loop-function with hardware vector and utf-convert instructions.  */
> -# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -# define LOOPFCT		__to_utf8_loop_vx
> -# define BODY			BODY_TO_VX
> -# define LOOP_NEED_FLAGS
> -# include <iconv/loop.c>
> -#endif
> -
> -/* Generate ifunc'ed loop function.  */
> -__typeof(__to_utf8_loop_c)
> -__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
> -__to_utf8_loop;
> -
> -static void *
> -__to_utf8_loop_resolver (unsigned long int dl_hwcap)
> -{
> -#if defined HAVE_S390_VX_ASM_SUPPORT
> -  if (dl_hwcap & HWCAP_S390_VX)
> -    return __to_utf8_loop_vx;
> -  else
> -#endif
> -  if (dl_hwcap & HWCAP_S390_ETF3EH)
> -    return __to_utf8_loop_etf3eh;
> -  else
> -    return __to_utf8_loop_c;
> -}
> -
> -strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
> -
> -
> -#include <iconv/skeleton.c>
> diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
> new file mode 100644
> index 0000000..ecf06bd
> --- /dev/null
> +++ b/sysdeps/s390/utf16-utf32-z9.c
> @@ -0,0 +1,636 @@
> +/* Conversion between UTF-16 and UTF-32 BE/internal.
> +
> +   This module uses the Z9-109 variants of the Convert Unicode
> +   instructions.
> +   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> +
> +   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> +   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> +
> +   Thanks to Daniel Appich who covered the relevant performance work
> +   in his diploma thesis.
> +
> +   This is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   This is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <stdint.h>
> +#include <unistd.h>
> +#include <dl-procinfo.h>
> +#include <gconv.h>
> +
> +#if defined HAVE_S390_VX_GCC_SUPPORT
> +# define ASM_CLOBBER_VR(NR) , NR
> +#else
> +# define ASM_CLOBBER_VR(NR)
> +#endif
> +
> +#if defined __s390x__
> +# define CONVERT_32BIT_SIZE_T(REG)
> +#else
> +# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
> +#endif
> +
> +/* UTF-32 big endian byte order mark.  */
> +#define BOM_UTF32               0x0000feffu
> +
> +/* UTF-16 big endian byte order mark.  */
> +#define BOM_UTF16               0xfeff
> +
> +#define DEFINE_INIT		0
> +#define DEFINE_FINI		0
> +#define MIN_NEEDED_FROM		2
> +#define MAX_NEEDED_FROM		4
> +#define MIN_NEEDED_TO		4
> +#define FROM_LOOP		__from_utf16_loop
> +#define TO_LOOP			__to_utf16_loop
> +#define FROM_DIRECTION		(dir == from_utf16)
> +#define ONE_DIRECTION           0
> +
> +/* Direction of the transformation.  */
> +enum direction
> +{
> +  illegal_dir,
> +  to_utf16,
> +  from_utf16
> +};
> +
> +struct utf16_data
> +{
> +  enum direction dir;
> +  int emit_bom;
> +};
> +
> +
> +extern int gconv_init (struct __gconv_step *step);
> +int
> +gconv_init (struct __gconv_step *step)
> +{
> +  /* Determine which direction.  */
> +  struct utf16_data *new_data;
> +  enum direction dir = illegal_dir;
> +  int emit_bom;
> +  int result;
> +
> +  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
> +	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
> +
> +  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
> +      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
> +	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
> +	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
> +    {
> +      dir = from_utf16;
> +    }
> +  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
> +	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
> +	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
> +	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
> +    {
> +      dir = to_utf16;
> +    }
> +
> +  result = __GCONV_NOCONV;
> +  if (dir != illegal_dir)
> +    {
> +      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
> +
> +      result = __GCONV_NOMEM;
> +      if (new_data != NULL)
> +	{
> +	  new_data->dir = dir;
> +	  new_data->emit_bom = emit_bom;
> +	  step->__data = new_data;
> +
> +	  if (dir == from_utf16)
> +	    {
> +	      step->__min_needed_from = MIN_NEEDED_FROM;
> +	      step->__max_needed_from = MIN_NEEDED_FROM;
> +	      step->__min_needed_to = MIN_NEEDED_TO;
> +	      step->__max_needed_to = MIN_NEEDED_TO;
> +	    }
> +	  else
> +	    {
> +	      step->__min_needed_from = MIN_NEEDED_TO;
> +	      step->__max_needed_from = MIN_NEEDED_TO;
> +	      step->__min_needed_to = MIN_NEEDED_FROM;
> +	      step->__max_needed_to = MIN_NEEDED_FROM;
> +	    }
> +
> +	  step->__stateful = 0;
> +
> +	  result = __GCONV_OK;
> +	}
> +    }
> +
> +  return result;
> +}
> +
> +
> +extern void gconv_end (struct __gconv_step *data);
> +void
> +gconv_end (struct __gconv_step *data)
> +{
> +  free (data->__data);
> +}
> +
> +/* The macro for the hardware loop.  This is used for both
> +   directions.  */
> +#define HARDWARE_CONVERT(INSTRUCTION)					\
> +  {									\
> +    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> +    register size_t inlen __asm__ ("9") = inend - inptr;		\
> +    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> +    register size_t outlen __asm__("11") = outend - outptr;		\
> +    unsigned long cc = 0;						\
> +									\
> +    __asm__ __volatile__ (".machine push       \n\t"			\
> +			  ".machine \"z9-109\" \n\t"			\
> +			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
> +			  "0: " INSTRUCTION "  \n\t"			\
> +			  ".machine pop        \n\t"			\
> +			  "   jo     0b        \n\t"			\
> +			  "   ipm    %2        \n"			\
> +			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> +			    "+d" (outlen), "+d" (inlen)			\
> +			  :						\
> +			  : "cc", "memory");				\
> +									\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +    cc >>= 28;								\
> +									\
> +    if (cc == 1)							\
> +      {									\
> +	result = __GCONV_FULL_OUTPUT;					\
> +      }									\
> +    else if (cc == 2)							\
> +      {									\
> +	result = __GCONV_ILLEGAL_INPUT;					\
> +      }									\
> +  }
> +
> +#define PREPARE_LOOP							\
> +  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
> +  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
> +									\
> +  if (emit_bom && !data->__internal_use					\
> +      && data->__invocation_counter == 0)				\
> +    {									\
> +      if (dir == to_utf16)						\
> +	{								\
> +	  /* Emit the UTF-16 Byte Order Mark.  */			\
> +	  if (__glibc_unlikely (outbuf + 2 > outend))			\
> +	    return __GCONV_FULL_OUTPUT;					\
> +									\
> +	  put16u (outbuf, BOM_UTF16);					\
> +	  outbuf += 2;							\
> +	}								\
> +      else								\
> +	{								\
> +	  /* Emit the UTF-32 Byte Order Mark.  */			\
> +	  if (__glibc_unlikely (outbuf + 4 > outend))			\
> +	    return __GCONV_FULL_OUTPUT;					\
> +									\
> +	  put32u (outbuf, BOM_UTF32);					\
> +	  outbuf += 4;							\
> +	}								\
> +    }
> +
> +/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
> +
> +/* The software routine is copied from utf-16.c (minus byte
> +   swapping).  */
> +#define BODY_FROM_C							\
> +  {									\
> +    uint16_t u1 = get16 (inptr);					\
> +									\
> +    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
> +      {									\
> +	/* No surrogate.  */						\
> +	put32 (outptr, u1);						\
> +	inptr += 2;							\
> +      }									\
> +    else								\
> +      {									\
> +	/* An isolated low-surrogate was found.  This has to be         \
> +	   considered ill-formed.  */					\
> +	if (__glibc_unlikely (u1 >= 0xdc00))				\
> +	  {								\
> +	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
> +	  }								\
> +	/* It's a surrogate character.  At least the first word says	\
> +	   it is.  */							\
> +	if (__glibc_unlikely (inptr + 4 > inend))			\
> +	  {								\
> +	    /* We don't have enough input for another complete input	\
> +	       character.  */						\
> +	    result = __GCONV_INCOMPLETE_INPUT;				\
> +	    break;							\
> +	  }								\
> +									\
> +	inptr += 2;							\
> +	uint16_t u2 = get16 (inptr);					\
> +	if (__builtin_expect (u2 < 0xdc00, 0)				\
> +	    || __builtin_expect (u2 > 0xdfff, 0))			\
> +	  {								\
> +	    /* This is no valid second word for a surrogate.  */	\
> +	    inptr -= 2;							\
> +	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
> +	  }								\
> +									\
> +	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
> +	inptr += 2;							\
> +      }									\
> +    outptr += 4;							\
> +  }
> +
> +#define BODY_FROM_VX							\
> +  {									\
> +    size_t inlen = inend - inptr;					\
> +    size_t outlen = outend - outptr;					\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  /* Setup to check for surrogates.  */			\
> +		  "larl %[R_TMP],9f\n\t"				\
> +		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> +		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
> +		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
> +		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
> +		  "0: clgijl %[R_INLEN],16,2f\n\t"			\
> +		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
> +		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> +		  /* Check for surrogate chars.  */			\
> +		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t"						\
> +		  /* Enlarge to UTF-32.  */				\
> +		  "vuplhh %%v17,%%v16\n\t"				\
> +		  "la %[R_IN],16(%[R_IN])\n\t"				\
> +		  "vupllh %%v18,%%v16\n\t"				\
> +		  "aghi %[R_INLEN],-16\n\t"				\
> +		  /* Store 32 bytes to buf_out.  */			\
> +		  "vstm %%v17,%%v18,0(%[R_OUT])\n\t"			\
> +		  "aghi %[R_OUTLEN],-32\n\t"				\
> +		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],16,2f\n\t"				\
> +		  "clgijl %[R_OUTLEN],32,2f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
> +		  "9: .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  ".short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  /* At least one uint16_t is in range of surrogates.	\
> +		     Store the preceding chars.  */			\
> +		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		  "vuplhh %%v17,%%v16\n\t"				\
> +		  "sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> +		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> +		  "jl 12f\n\t"						\
> +		  "vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  "vupllh %%v18,%%v16\n\t"				\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
> +		  "11: \n\t" /* Update pointers.  */			\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
> +		  "12: lghi %[R_TMP2],16\n\t"				\
> +		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
> +		  "srl %[R_TMP2],1\n\t"					\
> +		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
> +		  "aghi %[R_OUTLEN],-4\n\t"				\
> +		  "j 16f\n\t"						\
> +		  /* Handle remaining bytes.  */			\
> +		  "2:\n\t"						\
> +		  /* Zero, one or more bytes available?  */		\
> +		  "clgfi %[R_INLEN],1\n\t"				\
> +		  "je 97f\n\t" /* Only one byte available.  */		\
> +		  "jl 99f\n\t" /* End if no bytes available.  */	\
> +		  /* Calculate remaining uint16_t values in inptr.  */	\
> +		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
> +		  /* Handle remaining uint16_t values.  */		\
> +		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
> +		  "slgfi %[R_OUTLEN],4\n\t"				\
> +		  "jl 96f \n\t"						\
> +		  "clfi %[R_TMP],0xd800\n\t"				\
> +		  "jhe 15f\n\t"						\
> +		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-2\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],13b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Handle UTF-16 surrogate pair.  */			\
> +		  "15: clfi %[R_TMP],0xdfff\n\t"			\
> +		  "jh 14b\n\t" /* Jump away if ch > 0xdfff.  */		\
> +		  "16: clfi %[R_TMP],0xdc00\n\t"			\
> +		  "jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
> +		  "slgfi %[R_INLEN],4\n\t"				\
> +		  "jl 97f\n\t" /* Big enough input?  */			\
> +		  "llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
> +		  "slfi %[R_TMP],0xd7c0\n\t"				\
> +		  "sll %[R_TMP],10\n\t"					\
> +		  "risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
> +		  "nilf %[R_TMP3],0xfc00\n\t"				\
> +		  "clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
> +		  "jne 98f\n\t"						\
> +		  "st %[R_TMP],0(%[R_OUT])\n\t"				\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "aghi %[R_TMP2],-2\n\t"				\
> +		  "jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  "96:\n\t" /* Return full output.  */			\
> +		  "lghi %[R_RES],%[RES_OUT_FULL]\n\t"			\
> +		  "j 99f\n\t"						\
> +		  "97:\n\t" /* Return incomplete input.  */		\
> +		  "lghi %[R_RES],%[RES_IN_FULL]\n\t"			\
> +		  "j 99f\n\t"						\
> +		  "98:\n\t" /* Return Illegal character.  */		\
> +		  "lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "99:\n\t"						\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (inptr)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result != __GCONV_ILLEGAL_INPUT)				\
> +      break;								\
> +									\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
> +  }
> +
> +
> +/* Generate loop-function with software routine.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +# define LOOPFCT		__from_utf16_loop_c
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_C
> +# include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		__from_utf16_loop_vx
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_VX
> +# include <iconv/loop.c>
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__from_utf16_loop_c)
> +__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
> +__from_utf16_loop;
> +
> +static void *
> +__from_utf16_loop_resolver (unsigned long int dl_hwcap)
> +{
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __from_utf16_loop_vx;
> +  else
> +    return __from_utf16_loop_c;
> +}
> +
> +strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
> +#else
> +# define LOOPFCT		FROM_LOOP
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_C
> +# include <iconv/loop.c>
> +#endif
> +
> +/* Conversion from UTF-32 internal/BE to UTF-16.  */
> +
> +/* The software routine is copied from utf-16.c (minus byte
> +   swapping).  */
> +#define BODY_TO_C							\
> +  {									\
> +    uint32_t c = get32 (inptr);						\
> +									\
> +    if (__builtin_expect (c <= 0xd7ff, 1)				\
> +	|| (c >=0xdc00 && c <= 0xffff))					\
> +      {									\
> +	/* Two UTF-16 chars.  */					\
> +	put16 (outptr, c);						\
> +      }									\
> +    else if (__builtin_expect (c >= 0x10000, 1)				\
> +	     && __builtin_expect (c <= 0x10ffff, 1))			\
> +      {									\
> +	/* Four UTF-16 chars.  */					\
> +	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
> +	uint16_t out;							\
> +									\
> +	/* Generate a surrogate character.  */				\
> +	if (__glibc_unlikely (outptr + 4 > outend))			\
> +	  {								\
> +	    /* Overflow in the output buffer.  */			\
> +	    result = __GCONV_FULL_OUTPUT;				\
> +	    break;							\
> +	  }								\
> +									\
> +	out = 0xd800;							\
> +	out |= (zabcd & 0xff) << 6;					\
> +	out |= (c >> 10) & 0x3f;					\
> +	put16 (outptr, out);						\
> +	outptr += 2;							\
> +									\
> +	out = 0xdc00;							\
> +	out |= c & 0x3ff;						\
> +	put16 (outptr, out);						\
> +      }									\
> +    else								\
> +      {									\
> +	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
> +      }									\
> +    outptr += 2;							\
> +    inptr += 4;								\
> +  }
> +
> +#define BODY_TO_ETF3EH							\
> +  {									\
> +    HARDWARE_CONVERT ("cu42 %0, %1");					\
> +									\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +									\
> +    if (inptr + 4 > inend)						\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> +  }
> +
> +#define BODY_TO_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  /* Setup to check for surrogates.  */			\
> +		  "larl %[R_TMP],9f\n\t"				\
> +		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> +		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
> +		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
> +		  /* Loop which handles UTF-16 chars			\
> +		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
> +		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
> +		  "lghi %[R_TMP2],0\n\t"				\
> +		  /* Shorten to UTF-16.  */				\
> +		  "vpkf %%v18,%%v16,%%v17\n\t"				\
> +		  /* Check for surrogate chars.  */			\
> +		  "vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t"						\
> +		  "vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"			\
> +		  "jno 11f\n\t"						\
> +		  /* Store 16 bytes to buf_out.  */			\
> +		  "vst %%v18,0(%[R_OUT])\n\t"				\
> +		  "la %[R_IN],32(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-32\n\t"				\
> +		  "aghi %[R_OUTLEN],-16\n\t"				\
> +		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],32,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
> +		     and check for ch >= 0x10000. (v30, v31)  */	\
> +		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
> +		  ".long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
> +		  /* At least one UTF32 char is in range of surrogates.	\
> +		     Store the preceding characters.  */		\
> +		  "11: ahi %[R_TMP2],16\n\t"				\
> +		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
> +		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
> +		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> +		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> +		  "jl 20f\n\t"						\
> +		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  /* Update pointers.  */				\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Handles UTF16 surrogates with convert instruction.  */ \
> +		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +									\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +    if (inptr + 4 > inend)						\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> +  }
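For readers following the vector loop above: the fast path only covers code points that fit in a single UTF-16 unit, i.e. the condition quoted in the comment, `ch < 0xd800 || (ch > 0xdfff && ch < 0x10000)`. Everything else is handed to cu42. A minimal scalar sketch of the same decision, assuming plain host-order 16-bit units; `utf32_to_utf16_scalar` is a hypothetical name and not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: encode one UTF-32 code
   point as UTF-16.  Returns the number of 16-bit units written (1 or 2),
   or 0 if CH is a surrogate or lies above 0x10ffff.  */
static size_t
utf32_to_utf16_scalar (uint32_t ch, uint16_t out[2])
{
  assert (out != NULL);

  /* Same range test as the vector fast path: one unit is enough.  */
  if (ch < 0xd800 || (ch > 0xdfff && ch < 0x10000))
    {
      out[0] = (uint16_t) ch;
      return 1;
    }

  /* Supplementary planes need a high and a low surrogate.  */
  if (ch >= 0x10000 && ch <= 0x10ffff)
    {
      ch -= 0x10000;
      out[0] = (uint16_t) (0xd800 | (ch >> 10));
      out[1] = (uint16_t) (0xdc00 | (ch & 0x3ff));
      return 2;
    }

  /* Surrogate code points and values above 0x10ffff are rejected.  */
  return 0;
}
```

The vector code does not compute surrogate pairs itself. It only detects the first element outside the one-unit range, stores what precedes it, and lets the convert instruction handle the rest.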
> +
> +/* Generate loop-function with software routine.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf16_loop_c
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_TO_C
> +#include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf16_loop_etf3eh
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_TO_ETF3EH
> +#include <iconv/loop.c>
> +
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +# define LOOPFCT		__to_utf16_loop_vx
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_TO_VX
> +# include <iconv/loop.c>
> +#endif
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__to_utf16_loop_c)
> +__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
> +__to_utf16_loop;
> +
> +static void *
> +__to_utf16_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __to_utf16_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
> +      && dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __to_utf16_loop_etf3eh;
> +  else
> +    return __to_utf16_loop_c;
> +}
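The selection order in the resolver is: vector variant first, then the etf3eh variant if all three capability bits are set, otherwise the C fallback. Since `&` binds tighter than `&&`, the unparenthesized condition above is evaluated as intended. A small sketch of the same priority logic outside of ifunc, with placeholder bit values (the `DEMO_*` names and values are assumptions for illustration, the real ones come from the s390 hwcap definitions):

```c
#include <assert.h>

/* Placeholder capability bits for illustration only.  */
#define DEMO_HWCAP_ZARCH      (1UL << 0)
#define DEMO_HWCAP_ETF3EH     (1UL << 1)
#define DEMO_HWCAP_HIGH_GPRS  (1UL << 2)
#define DEMO_HWCAP_VX         (1UL << 3)

enum demo_variant { DEMO_C, DEMO_ETF3EH, DEMO_VX };

/* Hypothetical helper: mirror the resolver's priority order.
   HAVE_VX_ASM stands in for the HAVE_S390_VX_ASM_SUPPORT build check.  */
static enum demo_variant
demo_select_variant (unsigned long hwcap, int have_vx_asm)
{
  const unsigned long etf3eh_needs
    = DEMO_HWCAP_ZARCH | DEMO_HWCAP_HIGH_GPRS | DEMO_HWCAP_ETF3EH;

  assert (have_vx_asm == 0 || have_vx_asm == 1);

  if (have_vx_asm && (hwcap & DEMO_HWCAP_VX) != 0)
    return DEMO_VX;
  /* All three bits must be present for the etf3eh variant.  */
  if ((hwcap & etf3eh_needs) == etf3eh_needs)
    return DEMO_ETF3EH;
  return DEMO_C;
}
```

Testing the combined mask in one comparison is equivalent to the three chained tests in the patch and may be easier to read.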
> +
> +strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
> +
> +
> +#include <iconv/skeleton.c>
> diff --git a/sysdeps/s390/utf8-utf16-z9.c b/sysdeps/s390/utf8-utf16-z9.c
> new file mode 100644
> index 0000000..29a0bf9
> --- /dev/null
> +++ b/sysdeps/s390/utf8-utf16-z9.c
> @@ -0,0 +1,818 @@
> +/* Conversion between UTF-8 and UTF-16.
> +
> +   This module uses the Z9-109 variants of the Convert Unicode
> +   instructions.
> +   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> +
> +   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> +   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> +
> +   Thanks to Daniel Appich who covered the relevant performance work
> +   in his diploma thesis.
> +
> +   This is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   This is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <stdint.h>
> +#include <unistd.h>
> +#include <dl-procinfo.h>
> +#include <gconv.h>
> +
> +#if defined HAVE_S390_VX_GCC_SUPPORT
> +# define ASM_CLOBBER_VR(NR) , NR
> +#else
> +# define ASM_CLOBBER_VR(NR)
> +#endif
> +
> +#if defined __s390x__
> +# define CONVERT_32BIT_SIZE_T(REG)
> +#else
> +# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
> +#endif
> +
> +/* Defines for skeleton.c.  */
> +#define DEFINE_INIT		0
> +#define DEFINE_FINI		0
> +#define MIN_NEEDED_FROM		1
> +#define MAX_NEEDED_FROM		4
> +#define MIN_NEEDED_TO		2
> +#define MAX_NEEDED_TO		4
> +#define FROM_LOOP		__from_utf8_loop
> +#define TO_LOOP			__to_utf8_loop
> +#define FROM_DIRECTION		(dir == from_utf8)
> +#define ONE_DIRECTION           0
> +
> +
> +/* UTF-16 big endian byte order mark.  */
> +#define BOM_UTF16	0xfeff
> +
> +/* Direction of the transformation.  */
> +enum direction
> +{
> +  illegal_dir,
> +  to_utf8,
> +  from_utf8
> +};
> +
> +struct utf8_data
> +{
> +  enum direction dir;
> +  int emit_bom;
> +};
> +
> +
> +extern int gconv_init (struct __gconv_step *step);
> +int
> +gconv_init (struct __gconv_step *step)
> +{
> +  /* Determine which direction.  */
> +  struct utf8_data *new_data;
> +  enum direction dir = illegal_dir;
> +  int emit_bom;
> +  int result;
> +
> +  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
> +
> +  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
> +      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
> +	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
> +    {
> +      dir = from_utf8;
> +    }
> +  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
> +	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
> +    {
> +      dir = to_utf8;
> +    }
> +
> +  result = __GCONV_NOCONV;
> +  if (dir != illegal_dir)
> +    {
> +      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
> +
> +      result = __GCONV_NOMEM;
> +      if (new_data != NULL)
> +	{
> +	  new_data->dir = dir;
> +	  new_data->emit_bom = emit_bom;
> +	  step->__data = new_data;
> +
> +	  if (dir == from_utf8)
> +	    {
> +	      step->__min_needed_from = MIN_NEEDED_FROM;
> +	      step->__max_needed_from = MIN_NEEDED_FROM;
> +	      step->__min_needed_to = MIN_NEEDED_TO;
> +	      step->__max_needed_to = MIN_NEEDED_TO;
> +	    }
> +	  else
> +	    {
> +	      step->__min_needed_from = MIN_NEEDED_TO;
> +	      step->__max_needed_from = MIN_NEEDED_TO;
> +	      step->__min_needed_to = MIN_NEEDED_FROM;
> +	      step->__max_needed_to = MIN_NEEDED_FROM;
> +	    }
> +
> +	  step->__stateful = 0;
> +
> +	  result = __GCONV_OK;
> +	}
> +    }
> +
> +  return result;
> +}
> +
> +
> +extern void gconv_end (struct __gconv_step *data);
> +void
> +gconv_end (struct __gconv_step *data)
> +{
> +  free (data->__data);
> +}
> +
> +/* The macro for the hardware loop.  This is used for both
> +   directions.  */
> +#define HARDWARE_CONVERT(INSTRUCTION)					\
> +  {									\
> +    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> +    register size_t inlen __asm__ ("9") = inend - inptr;		\
> +    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> +    register size_t outlen __asm__("11") = outend - outptr;		\
> +    unsigned long cc = 0;						\
> +									\
> +    __asm__ __volatile__ (".machine push       \n\t"			\
> +			  ".machine \"z9-109\" \n\t"			\
> +			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
> +			  "0: " INSTRUCTION "  \n\t"			\
> +			  ".machine pop        \n\t"			\
> +			  "   jo     0b        \n\t"			\
> +			  "   ipm    %2        \n"			\
> +			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> +			    "+d" (outlen), "+d" (inlen)			\
> +			  :						\
> +			  : "cc", "memory");				\
> +									\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +    cc >>= 28;								\
> +									\
> +    if (cc == 1)							\
> +      {									\
> +	result = __GCONV_FULL_OUTPUT;					\
> +      }									\
> +    else if (cc == 2)							\
> +      {									\
> +	result = __GCONV_ILLEGAL_INPUT;					\
> +      }									\
> +  }
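A note on the `cc >>= 28` step in HARDWARE_CONVERT: as far as I understand the Principles of Operation, ipm places the condition code in the two bits just below the top two bits of the low 32-bit word, so shifting right by 28 leaves the condition code, given that `cc` starts out as 0. A sketch of the decoding, with hypothetical `demo_*` names; condition code 3 never reaches the C code in the macro because `jo 0b` retries the instruction:

```c
#include <assert.h>

enum demo_status
{
  DEMO_DONE,          /* cc == 0: macro leaves result unchanged.  */
  DEMO_FULL_OUTPUT,   /* cc == 1: __GCONV_FULL_OUTPUT in the macro.  */
  DEMO_ILLEGAL_INPUT, /* cc == 2: __GCONV_ILLEGAL_INPUT in the macro.  */
  DEMO_RETRY          /* cc == 3: handled by "jo 0b" in the asm.  */
};

/* Hypothetical helper: extract the condition code from the word that
   ipm produced.  The mask also drops any stale higher bits.  */
static unsigned int
demo_cc_from_ipm (unsigned long ipm_word)
{
  return (unsigned int) ((ipm_word >> 28) & 3);
}

/* Hypothetical helper: map the condition code to a status.  */
static enum demo_status
demo_status_from_cc (unsigned int cc)
{
  assert (cc <= 3);
  switch (cc)
    {
    case 0:
      return DEMO_DONE;
    case 1:
      return DEMO_FULL_OUTPUT;
    case 2:
      return DEMO_ILLEGAL_INPUT;
    default:
      return DEMO_RETRY;
    }
}
```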
> +
> +#define PREPARE_LOOP							\
> +  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> +  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> +									\
> +  if (emit_bom && !data->__internal_use					\
> +      && data->__invocation_counter == 0)				\
> +    {									\
> +      /* Emit the UTF-16 Byte Order Mark.  */				\
> +      if (__glibc_unlikely (outbuf + 2 > outend))			\
> +	return __GCONV_FULL_OUTPUT;					\
> +									\
> +      put16u (outbuf, BOM_UTF16);					\
> +      outbuf += 2;							\
> +    }
> +
> +/* Conversion function from UTF-8 to UTF-16.  */
> +#define BODY_FROM_HW(ASM)						\
> +  {									\
> +    ASM;								\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +									\
> +    int i;								\
> +    for (i = 1; inptr + i < inend && i < 5; ++i)			\
> +      if ((inptr[i] & 0xc0) != 0x80)					\
> +	break;								\
> +									\
> +    if (__glibc_likely (inptr + i == inend				\
> +			&& result == __GCONV_EMPTY_INPUT))		\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
> +  }
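The scan in BODY_FROM_HW decides between "input truncated" and "input ill-formed": it walks over continuation bytes (`10xxxxxx`) after the lead byte, and if it reaches the end of the buffer the sequence is treated as incomplete (provided the result is still __GCONV_EMPTY_INPUT); otherwise the scanned bytes are passed to the error handler. A stand-alone sketch of that scan, with the hypothetical name `demo_scan_tail`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the number of bytes belonging to the
   sequence that starts at IN, i.e. the lead byte plus the following
   continuation bytes, capped like the loop in the patch.  */
static size_t
demo_scan_tail (const unsigned char *in, const unsigned char *inend)
{
  size_t i;

  assert (in != NULL && in < inend);

  for (i = 1; in + i < inend && i < 5; ++i)
    if ((in[i] & 0xc0) != 0x80)
      break;

  return i;
}
```

If `in + demo_scan_tail (in, inend) == inend`, every remaining byte was a continuation byte and the caller can report incomplete input; otherwise the return value is the number of bytes to skip when errors are ignored.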
> +
> +#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
> +
> +#define HW_FROM_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
> +		  "vrepib %%v31,0x20\n\t"				\
> +		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
> +		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
> +		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
> +		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
> +		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> +		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  /* Enlarge to UTF-16.  */				\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "la %[R_IN],16(%[R_IN])\n\t"				\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "aghi %[R_INLEN],-16\n\t"				\
> +		  /* Store 32 bytes to buf_out.  */			\
> +		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
> +		  "aghi %[R_OUTLEN],-32\n\t"				\
> +		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  "10:\n\t"						\
> +		  /* At least one byte is > 0x7f.			\
> +		     Store the preceding 1-byte chars.  */		\
> +		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
> +		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
> +						     index to store. */ \
> +		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
> +		  "ahi %[R_TMP2],-1\n\t"				\
> +		  "jl 20f\n\t"						\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
> +		  "11:\n\t" /* Update pointers.  */			\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Handle multibyte utf8-char with convert instruction. */ \
> +		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +  }
> +#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
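What HW_FROM_VX does per 16-byte block can be described in scalar terms: find the first byte above 0x7f, widen everything before it to big-endian 16-bit units, advance both pointers, and leave the rest to cu12. A rough sketch under that reading; `demo_widen_ascii_prefix` is a hypothetical name and the byte-wise loop only models the effect of vuplhb/vupllb/vstl, not their implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: widen the leading 1-byte UTF-8 chars of a
   16-byte block to UTF-16BE.  Returns the number of input bytes
   consumed (0..16); 2 * that many bytes are written to OUT.  */
static size_t
demo_widen_ascii_prefix (const unsigned char in[16], unsigned char out[32])
{
  size_t n = 0;

  assert (in != NULL && out != NULL);

  while (n < 16 && in[n] <= 0x7f)
    {
      out[2 * n] = 0;         /* High byte of the UTF-16BE unit.  */
      out[2 * n + 1] = in[n]; /* Low byte carries the ASCII value.  */
      ++n;
    }

  return n;
}
```

A return value of 16 corresponds to the all-ASCII case where the vector loop stores 32 bytes and continues; anything smaller corresponds to the path through label 10.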
> +
> +
> +/* The software implementation is based on the code in gconv_simple.c.  */
> +#define BODY_FROM_C							\
> +  {									\
> +    /* Next input byte.  */						\
> +    uint16_t ch = *inptr;						\
> +									\
> +    if (__glibc_likely (ch < 0x80))					\
> +      {									\
> +	/* One byte sequence.  */					\
> +	++inptr;							\
> +      }									\
> +    else								\
> +      {									\
> +	uint_fast32_t cnt;						\
> +	uint_fast32_t i;						\
> +									\
> +	if (ch >= 0xc2 && ch < 0xe0)					\
> +	  {								\
> +	    /* We expect two bytes.  The first byte cannot be 0xc0	\
> +	       or 0xc1, otherwise the wide character could have been	\
> +	       represented using a single byte.  */			\
> +	    cnt = 2;							\
> +	    ch &= 0x1f;							\
> +	  }								\
> +	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
> +	  {								\
> +	    /* We expect three bytes.  */				\
> +	    cnt = 3;							\
> +	    ch &= 0x0f;							\
> +	  }								\
> +	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
> +	  {								\
> +	    /* We expect four bytes.  */				\
> +	    cnt = 4;							\
> +	    ch &= 0x07;							\
> +	  }								\
> +	else								\
> +	  {								\
> +	    /* Search the end of this ill-formed UTF-8 character.  This	\
> +	       is the next byte with (x & 0xc0) != 0x80.  */		\
> +	    i = 0;							\
> +	    do								\
> +	      ++i;							\
> +	    while (inptr + i < inend					\
> +		   && (*(inptr + i) & 0xc0) == 0x80			\
> +		   && i < 5);						\
> +									\
> +	  errout:							\
> +	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
> +	  }								\
> +									\
> +	if (__glibc_unlikely (inptr + cnt > inend))			\
> +	  {								\
> +	    /* We don't have enough input.  But before we report	\
> +	       that, check that all the bytes are correct.  */		\
> +	    for (i = 1; inptr + i < inend; ++i)				\
> +	      if ((inptr[i] & 0xc0) != 0x80)				\
> +		break;							\
> +									\
> +	    if (__glibc_likely (inptr + i == inend))			\
> +	      {								\
> +		result = __GCONV_INCOMPLETE_INPUT;			\
> +		break;							\
> +	      }								\
> +									\
> +	    goto errout;						\
> +	  }								\
> +									\
> +	if (cnt == 4)							\
> +	  {								\
> +	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
> +	       low) are needed.  */					\
> +	    uint16_t zabcd, high, low;					\
> +									\
> +	    if (__glibc_unlikely (outptr + 4 > outend))			\
> +	      {								\
> +		/* Overflow in the output buffer.  */			\
> +		result = __GCONV_FULL_OUTPUT;				\
> +		break;							\
> +	      }								\
> +									\
> +	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
> +	    for (i = 1; i < cnt; ++i)					\
> +	      {								\
> +		if ((inptr[i] & 0xc0) != 0x80)				\
> +		  /* This is an illegal encoding.  */			\
> +		  goto errout;						\
> +	      }								\
> +									\
> +	    /* See Principles of Operations cu12.  */			\
> +	    zabcd = (((inptr[0] & 0x7) << 2) |				\
> +		     ((inptr[1] & 0x30) >> 4)) - 1;			\
> +									\
> +	    /* z-bit must be zero after subtracting 1.  */		\
> +	    if (zabcd & 0x10)						\
> +	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
> +									\
> +	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
> +	    high |= zabcd << 6;                         /* abcd bits */	\
> +	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
> +	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
> +									\
> +	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
> +	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
> +	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
> +	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
> +									\
> +	    put16 (outptr, high);					\
> +	    outptr += 2;						\
> +	    put16 (outptr, low);					\
> +	    outptr += 2;						\
> +	    inptr += 4;							\
> +	    continue;							\
> +	  }								\
> +	else								\
> +	  {								\
> +	    /* Read the possible remaining bytes.  */			\
> +	    for (i = 1; i < cnt; ++i)					\
> +	      {								\
> +		uint16_t byte = inptr[i];				\
> +									\
> +		if ((byte & 0xc0) != 0x80)				\
> +		  /* This is an illegal encoding.  */			\
> +		  break;						\
> +									\
> +		ch <<= 6;						\
> +		ch |= byte & 0x3f;					\
> +	      }								\
> +									\
> +	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
> +	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
> +	       have been represented with fewer than cnt bytes.  */	\
> +	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
> +		/* Do not accept UTF-16 surrogates.  */			\
> +		|| (ch >= 0xd800 && ch <= 0xdfff))			\
> +	      {								\
> +		/* This is an illegal encoding.  */			\
> +		goto errout;						\
> +	      }								\
> +									\
> +	    inptr += cnt;						\
> +	  }								\
> +      }									\
> +    /* Now adjust the pointers and store the result.  */		\
> +    *((uint16_t *) outptr) = ch;					\
> +    outptr += sizeof (uint16_t);					\
> +  }
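To make the 4-byte branch of BODY_FROM_C easier to check by hand, here is the same bit manipulation pulled out into a function and applied to one known value. This is a sketch with the hypothetical name `demo_utf8_4byte_to_utf16`; it produces host-order 16-bit values and leaves buffer handling aside:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one 4-byte UTF-8 sequence to a UTF-16
   surrogate pair using the zabcd scheme from the patch.  Returns 1 on
   success and 0 for an ill-formed or out-of-range sequence.  */
static int
demo_utf8_4byte_to_utf16 (const unsigned char in[4],
			  uint16_t *high, uint16_t *low)
{
  uint16_t zabcd;
  size_t i;

  assert (in != NULL && high != NULL && low != NULL);

  if ((in[0] & 0xf8) != 0xf0)
    return 0;
  for (i = 1; i < 4; ++i)
    if ((in[i] & 0xc0) != 0x80)
      return 0;

  /* uvwxy - 1; bit 4 is set for overlong forms (uvwxy == 0) and for
     values above 0x10ffff (uvwxy > 0x10).  */
  zabcd = (uint16_t) ((((in[0] & 0x7) << 2) | ((in[1] & 0x30) >> 4)) - 1);
  if (zabcd & 0x10)
    return 0;

  *high = (uint16_t) (0xd800 | (zabcd << 6)
		      | ((in[1] & 0xf) << 2) | ((in[2] & 0x30) >> 4));
  *low = (uint16_t) (0xdc00 | ((in[2] & 0xf) << 6) | (in[3] & 0x3f));
  return 1;
}
```

For U+1F600, encoded as f0 9f 98 80, this yields the pair 0xd83d 0xde00. The two `low |=` lines for the kl and mn bits in the patch are folded into a single `(in[2] & 0xf) << 6` here, which gives the same bits.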
> +
> +/* Generate loop-function with software implementation.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_c
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_FROM_C
> +#include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_etf3eh
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_FROM_ETF3EH
> +#include <iconv/loop.c>
> +
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector and utf-convert instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> +# define LOOPFCT		__from_utf8_loop_vx
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_VX
> +# include <iconv/loop.c>
> +#endif
> +
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__from_utf8_loop_c)
> +__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
> +__from_utf8_loop;
> +
> +static void *
> +__from_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __from_utf8_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
> +      && dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __from_utf8_loop_etf3eh;
> +  else
> +    return __from_utf8_loop_c;
> +}
> +
> +strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
> +
> +/* Conversion from UTF-16 to UTF-8.  */
> +
> +/* The software routine is based on the functionality of the S/390
> +   hardware instruction (cu21) as described in the Principles of
> +   Operation.  */
> +#define BODY_TO_C							\
> +  {									\
> +    uint16_t c = get16 (inptr);						\
> +									\
> +    if (__glibc_likely (c <= 0x007f))					\
> +      {									\
> +	/* Single byte UTF-8 char.  */					\
> +	*outptr = c & 0xff;						\
> +	outptr++;							\
> +      }									\
> +    else if (c >= 0x0080 && c <= 0x07ff)				\
> +      {									\
> +	/* Two byte UTF-8 char.  */					\
> +									\
> +	if (__glibc_unlikely (outptr + 2 > outend))			\
> +	  {								\
> +	    /* Overflow in the output buffer.  */			\
> +	    result = __GCONV_FULL_OUTPUT;				\
> +	    break;							\
> +	  }								\
> +									\
> +	outptr[0] = 0xc0;						\
> +	outptr[0] |= c >> 6;						\
> +									\
> +	outptr[1] = 0x80;						\
> +	outptr[1] |= c & 0x3f;						\
> +									\
> +	outptr += 2;							\
> +      }									\
> +    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
> +      {									\
> +	/* Three byte UTF-8 char.  */					\
> +									\
> +	if (__glibc_unlikely (outptr + 3 > outend))			\
> +	  {								\
> +	    /* Overflow in the output buffer.  */			\
> +	    result = __GCONV_FULL_OUTPUT;				\
> +	    break;							\
> +	  }								\
> +	outptr[0] = 0xe0;						\
> +	outptr[0] |= c >> 12;						\
> +									\
> +	outptr[1] = 0x80;						\
> +	outptr[1] |= (c >> 6) & 0x3f;					\
> +									\
> +	outptr[2] = 0x80;						\
> +	outptr[2] |= c & 0x3f;						\
> +									\
> +	outptr += 3;							\
> +      }									\
> +    else if (c >= 0xd800 && c <= 0xdbff)				\
> +      {									\
> +	/* Four byte UTF-8 char.  */					\
> +	uint16_t low, uvwxy;						\
> +									\
> +	if (__glibc_unlikely (outptr + 4 > outend))			\
> +	  {								\
> +	    /* Overflow in the output buffer.  */			\
> +	    result = __GCONV_FULL_OUTPUT;				\
> +	    break;							\
> +	  }								\
> +	if (__glibc_unlikely (inptr + 4 > inend))			\
> +	  {								\
> +	    result = __GCONV_INCOMPLETE_INPUT;				\
> +	    break;							\
> +	  }								\
> +									\
> +	inptr += 2;							\
> +	low = get16 (inptr);						\
> +									\
> +	if ((low & 0xfc00) != 0xdc00)					\
> +	  {								\
> +	    inptr -= 2;							\
> +	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
> +	  }								\
> +	uvwxy = ((c >> 6) & 0xf) + 1;					\
> +	outptr[0] = 0xf0;						\
> +	outptr[0] |= uvwxy >> 2;					\
> +									\
> +	outptr[1] = 0x80;						\
> +	outptr[1] |= (uvwxy << 4) & 0x30;				\
> +	outptr[1] |= (c >> 2) & 0x0f;					\
> +									\
> +	outptr[2] = 0x80;						\
> +	outptr[2] |= (c & 0x03) << 4;					\
> +	outptr[2] |= (low >> 6) & 0x0f;					\
> +									\
> +	outptr[3] = 0x80;						\
> +	outptr[3] |= low & 0x3f;					\
> +									\
> +	outptr += 4;							\
> +      }									\
> +    else								\
> +      {									\
> +	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
> +      }									\
> +    inptr += 2;								\
> +  }
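The reverse direction in BODY_TO_C (surrogate pair to 4-byte UTF-8) can be checked the same way. The sketch below isolates the uvwxy arithmetic from the quoted macro; `demo_surrogates_to_utf8` is a hypothetical name, and the inputs are assumed to be already-loaded 16-bit values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a UTF-16 surrogate pair to a 4-byte
   UTF-8 sequence.  Returns 1 on success, 0 if HIGH or LOW is not a
   surrogate of the expected kind.  */
static int
demo_surrogates_to_utf8 (uint16_t high, uint16_t low, unsigned char out[4])
{
  uint16_t uvwxy;

  assert (out != NULL);

  if ((high & 0xfc00) != 0xd800 || (low & 0xfc00) != 0xdc00)
    return 0;

  /* Bits 6..9 of the high surrogate hold (plane - 1).  */
  uvwxy = (uint16_t) (((high >> 6) & 0xf) + 1);

  out[0] = (unsigned char) (0xf0 | (uvwxy >> 2));
  out[1] = (unsigned char) (0x80 | ((uvwxy << 4) & 0x30)
			    | ((high >> 2) & 0x0f));
  out[2] = (unsigned char) (0x80 | ((high & 0x03) << 4)
			    | ((low >> 6) & 0x0f));
  out[3] = (unsigned char) (0x80 | (low & 0x3f));
  return 1;
}
```

With the pair 0xd83d 0xde00 this gives f0 9f 98 80, the inverse of the UTF-8 to UTF-16 case.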
> +
> +#define BODY_TO_VX							\
> +  {									\
> +    size_t inlen  = inend - inptr;					\
> +    size_t outlen  = outend - outptr;					\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  /* Setup to check for values <= 0x7f.  */		\
> +		  "larl %[R_TMP],9f\n\t"				\
> +		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> +		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
> +		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
> +		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
> +		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
> +		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
> +		  "lghi %[R_TMP2],0\n\t"				\
> +		  /* Check for > 1byte UTF-8 chars.  */			\
> +		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
> +		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  /* Shorten to UTF-8.  */				\
> +		  "vpkh %%v18,%%v16,%%v17\n\t"				\
> +		  "la %[R_IN],32(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-32\n\t"				\
> +		  /* Store 16 bytes to buf_out.  */			\
> +		  "vst %%v18,0(%[R_OUT])\n\t"				\
> +		  "aghi %[R_OUTLEN],-16\n\t"				\
> +		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],32,2f\n\t"				\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
> +		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  /* At least one byte is > 0x7f.			\
> +		     Store the preceding 1-byte chars.  */		\
> +		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
> +		  "10:\n\t"						\
> +		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
> +		  /* Shorten to UTF-8.  */				\
> +		  "vpkh %%v18,%%v16,%%v17\n\t"				\
> +		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
> +		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> +		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> +		  "jl 13f\n\t"						\
> +		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  /* Update pointers.  */				\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  "13:\n\t"						\
> +		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
> +		  "lghi %[R_TMP2],16\n\t"				\
> +		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
> +		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-2\n\t"				\
> +		  "j 22f\n\t"						\
> +		  /* Handle remaining bytes.  */			\
> +		  "2:\n\t"						\
> +		  /* Zero, one or more bytes available?  */		\
> +		  "clgfi %[R_INLEN],1\n\t"				\
> +		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
> +		  "jle 99f\n\t" /* End if less than two bytes.  */	\
> +		  /* Calculate remaining uint16_t values in inptr.  */	\
> +		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
> +		  /* Handle multibyte utf8-char. */			\
> +		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
> +		  "aghi %[R_INLEN],-2\n\t"				\
> +		  /* Test if ch is 1-byte UTF-8 char.  */		\
> +		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
> +		  /* Handle 1-byte UTF-8 char.  */			\
> +		  "31: slgfi %[R_OUTLEN],1\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 2-byte UTF-8 char.  */		\
> +		  "22: clfi %[R_TMP],0x7ff\n\t"				\
> +		  "jh 23f\n\t"						\
> +		  /* Handle 2-byte UTF-8 char.  */			\
> +		  "32: slgfi %[R_OUTLEN],2\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llill %[R_TMP3],0xc080\n\t"				\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
> +		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 3-byte UTF-8 char.  */		\
> +		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
> +		  "jh 24f\n\t"						\
> +		  /* Handle 3-byte UTF-8 char.  */			\
> +		  "33: slgfi %[R_OUTLEN],3\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llilf %[R_TMP3],0xe08080\n\t"			\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
> +		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 4-byte UTF-8 char.  */		\
> +		  "24: clfi %[R_TMP],0xdfff\n\t"			\
> +		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
> +		  "clfi %[R_TMP],0xdbff\n\t"				\
> +		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
> +				  without a preceding high surrogate.  */ \
> +		  /* Handle 4-byte UTF-8 char.  */			\
> +		  "34: slgfi %[R_OUTLEN],4\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "slgfi %[R_INLEN],2\n\t"				\
> +		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
> +		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
> +		  "llilf %[R_TMP3],0xf0808080\n\t"			\
> +		  "aghi %[R_TMP],0x40\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
> +		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
> +		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
> +		  "nilf %[R_TMP],0xfc00\n\t"				\
> +		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
> +		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
> +		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "aghi %[R_TMP2],-2\n\t"				\
> +		  "jh 20b\n\t"						\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
> +		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
> +		  "99:\n\t"						\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (inptr)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result != __GCONV_ILLEGAL_INPUT)				\
> +      break;								\
> +									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
> +  }
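The scalar tail of BODY_TO_VX (labels 21 to 24) is a chain of threshold compares against 0x7f, 0x7ff, 0xd7ff, 0xdfff and 0xdbff. Written out in C, the classification looks roughly like this; `demo_utf8_len_for_utf16_unit` is a hypothetical name used only for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-8 bytes needed for the character
   that starts with UTF-16 unit C.  Returns 4 for a high surrogate
   (a low surrogate must follow) and 0 for a lone low surrogate.  */
static unsigned int
demo_utf8_len_for_utf16_unit (uint16_t c)
{
  if (c <= 0x7f)
    return 1;
  if (c <= 0x7ff)
    return 2;
  if (c <= 0xd7ff || c > 0xdfff)
    return 3;
  if (c <= 0xdbff)
    return 4;
  return 0;
}

/* Usage example covering each branch once.  */
static void
demo_self_check (void)
{
  assert (demo_utf8_len_for_utf16_unit (0x0041) == 1);
  assert (demo_utf8_len_for_utf16_unit (0x00e4) == 2);
  assert (demo_utf8_len_for_utf16_unit (0x20ac) == 3);
  assert (demo_utf8_len_for_utf16_unit (0xd83d) == 4);
  assert (demo_utf8_len_for_utf16_unit (0xde00) == 0);
}
```

The order of the compares matches the asm: values above 0xdfff jump back to the 3-byte handler before the surrogate range is split at 0xdbff.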
> +
> +/* Generate loop-function with software implementation.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +# define LOOPFCT		__to_utf8_loop_c
> +# define BODY                   BODY_TO_C
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +# define LOOPFCT		__to_utf8_loop_vx
> +# define BODY                   BODY_TO_VX
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__to_utf8_loop_c)
> +__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
> +__to_utf8_loop;
> +
> +static void *
> +__to_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __to_utf8_loop_vx;
> +  else
> +    return __to_utf8_loop_c;
> +}
> +
> +strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
> +
> +#else
> +# define LOOPFCT		TO_LOOP
> +# define BODY                   BODY_TO_C
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
> +
> +#include <iconv/skeleton.c>
> diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
> new file mode 100644
> index 0000000..1b2d6a2
> --- /dev/null
> +++ b/sysdeps/s390/utf8-utf32-z9.c
> @@ -0,0 +1,820 @@
> +/* Conversion between UTF-8 and UTF-32 BE/internal.
> +
> +   This module uses the Z9-109 variants of the Convert Unicode
> +   instructions.
> +   Copyright (C) 1997-2016 Free Software Foundation, Inc.
> +
> +   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
> +   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
> +
> +   Thanks to Daniel Appich who covered the relevant performance work
> +   in his diploma thesis.
> +
> +   This is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   This is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <stdint.h>
> +#include <unistd.h>
> +#include <dl-procinfo.h>
> +#include <gconv.h>
> +
> +#if defined HAVE_S390_VX_GCC_SUPPORT
> +# define ASM_CLOBBER_VR(NR) , NR
> +#else
> +# define ASM_CLOBBER_VR(NR)
> +#endif
> +
> +#if defined __s390x__
> +# define CONVERT_32BIT_SIZE_T(REG)
> +#else
> +# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
> +#endif
> +
> +/* Defines for skeleton.c.  */
> +#define DEFINE_INIT		0
> +#define DEFINE_FINI		0
> +#define MIN_NEEDED_FROM		1
> +#define MAX_NEEDED_FROM		6
> +#define MIN_NEEDED_TO		4
> +#define FROM_LOOP		__from_utf8_loop
> +#define TO_LOOP			__to_utf8_loop
> +#define FROM_DIRECTION		(dir == from_utf8)
> +#define ONE_DIRECTION           0
> +
> +/* UTF-32 big endian byte order mark.  */
> +#define BOM			0x0000feffu
> +
> +/* Direction of the transformation.  */
> +enum direction
> +{
> +  illegal_dir,
> +  to_utf8,
> +  from_utf8
> +};
> +
> +struct utf8_data
> +{
> +  enum direction dir;
> +  int emit_bom;
> +};
> +
> +
> +extern int gconv_init (struct __gconv_step *step);
> +int
> +gconv_init (struct __gconv_step *step)
> +{
> +  /* Determine which direction.  */
> +  struct utf8_data *new_data;
> +  enum direction dir = illegal_dir;
> +  int emit_bom;
> +  int result;
> +
> +  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
> +
> +  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
> +      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
> +	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
> +	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
> +    {
> +      dir = from_utf8;
> +    }
> +  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
> +	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
> +	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
> +    {
> +      dir = to_utf8;
> +    }
> +
> +  result = __GCONV_NOCONV;
> +  if (dir != illegal_dir)
> +    {
> +      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
> +
> +      result = __GCONV_NOMEM;
> +      if (new_data != NULL)
> +	{
> +	  new_data->dir = dir;
> +	  new_data->emit_bom = emit_bom;
> +	  step->__data = new_data;
> +
> +	  if (dir == from_utf8)
> +	    {
> +	      step->__min_needed_from = MIN_NEEDED_FROM;
> +	      step->__max_needed_from = MIN_NEEDED_FROM;
> +	      step->__min_needed_to = MIN_NEEDED_TO;
> +	      step->__max_needed_to = MIN_NEEDED_TO;
> +	    }
> +	  else
> +	    {
> +	      step->__min_needed_from = MIN_NEEDED_TO;
> +	      step->__max_needed_from = MIN_NEEDED_TO;
> +	      step->__min_needed_to = MIN_NEEDED_FROM;
> +	      step->__max_needed_to = MIN_NEEDED_FROM;
> +	    }
> +
> +	  step->__stateful = 0;
> +
> +	  result = __GCONV_OK;
> +	}
> +    }
> +
> +  return result;
> +}
> +
> +
> +extern void gconv_end (struct __gconv_step *data);
> +void
> +gconv_end (struct __gconv_step *data)
> +{
> +  free (data->__data);
> +}
> +
> +/* The macro for the hardware loop.  This is used for both
> +   directions.  */
> +#define HARDWARE_CONVERT(INSTRUCTION)					\
> +  {									\
> +    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> +    register size_t inlen __asm__ ("9") = inend - inptr;		\
> +    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> +    register size_t outlen __asm__("11") = outend - outptr;		\
> +    unsigned long cc = 0;						\
> +									\
> +    __asm__ __volatile__ (".machine push       \n\t"			\
> +			  ".machine \"z9-109\" \n\t"			\
> +			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
> +			  "0: " INSTRUCTION "  \n\t"			\
> +			  ".machine pop        \n\t"			\
> +			  "   jo     0b        \n\t"			\
> +			  "   ipm    %2        \n"			\
> +			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> +			    "+d" (outlen), "+d" (inlen)			\
> +			  :						\
> +			  : "cc", "memory");				\
> +									\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +    cc >>= 28;								\
> +									\
> +    if (cc == 1)							\
> +      {									\
> +	result = __GCONV_FULL_OUTPUT;					\
> +      }									\
> +    else if (cc == 2)							\
> +      {									\
> +	result = __GCONV_ILLEGAL_INPUT;					\
> +      }									\
> +  }
> +
> +#define PREPARE_LOOP							\
> +  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> +  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> +									\
> +  if (emit_bom && !data->__internal_use					\
> +      && data->__invocation_counter == 0)				\
> +    {									\
> +      /* Emit the Byte Order Mark.  */					\
> +      if (__glibc_unlikely (outbuf + 4 > outend))			\
> +	return __GCONV_FULL_OUTPUT;					\
> +									\
> +      put32u (outbuf, BOM);						\
> +      outbuf += 4;							\
> +    }
> +
> +/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
> +
> +#define STORE_REST_COMMON						      \
> +  {									      \
> +    /* We store the remaining bytes while converting them into the UCS4	      \
> +       format.  We can assume that the first byte in the buffer is	      \
> +       correct and that it requires a larger number of bytes than there	      \
> +       are in the input buffer.  */					      \
> +    wint_t ch = **inptrp;						      \
> +    size_t cnt, r;							      \
> +									      \
> +    state->__count = inend - *inptrp;					      \
> +									      \
> +    assert (ch != 0xc0 && ch != 0xc1);					      \
> +    if (ch >= 0xc2 && ch < 0xe0)					      \
> +      {									      \
> +	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
> +	   0xc1, otherwise the wide character could have been		      \
> +	   represented using a single byte.  */				      \
> +	cnt = 2;							      \
> +	ch &= 0x1f;							      \
> +      }									      \
> +    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
> +      {									      \
> +	/* We expect three bytes.  */					      \
> +	cnt = 3;							      \
> +	ch &= 0x0f;							      \
> +      }									      \
> +    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
> +      {									      \
> +	/* We expect four bytes.  */					      \
> +	cnt = 4;							      \
> +	ch &= 0x07;							      \
> +      }									      \
> +    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
> +      {									      \
> +	/* We expect five bytes.  */					      \
> +	cnt = 5;							      \
> +	ch &= 0x03;							      \
> +      }									      \
> +    else								      \
> +      {									      \
> +	/* We expect six bytes.  */					      \
> +	cnt = 6;							      \
> +	ch &= 0x01;							      \
> +      }									      \
> +									      \
> +    /* The first byte is already consumed.  */				      \
> +    r = cnt - 1;							      \
> +    while (++(*inptrp) < inend)						      \
> +      {									      \
> +	ch <<= 6;							      \
> +	ch |= **inptrp & 0x3f;						      \
> +	--r;								      \
> +      }									      \
> +									      \
> +    /* Shift for the so far missing bytes.  */				      \
> +    ch <<= r * 6;							      \
> +									      \
> +    /* Store the number of bytes expected for the entire sequence.  */	      \
> +    state->__count |= cnt << 8;						      \
> +									      \
> +    /* Store the value.  */						      \
> +    state->__value.__wch = ch;						      \
> +  }
> +
> +#define UNPACK_BYTES_COMMON \
> +  {									      \
> +    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
> +    wint_t wch = state->__value.__wch;					      \
> +    size_t ntotal = state->__count >> 8;				      \
> +									      \
> +    inlen = state->__count & 255;					      \
> +									      \
> +    bytebuf[0] = inmask[ntotal - 2];					      \
> +									      \
> +    do									      \
> +      {									      \
> +	if (--ntotal < inlen)						      \
> +	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
> +	wch >>= 6;							      \
> +      }									      \
> +    while (ntotal > 1);							      \
> +									      \
> +    bytebuf[0] |= wch;							      \
> +  }
> +
> +#define CLEAR_STATE_COMMON \
> +  state->__count = 0
> +
> +#define BODY_FROM_HW(ASM)						\
> +  {									\
> +    ASM;								\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +									\
> +    int i;								\
> +    for (i = 1; inptr + i < inend && i < 5; ++i)			\
> +      if ((inptr[i] & 0xc0) != 0x80)					\
> +	break;								\
> +									\
> +    if (__glibc_likely (inptr + i == inend				\
> +			&& result == __GCONV_EMPTY_INPUT))		\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
> +  }
> +
> +/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
> +#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
> +
> +
> +/* The software routine is copied from gconv_simple.c.  */
> +#define BODY_FROM_C							\
> +  {									\
> +    /* Next input byte.  */						\
> +    uint32_t ch = *inptr;						\
> +									\
> +    if (__glibc_likely (ch < 0x80))					\
> +      {									\
> +	/* One byte sequence.  */					\
> +	++inptr;							\
> +      }									\
> +    else								\
> +      {									\
> +	uint_fast32_t cnt;						\
> +	uint_fast32_t i;						\
> +									\
> +	if (ch >= 0xc2 && ch < 0xe0)					\
> +	  {								\
> +	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
> +	       0xc1, otherwise the wide character could have been	\
> +	       represented using a single byte.  */			\
> +	    cnt = 2;							\
> +	    ch &= 0x1f;							\
> +	  }								\
> +	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
> +	  {								\
> +	    /* We expect three bytes.  */				\
> +	    cnt = 3;							\
> +	    ch &= 0x0f;							\
> +	  }								\
> +	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
> +	  {								\
> +	    /* We expect four bytes.  */				\
> +	    cnt = 4;							\
> +	    ch &= 0x07;							\
> +	  }								\
> +	else								\
> +	  {								\
> +	    /* Search the end of this ill-formed UTF-8 character.  This	\
> +	       is the next byte with (x & 0xc0) != 0x80.  */		\
> +	    i = 0;							\
> +	    do								\
> +	      ++i;							\
> +	    while (inptr + i < inend					\
> +		   && (*(inptr + i) & 0xc0) == 0x80			\
> +		   && i < 5);						\
> +									\
> +	  errout:							\
> +	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
> +	  }								\
> +									\
> +	if (__glibc_unlikely (inptr + cnt > inend))			\
> +	  {								\
> +	    /* We don't have enough input.  But before we report	\
> +	       that check that all the bytes are correct.  */		\
> +	    for (i = 1; inptr + i < inend; ++i)				\
> +	      if ((inptr[i] & 0xc0) != 0x80)				\
> +		break;							\
> +									\
> +	    if (__glibc_likely (inptr + i == inend))			\
> +	      {								\
> +		result = __GCONV_INCOMPLETE_INPUT;			\
> +		break;							\
> +	      }								\
> +									\
> +	    goto errout;						\
> +	  }								\
> +									\
> +	/* Read the possible remaining bytes.  */			\
> +	for (i = 1; i < cnt; ++i)					\
> +	  {								\
> +	    uint32_t byte = inptr[i];					\
> +									\
> +	    if ((byte & 0xc0) != 0x80)					\
> +	      /* This is an illegal encoding.  */			\
> +	      break;							\
> +									\
> +	    ch <<= 6;							\
> +	    ch |= byte & 0x3f;						\
> +	  }								\
> +									\
> +	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
> +	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
> +	   have been represented with fewer than cnt bytes.  */		\
> +	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
> +	    /* Do not accept UTF-16 surrogates.  */			\
> +	    || (ch >= 0xd800 && ch <= 0xdfff)				\
> +	    || (ch > 0x10ffff))						\
> +	  {								\
> +	    /* This is an illegal encoding.  */				\
> +	    goto errout;						\
> +	  }								\
> +									\
> +	inptr += cnt;							\
> +      }									\
> +									\
> +    /* Now adjust the pointers and store the result.  */		\
> +    *((uint32_t *) outptr) = ch;					\
> +    outptr += sizeof (uint32_t);					\
> +  }
> +
> +#define HW_FROM_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
> +		  "vrepib %%v31,0x20\n\t"				\
> +		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
> +		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
> +		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
> +		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
> +		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> +		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  /* Enlarge to UCS4.  */				\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "la %[R_IN],16(%[R_IN])\n\t"				\
> +		  "vuplhh %%v20,%%v18\n\t"				\
> +		  "aghi %[R_INLEN],-16\n\t"				\
> +		  "vupllh %%v21,%%v18\n\t"				\
> +		  "aghi %[R_OUTLEN],-64\n\t"				\
> +		  "vuplhh %%v22,%%v19\n\t"				\
> +		  "vupllh %%v23,%%v19\n\t"				\
> +		  /* Store 64 bytes to buf_out.  */			\
> +		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  "10:\n\t"						\
> +		  /* At least one byte is > 0x7f.			\
> +		     Store the preceding 1-byte chars.  */		\
> +		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
> +		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
> +						     index to store. */ \
> +		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
> +		  "ahi %[R_TMP2],-1\n\t"				\
> +		  "jl 20f\n\t"						\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "vuplhh %%v20,%%v18\n\t"				\
> +		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllh %%v21,%%v18\n\t"				\
> +		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "vuplhh %%v22,%%v19\n\t"				\
> +		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllh %%v23,%%v19\n\t"				\
> +		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
> +		  "11:\n\t"						\
> +		  /* Update pointers.  */				\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Handle multibyte utf8-char with convert instruction. */ \
> +		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
> +		    ASM_CLOBBER_VR ("v31")				\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +  }
> +#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
> +
> +/* These definitions apply to the UTF-8 to UTF-32 direction.  The
> +   software implementation for UTF-8 still supports multibyte
> +   characters up to 6 bytes whereas the hardware variant does not.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_c
> +
> +#define LOOP_NEED_FLAGS
> +
> +#define STORE_REST		STORE_REST_COMMON
> +#define UNPACK_BYTES		UNPACK_BYTES_COMMON
> +#define CLEAR_STATE		CLEAR_STATE_COMMON
> +#define BODY			BODY_FROM_C
> +#include <iconv/loop.c>
> +
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_etf3eh
> +
> +#define LOOP_NEED_FLAGS
> +
> +#define STORE_REST		STORE_REST_COMMON
> +#define UNPACK_BYTES		UNPACK_BYTES_COMMON
> +#define CLEAR_STATE		CLEAR_STATE_COMMON
> +#define BODY			BODY_FROM_ETF3EH
> +#include <iconv/loop.c>
> +
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		__from_utf8_loop_vx
> +
> +# define LOOP_NEED_FLAGS
> +
> +# define STORE_REST		STORE_REST_COMMON
> +# define UNPACK_BYTES		UNPACK_BYTES_COMMON
> +# define CLEAR_STATE		CLEAR_STATE_COMMON
> +# define BODY			BODY_FROM_VX
> +# include <iconv/loop.c>
> +#endif
> +
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__from_utf8_loop_c)
> +__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
> +__from_utf8_loop;
> +
> +static void *
> +__from_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __from_utf8_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
> +      && dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __from_utf8_loop_etf3eh;
> +  else
> +    return __from_utf8_loop_c;
> +}
> +
> +strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
> +
> +
> +/* Conversion from UTF-32 internal/BE to UTF-8.  */
> +#define BODY_TO_HW(ASM)							\
> +  {									\
> +    ASM;								\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +    if (inptr + 4 > inend)						\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> +  }
> +
> +/* The hardware routine uses the S/390 cu41 instruction.  */
> +#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
> +
> +/* The hardware routine uses the S/390 vector and cu41 instructions.  */
> +#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
> +
> +/* The software routine mimics the S/390 cu41 instruction.  */
> +#define BODY_TO_C						\
> +  {								\
> +    uint32_t wc = *((const uint32_t *) inptr);			\
> +								\
> +    if (__glibc_likely (wc <= 0x7f))				\
> +      {								\
> +	/* Single UTF-8 char.  */				\
> +	*outptr = (uint8_t)wc;					\
> +	outptr++;						\
> +      }								\
> +    else if (wc <= 0x7ff)					\
> +      {								\
> +	/* Two UTF-8 chars.  */					\
> +	if (__glibc_unlikely (outptr + 2 > outend))		\
> +	  {							\
> +	    /* Overflow in the output buffer.  */		\
> +	    result = __GCONV_FULL_OUTPUT;			\
> +	    break;						\
> +	  }							\
> +								\
> +	outptr[0] = 0xc0;					\
> +	outptr[0] |= wc >> 6;					\
> +								\
> +	outptr[1] = 0x80;					\
> +	outptr[1] |= wc & 0x3f;					\
> +								\
> +	outptr += 2;						\
> +      }								\
> +    else if (wc <= 0xffff)					\
> +      {								\
> +	/* Three UTF-8 chars.  */				\
> +	if (__glibc_unlikely (outptr + 3 > outend))		\
> +	  {							\
> +	    /* Overflow in the output buffer.  */		\
> +	    result = __GCONV_FULL_OUTPUT;			\
> +	    break;						\
> +	  }							\
> +	if (wc >= 0xd800 && wc < 0xdc00)			\
> +	  {							\
> +	    /* Do not accept UTF-16 surrogates.   */		\
> +	    result = __GCONV_ILLEGAL_INPUT;			\
> +	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
> +	  }							\
> +	outptr[0] = 0xe0;					\
> +	outptr[0] |= wc >> 12;					\
> +								\
> +	outptr[1] = 0x80;					\
> +	outptr[1] |= (wc >> 6) & 0x3f;				\
> +								\
> +	outptr[2] = 0x80;					\
> +	outptr[2] |= wc & 0x3f;					\
> +								\
> +	outptr += 3;						\
> +      }								\
> +      else if (wc <= 0x10ffff)					\
> +	{							\
> +	  /* Four UTF-8 chars.  */				\
> +	  if (__glibc_unlikely (outptr + 4 > outend))		\
> +	    {							\
> +	      /* Overflow in the output buffer.  */		\
> +	      result = __GCONV_FULL_OUTPUT;			\
> +	      break;						\
> +	    }							\
> +	  outptr[0] = 0xf0;					\
> +	  outptr[0] |= wc >> 18;				\
> +								\
> +	  outptr[1] = 0x80;					\
> +	  outptr[1] |= (wc >> 12) & 0x3f;			\
> +								\
> +	  outptr[2] = 0x80;					\
> +	  outptr[2] |= (wc >> 6) & 0x3f;			\
> +								\
> +	  outptr[3] = 0x80;					\
> +	  outptr[3] |= wc & 0x3f;				\
> +								\
> +	  outptr += 4;						\
> +	}							\
> +      else							\
> +	{							\
> +	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
> +	}							\
> +    inptr += 4;							\
> +  }
> +
> +#define HW_TO_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2;						\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
> +		  "vzero %%v21\n\t"					\
> +		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
> +		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
> +		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
> +		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
> +		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
> +		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
> +		  "lghi %[R_TMP],0\n\t"					\
> +		  /* Shorten to byte values.  */			\
> +		  "vpkf %%v23,%%v16,%%v17\n\t"				\
> +		  "vpkf %%v24,%%v18,%%v19\n\t"				\
> +		  "vpkh %%v23,%%v23,%%v24\n\t"				\
> +		  /* Checking for values > 0x7f.  */			\
> +		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
> +		  "jno 10f\n\t"						\
> +		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
> +		  "jno 11f\n\t"						\
> +		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
> +		  "jno 12f\n\t"						\
> +		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
> +		  "jno 13f\n\t"						\
> +		  /* Store 16bytes to outptr.  */			\
> +		  "vst %%v23,0(%[R_OUT])\n\t"				\
> +		  "aghi %[R_INLEN],-64\n\t"				\
> +		  "aghi %[R_OUTLEN],-16\n\t"				\
> +		  "la %[R_IN],64(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],64,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Found a value > 0x7f.  */				\
> +		  "13: ahi %[R_TMP],4\n\t"				\
> +		  "12: ahi %[R_TMP],4\n\t"				\
> +		  "11: ahi %[R_TMP],4\n\t"				\
> +		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
> +		  "srlg %[R_I],%[R_I],2\n\t"				\
> +		  "agr %[R_I],%[R_TMP]\n\t"				\
> +		  "je 20f\n\t"						\
> +		  /* Store characters before invalid one...  */		\
> +		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
> +		  "15: aghi %[R_I],-1\n\t"				\
> +		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
> +		  /* ... and update pointers.  */			\
> +		  "aghi %[R_I],1\n\t"					\
> +		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
> +		  "sllg %[R_I],%[R_I],2\n\t"				\
> +		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_I]\n\t"				\
> +		  /* Handle multibyte utf8-char with convert instruction. */ \
> +		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
> +		    , [R_I] "=a" (tmp2)					\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
> +		    ASM_CLOBBER_VR ("v24")				\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +  }
> +
> +/* Generate loop-function with software routine.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf8_loop_c
> +#define BODY			BODY_TO_C
> +#define LOOP_NEED_FLAGS
> +#include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf8_loop_etf3eh
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_TO_ETF3EH
> +#include <iconv/loop.c>
> +
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector and utf-convert instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +# define LOOPFCT		__to_utf8_loop_vx
> +# define BODY			BODY_TO_VX
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +#endif
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__to_utf8_loop_c)
> +__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
> +__to_utf8_loop;
> +
> +static void *
> +__to_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __to_utf8_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
> +      && dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __to_utf8_loop_etf3eh;
> +  else
> +    return __to_utf8_loop_c;
> +}
> +
> +strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
> +
> +
> +#include <iconv/skeleton.c>
>

[-- Attachment #2: 0010-S390-Use-s390-64-specific-ionv-modules-on-s390-32-to.patch --]
[-- Type: text/x-patch, Size: 183989 bytes --]

From 9bb86665567646e706d2ef58569eaf8d5bb9904e Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 10/14] S390: Use s390-64 specific iconv-modules on s390-32,
 too.

This patch reworks the existing s390 64bit specific iconv modules in order
to use them on s390 31bit, too.

Thus the parts for subdirectory iconvdata in sysdeps/s390/s390-64/Makefile
were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
All those modules were moved from the sysdeps/s390/s390-64 directory to
sysdeps/s390.

The iso-8859-1 to/from cp037 module was adjusted to use the brct (branch
relative on count) instruction on 31bit s390 instead of brctg, because brctg
is a zarch instruction and is not available on a 31bit kernel.
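
A minimal sketch of how such a width-dependent mnemonic can be selected at
compile time, modeled on the CONVERT_32BIT_SIZE_T define used in the utf
modules. The macro body and the argument names REG and LBL are assumptions;
the actual BRANCH_ON_COUNT define in iso-8859-1_cp037_z900.c may differ. The
helper branch_on_count_text is hypothetical and exists only to make the
expansion visible.

```c
/* Assumed shape of the new define: use the 64-bit form (brctg) only
   when building for s390x, otherwise the 32-bit form (brct).  */
#if defined __s390x__
# define BRANCH_ON_COUNT(REG, LBL) "brctg %" #REG "," #LBL "\n\t"
#else
# define BRANCH_ON_COUNT(REG, LBL) "brct %" #REG "," #LBL "\n\t"
#endif

/* Hypothetical helper: returns the assembler text for "decrement
   operand 0 and branch back to local label 1 while it is non-zero".  */
static const char *
branch_on_count_text (void)
{
  return BRANCH_ON_COUNT (0, 1b);
}
```

The string is built purely by the preprocessor, so an inline assembly can
paste it between its other instruction strings without any runtime cost.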

The utf modules use zarch instructions, thus the directive machinemode
zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
The ifunc resolvers were adjusted to call the etf3eh or vector variants
only if zarch instructions are available (64bit kernel in 31bit compat-mode).
Furthermore, some variable types were changed. E.g. unsigned long long would be
a register pair on s390 31bit, but we want only a single register.
For variables of type size_t, the register contents have to be enlarged from a
32bit to a 64bit value on 31bit, because the inline assemblies use 64bit values
in such cases.
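
The selection logic and the size_t widening described above can be sketched
in portable C as follows. This is an illustration only: the SKETCH_HWCAP_*
bit values are placeholders rather than the real HWCAP_S390_* constants, and
pick_loop_variant and widen_size_t_register are hypothetical names. The real
resolvers return function pointers, and the vector branch exists only if
HAVE_S390_VX_ASM_SUPPORT is defined.

```c
#include <stdint.h>

/* Placeholder capability bits; the real HWCAP_S390_* values come from
   the system headers and are not reproduced here.  */
#define SKETCH_HWCAP_ZARCH	(1UL << 0)
#define SKETCH_HWCAP_ETF3EH	(1UL << 1)
#define SKETCH_HWCAP_HIGH_GPRS	(1UL << 2)
#define SKETCH_HWCAP_VX		(1UL << 3)

enum loop_variant { LOOP_C, LOOP_ETF3EH, LOOP_VX };

/* Same order of checks as in the resolvers: vector first, then the
   convert-instruction variant, else the plain C loop.  */
static enum loop_variant
pick_loop_variant (unsigned long int hwcap)
{
  if (hwcap & SKETCH_HWCAP_VX)
    return LOOP_VX;
  if ((hwcap & SKETCH_HWCAP_ZARCH) && (hwcap & SKETCH_HWCAP_HIGH_GPRS)
      && (hwcap & SKETCH_HWCAP_ETF3EH))
    return LOOP_ETF3EH;
  return LOOP_C;
}

/* Effect of "llgfr %r,%r" on a size_t operand on 31bit: keep the low
   32 bits and clear the upper half of the 64-bit register.  */
static uint64_t
widen_size_t_register (uint64_t reg)
{
  return (uint64_t) (uint32_t) reg;
}
```

For example, a hwcap value with only the ETF3EH bit set still selects the C
loop, because the zarch and high-gprs bits are required as well.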

ChangeLog:

	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
	Move to ...
	* sysdeps/s390/Makefile: ... here.
	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
	(BRANCH_ON_COUNT): New define.
	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
---
 sysdeps/s390/Makefile                        |  35 ++
 sysdeps/s390/iso-8859-1_cp037_z900.c         | 262 +++++++++
 sysdeps/s390/s390-64/Makefile                |  36 --
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 256 ---------
 sysdeps/s390/s390-64/utf16-utf32-z9.c        | 624 --------------------
 sysdeps/s390/s390-64/utf8-utf16-z9.c         | 806 --------------------------
 sysdeps/s390/s390-64/utf8-utf32-z9.c         | 807 --------------------------
 sysdeps/s390/utf16-utf32-z9.c                | 636 +++++++++++++++++++++
 sysdeps/s390/utf8-utf16-z9.c                 | 818 ++++++++++++++++++++++++++
 sysdeps/s390/utf8-utf32-z9.c                 | 820 +++++++++++++++++++++++++++
 10 files changed, 2571 insertions(+), 2529 deletions(-)
 create mode 100644 sysdeps/s390/Makefile
 create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
 create mode 100644 sysdeps/s390/utf16-utf32-z9.c
 create mode 100644 sysdeps/s390/utf8-utf16-z9.c
 create mode 100644 sysdeps/s390/utf8-utf32-z9.c
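
Note on the BRANCH_ON_COUNT define listed in the ChangeLog above: it relies
on preprocessor stringification to paste the register operand and the label
into the asm template, so TR_LOOP can emit brctg on s390x and brct on s390-32.
The following is only a minimal, host-independent sketch of that expansion.
The names BRANCH_ON_COUNT_64, BRANCH_ON_COUNT_31, branch_text_64,
branch_text_31 and expansions_ok are hypothetical and not part of the patch,
which defines a single BRANCH_ON_COUNT selected by #if defined __s390x__.

```c
#include <string.h>

/* Hypothetical names: two fixed variants instead of the patch's single
   BRANCH_ON_COUNT, so both expansions can be inspected on any host.  */
#define BRANCH_ON_COUNT_64(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
#define BRANCH_ON_COUNT_31(REG,LBL) "brct %" #REG "," #LBL "\n\t"

/* Instruction text as it would be pasted into the asm template when the
   loop counter lives in a 64-bit register.  */
static const char *
branch_text_64 (void)
{
  return BRANCH_ON_COUNT_64 ([R_LI], 0b);
}

/* Same, for a 32-bit loop counter.  */
static const char *
branch_text_31 (void)
{
  return BRANCH_ON_COUNT_31 ([R_LI], 0b);
}

/* Returns 1 if both macros expand to the expected instruction text.
   Adjacent string literals are concatenated by the compiler, and the
   leading space before the label argument is dropped by the # operator.  */
static int
expansions_ok (void)
{
  return (strcmp (branch_text_64 (), "brctg %[R_LI],0b\n\t") == 0
	  && strcmp (branch_text_31 (), "brct %[R_LI],0b\n\t") == 0);
}
```

Because the arguments are only stringified, [R_LI] and 0b never have to be
valid C expressions. They just have to form the operand text that the
compiler's asm template substitution and the assembler expect.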

diff --git a/sysdeps/s390/Makefile b/sysdeps/s390/Makefile
new file mode 100644
index 0000000..985a9df
--- /dev/null
+++ b/sysdeps/s390/Makefile
@@ -0,0 +1,35 @@
+ifeq ($(subdir),iconvdata)
+ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
+ISO-8859-1_CP037_Z900-map := gconv.map
+
+UTF8_UTF32_Z9-routines := utf8-utf32-z9
+UTF8_UTF32_Z9-map := gconv.map
+
+UTF16_UTF32_Z9-routines := utf16-utf32-z9
+UTF16_UTF32_Z9-map := gconv.map
+
+UTF8_UTF16_Z9-routines := utf8-utf16-z9
+UTF8_UTF16_Z9-map := gconv.map
+
+s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
+
+extra-modules-left += $(s390x-iconv-modules)
+include extra-module.mk
+
+cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
+lib := iconvdata
+include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
+
+extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
+install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
+
+$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
+$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
+	$(do-install-program)
+
+$(objpfx)gconv-modules-s390: ../sysdeps/s390/gconv-modules-s390.awk gconv-modules
+	${AWK} -f $^ > $@
+
+GCONV_MODULES = gconv-modules-s390
+
+endif
diff --git a/sysdeps/s390/iso-8859-1_cp037_z900.c b/sysdeps/s390/iso-8859-1_cp037_z900.c
new file mode 100644
index 0000000..fc25dff
--- /dev/null
+++ b/sysdeps/s390/iso-8859-1_cp037_z900.c
@@ -0,0 +1,262 @@
+/* Conversion between ISO 8859-1 and IBM037.
+
+   This module uses the translate instruction.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+
+// conversion table from ISO-8859-1 to IBM037
+static const unsigned char table_iso8859_1_to_cp037[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
+  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
+  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
+  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
+  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
+  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
+  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
+  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
+  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
+  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
+  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
+  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
+  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
+  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
+  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
+  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
+  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
+  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
+  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
+  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
+  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
+  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
+  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
+  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
+  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
+  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
+  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
+  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
+  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
+  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
+  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
+  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
+  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
+  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
+  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
+  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
+  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
+  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
+  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
+  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
+  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
+  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
+  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
+  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
+  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
+  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
+  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
+  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
+  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
+  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
+  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
+  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
+  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
+  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
+  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
+};
+
+// conversion table from IBM037 to ISO-8859-1
+static const unsigned char table_cp037_iso8859_1[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
+  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
+  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
+  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
+  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
+  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
+  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
+  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
+  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
+  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
+  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
+  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
+  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
+  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
+  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
+  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
+  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
+  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
+  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
+  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
+  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
+  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
+  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
+  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
+  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
+  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
+  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
+  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
+  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
+  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
+  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
+  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
+  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
+  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
+  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
+  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
+  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
+  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
+  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
+  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
+  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
+  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
+  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
+  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
+  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
+  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
+  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
+  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
+  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
+  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
+  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
+  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
+  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
+  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
+  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
+};
+
+/* Definitions used in the body of the `gconv' function.  */
+#define CHARSET_NAME		"ISO-8859-1//"
+#define FROM_LOOP		iso8859_1_to_cp037_z900
+#define TO_LOOP			cp037_to_iso8859_1_z900
+#define DEFINE_INIT		1
+#define DEFINE_FINI		1
+#define MIN_NEEDED_FROM		1
+#define MIN_NEEDED_TO		1
+
+# if defined __s390x__
+#  define BRANCH_ON_COUNT(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
+# else
+#  define BRANCH_ON_COUNT(REG,LBL) "brct %" #REG "," #LBL "\n\t"
+# endif
+
+#define TR_LOOP(TABLE)							\
+  {									\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
+									\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "   la %[R_IN],256(%[R_IN])\n\t"		\
+			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     BRANCH_ON_COUNT ([R_LI], 0b)		\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
+									\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
+									\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
+  }
+
+
+/* First define the conversion function from ISO 8859-1 to CP037.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
+
+#include <iconv/loop.c>
+
+
+/* Next, define the conversion function from CP037 to ISO 8859-1.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define LOOPFCT			TO_LOOP
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
+
+#include <iconv/loop.c>
+
+
+/* Now define the toplevel functions.  */
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 094b1e9..0a50514 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -9,39 +9,3 @@ CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
 CFLAGS-dl-load.c += -Wno-unused
 CFLAGS-dl-reloc.c += -Wno-unused
 endif
-
-ifeq ($(subdir),iconvdata)
-ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
-ISO-8859-1_CP037_Z900-map := gconv.map
-
-UTF8_UTF32_Z9-routines := utf8-utf32-z9
-UTF8_UTF32_Z9-map := gconv.map
-
-UTF16_UTF32_Z9-routines := utf16-utf32-z9
-UTF16_UTF32_Z9-map := gconv.map
-
-UTF8_UTF16_Z9-routines := utf8-utf16-z9
-UTF8_UTF16_Z9-map := gconv.map
-
-s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
-
-extra-modules-left += $(s390x-iconv-modules)
-include extra-module.mk
-
-cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
-lib := iconvdata
-include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
-
-extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
-install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
-
-$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
-$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
-	$(do-install-program)
-
-$(objpfx)gconv-modules-s390: ../sysdeps/s390/gconv-modules-s390.awk gconv-modules
-	${AWK} -f $^ > $@
-
-GCONV_MODULES = gconv-modules-s390
-
-endif
diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
deleted file mode 100644
index 3b63e6a..0000000
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Conversion between ISO 8859-1 and IBM037.
-
-   This module uses the translate instruction.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-
-// conversion table from ISO-8859-1 to IBM037
-static const unsigned char table_iso8859_1_to_cp037[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
-  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
-  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
-  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
-  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
-  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
-  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
-  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
-  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
-  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
-  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
-  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
-  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
-  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
-  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
-  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
-  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
-  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
-  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
-  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
-  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
-  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
-  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
-  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
-  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
-  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
-  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
-  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
-  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
-  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
-  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
-  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
-  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
-  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
-  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
-  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
-  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
-  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
-  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
-  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
-  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
-  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
-  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
-  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
-  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
-  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
-  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
-  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
-  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
-  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
-  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
-  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
-  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
-  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
-  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
-};
-
-// conversion table from IBM037 to ISO-8859-1
-static const unsigned char table_cp037_iso8859_1[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
-  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
-  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
-  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
-  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
-  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
-  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
-  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
-  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
-  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
-  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
-  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
-  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
-  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
-  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
-  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
-  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
-  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
-  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
-  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
-  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
-  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
-  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
-  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
-  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
-  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
-  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
-  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
-  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
-  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
-  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
-  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
-  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
-  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
-  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
-  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
-  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
-  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
-  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
-  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
-  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
-  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
-  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
-  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
-  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
-  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
-  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
-  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
-  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
-  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
-  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
-  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
-  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
-  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
-  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
-};
-
-/* Definitions used in the body of the `gconv' function.  */
-#define CHARSET_NAME		"ISO-8859-1//"
-#define FROM_LOOP		iso8859_1_to_cp037_z900
-#define TO_LOOP			cp037_to_iso8859_1_z900
-#define DEFINE_INIT		1
-#define DEFINE_FINI		1
-#define MIN_NEEDED_FROM		1
-#define MIN_NEEDED_TO		1
-
-#define TR_LOOP(TABLE)							\
-  {									\
-    size_t length = (inend - inptr < outend - outptr			\
-		     ? inend - inptr : outend - outptr);		\
-									\
-    /* Process in 256 byte blocks.  */					\
-    if (__builtin_expect (length >= 256, 0))				\
-      {									\
-	size_t blocks = length / 256;					\
-	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
-			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
-			     "   la %[R_IN],256(%[R_IN])\n\t"		\
-			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
-			     "   brctg %[R_LI],0b\n\t"			\
-			     : /* outputs */ [R_IN] "+a" (inptr)	\
-			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
-			     : /* inputs */ [R_TBL] "a" (TABLE)		\
-			     : /* clobber list */ "memory"		\
-			     );						\
-	length = length % 256;						\
-      }									\
-									\
-    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
-    if (length >= 8)							\
-      {									\
-	size_t blocks = length / 8;					\
-	for (int i = 0; i < blocks; i++)				\
-	  {								\
-	    outptr[0] = TABLE[inptr[0]];				\
-	    outptr[1] = TABLE[inptr[1]];				\
-	    outptr[2] = TABLE[inptr[2]];				\
-	    outptr[3] = TABLE[inptr[3]];				\
-	    outptr[4] = TABLE[inptr[4]];				\
-	    outptr[5] = TABLE[inptr[5]];				\
-	    outptr[6] = TABLE[inptr[6]];				\
-	    outptr[7] = TABLE[inptr[7]];				\
-	    inptr += 8;							\
-	    outptr += 8;						\
-	  }								\
-	length = length % 8;						\
-      }									\
-									\
-    /* Process remaining 0...7 bytes.  */				\
-    switch (length)							\
-      {									\
-      case 7: outptr[6] = TABLE[inptr[6]];				\
-      case 6: outptr[5] = TABLE[inptr[5]];				\
-      case 5: outptr[4] = TABLE[inptr[4]];				\
-      case 4: outptr[3] = TABLE[inptr[3]];				\
-      case 3: outptr[2] = TABLE[inptr[2]];				\
-      case 2: outptr[1] = TABLE[inptr[1]];				\
-      case 1: outptr[0] = TABLE[inptr[0]];				\
-      case 0: break;							\
-      }									\
-    inptr += length;							\
-    outptr += length;							\
-  }
-
-
-/* First define the conversion function from ISO 8859-1 to CP037.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
-
-#include <iconv/loop.c>
-
-
-/* Next, define the conversion function from CP037 to ISO 8859-1.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
-#define BODY			TR_LOOP (table_cp037_iso8859_1);
-
-#include <iconv/loop.c>
-
-
-/* Now define the toplevel functions.  */
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
deleted file mode 100644
index 61d0a94..0000000
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ /dev/null
@@ -1,624 +0,0 @@
-/* Conversion between UTF-16 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM_UTF32               0x0000feffu
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16               0xfeff
-
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		2
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf16_loop
-#define TO_LOOP			__to_utf16_loop
-#define FROM_DIRECTION		(dir == from_utf16)
-#define ONE_DIRECTION           0
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf16,
-  from_utf16
-};
-
-struct utf16_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf16_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf16;
-    }
-  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
-	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf16;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf16)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-	  /* Emit the UTF-16 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 2 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-	  /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
-
-/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_FROM_C							\
-  {									\
-    uint16_t u1 = get16 (inptr);					\
-									\
-    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
-      {									\
-	/* No surrogate.  */						\
-	put32 (outptr, u1);						\
-	inptr += 2;							\
-      }									\
-    else								\
-      {									\
-	/* An isolated low-surrogate was found.  This has to be         \
-	   considered ill-formed.  */					\
-	if (__glibc_unlikely (u1 >= 0xdc00))				\
-	  {								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	/* It's a surrogate character.  At least the first word says	\
-	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    /* We don't have enough input for another complete input	\
-	       character.  */						\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	uint16_t u2 = get16 (inptr);					\
-	if (__builtin_expect (u2 < 0xdc00, 0)				\
-	    || __builtin_expect (u2 > 0xdfff, 0))			\
-	  {								\
-	    /* This is no valid second word for a surrogate.  */	\
-	    inptr -= 2;							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-									\
-	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
-	inptr += 2;							\
-      }									\
-    outptr += 4;							\
-  }
-
-#define BODY_FROM_VX							\
-  {									\
-    size_t inlen = inend - inptr;					\
-    size_t outlen = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "    larl %[R_TMP],9f\n\t"				\
-		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
-		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
-		  /* Check for surrogate chars.  */			\
-		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t"					\
-		  /* Enlarge to UTF-32.  */				\
-		  "    vuplhh %%v17,%%v16\n\t"				\
-		  "    la %[R_IN],16(%[R_IN])\n\t"			\
-		  "    vupllh %%v18,%%v16\n\t"				\
-		  "    aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
-		  "    aghi %[R_OUTLEN],-32\n\t"			\
-		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],16,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
-		  "9:  .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
-		  "    .short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
-		  /* At least on uint16_t is in range of surrogates.	\
-		     Store the preceding chars.  */			\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "    vuplhh %%v17,%%v16\n\t"				\
-		  "    sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 12f\n\t"					\
-		  "    vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "    vupllh %%v18,%%v16\n\t"				\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11: \n\t" /* Update pointers.  */			\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
-		  "    srl %[R_TMP2],1\n\t"				\
-		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    aghi %[R_OUTLEN],-4\n\t"				\
-		  "    j 16f\n\t"					\
-		  /* Handle remaining bytes.  */			\
-		  "2:  \n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "    clgfi %[R_INLEN],1\n\t"				\
-		  "    je 97f\n\t" /* Only one byte available.  */	\
-		  "    jl 99f\n\t" /* End if no bytes available.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle remaining uint16_t values.  */		\
-		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    slgfi %[R_OUTLEN],4\n\t"				\
-		  "    jl 96f \n\t"					\
-		  "    clfi %[R_TMP],0xd800\n\t"			\
-		  "    jhe 15f\n\t"					\
-		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-2\n\t"				\
-		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],13b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Handle UTF-16 surrogate pair.  */			\
-		  "15: clfi %[R_TMP],0xdfff\n\t"			\
-		  "    jh 14b\n\t" /* Jump away if ch > 0xdfff.  */	\
-		  "16: clfi %[R_TMP],0xdc00\n\t"			\
-		  "    jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
-		  "    slgfi %[R_INLEN],4\n\t"				\
-		  "    jl 97f\n\t" /* Big enough input?  */		\
-		  "    llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "    slfi %[R_TMP],0xd7c0\n\t"			\
-		  "    sll %[R_TMP],10\n\t"				\
-		  "    risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
-		  "    nilf %[R_TMP3],0xfc00\n\t"			\
-		  "    clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "    jne 98f\n\t"					\
-		  "    st %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],4(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
-		  "    aghi %[R_TMP2],-2\n\t"				\
-		  "    jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  "96: \n\t" /* Return full output.  */			\
-		  "    lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
-		  "    j 99f\n\t"					\
-		  "97: \n\t" /* Return incomplete input.  */		\
-		  "    lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
-		  "    j 99f\n\t"					\
-		  "98:\n\t" /* Return Illegal character.  */		\
-		  "    lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "99:\n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
-  }
-
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__from_utf16_loop_c
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf16_loop_c)
-__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
-__from_utf16_loop;
-
-static void *
-__from_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf16_loop_vx;
-  else
-    return __from_utf16_loop_c;
-}
-
-strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
-#else
-# define LOOPFCT		FROM_LOOP
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-#endif
-
-/* Conversion from UTF-32 internal/BE to UTF-16.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_TO_C							\
-  {									\
-    uint32_t c = get32 (inptr);						\
-									\
-    if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
-      {									\
-	/* Two UTF-16 chars.  */					\
-	put16 (outptr, c);						\
-      }									\
-    else if (__builtin_expect (c >= 0x10000, 1)				\
-	     && __builtin_expect (c <= 0x10ffff, 1))			\
-      {									\
-	/* Four UTF-16 chars.  */					\
-	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
-	uint16_t out;							\
-									\
-	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	out = 0xd800;							\
-	out |= (zabcd & 0xff) << 6;					\
-	out |= (c >> 10) & 0x3f;					\
-	put16 (outptr, out);						\
-	outptr += 2;							\
-									\
-	out = 0xdc00;							\
-	out |= c & 0x3ff;						\
-	put16 (outptr, out);						\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
-      }									\
-    outptr += 2;							\
-    inptr += 4;								\
-  }
-
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "    larl %[R_TMP],9f\n\t"				\
-		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars			\
-		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP2],0\n\t"				\
-		  /* Shorten to UTF-16.  */				\
-		  "    vpkf %%v18,%%v16,%%v17\n\t"			\
-		  /* Check for surrogate chars.  */			\
-		  "    vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t"					\
-		  "    vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"		\
-		  "    jno 11f\n\t"					\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "    vst %%v18,0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],32(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-32\n\t"				\
-		  "    aghi %[R_OUTLEN],-16\n\t"			\
-		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
-		     and check for ch >= 0x10000. (v30, v31)  */	\
-		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
-		  "    .long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
-		  /* At least on UTF32 char is in range of surrogates.	\
-		     Store the preceding characters.  */		\
-		  "11: ahi %[R_TMP2],16\n\t"				\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
-		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 20f\n\t"					\
-		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_TO_VX
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf16_loop_c)
-__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
-__to_utf16_loop;
-
-static void *
-__to_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf16_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
-    return __to_utf16_loop_c;
-}
-
-strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
-
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
deleted file mode 100644
index 7520ef2..0000000
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ /dev/null
@@ -1,806 +0,0 @@
-/* Conversion between UTF-16 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		2
-#define MAX_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-16.  */
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
-		  "    vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
-		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
-				       UTF8 chars.  */			\
-		  /* Enlarge to UTF-16.  */				\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    la %[R_IN],16(%[R_IN])\n\t"			\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "    vstm %%v18,%%v19,0(%[R_OUT])\n\t"		\
-		  "    aghi %[R_OUTLEN],-32\n\t"			\
-		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  "10:\n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
-		  "    sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest \
-							 index to store. */ \
-		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "    ahi %[R_TMP2],-1\n\t"				\
-		  "    jl 20f\n\t"					\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11: \n\t" /* Update pointers.  */			\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-
-/* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint16_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0	\
-	       or 0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	if (cnt == 4)							\
-	  {								\
-	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
-	       low) are needed.  */					\
-	    uint16_t zabcd, high, low;					\
-									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			\
-	      {								\
-		/* Overflow in the output buffer.  */			\
-		result = __GCONV_FULL_OUTPUT;				\
-		break;							\
-	      }								\
-									\
-	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		if ((inptr[i] & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  goto errout;						\
-	      }								\
-									\
-	    /* See Principles of Operations cu12.  */			\
-	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-		     ((inptr[1] & 0x30) >> 4)) - 1;			\
-									\
-	    /* z-bit must be zero after subtracting 1.  */		\
-	    if (zabcd & 0x10)						\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
-									\
-	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;                         /* abcd bits */	\
-	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
-	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
-									\
-	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
-	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
-	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
-	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
-									\
-	    put16 (outptr, high);					\
-	    outptr += 2;						\
-	    put16 (outptr, low);					\
-	    outptr += 2;						\
-	    inptr += 4;							\
-	    continue;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Read the possible remaining bytes.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		uint16_t byte = inptr[i];				\
-									\
-		if ((byte & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  break;						\
-									\
-		ch <<= 6;						\
-		ch |= byte & 0x3f;					\
-	      }								\
-									\
-	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
-	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
-	       have been represented with fewer than cnt bytes.  */	\
-	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
-		/* Do not accept UTF-16 surrogates.  */			\
-		|| (ch >= 0xd800 && ch <= 0xdfff))			\
-	      {								\
-		/* This is an illegal encoding.  */			\
-		goto errout;						\
-	      }								\
-									\
-	    inptr += cnt;						\
-	  }								\
-      }									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint16_t *) outptr) = ch;					\
-    outptr += sizeof (uint16_t);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-/* Conversion from UTF-16 to UTF-8.  */
-
-/* The software routine is based on the functionality of the S/390
-   hardware instruction (cu21) as described in the Principles of
-   Operation.  */
-#define BODY_TO_C							\
-  {									\
-    uint16_t c = get16 (inptr);						\
-									\
-    if (__glibc_likely (c <= 0x007f))					\
-      {									\
-	/* Single byte UTF-8 char.  */					\
-	*outptr = c & 0xff;						\
-	outptr++;							\
-      }									\
-    else if (c >= 0x0080 && c <= 0x07ff)				\
-      {									\
-	/* Two byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 2 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	outptr[0] = 0xc0;						\
-	outptr[0] |= c >> 6;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= c & 0x3f;						\
-									\
-	outptr += 2;							\
-      }									\
-    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
-      {									\
-	/* Three byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 3 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	outptr[0] = 0xe0;						\
-	outptr[0] |= c >> 12;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (c >> 6) & 0x3f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= c & 0x3f;						\
-									\
-	outptr += 3;							\
-      }									\
-    else if (c >= 0xd800 && c <= 0xdbff)				\
-      {									\
-	/* Four byte UTF-8 char.  */					\
-	uint16_t low, uvwxy;						\
-									\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	low = get16 (inptr);						\
-									\
-	if ((low & 0xfc00) != 0xdc00)					\
-	  {								\
-	    inptr -= 2;							\
-	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	uvwxy = ((c >> 6) & 0xf) + 1;					\
-	outptr[0] = 0xf0;						\
-	outptr[0] |= uvwxy >> 2;					\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (uvwxy << 4) & 0x30;				\
-	outptr[1] |= (c >> 2) & 0x0f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= (c & 0x03) << 4;					\
-	outptr[2] |= (low >> 6) & 0x0f;					\
-									\
-	outptr[3] = 0x80;						\
-	outptr[3] |= low & 0x3f;					\
-									\
-	outptr += 4;							\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-      }									\
-    inptr += 2;								\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    size_t inlen  = inend - inptr;					\
-    size_t outlen  = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for values <= 0x7f.  */		\
-		  "    larl %[R_TMP],9f\n\t"				\
-		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
-		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP2],0\n\t"				\
-		  /* Check for > 1byte UTF-8 chars.  */			\
-		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
-				       UTF8 chars.  */			\
-		  "    vstrchs %%v19,%%v17,%%v30,%%v31\n\t"		\
-		  "    jno 11f\n\t" /* Jump away if not all bytes are 1byte \
-				       UTF8 chars.  */			\
-		  /* Shorten to UTF-8.  */				\
-		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
-		  "    la %[R_IN],32(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-32\n\t"				\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "    vst %%v18,0(%[R_OUT])\n\t"			\
-		  "    aghi %[R_OUTLEN],-16\n\t"			\
-		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],32,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
-		  "9:  .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  "    .short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
-		  "10:\n\t"						\
-		  "    vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  /* Shorten to UTF-8.  */				\
-		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
-		  "    ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
-		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 13f\n\t"					\
-		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  "13: \n\t"						\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "    lghi %[R_TMP2],16\n\t"				\
-		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
-		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-2\n\t"				\
-		  "    j 22f\n\t"					\
-		  /* Handle remaining bytes.  */			\
-		  "2:  \n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "    clgfi %[R_INLEN],1\n\t"				\
-		  "    locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
-		  "    jle 99f\n\t" /* End if less than two bytes.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle multibyte utf8-char. */			\
-		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-2\n\t"				\
-		  /* Test if ch is 1-byte UTF-8 char.  */		\
-		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
-		  /* Handle 1-byte UTF-8 char.  */			\
-		  "31: slgfi %[R_OUTLEN],1\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],20b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Test if ch is 2-byte UTF-8 char.  */		\
-		  "22: clfi %[R_TMP],0x7ff\n\t"				\
-		  "    jh 23f\n\t"					\
-		  /* Handle 2-byte UTF-8 char.  */			\
-		  "32: slgfi %[R_OUTLEN],2\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    llill %[R_TMP3],0xc080\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
-		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],20b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Test if ch is 3-byte UTF-8 char.  */		\
-		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
-		  "    jh 24f\n\t"					\
-		  /* Handle 3-byte UTF-8 char.  */			\
-		  "33: slgfi %[R_OUTLEN],3\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    llilf %[R_TMP3],0xe08080\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
-		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
-		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],20b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Test if ch is 4-byte UTF-8 char.  */		\
-		  "24: clfi %[R_TMP],0xdfff\n\t"			\
-		  "    jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */ \
-		  "    clfi %[R_TMP],0xdbff\n\t"			\
-		  "    locghih %[R_RES],%[RES_IN_ILL]\n\t"		\
-		  "    jh 99f\n\t" /* Jump away if this is a low surrogate \
-				      without a preceding high surrogate.  */ \
-		  /* Handle 4-byte UTF-8 char.  */			\
-		  "34: slgfi %[R_OUTLEN],4\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    slgfi %[R_INLEN],2\n\t"				\
-		  "    locghil %[R_RES],%[RES_IN_FULL]\n\t"		\
-		  "    jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
-		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
-		  "    aghi %[R_TMP],0x40\n\t"				\
-		  "    risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
-		  "    llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
-		  "    nilf %[R_TMP],0xfc00\n\t"			\
-		  "    clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "    locghine %[R_RES],%[RES_IN_ILL]\n\t"		\
-		  "    jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
-		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],4(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
-		  "    aghi %[R_TMP2],-2\n\t"				\
-		  "    jh 20b\n\t"					\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
-		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
-		  "99: \n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__to_utf8_loop_c
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate loop-function with software implementation.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY                   BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-#else
-# define LOOPFCT		TO_LOOP
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
deleted file mode 100644
index f9c9199..0000000
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ /dev/null
@@ -1,807 +0,0 @@
-/* Conversion between UTF-8 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		6
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM			0x0000feffu
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
-
-#define STORE_REST_COMMON						      \
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    assert (ch != 0xc0 && ch != 0xc1);					      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
-  }
-
-#define UNPACK_BYTES_COMMON \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
-
-#define CLEAR_STATE_COMMON \
-  state->__count = 0
-
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
-
-
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint32_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
-	       0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	/* Read the possible remaining bytes.  */			\
-	for (i = 1; i < cnt; ++i)					\
-	  {								\
-	    uint32_t byte = inptr[i];					\
-									\
-	    if ((byte & 0xc0) != 0x80)					\
-	      /* This is an illegal encoding.  */			\
-	      break;							\
-									\
-	    ch <<= 6;							\
-	    ch |= byte & 0x3f;						\
-	  }								\
-									\
-	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
-	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
-	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
-	    /* Do not accept UTF-16 surrogates.  */			\
-	    || (ch >= 0xd800 && ch <= 0xdfff)				\
-	    || (ch > 0x10ffff))						\
-	  {								\
-	    /* This is an illegal encoding.  */				\
-	    goto errout;						\
-	  }								\
-									\
-	inptr += cnt;							\
-      }									\
-									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint32_t *) outptr) = ch;					\
-    outptr += sizeof (uint32_t);					\
-  }
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
-		  "    vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
-				   UTF8 chars.  */			\
-		  /* Enlarge to UCS4.  */				\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    la %[R_IN],16(%[R_IN])\n\t"			\
-		  "    vuplhh %%v20,%%v18\n\t"				\
-		  "    aghi %[R_INLEN],-16\n\t"				\
-		  "    vupllh %%v21,%%v18\n\t"				\
-		  "    aghi %[R_OUTLEN],-64\n\t"			\
-		  "    vuplhh %%v22,%%v19\n\t"				\
-		  "    vupllh %%v23,%%v19\n\t"				\
-		  /* Store 64 bytes to buf_out.  */			\
-		  "    vstm %%v20,%%v23,0(%[R_OUT])\n\t"		\
-		  "    la %[R_OUT],64(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  "10: \n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
-		  "    sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest \
-						     index to store. */ \
-		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "    ahi %[R_TMP2],-1\n\t"				\
-		  "    jl 20f\n\t"					\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    vuplhh %%v20,%%v18\n\t"				\
-		  "    vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllh %%v21,%%v18\n\t"				\
-		  "    vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    vuplhh %%v22,%%v19\n\t"				\
-		  "    vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllh %%v23,%%v19\n\t"				\
-		  "    vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
-		  "11: \n\t"						\
-		  /* Update pointers.  */				\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
-		    ASM_CLOBBER_VR ("v31")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-
-# define LOOP_NEED_FLAGS
-
-# define STORE_REST		STORE_REST_COMMON
-# define UNPACK_BYTES		UNPACK_BYTES_COMMON
-# define CLEAR_STATE		CLEAR_STATE_COMMON
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
-/* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY_TO_C						\
-  {								\
-    uint32_t wc = *((const uint32_t *) inptr);			\
-								\
-    if (__glibc_likely (wc <= 0x7f))				\
-      {								\
-	/* Single UTF-8 char.  */				\
-	*outptr = (uint8_t)wc;					\
-	outptr++;						\
-      }								\
-    else if (wc <= 0x7ff)					\
-      {								\
-	/* Two UTF-8 chars.  */					\
-	if (__glibc_unlikely (outptr + 2 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-								\
-	outptr[0] = 0xc0;					\
-	outptr[0] |= wc >> 6;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= wc & 0x3f;					\
-								\
-	outptr += 2;						\
-      }								\
-    else if (wc <= 0xffff)					\
-      {								\
-	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
-	  {							\
-	    /* Do not accept UTF-16 surrogates.   */		\
-	    result = __GCONV_ILLEGAL_INPUT;			\
-	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	  }							\
-	outptr[0] = 0xe0;					\
-	outptr[0] |= wc >> 12;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= (wc >> 6) & 0x3f;				\
-								\
-	outptr[2] = 0x80;					\
-	outptr[2] |= wc & 0x3f;					\
-								\
-	outptr += 3;						\
-      }								\
-      else if (wc <= 0x10ffff)					\
-	{							\
-	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))		\
-	    {							\
-	      /* Overflow in the output buffer.  */		\
-	      result = __GCONV_FULL_OUTPUT;			\
-	      break;						\
-	    }							\
-	  outptr[0] = 0xf0;					\
-	  outptr[0] |= wc >> 18;				\
-								\
-	  outptr[1] = 0x80;					\
-	  outptr[1] |= (wc >> 12) & 0x3f;			\
-								\
-	  outptr[2] = 0x80;					\
-	  outptr[2] |= (wc >> 6) & 0x3f;			\
-								\
-	  outptr[3] = 0x80;					\
-	  outptr[3] |= wc & 0x3f;				\
-								\
-	  outptr += 4;						\
-	}							\
-      else							\
-	{							\
-	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	}							\
-    inptr += 4;							\
-  }
-
-#define HW_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "    vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
-		  "    vzero %%v21\n\t"					\
-		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
-		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
-		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP],0\n\t"				\
-		  /* Shorten to byte values.  */			\
-		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
-		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
-		  "    vpkh %%v23,%%v23,%%v24\n\t"			\
-		  /* Checking for values > 0x7f.  */			\
-		  "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
-		  "    jno 10f\n\t"					\
-		  "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
-		  "    jno 11f\n\t"					\
-		  "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
-		  "    jno 12f\n\t"					\
-		  "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
-		  "    jno 13f\n\t"					\
-		  /* Store 16bytes to outptr.  */			\
-		  "    vst %%v23,0(%[R_OUT])\n\t"			\
-		  "    aghi %[R_INLEN],-64\n\t"				\
-		  "    aghi %[R_OUTLEN],-16\n\t"			\
-		  "    la %[R_IN],64(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "    srlg %[R_I],%[R_I],2\n\t"			\
-		  "    agr %[R_I],%[R_TMP]\n\t"				\
-		  "    je 20f\n\t"					\
-		  /* Store characters before invalid one...  */		\
-		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
-		  /* ... and update pointers.  */			\
-		  "    aghi %[R_I],1\n\t"				\
-		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
-		  "    sllg %[R_I],%[R_I],2\n\t"			\
-		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
-		    ASM_CLOBBER_VR ("v24")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_c
-#define BODY			BODY_TO_C
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY			BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
new file mode 100644
index 0000000..8d42ab8
--- /dev/null
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -0,0 +1,636 @@
+/* Conversion between UTF-16 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM_UTF32               0x0000feffu
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16               0xfeff
+
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		2
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
+#define FROM_DIRECTION		(dir == from_utf16)
+#define ONE_DIRECTION           0
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf16,
+  from_utf16
+};
+
+struct utf16_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf16_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf16;
+    }
+  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
+	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf16;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf16)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
+/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
+
+/* The software routine is copied from utf-16.c (minus bytes
+   swapping).  */
+#define BODY_FROM_C							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
+      {									\
+	/* No surrogate.  */						\
+	put32 (outptr, u1);						\
+	inptr += 2;							\
+      }									\
+    else								\
+      {									\
+	/* An isolated low-surrogate was found.  This has to be         \
+	   considered ill-formed.  */					\
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
+	  {								\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	/* It's a surrogate character.  At least the first word says	\
+	   it is.  */							\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    /* We don't have enough input for another complete input	\
+	       character.  */						\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	uint16_t u2 = get16 (inptr);					\
+	if (__builtin_expect (u2 < 0xdc00, 0)				\
+	    || __builtin_expect (u2 > 0xdfff, 0))			\
+	  {								\
+	    /* This is no valid second word for a surrogate.  */	\
+	    inptr -= 2;							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+									\
+	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
+	inptr += 2;							\
+      }									\
+    outptr += 4;							\
+  }
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  /* Enlarge to UTF-32.  */				\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9:  .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  "    .short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 12f\n\t"					\
+		  "    vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    srl %[R_TMP2],1\n\t"				\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_OUTLEN],-4\n\t"				\
+		  "    j 16f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    je 97f\n\t" /* Only one byte available.  */	\
+		  "    jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 96f \n\t"					\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    jhe 15f\n\t"					\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],13b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 14b\n\t" /* Jump away if ch > 0xdfff.  */	\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "    jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "    slgfi %[R_INLEN],4\n\t"				\
+		  "    jl 97f\n\t" /* Big enough input?  */		\
+		  "    llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    slfi %[R_TMP],0xd7c0\n\t"			\
+		  "    sll %[R_TMP],10\n\t"				\
+		  "    risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "    nilf %[R_TMP3],0xfc00\n\t"			\
+		  "    clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    jne 98f\n\t"					\
+		  "    st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "96: \n\t" /* Return full output.  */			\
+		  "    lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "97: \n\t" /* Return incomplete input.  */		\
+		  "    lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "    lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
+
+/* Conversion from UTF-32 internal/BE to UTF-16.  */
+
+/* The software routine is copied from utf-16.c (minus bytes
+   swapping).  */
+#define BODY_TO_C							\
+  {									\
+    uint32_t c = get32 (inptr);						\
+									\
+    if (__builtin_expect (c <= 0xd7ff, 1)				\
+	|| (c >=0xdc00 && c <= 0xffff))					\
+      {									\
+	/* Two UTF-16 chars.  */					\
+	put16 (outptr, c);						\
+      }									\
+    else if (__builtin_expect (c >= 0x10000, 1)				\
+	     && __builtin_expect (c <= 0x10ffff, 1))			\
+      {									\
+	/* Four UTF-16 chars.  */					\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t out;							\
+									\
+	/* Generate a surrogate character.  */				\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	out = 0xd800;							\
+	out |= (zabcd & 0xff) << 6;					\
+	out |= (c >> 10) & 0x3f;					\
+	put16 (outptr, out);						\
+	outptr += 2;							\
+									\
+	out = 0xdc00;							\
+	out |= c & 0x3ff;						\
+	put16 (outptr, out);						\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    outptr += 2;							\
+    inptr += 4;								\
+  }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-32 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "    vpkf %%v18,%%v16,%%v17\n\t"			\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t"					\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  "    .long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 20f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf8-utf16-z9.c b/sysdeps/s390/utf8-utf16-z9.c
new file mode 100644
index 0000000..d3dc9bd
--- /dev/null
+++ b/sysdeps/s390/utf8-utf16-z9.c
@@ -0,0 +1,818 @@
+/* Conversion between UTF-8 and UTF-16 BE.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define MAX_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v18,%%v19,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest \
+							 index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+
+/* The software implementation is based on the code in gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint16_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0	\
+	       or 0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search for the end of this ill-formed UTF-8 character.	\
+	       This is the next byte with (x & 0xc0) != 0x80.  */	\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	if (cnt == 4)							\
+	  {								\
+	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
+	       low) are needed.  */					\
+	    uint16_t zabcd, high, low;					\
+									\
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
+	      {								\
+		/* Overflow in the output buffer.  */			\
+		result = __GCONV_FULL_OUTPUT;				\
+		break;							\
+	      }								\
+									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
+	    /* See Principles of Operations cu12.  */			\
+	    zabcd = (((inptr[0] & 0x7) << 2) |				\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
+									\
+	    /* z-bit must be zero after subtracting 1.  */		\
+	    if (zabcd & 0x10)						\
+	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
+									\
+	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
+	    high |= zabcd << 6;                         /* abcd bits */	\
+	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
+	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
+									\
+	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
+	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
+	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
+	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
+									\
+	    put16 (outptr, high);					\
+	    outptr += 2;						\
+	    put16 (outptr, low);					\
+	    outptr += 2;						\
+	    inptr += 4;							\
+	    continue;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Read the possible remaining bytes.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		uint16_t byte = inptr[i];				\
+									\
+		if ((byte & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  break;						\
+									\
+		ch <<= 6;						\
+		ch |= byte & 0x3f;					\
+	      }								\
+									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
+	  }								\
+      }									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint16_t *) outptr) = ch;					\
+    outptr += sizeof (uint16_t);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+/* Conversion from UTF-16 to UTF-8.  */
+
+/* The software routine is based on the functionality of the S/390
+   hardware instruction (cu21) as described in the Principles of
+   Operation.  */
+#define BODY_TO_C							\
+  {									\
+    uint16_t c = get16 (inptr);						\
+									\
+    if (__glibc_likely (c <= 0x007f))					\
+      {									\
+	/* Single byte UTF-8 char.  */					\
+	*outptr = c & 0xff;						\
+	outptr++;							\
+      }									\
+    else if (c >= 0x0080 && c <= 0x07ff)				\
+      {									\
+	/* Two byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 2 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
+									\
+	outptr += 2;							\
+      }									\
+    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
+      {									\
+	/* Three byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 3 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	outptr[0] = 0xe0;						\
+	outptr[0] |= c >> 12;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (c >> 6) & 0x3f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= c & 0x3f;						\
+									\
+	outptr += 3;							\
+      }									\
+    else if (c >= 0xd800 && c <= 0xdbff)				\
+      {									\
+	/* Four byte UTF-8 char.  */					\
+	uint16_t low, uvwxy;						\
+									\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	low = get16 (inptr);						\
+									\
+	if ((low & 0xfc00) != 0xdc00)					\
+	  {								\
+	    inptr -= 2;							\
+	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	uvwxy = ((c >> 6) & 0xf) + 1;					\
+	outptr[0] = 0xf0;						\
+	outptr[0] |= uvwxy >> 2;					\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (uvwxy << 4) & 0x30;				\
+	outptr[1] |= (c >> 2) & 0x0f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= (c & 0x03) << 4;					\
+	outptr[2] |= (low >> 6) & 0x0f;					\
+									\
+	outptr[3] = 0x80;						\
+	outptr[3] |= low & 0x3f;					\
+									\
+	outptr += 4;							\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+      }									\
+    inptr += 2;								\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  "    vstrchs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9:  .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  "    .short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 13f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13: \n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "    lghi %[R_TMP2],16\n\t"				\
+		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    j 22f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "    jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "    jh 23f\n\t"					\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llill %[R_TMP3],0xc080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "    jh 24f\n\t"					\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xe08080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */ \
+		  "    clfi %[R_TMP],0xdbff\n\t"			\
+		  "    locghih %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jh 99f\n\t" /* Jump away if this is a low surrogate \
+				      without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    slgfi %[R_INLEN],2\n\t"				\
+		  "    locghil %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "    aghi %[R_TMP],0x40\n\t"				\
+		  "    risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "    llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "    nilf %[R_TMP],0xfc00\n\t"			\
+		  "    clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    locghine %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 20b\n\t"					\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
+
+#include <iconv/skeleton.c>
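
Note on the surrogate arithmetic in BODY_FROM_C and BODY_TO_C above: the
bit shuffling between a 4-byte UTF-8 sequence (11110uvw 10xyefgh 10ijklmn
10opqrst) and a UTF-16 surrogate pair (110110abcdefghij 110111klmnopqrst,
with abcd = uvwxy - 1) can be checked outside of glibc.  The following is
only a minimal standalone sketch of that arithmetic; the function names
utf8_4byte_to_utf16, utf16_pair_to_utf8 and surrogate_self_check are
hypothetical and are not part of this patch, and the code does not use
the loop.c/skeleton.c machinery or the error-handler macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: convert one 4-byte UTF-8
   sequence into a UTF-16 surrogate pair.  Returns 0 on success and -1
   if the sequence is ill-formed or encodes a value outside
   U+10000..U+10FFFF.  */
static int
utf8_4byte_to_utf16 (const unsigned char in[4], uint16_t *high,
		     uint16_t *low)
{
  if ((in[0] & 0xf8) != 0xf0)
    return -1;
  for (int i = 1; i < 4; ++i)
    if ((in[i] & 0xc0) != 0x80)
      return -1;

  /* uvwxy are the five plane bits; zabcd = uvwxy - 1.  The z-bit must
     be zero, which rejects both uvwxy == 0 (overlong) and uvwxy > 16.  */
  uint16_t zabcd
    = (uint16_t) ((((in[0] & 0x7) << 2) | ((in[1] & 0x30) >> 4)) - 1);
  if (zabcd & 0x10)
    return -1;

  *high = (uint16_t) (0xd800 | (zabcd << 6)	/* abcd bits */
		      | ((in[1] & 0x0f) << 2)	/* efgh bits */
		      | ((in[2] & 0x30) >> 4));	/* ij bits */
  *low = (uint16_t) (0xdc00 | ((in[2] & 0x0f) << 6)	/* klmn bits */
		     | (in[3] & 0x3f));			/* opqrst bits */
  return 0;
}

/* Hypothetical helper, not part of the patch: the reverse direction.
   Returns -1 if HIGH/LOW do not form a valid surrogate pair.  */
static int
utf16_pair_to_utf8 (uint16_t high, uint16_t low, unsigned char out[4])
{
  if ((high & 0xfc00) != 0xd800 || (low & 0xfc00) != 0xdc00)
    return -1;

  uint16_t uvwxy = (uint16_t) (((high >> 6) & 0xf) + 1);
  out[0] = (unsigned char) (0xf0 | (uvwxy >> 2));
  out[1] = (unsigned char) (0x80 | ((uvwxy << 4) & 0x30)
			    | ((high >> 2) & 0x0f));
  out[2] = (unsigned char) (0x80 | ((high & 0x03) << 4)
			    | ((low >> 6) & 0x0f));
  out[3] = (unsigned char) (0x80 | (low & 0x3f));
  return 0;
}

/* Round-trip U+1F600 (UTF-8: f0 9f 98 80, UTF-16: d83d de00).  */
static void
surrogate_self_check (void)
{
  const unsigned char in[4] = { 0xf0, 0x9f, 0x98, 0x80 };
  unsigned char back[4];
  uint16_t hi, lo;

  assert (utf8_4byte_to_utf16 (in, &hi, &lo) == 0);
  assert (hi == 0xd83d && lo == 0xde00);
  assert (utf16_pair_to_utf8 (hi, lo, back) == 0);
  for (int i = 0; i < 4; ++i)
    assert (back[i] == in[i]);
}
```

Calling surrogate_self_check () from a small test program is enough to
exercise both directions.  The single "zabcd & 0x10" test is the same
check BODY_FROM_C performs before building the high surrogate.
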
diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
new file mode 100644
index 0000000..e39e0a7
--- /dev/null
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -0,0 +1,820 @@
+/* Conversion between UTF-8 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		6
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
+
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint32_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
+	       0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search the end of this ill-formed UTF-8 character.  This	\
+	       is the next byte with (x & 0xc0) != 0x80.  */		\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	/* Read the possible remaining bytes.  */			\
+	for (i = 1; i < cnt; ++i)					\
+	  {								\
+	    uint32_t byte = inptr[i];					\
+									\
+	    if ((byte & 0xc0) != 0x80)					\
+	      /* This is an illegal encoding.  */			\
+	      break;							\
+									\
+	    ch <<= 6;							\
+	    ch |= byte & 0x3f;						\
+	  }								\
+									\
+	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
+	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
+	   have been represented with fewer than cnt bytes.  */		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
+	  {								\
+	    /* This is an illegal encoding.  */				\
+	    goto errout;						\
+	  }								\
+									\
+	inptr += cnt;							\
+      }									\
+									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint32_t *) outptr) = ch;					\
+    outptr += sizeof (uint32_t);					\
+  }
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    aghi %[R_OUTLEN],-64\n\t"			\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "    vstm %%v20,%%v23,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10: \n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest \
+						     index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  "    vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11: \n\t"						\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+		    ASM_CLOBBER_VR ("v31")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
+/* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
+
+/* The software routine mimics the S/390 cu41 instruction.  */
+#define BODY_TO_C						\
+  {								\
+    uint32_t wc = *((const uint32_t *) inptr);			\
+								\
+    if (__glibc_likely (wc <= 0x7f))				\
+      {								\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
+	outptr++;						\
+      }								\
+    else if (wc <= 0x7ff)					\
+      {								\
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+								\
+	outptr[0] = 0xc0;					\
+	outptr[0] |= wc >> 6;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= wc & 0x3f;					\
+								\
+	outptr += 2;						\
+      }								\
+    else if (wc <= 0xffff)					\
+      {								\
+	/* Three UTF-8 chars.  */				\
+	if (__glibc_unlikely (outptr + 3 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
+	outptr[0] = 0xe0;					\
+	outptr[0] |= wc >> 12;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= (wc >> 6) & 0x3f;				\
+								\
+	outptr[2] = 0x80;					\
+	outptr[2] |= wc & 0x3f;					\
+								\
+	outptr += 3;						\
+      }								\
+      else if (wc <= 0x10ffff)					\
+	{							\
+	  /* Four UTF-8 chars.  */				\
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
+	    {							\
+	      /* Overflow in the output buffer.  */		\
+	      result = __GCONV_FULL_OUTPUT;			\
+	      break;						\
+	    }							\
+	  outptr[0] = 0xf0;					\
+	  outptr[0] |= wc >> 18;				\
+								\
+	  outptr[1] = 0x80;					\
+	  outptr[1] |= (wc >> 12) & 0x3f;			\
+								\
+	  outptr[2] = 0x80;					\
+	  outptr[2] |= (wc >> 6) & 0x3f;			\
+								\
+	  outptr[3] = 0x80;					\
+	  outptr[3] |= wc & 0x3f;				\
+								\
+	  outptr += 4;						\
+	}							\
+      else							\
+	{							\
+	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	}							\
+    inptr += 4;							\
+  }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "    vzero %%v21\n\t"					\
+		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP],0\n\t"				\
+		  /* Shorten to byte values.  */			\
+		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		  "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		  /* Checking for values > 0x7f.  */			\
+		  "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		  "    jno 11f\n\t"					\
+		  "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		  "    jno 12f\n\t"					\
+		  "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		  "    jno 13f\n\t"					\
+		  /* Store 16bytes to outptr.  */			\
+		  "    vst %%v23,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_INLEN],-64\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_IN],64(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "    srlg %[R_I],%[R_I],2\n\t"			\
+		  "    agr %[R_I],%[R_TMP]\n\t"				\
+		  "    je 20f\n\t"					\
+		  /* Store characters before invalid one...  */		\
+		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		  /* ... and update pointers.  */			\
+		  "    aghi %[R_I],1\n\t"				\
+		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
+		  "    sllg %[R_I],%[R_I],2\n\t"			\
+		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
+		  /* Handle UTF-32 chars > 0x7f with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
+#include <iconv/skeleton.c>
-- 
2.5.5



* Re: [PATCH 07/14] S390: Optimize utf8-utf32 module.
  2016-02-23  9:22 ` [PATCH 07/14] S390: Optimize utf8-utf32 module Stefan Liebler
@ 2016-04-21 15:15   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:15 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 30903 bytes --]

Here is an updated patch, where the labels in inline assemblies are 
out-dented as suggested by Florian.

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch reworks the s390 specific module to convert between utf8 and utf32.
> Now ifunc is used to choose either the c or etf3eh (with convert utf
> instruction) variants at runtime.
> Furthermore a new vector variant for z13 is introduced which will be built
> and chosen if vector support is available at build / runtime.
> The vector variants optimize input of 1byte utf8 characters. The convert utf
> instruction is used if a multibyte utf8 character is found.
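
As a rough illustration of the selection order described above (not the
glibc code itself), the sketch below picks a loop variant from a capability
bitmask: the vector variant first, then the convert-utf variant, then plain
C. The DEMO_HWCAP_* bits, the loop_* stubs and select_loop are made-up names
for this example only. The resolver in the patch tests the HWCAP_S390_*
flags instead and is hooked up through the ifunc attribute.

```c
/* Placeholder capability bits for this sketch only; the real values
   come from the platform headers.  */
#define DEMO_HWCAP_ZARCH     (1UL << 0)
#define DEMO_HWCAP_HIGH_GPRS (1UL << 1)
#define DEMO_HWCAP_ETF3EH    (1UL << 2)
#define DEMO_HWCAP_VX        (1UL << 3)

typedef const char *(*loop_fn) (void);

/* Stand-ins for the three generated loop functions.  */
static const char *loop_c (void) { return "c"; }
static const char *loop_etf3eh (void) { return "etf3eh"; }
static const char *loop_vx (void) { return "vx"; }

/* Same priority order as the resolver in the patch: vector first,
   then the convert-utf instruction, then the C fallback.  */
loop_fn
select_loop (unsigned long hwcap)
{
  if (hwcap & DEMO_HWCAP_VX)
    return loop_vx;
  if ((hwcap & DEMO_HWCAP_ZARCH) && (hwcap & DEMO_HWCAP_HIGH_GPRS)
      && (hwcap & DEMO_HWCAP_ETF3EH))
    return loop_etf3eh;
  return loop_c;
}
```

With these placeholder bits, select_loop (DEMO_HWCAP_VX) yields loop_vx,
while a mask that lacks any one of the three bits required for the
convert-utf variant falls back to loop_c.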
>
> This patch also fixes some whitespace errors. The c variants are rejecting
> UTF-16 surrogates and values above 0x10ffff now.
> Furthermore, the etf3eh variants are handling the "UTF-xx//IGNORE" case now.
> Before they ignored the ignore-case and always stopped at an error.
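
To make the rejection rule concrete, here is a minimal standalone sketch of
a UTF-32 to UTF-8 encoder that refuses UTF-16 surrogates and values above
0x10ffff. It is an assumption-laden illustration, not the BODY_TO_C macro:
demo_encode_utf8 is a hypothetical helper, it signals an error by returning
0 rather than going through the gconv error handlers, and it rejects the
full range 0xd800..0xdfff, which is the condition used in BODY_FROM_C.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3].  Returns the number
   of bytes written, or 0 if wc is a UTF-16 surrogate or is larger
   than 0x10ffff.  */
size_t
demo_encode_utf8 (uint32_t wc, unsigned char out[4])
{
  if (wc <= 0x7f)
    {
      out[0] = (unsigned char) wc;
      return 1;
    }
  if (wc <= 0x7ff)
    {
      out[0] = (unsigned char) (0xc0 | (wc >> 6));
      out[1] = (unsigned char) (0x80 | (wc & 0x3f));
      return 2;
    }
  if (wc <= 0xffff)
    {
      /* Do not accept UTF-16 surrogates.  */
      if (wc >= 0xd800 && wc <= 0xdfff)
        return 0;
      out[0] = (unsigned char) (0xe0 | (wc >> 12));
      out[1] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (wc & 0x3f));
      return 3;
    }
  if (wc <= 0x10ffff)
    {
      out[0] = (unsigned char) (0xf0 | (wc >> 18));
      out[1] = (unsigned char) (0x80 | ((wc >> 12) & 0x3f));
      out[2] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
      out[3] = (unsigned char) (0x80 | (wc & 0x3f));
      return 4;
    }
  /* Beyond the Unicode code space.  */
  return 0;
}
```

A caller would treat a return value of 0 as illegal input and either stop
or, in an //IGNORE style mode, skip the four input bytes and continue.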
>
> ChangeLog:
>
> 	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Use ifunc to select c, etf3eh
> 	or new vector loop-variant.
> ---
>   sysdeps/s390/s390-64/utf8-utf32-z9.c | 664 +++++++++++++++++++++++++----------
>   1 file changed, 480 insertions(+), 184 deletions(-)
>
> diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
> index defd47d..e89dc70 100644
> --- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
> +++ b/sysdeps/s390/s390-64/utf8-utf32-z9.c
> @@ -30,35 +30,25 @@
>   #include <dl-procinfo.h>
>   #include <gconv.h>
>
> -/* UTF-32 big endian byte order mark.  */
> -#define BOM	                0x0000feffu
> +#if defined HAVE_S390_VX_GCC_SUPPORT
> +# define ASM_CLOBBER_VR(NR) , NR
> +#else
> +# define ASM_CLOBBER_VR(NR)
> +#endif
>
> +/* Defines for skeleton.c.  */
>   #define DEFINE_INIT		0
>   #define DEFINE_FINI		0
> -/* These definitions apply to the UTF-8 to UTF-32 direction.  The
> -   software implementation for UTF-8 still supports multibyte
> -   characters up to 6 bytes whereas the hardware variant does not.  */
>   #define MIN_NEEDED_FROM		1
>   #define MAX_NEEDED_FROM		6
>   #define MIN_NEEDED_TO		4
> -#define FROM_LOOP		from_utf8_loop
> -#define TO_LOOP			to_utf8_loop
> +#define FROM_LOOP		__from_utf8_loop
> +#define TO_LOOP			__to_utf8_loop
>   #define FROM_DIRECTION		(dir == from_utf8)
>   #define ONE_DIRECTION           0
> -#define PREPARE_LOOP							\
> -  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> -  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> -									\
> -  if (emit_bom && !data->__internal_use					\
> -      && data->__invocation_counter == 0)				\
> -    {									\
> -      /* Emit the Byte Order Mark.  */					\
> -      if (__glibc_unlikely (outbuf + 4 > outend))			      \
> -	return __GCONV_FULL_OUTPUT;					\
> -									\
> -      put32u (outbuf, BOM);						\
> -      outbuf += 4;							\
> -    }
> +
> +/* UTF-32 big endian byte order mark.  */
> +#define BOM			0x0000feffu
>
>   /* Direction of the transformation.  */
>   enum direction
> @@ -155,16 +145,16 @@ gconv_end (struct __gconv_step *data)
>       register unsigned long long outlen __asm__("11") = outend - outptr;	\
>       uint64_t cc = 0;							\
>   									\
> -    __asm__ volatile (".machine push       \n\t"			\
> -		      ".machine \"z9-109\" \n\t"			\
> -		      "0: " INSTRUCTION "  \n\t"			\
> -		      ".machine pop        \n\t"			\
> -		      "   jo     0b        \n\t"			\
> -		      "   ipm    %2        \n"				\
> -		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -		      "+d" (outlen), "+d" (inlen)			\
> -		      :							\
> -		      : "cc", "memory");				\
> +    __asm__ __volatile__ (".machine push       \n\t"			\
> +			  ".machine \"z9-109\" \n\t"			\
> +			  "0: " INSTRUCTION "  \n\t"			\
> +			  ".machine pop        \n\t"			\
> +			  "   jo     0b        \n\t"			\
> +			  "   ipm    %2        \n"			\
> +			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> +			    "+d" (outlen), "+d" (inlen)			\
> +			  :						\
> +			  : "cc", "memory");				\
>   									\
>       inptr = pInput;							\
>       outptr = pOutput;							\
> @@ -173,49 +163,150 @@ gconv_end (struct __gconv_step *data)
>       if (cc == 1)							\
>         {									\
>   	result = __GCONV_FULL_OUTPUT;					\
> -	break;								\
>         }									\
>       else if (cc == 2)							\
>         {									\
>   	result = __GCONV_ILLEGAL_INPUT;					\
> -	break;								\
>         }									\
>     }
>
> +#define PREPARE_LOOP							\
> +  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> +  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> +									\
> +  if (emit_bom && !data->__internal_use					\
> +      && data->__invocation_counter == 0)				\
> +    {									\
> +      /* Emit the Byte Order Mark.  */					\
> +      if (__glibc_unlikely (outbuf + 4 > outend))			\
> +	return __GCONV_FULL_OUTPUT;					\
> +									\
> +      put32u (outbuf, BOM);						\
> +      outbuf += 4;							\
> +    }
> +
>   /* Conversion function from UTF-8 to UTF-32 internal/BE.  */
>
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define LOOPFCT			FROM_LOOP
> -/* The software routine is copied from gconv_simple.c.  */
> -#define BODY								\
> +#define STORE_REST_COMMON						      \
> +  {									      \
> +    /* We store the remaining bytes while converting them into the UCS4	      \
> +       format.  We can assume that the first byte in the buffer is	      \
> +       correct and that it requires a larger number of bytes than there	      \
> +       are in the input buffer.  */					      \
> +    wint_t ch = **inptrp;						      \
> +    size_t cnt, r;							      \
> +									      \
> +    state->__count = inend - *inptrp;					      \
> +									      \
> +    assert (ch != 0xc0 && ch != 0xc1);					      \
> +    if (ch >= 0xc2 && ch < 0xe0)					      \
> +      {									      \
> +	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
> +	   0xc1, otherwise the wide character could have been		      \
> +	   represented using a single byte.  */				      \
> +	cnt = 2;							      \
> +	ch &= 0x1f;							      \
> +      }									      \
> +    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
> +      {									      \
> +	/* We expect three bytes.  */					      \
> +	cnt = 3;							      \
> +	ch &= 0x0f;							      \
> +      }									      \
> +    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
> +      {									      \
> +	/* We expect four bytes.  */					      \
> +	cnt = 4;							      \
> +	ch &= 0x07;							      \
> +      }									      \
> +    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
> +      {									      \
> +	/* We expect five bytes.  */					      \
> +	cnt = 5;							      \
> +	ch &= 0x03;							      \
> +      }									      \
> +    else								      \
> +      {									      \
> +	/* We expect six bytes.  */					      \
> +	cnt = 6;							      \
> +	ch &= 0x01;							      \
> +      }									      \
> +									      \
> +    /* The first byte is already consumed.  */				      \
> +    r = cnt - 1;							      \
> +    while (++(*inptrp) < inend)						      \
> +      {									      \
> +	ch <<= 6;							      \
> +	ch |= **inptrp & 0x3f;						      \
> +	--r;								      \
> +      }									      \
> +									      \
> +    /* Shift for the so far missing bytes.  */				      \
> +    ch <<= r * 6;							      \
> +									      \
> +    /* Store the number of bytes expected for the entire sequence.  */	      \
> +    state->__count |= cnt << 8;						      \
> +									      \
> +    /* Store the value.  */						      \
> +    state->__value.__wch = ch;						      \
> +  }
> +
> +#define UNPACK_BYTES_COMMON \
> +  {									      \
> +    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
> +    wint_t wch = state->__value.__wch;					      \
> +    size_t ntotal = state->__count >> 8;				      \
> +									      \
> +    inlen = state->__count & 255;					      \
> +									      \
> +    bytebuf[0] = inmask[ntotal - 2];					      \
> +									      \
> +    do									      \
> +      {									      \
> +	if (--ntotal < inlen)						      \
> +	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
> +	wch >>= 6;							      \
> +      }									      \
> +    while (ntotal > 1);							      \
> +									      \
> +    bytebuf[0] |= wch;							      \
> +  }
> +
> +#define CLEAR_STATE_COMMON \
> +  state->__count = 0
> +
> +#define BODY_FROM_HW(ASM)						\
>     {									\
> -    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
> -      {									\
> -	HARDWARE_CONVERT ("cu14 %0, %1, 1");				\
> +    ASM;								\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
>   									\
> -	if (inptr != inend)						\
> -	  {								\
> -	    int i;							\
> -	    for (i = 1; inptr + i < inend; ++i)				\
> -	      if ((inptr[i] & 0xc0) != 0x80)				\
> -		break;							\
> +    int i;								\
> +    for (i = 1; inptr + i < inend && i < 5; ++i)			\
> +      if ((inptr[i] & 0xc0) != 0x80)					\
> +	break;								\
>   									\
> -	    if (__glibc_likely (inptr + i == inend))			      \
> -	      {								\
> -		result = __GCONV_INCOMPLETE_INPUT;			\
> -		break;							\
> -	      }								\
> -	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
> -	  }								\
> -	continue;							\
> +    if (__glibc_likely (inptr + i == inend				\
> +			&& result == __GCONV_EMPTY_INPUT))		\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
>         }									\
> -									\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
> +  }
> +
> +/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
> +#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
> +
> +
> +/* The software routine is copied from gconv_simple.c.  */
> +#define BODY_FROM_C							\
> +  {									\
>       /* Next input byte.  */						\
>       uint32_t ch = *inptr;						\
>   									\
> -    if (__glibc_likely (ch < 0x80))					      \
> +    if (__glibc_likely (ch < 0x80))					\
>         {									\
>   	/* One byte sequence.  */					\
>   	++inptr;							\
> @@ -233,30 +324,18 @@ gconv_end (struct __gconv_step *data)
>   	    cnt = 2;							\
>   	    ch &= 0x1f;							\
>   	  }								\
> -        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
> +	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
>   	  {								\
>   	    /* We expect three bytes.  */				\
>   	    cnt = 3;							\
>   	    ch &= 0x0f;							\
>   	  }								\
> -	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
> +	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
>   	  {								\
>   	    /* We expect four bytes.  */				\
>   	    cnt = 4;							\
>   	    ch &= 0x07;							\
>   	  }								\
> -	else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
> -	  {								\
> -	    /* We expect five bytes.  */				\
> -	    cnt = 5;							\
> -	    ch &= 0x03;							\
> -	  }								\
> -	else if (__glibc_likely ((ch & 0xfe) == 0xfc))			      \
> -	  {								\
> -	    /* We expect six bytes.  */					\
> -	    cnt = 6;							\
> -	    ch &= 0x01;							\
> -	  }								\
>   	else								\
>   	  {								\
>   	    /* Search the end of this ill-formed UTF-8 character.  This	\
> @@ -272,7 +351,7 @@ gconv_end (struct __gconv_step *data)
>   	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
>   	  }								\
>   									\
> -	if (__glibc_unlikely (inptr + cnt > inend))			      \
> +	if (__glibc_unlikely (inptr + cnt > inend))			\
>   	  {								\
>   	    /* We don't have enough input.  But before we report	\
>   	       that check that all the bytes are correct.  */		\
> @@ -280,7 +359,7 @@ gconv_end (struct __gconv_step *data)
>   	      if ((inptr[i] & 0xc0) != 0x80)				\
>   		break;							\
>   									\
> -	    if (__glibc_likely (inptr + i == inend))			      \
> +	    if (__glibc_likely (inptr + i == inend))			\
>   	      {								\
>   		result = __GCONV_INCOMPLETE_INPUT;			\
>   		break;							\
> @@ -305,7 +384,10 @@ gconv_end (struct __gconv_step *data)
>   	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
>   	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
>   	   have been represented with fewer than cnt bytes.  */		\
> -	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0))		\
> +	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
> +	    /* Do not accept UTF-16 surrogates.  */			\
> +	    || (ch >= 0xd800 && ch <= 0xdfff)				\
> +	    || (ch > 0x10ffff))						\
>   	  {								\
>   	    /* This is an illegal encoding.  */				\
>   	    goto errout;						\
> @@ -318,137 +400,212 @@ gconv_end (struct __gconv_step *data)
>       *((uint32_t *) outptr) = ch;					\
>       outptr += sizeof (uint32_t);					\
>     }
> -#define LOOP_NEED_FLAGS
>
> -#define STORE_REST							\
> -  {									      \
> -    /* We store the remaining bytes while converting them into the UCS4	      \
> -       format.  We can assume that the first byte in the buffer is	      \
> -       correct and that it requires a larger number of bytes than there	      \
> -       are in the input buffer.  */					      \
> -    wint_t ch = **inptrp;						      \
> -    size_t cnt, r;							      \
> -									      \
> -    state->__count = inend - *inptrp;					      \
> -									      \
> -    if (ch >= 0xc2 && ch < 0xe0)					      \
> -      {									      \
> -	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
> -	   0xc1, otherwise the wide character could have been		      \
> -	   represented using a single byte.  */				      \
> -	cnt = 2;							      \
> -	ch &= 0x1f;							      \
> -      }									      \
> -    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
> -      {									      \
> -	/* We expect three bytes.  */					      \
> -	cnt = 3;							      \
> -	ch &= 0x0f;							      \
> -      }									      \
> -    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
> -      {									      \
> -	/* We expect four bytes.  */					      \
> -	cnt = 4;							      \
> -	ch &= 0x07;							      \
> -      }									      \
> -    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
> -      {									      \
> -	/* We expect five bytes.  */					      \
> -	cnt = 5;							      \
> -	ch &= 0x03;							      \
> -      }									      \
> -    else								      \
> -      {									      \
> -	/* We expect six bytes.  */					      \
> -	cnt = 6;							      \
> -	ch &= 0x01;							      \
> -      }									      \
> -									      \
> -    /* The first byte is already consumed.  */				      \
> -    r = cnt - 1;							      \
> -    while (++(*inptrp) < inend)						      \
> -      {									      \
> -	ch <<= 6;							      \
> -	ch |= **inptrp & 0x3f;						      \
> -	--r;								      \
> -      }									      \
> -									      \
> -    /* Shift for the so far missing bytes.  */				      \
> -    ch <<= r * 6;							      \
> -									      \
> -    /* Store the number of bytes expected for the entire sequence.  */	      \
> -    state->__count |= cnt << 8;						      \
> -									      \
> -    /* Store the value.  */						      \
> -    state->__value.__wch = ch;						      \
> +#define HW_FROM_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
> +		  "vrepib %%v31,0x20\n\t"				\
> +		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
> +		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
> +		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> +		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  /* Enlarge to UCS4.  */				\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "la %[R_IN],16(%[R_IN])\n\t"				\
> +		  "vuplhh %%v20,%%v18\n\t"				\
> +		  "aghi %[R_INLEN],-16\n\t"				\
> +		  "vupllh %%v21,%%v18\n\t"				\
> +		  "aghi %[R_OUTLEN],-64\n\t"				\
> +		  "vuplhh %%v22,%%v19\n\t"				\
> +		  "vupllh %%v23,%%v19\n\t"				\
> +		  /* Store 64 bytes to buf_out.  */			\
> +		  "vstm %%v20,%%v23,0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],64(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],64,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  "10:\n\t"						\
> +		  /* At least one byte is > 0x7f.			\
> +		     Store the preceding 1-byte chars.  */		\
> +		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
> +		  "sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest	\
> +						     index to store. */ \
> +		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
> +		  "ahi %[R_TMP2],-1\n\t"				\
> +		  "jl 20f\n\t"						\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "vuplhh %%v20,%%v18\n\t"				\
> +		  "vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllh %%v21,%%v18\n\t"				\
> +		  "vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "vuplhh %%v22,%%v19\n\t"				\
> +		  "vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllh %%v23,%%v19\n\t"				\
> +		  "vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
> +		  "11:\n\t"						\
> +		  /* Update pointers.  */				\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Handle multibyte utf8-char with convert instruction. */ \
> +		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
> +		    ASM_CLOBBER_VR ("v31")				\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
>     }
> +#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
>
> -#define UNPACK_BYTES \
> -  {									      \
> -    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
> -    wint_t wch = state->__value.__wch;					      \
> -    size_t ntotal = state->__count >> 8;				      \
> -									      \
> -    inlen = state->__count & 255;					      \
> -									      \
> -    bytebuf[0] = inmask[ntotal - 2];					      \
> -									      \
> -    do									      \
> -      {									      \
> -	if (--ntotal < inlen)						      \
> -	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
> -	wch >>= 6;							      \
> -      }									      \
> -    while (ntotal > 1);							      \
> -									      \
> -    bytebuf[0] |= wch;							      \
> -  }
> +/* These definitions apply to the UTF-8 to UTF-32 direction.  The
> +   software implementation for UTF-8 still supports multibyte
> +   characters up to 6 bytes whereas the hardware variant does not.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_c
>
> -#define CLEAR_STATE \
> -  state->__count = 0
> +#define LOOP_NEED_FLAGS
>
> +#define STORE_REST		STORE_REST_COMMON
> +#define UNPACK_BYTES		UNPACK_BYTES_COMMON
> +#define CLEAR_STATE		CLEAR_STATE_COMMON
> +#define BODY			BODY_FROM_C
>   #include <iconv/loop.c>
>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_etf3eh
> +
> +#define LOOP_NEED_FLAGS
> +
> +#define STORE_REST		STORE_REST_COMMON
> +#define UNPACK_BYTES		UNPACK_BYTES_COMMON
> +#define CLEAR_STATE		CLEAR_STATE_COMMON
> +#define BODY			BODY_FROM_ETF3EH
> +#include <iconv/loop.c>
> +
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define LOOPFCT		__from_utf8_loop_vx
> +
> +# define LOOP_NEED_FLAGS
> +
> +# define STORE_REST		STORE_REST_COMMON
> +# define UNPACK_BYTES		UNPACK_BYTES_COMMON
> +# define CLEAR_STATE		CLEAR_STATE_COMMON
> +# define BODY			BODY_FROM_VX
> +# include <iconv/loop.c>
> +#endif
> +
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__from_utf8_loop_c)
> +__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
> +__from_utf8_loop;
> +
> +static void *
> +__from_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __from_utf8_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __from_utf8_loop_etf3eh;
> +  else
> +    return __from_utf8_loop_c;
> +}
> +
> +strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
> +
> +
>   /* Conversion from UTF-32 internal/BE to UTF-8.  */
> +#define BODY_TO_HW(ASM)							\
> +  {									\
> +    ASM;								\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +    if (inptr + 4 > inend)						\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> +  }
> +
> +/* The hardware routine uses the S/390 cu41 instruction.  */
> +#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
> +
> +/* The hardware routine uses the S/390 vector and cu41 instructions.  */
> +#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
>
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			TO_LOOP
>   /* The software routine mimics the S/390 cu41 instruction.  */
> -#define BODY							\
> +#define BODY_TO_C						\
>     {								\
> -    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)			\
> -      {								\
> -	HARDWARE_CONVERT ("cu41 %0, %1");			\
> -								\
> -	if (inptr != inend)					\
> -	  {							\
> -	    result = __GCONV_INCOMPLETE_INPUT;			\
> -	    break;						\
> -	  }							\
> -	continue;						\
> -      }								\
> -								\
>       uint32_t wc = *((const uint32_t *) inptr);			\
>   								\
> -    if (__glibc_likely (wc <= 0x7f))					      \
> +    if (__glibc_likely (wc <= 0x7f))				\
>         {								\
> -        /* Single UTF-8 char.  */				\
> -        *outptr = (uint8_t)wc;					\
> +	/* Single UTF-8 char.  */				\
> +	*outptr = (uint8_t)wc;					\
>   	outptr++;						\
>         }								\
>       else if (wc <= 0x7ff)					\
>         {								\
> -        /* Two UTF-8 chars.  */					\
> -        if (__glibc_unlikely (outptr + 2 > outend))			      \
> +	/* Two UTF-8 chars.  */					\
> +	if (__glibc_unlikely (outptr + 2 > outend))		\
>   	  {							\
>   	    /* Overflow in the output buffer.  */		\
>   	    result = __GCONV_FULL_OUTPUT;			\
>   	    break;						\
>   	  }							\
>   								\
> -        outptr[0] = 0xc0;					\
> +	outptr[0] = 0xc0;					\
>   	outptr[0] |= wc >> 6;					\
>   								\
>   	outptr[1] = 0x80;					\
> @@ -459,12 +616,18 @@ gconv_end (struct __gconv_step *data)
>       else if (wc <= 0xffff)					\
>         {								\
>   	/* Three UTF-8 chars.  */				\
> -	if (__glibc_unlikely (outptr + 3 > outend))			      \
> +	if (__glibc_unlikely (outptr + 3 > outend))		\
>   	  {							\
>   	    /* Overflow in the output buffer.  */		\
>   	    result = __GCONV_FULL_OUTPUT;			\
>   	    break;						\
>   	  }							\
> +	if (wc >= 0xd800 && wc < 0xdc00)			\
> +	  {							\
> +	    /* Do not accept UTF-16 surrogates.   */		\
> +	    result = __GCONV_ILLEGAL_INPUT;			\
> +	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
> +	  }							\
>   	outptr[0] = 0xe0;					\
>   	outptr[0] |= wc >> 12;					\
>   								\
> @@ -479,7 +642,7 @@ gconv_end (struct __gconv_step *data)
>         else if (wc <= 0x10ffff)					\
>   	{							\
>   	  /* Four UTF-8 chars.  */				\
> -	  if (__glibc_unlikely (outptr + 4 > outend))			      \
> +	  if (__glibc_unlikely (outptr + 4 > outend))		\
>   	    {							\
>   	      /* Overflow in the output buffer.  */		\
>   	      result = __GCONV_FULL_OUTPUT;			\
> @@ -505,7 +668,140 @@ gconv_end (struct __gconv_step *data)
>   	}							\
>       inptr += 4;							\
>     }
> +
> +#define HW_TO_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2;						\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  "vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
> +		  "vzero %%v21\n\t"					\
> +		  "vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
> +		  "vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
> +		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
> +		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
> +		  "lghi %[R_TMP],0\n\t"					\
> +		  /* Shorten to byte values.  */			\
> +		  "vpkf %%v23,%%v16,%%v17\n\t"				\
> +		  "vpkf %%v24,%%v18,%%v19\n\t"				\
> +		  "vpkh %%v23,%%v23,%%v24\n\t"				\
> +		  /* Checking for values > 0x7f.  */			\
> +		  "vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"			\
> +		  "jno 10f\n\t"						\
> +		  "vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"			\
> +		  "jno 11f\n\t"						\
> +		  "vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"			\
> +		  "jno 12f\n\t"						\
> +		  "vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"			\
> +		  "jno 13f\n\t"						\
> +		  /* Store 16bytes to outptr.  */			\
> +		  "vst %%v23,0(%[R_OUT])\n\t"				\
> +		  "aghi %[R_INLEN],-64\n\t"				\
> +		  "aghi %[R_OUTLEN],-16\n\t"				\
> +		  "la %[R_IN],64(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],64,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Found a value > 0x7f.  */				\
> +		  "13: ahi %[R_TMP],4\n\t"				\
> +		  "12: ahi %[R_TMP],4\n\t"				\
> +		  "11: ahi %[R_TMP],4\n\t"				\
> +		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
> +		  "srlg %[R_I],%[R_I],2\n\t"				\
> +		  "agr %[R_I],%[R_TMP]\n\t"				\
> +		  "je 20f\n\t"						\
> +		  /* Store characters before invalid one...  */		\
> +		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
> +		  "15: aghi %[R_I],-1\n\t"				\
> +		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
> +		  /* ... and update pointers.  */			\
> +		  "aghi %[R_I],1\n\t"					\
> +		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
> +		  "sllg %[R_I],%[R_I],2\n\t"				\
> +		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_I]\n\t"				\
> +		  /* Handle multibyte utf8-char with convert instruction. */ \
> +		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implementation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
> +		    , [R_I] "=a" (tmp2)					\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
> +		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
> +		    ASM_CLOBBER_VR ("v24")				\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +  }
> +
> +/* Generate loop-function with software routine.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf8_loop_c
> +#define BODY			BODY_TO_C
> +#define LOOP_NEED_FLAGS
> +#include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#define LOOPFCT			__to_utf8_loop_etf3eh
>   #define LOOP_NEED_FLAGS
> +#define BODY			BODY_TO_ETF3EH
>   #include <iconv/loop.c>
>
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector and utf-convert instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +# define LOOPFCT		__to_utf8_loop_vx
> +# define BODY			BODY_TO_VX
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +#endif
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__to_utf8_loop_c)
> +__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
> +__to_utf8_loop;
> +
> +static void *
> +__to_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __to_utf8_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __to_utf8_loop_etf3eh;
> +  else
> +    return __to_utf8_loop_c;
> +}
> +
> +strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
> +
> +
>   #include <iconv/skeleton.c>
>

[-- Attachment #2: 0007-S390-Optimize-utf8-utf32-module.patch --]
[-- Type: text/x-patch, Size: 29531 bytes --]

From 12647b88906a3e8bfff9deae09c54fa2c933304a Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 07/14] S390: Optimize utf8-utf32 module.

This patch reworks the s390-specific module that converts between UTF-8 and
UTF-32.  ifunc is now used to choose either the C or the etf3eh (with
convert-utf instruction) variant at runtime.
Furthermore, a new vector variant for z13 is introduced, which is built and
chosen if vector support is available at build time and at runtime,
respectively.
The vector variants optimize the handling of 1-byte UTF-8 characters.  The
convert-utf instruction is used once a multibyte UTF-8 character is found.
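
The selection order described above can be sketched in portable C.  This is
only an illustration, not the code from the patch: the SKETCH_HWCAP_* bit
values, the stub loop functions and select_loop are hypothetical names, and
a plain function stands in for the real ifunc resolver, which receives
dl_hwcap from the dynamic loader.

```c
/* Hypothetical capability bits for this sketch only; the real values are
   HWCAP_S390_ETF3EH and HWCAP_S390_VX from the s390 headers.  */
#define SKETCH_HWCAP_ETF3EH (1UL << 0)
#define SKETCH_HWCAP_VX     (1UL << 1)

typedef const char *(*loop_fn) (void);

/* Stand-ins for the three generated loop functions; each one only
   reports its own name.  */
static const char *loop_c (void) { return "c"; }
static const char *loop_etf3eh (void) { return "etf3eh"; }
static const char *loop_vx (void) { return "vx"; }

/* Prefer the vector variant, then the convert-utf variant, and fall
   back to the plain C variant.  */
static loop_fn
select_loop (unsigned long hwcap)
{
  if (hwcap & SKETCH_HWCAP_VX)
    return loop_vx;
  if (hwcap & SKETCH_HWCAP_ETF3EH)
    return loop_etf3eh;
  return loop_c;
}
```

With no capability bits set, select_loop returns the C stub; with both bits
set, the vector stub wins, which mirrors the order of the checks in the
resolvers added by the patch.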

This patch also fixes some whitespace errors.  The C variants now reject
UTF-16 surrogates and values above 0x10ffff.
Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
Previously they did not honor the ignore flag and always stopped at an error.
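
As a minimal sketch of the range check described above, the following
helper accepts exactly the Unicode scalar values.  The function name
is_valid_scalar is hypothetical and does not appear in the patch, which
performs the corresponding comparisons inline in the loop bodies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if CH is neither a UTF-16 surrogate (0xd800..0xdfff)
   nor larger than 0x10ffff.  Hypothetical helper for illustration.  */
static bool
is_valid_scalar (uint32_t ch)
{
  if (ch >= 0xd800 && ch <= 0xdfff)
    return false;
  if (ch > 0x10ffff)
    return false;
  return true;
}
```

Under these assumptions, 0xd7ff and 0xe000 are accepted while 0xd800,
0xdfff and 0x110000 are rejected.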

ChangeLog:

	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Use ifunc to select c, etf3eh
	or new vector loop-variant.
---
 sysdeps/s390/s390-64/utf8-utf32-z9.c | 664 +++++++++++++++++++++++++----------
 1 file changed, 480 insertions(+), 184 deletions(-)

diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
index defd47d..f9c9199 100644
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ b/sysdeps/s390/s390-64/utf8-utf32-z9.c
@@ -30,35 +30,25 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
-/* UTF-32 big endian byte order mark.  */
-#define BOM	                0x0000feffu
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
 
+/* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		6
 #define MIN_NEEDED_TO		4
-#define FROM_LOOP		from_utf8_loop
-#define TO_LOOP			to_utf8_loop
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
 #define FROM_DIRECTION		(dir == from_utf8)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			      \
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
 
 /* Direction of the transformation.  */
 enum direction
@@ -155,16 +145,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-		      "+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -173,49 +163,150 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
 /* Conversion function from UTF-8 to UTF-32 internal/BE.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY								\
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu14 %0, %1, 1");				\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
 									\
-	if (inptr != inend)						\
-	  {								\
-	    int i;							\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-	continue;							\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
       }									\
-									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
     /* Next input byte.  */						\
     uint32_t ch = *inptr;						\
 									\
-    if (__glibc_likely (ch < 0x80))					      \
+    if (__glibc_likely (ch < 0x80))					\
       {									\
 	/* One byte sequence.  */					\
 	++inptr;							\
@@ -233,30 +324,18 @@ gconv_end (struct __gconv_step *data)
 	    cnt = 2;							\
 	    ch &= 0x1f;							\
 	  }								\
-        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
 	  {								\
 	    /* We expect three bytes.  */				\
 	    cnt = 3;							\
 	    ch &= 0x0f;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
 	  {								\
 	    /* We expect four bytes.  */				\
 	    cnt = 4;							\
 	    ch &= 0x07;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-	  {								\
-	    /* We expect five bytes.  */				\
-	    cnt = 5;							\
-	    ch &= 0x03;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xfe) == 0xfc))			      \
-	  {								\
-	    /* We expect six bytes.  */					\
-	    cnt = 6;							\
-	    ch &= 0x01;							\
-	  }								\
 	else								\
 	  {								\
 	    /* Search the end of this ill-formed UTF-8 character.  This	\
@@ -272,7 +351,7 @@ gconv_end (struct __gconv_step *data)
 	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
 	  }								\
 									\
-	if (__glibc_unlikely (inptr + cnt > inend))			      \
+	if (__glibc_unlikely (inptr + cnt > inend))			\
 	  {								\
 	    /* We don't have enough input.  But before we report	\
 	       that check that all the bytes are correct.  */		\
@@ -280,7 +359,7 @@ gconv_end (struct __gconv_step *data)
 	      if ((inptr[i] & 0xc0) != 0x80)				\
 		break;							\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
+	    if (__glibc_likely (inptr + i == inend))			\
 	      {								\
 		result = __GCONV_INCOMPLETE_INPUT;			\
 		break;							\
@@ -305,7 +384,10 @@ gconv_end (struct __gconv_step *data)
 	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
 	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
 	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0))		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
 	  {								\
 	    /* This is an illegal encoding.  */				\
 	    goto errout;						\
@@ -318,137 +400,212 @@ gconv_end (struct __gconv_step *data)
     *((uint32_t *) outptr) = ch;					\
     outptr += sizeof (uint32_t);					\
   }
-#define LOOP_NEED_FLAGS
 
-#define STORE_REST							\
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    aghi %[R_OUTLEN],-64\n\t"			\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "    vstm %%v20,%%v23,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10: \n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest \
+						     index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  "    vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11: \n\t"						\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+		    ASM_CLOBBER_VR ("v31")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
   }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
 
-#define UNPACK_BYTES \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
 
-#define CLEAR_STATE \
-  state->__count = 0
+#define LOOP_NEED_FLAGS
 
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
 #include <iconv/loop.c>
 
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
 /* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY							\
+#define BODY_TO_C						\
   {								\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)			\
-      {								\
-	HARDWARE_CONVERT ("cu41 %0, %1");			\
-								\
-	if (inptr != inend)					\
-	  {							\
-	    result = __GCONV_INCOMPLETE_INPUT;			\
-	    break;						\
-	  }							\
-	continue;						\
-      }								\
-								\
     uint32_t wc = *((const uint32_t *) inptr);			\
 								\
-    if (__glibc_likely (wc <= 0x7f))					      \
+    if (__glibc_likely (wc <= 0x7f))				\
       {								\
-        /* Single UTF-8 char.  */				\
-        *outptr = (uint8_t)wc;					\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
 	outptr++;						\
       }								\
     else if (wc <= 0x7ff)					\
       {								\
-        /* Two UTF-8 chars.  */					\
-        if (__glibc_unlikely (outptr + 2 > outend))			      \
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
 	  {							\
 	    /* Overflow in the output buffer.  */		\
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
 								\
-        outptr[0] = 0xc0;					\
+	outptr[0] = 0xc0;					\
 	outptr[0] |= wc >> 6;					\
 								\
 	outptr[1] = 0x80;					\
@@ -459,12 +616,18 @@ gconv_end (struct __gconv_step *data)
     else if (wc <= 0xffff)					\
       {								\
 	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))			      \
+	if (__glibc_unlikely (outptr + 3 > outend))		\
 	  {							\
 	    /* Overflow in the output buffer.  */		\
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
 	outptr[0] = 0xe0;					\
 	outptr[0] |= wc >> 12;					\
 								\
@@ -479,7 +642,7 @@ gconv_end (struct __gconv_step *data)
       else if (wc <= 0x10ffff)					\
 	{							\
 	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))			      \
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
 	    {							\
 	      /* Overflow in the output buffer.  */		\
 	      result = __GCONV_FULL_OUTPUT;			\
@@ -505,7 +668,140 @@ gconv_end (struct __gconv_step *data)
 	}							\
     inptr += 4;							\
   }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "    vzero %%v21\n\t"					\
+		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP],0\n\t"				\
+		  /* Shorten to byte values.  */			\
+		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		  "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		  /* Checking for values > 0x7f.  */			\
+		  "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		  "    jno 11f\n\t"					\
+		  "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		  "    jno 12f\n\t"					\
+		  "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		  "    jno 13f\n\t"					\
+		  /* Store 16bytes to outptr.  */			\
+		  "    vst %%v23,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_INLEN],-64\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_IN],64(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "    srlg %[R_I],%[R_I],2\n\t"			\
+		  "    agr %[R_I],%[R_TMP]\n\t"				\
+		  "    je 20f\n\t"					\
+		  /* Store characters before invalid one...  */		\
+		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		  /* ... and update pointers.  */			\
+		  "    aghi %[R_I],1\n\t"				\
+		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
+		  "    sllg %[R_I],%[R_I],2\n\t"			\
+		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
 #include <iconv/skeleton.c>
-- 
2.5.5


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/14] S390: Optimize utf8-utf16 module.
  2016-02-23  9:22 ` [PATCH 08/14] S390: Optimize utf8-utf16 module Stefan Liebler
@ 2016-04-21 15:20   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:20 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 29302 bytes --]

Here is an updated patch in which the labels in the inline assembly are
out-dented, as suggested by Florian.

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch reworks the s390-specific module that converts between UTF-8 and
> UTF-16. ifunc is now used to choose either the C or the etf3eh (convert-utf
> instruction) variant at runtime. Furthermore, a new vector variant for z13 is
> introduced, which is built and chosen if vector support is available at build
> time and at runtime.
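
The selection order described above can be sketched in plain C. This is only
an illustration, not the glibc code: the capability bits CAP_VX and
CAP_ETF3EH, the enum, and select_loop() are hypothetical names, and a real
ifunc resolver receives dl_hwcap from the dynamic loader rather than being
called directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, for illustration only.  The real
   HWCAP_S390_* values come from the platform headers.  */
#define CAP_ETF3EH 0x1UL
#define CAP_VX     0x2UL

enum variant { VARIANT_C, VARIANT_ETF3EH, VARIANT_VX };

typedef enum variant (*loop_fn) (void);

/* Stand-ins for the three generated loop functions.  */
static enum variant loop_c (void) { return VARIANT_C; }
static enum variant loop_etf3eh (void) { return VARIANT_ETF3EH; }
static enum variant loop_vx (void) { return VARIANT_VX; }

/* Prefer the vector variant, then the convert-utf instruction,
   then the plain C fallback.  built_with_vx models the build-time
   check; hwcap models the runtime check.  */
static loop_fn
select_loop (unsigned long hwcap, int built_with_vx)
{
  assert (built_with_vx == 0 || built_with_vx == 1);
  if (built_with_vx && (hwcap & CAP_VX))
    return loop_vx;
  if (hwcap & CAP_ETF3EH)
    return loop_etf3eh;
  return loop_c;
}
```

The point of resolving once is that the per-call capability test, which the
old BODY macro performed on every loop iteration, disappears from the hot
path.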
>
> When converting from UTF-8 to UTF-16, the vector variant optimizes input of
> 1-byte UTF-8 characters. The convert-utf instruction is used if a multibyte
> UTF-8 character is found.
>
> For the other direction, UTF-16 to UTF-8, the cu21 instruction can't be
> re-enabled, because it does not report an error if the input stream consists
> of a single low-surrogate UTF-16 char (e.g. 0xdc00). This applies to the
> newest z13, too. Thus there is only the C variant or the new vector variant,
> which can handle 1..4 byte UTF-8 characters.
>
> The C variant from UTF-16 to UTF-8 has been fixed. If a high surrogate was at
> the end of the input buffer, then errno was set to EINVAL and the input
> pointer pointed just after the high surrogate. Now it points to the beginning
> of the high surrogate.
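
To illustrate the pointer behaviour described above, here is a small sketch,
assuming host-order UTF-16 code units in an array. scan_utf16() and the
status enum are hypothetical and are not part of the patch; the sketch only
shows where the "consumed" position ends up when a high surrogate is the last
unit in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum status { ST_OK, ST_INCOMPLETE, ST_ILLEGAL };

/* Scan UTF-16 code units and report in *consumed how many units
   belong to complete, valid characters.  */
static enum status
scan_utf16 (const uint16_t *in, size_t n, size_t *consumed)
{
  assert (in != NULL && consumed != NULL);
  size_t i = 0;
  while (i < n)
    {
      uint16_t u = in[i];
      if (u >= 0xd800 && u <= 0xdbff)
        {
          /* High surrogate: needs a following low surrogate.  */
          if (i + 1 == n)
            {
              /* Leave the position AT the high surrogate so the
                 caller can retry once more input arrives.  */
              *consumed = i;
              return ST_INCOMPLETE;
            }
          if (in[i + 1] < 0xdc00 || in[i + 1] > 0xdfff)
            {
              *consumed = i;
              return ST_ILLEGAL;
            }
          i += 2;
        }
      else if (u >= 0xdc00 && u <= 0xdfff)
        {
          /* A lone low surrogate is never valid.  */
          *consumed = i;
          return ST_ILLEGAL;
        }
      else
        ++i;
    }
  *consumed = i;
  return ST_OK;
}
```

Leaving the position on the high surrogate means the caller can append more
input and resume without losing half of a surrogate pair.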
>
> This patch also fixes some whitespace errors. The C variant from UTF-8 to
> UTF-16 now checks that tail bytes start with 0b10... and that the value is
> not in the range of a UTF-16 surrogate.
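
The checks named above (tail bytes of the form 0b10xxxxxx, no overlong forms,
no surrogates) can be sketched as a standalone decoder. decode_utf8() is a
hypothetical helper written for this illustration; it reuses the
"ch >> (5 * cnt - 4)" overlong test that appears in the diff, but it treats
truncated input as a plain error instead of reporting incomplete input, and
it decodes to a code point instead of emitting UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 character from p[0..n-1] into *out.
   Returns the number of bytes used, or -1 on any error.  */
static int
decode_utf8 (const unsigned char *p, size_t n, uint32_t *out)
{
  assert (p != NULL && out != NULL && n > 0);
  uint32_t ch = p[0];
  size_t cnt, i;

  if (ch < 0x80)
    {
      *out = ch;
      return 1;
    }
  else if (ch >= 0xc2 && ch < 0xe0)
    {
      /* 0xc0 and 0xc1 would be overlong 2-byte forms.  */
      cnt = 2;
      ch &= 0x1f;
    }
  else if ((ch & 0xf0) == 0xe0)
    {
      cnt = 3;
      ch &= 0x0f;
    }
  else if ((ch & 0xf8) == 0xf0)
    {
      cnt = 4;
      ch &= 0x07;
    }
  else
    return -1;

  if (n < cnt)
    return -1;		/* Sketch only: truncated input is an error.  */

  for (i = 1; i < cnt; ++i)
    {
      /* Every tail byte must be >= 0x80 and < 0xc0.  */
      if ((p[i] & 0xc0) != 0x80)
	return -1;
      ch = (ch << 6) | (p[i] & 0x3f);
    }

  /* Overlong: the value would have fit in fewer than cnt bytes.  */
  if (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)
    return -1;
  /* UTF-16 surrogates and values above U+10FFFF are not valid.  */
  if ((ch >= 0xd800 && ch <= 0xdfff) || ch > 0x10ffff)
    return -1;

  *out = ch;
  return (int) cnt;
}
```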
>
> Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
> Previously they disregarded the //IGNORE suffix and always stopped at an error.
>
> ChangeLog:
>
> 	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Use ifunc to select c,
> 	etf3eh or new vector loop-variant.
> ---
>   sysdeps/s390/s390-64/utf8-utf16-z9.c | 547 ++++++++++++++++++++++++++++-------
>   1 file changed, 441 insertions(+), 106 deletions(-)
>
> diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
> index 4148ed7..76625d0 100644
> --- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
> +++ b/sysdeps/s390/s390-64/utf8-utf16-z9.c
> @@ -30,33 +30,27 @@
>   #include <dl-procinfo.h>
>   #include <gconv.h>
>
> -/* UTF-16 big endian byte order mark.  */
> -#define BOM_UTF16	0xfeff
> +#if defined HAVE_S390_VX_GCC_SUPPORT
> +# define ASM_CLOBBER_VR(NR) , NR
> +#else
> +# define ASM_CLOBBER_VR(NR)
> +#endif
>
> +/* Defines for skeleton.c.  */
>   #define DEFINE_INIT		0
>   #define DEFINE_FINI		0
>   #define MIN_NEEDED_FROM		1
>   #define MAX_NEEDED_FROM		4
>   #define MIN_NEEDED_TO		2
>   #define MAX_NEEDED_TO		4
> -#define FROM_LOOP		from_utf8_loop
> -#define TO_LOOP			to_utf8_loop
> +#define FROM_LOOP		__from_utf8_loop
> +#define TO_LOOP			__to_utf8_loop
>   #define FROM_DIRECTION		(dir == from_utf8)
>   #define ONE_DIRECTION           0
> -#define PREPARE_LOOP							\
> -  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> -  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> -									\
> -  if (emit_bom && !data->__internal_use					\
> -      && data->__invocation_counter == 0)				\
> -    {									\
> -      /* Emit the UTF-16 Byte Order Mark.  */				\
> -      if (__glibc_unlikely (outbuf + 2 > outend))			      \
> -	return __GCONV_FULL_OUTPUT;					\
> -									\
> -      put16u (outbuf, BOM_UTF16);					\
> -      outbuf += 2;							\
> -    }
> +
> +
> +/* UTF-16 big endian byte order mark.  */
> +#define BOM_UTF16	0xfeff
>
>   /* Direction of the transformation.  */
>   enum direction
> @@ -151,16 +145,16 @@ gconv_end (struct __gconv_step *data)
>       register unsigned long long outlen __asm__("11") = outend - outptr;	\
>       uint64_t cc = 0;							\
>   									\
> -    __asm__ volatile (".machine push       \n\t"			\
> -		      ".machine \"z9-109\" \n\t"			\
> -		      "0: " INSTRUCTION "  \n\t"			\
> -		      ".machine pop        \n\t"			\
> -		      "   jo     0b        \n\t"			\
> -		      "   ipm    %2        \n"				\
> -		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -			"+d" (outlen), "+d" (inlen)			\
> -		      :							\
> -		      : "cc", "memory");				\
> +    __asm__ __volatile__ (".machine push       \n\t"			\
> +			  ".machine \"z9-109\" \n\t"			\
> +			  "0: " INSTRUCTION "  \n\t"			\
> +			  ".machine pop        \n\t"			\
> +			  "   jo     0b        \n\t"			\
> +			  "   ipm    %2        \n"			\
> +			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> +			    "+d" (outlen), "+d" (inlen)			\
> +			  :						\
> +			  : "cc", "memory");				\
>   									\
>       inptr = pInput;							\
>       outptr = pOutput;							\
> @@ -169,50 +163,135 @@ gconv_end (struct __gconv_step *data)
>       if (cc == 1)							\
>         {									\
>   	result = __GCONV_FULL_OUTPUT;					\
> -	break;								\
>         }									\
>       else if (cc == 2)							\
>         {									\
>   	result = __GCONV_ILLEGAL_INPUT;					\
> -	break;								\
>         }									\
>     }
>
> +#define PREPARE_LOOP							\
> +  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
> +  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
> +									\
> +  if (emit_bom && !data->__internal_use					\
> +      && data->__invocation_counter == 0)				\
> +    {									\
> +      /* Emit the UTF-16 Byte Order Mark.  */				\
> +      if (__glibc_unlikely (outbuf + 2 > outend))			\
> +	return __GCONV_FULL_OUTPUT;					\
> +									\
> +      put16u (outbuf, BOM_UTF16);					\
> +      outbuf += 2;							\
> +    }
> +
>   /* Conversion function from UTF-8 to UTF-16.  */
> +#define BODY_FROM_HW(ASM)						\
> +  {									\
> +    ASM;								\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result == __GCONV_FULL_OUTPUT)				\
> +      break;								\
> +									\
> +    int i;								\
> +    for (i = 1; inptr + i < inend && i < 5; ++i)			\
> +      if ((inptr[i] & 0xc0) != 0x80)					\
> +	break;								\
> +									\
> +    if (__glibc_likely (inptr + i == inend				\
> +			&& result == __GCONV_EMPTY_INPUT))		\
> +      {									\
> +	result = __GCONV_INCOMPLETE_INPUT;				\
> +	break;								\
> +      }									\
> +    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
> +  }
> +
> +#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
> +
> +#define HW_FROM_VX							\
> +  {									\
> +    register const unsigned char* pInput asm ("8") = inptr;		\
> +    register size_t inlen asm ("9") = inend - inptr;			\
> +    register unsigned char* pOutput asm ("10") = outptr;		\
> +    register size_t outlen asm("11") = outend - outptr;			\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  "vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */	\
> +		  "vrepib %%v31,0x20\n\t"				\
> +		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
> +		  "0: clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
> +		  "1: vl %%v16,0(%[R_IN])\n\t"				\
> +		  "vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  /* Enlarge to UTF-16.  */				\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "la %[R_IN],16(%[R_IN])\n\t"				\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "aghi %[R_INLEN],-16\n\t"				\
> +		  /* Store 32 bytes to buf_out.  */			\
> +		  "vstm %%v18,%%v19,0(%[R_OUT])\n\t"			\
> +		  "aghi %[R_OUTLEN],-32\n\t"				\
> +		  "la %[R_OUT],32(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],16,20f\n\t"			\
> +		  "clgijl %[R_OUTLEN],32,20f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  "10:\n\t"						\
> +		  /* At least one byte is > 0x7f.			\
> +		     Store the preceding 1-byte chars.  */		\
> +		  "vlgvb %[R_TMP],%%v17,7\n\t"				\
> +		  "sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest	\
> +						     index to store. */ \
> +		  "llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
> +		  "ahi %[R_TMP2],-1\n\t"				\
> +		  "jl 20f\n\t"						\
> +		  "vuplhb %%v18,%%v16\n\t"				\
> +		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  "ahi %[R_TMP2],-16\n\t"				\
> +		  "jl 11f\n\t"						\
> +		  "vupllb %%v19,%%v16\n\t"				\
> +		  "vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
> +		  "11:\n\t" /* Update pointers.  */			\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  /* Handle multibyte utf8-char with convert instruction. */ \
> +		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
> +		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
> +		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> +		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (pInput)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    inptr = pInput;							\
> +    outptr = pOutput;							\
> +  }
> +#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
> +
>
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> -#define LOOPFCT			FROM_LOOP
>   /* The software implementation is based on the code in gconv_simple.c.  */
> -#define BODY								\
> +#define BODY_FROM_C							\
>     {									\
> -    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
> -      {									\
> -	HARDWARE_CONVERT ("cu12 %0, %1, 1");				\
> -									\
> -	if (inptr != inend)						\
> -	  {								\
> -	    int i;							\
> -	    for (i = 1; inptr + i < inend; ++i)				\
> -	      if ((inptr[i] & 0xc0) != 0x80)				\
> -		break;							\
> -								\
> -	    if (__glibc_likely (inptr + i == inend))			      \
> -	      {								\
> -		result = __GCONV_INCOMPLETE_INPUT;			\
> -		break;							\
> -	      }								\
> -	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
> -	  }								\
> -	continue;							\
> -    }									\
> -									\
>       /* Next input byte.  */						\
>       uint16_t ch = *inptr;						\
>   									\
> -    if (__glibc_likely (ch < 0x80))					      \
> +    if (__glibc_likely (ch < 0x80))					\
>         {									\
>   	/* One byte sequence.  */					\
>   	++inptr;							\
> @@ -230,13 +309,13 @@ gconv_end (struct __gconv_step *data)
>   	    cnt = 2;							\
>   	    ch &= 0x1f;							\
>   	  }								\
> -        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
> +	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
>   	  {								\
>   	    /* We expect three bytes.  */				\
>   	    cnt = 3;							\
>   	    ch &= 0x0f;							\
>   	  }								\
> -	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
> +	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
>   	  {								\
>   	    /* We expect four bytes.  */				\
>   	    cnt = 4;							\
> @@ -257,7 +336,7 @@ gconv_end (struct __gconv_step *data)
>   	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
>   	  }								\
>   									\
> -	if (__glibc_unlikely (inptr + cnt > inend))			      \
> +	if (__glibc_unlikely (inptr + cnt > inend))			\
>   	  {								\
>   	    /* We don't have enough input.  But before we report	\
>   	       that check that all the bytes are correct.  */		\
> @@ -265,7 +344,7 @@ gconv_end (struct __gconv_step *data)
>   	      if ((inptr[i] & 0xc0) != 0x80)				\
>   		break;							\
>   									\
> -	    if (__glibc_likely (inptr + i == inend))			      \
> +	    if (__glibc_likely (inptr + i == inend))			\
>   	      {								\
>   		result = __GCONV_INCOMPLETE_INPUT;			\
>   		break;							\
> @@ -280,23 +359,31 @@ gconv_end (struct __gconv_step *data)
>   	       low) are needed.  */					\
>   	    uint16_t zabcd, high, low;					\
>   									\
> -	    if (__glibc_unlikely (outptr + 4 > outend))			      \
> +	    if (__glibc_unlikely (outptr + 4 > outend))			\
>   	      {								\
>   		/* Overflow in the output buffer.  */			\
>   		result = __GCONV_FULL_OUTPUT;				\
>   		break;							\
>   	      }								\
>   									\
> +	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
> +	    for (i = 1; i < cnt; ++i)					\
> +	      {								\
> +		if ((inptr[i] & 0xc0) != 0x80)				\
> +		  /* This is an illegal encoding.  */			\
> +		  goto errout;						\
> +	      }								\
> +									\
>   	    /* See Principles of Operations cu12.  */			\
>   	    zabcd = (((inptr[0] & 0x7) << 2) |				\
> -                     ((inptr[1] & 0x30) >> 4)) - 1;			\
> +		     ((inptr[1] & 0x30) >> 4)) - 1;			\
>   									\
>   	    /* z-bit must be zero after subtracting 1.  */		\
>   	    if (zabcd & 0x10)						\
>   	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
>   									\
>   	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
> -	    high |= zabcd << 6;	                        /* abcd bits */	\
> +	    high |= zabcd << 6;                         /* abcd bits */	\
>   	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
>   	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
>   									\
> @@ -326,8 +413,19 @@ gconv_end (struct __gconv_step *data)
>   		ch <<= 6;						\
>   		ch |= byte & 0x3f;					\
>   	      }								\
> -	    inptr += cnt;						\
>   									\
> +	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
> +	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
> +	       have been represented with fewer than cnt bytes.  */	\
> +	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
> +		/* Do not accept UTF-16 surrogates.  */			\
> +		|| (ch >= 0xd800 && ch <= 0xdfff))			\
> +	      {								\
> +		/* This is an illegal encoding.  */			\
> +		goto errout;						\
> +	      }								\
> +									\
> +	    inptr += cnt;						\
>   	  }								\
>         }									\
>       /* Now adjust the pointers and store the result.  */		\
> @@ -335,43 +433,70 @@ gconv_end (struct __gconv_step *data)
>       outptr += sizeof (uint16_t);					\
>     }
>
> +/* Generate loop-function with software implementation.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_c
> +#define LOOP_NEED_FLAGS
> +#define BODY			BODY_FROM_C
> +#include <iconv/loop.c>
> +
> +/* Generate loop-function with hardware utf-convert instruction.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> +#define LOOPFCT			__from_utf8_loop_etf3eh
>   #define LOOP_NEED_FLAGS
> +#define BODY			BODY_FROM_ETF3EH
>   #include <iconv/loop.c>
>
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +/* Generate loop-function with hardware vector and utf-convert instructions.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
> +# define LOOPFCT		__from_utf8_loop_vx
> +# define LOOP_NEED_FLAGS
> +# define BODY			BODY_FROM_VX
> +# include <iconv/loop.c>
> +#endif
> +
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__from_utf8_loop_c)
> +__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
> +__from_utf8_loop;
> +
> +static void *
> +__from_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __from_utf8_loop_vx;
> +  else
> +#endif
> +  if (dl_hwcap & HWCAP_S390_ETF3EH)
> +    return __from_utf8_loop_etf3eh;
> +  else
> +    return __from_utf8_loop_c;
> +}
> +
> +strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
> +
>   /* Conversion from UTF-16 to UTF-8.  */
>
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			TO_LOOP
>   /* The software routine is based on the functionality of the S/390
>      hardware instruction (cu21) as described in the Principles of
>      Operation.  */
> -#define BODY								\
> +#define BODY_TO_C							\
>     {									\
> -    /* The hardware instruction currently fails to report an error for	\
> -       isolated low surrogates so we have to disable the instruction	\
> -       until this gets resolved.  */					\
> -    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
> -      {									\
> -	HARDWARE_CONVERT ("cu21 %0, %1, 1");				\
> -	if (inptr != inend)						\
> -	  {								\
> -	    /* Check if the third byte is				\
> -	       a valid start of a UTF-16 surrogate.  */			\
> -	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
> -	      STANDARD_TO_LOOP_ERR_HANDLER (3);				\
> -									\
> -	    result = __GCONV_INCOMPLETE_INPUT;				\
> -	    break;							\
> -	  }								\
> -	continue;							\
> -      }									\
> -									\
>       uint16_t c = get16 (inptr);						\
>   									\
> -    if (__glibc_likely (c <= 0x007f))					      \
> +    if (__glibc_likely (c <= 0x007f))					\
>         {									\
>   	/* Single byte UTF-8 char.  */					\
>   	*outptr = c & 0xff;						\
> @@ -379,20 +504,20 @@ gconv_end (struct __gconv_step *data)
>         }									\
>       else if (c >= 0x0080 && c <= 0x07ff)				\
>         {									\
> -        /* Two byte UTF-8 char.  */					\
> +	/* Two byte UTF-8 char.  */					\
>   									\
> -	if (__glibc_unlikely (outptr + 2 > outend))			      \
> +	if (__glibc_unlikely (outptr + 2 > outend))			\
>   	  {								\
>   	    /* Overflow in the output buffer.  */			\
>   	    result = __GCONV_FULL_OUTPUT;				\
>   	    break;							\
>   	  }								\
>   									\
> -        outptr[0] = 0xc0;						\
> -        outptr[0] |= c >> 6;						\
> +	outptr[0] = 0xc0;						\
> +	outptr[0] |= c >> 6;						\
>   									\
> -        outptr[1] = 0x80;						\
> -        outptr[1] |= c & 0x3f;						\
> +	outptr[1] = 0x80;						\
> +	outptr[1] |= c & 0x3f;						\
>   									\
>   	outptr += 2;							\
>         }									\
> @@ -400,7 +525,7 @@ gconv_end (struct __gconv_step *data)
>         {									\
>   	/* Three byte UTF-8 char.  */					\
>   									\
> -	if (__glibc_unlikely (outptr + 3 > outend))			      \
> +	if (__glibc_unlikely (outptr + 3 > outend))			\
>   	  {								\
>   	    /* Overflow in the output buffer.  */			\
>   	    result = __GCONV_FULL_OUTPUT;				\
> @@ -419,22 +544,22 @@ gconv_end (struct __gconv_step *data)
>         }									\
>       else if (c >= 0xd800 && c <= 0xdbff)				\
>         {									\
> -        /* Four byte UTF-8 char.  */					\
> +	/* Four byte UTF-8 char.  */					\
>   	uint16_t low, uvwxy;						\
>   									\
> -	if (__glibc_unlikely (outptr + 4 > outend))			      \
> +	if (__glibc_unlikely (outptr + 4 > outend))			\
>   	  {								\
>   	    /* Overflow in the output buffer.  */			\
>   	    result = __GCONV_FULL_OUTPUT;				\
>   	    break;							\
>   	  }								\
> -	inptr += 2;							\
> -	if (__glibc_unlikely (inptr + 2 > inend))			      \
> +	if (__glibc_unlikely (inptr + 4 > inend))			\
>   	  {								\
>   	    result = __GCONV_INCOMPLETE_INPUT;				\
>   	    break;							\
>   	  }								\
>   									\
> +	inptr += 2;							\
>   	low = get16 (inptr);						\
>   									\
>   	if ((low & 0xfc00) != 0xdc00)					\
> @@ -461,11 +586,221 @@ gconv_end (struct __gconv_step *data)
>         }									\
>       else								\
>         {									\
> -        STANDARD_TO_LOOP_ERR_HANDLER (2);				\
> +	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
>         }									\
>       inptr += 2;								\
>     }
> -#define LOOP_NEED_FLAGS
> -#include <iconv/loop.c>
> +
> +#define BODY_TO_VX							\
> +  {									\
> +    size_t inlen  = inend - inptr;					\
> +    size_t outlen  = outend - outptr;					\
> +    unsigned long tmp, tmp2, tmp3;					\
> +    asm volatile (".machine push\n\t"					\
> +		  ".machine \"z13\"\n\t"				\
> +		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> +		  /* Setup to check for values <= 0x7f.  */		\
> +		  "larl %[R_TMP],9f\n\t"				\
> +		  "vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
> +		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
> +		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
> +		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
> +		  "lghi %[R_TMP2],0\n\t"				\
> +		  /* Check for > 1byte UTF-8 chars.  */			\
> +		  "vstrchs %%v19,%%v16,%%v30,%%v31\n\t"			\
> +		  "jno 10f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  "vstrchs %%v19,%%v17,%%v30,%%v31\n\t"			\
> +		  "jno 11f\n\t" /* Jump away if not all bytes are 1byte	\
> +				   UTF8 chars.  */			\
> +		  /* Shorten to UTF-8.  */				\
> +		  "vpkh %%v18,%%v16,%%v17\n\t"				\
> +		  "la %[R_IN],32(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-32\n\t"				\
> +		  /* Store 16 bytes to buf_out.  */			\
> +		  "vst %%v18,0(%[R_OUT])\n\t"				\
> +		  "aghi %[R_OUTLEN],-16\n\t"				\
> +		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> +		  "clgijl %[R_INLEN],32,2f\n\t"				\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
> +		  "j 1b\n\t"						\
> +		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
> +		  "9: .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  ".short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
> +		  /* At least one byte is > 0x7f.			\
> +		     Store the preceding 1-byte chars.  */		\
> +		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
> +		  "10:\n\t"						\
> +		  "vlgvb %[R_TMP],%%v19,7\n\t"				\
> +		  /* Shorten to UTF-8.  */				\
> +		  "vpkh %%v18,%%v16,%%v17\n\t"				\
> +		  "ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
> +		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
> +		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> +		  "jl 13f\n\t"						\
> +		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
> +		  /* Update pointers.  */				\
> +		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
> +		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
> +		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> +		  "13:\n\t"						\
> +		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
> +		  "lghi %[R_TMP2],16\n\t"				\
> +		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
> +		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-2\n\t"				\
> +		  "j 22f\n\t"						\
> +		  /* Handle remaining bytes.  */			\
> +		  "2:\n\t"						\
> +		  /* Zero, one or more bytes available?  */		\
> +		  "clgfi %[R_INLEN],1\n\t"				\
> +		  "locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
> +		  "jle 99f\n\t" /* End if less than two bytes.  */	\
> +		  /* Calculate remaining uint16_t values in inptr.  */	\
> +		  "srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
> +		  /* Handle multibyte utf8-char. */			\
> +		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
> +		  "aghi %[R_INLEN],-2\n\t"				\
> +		  /* Test if ch is 1-byte UTF-8 char.  */		\
> +		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
> +		  /* Handle 1-byte UTF-8 char.  */			\
> +		  "31: slgfi %[R_OUTLEN],1\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 2-byte UTF-8 char.  */		\
> +		  "22: clfi %[R_TMP],0x7ff\n\t"				\
> +		  "jh 23f\n\t"						\
> +		  /* Handle 2-byte UTF-8 char.  */			\
> +		  "32: slgfi %[R_OUTLEN],2\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llill %[R_TMP3],0xc080\n\t"				\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
> +		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 3-byte UTF-8 char.  */		\
> +		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
> +		  "jh 24f\n\t"						\
> +		  /* Handle 3-byte UTF-8 char.  */			\
> +		  "33: slgfi %[R_OUTLEN],3\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llilf %[R_TMP3],0xe08080\n\t"			\
> +		  "la %[R_IN],2(%[R_IN])\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
> +		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 4-byte UTF-8 char.  */		\
> +		  "24: clfi %[R_TMP],0xdfff\n\t"			\
> +		  "jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */	\
> +		  "clfi %[R_TMP],0xdbff\n\t"				\
> +		  "locghih %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "jh 99f\n\t" /* Jump away if this is a low surrogate	\
> +				  without a preceding high surrogate.  */ \
> +		  /* Handle 4-byte UTF-8 char.  */			\
> +		  "34: slgfi %[R_OUTLEN],4\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "slgfi %[R_INLEN],2\n\t"				\
> +		  "locghil %[R_RES],%[RES_IN_FULL]\n\t"			\
> +		  "jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
> +		  "llilf %[R_TMP3],0xf0808080\n\t"			\
> +		  "aghi %[R_TMP],0x40\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */	\
> +		  "risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
> +		  "llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
> +		  "nilf %[R_TMP],0xfc00\n\t"				\
> +		  "clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
> +		  "locghine %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
> +		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "aghi %[R_TMP2],-2\n\t"				\
> +		  "jh 20b\n\t"						\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
> +		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
> +		  "99:\n\t"						\
> +		  ".machine pop"					\
> +		  : /* outputs */ [R_IN] "+a" (inptr)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
> +		    , [R_RES] "+d" (result)				\
> +		  : /* inputs */					\
> +		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
> +		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
> +		  : /* clobber list */ "memory", "cc"			\
> +		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
> +		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> +		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
> +		  );							\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result != __GCONV_ILLEGAL_INPUT)				\
> +      break;								\
> +									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
> +  }
> +
> +/* Generate loop-function with software implementation.  */
> +#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> +#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +#if defined HAVE_S390_VX_ASM_SUPPORT
> +# define LOOPFCT		__to_utf8_loop_c
> +# define BODY                   BODY_TO_C
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +
> +/* Generate loop-function with software implementation.  */
> +# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> +# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
> +# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> +# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> +# define LOOPFCT		__to_utf8_loop_vx
> +# define BODY                   BODY_TO_VX
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +
> +/* Generate ifunc'ed loop function.  */
> +__typeof(__to_utf8_loop_c)
> +__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
> +__to_utf8_loop;
> +
> +static void *
> +__to_utf8_loop_resolver (unsigned long int dl_hwcap)
> +{
> +  if (dl_hwcap & HWCAP_S390_VX)
> +    return __to_utf8_loop_vx;
> +  else
> +    return __to_utf8_loop_c;
> +}
> +
> +strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
> +
> +#else
> +# define LOOPFCT		TO_LOOP
> +# define BODY                   BODY_TO_C
> +# define LOOP_NEED_FLAGS
> +# include <iconv/loop.c>
> +#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
>
>   #include <iconv/skeleton.c>
>

[-- Attachment #2: 0008-S390-Optimize-utf8-utf16-module.patch --]
[-- Type: text/x-patch, Size: 28224 bytes --]

From 85018cce92ca04453f7b7177d6dd4100349f2ec3 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 08/14] S390: Optimize utf8-utf16 module.

This patch reworks the s390 specific module to convert between utf8 and utf16.
Now ifunc is used to choose either the c or etf3eh (with convert utf instruction)
variants at runtime. Furthermore a new vector variant for z13 is introduced
which will be built and chosen if vector support is available at build / runtime.
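
As a rough illustration of the selection order described above: the patch
itself uses __attribute__ ((ifunc)) and the real HWCAP_S390_* bits, whereas
the sketch below is plain C with hypothetical CAP_* bits, hypothetical loop_*
stand-ins and a hypothetical built_with_vx flag in place of the
HAVE_S390_VX_ASM_SUPPORT build check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the patch tests HWCAP_S390_ETF3EH and
   HWCAP_S390_VX instead.  */
#define CAP_ETF3EH (1UL << 0)
#define CAP_VX     (1UL << 1)

typedef const char *(*loop_fn) (void);

/* Stand-ins for the three loop variants; each one only reports its name.  */
const char *loop_c (void)      { return "c"; }
const char *loop_etf3eh (void) { return "etf3eh"; }
const char *loop_vx (void)     { return "vx"; }

/* Pick the most capable variant: vector first (only if it was built),
   then the convert-utf instruction, then the plain C fallback.  */
loop_fn
select_from_utf8_loop (unsigned long hwcap, int built_with_vx)
{
  loop_fn fn = loop_c;

  if (built_with_vx && (hwcap & CAP_VX))
    fn = loop_vx;
  else if (hwcap & CAP_ETF3EH)
    fn = loop_etf3eh;

  assert (fn != NULL);
  return fn;
}
```

With a real ifunc the choice is made once, when the symbol is resolved, so
callers pay no per-call dispatch cost; the function-pointer form above only
mirrors the decision logic.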

In case of converting utf8 to utf16, the vector variant optimizes input of
1byte utf8 characters. The convert utf instruction is used if a multibyte utf8
character is found.

For the other direction, utf16 to utf8, the cu21 instruction can't be re-enabled
because it does not report an error if the input-stream consists of a single
low surrogate utf16 char (e.g. 0xdc00). This applies to the newest z13, too.
Thus there is only the c or the new vector variant, which can handle 1..4 byte
utf8 characters.

The c variant from utf16 to utf8 has been fixed. If a high surrogate was at the
end of the input-buffer, then errno was set to EINVAL and the input-pointer
pointed just after the high surrogate. Now it points to the beginning of the
high surrogate.
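
The pointer behaviour described here can be sketched in isolation. This is a
simplified stand-alone decoder, not the BODY_TO_C macro: it works on an array
of uint16_t values instead of a byte stream, and the status codes and the
name decode_utf16 are hypothetical stand-ins for the __GCONV_* values and the
loop body. The point is that the input pointer is only advanced once the
whole character is known to be present and valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for the __GCONV_* values.  */
enum status { OK, INCOMPLETE_INPUT, ILLEGAL_INPUT };

/* Decode one character starting at *INP.  On success *INP is advanced past
   the character and *CP holds the code point.  On any error *INP still
   points at the first code unit of the offending character.  */
enum status
decode_utf16 (const uint16_t **inp, const uint16_t *inend, uint32_t *cp)
{
  const uint16_t *in = *inp;
  assert (in < inend);
  uint16_t c = in[0];

  if (c < 0xd800 || c > 0xdfff)
    {
      /* Not a surrogate: one code unit is one character.  */
      *cp = c;
      *inp = in + 1;
      return OK;
    }

  if (c >= 0xdc00)
    /* Low surrogate without a preceding high surrogate.  */
    return ILLEGAL_INPUT;

  /* High surrogate: make sure the low surrogate is available before
     consuming anything.  */
  if (inend - in < 2)
    return INCOMPLETE_INPUT;

  uint16_t low = in[1];
  if ((low & 0xfc00) != 0xdc00)
    return ILLEGAL_INPUT;

  *cp = 0x10000 + ((uint32_t) (c & 0x3ff) << 10) + (low & 0x3ff);
  *inp = in + 2;
  return OK;
}
```

A caller that gets INCOMPLETE_INPUT can refill its buffer and retry from the
unchanged pointer, which is what an iconv() user expects after EINVAL.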

This patch also fixes some whitespace errors. The c variant from utf8 to utf16
now checks that tail-bytes start with 0b10... and that the value is not in the
range of a utf16 surrogate.
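
The two checks named here, together with the "fewer than cnt bytes" test that
the patch adds to BODY_FROM_C, can be sketched as a stand-alone function. This
is an illustration under assumptions, not the macro itself: the name
utf8_decode_checked and its return convention are hypothetical, and it
reports every failure as 0 instead of distinguishing incomplete from illegal
input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the length (1..4) of the well-formed UTF-8 sequence at IN and
   store its code point in *CP, or return 0 if the sequence is truncated
   or ill-formed.  */
size_t
utf8_decode_checked (const unsigned char *in, size_t len, uint32_t *cp)
{
  assert (in != NULL && cp != NULL);
  if (len == 0)
    return 0;

  uint32_t ch = in[0];
  size_t cnt;

  if (ch < 0x80)
    {
      *cp = ch;
      return 1;
    }
  else if (ch >= 0xc2 && ch < 0xe0)
    { cnt = 2; ch &= 0x1f; }
  else if ((ch & 0xf0) == 0xe0)
    { cnt = 3; ch &= 0x0f; }
  else if ((ch & 0xf8) == 0xf0)
    { cnt = 4; ch &= 0x07; }
  else
    return 0;			/* Stray tail byte or invalid lead byte.  */

  if (len < cnt)
    return 0;

  for (size_t i = 1; i < cnt; ++i)
    {
      /* Every tail byte must look like 0b10xxxxxx.  */
      if ((in[i] & 0xc0) != 0x80)
	return 0;
      ch = (ch << 6) | (in[i] & 0x3f);
    }

  /* If cnt > 2 and ch < 2^(5*cnt-4), fewer bytes would have sufficed.  */
  if (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)
    return 0;

  /* Do not accept UTF-16 surrogates or values beyond U+10FFFF.  */
  if ((ch >= 0xd800 && ch <= 0xdfff) || ch > 0x10ffff)
    return 0;

  *cp = ch;
  return cnt;
}
```

For example, the three bytes 0xed 0xa0 0x80 have valid tail bytes but decode
to 0xd800, so they are rejected by the surrogate check rather than by the
tail-byte check.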

Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
Previously they ignored the ignore-case and always stopped at an error.

ChangeLog:

	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Use ifunc to select c,
	etf3eh or new vector loop-variant.
---
 sysdeps/s390/s390-64/utf8-utf16-z9.c | 547 ++++++++++++++++++++++++++++-------
 1 file changed, 441 insertions(+), 106 deletions(-)

diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
index 4148ed7..7520ef2 100644
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ b/sysdeps/s390/s390-64/utf8-utf16-z9.c
@@ -30,33 +30,27 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
 
+/* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		4
 #define MIN_NEEDED_TO		2
 #define MAX_NEEDED_TO		4
-#define FROM_LOOP		from_utf8_loop
-#define TO_LOOP			to_utf8_loop
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
 #define FROM_DIRECTION		(dir == from_utf8)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			      \
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
 
 /* Direction of the transformation.  */
 enum direction
@@ -151,16 +145,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			"+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -169,50 +163,135 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
 /* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v18,%%v19,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest \
+							 index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
 /* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY								\
+#define BODY_FROM_C							\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu12 %0, %1, 1");				\
-									\
-	if (inptr != inend)						\
-	  {								\
-	    int i;							\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-								\
-	    if (__glibc_likely (inptr + i == inend))			      \
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-	continue;							\
-    }									\
-									\
     /* Next input byte.  */						\
     uint16_t ch = *inptr;						\
 									\
-    if (__glibc_likely (ch < 0x80))					      \
+    if (__glibc_likely (ch < 0x80))					\
       {									\
 	/* One byte sequence.  */					\
 	++inptr;							\
@@ -230,13 +309,13 @@ gconv_end (struct __gconv_step *data)
 	    cnt = 2;							\
 	    ch &= 0x1f;							\
 	  }								\
-        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
 	  {								\
 	    /* We expect three bytes.  */				\
 	    cnt = 3;							\
 	    ch &= 0x0f;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
 	  {								\
 	    /* We expect four bytes.  */				\
 	    cnt = 4;							\
@@ -257,7 +336,7 @@ gconv_end (struct __gconv_step *data)
 	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
 	  }								\
 									\
-	if (__glibc_unlikely (inptr + cnt > inend))			      \
+	if (__glibc_unlikely (inptr + cnt > inend))			\
 	  {								\
 	    /* We don't have enough input.  But before we report	\
 	       that check that all the bytes are correct.  */		\
@@ -265,7 +344,7 @@ gconv_end (struct __gconv_step *data)
 	      if ((inptr[i] & 0xc0) != 0x80)				\
 		break;							\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
+	    if (__glibc_likely (inptr + i == inend))			\
 	      {								\
 		result = __GCONV_INCOMPLETE_INPUT;			\
 		break;							\
@@ -280,23 +359,31 @@ gconv_end (struct __gconv_step *data)
 	       low) are needed.  */					\
 	    uint16_t zabcd, high, low;					\
 									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			      \
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
 	      {								\
 		/* Overflow in the output buffer.  */			\
 		result = __GCONV_FULL_OUTPUT;				\
 		break;							\
 	      }								\
 									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
 	    /* See Principles of Operations cu12.  */			\
 	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-                     ((inptr[1] & 0x30) >> 4)) - 1;			\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
 									\
 	    /* z-bit must be zero after subtracting 1.  */		\
 	    if (zabcd & 0x10)						\
 	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
 									\
 	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;	                        /* abcd bits */	\
+	    high |= zabcd << 6;                         /* abcd bits */	\
 	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
 	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
 									\
@@ -326,8 +413,19 @@ gconv_end (struct __gconv_step *data)
 		ch <<= 6;						\
 		ch |= byte & 0x3f;					\
 	      }								\
-	    inptr += cnt;						\
 									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
 	  }								\
       }									\
     /* Now adjust the pointers and store the result.  */		\
@@ -335,43 +433,70 @@ gconv_end (struct __gconv_step *data)
     outptr += sizeof (uint16_t);					\
   }
 
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
 /* Conversion from UTF-16 to UTF-8.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine is based on the functionality of the S/390
    hardware instruction (cu21) as described in the Principles of
    Operation.  */
-#define BODY								\
+#define BODY_TO_C							\
   {									\
-    /* The hardware instruction currently fails to report an error for	\
-       isolated low surrogates so we have to disable the instruction	\
-       until this gets resolved.  */					\
-    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
-      {									\
-	HARDWARE_CONVERT ("cu21 %0, %1, 1");				\
-	if (inptr != inend)						\
-	  {								\
-	    /* Check if the third byte is				\
-	       a valid start of a UTF-16 surrogate.  */			\
-	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
-	      STANDARD_TO_LOOP_ERR_HANDLER (3);				\
-									\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint16_t c = get16 (inptr);						\
 									\
-    if (__glibc_likely (c <= 0x007f))					      \
+    if (__glibc_likely (c <= 0x007f))					\
       {									\
 	/* Single byte UTF-8 char.  */					\
 	*outptr = c & 0xff;						\
@@ -379,20 +504,20 @@ gconv_end (struct __gconv_step *data)
       }									\
     else if (c >= 0x0080 && c <= 0x07ff)				\
       {									\
-        /* Two byte UTF-8 char.  */					\
+	/* Two byte UTF-8 char.  */					\
 									\
-	if (__glibc_unlikely (outptr + 2 > outend))			      \
+	if (__glibc_unlikely (outptr + 2 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
 	    break;							\
 	  }								\
 									\
-        outptr[0] = 0xc0;						\
-        outptr[0] |= c >> 6;						\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
 									\
-        outptr[1] = 0x80;						\
-        outptr[1] |= c & 0x3f;						\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
 									\
 	outptr += 2;							\
       }									\
@@ -400,7 +525,7 @@ gconv_end (struct __gconv_step *data)
       {									\
 	/* Three byte UTF-8 char.  */					\
 									\
-	if (__glibc_unlikely (outptr + 3 > outend))			      \
+	if (__glibc_unlikely (outptr + 3 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
@@ -419,22 +544,22 @@ gconv_end (struct __gconv_step *data)
       }									\
     else if (c >= 0xd800 && c <= 0xdbff)				\
       {									\
-        /* Four byte UTF-8 char.  */					\
+	/* Four byte UTF-8 char.  */					\
 	uint16_t low, uvwxy;						\
 									\
-	if (__glibc_unlikely (outptr + 4 > outend))			      \
+	if (__glibc_unlikely (outptr + 4 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
 	    break;							\
 	  }								\
-	inptr += 2;							\
-	if (__glibc_unlikely (inptr + 2 > inend))			      \
+	if (__glibc_unlikely (inptr + 4 > inend))			\
 	  {								\
 	    result = __GCONV_INCOMPLETE_INPUT;				\
 	    break;							\
 	  }								\
 									\
+	inptr += 2;							\
 	low = get16 (inptr);						\
 									\
 	if ((low & 0xfc00) != 0xdc00)					\
@@ -461,11 +586,221 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
       }									\
     inptr += 2;								\
   }
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  "    vstrchs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9:  .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  "    .short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 13f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13: \n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "    lghi %[R_TMP2],16\n\t"				\
+		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    j 22f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "    jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "    jh 23f\n\t"					\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llill %[R_TMP3],0xc080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "    jh 24f\n\t"					\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xe08080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */ \
+		  "    clfi %[R_TMP],0xdbff\n\t"			\
+		  "    locghih %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jh 99f\n\t" /* Jump away if this is a low surrogate \
+				      without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    slgfi %[R_INLEN],2\n\t"				\
+		  "    locghil %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "    aghi %[R_TMP],0x40\n\t"				\
+		  "    risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "    llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "    nilf %[R_TMP],0xfc00\n\t"			\
+		  "    clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    locghine %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 20b\n\t"					\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
 
 #include <iconv/skeleton.c>
-- 
2.5.5
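
For readers who do not want to trace the macro, the software path in the
patch above (BODY_TO_C) can be sketched as a standalone function.  This is
only an illustration under stated assumptions, not the glibc code: the name
utf16_to_utf8_one and the conv_status codes are hypothetical, the input is
taken as host-order uint16_t units instead of going through get16, and
treating an empty input as incomplete is a choice made for this sketch.  It
mirrors one detail visible in the diff: the input pointer is not advanced
past a high surrogate until the low surrogate is known to be present and
valid, so on an error it still points at the offending unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; they are not the glibc
   __GCONV_* constants.  */
enum conv_status
{
  CONV_OK,
  CONV_FULL_OUTPUT,
  CONV_INCOMPLETE_INPUT,
  CONV_ILLEGAL_INPUT
};

/* Convert one character from *IN (host-order UTF-16 units) to UTF-8 at
   *OUT.  On success both pointers are advanced.  On any error both are
   left unchanged, so *IN still points at the offending unit.  */
static enum conv_status
utf16_to_utf8_one (const uint16_t **in, const uint16_t *inend,
                   unsigned char **out, unsigned char *outend)
{
  assert (in != NULL && out != NULL);

  const uint16_t *ip = *in;
  unsigned char *op = *out;
  size_t outleft = (size_t) (outend - op);

  /* Sketch-only choice: an empty input is reported as incomplete.  */
  if (ip >= inend)
    return CONV_INCOMPLETE_INPUT;

  uint32_t c = ip[0];

  if (c <= 0x7f)
    {
      /* Single byte UTF-8 char.  */
      if (outleft < 1)
        return CONV_FULL_OUTPUT;
      op[0] = (unsigned char) c;
      op += 1;
      ip += 1;
    }
  else if (c <= 0x7ff)
    {
      /* Two byte UTF-8 char.  */
      if (outleft < 2)
        return CONV_FULL_OUTPUT;
      op[0] = (unsigned char) (0xc0 | (c >> 6));
      op[1] = (unsigned char) (0x80 | (c & 0x3f));
      op += 2;
      ip += 1;
    }
  else if (c < 0xd800 || c > 0xdfff)
    {
      /* Three byte UTF-8 char.  */
      if (outleft < 3)
        return CONV_FULL_OUTPUT;
      op[0] = (unsigned char) (0xe0 | (c >> 12));
      op[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
      op[2] = (unsigned char) (0x80 | (c & 0x3f));
      op += 3;
      ip += 1;
    }
  else if (c <= 0xdbff)
    {
      /* High surrogate: four byte UTF-8 char.  Check the output space
         first, then the input length, as the C body in the patch does.  */
      if (outleft < 4)
        return CONV_FULL_OUTPUT;
      if (inend - ip < 2)
        return CONV_INCOMPLETE_INPUT;

      uint32_t low = ip[1];
      if ((low & 0xfc00) != 0xdc00)
        return CONV_ILLEGAL_INPUT;

      uint32_t cp = 0x10000 + ((c - 0xd800) << 10) + (low - 0xdc00);
      op[0] = (unsigned char) (0xf0 | (cp >> 18));
      op[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3f));
      op[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
      op[3] = (unsigned char) (0x80 | (cp & 0x3f));
      op += 4;
      ip += 2;
    }
  else
    {
      /* Low surrogate without a preceding high surrogate.  */
      return CONV_ILLEGAL_INPUT;
    }

  *in = ip;
  *out = op;
  return CONV_OK;
}
```

A caller would loop over the input until the function returns something
other than CONV_OK and then inspect the pointers, which is roughly how the
loop skeleton uses the body macro.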


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
  2016-02-23  9:22 ` [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates (disable cu41) Stefan Liebler
@ 2016-04-21 15:25   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:25 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 11994 bytes --]

Here is an updated patch in which the labels in the inline assembly are
outdented, as suggested by Florian.

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> According to the latest Unicode standard, a conversion from/to UTF-xx has
> to report an error if the character value is in range of an utf16 surrogate
> (0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
>
> Thus the cu41 instruction, which converts from utf32 to utf8,  has to be
> disabled because it does not report an error in case of a value in range of
> a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the c,
> vector variant is adjusted to handle the value in range of an utf16 low
> surrogate correctly.
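
The rule quoted above can be sketched in plain C.  This is a minimal
illustration, not the code from the patch: the function name utf32_to_utf8
is hypothetical and it encodes a single value with no buffer management.
It rejects the whole surrogate range 0xd800..0xdfff with the same mask test
the new vector code documents (ch & 0xf800 == 0xd800), and it rejects
values above 0x10ffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode WC as UTF-8 into OUT (room for four bytes).  Return the number
   of bytes written, or 0 if WC must be reported as illegal input.
   Hypothetical helper for illustration only.  */
static size_t
utf32_to_utf8 (uint32_t wc, unsigned char out[4])
{
  assert (out != NULL);

  if (wc <= 0x7f)
    {
      out[0] = (unsigned char) wc;
      return 1;
    }
  if (wc <= 0x7ff)
    {
      out[0] = (unsigned char) (0xc0 | (wc >> 6));
      out[1] = (unsigned char) (0x80 | (wc & 0x3f));
      return 2;
    }
  if (wc <= 0xffff)
    {
      /* Do not accept UTF-16 surrogates, high or low.  For values up to
         0xffff this is the same as wc >= 0xd800 && wc <= 0xdfff.  */
      if ((wc & 0xf800) == 0xd800)
        return 0;
      out[0] = (unsigned char) (0xe0 | (wc >> 12));
      out[1] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (wc & 0x3f));
      return 3;
    }
  if (wc <= 0x10ffff)
    {
      out[0] = (unsigned char) (0xf0 | (wc >> 18));
      out[1] = (unsigned char) (0x80 | ((wc >> 12) & 0x3f));
      out[2] = (unsigned char) (0x80 | ((wc >> 6) & 0x3f));
      out[3] = (unsigned char) (0x80 | (wc & 0x3f));
      return 4;
    }

  /* Values above 0x10ffff are not allowed.  */
  return 0;
}
```

The old C condition (wc >= 0xd800 && wc < 0xdc00) would have let 0xdc00
through as a three-byte sequence; with the mask test it is rejected like
any other surrogate.
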
>
> ChangeLog:
>
> 	* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
> 	an error in case of a value in range of an utf16 low surrogate.
> ---
>   sysdeps/s390/utf8-utf32-z9.c | 188 ++++++++++++++++++++++++++-----------------
>   1 file changed, 115 insertions(+), 73 deletions(-)
>
> diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
> index 1b2d6a2..b378823 100644
> --- a/sysdeps/s390/utf8-utf32-z9.c
> +++ b/sysdeps/s390/utf8-utf32-z9.c
> @@ -572,28 +572,6 @@ __from_utf8_loop_resolver (unsigned long int dl_hwcap)
>
>   strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>
> -
> -/* Conversion from UTF-32 internal/BE to UTF-8.  */
> -#define BODY_TO_HW(ASM)							\
> -  {									\
> -    ASM;								\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -    if (inptr + 4 > inend)						\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> -  }
> -
> -/* The hardware routine uses the S/390 cu41 instruction.  */
> -#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
> -
> -/* The hardware routine uses the S/390 vector and cu41 instructions.  */
> -#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
> -
>   /* The software routine mimics the S/390 cu41 instruction.  */
>   #define BODY_TO_C						\
>     {								\
> @@ -632,7 +610,7 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>   	    result = __GCONV_FULL_OUTPUT;			\
>   	    break;						\
>   	  }							\
> -	if (wc >= 0xd800 && wc < 0xdc00)			\
> +	if (wc >= 0xd800 && wc <= 0xdfff)			\
>   	  {							\
>   	    /* Do not accept UTF-16 surrogates.   */		\
>   	    result = __GCONV_ILLEGAL_INPUT;			\
> @@ -679,13 +657,12 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>       inptr += 4;							\
>     }
>
> -#define HW_TO_VX							\
> +/* The hardware routine uses the S/390 vector instructions.  */
> +#define BODY_TO_VX							\
>     {									\
> -    register const unsigned char* pInput asm ("8") = inptr;		\
> -    register size_t inlen asm ("9") = inend - inptr;			\
> -    register unsigned char* pOutput asm ("10") = outptr;		\
> -    register size_t outlen asm("11") = outend - outptr;			\
> -    unsigned long tmp, tmp2;						\
> +    size_t inlen = inend - inptr;					\
> +    size_t outlen = outend - outptr;					\
> +    unsigned long tmp, tmp2, tmp3;					\
>       asm volatile (".machine push\n\t"					\
>   		  ".machine \"z13\"\n\t"				\
>   		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
> @@ -696,10 +673,10 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>   		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
>   		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
>   		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
> -		  "0: clgijl %[R_INLEN],64,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "0: clgijl %[R_INLEN],64,2f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
>   		  "1: vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
> -		  "lghi %[R_TMP],0\n\t"					\
> +		  "lghi %[R_TMP2],0\n\t"				\
>   		  /* Shorten to byte values.  */			\
>   		  "vpkf %%v23,%%v16,%%v17\n\t"				\
>   		  "vpkf %%v24,%%v18,%%v19\n\t"				\
> @@ -719,41 +696,116 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>   		  "aghi %[R_OUTLEN],-16\n\t"				\
>   		  "la %[R_IN],64(%[R_IN])\n\t"				\
>   		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],64,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "clgijl %[R_INLEN],64,2f\n\t"				\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
>   		  "j 1b\n\t"						\
>   		  /* Found a value > 0x7f.  */				\
> -		  "13: ahi %[R_TMP],4\n\t"				\
> -		  "12: ahi %[R_TMP],4\n\t"				\
> -		  "11: ahi %[R_TMP],4\n\t"				\
> -		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
> -		  "srlg %[R_I],%[R_I],2\n\t"				\
> -		  "agr %[R_I],%[R_TMP]\n\t"				\
> -		  "je 20f\n\t"						\
> +		  "13: ahi %[R_TMP2],4\n\t"				\
> +		  "12: ahi %[R_TMP2],4\n\t"				\
> +		  "11: ahi %[R_TMP2],4\n\t"				\
> +		  "10: vlgvb %[R_TMP],%%v22,7\n\t"			\
> +		  "srlg %[R_TMP],%[R_TMP],2\n\t"			\
> +		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
> +		  "je 16f\n\t"						\
>   		  /* Store characters before invalid one...  */		\
> -		  "slgr %[R_OUTLEN],%[R_I]\n\t"				\
> -		  "15: aghi %[R_I],-1\n\t"				\
> -		  "vstl %%v23,%[R_I],0(%[R_OUT])\n\t"			\
> +		  "slgr %[R_OUTLEN],%[R_TMP]\n\t"			\
> +		  "15: aghi %[R_TMP],-1\n\t"				\
> +		  "vstl %%v23,%[R_TMP],0(%[R_OUT])\n\t"			\
>   		  /* ... and update pointers.  */			\
> -		  "aghi %[R_I],1\n\t"					\
> -		  "la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"			\
> -		  "sllg %[R_I],%[R_I],2\n\t"				\
> -		  "la %[R_IN],0(%[R_I],%[R_IN])\n\t"			\
> -		  "slgr %[R_INLEN],%[R_I]\n\t"				\
> -		  /* Handle multibyte utf8-char with convert instruction. */ \
> -		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
> -		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
> -		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> -		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  "aghi %[R_TMP],1\n\t"					\
> +		  "la %[R_OUT],0(%[R_TMP],%[R_OUT])\n\t"		\
> +		  "sllg %[R_TMP2],%[R_TMP],2\n\t"			\
> +		  "la %[R_IN],0(%[R_TMP2],%[R_IN])\n\t"			\
> +		  "slgr %[R_INLEN],%[R_TMP2]\n\t"			\
> +		  /* Calculate remaining uint32_t values in loaded vrs.  */ \
> +		  "16: lghi %[R_TMP2],16\n\t"				\
> +		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
> +		  "l %[R_TMP],0(%[R_IN])\n\t"				\
> +		  "aghi %[R_INLEN],-4\n\t"				\
> +		  "j 22f\n\t"						\
> +		  /* Handle remaining bytes.  */			\
> +		  "2: clgije %[R_INLEN],0,99f\n\t"			\
> +		  "clgijl %[R_INLEN],4,92f\n\t"				\
> +		  /* Calculate remaining uint32_t values in inptr.  */	\
> +		  "srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
> +		  /* Handle multibyte utf8-char. */			\
> +		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
> +		  "aghi %[R_INLEN],-4\n\t"				\
> +		  /* Test if ch is 1byte UTF-8 char. */			\
> +		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
> +		  /* Handle 1-byte UTF-8 char.  */			\
> +		  "31: slgfi %[R_OUTLEN],1\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "stc %[R_TMP],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],1(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 2byte UTF-8 char. */			\
> +		  "22: clfi %[R_TMP],0x7ff\n\t"				\
> +		  "jh 23f\n\t"						\
> +		  /* Handle 2-byte UTF-8 char.  */			\
> +		  "32: slgfi %[R_OUTLEN],2\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llill %[R_TMP3],0xc080\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
> +		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 3-byte UTF-8 char.  */		\
> +		  "23: clfi %[R_TMP],0xffff\n\t"			\
> +		  "jh 24f\n\t"						\
> +		  /* Handle 3-byte UTF-8 char.  */			\
> +		  "33: slgfi %[R_OUTLEN],3\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llilf %[R_TMP3],0xe08080\n\t"			\
> +		  "risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
> +		  /* Test if ch is a UTF-16 surrogate: ch & 0xf800 == 0xd800  */ \
> +		  "nilf %[R_TMP],0xf800\n\t"				\
> +		  "clfi %[R_TMP],0xd800\n\t"				\
> +		  "je 91f\n\t" /* Do not accept UTF-16 surrogates.  */	\
> +		  "stcm %[R_TMP3],7,0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],3(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 4-byte UTF-8 char.  */		\
> +		  "24: clfi %[R_TMP],0x10ffff\n\t"			\
> +		  "jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
> +		  /* Handle 4-byte UTF-8 char.  */			\
> +		  "34: slgfi %[R_OUTLEN],4\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "llilf %[R_TMP3],0xf0808080\n\t"			\
> +		  "risbgn %[R_TMP3],%[R_TMP],37,39,6\n\t" /* 1. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],42,47,4\n\t" /* 2. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 3. byte.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte.  */ \
> +		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
> +		  "j 99f\n\t"						\
> +		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "j 99f\n\t"						\
> +		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
> +		  "99:\n\t"						\
>   		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (pInput)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> -		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
> -		    , [R_I] "=a" (tmp2)					\
> +		  : /* outputs */ [R_IN] "+a" (inptr)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
> +		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
> +		    , [R_TMP2] "=a" (tmp2), [R_TMP3] "=d" (tmp3)	\
>   		    , [R_RES] "+d" (result)				\
>   		  : /* inputs */					\
>   		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
>   		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
> +		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
>   		  : /* clobber list */ "memory", "cc"			\
>   		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
>   		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
> @@ -761,8 +813,11 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>   		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
>   		    ASM_CLOBBER_VR ("v24")				\
>   		  );							\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> +    if (__glibc_likely (inptr == inend)					\
> +	|| result != __GCONV_ILLEGAL_INPUT)				\
> +      break;								\
> +									\
> +    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
>     }
>
>   /* Generate loop-function with software routing.  */
> @@ -774,15 +829,6 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
>   #define LOOP_NEED_FLAGS
>   #include <iconv/loop.c>
>
> -/* Generate loop-function with hardware utf-convert instruction.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			__to_utf8_loop_etf3eh
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_TO_ETF3EH
> -#include <iconv/loop.c>
> -
>   #if defined HAVE_S390_VX_ASM_SUPPORT
>   /* Generate loop-function with hardware vector and utf-convert instructions.  */
>   # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> @@ -807,10 +853,6 @@ __to_utf8_loop_resolver (unsigned long int dl_hwcap)
>       return __to_utf8_loop_vx;
>     else
>   #endif
> -  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
> -      && dl_hwcap & HWCAP_S390_ETF3EH)
> -    return __to_utf8_loop_etf3eh;
> -  else
>       return __to_utf8_loop_c;
>   }
>
>

[-- Attachment #2: 0011-S390-Fix-utf32-to-utf8-handling-of-low-surrogates-di.patch --]
[-- Type: text/x-patch, Size: 11814 bytes --]

From 8f6fa15125cd74e49340fc11858b0692e6789950 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates
 (disable cu41).

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16
surrogate (0xd800..0xdfff).
See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu41 instruction, which converts from UTF-32 to UTF-8, has to be
disabled because it does not report an error for a value in the range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed, and the C
and vector variants are adjusted to handle values in the range of a UTF-16
low surrogate correctly.

ChangeLog:

	* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
	an error in case of a value in range of an utf16 low surrogate.
---
 sysdeps/s390/utf8-utf32-z9.c | 188 ++++++++++++++++++++++++++-----------------
 1 file changed, 115 insertions(+), 73 deletions(-)

diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
index e39e0a7..efae745 100644
--- a/sysdeps/s390/utf8-utf32-z9.c
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -572,28 +572,6 @@ __from_utf8_loop_resolver (unsigned long int dl_hwcap)
 
 strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
 /* The software routine mimics the S/390 cu41 instruction.  */
 #define BODY_TO_C						\
   {								\
@@ -632,7 +610,7 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
+	if (wc >= 0xd800 && wc <= 0xdfff)			\
 	  {							\
 	    /* Do not accept UTF-16 surrogates.   */		\
 	    result = __GCONV_ILLEGAL_INPUT;			\
@@ -679,13 +657,12 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
     inptr += 4;							\
   }
 
-#define HW_TO_VX							\
+/* The hardware routine uses the S/390 vector instructions.  */
+#define BODY_TO_VX							\
   {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
     asm volatile (".machine push\n\t"					\
 		  ".machine \"z13\"\n\t"				\
 		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
@@ -696,10 +673,10 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
 		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "0:  clgijl %[R_INLEN],64,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP],0\n\t"				\
+		  "    lghi %[R_TMP2],0\n\t"				\
 		  /* Shorten to byte values.  */			\
 		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
 		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
@@ -719,41 +696,116 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  "    aghi %[R_OUTLEN],-16\n\t"			\
 		  "    la %[R_IN],64(%[R_IN])\n\t"			\
 		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    clgijl %[R_INLEN],64,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "    j 1b\n\t"					\
 		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "    srlg %[R_I],%[R_I],2\n\t"			\
-		  "    agr %[R_I],%[R_TMP]\n\t"				\
-		  "    je 20f\n\t"					\
+		  "13: ahi %[R_TMP2],4\n\t"				\
+		  "12: ahi %[R_TMP2],4\n\t"				\
+		  "11: ahi %[R_TMP2],4\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v22,7\n\t"			\
+		  "    srlg %[R_TMP],%[R_TMP],2\n\t"			\
+		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
+		  "    je 16f\n\t"					\
 		  /* Store characters before invalid one...  */		\
-		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP]\n\t"			\
+		  "15: aghi %[R_TMP],-1\n\t"				\
+		  "    vstl %%v23,%[R_TMP],0(%[R_OUT])\n\t"		\
 		  /* ... and update pointers.  */			\
-		  "    aghi %[R_I],1\n\t"				\
-		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
-		  "    sllg %[R_I],%[R_I],2\n\t"			\
-		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  "    aghi %[R_TMP],1\n\t"				\
+		  "    la %[R_OUT],0(%[R_TMP],%[R_OUT])\n\t"		\
+		  "    sllg %[R_TMP2],%[R_TMP],2\n\t"			\
+		  "    la %[R_IN],0(%[R_TMP2],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP2]\n\t"			\
+		  /* Calculate remaining uint32_t values in loaded vrs.  */ \
+		  "16: lghi %[R_TMP2],16\n\t"				\
+		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-4\n\t"				\
+		  "    j 22f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  clgije %[R_INLEN],0,99f\n\t"			\
+		  "    clgijl %[R_INLEN],4,92f\n\t"			\
+		  /* Calculate remaining uint32_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-4\n\t"				\
+		  /* Test if ch is 1byte UTF-8 char. */			\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 2byte UTF-8 char. */			\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "    jh 23f\n\t"					\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llill %[R_TMP3],0xc080\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xffff\n\t"			\
+		  "    jh 24f\n\t"					\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xe08080\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  /* Test if ch is a UTF-16 surrogate: ch & 0xf800 == 0xd800  */ \
+		  "    nilf %[R_TMP],0xf800\n\t"			\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    je 91f\n\t" /* Do not accept UTF-16 surrogates.  */ \
+		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0x10ffff\n\t"			\
+		  "    jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],37,39,6\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],42,47,4\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 3. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "    j 99f\n\t"					\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
 		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=a" (tmp2), [R_TMP3] "=d" (tmp3)	\
 		    , [R_RES] "+d" (result)				\
 		  : /* inputs */					\
 		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
 		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
 		  : /* clobber list */ "memory", "cc"			\
 		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
 		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
@@ -761,8 +813,11 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
 		    ASM_CLOBBER_VR ("v24")				\
 		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
   }
 
 /* Generate loop-function with software routing.  */
@@ -774,15 +829,6 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 #define LOOP_NEED_FLAGS
 #include <iconv/loop.c>
 
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
 #if defined HAVE_S390_VX_ASM_SUPPORT
 /* Generate loop-function with hardware vector and utf-convert instructions.  */
 # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
@@ -807,10 +853,6 @@ __to_utf8_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf8_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
-      && dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
     return __to_utf8_loop_c;
 }
 
-- 
2.5.5


^ permalink raw reply	[flat|nested] 55+ messages in thread
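
For readers following the scalar tail of the inline assembly in the patch above (labels 21 to 34 and 91), here is a minimal C sketch of the same UCS4 to UTF-8 decision chain. The function name is hypothetical and not part of glibc. It uses the same thresholds (0x7f, 0x7ff, 0xffff, 0x10ffff) and the same surrogate test as the assembly, and returns 0 where the assembly would branch to the illegal-input label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: encode one UCS4 value as UTF-8.
   Returns the number of bytes written (1..4), or 0 if CH is a UTF-16
   surrogate or lies above 0x10ffff.  */
static size_t
ucs4_to_utf8_sketch (uint32_t ch, unsigned char out[4])
{
  assert (out != NULL);
  if (ch <= 0x7f)
    {
      out[0] = (unsigned char) ch;
      return 1;
    }
  if (ch <= 0x7ff)
    {
      out[0] = (unsigned char) (0xc0 | (ch >> 6));
      out[1] = (unsigned char) (0x80 | (ch & 0x3f));
      return 2;
    }
  if (ch <= 0xffff)
    {
      /* Same test as the assembly: (ch & 0xf800) == 0xd800.  */
      if ((ch & 0xf800) == 0xd800)
        return 0;
      out[0] = (unsigned char) (0xe0 | (ch >> 12));
      out[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (ch & 0x3f));
      return 3;
    }
  if (ch > 0x10ffff)
    return 0;
  out[0] = (unsigned char) (0xf0 | (ch >> 18));
  out[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3f));
  out[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3f));
  out[3] = (unsigned char) (0x80 | (ch & 0x3f));
  return 4;
}
```

The sketch does not track the remaining output space; the assembly checks that before each store and reports a full output buffer instead.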

* Re: [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
  2016-02-23  9:22 ` [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates (disable cu42) Stefan Liebler
@ 2016-04-21 15:30   ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:30 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 10854 bytes --]

Here is an updated patch, where the labels in inline assemblies are 
out-dented as suggested by Florian.

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> According to the latest Unicode standard, a conversion from/to UTF-xx has
> to report an error if the character value is in range of an utf16 surrogate
> (0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
>
> Thus the cu42 instruction, which converts from utf32 to utf16,  has to be
> disabled because it does not report an error in case of a value in range of
> a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the c,
> vector variant is adjusted to handle the value in range of an utf16 low
> surrogate correctly.
>
> ChangeLog:
>
> 	* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
> 	an error in case of a value in range of an utf16 low surrogate.
> ---
>   sysdeps/s390/utf16-utf32-z9.c | 155 +++++++++++++++++-------------------------
>   1 file changed, 62 insertions(+), 93 deletions(-)
>
> diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
> index ecf06bd..70aa640 100644
> --- a/sysdeps/s390/utf16-utf32-z9.c
> +++ b/sysdeps/s390/utf16-utf32-z9.c
> @@ -145,42 +145,6 @@ gconv_end (struct __gconv_step *data)
>     free (data->__data);
>   }
>
> -/* The macro for the hardware loop.  This is used for both
> -   directions.  */
> -#define HARDWARE_CONVERT(INSTRUCTION)					\
> -  {									\
> -    register const unsigned char* pInput __asm__ ("8") = inptr;		\
> -    register size_t inlen __asm__ ("9") = inend - inptr;		\
> -    register unsigned char* pOutput __asm__ ("10") = outptr;		\
> -    register size_t outlen __asm__("11") = outend - outptr;		\
> -    unsigned long cc = 0;						\
> -									\
> -    __asm__ __volatile__ (".machine push       \n\t"			\
> -			  ".machine \"z9-109\" \n\t"			\
> -			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
> -			  "0: " INSTRUCTION "  \n\t"			\
> -			  ".machine pop        \n\t"			\
> -			  "   jo     0b        \n\t"			\
> -			  "   ipm    %2        \n"			\
> -			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
> -			    "+d" (outlen), "+d" (inlen)			\
> -			  :						\
> -			  : "cc", "memory");				\
> -									\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -    cc >>= 28;								\
> -									\
> -    if (cc == 1)							\
> -      {									\
> -	result = __GCONV_FULL_OUTPUT;					\
> -      }									\
> -    else if (cc == 2)							\
> -      {									\
> -	result = __GCONV_ILLEGAL_INPUT;					\
> -      }									\
> -  }
> -
>   #define PREPARE_LOOP							\
>     enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
>     int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
> @@ -310,7 +274,7 @@ gconv_end (struct __gconv_step *data)
>   		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
>   		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
>   		  "12: lghi %[R_TMP2],16\n\t"				\
> -		  "sgr %[R_TMP2],%[R_TMP]\n\t"				\
> +		  "slgr %[R_TMP2],%[R_TMP]\n\t"				\
>   		  "srl %[R_TMP2],1\n\t"					\
>   		  "llh %[R_TMP],0(%[R_IN])\n\t"				\
>   		  "aghi %[R_OUTLEN],-4\n\t"				\
> @@ -437,7 +401,7 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>       uint32_t c = get32 (inptr);						\
>   									\
>       if (__builtin_expect (c <= 0xd7ff, 1)				\
> -	|| (c >=0xdc00 && c <= 0xffff))					\
> +	|| (c > 0xdfff && c <= 0xffff))					\
>         {									\
>   	/* Two UTF-16 chars.  */					\
>   	put16 (outptr, c);						\
> @@ -475,29 +439,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>       inptr += 4;								\
>     }
>
> -#define BODY_TO_ETF3EH							\
> -  {									\
> -    HARDWARE_CONVERT ("cu42 %0, %1");					\
> -									\
> -    if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> -      break;								\
> -									\
> -    if (inptr + 4 > inend)						\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> -									\
> -    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
> -  }
> -
>   #define BODY_TO_VX							\
>     {									\
> -    register const unsigned char* pInput asm ("8") = inptr;		\
> -    register size_t inlen asm ("9") = inend - inptr;			\
> -    register unsigned char* pOutput asm ("10") = outptr;		\
> -    register size_t outlen asm("11") = outend - outptr;			\
> +    size_t inlen = inend - inptr;					\
> +    size_t outlen = outend - outptr;					\
>       unsigned long tmp, tmp2, tmp3;					\
>       asm volatile (".machine push\n\t"					\
>   		  ".machine \"z13\"\n\t"				\
> @@ -509,8 +454,8 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>   		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
>   		  /* Loop which handles UTF-16 chars			\
>   		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
> -		  "0: clgijl %[R_INLEN],32,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "0: clgijl %[R_INLEN],32,2f\n\t"			\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
>   		  "1: vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
>   		  "lghi %[R_TMP2],0\n\t"				\
>   		  /* Shorten to UTF-16.  */				\
> @@ -526,9 +471,15 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>   		  "aghi %[R_INLEN],-32\n\t"				\
>   		  "aghi %[R_OUTLEN],-16\n\t"				\
>   		  "la %[R_OUT],16(%[R_OUT])\n\t"			\
> -		  "clgijl %[R_INLEN],32,20f\n\t"			\
> -		  "clgijl %[R_OUTLEN],16,20f\n\t"			\
> +		  "clgijl %[R_INLEN],32,2f\n\t"				\
> +		  "clgijl %[R_OUTLEN],16,2f\n\t"			\
>   		  "j 1b\n\t"						\
> +		  /* Calculate remaining uint32_t values in inptr.  */	\
> +		  "2:\n\t"						\
> +		  "clgije %[R_INLEN],0,99f\n\t"				\
> +		  "clgijl %[R_INLEN],4,92f\n\t"				\
> +		  "srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
> +		  "j 20f\n\t"						\
>   		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
>   		     and check for ch >= 0x10000. (v30, v31)  */	\
>   		  "9: .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
> @@ -540,21 +491,59 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>   		  "agr %[R_TMP],%[R_TMP2]\n\t"				\
>   		  "srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
>   		  "ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
> -		  "jl 20f\n\t"						\
> +		  "jl 12f\n\t"						\
>   		  "vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
>   		  /* Update pointers.  */				\
>   		  "la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"			\
>   		  "slgr %[R_INLEN],%[R_TMP]\n\t"			\
>   		  "la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
>   		  "slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
> -		  /* Handles UTF16 surrogates with convert instruction.  */ \
> -		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
> -		  "jo 0b\n\t" /* Try vector implemenation again.  */	\
> -		  "lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */	\
> -		  "lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */	\
> +		  /* Calculate remaining uint32_t values in vrs.  */	\
> +		  "12: lghi %[R_TMP2],8\n\t"				\
> +		  "srlg %[R_TMP3],%[R_TMP3],1\n\t"			\
> +		  "slgr %[R_TMP2],%[R_TMP3]\n\t"			\
> +		  /* Handle remaining UTF-32 characters.  */		\
> +		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
> +		  "aghi %[R_INLEN],-4\n\t"				\
> +		  /* Test if ch is 2byte UTF-16 char. */		\
> +		  "clfi %[R_TMP],0xffff\n\t"				\
> +		  "jh 21f\n\t"						\
> +		  /* Handle 2 byte UTF16 char.  */			\
> +		  "lgr %[R_TMP3],%[R_TMP]\n\t"				\
> +		  "nilf %[R_TMP],0xf800\n\t"				\
> +		  "clfi %[R_TMP],0xd800\n\t"				\
> +		  "je 91f\n\t" /* Do not accept UTF-16 surrogates.  */	\
> +		  "slgfi %[R_OUTLEN],2\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "sth %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "la %[R_OUT],2(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  /* Test if ch is 4byte UTF-16 char. */		\
> +		  "21: clfi %[R_TMP],0x10ffff\n\t"			\
> +		  "jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
> +		  /* Handle 4 byte UTF16 char.  */			\
> +		  "slgfi %[R_OUTLEN],4\n\t"				\
> +		  "jl 90f \n\t"						\
> +		  "slfi %[R_TMP],0x10000\n\t" /* zabcd = uvwxy - 1.  */	\
> +		  "llilf %[R_TMP3],0xd800dc00\n\t"			\
> +		  "la %[R_IN],4(%[R_IN])\n\t"				\
> +		  "risbgn %[R_TMP3],%[R_TMP],38,47,6\n\t" /* High surrogate.  */ \
> +		  "risbgn %[R_TMP3],%[R_TMP],54,63,0\n\t" /* Low surrogate.  */ \
> +		  "st %[R_TMP3],0(%[R_OUT])\n\t"			\
> +		  "la %[R_OUT],4(%[R_OUT])\n\t"				\
> +		  "brctg %[R_TMP2],20b\n\t"				\
> +		  "j 0b\n\t" /* Switch to vx-loop.  */			\
> +		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
> +		  "j 99f\n\t"						\
> +		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
> +		  "j 99f\n\t"						\
> +		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
> +		  "99:\n\t"						\
>   		  ".machine pop"					\
> -		  : /* outputs */ [R_IN] "+a" (pInput)			\
> -		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
> +		  : /* outputs */ [R_IN] "+a" (inptr)			\
> +		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
>   		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
>   		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
>   		    , [R_RES] "+d" (result)				\
> @@ -567,17 +556,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>   		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
>   		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
>   		  );							\
> -    inptr = pInput;							\
> -    outptr = pOutput;							\
> -									\
>       if (__glibc_likely (inptr == inend)					\
> -	|| result == __GCONV_FULL_OUTPUT)				\
> +	|| result != __GCONV_ILLEGAL_INPUT)				\
>         break;								\
> -    if (inptr + 4 > inend)						\
> -      {									\
> -	result = __GCONV_INCOMPLETE_INPUT;				\
> -	break;								\
> -      }									\
> +									\
>       STANDARD_TO_LOOP_ERR_HANDLER (4);					\
>     }
>
> @@ -590,15 +572,6 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
>   #define BODY			BODY_TO_C
>   #include <iconv/loop.c>
>
> -/* Generate loop-function with hardware utf-convert instruction.  */
> -#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> -#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
> -#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
> -#define LOOPFCT			__to_utf16_loop_etf3eh
> -#define LOOP_NEED_FLAGS
> -#define BODY			BODY_TO_ETF3EH
> -#include <iconv/loop.c>
> -
>   #if defined HAVE_S390_VX_ASM_SUPPORT
>   /* Generate loop-function with hardware vector instructions.  */
>   # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
> @@ -623,10 +596,6 @@ __to_utf16_loop_resolver (unsigned long int dl_hwcap)
>       return __to_utf16_loop_vx;
>     else
>   #endif
> -  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
> -      && dl_hwcap & HWCAP_S390_ETF3EH)
> -    return __to_utf16_loop_etf3eh;
> -  else
>       return __to_utf16_loop_c;
>   }
>
>

[-- Attachment #2: 0012-S390-Fix-utf32-to-utf16-handling-of-low-surrogates-d.patch --]
[-- Type: text/x-patch, Size: 10621 bytes --]

From 9af543d67eca5e22c79c590ff82a0d517c9fb3bf Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates
 (disable cu42).

According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.

Thus the cu42 instruction, which converts from UTF-32 to UTF-16, has to be
disabled because it does not report an error for a value in the range of
a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed, and the C and
vector variants are adjusted to handle a value in the range of a UTF-16 low
surrogate correctly.
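
As an illustration only, the rule described above can be sketched in plain C as follows. The function name is hypothetical and not part of glibc. The sketch rejects every value in 0xd800..0xdfff, including the low surrogates that cu42 lets through, and otherwise emits one or two UTF-16 code units in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: encode one UTF-32 value as
   UTF-16 code units.  Returns the number of units written (1 or 2),
   or 0 if CH is a surrogate or lies above 0x10ffff.  */
static size_t
utf32_to_utf16_sketch (uint32_t ch, uint16_t out[2])
{
  assert (out != NULL);
  if (ch <= 0xffff)
    {
      /* Covers high and low surrogates: (ch & 0xf800) == 0xd800.  */
      if ((ch & 0xf800) == 0xd800)
        return 0;
      out[0] = (uint16_t) ch;
      return 1;
    }
  if (ch > 0x10ffff)
    return 0;
  /* Split the 20-bit offset into a surrogate pair.  */
  ch -= 0x10000;
  out[0] = (uint16_t) (0xd800 | (ch >> 10));
  out[1] = (uint16_t) (0xdc00 | (ch & 0x3ff));
  return 2;
}
```

A return value of 0 corresponds to the case where the conversion loop reports __GCONV_ILLEGAL_INPUT.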

ChangeLog:

	* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
	an error in case of a value in range of an utf16 low surrogate.
---
 sysdeps/s390/utf16-utf32-z9.c | 155 +++++++++++++++++-------------------------
 1 file changed, 62 insertions(+), 93 deletions(-)

diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
index 8d42ab8..5d2ac44 100644
--- a/sysdeps/s390/utf16-utf32-z9.c
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -145,42 +145,6 @@ gconv_end (struct __gconv_step *data)
   free (data->__data);
 }
 
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register size_t inlen __asm__ ("9") = inend - inptr;		\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register size_t outlen __asm__("11") = outend - outptr;		\
-    unsigned long cc = 0;						\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
 #define PREPARE_LOOP							\
   enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
   int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
@@ -310,7 +274,7 @@ gconv_end (struct __gconv_step *data)
 		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
 		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
 		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    slgr %[R_TMP2],%[R_TMP]\n\t"			\
 		  "    srl %[R_TMP2],1\n\t"				\
 		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
 		  "    aghi %[R_OUTLEN],-4\n\t"				\
@@ -437,7 +401,7 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
     uint32_t c = get32 (inptr);						\
 									\
     if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
+	|| (c > 0xdfff && c <= 0xffff))					\
       {									\
 	/* Two UTF-16 chars.  */					\
 	put16 (outptr, c);						\
@@ -475,29 +439,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
     inptr += 4;								\
   }
 
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
 #define BODY_TO_VX							\
   {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
     unsigned long tmp, tmp2, tmp3;					\
     asm volatile (".machine push\n\t"					\
 		  ".machine \"z13\"\n\t"				\
@@ -509,8 +454,8 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-16 chars			\
 		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
 		  "    lghi %[R_TMP2],0\n\t"				\
 		  /* Shorten to UTF-16.  */				\
@@ -526,9 +471,15 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  "    aghi %[R_INLEN],-32\n\t"				\
 		  "    aghi %[R_OUTLEN],-16\n\t"			\
 		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "    j 1b\n\t"					\
+		  /* Calculate remaining uint32_t values in inptr.  */	\
+		  "2:  \n\t"						\
+		  "    clgije %[R_INLEN],0,99f\n\t"			\
+		  "    clgijl %[R_INLEN],4,92f\n\t"			\
+		  "    srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
+		  "    j 20f\n\t"					\
 		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
 		     and check for ch >= 0x10000. (v30, v31)  */	\
 		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
@@ -540,21 +491,59 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
 		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
 		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 20f\n\t"					\
+		  "    jl 12f\n\t"					\
 		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
 		  /* Update pointers.  */				\
 		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
 		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
 		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
 		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  /* Calculate remaining uint32_t values in vrs.  */	\
+		  "12: lghi %[R_TMP2],8\n\t"				\
+		  "    srlg %[R_TMP3],%[R_TMP3],1\n\t"			\
+		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  /* Handle remaining UTF-32 characters.  */		\
+		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-4\n\t"				\
+		  /* Test if ch is 2byte UTF-16 char. */		\
+		  "    clfi %[R_TMP],0xffff\n\t"			\
+		  "    jh 21f\n\t"					\
+		  /* Handle 2 byte UTF16 char.  */			\
+		  "    lgr %[R_TMP3],%[R_TMP]\n\t"			\
+		  "    nilf %[R_TMP],0xf800\n\t"			\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    je 91f\n\t" /* Do not accept UTF-16 surrogates.  */ \
+		  "    slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4byte UTF-16 char. */		\
+		  "21: clfi %[R_TMP],0x10ffff\n\t"			\
+		  "    jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
+		  /* Handle 4 byte UTF16 char.  */			\
+		  "    slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    slfi %[R_TMP],0x10000\n\t" /* zabcd = uvwxy - 1.  */ \
+		  "    llilf %[R_TMP3],0xd800dc00\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],38,47,6\n\t" /* High surrogate.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],54,63,0\n\t" /* Low surrogate.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "    j 99f\n\t"					\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
 		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
 		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
 		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
 		    , [R_RES] "+d" (result)				\
@@ -567,17 +556,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
 		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
 		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
     if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
       break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
+									\
     STANDARD_TO_LOOP_ERR_HANDLER (4);					\
   }
 
@@ -590,15 +572,6 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 #define BODY			BODY_TO_C
 #include <iconv/loop.c>
 
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
 #if defined HAVE_S390_VX_ASM_SUPPORT
 /* Generate loop-function with hardware vector instructions.  */
 # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
@@ -623,10 +596,6 @@ __to_utf16_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf16_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
-      && dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
     return __to_utf16_loop_c;
 }
 
-- 
2.5.5


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/14] S390: Optimize 8bit-generic iconv modules.
  2016-04-15 13:05   ` Florian Weimer
@ 2016-04-21 15:35     ` Stefan Liebler
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Liebler @ 2016-04-21 15:35 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

[-- Attachment #1: Type: text/plain, Size: 1541 bytes --]

On 04/15/2016 03:05 PM, Florian Weimer wrote:
> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>> +     to the 1 byte generic character. If this table contains only up
>> +     to 256 entry, then the highest UCS4 value can be stored in 1 byte
>
> “256 entries”? (spelling)
>
> Why don't you compute the required table at compile time?  Then it can
> live in .rodata and does not have to end up in .bss.
>
> In the inline assembly, I would suggest to out-dent the labels.  There
> is a typo in a comment, “blcocks”.  You could reduce the amount of
> inline assembly by falling back on the C code for error handling, I think.
>
> I can't comment on the technical accuracy of the inline assembly.
>
> Florian
>
Good point. The table is now generated at compile time.
The gen-8bit.sh script in sysdeps/s390/multiarch generates the
conversion table to_ucs1. To that end, sysdeps/s390/multiarch/Makefile
adds an override define for generate-8bit-table, which is originally
defined in iconvdata/Makefile. This version calls both the gen-8bit.sh
in the iconvdata folder and the s390 one. Thus the headers - e.g.
<build-dir>/iconvdata/ibm037.h - contain this table.

I've fixed the typo and out-dented the labels in the inline assemblies.
I've adjusted the inline assemblies in the other patches, too.
ChangeLog:

	* sysdeps/s390/multiarch/8bit-generic.c: New File.
	* sysdeps/s390/multiarch/gen-8bit.sh: New File.
	* sysdeps/s390/multiarch/Makefile (generate-8bit-table):
	New override define.
	* sysdeps/s390/multiarch/iconv/skeleton.c: New File.

[-- Attachment #2: 0004-S390-Optimize-8bit-generic-iconv-modules.patch --]
[-- Type: text/x-patch, Size: 19662 bytes --]

From 2f04fe1bb4e7e123447347ac3ee71115fb021a5c Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 04/14] S390: Optimize 8bit-generic iconv modules.

This patch introduces an s390-specific 8bit-generic.c file which provides an
optimized version for z13 with translate-/vector-instructions, which will be
chosen at runtime via ifunc.
If the build environment lacks vector support, then iconvdata/8bit-generic.c
is used without any change. Otherwise iconvdata/8bit-generic.c is used to create
conversion loop routines without vector instructions as a fallback in case vector
instructions aren't available at runtime.

The vector routines can only be used with charsets where the maximum UCS4 value
fits in 1 byte size. Then the hardware translate-instruction is used
to translate between up to 256 generic characters and "1 byte UCS4"
characters at once. The vector instructions are used to convert between
the "1 byte UCS4" and UCS4.
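
As a rough scalar illustration of the two steps described above (not the actual vector code), the from-direction can be sketched in C as follows. The function name and the table parameter are hypothetical stand-ins; the table plays the role of to_ucs1 under the assumption that every UCS4 value of the charset fits in one byte. The result is stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: convert N (at most 256)
   8-bit characters to UCS4 in two steps.  TO_UCS1 is a 256-entry
   table mapping each generic character to its "1 byte UCS4" value.  */
static void
generic_to_ucs4_sketch (uint32_t *out, const unsigned char *in, size_t n,
                        const unsigned char to_ucs1[256])
{
  unsigned char buf[256];
  assert (n <= sizeof buf);

  /* Step 1: table lookup per byte, the scalar analogue of the
     translate-instruction.  */
  for (size_t i = 0; i < n; i++)
    buf[i] = to_ucs1[in[i]];

  /* Step 2: zero-extend each byte to 32 bits, the scalar analogue of
     the vector widening from "1 byte UCS4" to UCS4.  */
  for (size_t i = 0; i < n; i++)
    out[i] = buf[i];
}
```

The sketch leaves out the handling of holes in the charset, which the patch covers separately.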

The gen-8bit.sh script in sysdeps/s390/multiarch generates the conversion
table to_ucs1. To that end, sysdeps/s390/multiarch/Makefile adds an
override define for generate-8bit-table, which is originally defined in
iconvdata/Makefile. This version calls both the gen-8bit.sh in the iconvdata
folder and the s390 one.

ChangeLog:

	* sysdeps/s390/multiarch/8bit-generic.c: New File.
	* sysdeps/s390/multiarch/gen-8bit.sh: New File.
	* sysdeps/s390/multiarch/Makefile (generate-8bit-table):
	New override define.
	* sysdeps/s390/multiarch/iconv/skeleton.c: New File.
---
 sysdeps/s390/multiarch/8bit-generic.c   | 415 ++++++++++++++++++++++++++++++++
 sysdeps/s390/multiarch/Makefile         |  10 +
 sysdeps/s390/multiarch/gen-8bit.sh      |   6 +
 sysdeps/s390/multiarch/iconv/skeleton.c |  21 ++
 4 files changed, 452 insertions(+)
 create mode 100644 sysdeps/s390/multiarch/8bit-generic.c
 create mode 100644 sysdeps/s390/multiarch/gen-8bit.sh
 create mode 100644 sysdeps/s390/multiarch/iconv/skeleton.c

diff --git a/sysdeps/s390/multiarch/8bit-generic.c b/sysdeps/s390/multiarch/8bit-generic.c
new file mode 100644
index 0000000..93565e1
--- /dev/null
+++ b/sysdeps/s390/multiarch/8bit-generic.c
@@ -0,0 +1,415 @@
+/* Generic conversion to and from 8bit charsets - S390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+/* Generate the conversion loop routines without vector instructions as
+   fallback, if vector instructions aren't available at runtime.  */
+# define IGNORE_ICONV_SKELETON
+# define from_generic __from_generic_c
+# define to_generic __to_generic_c
+# include "iconvdata/8bit-generic.c"
+# undef IGNORE_ICONV_SKELETON
+# undef from_generic
+# undef to_generic
+
+/* Generate the conversion routines with vector instructions. The vector
+   routines can only be used with charsets where the maximum UCS4 value
+   fits in 1 byte size. Then the hardware translate-instruction is used
+   to translate between multiple generic characters and "1 byte UCS4"
+   characters at once. The vector instructions are used to convert between
+   the "1 byte UCS4" and UCS4.  */
+# include <unistd.h>
+# include <dl-procinfo.h>
+
+# undef FROM_LOOP
+# undef TO_LOOP
+# define FROM_LOOP		__from_generic_vx
+# define TO_LOOP		__to_generic_vx
+
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define ONE_DIRECTION		0
+
+/* First define the conversion function from the 8bit charset to UCS4.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_FROM_ORIG \
+  {									      \
+    uint32_t ch = to_ucs4[*inptr];					      \
+									      \
+    if (HAS_HOLES && __builtin_expect (ch == L'\0', 0) && *inptr != '\0')     \
+      {									      \
+	/* This is an illegal character.  */				      \
+	STANDARD_FROM_LOOP_ERR_HANDLER (1);				      \
+      }									      \
+									      \
+    put32 (outptr, ch);							      \
+    outptr += 4;							      \
+    ++inptr;								      \
+  }
+
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 16, 1)			\
+	|| outend - outptr < 64)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_FROM_ORIG							\
+    else								\
+       {								\
+	 /* Convert 16 ... 256 bytes at once with tr-instruction.  */	\
+	 size_t index;							\
+	 char buf[256];							\
+	 size_t loop_count = (inend - inptr) / 16;			\
+	 if (loop_count > (outend - outptr) / 64)			\
+	   loop_count = (outend - outptr) / 64;				\
+	 if (loop_count > 16)						\
+	   loop_count = 16;						\
+	 __asm__ volatile (".machine push\n\t"				\
+			   ".machine \"z13\"\n\t"			\
+			   ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			   "    sllk %[R_I],%[R_LI],4\n\t"		\
+			   "    ahi %[R_I],-1\n\t"			\
+			   /* Execute mvc and tr with correct len.  */	\
+			   "    exrl %[R_I],21f\n\t"			\
+			   "    exrl %[R_I],22f\n\t"			\
+			   /* Post-processing.  */			\
+			   "    lghi %[R_I],0\n\t"			\
+			   "    vzero %%v0\n\t"				\
+			   "0:  \n\t"					\
+			   /* Find invalid character - value is zero.  */ \
+			   "    vl %%v16,0(%[R_I],%[R_BUF])\n\t"	\
+			   "    vceqbs %%v23,%%v0,%%v16\n\t"		\
+			   "    jle 10f\n\t"				\
+			   "1:  \n\t"					\
+			   /* Enlarge to UCS4.  */			\
+			   "    vuplhb %%v17,%%v16\n\t"			\
+			   "    vupllb %%v18,%%v16\n\t"			\
+			   "    vuplhh %%v19,%%v17\n\t"			\
+			   "    vupllh %%v20,%%v17\n\t"			\
+			   "    vuplhh %%v21,%%v18\n\t"			\
+			   "    vupllh %%v22,%%v18\n\t"			\
+			   /* Store 64bytes to buf_out.  */		\
+			   "    vstm %%v19,%%v22,0(%[R_OUT])\n\t"	\
+			   "    aghi %[R_I],16\n\t"			\
+			   "    la %[R_OUT],64(%[R_OUT])\n\t"		\
+			   "    brct %[R_LI],0b\n\t"			\
+			   "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			   "    j 20f\n\t"				\
+			   "21: mvc 0(1,%[R_BUF]),0(%[R_IN])\n\t"	\
+			   "22: tr 0(1,%[R_BUF]),0(%[R_TBL])\n\t"	\
+			   /* Possibly invalid character found.  */	\
+			   "10: \n\t"					\
+			   /* Test if input was zero, too.  */		\
+			   "    vl %%v24,0(%[R_I],%[R_IN])\n\t"		\
+			   "    vceqb %%v24,%%v0,%%v24\n\t"		\
+			   /* Zeros in buf (v23) and inptr (v24) are marked \
+			      with one bits. After xor, invalid characters \
+			      are marked as one bits. Proceed, if no	\
+			      invalid characters are found.  */		\
+			   "    vx %%v24,%%v23,%%v24\n\t"		\
+			   "    vfenebs %%v24,%%v24,%%v0\n\t"		\
+			   "    jo 1b\n\t"				\
+			   /* Found an invalid translation.		\
+			      Store the preceding chars.  */		\
+			   "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			   "    vlgvb %[R_I],%%v24,7\n\t"		\
+			   "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			   "    sll %[R_I],2\n\t"			\
+			   "    ahi %[R_I],-1\n\t"			\
+			   "    jl 20f\n\t"				\
+			   "    lgr %[R_LI],%[R_I]\n\t"			\
+			   "    vuplhb %%v17,%%v16\n\t"			\
+			   "    vuplhh %%v19,%%v17\n\t"			\
+			   "    vstl %%v19,%[R_I],0(%[R_OUT])\n\t"	\
+			   "    ahi %[R_I],-16\n\t"			\
+			   "    jl 11f\n\t"				\
+			   "    vupllh %%v20,%%v17\n\t"			\
+			   "    vstl %%v20,%[R_I],16(%[R_OUT])\n\t"	\
+			   "    ahi %[R_I],-16\n\t"			\
+			   "    jl 11f\n\t"				\
+			   "    vupllb %%v18,%%v16\n\t"			\
+			   "    vuplhh %%v21,%%v18\n\t"			\
+			   "    vstl %%v21,%[R_I],32(%[R_OUT])\n\t"	\
+			   "    ahi %[R_I],-16\n\t"			\
+			   "    jl 11f\n\t"				\
+			   "    vupllh %%v22,%%v18\n\t"			\
+			   "    vstl %%v22,%[R_I],48(%[R_OUT])\n\t"	\
+			   "11: \n\t"					\
+			   "    la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"	\
+			   "20: \n\t"					\
+			   ".machine pop"				\
+			   : /* outputs */ [R_IN] "+a" (inptr)		\
+			     , [R_OUT] "+a" (outptr), [R_I] "=&a" (index) \
+			     , [R_LI] "+a" (loop_count)			\
+			   : /* inputs */ [R_BUF] "a" (buf)		\
+			     , [R_TBL] "a" (to_ucs1)			\
+			   : /* clobber list*/ "memory", "cc"		\
+			     ASM_CLOBBER_VR ("v0")  ASM_CLOBBER_VR ("v16") \
+			     ASM_CLOBBER_VR ("v17") ASM_CLOBBER_VR ("v18") \
+			     ASM_CLOBBER_VR ("v19") ASM_CLOBBER_VR ("v20") \
+			     ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22") \
+			     ASM_CLOBBER_VR ("v23") ASM_CLOBBER_VR ("v24") \
+			   );						\
+	 /* Error occurred?  */						\
+	 if (loop_count != 0)						\
+	   {								\
+	     /* Found an invalid character!  */				\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (1);				\
+	  }								\
+      }									\
+    }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Next, define the other direction - from UCS4 to 8bit charset.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define LOOPFCT		TO_LOOP
+# define BODY_TO_ORIG \
+  {									      \
+    uint32_t ch = get32 (inptr);					      \
+									      \
+    if (__builtin_expect (ch >= sizeof (from_ucs4) / sizeof (from_ucs4[0]), 0)\
+	|| (__builtin_expect (from_ucs4[ch], '\1') == '\0' && ch != 0))	      \
+      {									      \
+	UNICODE_TAG_HANDLER (ch, 4);					      \
+									      \
+	/* This is an illegal character.  */				      \
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				      \
+      }									      \
+									      \
+    *outptr++ = from_ucs4[ch];						      \
+    inptr += 4;								      \
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 64, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_TO_ORIG							\
+    else								\
+      {									\
+	/* Convert 64 ... 1024 bytes at once with tr-instruction.  */	\
+	size_t index, tmp;						\
+	char buf[256];							\
+	size_t loop_count = (inend - inptr) / 64;			\
+	uint32_t max = sizeof (from_ucs4) / sizeof (from_ucs4[0]);	\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	if (loop_count > 16)						\
+	  loop_count = 16;						\
+	size_t remaining_loop_count = loop_count;			\
+	/* Step 1: Check for ch>=max, ch == 0 and shorten to bytes.	\
+	   (ch == 0 is no error, but is handled differently)  */	\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  /* Setup to check for ch >= max.  */		\
+			  "    vzero %%v21\n\t"				\
+			  "    vleih %%v21,-24576,0\n\t" /* element 0:   >  */ \
+			  "    vleih %%v21,-8192,2\n\t"  /* element 1: =<>  */ \
+			  "    vlvgf %%v20,%[R_MAX],0\n\t" /* element 0: val  */ \
+			  /* Process in 64byte - 16 characters blocks.  */ \
+			  "    lghi %[R_I],0\n\t"			\
+			  "    lghi %[R_TMP],0\n\t"			\
+			  "0:  \n\t"					\
+			  "    vlm %%v16,%%v19,0(%[R_IN])\n\t"		\
+			  /* Test for ch >= max and ch == 0.  */	\
+			  "    vstrczfs %%v22,%%v16,%%v20,%%v21\n\t"	\
+			  "    jno 10f\n\t"				\
+			  "    vstrczfs %%v22,%%v17,%%v20,%%v21\n\t"	\
+			  "    jno 11f\n\t"				\
+			  "    vstrczfs %%v22,%%v18,%%v20,%%v21\n\t"	\
+			  "    jno 12f\n\t"				\
+			  "    vstrczfs %%v22,%%v19,%%v20,%%v21\n\t"	\
+			  "    jno 13f\n\t"				\
+			  /* Shorten to byte values.  */		\
+			  "    vpkf %%v16,%%v16,%%v17\n\t"		\
+			  "    vpkf %%v18,%%v18,%%v19\n\t"		\
+			  "    vpkh %%v16,%%v16,%%v18\n\t"		\
+			  /* Store 16bytes to buf.  */			\
+			  "    vst %%v16,0(%[R_I],%[R_BUF])\n\t"	\
+			  /* Loop until all blocks are processed.  */	\
+			  "    la %[R_IN],64(%[R_IN])\n\t"		\
+			  "    aghi %[R_I],16\n\t"			\
+			  "    brct %[R_LI],0b\n\t"			\
+			  "    j 20f\n\t"				\
+			  /* Found error ch >= max or ch == 0. */	\
+			  "13: aghi %[R_TMP],4\n\t"			\
+			  "12: aghi %[R_TMP],4\n\t"			\
+			  "11: aghi %[R_TMP],4\n\t"			\
+			  "10: vlgvb %[R_I],%%v22,7\n\t"		\
+			  "    srlg %[R_I],%[R_I],2\n\t"		\
+			  "    agr %[R_I],%[R_TMP]\n\t"			\
+			  "20: \n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_IN] "+a" (inptr)		\
+			    , [R_I] "=&a" (index)			\
+			    , [R_TMP] "=d" (tmp)			\
+			    , [R_LI] "+d" (remaining_loop_count)	\
+			  : /* inputs */ [R_BUF] "a" (buf)		\
+			    , [R_MAX] "d" (max)				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22")			\
+			  );						\
+	/* Error occurred in step 1? An error (ch >= max || ch == 0)	\
+	   occurred, if remaining_loop_count > 0. The error occurred	\
+	   at character-index (index) after already processed blocks.  */ \
+	loop_count -= remaining_loop_count;				\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Step 2: Translate already processed blocks in buf and	\
+	       check for errors (from_ucs4[ch] == 0).  */		\
+	    __asm__ volatile (".machine push\n\t"			\
+			      ".machine \"z13\"\n\t"			\
+			      ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			      "    sllk %[R_I],%[R_LI],4\n\t"		\
+			      "    ahi %[R_I],-1\n\t"			\
+			      /* Execute tr with correct len.  */	\
+			      "    exrl %[R_I],21f\n\t"			\
+			      /* Post-processing.  */			\
+			      "    lghi %[R_I],0\n\t"			\
+			      "0:  \n\t"				\
+			      /* Find invalid character - value == 0.  */ \
+			      "    vl %%v16,0(%[R_I],%[R_BUF])\n\t"	\
+			      "    vfenezbs %%v17,%%v16,%%v16\n\t"	\
+			      "    je 10f\n\t"				\
+			      /* Store 16bytes to buf_out.  */		\
+			      "    vst %%v16,0(%[R_I],%[R_OUT])\n\t"	\
+			      "    aghi %[R_I],16\n\t"			\
+			      "    brct %[R_LI],0b\n\t"			\
+			      "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"	\
+			      "    j 20f\n\t"				\
+			      "21: tr 0(1,%[R_BUF]),0(%[R_TBL])\n\t"	\
+			      /* Found an error: from_ucs4[ch] == 0.  */ \
+			      "10: la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"	\
+			      "    vlgvb %[R_I],%%v17,7\n\t"		\
+			      "20: \n\t"				\
+			      ".machine pop"				\
+			      : /* outputs */ [R_OUT] "+a" (outptr)	\
+				, [R_I] "=&a" (tmp)			\
+				, [R_LI] "+d" (loop_count)		\
+			      : /* inputs */ [R_BUF] "a" (buf)		\
+				, [R_TBL] "a" (from_ucs4)		\
+			      : /* clobber list*/ "memory", "cc"	\
+				ASM_CLOBBER_VR ("v16")			\
+				ASM_CLOBBER_VR ("v17")			\
+			      );					\
+	    /* Error occurred in processed bytes of step 2?		\
+	       Thus a possible error in step 1 is obsolete.  */		\
+	    if (tmp < 16)						\
+	      {								\
+		index = tmp;						\
+		inptr -= loop_count * 64;				\
+	      }								\
+	  }								\
+	/* Error occurred in step 1/2?  */				\
+	if (index < 16)							\
+	  {								\
+	    /* Found an invalid character (see step 2) or zero		\
+	       (see step 1) at index! Convert the chars before index	\
+	       manually. If there is a zero at index detected by step 1, \
+	       there could be invalid characters before this zero.  */	\
+	    int i;							\
+	    uint32_t ch;						\
+	    for (i = 0; i < index; i++)					\
+	      {								\
+		ch = get32 (inptr);					\
+		if (__builtin_expect (from_ucs4[ch], '\1') == '\0')     \
+		  break;						\
+		*outptr++ = from_ucs4[ch];				\
+		inptr += 4;						\
+	      }								\
+	    if (i == index)						\
+	      {								\
+		ch = get32 (inptr);					\
+		if (ch == 0)						\
+		  {							\
+		    /* This is no error, but handled differently.  */	\
+		    *outptr++ = from_ucs4[ch];				\
+		    inptr += 4;						\
+		    continue;						\
+		  }							\
+	      }								\
+									\
+	    UNICODE_TAG_HANDLER (ch, 4);				\
+									\
+	    /* This is an illegal character.  */			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+	  }								\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_generic_c)
+__attribute__ ((ifunc ("__from_generic_resolver")))
+__from_generic;
+
+static void *
+__from_generic_resolver (unsigned long int dl_hwcap)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256
+      && dl_hwcap & HWCAP_S390_VX)
+    return &__from_generic_vx;
+  else
+    return &__from_generic_c;
+}
+
+__typeof(__to_generic_c)
+__attribute__ ((ifunc ("__to_generic_resolver")))
+__to_generic;
+
+static void *
+__to_generic_resolver (unsigned long int dl_hwcap)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256
+      && dl_hwcap & HWCAP_S390_VX)
+    return &__to_generic_vx;
+  else
+    return &__to_generic_c;
+}
+
+strong_alias (__to_generic_c_single, __to_generic_single)
+
+# undef FROM_LOOP
+# undef TO_LOOP
+# define FROM_LOOP		__from_generic
+# define TO_LOOP		__to_generic
+# include <iconv/skeleton.c>
+
+#else
+/* Generate this module without ifunc if build environment lacks vector
+   support. Instead the common 8bit-generic.c is used.  */
+# include "iconvdata/8bit-generic.c"
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
index 0805b07..11ad2b9 100644
--- a/sysdeps/s390/multiarch/Makefile
+++ b/sysdeps/s390/multiarch/Makefile
@@ -42,3 +42,13 @@ sysdep_routines += wcslen wcslen-vx wcslen-c \
 		   wmemset wmemset-vx wmemset-c \
 		   wmemcmp wmemcmp-vx wmemcmp-c
 endif
+
+ifeq ($(subdir),iconvdata)
+override define generate-8bit-table
+$(make-target-directory)
+LC_ALL=C $(SHELL) ./gen-8bit.sh $< > $(@:stmp=T)
+LC_ALL=C $(SHELL) ../sysdeps/s390/multiarch/gen-8bit.sh $< >> $(@:stmp=T)
+$(move-if-change) $(@:stmp=T) $(@:stmp=h)
+touch $@
+endef
+endif
diff --git a/sysdeps/s390/multiarch/gen-8bit.sh b/sysdeps/s390/multiarch/gen-8bit.sh
new file mode 100644
index 0000000..6f88c4b
--- /dev/null
+++ b/sysdeps/s390/multiarch/gen-8bit.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+echo "static const uint8_t to_ucs1[256] = {"
+sed -ne '/^[^[:space:]]*[[:space:]]*.x00/d;/^END/q' \
+    -e 's/^<U00\(..\)>[[:space:]]*.x\(..\).*/  [0x\2] = 0x\1,/p' \
+    "$@" | sort -u
+echo "};"
diff --git a/sysdeps/s390/multiarch/iconv/skeleton.c b/sysdeps/s390/multiarch/iconv/skeleton.c
new file mode 100644
index 0000000..3a90031
--- /dev/null
+++ b/sysdeps/s390/multiarch/iconv/skeleton.c
@@ -0,0 +1,21 @@
+/* Skeleton for a conversion module - S390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef IGNORE_ICONV_SKELETON
+# include_next <iconv/skeleton.c>
+#endif
-- 
2.5.5


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-04-21 15:00     ` Stefan Liebler
@ 2016-04-28  6:55       ` Stefan Liebler
  2016-05-04 13:15         ` [PING] " Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-04-28  6:55 UTC (permalink / raw)
  To: libc-alpha

Ping

On 04/21/2016 04:55 PM, Stefan Liebler wrote:
> Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to
> commit?
>
> On 04/14/2016 04:16 PM, Stefan Liebler wrote:
>> Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to
>> commit?
>>
>> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>>> This patch introduces a way to provide an architecture dependent
>>> gconv-modules
>>> file. Before this patch, the gconv-modules file was normally installed
>>> from
>>> src-dir/iconvdata/gconv-modules. The S390 Makefile had overridden the
>>> installation recipe (with a make warning) in order to install the
>>> gconv-module-s390 file from build-dir.
>>> The iconvdata/Makefile provides another recipe, which copies the
>>> gconv-modules
>>> file from src to build dir, which are used by the testcases.
>>> Thus the testcases does not use the currently build s390-modules.
>>>
>>> This patch uses build-dir/iconvdata/gconv-modules for installation.
>>> If makefile variable GCONV_MODULES is not defined, then gconv-modules
>>> file
>>> is copied form source to build directory.
>>> If an architecture wants to create his own gconv-modules file, then
>>> the variable
>>> GCONV_MODULE is set to the name of the architecture-dependent
>>> gconv-modules file
>>> in build-directory, which has to be created by a recipe in
>>> sysdeps/.../Makefile.
>>> Then the  iconvdata/Makefile copies this file to
>>> build-dir/iconvdata/gconv-modules, which will be used for installation
>>> and test.
>>>
>>> This way, the s390-Makefile does not need to override the recipe for
>>> gconv-modules and no warning is emitted anymore.
>>>
>>> ChangeLog:
>>>
>>>      * iconvdata/Makefile (GCONV_MODULES): New variable, which can
>>>      be set by sysdeps Makefile.
>>>      ($(inst_gconvdir)/gconv-modules):
>>>      Install file from $(objpfx)gconv-modules.
>>>      ($(objpfx)gconv-modules): Copy File from src-dir or from
>>>      build-dir with file-name specified by GCONV_MODULES.
>>>      * sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
>>>      Deleted.
>>>      (GCONV_MODULES): New variable.
>>> ---
>>>   iconvdata/Makefile            | 15 +++++++++++++--
>>>   sysdeps/s390/s390-64/Makefile | 17 ++---------------
>>>   2 files changed, 15 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/iconvdata/Makefile b/iconvdata/Makefile
>>> index 357530b..1ac1a5c 100644
>>> --- a/iconvdata/Makefile
>>> +++ b/iconvdata/Makefile
>>> @@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx),
>>> $(generated-modules:=.h))
>>>   $(addprefix $(inst_gconvdir)/, $(modules.so)): \
>>>       $(inst_gconvdir)/%: $(objpfx)% $(+force)
>>>       $(do-install-program)
>>> -$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
>>> +$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
>>>       $(do-install)
>>>   ifeq (no,$(cross-compiling))
>>>   # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is
>>> necessary
>>> @@ -332,6 +332,17 @@ tst-tables-clean:
>>>       -rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
>>>
>>>   ifdef objpfx
>>> +# Override GCONV_MODULES file name and provide a Makefile recipe,
>>> +# if you want to create your own version.
>>> +ifndef GCONV_MODULES
>>> +# Copy gconv-modules from src-tree for tests and installation.
>>>   $(objpfx)gconv-modules: gconv-modules
>>> -    cp $^ $@
>>> +    cp $< $@
>>> +else
>>> +generated += $(GCONV_MODULES)
>>> +
>>> +# Copy overrided GCONV_MODULES file to gconv-modules for tests and
>>> installation.
>>> +$(objpfx)gconv-modules: $(objpfx)$(GCONV_MODULES)
>>> +    cp $< $@
>>> +endif
>>>   endif
>>> diff --git a/sysdeps/s390/s390-64/Makefile
>>> b/sysdeps/s390/s390-64/Makefile
>>> index ce4f0c5..de249a7 100644
>>> --- a/sysdeps/s390/s390-64/Makefile
>>> +++ b/sysdeps/s390/s390-64/Makefile
>>> @@ -39,7 +39,7 @@ $(patsubst %, $(inst_gconvdir)/%.so,
>>> $(s390x-iconv-modules)) : \
>>>   $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
>>>       $(do-install-program)
>>>
>>> -$(objpfx)gconv-modules-s390: gconv-modules $(+force)
>>> +$(objpfx)gconv-modules-s390: gconv-modules
>>>       cp $< $@
>>>       echo >> $@
>>>       echo "# S/390 hardware accelerated modules" >> $@
>>> @@ -74,19 +74,6 @@ $(objpfx)gconv-modules-s390: gconv-modules $(+force)
>>>       echo -n "module    ISO-10646/UTF8/        UTF-16BE//    " >> $@
>>>       echo "    UTF8_UTF16_Z9        1" >> $@
>>>
>>> -$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
>>> -    $(do-install)
>>> -ifeq (no,$(cross-compiling))
>>> -# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is
>>> necessary
>>> -# if this libc has more gconv modules than the previously installed
>>> one.
>>> -    if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
>>> -       LC_ALL=C \
>>> -       $(rtld-prefix) \
>>> -       $(common-objpfx)iconv/iconvconfig \
>>> -         $(addprefix --prefix=,$(install_root)); \
>>> -    fi
>>> -else
>>> -    @echo '*@*@*@ You should recreate
>>> $(inst_gconvdir)/gconv-modules.cache'
>>> -endif
>>> +GCONV_MODULES = gconv-modules-s390
>>>
>>>   endif
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PING] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-04-28  6:55       ` Stefan Liebler
@ 2016-05-04 13:15         ` Stefan Liebler
  2016-05-04 13:40           ` Andreas Schwab
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-05-04 13:15 UTC (permalink / raw)
  To: libc-alpha

ping.

Is the new handling of gconv-modules in iconvdata/Makefile okay to
commit? It prevents the make warning caused by the overridden recipe for
gconv-modules.

On 04/28/2016 08:50 AM, Stefan Liebler wrote:
> Ping
>
> On 04/21/2016 04:55 PM, Stefan Liebler wrote:
>> Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to
>> commit?
>>
>> On 04/14/2016 04:16 PM, Stefan Liebler wrote:
>>> Ping. Is the new handling of gconv-modules in iconvdata/Makefile okay to
>>> commit?
>>>
>>> On 02/23/2016 10:21 AM, Stefan Liebler wrote:
>>>> This patch introduces a way to provide an architecture dependent
>>>> gconv-modules
>>>> file. Before this patch, the gconv-modules file was normally installed
>>>> from
>>>> src-dir/iconvdata/gconv-modules. The S390 Makefile had overridden the
>>>> installation recipe (with a make warning) in order to install the
>>>> gconv-module-s390 file from build-dir.
>>>> The iconvdata/Makefile provides another recipe, which copies the
>>>> gconv-modules
>>>> file from src to build dir, which are used by the testcases.
>>>> Thus the testcases does not use the currently build s390-modules.
>>>>
>>>> This patch uses build-dir/iconvdata/gconv-modules for installation.
>>>> If makefile variable GCONV_MODULES is not defined, then gconv-modules
>>>> file
>>>> is copied form source to build directory.
>>>> If an architecture wants to create his own gconv-modules file, then
>>>> the variable
>>>> GCONV_MODULE is set to the name of the architecture-dependent
>>>> gconv-modules file
>>>> in build-directory, which has to be created by a recipe in
>>>> sysdeps/.../Makefile.
>>>> Then the  iconvdata/Makefile copies this file to
>>>> build-dir/iconvdata/gconv-modules, which will be used for installation
>>>> and test.
>>>>
>>>> This way, the s390-Makefile does not need to override the recipe for
>>>> gconv-modules and no warning is emitted anymore.
>>>>
>>>> ChangeLog:
>>>>
>>>>      * iconvdata/Makefile (GCONV_MODULES): New variable, which can
>>>>      be set by sysdeps Makefile.
>>>>      ($(inst_gconvdir)/gconv-modules):
>>>>      Install file from $(objpfx)gconv-modules.
>>>>      ($(objpfx)gconv-modules): Copy File from src-dir or from
>>>>      build-dir with file-name specified by GCONV_MODULES.
>>>>      * sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
>>>>      Deleted.
>>>>      (GCONV_MODULES): New variable.
>>>> ---
>>>>   iconvdata/Makefile            | 15 +++++++++++++--
>>>>   sysdeps/s390/s390-64/Makefile | 17 ++---------------
>>>>   2 files changed, 15 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/iconvdata/Makefile b/iconvdata/Makefile
>>>> index 357530b..1ac1a5c 100644
>>>> --- a/iconvdata/Makefile
>>>> +++ b/iconvdata/Makefile
>>>> @@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx),
>>>> $(generated-modules:=.h))
>>>>   $(addprefix $(inst_gconvdir)/, $(modules.so)): \
>>>>       $(inst_gconvdir)/%: $(objpfx)% $(+force)
>>>>       $(do-install-program)
>>>> -$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
>>>> +$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
>>>>       $(do-install)
>>>>   ifeq (no,$(cross-compiling))
>>>>   # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is
>>>> necessary
>>>> @@ -332,6 +332,17 @@ tst-tables-clean:
>>>>       -rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
>>>>
>>>>   ifdef objpfx
>>>> +# Override GCONV_MODULES file name and provide a Makefile recipe,
>>>> +# if you want to create your own version.
>>>> +ifndef GCONV_MODULES
>>>> +# Copy gconv-modules from src-tree for tests and installation.
>>>>   $(objpfx)gconv-modules: gconv-modules
>>>> -    cp $^ $@
>>>> +    cp $< $@
>>>> +else
>>>> +generated += $(GCONV_MODULES)
>>>> +
>>>> +# Copy overrided GCONV_MODULES file to gconv-modules for tests and
>>>> installation.
>>>> +$(objpfx)gconv-modules: $(objpfx)$(GCONV_MODULES)
>>>> +    cp $< $@
>>>> +endif
>>>>   endif
>>>> diff --git a/sysdeps/s390/s390-64/Makefile
>>>> b/sysdeps/s390/s390-64/Makefile
>>>> index ce4f0c5..de249a7 100644
>>>> --- a/sysdeps/s390/s390-64/Makefile
>>>> +++ b/sysdeps/s390/s390-64/Makefile
>>>> @@ -39,7 +39,7 @@ $(patsubst %, $(inst_gconvdir)/%.so,
>>>> $(s390x-iconv-modules)) : \
>>>>   $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
>>>>       $(do-install-program)
>>>>
>>>> -$(objpfx)gconv-modules-s390: gconv-modules $(+force)
>>>> +$(objpfx)gconv-modules-s390: gconv-modules
>>>>       cp $< $@
>>>>       echo >> $@
>>>>       echo "# S/390 hardware accelerated modules" >> $@
>>>> @@ -74,19 +74,6 @@ $(objpfx)gconv-modules-s390: gconv-modules $(+force)
>>>>       echo -n "module    ISO-10646/UTF8/        UTF-16BE//    " >> $@
>>>>       echo "    UTF8_UTF16_Z9        1" >> $@
>>>>
>>>> -$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
>>>> -    $(do-install)
>>>> -ifeq (no,$(cross-compiling))
>>>> -# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is
>>>> necessary
>>>> -# if this libc has more gconv modules than the previously installed
>>>> one.
>>>> -    if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
>>>> -       LC_ALL=C \
>>>> -       $(rtld-prefix) \
>>>> -       $(common-objpfx)iconv/iconvconfig \
>>>> -         $(addprefix --prefix=,$(install_root)); \
>>>> -    fi
>>>> -else
>>>> -    @echo '*@*@*@ You should recreate
>>>> $(inst_gconvdir)/gconv-modules.cache'
>>>> -endif
>>>> +GCONV_MODULES = gconv-modules-s390
>>>>
>>>>   endif
>>>>
>>>
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PING] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-04 13:15         ` [PING] " Stefan Liebler
@ 2016-05-04 13:40           ` Andreas Schwab
  2016-05-09 14:33             ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Schwab @ 2016-05-04 13:40 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

Define a variable sysdep-gconv-modules that can be set by
sysdeps/.../Makefile, and use it in iconvdata/Makefile to cat the files
together.  Please also fix the rule in sysdeps/s390/s390-64/Makefile to
use a temporary file to make the update atomic.  Since we no longer
support empty objpfx the conditional test can be removed.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PING] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-04 13:40           ` Andreas Schwab
@ 2016-05-09 14:33             ` Stefan Liebler
  2016-05-18 15:28               ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-05-09 14:33 UTC (permalink / raw)
  To: libc-alpha

On 05/04/2016 03:40 PM, Andreas Schwab wrote:
> Define a variable sysdep-gconv-modules that can be set by
> sysdeps/.../Makefile, and use it in iconvdata/Makefile to cat the files
> together.  Please also fix the rule in sysdeps/s390/s390-64/Makefile to
> use a temporary file to make the update atomic.  Since we no longer
> support empty objpfx the conditional test can be removed.
>
> Andreas.
>

Okay. I will remove the objpfx conditional test in iconvdata/Makefile.

I have to add the s390 specific modules before all the other ones in 
<source>/iconvdata/gconv-modules.
(See my second patch: "S390: Mention s390-specific gconv-modues before 
common ones.")
Thus simply concatenating the two files would produce something like this:
"
# GNU libc iconv configuration.
# Copyright (C) 1997-2016 Free Software Foundation, Inc.
#....

s390-specific modules

# GNU libc iconv configuration.
# Copyright (C) 1997-2016 Free Software Foundation, Inc.
#....

common modules
"

This doesn't look very nice. Or is it okay?

To avoid the duplicated header, I would prefer to create a file
<source>/sysdeps/s390/gconv-modules-s390 containing only the module
definitions, set the variable sysdep-gconv-modules to it, and drop the
"cp, echo, echo ..." rule from sysdeps/s390/s390-64/Makefile entirely.
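
A minimal sketch of what such a concatenation step could look like, written
as plain shell rather than as the final Makefile rule. The function name
concat_gconv_modules, the ".T" temporary suffix and the demo file names are
assumptions for illustration only; they are not taken from the glibc tree.
The output is written to a temporary file first and then renamed, so a
reader never sees a half-written gconv-modules file.

```shell
#!/bin/sh
# Sketch only: names below are hypothetical stand-ins, not glibc rules.

# concat_gconv_modules OUT FILE...
# Concatenate FILEs in the given order into OUT.  The result is first
# written to OUT.T and then moved into place, which makes the update
# atomic on the same file system.
concat_gconv_modules () {
  out=$1
  shift
  tmp="$out.T"
  cat "$@" > "$tmp" && mv -f "$tmp" "$out"
}

# Demo: the architecture-specific file (module definitions only, no
# license header) comes first, followed by the common file.
demo_dir=$(mktemp -d)
printf '# S/390 hardware accelerated modules\n' > "$demo_dir/gconv-modules-s390"
printf '# GNU libc iconv configuration.\n' > "$demo_dir/gconv-modules-common"
concat_gconv_modules "$demo_dir/gconv-modules" \
  "$demo_dir/gconv-modules-s390" "$demo_dir/gconv-modules-common"
cat "$demo_dir/gconv-modules"
```

Because the s390 file holds no header of its own, the generated file carries
only the one header from the common file, placed after the s390 entries.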

Bye
Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PING] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-09 14:33             ` Stefan Liebler
@ 2016-05-18 15:28               ` Stefan Liebler
  2016-05-24 15:02                 ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-05-18 15:28 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

On 05/09/2016 04:15 PM, Stefan Liebler wrote:
> On 05/04/2016 03:40 PM, Andreas Schwab wrote:
>> Define a variable sysdep-gconv-modules that can be set by
>> sysdeps/.../Makefile, and use it in iconvdata/Makefile to cat the files
>> together.  Please also fix the rule in sysdeps/s390/s390-64/Makefile to
>> use a temporary file to make the update atomic.  Since we no longer
>> support empty objpfx the conditional test can be removed.
>>
>> Andreas.
>>
>
> Okay. I will remove the objpfx conditional test in iconvdata/Makefile.
>
> I have to add the s390 specific modules before all the other ones in
> <source>/iconvdata/gconv-modules.
> (See my second patch: "S390: Mention s390-specific gconv-modues before
> common ones.")
> Thus simply concatenating would lead to something like that:
> "
> # GNU libc iconv configuration.
> # Copyright (C) 1997-2016 Free Software Foundation, Inc.
> #....
>
> s390-specific modules
>
> # GNU libc iconv configuration.
> # Copyright (C) 1997-2016 Free Software Foundation, Inc.
> #....
>
> common modules
> "
>
> This doesn't look very nice. Or is it okay?
>
> Then I would prefer to create a file
> <source>/sysdeps/s390/gconv-modules-s390 with the module-definitions,
> set the variable sysdep-gconv-modules and omit the rule with "cp, echo,
> echo ..." in sysdeps/s390/s390-64/Makefile at all.
>
> Bye
> Stefan
>
>
Here is an updated patch. It concatenates the s390-specific and the
common gconv-modules files. The s390-specific gconv-modules file is
specified with the variable sysdep-gconv-modules in
sysdeps/s390/s390-64/Makefile.

The second patch "[PATCH 02/14] S390: Mention s390-specific gconv-modues 
before common ones." can be removed since the s390 modules are already 
mentioned before the common ones with this patch.

The patch "[PATCH 10/14] S390: Use s390-64 specific ionv-modules on
s390-32,too.", which moves the iconvdata contents from
sysdeps/s390/s390-64/Makefile to sysdeps/s390/Makefile, has to be
adjusted to reflect the Makefile changes.

Okay to commit with these changes?

Bye
Stefan

---
ChangeLog:

	* iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
	Install file from $(objpfx)gconv-modules.
	($(objpfx)gconv-modules): Concatenate architecture specific file
	in variable sysdeps-gconv-modules and gconv-modules in src dir.
	* sysdeps/s390/gconv-modules: New file.
	* sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
	Deleted.
	($(objpfx)gconv-modules-s390): Deleted.
	(sysdeps-gconv-modules): New variable.

[-- Attachment #2: 0001-S390-Get-rid-of-make-warning-overriding-recipe-for-t.patch --]
[-- Type: text/x-patch, Size: 7957 bytes --]

From 9f6bbd59d4ea86c9e61d4497fc8e888985567557 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Wed, 18 May 2016 12:48:37 +0200
Subject: [PATCH 01/13] S390: Get rid of make warning: overriding recipe for
 target gconv-modules.

This patch introduces a way to provide an architecture dependent gconv-modules
file. Before this patch, the gconv-modules file was normally installed from
src-dir/iconvdata/gconv-modules. The S390 Makefile had overridden the
installation recipe (with a make warning) in order to install the
gconv-modules-s390 file from build-dir.
The iconvdata/Makefile provides another recipe, which copies the gconv-modules
file from src to build dir, where it is used by the testcases.
Thus the testcases do not use the currently built s390 modules.

This patch uses build-dir/iconvdata/gconv-modules for installation, which
is generated by concatenating src-dir/iconvdata/gconv-modules and the
architecture specific one. The latter one can be specified by setting the variable
sysdeps-gconv-modules in sysdeps/.../Makefile.

The architecture specific gconv-modules file is emitted before the common one
because otherwise these modules aren't used in all possible conversions. E.g. the
conversion from INTERNAL to UTF-16 used the common UTF-16.so module instead of
UTF16_UTF32_Z9.so.

This way, the s390-Makefile does not need to override the recipe for gconv-modules
and no warning is emitted anymore.
Since we no longer support empty objpfx the conditional test in iconvdata/Makefile
is removed.

ChangeLog:

	* iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
	Install file from $(objpfx)gconv-modules.
	($(objpfx)gconv-modules): Concatenate architecture specific file
	in variable sysdeps-gconv-modules and gconv-modules in src dir.
	* sysdeps/s390/gconv-modules: New file.
	* sysdeps/s390/s390-64/Makefile ($(inst_gconvdir)/gconv-modules):
	Deleted.
	($(objpfx)gconv-modules-s390): Deleted.
	(sysdeps-gconv-modules): New variable.
---
 iconvdata/Makefile            |  6 ++---
 sysdeps/s390/gconv-modules    | 51 +++++++++++++++++++++++++++++++++++++++++++
 sysdeps/s390/s390-64/Makefile | 51 +------------------------------------------
 3 files changed, 54 insertions(+), 54 deletions(-)
 create mode 100644 sysdeps/s390/gconv-modules

diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index 357530b..f9826b3 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx), $(generated-modules:=.h))
 $(addprefix $(inst_gconvdir)/, $(modules.so)): \
     $(inst_gconvdir)/%: $(objpfx)% $(+force)
 	$(do-install-program)
-$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
+$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
 	$(do-install)
 ifeq (no,$(cross-compiling))
 # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
@@ -331,7 +331,5 @@ do-tests-clean common-mostlyclean: tst-tables-clean
 tst-tables-clean:
 	-rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
 
-ifdef objpfx
 $(objpfx)gconv-modules: gconv-modules
-	cp $^ $@
-endif
+	cat $(sysdeps-gconv-modules) $^ > $@
diff --git a/sysdeps/s390/gconv-modules b/sysdeps/s390/gconv-modules
new file mode 100644
index 0000000..376235a
--- /dev/null
+++ b/sysdeps/s390/gconv-modules
@@ -0,0 +1,51 @@
+# GNU libc iconv configuration.
+# Copyright (C) 1997-2016 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# All lines contain the following information:
+
+# If the lines start with `module'
+#  fromset:	either a name triple or a regular expression triple.
+#  toset:	a name triple or an expression with \N to get regular
+#		expression matching results.
+#  filename:	filename of the module implementing the transformation.
+#		If it is not absolute the path is made absolute by prepending
+#		the directory the configuration file is found in.
+#  cost:	optional cost of the transformation.  Default is 1.
+
+# If the lines start with `alias'
+#  alias:	alias name which is not really recognized.
+#  name:	the real name of the character set
+
+# S/390 hardware accelerated modules
+#	from			to			module			cost
+module	ISO-8859-1//		IBM037//		ISO-8859-1_CP037_Z900	1
+module	IBM037//		ISO-8859-1//		ISO-8859-1_CP037_Z900	1
+module	ISO-10646/UTF8/		UTF-32//		UTF8_UTF32_Z9		1
+module	UTF-32BE//		ISO-10646/UTF8/		UTF8_UTF32_Z9		1
+module	ISO-10646/UTF8/		UTF-32BE//		UTF8_UTF32_Z9		1
+module	UTF-16BE//		UTF-32//		UTF16_UTF32_Z9		1
+module	UTF-32BE//		UTF-16//		UTF16_UTF32_Z9		1
+module	INTERNAL		UTF-16//		UTF16_UTF32_Z9		1
+module	UTF-32BE//		UTF-16BE//		UTF16_UTF32_Z9		1
+module	INTERNAL		UTF-16BE//		UTF16_UTF32_Z9		1
+module	UTF-16BE//		UTF-32BE//		UTF16_UTF32_Z9		1
+module	UTF-16BE//		INTERNAL		UTF16_UTF32_Z9		1
+module	UTF-16BE//		ISO-10646/UTF8/		UTF8_UTF16_Z9		1
+module	ISO-10646/UTF8/		UTF-16//		UTF8_UTF16_Z9		1
+module	ISO-10646/UTF8/		UTF-16BE//		UTF8_UTF16_Z9		1
+
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 5909d1f..ce4aa3b 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -37,54 +37,5 @@ $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
 $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
 	$(do-install-program)
 
-$(objpfx)gconv-modules-s390: gconv-modules $(+force)
-	cp $< $@
-	echo >> $@
-	echo "# S/390 hardware accelerated modules" >> $@
-	echo -n "module	ISO-8859-1//		IBM037//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	IBM037//		ISO-8859-1//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32BE//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		INTERNAL	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16BE//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-
-$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
-	$(do-install)
-ifeq (no,$(cross-compiling))
-# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
-# if this libc has more gconv modules than the previously installed one.
-	if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
-	   LC_ALL=C \
-	   $(rtld-prefix) \
-	   $(common-objpfx)iconv/iconvconfig \
-	     $(addprefix --prefix=,$(install_root)); \
-	fi
-else
-	@echo '*@*@*@ You should recreate $(inst_gconvdir)/gconv-modules.cache'
-endif
-
+sysdeps-gconv-modules = ../sysdeps/s390/gconv-modules
 endif
-- 
2.5.5


[-- Attachment #3: 0009-S390-Use-s390-64-specific-ionv-modules-on-s390-32-to.patch --]
[-- Type: text/x-patch, Size: 183812 bytes --]

From 0169d55ae1960b1a06eb2cdbd5715cc938055400 Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 09/13] S390: Use s390-64 specific ionv-modules on s390-32,
 too.

This patch reworks the existing s390 64bit specific iconv modules in order
to use them on s390 31bit, too.

Thus the parts for subdirectory iconvdata in sysdeps/s390/s390-64/Makefile
were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
All those modules are moved from sysdeps/s390/s390-64 directory to sysdeps/s390.

The iso-8859-1 to/from cp037 module was adjusted to use the brct (branch
relative on count) instruction on 31bit s390 instead of brctg, because brctg
is a zarch instruction and is not available on a 31bit kernel.

The utf modules are using zarch instructions, thus the directive machinemode
zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
The ifunc resolvers were adjusted in order to call the etf3eh or vector variants
only if zarch instructions are available (64bit kernel in 31bit compat-mode).
Furthermore some variable types were changed. E.g. unsigned long long would be
a register pair on s390 31bit, but we want only one single register.
For variables of type size_t the register contents have to be enlarged from a
32bit to a 64bit value on 31bit, because the inline assemblies use 64bit values
in such cases.

ChangeLog:

	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
	Move to ...
	* sysdeps/s390/Makefile: ... here.
	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
	(BRANCH_ON_COUNT): New define.
	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
	run on s390-32, too.
	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
	run on s390-32, too.
---
 sysdeps/s390/Makefile                        |  31 +
 sysdeps/s390/iso-8859-1_cp037_z900.c         | 262 +++++++++
 sysdeps/s390/s390-64/Makefile                |  32 --
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 256 ---------
 sysdeps/s390/s390-64/utf16-utf32-z9.c        | 624 --------------------
 sysdeps/s390/s390-64/utf8-utf16-z9.c         | 806 --------------------------
 sysdeps/s390/s390-64/utf8-utf32-z9.c         | 807 --------------------------
 sysdeps/s390/utf16-utf32-z9.c                | 636 +++++++++++++++++++++
 sysdeps/s390/utf8-utf16-z9.c                 | 818 ++++++++++++++++++++++++++
 sysdeps/s390/utf8-utf32-z9.c                 | 820 +++++++++++++++++++++++++++
 10 files changed, 2567 insertions(+), 2525 deletions(-)
 create mode 100644 sysdeps/s390/Makefile
 create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
 create mode 100644 sysdeps/s390/utf16-utf32-z9.c
 create mode 100644 sysdeps/s390/utf8-utf16-z9.c
 create mode 100644 sysdeps/s390/utf8-utf32-z9.c

diff --git a/sysdeps/s390/Makefile b/sysdeps/s390/Makefile
new file mode 100644
index 0000000..d508365
--- /dev/null
+++ b/sysdeps/s390/Makefile
@@ -0,0 +1,31 @@
+ifeq ($(subdir),iconvdata)
+ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
+ISO-8859-1_CP037_Z900-map := gconv.map
+
+UTF8_UTF32_Z9-routines := utf8-utf32-z9
+UTF8_UTF32_Z9-map := gconv.map
+
+UTF16_UTF32_Z9-routines := utf16-utf32-z9
+UTF16_UTF32_Z9-map := gconv.map
+
+UTF8_UTF16_Z9-routines := utf8-utf16-z9
+UTF8_UTF16_Z9-map := gconv.map
+
+s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
+
+extra-modules-left += $(s390x-iconv-modules)
+include extra-module.mk
+
+cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
+lib := iconvdata
+include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
+
+extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
+install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
+
+$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
+$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
+	$(do-install-program)
+
+sysdeps-gconv-modules = ../sysdeps/s390/gconv-modules
+endif
diff --git a/sysdeps/s390/iso-8859-1_cp037_z900.c b/sysdeps/s390/iso-8859-1_cp037_z900.c
new file mode 100644
index 0000000..fc25dff
--- /dev/null
+++ b/sysdeps/s390/iso-8859-1_cp037_z900.c
@@ -0,0 +1,262 @@
+/* Conversion between ISO 8859-1 and IBM037.
+
+   This module uses the translate instruction.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+
+// conversion table from ISO-8859-1 to IBM037
+static const unsigned char table_iso8859_1_to_cp037[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
+  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
+  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
+  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
+  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
+  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
+  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
+  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
+  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
+  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
+  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
+  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
+  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
+  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
+  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
+  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
+  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
+  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
+  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
+  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
+  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
+  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
+  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
+  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
+  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
+  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
+  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
+  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
+  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
+  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
+  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
+  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
+  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
+  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
+  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
+  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
+  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
+  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
+  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
+  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
+  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
+  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
+  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
+  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
+  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
+  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
+  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
+  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
+  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
+  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
+  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
+  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
+  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
+  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
+  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
+};
+
+// conversion table from IBM037 to ISO-8859-1
+static const unsigned char table_cp037_iso8859_1[256]
+__attribute__ ((aligned (8))) =
+{
+  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
+  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
+  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
+  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
+  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
+  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
+  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
+  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
+  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
+  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
+  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
+  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
+  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
+  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
+  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
+  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
+  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
+  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
+  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
+  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
+  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
+  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
+  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
+  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
+  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
+  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
+  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
+  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
+  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
+  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
+  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
+  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
+  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
+  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
+  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
+  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
+  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
+  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
+  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
+  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
+  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
+  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
+  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
+  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
+  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
+  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
+  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
+  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
+  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
+  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
+  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
+  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
+  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
+  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
+  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
+  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
+  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
+  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
+  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
+  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
+  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
+  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
+  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
+  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
+};
+
+/* Definitions used in the body of the `gconv' function.  */
+#define CHARSET_NAME		"ISO-8859-1//"
+#define FROM_LOOP		iso8859_1_to_cp037_z900
+#define TO_LOOP			cp037_to_iso8859_1_z900
+#define DEFINE_INIT		1
+#define DEFINE_FINI		1
+#define MIN_NEEDED_FROM		1
+#define MIN_NEEDED_TO		1
+
+# if defined __s390x__
+#  define BRANCH_ON_COUNT(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
+# else
+#  define BRANCH_ON_COUNT(REG,LBL) "brct %" #REG "," #LBL "\n\t"
+# endif
+
+#define TR_LOOP(TABLE)							\
+  {									\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
+									\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "   la %[R_IN],256(%[R_IN])\n\t"		\
+			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     BRANCH_ON_COUNT ([R_LI], 0b)		\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
+									\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
+									\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
+  }
+
+
+/* First define the conversion function from ISO 8859-1 to CP037.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
+
+#include <iconv/loop.c>
+
+
+/* Next, define the conversion function from CP037 to ISO 8859-1.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define LOOPFCT			TO_LOOP
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
+
+#include <iconv/loop.c>
+
+
+/* Now define the toplevel functions.  */
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index ce4aa3b..b4d793b 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -7,35 +7,3 @@ CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
 CFLAGS-dl-load.c += -Wno-unused
 CFLAGS-dl-reloc.c += -Wno-unused
 endif
-
-ifeq ($(subdir),iconvdata)
-ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
-ISO-8859-1_CP037_Z900-map := gconv.map
-
-UTF8_UTF32_Z9-routines := utf8-utf32-z9
-UTF8_UTF32_Z9-map := gconv.map
-
-UTF16_UTF32_Z9-routines := utf16-utf32-z9
-UTF16_UTF32_Z9-map := gconv.map
-
-UTF8_UTF16_Z9-routines := utf8-utf16-z9
-UTF8_UTF16_Z9-map := gconv.map
-
-s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
-
-extra-modules-left += $(s390x-iconv-modules)
-include extra-module.mk
-
-cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
-lib := iconvdata
-include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
-
-extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
-install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
-
-$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
-$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
-	$(do-install-program)
-
-sysdeps-gconv-modules = ../sysdeps/s390/gconv-modules
-endif
diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
deleted file mode 100644
index 3b63e6a..0000000
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/* Conversion between ISO 8859-1 and IBM037.
-
-   This module uses the translate instruction.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-
-// conversion table from ISO-8859-1 to IBM037
-static const unsigned char table_iso8859_1_to_cp037[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x37, [0x05] = 0x2D, [0x06] = 0x2E, [0x07] = 0x2F,
-  [0x08] = 0x16, [0x09] = 0x05, [0x0A] = 0x25, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x3C, [0x15] = 0x3D, [0x16] = 0x32, [0x17] = 0x26,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x3F, [0x1B] = 0x27,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x40, [0x21] = 0x5A, [0x22] = 0x7F, [0x23] = 0x7B,
-  [0x24] = 0x5B, [0x25] = 0x6C, [0x26] = 0x50, [0x27] = 0x7D,
-  [0x28] = 0x4D, [0x29] = 0x5D, [0x2A] = 0x5C, [0x2B] = 0x4E,
-  [0x2C] = 0x6B, [0x2D] = 0x60, [0x2E] = 0x4B, [0x2F] = 0x61,
-  [0x30] = 0xF0, [0x31] = 0xF1, [0x32] = 0xF2, [0x33] = 0xF3,
-  [0x34] = 0xF4, [0x35] = 0xF5, [0x36] = 0xF6, [0x37] = 0xF7,
-  [0x38] = 0xF8, [0x39] = 0xF9, [0x3A] = 0x7A, [0x3B] = 0x5E,
-  [0x3C] = 0x4C, [0x3D] = 0x7E, [0x3E] = 0x6E, [0x3F] = 0x6F,
-  [0x40] = 0x7C, [0x41] = 0xC1, [0x42] = 0xC2, [0x43] = 0xC3,
-  [0x44] = 0xC4, [0x45] = 0xC5, [0x46] = 0xC6, [0x47] = 0xC7,
-  [0x48] = 0xC8, [0x49] = 0xC9, [0x4A] = 0xD1, [0x4B] = 0xD2,
-  [0x4C] = 0xD3, [0x4D] = 0xD4, [0x4E] = 0xD5, [0x4F] = 0xD6,
-  [0x50] = 0xD7, [0x51] = 0xD8, [0x52] = 0xD9, [0x53] = 0xE2,
-  [0x54] = 0xE3, [0x55] = 0xE4, [0x56] = 0xE5, [0x57] = 0xE6,
-  [0x58] = 0xE7, [0x59] = 0xE8, [0x5A] = 0xE9, [0x5B] = 0xBA,
-  [0x5C] = 0xE0, [0x5D] = 0xBB, [0x5E] = 0xB0, [0x5F] = 0x6D,
-  [0x60] = 0x79, [0x61] = 0x81, [0x62] = 0x82, [0x63] = 0x83,
-  [0x64] = 0x84, [0x65] = 0x85, [0x66] = 0x86, [0x67] = 0x87,
-  [0x68] = 0x88, [0x69] = 0x89, [0x6A] = 0x91, [0x6B] = 0x92,
-  [0x6C] = 0x93, [0x6D] = 0x94, [0x6E] = 0x95, [0x6F] = 0x96,
-  [0x70] = 0x97, [0x71] = 0x98, [0x72] = 0x99, [0x73] = 0xA2,
-  [0x74] = 0xA3, [0x75] = 0xA4, [0x76] = 0xA5, [0x77] = 0xA6,
-  [0x78] = 0xA7, [0x79] = 0xA8, [0x7A] = 0xA9, [0x7B] = 0xC0,
-  [0x7C] = 0x4F, [0x7D] = 0xD0, [0x7E] = 0xA1, [0x7F] = 0x07,
-  [0x80] = 0x20, [0x81] = 0x21, [0x82] = 0x22, [0x83] = 0x23,
-  [0x84] = 0x24, [0x85] = 0x15, [0x86] = 0x06, [0x87] = 0x17,
-  [0x88] = 0x28, [0x89] = 0x29, [0x8A] = 0x2A, [0x8B] = 0x2B,
-  [0x8C] = 0x2C, [0x8D] = 0x09, [0x8E] = 0x0A, [0x8F] = 0x1B,
-  [0x90] = 0x30, [0x91] = 0x31, [0x92] = 0x1A, [0x93] = 0x33,
-  [0x94] = 0x34, [0x95] = 0x35, [0x96] = 0x36, [0x97] = 0x08,
-  [0x98] = 0x38, [0x99] = 0x39, [0x9A] = 0x3A, [0x9B] = 0x3B,
-  [0x9C] = 0x04, [0x9D] = 0x14, [0x9E] = 0x3E, [0x9F] = 0xFF,
-  [0xA0] = 0x41, [0xA1] = 0xAA, [0xA2] = 0x4A, [0xA3] = 0xB1,
-  [0xA4] = 0x9F, [0xA5] = 0xB2, [0xA6] = 0x6A, [0xA7] = 0xB5,
-  [0xA8] = 0xBD, [0xA9] = 0xB4, [0xAA] = 0x9A, [0xAB] = 0x8A,
-  [0xAC] = 0x5F, [0xAD] = 0xCA, [0xAE] = 0xAF, [0xAF] = 0xBC,
-  [0xB0] = 0x90, [0xB1] = 0x8F, [0xB2] = 0xEA, [0xB3] = 0xFA,
-  [0xB4] = 0xBE, [0xB5] = 0xA0, [0xB6] = 0xB6, [0xB7] = 0xB3,
-  [0xB8] = 0x9D, [0xB9] = 0xDA, [0xBA] = 0x9B, [0xBB] = 0x8B,
-  [0xBC] = 0xB7, [0xBD] = 0xB8, [0xBE] = 0xB9, [0xBF] = 0xAB,
-  [0xC0] = 0x64, [0xC1] = 0x65, [0xC2] = 0x62, [0xC3] = 0x66,
-  [0xC4] = 0x63, [0xC5] = 0x67, [0xC6] = 0x9E, [0xC7] = 0x68,
-  [0xC8] = 0x74, [0xC9] = 0x71, [0xCA] = 0x72, [0xCB] = 0x73,
-  [0xCC] = 0x78, [0xCD] = 0x75, [0xCE] = 0x76, [0xCF] = 0x77,
-  [0xD0] = 0xAC, [0xD1] = 0x69, [0xD2] = 0xED, [0xD3] = 0xEE,
-  [0xD4] = 0xEB, [0xD5] = 0xEF, [0xD6] = 0xEC, [0xD7] = 0xBF,
-  [0xD8] = 0x80, [0xD9] = 0xFD, [0xDA] = 0xFE, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xAD, [0xDE] = 0xAE, [0xDF] = 0x59,
-  [0xE0] = 0x44, [0xE1] = 0x45, [0xE2] = 0x42, [0xE3] = 0x46,
-  [0xE4] = 0x43, [0xE5] = 0x47, [0xE6] = 0x9C, [0xE7] = 0x48,
-  [0xE8] = 0x54, [0xE9] = 0x51, [0xEA] = 0x52, [0xEB] = 0x53,
-  [0xEC] = 0x58, [0xED] = 0x55, [0xEE] = 0x56, [0xEF] = 0x57,
-  [0xF0] = 0x8C, [0xF1] = 0x49, [0xF2] = 0xCD, [0xF3] = 0xCE,
-  [0xF4] = 0xCB, [0xF5] = 0xCF, [0xF6] = 0xCC, [0xF7] = 0xE1,
-  [0xF8] = 0x70, [0xF9] = 0xDD, [0xFA] = 0xDE, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0x8D, [0xFE] = 0x8E, [0xFF] = 0xDF
-};
-
-// conversion table from IBM037 to ISO-8859-1
-static const unsigned char table_cp037_iso8859_1[256]
-__attribute__ ((aligned (8))) =
-{
-  [0x00] = 0x00, [0x01] = 0x01, [0x02] = 0x02, [0x03] = 0x03,
-  [0x04] = 0x9C, [0x05] = 0x09, [0x06] = 0x86, [0x07] = 0x7F,
-  [0x08] = 0x97, [0x09] = 0x8D, [0x0A] = 0x8E, [0x0B] = 0x0B,
-  [0x0C] = 0x0C, [0x0D] = 0x0D, [0x0E] = 0x0E, [0x0F] = 0x0F,
-  [0x10] = 0x10, [0x11] = 0x11, [0x12] = 0x12, [0x13] = 0x13,
-  [0x14] = 0x9D, [0x15] = 0x85, [0x16] = 0x08, [0x17] = 0x87,
-  [0x18] = 0x18, [0x19] = 0x19, [0x1A] = 0x92, [0x1B] = 0x8F,
-  [0x1C] = 0x1C, [0x1D] = 0x1D, [0x1E] = 0x1E, [0x1F] = 0x1F,
-  [0x20] = 0x80, [0x21] = 0x81, [0x22] = 0x82, [0x23] = 0x83,
-  [0x24] = 0x84, [0x25] = 0x0A, [0x26] = 0x17, [0x27] = 0x1B,
-  [0x28] = 0x88, [0x29] = 0x89, [0x2A] = 0x8A, [0x2B] = 0x8B,
-  [0x2C] = 0x8C, [0x2D] = 0x05, [0x2E] = 0x06, [0x2F] = 0x07,
-  [0x30] = 0x90, [0x31] = 0x91, [0x32] = 0x16, [0x33] = 0x93,
-  [0x34] = 0x94, [0x35] = 0x95, [0x36] = 0x96, [0x37] = 0x04,
-  [0x38] = 0x98, [0x39] = 0x99, [0x3A] = 0x9A, [0x3B] = 0x9B,
-  [0x3C] = 0x14, [0x3D] = 0x15, [0x3E] = 0x9E, [0x3F] = 0x1A,
-  [0x40] = 0x20, [0x41] = 0xA0, [0x42] = 0xE2, [0x43] = 0xE4,
-  [0x44] = 0xE0, [0x45] = 0xE1, [0x46] = 0xE3, [0x47] = 0xE5,
-  [0x48] = 0xE7, [0x49] = 0xF1, [0x4A] = 0xA2, [0x4B] = 0x2E,
-  [0x4C] = 0x3C, [0x4D] = 0x28, [0x4E] = 0x2B, [0x4F] = 0x7C,
-  [0x50] = 0x26, [0x51] = 0xE9, [0x52] = 0xEA, [0x53] = 0xEB,
-  [0x54] = 0xE8, [0x55] = 0xED, [0x56] = 0xEE, [0x57] = 0xEF,
-  [0x58] = 0xEC, [0x59] = 0xDF, [0x5A] = 0x21, [0x5B] = 0x24,
-  [0x5C] = 0x2A, [0x5D] = 0x29, [0x5E] = 0x3B, [0x5F] = 0xAC,
-  [0x60] = 0x2D, [0x61] = 0x2F, [0x62] = 0xC2, [0x63] = 0xC4,
-  [0x64] = 0xC0, [0x65] = 0xC1, [0x66] = 0xC3, [0x67] = 0xC5,
-  [0x68] = 0xC7, [0x69] = 0xD1, [0x6A] = 0xA6, [0x6B] = 0x2C,
-  [0x6C] = 0x25, [0x6D] = 0x5F, [0x6E] = 0x3E, [0x6F] = 0x3F,
-  [0x70] = 0xF8, [0x71] = 0xC9, [0x72] = 0xCA, [0x73] = 0xCB,
-  [0x74] = 0xC8, [0x75] = 0xCD, [0x76] = 0xCE, [0x77] = 0xCF,
-  [0x78] = 0xCC, [0x79] = 0x60, [0x7A] = 0x3A, [0x7B] = 0x23,
-  [0x7C] = 0x40, [0x7D] = 0x27, [0x7E] = 0x3D, [0x7F] = 0x22,
-  [0x80] = 0xD8, [0x81] = 0x61, [0x82] = 0x62, [0x83] = 0x63,
-  [0x84] = 0x64, [0x85] = 0x65, [0x86] = 0x66, [0x87] = 0x67,
-  [0x88] = 0x68, [0x89] = 0x69, [0x8A] = 0xAB, [0x8B] = 0xBB,
-  [0x8C] = 0xF0, [0x8D] = 0xFD, [0x8E] = 0xFE, [0x8F] = 0xB1,
-  [0x90] = 0xB0, [0x91] = 0x6A, [0x92] = 0x6B, [0x93] = 0x6C,
-  [0x94] = 0x6D, [0x95] = 0x6E, [0x96] = 0x6F, [0x97] = 0x70,
-  [0x98] = 0x71, [0x99] = 0x72, [0x9A] = 0xAA, [0x9B] = 0xBA,
-  [0x9C] = 0xE6, [0x9D] = 0xB8, [0x9E] = 0xC6, [0x9F] = 0xA4,
-  [0xA0] = 0xB5, [0xA1] = 0x7E, [0xA2] = 0x73, [0xA3] = 0x74,
-  [0xA4] = 0x75, [0xA5] = 0x76, [0xA6] = 0x77, [0xA7] = 0x78,
-  [0xA8] = 0x79, [0xA9] = 0x7A, [0xAA] = 0xA1, [0xAB] = 0xBF,
-  [0xAC] = 0xD0, [0xAD] = 0xDD, [0xAE] = 0xDE, [0xAF] = 0xAE,
-  [0xB0] = 0x5E, [0xB1] = 0xA3, [0xB2] = 0xA5, [0xB3] = 0xB7,
-  [0xB4] = 0xA9, [0xB5] = 0xA7, [0xB6] = 0xB6, [0xB7] = 0xBC,
-  [0xB8] = 0xBD, [0xB9] = 0xBE, [0xBA] = 0x5B, [0xBB] = 0x5D,
-  [0xBC] = 0xAF, [0xBD] = 0xA8, [0xBE] = 0xB4, [0xBF] = 0xD7,
-  [0xC0] = 0x7B, [0xC1] = 0x41, [0xC2] = 0x42, [0xC3] = 0x43,
-  [0xC4] = 0x44, [0xC5] = 0x45, [0xC6] = 0x46, [0xC7] = 0x47,
-  [0xC8] = 0x48, [0xC9] = 0x49, [0xCA] = 0xAD, [0xCB] = 0xF4,
-  [0xCC] = 0xF6, [0xCD] = 0xF2, [0xCE] = 0xF3, [0xCF] = 0xF5,
-  [0xD0] = 0x7D, [0xD1] = 0x4A, [0xD2] = 0x4B, [0xD3] = 0x4C,
-  [0xD4] = 0x4D, [0xD5] = 0x4E, [0xD6] = 0x4F, [0xD7] = 0x50,
-  [0xD8] = 0x51, [0xD9] = 0x52, [0xDA] = 0xB9, [0xDB] = 0xFB,
-  [0xDC] = 0xFC, [0xDD] = 0xF9, [0xDE] = 0xFA, [0xDF] = 0xFF,
-  [0xE0] = 0x5C, [0xE1] = 0xF7, [0xE2] = 0x53, [0xE3] = 0x54,
-  [0xE4] = 0x55, [0xE5] = 0x56, [0xE6] = 0x57, [0xE7] = 0x58,
-  [0xE8] = 0x59, [0xE9] = 0x5A, [0xEA] = 0xB2, [0xEB] = 0xD4,
-  [0xEC] = 0xD6, [0xED] = 0xD2, [0xEE] = 0xD3, [0xEF] = 0xD5,
-  [0xF0] = 0x30, [0xF1] = 0x31, [0xF2] = 0x32, [0xF3] = 0x33,
-  [0xF4] = 0x34, [0xF5] = 0x35, [0xF6] = 0x36, [0xF7] = 0x37,
-  [0xF8] = 0x38, [0xF9] = 0x39, [0xFA] = 0xB3, [0xFB] = 0xDB,
-  [0xFC] = 0xDC, [0xFD] = 0xD9, [0xFE] = 0xDA, [0xFF] = 0x9F
-};
-
-/* Definitions used in the body of the `gconv' function.  */
-#define CHARSET_NAME		"ISO-8859-1//"
-#define FROM_LOOP		iso8859_1_to_cp037_z900
-#define TO_LOOP			cp037_to_iso8859_1_z900
-#define DEFINE_INIT		1
-#define DEFINE_FINI		1
-#define MIN_NEEDED_FROM		1
-#define MIN_NEEDED_TO		1
-
-#define TR_LOOP(TABLE)							\
-  {									\
-    size_t length = (inend - inptr < outend - outptr			\
-		     ? inend - inptr : outend - outptr);		\
-									\
-    /* Process in 256 byte blocks.  */					\
-    if (__builtin_expect (length >= 256, 0))				\
-      {									\
-	size_t blocks = length / 256;					\
-	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
-			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
-			     "   la %[R_IN],256(%[R_IN])\n\t"		\
-			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
-			     "   brctg %[R_LI],0b\n\t"			\
-			     : /* outputs */ [R_IN] "+a" (inptr)	\
-			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
-			     : /* inputs */ [R_TBL] "a" (TABLE)		\
-			     : /* clobber list */ "memory"		\
-			     );						\
-	length = length % 256;						\
-      }									\
-									\
-    /* Process remaining 0...248 bytes in 8-byte blocks.  */		\
-    if (length >= 8)							\
-      {									\
-	size_t blocks = length / 8;					\
-	for (int i = 0; i < blocks; i++)				\
-	  {								\
-	    outptr[0] = TABLE[inptr[0]];				\
-	    outptr[1] = TABLE[inptr[1]];				\
-	    outptr[2] = TABLE[inptr[2]];				\
-	    outptr[3] = TABLE[inptr[3]];				\
-	    outptr[4] = TABLE[inptr[4]];				\
-	    outptr[5] = TABLE[inptr[5]];				\
-	    outptr[6] = TABLE[inptr[6]];				\
-	    outptr[7] = TABLE[inptr[7]];				\
-	    inptr += 8;							\
-	    outptr += 8;						\
-	  }								\
-	length = length % 8;						\
-      }									\
-									\
-    /* Process remaining 0...7 bytes.  */				\
-    switch (length)							\
-      {									\
-      case 7: outptr[6] = TABLE[inptr[6]];				\
-      case 6: outptr[5] = TABLE[inptr[5]];				\
-      case 5: outptr[4] = TABLE[inptr[4]];				\
-      case 4: outptr[3] = TABLE[inptr[3]];				\
-      case 3: outptr[2] = TABLE[inptr[2]];				\
-      case 2: outptr[1] = TABLE[inptr[1]];				\
-      case 1: outptr[0] = TABLE[inptr[0]];				\
-      case 0: break;							\
-      }									\
-    inptr += length;							\
-    outptr += length;							\
-  }
-
-
-/* First define the conversion function from ISO 8859-1 to CP037.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
-
-#include <iconv/loop.c>
-
-
-/* Next, define the conversion function from CP037 to ISO 8859-1.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
-#define BODY			TR_LOOP (table_cp037_iso8859_1);
-
-#include <iconv/loop.c>
-
-
-/* Now define the toplevel functions.  */
-#include <iconv/skeleton.c>
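
Note for readers of the removed 8-bit module above: apart from the
mvc/tr fast path for 256-byte blocks, TR_LOOP is a plain table lookup
per byte, limited by whichever of the input or output buffer is shorter.
A minimal portable sketch of that scalar behaviour follows.  The helper
name translate_bytes and its signature are made up for illustration and
are not part of glibc.

```c
#include <stddef.h>

/* Hypothetical helper, not glibc code: translate as many bytes as fit
   into both buffers through a 256-entry table and return how many
   bytes were converted.  This mirrors only the scalar part of TR_LOOP;
   the unrolling and the s390 "tr" instruction are omitted.  */
static size_t
translate_bytes (const unsigned char *table,
		 const unsigned char *in, size_t inlen,
		 unsigned char *out, size_t outlen)
{
  /* Same bound as in TR_LOOP: the shorter of the two buffers.  */
  size_t length = inlen < outlen ? inlen : outlen;

  for (size_t i = 0; i < length; i++)
    out[i] = table[in[i]];

  return length;
}
```

With the IBM037 table above, input bytes 0xC1 0x40 0xC2 come out as
0x41 0x20 0x42 ("A B" in ISO-8859-1).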
diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
deleted file mode 100644
index 61d0a94..0000000
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ /dev/null
@@ -1,624 +0,0 @@
-/* Conversion between UTF-16 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM_UTF32               0x0000feffu
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16               0xfeff
-
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		2
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf16_loop
-#define TO_LOOP			__to_utf16_loop
-#define FROM_DIRECTION		(dir == from_utf16)
-#define ONE_DIRECTION           0
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf16,
-  from_utf16
-};
-
-struct utf16_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf16_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf16;
-    }
-  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
-	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf16;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf16)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-	  /* Emit the UTF-16 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 2 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-	  /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			\
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
-
-/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_FROM_C							\
-  {									\
-    uint16_t u1 = get16 (inptr);					\
-									\
-    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
-      {									\
-	/* No surrogate.  */						\
-	put32 (outptr, u1);						\
-	inptr += 2;							\
-      }									\
-    else								\
-      {									\
-	/* An isolated low-surrogate was found.  This has to be         \
-	   considered ill-formed.  */					\
-	if (__glibc_unlikely (u1 >= 0xdc00))				\
-	  {								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	/* It's a surrogate character.  At least the first word says	\
-	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    /* We don't have enough input for another complete input	\
-	       character.  */						\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	uint16_t u2 = get16 (inptr);					\
-	if (__builtin_expect (u2 < 0xdc00, 0)				\
-	    || __builtin_expect (u2 > 0xdfff, 0))			\
-	  {								\
-	    /* This is no valid second word for a surrogate.  */	\
-	    inptr -= 2;							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
-	  }								\
-									\
-	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
-	inptr += 2;							\
-      }									\
-    outptr += 4;							\
-  }
-
-#define BODY_FROM_VX							\
-  {									\
-    size_t inlen = inend - inptr;					\
-    size_t outlen = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "    larl %[R_TMP],9f\n\t"				\
-		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
-		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
-		  /* Check for surrogate chars.  */			\
-		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t"					\
-		  /* Enlarge to UTF-32.  */				\
-		  "    vuplhh %%v17,%%v16\n\t"				\
-		  "    la %[R_IN],16(%[R_IN])\n\t"			\
-		  "    vupllh %%v18,%%v16\n\t"				\
-		  "    aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
-		  "    aghi %[R_OUTLEN],-32\n\t"			\
-		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],16,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
-		  "9:  .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
-		  "    .short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
-		  /* At least one uint16_t is in range of surrogates.	\
-		     Store the preceding chars.  */			\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "    vuplhh %%v17,%%v16\n\t"				\
-		  "    sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 12f\n\t"					\
-		  "    vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "    vupllh %%v18,%%v16\n\t"				\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11: \n\t" /* Update pointers.  */			\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
-		  "    srl %[R_TMP2],1\n\t"				\
-		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    aghi %[R_OUTLEN],-4\n\t"				\
-		  "    j 16f\n\t"					\
-		  /* Handle remaining bytes.  */			\
-		  "2:  \n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "    clgfi %[R_INLEN],1\n\t"				\
-		  "    je 97f\n\t" /* Only one byte available.  */	\
-		  "    jl 99f\n\t" /* End if no bytes available.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle remaining uint16_t values.  */		\
-		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    slgfi %[R_OUTLEN],4\n\t"				\
-		  "    jl 96f \n\t"					\
-		  "    clfi %[R_TMP],0xd800\n\t"			\
-		  "    jhe 15f\n\t"					\
-		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-2\n\t"				\
-		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],13b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Handle UTF-16 surrogate pair.  */			\
-		  "15: clfi %[R_TMP],0xdfff\n\t"			\
-		  "    jh 14b\n\t" /* Jump away if ch > 0xdfff.  */	\
-		  "16: clfi %[R_TMP],0xdc00\n\t"			\
-		  "    jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
-		  "    slgfi %[R_INLEN],4\n\t"				\
-		  "    jl 97f\n\t" /* Big enough input?  */		\
-		  "    llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "    slfi %[R_TMP],0xd7c0\n\t"			\
-		  "    sll %[R_TMP],10\n\t"				\
-		  "    risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
-		  "    nilf %[R_TMP3],0xfc00\n\t"			\
-		  "    clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "    jne 98f\n\t"					\
-		  "    st %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],4(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
-		  "    aghi %[R_TMP2],-2\n\t"				\
-		  "    jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  "96: \n\t" /* Return full output.  */			\
-		  "    lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
-		  "    j 99f\n\t"					\
-		  "97: \n\t" /* Return incomplete input.  */		\
-		  "    lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
-		  "    j 99f\n\t"					\
-		  "98:\n\t" /* Return Illegal character.  */		\
-		  "    lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
-		  "99:\n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
-  }
-
-
-/* Generate loop-function with software routine.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__from_utf16_loop_c
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf16_loop_c)
-__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
-__from_utf16_loop;
-
-static void *
-__from_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf16_loop_vx;
-  else
-    return __from_utf16_loop_c;
-}
-
-strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
-#else
-# define LOOPFCT		FROM_LOOP
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_C
-# include <iconv/loop.c>
-#endif
-
-/* Conversion from UTF-32 internal/BE to UTF-16.  */
-
-/* The software routine is copied from utf-16.c (minus bytes
-   swapping).  */
-#define BODY_TO_C							\
-  {									\
-    uint32_t c = get32 (inptr);						\
-									\
-    if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
-      {									\
-	/* One UTF-16 code unit (two bytes).  */			\
-	put16 (outptr, c);						\
-      }									\
-    else if (__builtin_expect (c >= 0x10000, 1)				\
-	     && __builtin_expect (c <= 0x10ffff, 1))			\
-      {									\
-	/* Two UTF-16 code units (four bytes).  */			\
-	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
-	uint16_t out;							\
-									\
-	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	out = 0xd800;							\
-	out |= (zabcd & 0xff) << 6;					\
-	out |= (c >> 10) & 0x3f;					\
-	put16 (outptr, out);						\
-	outptr += 2;							\
-									\
-	out = 0xdc00;							\
-	out |= c & 0x3ff;						\
-	put16 (outptr, out);						\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
-      }									\
-    outptr += 2;							\
-    inptr += 4;								\
-  }
-
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for surrogates.  */			\
-		  "    larl %[R_TMP],9f\n\t"				\
-		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars			\
-		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP2],0\n\t"				\
-		  /* Shorten to UTF-16.  */				\
-		  "    vpkf %%v18,%%v16,%%v17\n\t"			\
-		  /* Check for surrogate chars.  */			\
-		  "    vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t"					\
-		  "    vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"		\
-		  "    jno 11f\n\t"					\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "    vst %%v18,0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],32(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-32\n\t"				\
-		  "    aghi %[R_OUTLEN],-16\n\t"			\
-		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
-		     and check for ch >= 0x10000. (v30, v31)  */	\
-		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
-		  "    .long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
-		  /* At least one UTF32 char is in range of surrogates.	\
-		     Store the preceding characters.  */		\
-		  "11: ahi %[R_TMP2],16\n\t"				\
-		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
-		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 20f\n\t"					\
-		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* Generate loop-function with software routine.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf16_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_TO_VX
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf16_loop_c)
-__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
-__to_utf16_loop;
-
-static void *
-__to_utf16_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf16_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
-    return __to_utf16_loop_c;
-}
-
-strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
-
-
-#include <iconv/skeleton.c>
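
Note for readers of the removed UTF-16/UTF-32 module above: BODY_FROM_C
and the scalar tail of BODY_FROM_VX combine a surrogate pair with
((u1 - 0xd7c0) << 10) + (u2 - 0xdc00).  Subtracting 0xd7c0 rather than
0xd800 folds the 0x10000 offset into the high surrogate.  The sketch
below isolates that arithmetic.  The function name decode_surrogate_pair
is made up for illustration and does not exist in glibc.

```c
#include <stdint.h>

/* Hypothetical helper, not glibc code: combine a UTF-16 surrogate pair
   into a UTF-32 code point.  Returns 1 on success and 0 if U1 is not a
   high surrogate or U2 is not a low surrogate, which is the case the
   module reports as __GCONV_ILLEGAL_INPUT.  */
static int
decode_surrogate_pair (uint16_t u1, uint16_t u2, uint32_t *cp)
{
  if (u1 < 0xd800 || u1 > 0xdbff || u2 < 0xdc00 || u2 > 0xdfff)
    return 0;

  /* 0xd7c0 == 0xd800 - (0x10000 >> 10), so the shift also adds the
     0x10000 bias of the supplementary planes.  */
  *cp = ((uint32_t) (u1 - 0xd7c0) << 10) + (uint32_t) (u2 - 0xdc00);
  return 1;
}
```

For example, the pair 0xd83d 0xde00 yields U+1F600, and the boundary
pairs 0xd800 0xdc00 and 0xdbff 0xdfff yield U+10000 and U+10FFFF.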
diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
deleted file mode 100644
index 7520ef2..0000000
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ /dev/null
@@ -1,806 +0,0 @@
-/* Conversion between UTF-8 and UTF-16.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		4
-#define MIN_NEEDED_TO		2
-#define MAX_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
-	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-16.  */
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
-		  "    vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
-		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
-				       UTF8 chars.  */			\
-		  /* Enlarge to UTF-16.  */				\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    la %[R_IN],16(%[R_IN])\n\t"			\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    aghi %[R_INLEN],-16\n\t"				\
-		  /* Store 32 bytes to buf_out.  */			\
-		  "    vstm %%v18,%%v19,0(%[R_OUT])\n\t"		\
-		  "    aghi %[R_OUTLEN],-32\n\t"			\
-		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  "10:\n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
-		  "    sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest \
-							 index to store. */ \
-		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "    ahi %[R_TMP2],-1\n\t"				\
-		  "    jl 20f\n\t"					\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "11: \n\t" /* Update pointers.  */			\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-
-/* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint16_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0	\
-	       or 0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	if (cnt == 4)							\
-	  {								\
-	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
-	       low) are needed.  */					\
-	    uint16_t zabcd, high, low;					\
-									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			\
-	      {								\
-		/* Overflow in the output buffer.  */			\
-		result = __GCONV_FULL_OUTPUT;				\
-		break;							\
-	      }								\
-									\
-	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		if ((inptr[i] & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  goto errout;						\
-	      }								\
-									\
-	    /* See Principles of Operations cu12.  */			\
-	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-		     ((inptr[1] & 0x30) >> 4)) - 1;			\
-									\
-	    /* z-bit must be zero after subtracting 1.  */		\
-	    if (zabcd & 0x10)						\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
-									\
-	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;                         /* abcd bits */	\
-	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
-	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
-									\
-	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
-	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
-	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
-	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
-									\
-	    put16 (outptr, high);					\
-	    outptr += 2;						\
-	    put16 (outptr, low);					\
-	    outptr += 2;						\
-	    inptr += 4;							\
-	    continue;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Read the possible remaining bytes.  */			\
-	    for (i = 1; i < cnt; ++i)					\
-	      {								\
-		uint16_t byte = inptr[i];				\
-									\
-		if ((byte & 0xc0) != 0x80)				\
-		  /* This is an illegal encoding.  */			\
-		  break;						\
-									\
-		ch <<= 6;						\
-		ch |= byte & 0x3f;					\
-	      }								\
-									\
-	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
-	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
-	       have been represented with fewer than cnt bytes.  */	\
-	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
-		/* Do not accept UTF-16 surrogates.  */			\
-		|| (ch >= 0xd800 && ch <= 0xdfff))			\
-	      {								\
-		/* This is an illegal encoding.  */			\
-		goto errout;						\
-	      }								\
-									\
-	    inptr += cnt;						\
-	  }								\
-      }									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint16_t *) outptr) = ch;					\
-    outptr += sizeof (uint16_t);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-# define LOOP_NEED_FLAGS
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-/* Conversion from UTF-16 to UTF-8.  */
-
-/* The software routine is based on the functionality of the S/390
-   hardware instruction (cu21) as described in the Principles of
-   Operation.  */
-#define BODY_TO_C							\
-  {									\
-    uint16_t c = get16 (inptr);						\
-									\
-    if (__glibc_likely (c <= 0x007f))					\
-      {									\
-	/* Single byte UTF-8 char.  */					\
-	*outptr = c & 0xff;						\
-	outptr++;							\
-      }									\
-    else if (c >= 0x0080 && c <= 0x07ff)				\
-      {									\
-	/* Two byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 2 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-									\
-	outptr[0] = 0xc0;						\
-	outptr[0] |= c >> 6;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= c & 0x3f;						\
-									\
-	outptr += 2;							\
-      }									\
-    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
-      {									\
-	/* Three byte UTF-8 char.  */					\
-									\
-	if (__glibc_unlikely (outptr + 3 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	outptr[0] = 0xe0;						\
-	outptr[0] |= c >> 12;						\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (c >> 6) & 0x3f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= c & 0x3f;						\
-									\
-	outptr += 3;							\
-      }									\
-    else if (c >= 0xd800 && c <= 0xdbff)				\
-      {									\
-	/* Four byte UTF-8 char.  */					\
-	uint16_t low, uvwxy;						\
-									\
-	if (__glibc_unlikely (outptr + 4 > outend))			\
-	  {								\
-	    /* Overflow in the output buffer.  */			\
-	    result = __GCONV_FULL_OUTPUT;				\
-	    break;							\
-	  }								\
-	if (__glibc_unlikely (inptr + 4 > inend))			\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-									\
-	inptr += 2;							\
-	low = get16 (inptr);						\
-									\
-	if ((low & 0xfc00) != 0xdc00)					\
-	  {								\
-	    inptr -= 2;							\
-	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-	  }								\
-	uvwxy = ((c >> 6) & 0xf) + 1;					\
-	outptr[0] = 0xf0;						\
-	outptr[0] |= uvwxy >> 2;					\
-									\
-	outptr[1] = 0x80;						\
-	outptr[1] |= (uvwxy << 4) & 0x30;				\
-	outptr[1] |= (c >> 2) & 0x0f;					\
-									\
-	outptr[2] = 0x80;						\
-	outptr[2] |= (c & 0x03) << 4;					\
-	outptr[2] |= (low >> 6) & 0x0f;					\
-									\
-	outptr[3] = 0x80;						\
-	outptr[3] |= low & 0x3f;					\
-									\
-	outptr += 4;							\
-      }									\
-    else								\
-      {									\
-	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
-      }									\
-    inptr += 2;								\
-  }
-
-#define BODY_TO_VX							\
-  {									\
-    size_t inlen  = inend - inptr;					\
-    size_t outlen  = outend - outptr;					\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  /* Setup to check for values <= 0x7f.  */		\
-		  "    larl %[R_TMP],9f\n\t"				\
-		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
-		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
-		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP2],0\n\t"				\
-		  /* Check for > 1byte UTF-8 chars.  */			\
-		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
-				       UTF8 chars.  */			\
-		  "    vstrchs %%v19,%%v17,%%v30,%%v31\n\t"		\
-		  "    jno 11f\n\t" /* Jump away if not all bytes are 1byte \
-				       UTF8 chars.  */			\
-		  /* Shorten to UTF-8.  */				\
-		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
-		  "    la %[R_IN],32(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-32\n\t"				\
-		  /* Store 16 bytes to buf_out.  */			\
-		  "    vst %%v18,0(%[R_OUT])\n\t"			\
-		  "    aghi %[R_OUTLEN],-16\n\t"			\
-		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],32,2f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
-		  "9:  .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
-		  "    .short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
-		  "10:\n\t"						\
-		  "    vlgvb %[R_TMP],%%v19,7\n\t"			\
-		  /* Shorten to UTF-8.  */				\
-		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
-		  "    ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
-		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
-		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 13f\n\t"					\
-		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  /* Update pointers.  */				\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  "13: \n\t"						\
-		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
-		  "    lghi %[R_TMP2],16\n\t"				\
-		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
-		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-2\n\t"				\
-		  "    j 22f\n\t"					\
-		  /* Handle remaining bytes.  */			\
-		  "2:  \n\t"						\
-		  /* Zero, one or more bytes available?  */		\
-		  "    clgfi %[R_INLEN],1\n\t"				\
-		  "    locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
-		  "    jle 99f\n\t" /* End if less than two bytes.  */	\
-		  /* Calculate remaining uint16_t values in inptr.  */	\
-		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
-		  /* Handle multibyte utf8-char. */			\
-		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
-		  "    aghi %[R_INLEN],-2\n\t"				\
-		  /* Test if ch is 1-byte UTF-8 char.  */		\
-		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
-		  /* Handle 1-byte UTF-8 char.  */			\
-		  "31: slgfi %[R_OUTLEN],1\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],20b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Test if ch is 2-byte UTF-8 char.  */		\
-		  "22: clfi %[R_TMP],0x7ff\n\t"				\
-		  "    jh 23f\n\t"					\
-		  /* Handle 2-byte UTF-8 char.  */			\
-		  "32: slgfi %[R_OUTLEN],2\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    llill %[R_TMP3],0xc080\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
-		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],20b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Test if ch is 3-byte UTF-8 char.  */		\
-		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
-		  "    jh 24f\n\t"					\
-		  /* Handle 3-byte UTF-8 char.  */			\
-		  "33: slgfi %[R_OUTLEN],3\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    llilf %[R_TMP3],0xe08080\n\t"			\
-		  "    la %[R_IN],2(%[R_IN])\n\t"			\
-		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
-		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
-		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
-		  "    brctg %[R_TMP2],20b\n\t"				\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Test if ch is 4-byte UTF-8 char.  */		\
-		  "24: clfi %[R_TMP],0xdfff\n\t"			\
-		  "    jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */ \
-		  "    clfi %[R_TMP],0xdbff\n\t"			\
-		  "    locghih %[R_RES],%[RES_IN_ILL]\n\t"		\
-		  "    jh 99f\n\t" /* Jump away if this is a low surrogate \
-				      without a preceding high surrogate.  */ \
-		  /* Handle 4-byte UTF-8 char.  */			\
-		  "34: slgfi %[R_OUTLEN],4\n\t"				\
-		  "    jl 90f \n\t"					\
-		  "    slgfi %[R_INLEN],2\n\t"				\
-		  "    locghil %[R_RES],%[RES_IN_FULL]\n\t"		\
-		  "    jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
-		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
-		  "    aghi %[R_TMP],0x40\n\t"				\
-		  "    risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
-		  "    llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
-		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
-		  "    nilf %[R_TMP],0xfc00\n\t"			\
-		  "    clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
-		  "    locghine %[R_RES],%[RES_IN_ILL]\n\t"		\
-		  "    jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
-		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
-		  "    la %[R_IN],4(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
-		  "    aghi %[R_TMP2],-2\n\t"				\
-		  "    jh 20b\n\t"					\
-		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
-		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
-		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
-		  "99: \n\t"						\
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (inptr)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
-		  );							\
-    if (__glibc_likely (inptr == inend)					\
-	|| result != __GCONV_ILLEGAL_INPUT)				\
-      break;								\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
-  }
-
-/* Generate loop-function with software implementation.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#if defined HAVE_S390_VX_ASM_SUPPORT
-# define LOOPFCT		__to_utf8_loop_c
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate loop-function with software implementation.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY                   BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-#else
-# define LOOPFCT		TO_LOOP
-# define BODY                   BODY_TO_C
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
deleted file mode 100644
index f9c9199..0000000
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ /dev/null
@@ -1,807 +0,0 @@
-/* Conversion between UTF-8 and UTF-32 BE/internal.
-
-   This module uses the Z9-109 variants of the Convert Unicode
-   instructions.
-   Copyright (C) 1997-2016 Free Software Foundation, Inc.
-
-   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
-   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
-
-   Thanks to Daniel Appich who covered the relevant performance work
-   in his diploma thesis.
-
-   This is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   This is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <dlfcn.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <dl-procinfo.h>
-#include <gconv.h>
-
-#if defined HAVE_S390_VX_GCC_SUPPORT
-# define ASM_CLOBBER_VR(NR) , NR
-#else
-# define ASM_CLOBBER_VR(NR)
-#endif
-
-/* Defines for skeleton.c.  */
-#define DEFINE_INIT		0
-#define DEFINE_FINI		0
-#define MIN_NEEDED_FROM		1
-#define MAX_NEEDED_FROM		6
-#define MIN_NEEDED_TO		4
-#define FROM_LOOP		__from_utf8_loop
-#define TO_LOOP			__to_utf8_loop
-#define FROM_DIRECTION		(dir == from_utf8)
-#define ONE_DIRECTION           0
-
-/* UTF-32 big endian byte order mark.  */
-#define BOM			0x0000feffu
-
-/* Direction of the transformation.  */
-enum direction
-{
-  illegal_dir,
-  to_utf8,
-  from_utf8
-};
-
-struct utf8_data
-{
-  enum direction dir;
-  int emit_bom;
-};
-
-
-extern int gconv_init (struct __gconv_step *step);
-int
-gconv_init (struct __gconv_step *step)
-{
-  /* Determine which direction.  */
-  struct utf8_data *new_data;
-  enum direction dir = illegal_dir;
-  int emit_bom;
-  int result;
-
-  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
-
-  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
-      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
-	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
-	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
-    {
-      dir = from_utf8;
-    }
-  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
-	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
-	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
-    {
-      dir = to_utf8;
-    }
-
-  result = __GCONV_NOCONV;
-  if (dir != illegal_dir)
-    {
-      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
-
-      result = __GCONV_NOMEM;
-      if (new_data != NULL)
-	{
-	  new_data->dir = dir;
-	  new_data->emit_bom = emit_bom;
-	  step->__data = new_data;
-
-	  if (dir == from_utf8)
-	    {
-	      step->__min_needed_from = MIN_NEEDED_FROM;
-	      step->__max_needed_from = MIN_NEEDED_FROM;
-	      step->__min_needed_to = MIN_NEEDED_TO;
-	      step->__max_needed_to = MIN_NEEDED_TO;
-	    }
-	  else
-	    {
-	      step->__min_needed_from = MIN_NEEDED_TO;
-	      step->__max_needed_from = MIN_NEEDED_TO;
-	      step->__min_needed_to = MIN_NEEDED_FROM;
-	      step->__max_needed_to = MIN_NEEDED_FROM;
-	    }
-
-	  step->__stateful = 0;
-
-	  result = __GCONV_OK;
-	}
-    }
-
-  return result;
-}
-
-
-extern void gconv_end (struct __gconv_step *data);
-void
-gconv_end (struct __gconv_step *data)
-{
-  free (data->__data);
-}
-
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			\
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
-
-/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
-
-#define STORE_REST_COMMON						      \
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    assert (ch != 0xc0 && ch != 0xc1);					      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
-  }
-
-#define UNPACK_BYTES_COMMON \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
-
-#define CLEAR_STATE_COMMON \
-  state->__count = 0
-
-#define BODY_FROM_HW(ASM)						\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    int i;								\
-    for (i = 1; inptr + i < inend && i < 5; ++i)			\
-      if ((inptr[i] & 0xc0) != 0x80)					\
-	break;								\
-									\
-    if (__glibc_likely (inptr + i == inend				\
-			&& result == __GCONV_EMPTY_INPUT))		\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
-  }
-
-/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
-#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
-
-
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY_FROM_C							\
-  {									\
-    /* Next input byte.  */						\
-    uint32_t ch = *inptr;						\
-									\
-    if (__glibc_likely (ch < 0x80))					\
-      {									\
-	/* One byte sequence.  */					\
-	++inptr;							\
-      }									\
-    else								\
-      {									\
-	uint_fast32_t cnt;						\
-	uint_fast32_t i;						\
-									\
-	if (ch >= 0xc2 && ch < 0xe0)					\
-	  {								\
-	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
-	       0xc1, otherwise the wide character could have been	\
-	       represented using a single byte.  */			\
-	    cnt = 2;							\
-	    ch &= 0x1f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
-	  {								\
-	    /* We expect three bytes.  */				\
-	    cnt = 3;							\
-	    ch &= 0x0f;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
-	  {								\
-	    /* We expect four bytes.  */				\
-	    cnt = 4;							\
-	    ch &= 0x07;							\
-	  }								\
-	else								\
-	  {								\
-	    /* Search the end of this ill-formed UTF-8 character.  This	\
-	       is the next byte with (x & 0xc0) != 0x80.  */		\
-	    i = 0;							\
-	    do								\
-	      ++i;							\
-	    while (inptr + i < inend					\
-		   && (*(inptr + i) & 0xc0) == 0x80			\
-		   && i < 5);						\
-									\
-	  errout:							\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-									\
-	if (__glibc_unlikely (inptr + cnt > inend))			\
-	  {								\
-	    /* We don't have enough input.  But before we report	\
-	       that check that all the bytes are correct.  */		\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-									\
-	    if (__glibc_likely (inptr + i == inend))			\
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-									\
-	    goto errout;						\
-	  }								\
-									\
-	/* Read the possible remaining bytes.  */			\
-	for (i = 1; i < cnt; ++i)					\
-	  {								\
-	    uint32_t byte = inptr[i];					\
-									\
-	    if ((byte & 0xc0) != 0x80)					\
-	      /* This is an illegal encoding.  */			\
-	      break;							\
-									\
-	    ch <<= 6;							\
-	    ch |= byte & 0x3f;						\
-	  }								\
-									\
-	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
-	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
-	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
-	    /* Do not accept UTF-16 surrogates.  */			\
-	    || (ch >= 0xd800 && ch <= 0xdfff)				\
-	    || (ch > 0x10ffff))						\
-	  {								\
-	    /* This is an illegal encoding.  */				\
-	    goto errout;						\
-	  }								\
-									\
-	inptr += cnt;							\
-      }									\
-									\
-    /* Now adjust the pointers and store the result.  */		\
-    *((uint32_t *) outptr) = ch;					\
-    outptr += sizeof (uint32_t);					\
-  }
-
-#define HW_FROM_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2, tmp3;					\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
-		  "    vrepib %%v31,0x20\n\t"				\
-		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
-		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "1: vl %%v16,0(%[R_IN])\n\t"				\
-		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
-		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
-				   UTF8 chars.  */			\
-		  /* Enlarge to UCS4.  */				\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    la %[R_IN],16(%[R_IN])\n\t"			\
-		  "    vuplhh %%v20,%%v18\n\t"				\
-		  "    aghi %[R_INLEN],-16\n\t"				\
-		  "    vupllh %%v21,%%v18\n\t"				\
-		  "    aghi %[R_OUTLEN],-64\n\t"			\
-		  "    vuplhh %%v22,%%v19\n\t"				\
-		  "    vupllh %%v23,%%v19\n\t"				\
-		  /* Store 64 bytes to buf_out.  */			\
-		  "    vstm %%v20,%%v23,0(%[R_OUT])\n\t"		\
-		  "    la %[R_OUT],64(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],16,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  "10: \n\t"						\
-		  /* At least one byte is > 0x7f.			\
-		     Store the preceding 1-byte chars.  */		\
-		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
-		  "    sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest \
-						     index to store. */ \
-		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
-		  "    ahi %[R_TMP2],-1\n\t"				\
-		  "    jl 20f\n\t"					\
-		  "    vuplhb %%v18,%%v16\n\t"				\
-		  "    vuplhh %%v20,%%v18\n\t"				\
-		  "    vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllh %%v21,%%v18\n\t"				\
-		  "    vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllb %%v19,%%v16\n\t"				\
-		  "    vuplhh %%v22,%%v19\n\t"				\
-		  "    vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
-		  "    ahi %[R_TMP2],-16\n\t"				\
-		  "    jl 11f\n\t"					\
-		  "    vupllh %%v23,%%v19\n\t"				\
-		  "    vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
-		  "11: \n\t"						\
-		  /* Update pointers.  */				\
-		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
-		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
-		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
-		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
-		    ASM_CLOBBER_VR ("v31")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
-
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_c
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_C
-#include <iconv/loop.c>
-
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			__from_utf8_loop_etf3eh
-
-#define LOOP_NEED_FLAGS
-
-#define STORE_REST		STORE_REST_COMMON
-#define UNPACK_BYTES		UNPACK_BYTES_COMMON
-#define CLEAR_STATE		CLEAR_STATE_COMMON
-#define BODY			BODY_FROM_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-# define LOOPFCT		__from_utf8_loop_vx
-
-# define LOOP_NEED_FLAGS
-
-# define STORE_REST		STORE_REST_COMMON
-# define UNPACK_BYTES		UNPACK_BYTES_COMMON
-# define CLEAR_STATE		CLEAR_STATE_COMMON
-# define BODY			BODY_FROM_VX
-# include <iconv/loop.c>
-#endif
-
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__from_utf8_loop_c)
-__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
-__from_utf8_loop;
-
-static void *
-__from_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __from_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __from_utf8_loop_etf3eh;
-  else
-    return __from_utf8_loop_c;
-}
-
-strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
-
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
-/* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY_TO_C						\
-  {								\
-    uint32_t wc = *((const uint32_t *) inptr);			\
-								\
-    if (__glibc_likely (wc <= 0x7f))				\
-      {								\
-	/* Single UTF-8 char.  */				\
-	*outptr = (uint8_t)wc;					\
-	outptr++;						\
-      }								\
-    else if (wc <= 0x7ff)					\
-      {								\
-	/* Two UTF-8 chars.  */					\
-	if (__glibc_unlikely (outptr + 2 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-								\
-	outptr[0] = 0xc0;					\
-	outptr[0] |= wc >> 6;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= wc & 0x3f;					\
-								\
-	outptr += 2;						\
-      }								\
-    else if (wc <= 0xffff)					\
-      {								\
-	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))		\
-	  {							\
-	    /* Overflow in the output buffer.  */		\
-	    result = __GCONV_FULL_OUTPUT;			\
-	    break;						\
-	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
-	  {							\
-	    /* Do not accept UTF-16 surrogates.   */		\
-	    result = __GCONV_ILLEGAL_INPUT;			\
-	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	  }							\
-	outptr[0] = 0xe0;					\
-	outptr[0] |= wc >> 12;					\
-								\
-	outptr[1] = 0x80;					\
-	outptr[1] |= (wc >> 6) & 0x3f;				\
-								\
-	outptr[2] = 0x80;					\
-	outptr[2] |= wc & 0x3f;					\
-								\
-	outptr += 3;						\
-      }								\
-      else if (wc <= 0x10ffff)					\
-	{							\
-	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))		\
-	    {							\
-	      /* Overflow in the output buffer.  */		\
-	      result = __GCONV_FULL_OUTPUT;			\
-	      break;						\
-	    }							\
-	  outptr[0] = 0xf0;					\
-	  outptr[0] |= wc >> 18;				\
-								\
-	  outptr[1] = 0x80;					\
-	  outptr[1] |= (wc >> 12) & 0x3f;			\
-								\
-	  outptr[2] = 0x80;					\
-	  outptr[2] |= (wc >> 6) & 0x3f;			\
-								\
-	  outptr[3] = 0x80;					\
-	  outptr[3] |= wc & 0x3f;				\
-								\
-	  outptr += 4;						\
-	}							\
-      else							\
-	{							\
-	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
-	}							\
-    inptr += 4;							\
-  }
-
-#define HW_TO_VX							\
-  {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
-    asm volatile (".machine push\n\t"					\
-		  ".machine \"z13\"\n\t"				\
-		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
-		  "    vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
-		  "    vzero %%v21\n\t"					\
-		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
-		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
-		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP],0\n\t"				\
-		  /* Shorten to byte values.  */			\
-		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
-		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
-		  "    vpkh %%v23,%%v23,%%v24\n\t"			\
-		  /* Checking for values > 0x7f.  */			\
-		  "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
-		  "    jno 10f\n\t"					\
-		  "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
-		  "    jno 11f\n\t"					\
-		  "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
-		  "    jno 12f\n\t"					\
-		  "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
-		  "    jno 13f\n\t"					\
-		  /* Store 16bytes to outptr.  */			\
-		  "    vst %%v23,0(%[R_OUT])\n\t"			\
-		  "    aghi %[R_INLEN],-64\n\t"				\
-		  "    aghi %[R_OUTLEN],-16\n\t"			\
-		  "    la %[R_IN],64(%[R_IN])\n\t"			\
-		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
-		  "    j 1b\n\t"					\
-		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "    srlg %[R_I],%[R_I],2\n\t"			\
-		  "    agr %[R_I],%[R_TMP]\n\t"				\
-		  "    je 20f\n\t"					\
-		  /* Store characters before invalid one...  */		\
-		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
-		  /* ... and update pointers.  */			\
-		  "    aghi %[R_I],1\n\t"				\
-		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
-		  "    sllg %[R_I],%[R_I],2\n\t"			\
-		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
-		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
-		    , [R_RES] "+d" (result)				\
-		  : /* inputs */					\
-		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
-		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
-		  : /* clobber list */ "memory", "cc"			\
-		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
-		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
-		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
-		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
-		    ASM_CLOBBER_VR ("v24")				\
-		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-  }
-
-/* Generate loop-function with software routing.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_c
-#define BODY			BODY_TO_C
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
-
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
-#if defined HAVE_S390_VX_ASM_SUPPORT
-/* Generate loop-function with hardware vector and utf-convert instructions.  */
-# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-# define LOOPFCT		__to_utf8_loop_vx
-# define BODY			BODY_TO_VX
-# define LOOP_NEED_FLAGS
-# include <iconv/loop.c>
-#endif
-
-/* Generate ifunc'ed loop function.  */
-__typeof(__to_utf8_loop_c)
-__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
-__to_utf8_loop;
-
-static void *
-__to_utf8_loop_resolver (unsigned long int dl_hwcap)
-{
-#if defined HAVE_S390_VX_ASM_SUPPORT
-  if (dl_hwcap & HWCAP_S390_VX)
-    return __to_utf8_loop_vx;
-  else
-#endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
-    return __to_utf8_loop_c;
-}
-
-strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
-
-
-#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
new file mode 100644
index 0000000..8d42ab8
--- /dev/null
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -0,0 +1,636 @@
+/* Conversion between UTF-16 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM_UTF32               0x0000feffu
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16               0xfeff
+
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		2
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
+#define FROM_DIRECTION		(dir == from_utf16)
+#define ONE_DIRECTION           0
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf16,
+  from_utf16
+};
+
+struct utf16_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf16_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	      || __strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf16;
+    }
+  else if ((__strcasecmp (step->__to_name, "UTF-16//") == 0
+	    || __strcasecmp (step->__to_name, "UTF-16BE//") == 0)
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf16;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf16_data *) malloc (sizeof (struct utf16_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf16)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
+/* Conversion function from UTF-16 to UTF-32 internal/BE.  */
+
+/* The software routine is copied from utf-16.c (minus byte
+   swapping).  */
+#define BODY_FROM_C							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
+      {									\
+	/* No surrogate.  */						\
+	put32 (outptr, u1);						\
+	inptr += 2;							\
+      }									\
+    else								\
+      {									\
+	/* An isolated low-surrogate was found.  This has to be         \
+	   considered ill-formed.  */					\
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
+	  {								\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	/* It's a surrogate character.  At least the first word says	\
+	   it is.  */							\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    /* We don't have enough input for another complete input	\
+	       character.  */						\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	uint16_t u2 = get16 (inptr);					\
+	if (__builtin_expect (u2 < 0xdc00, 0)				\
+	    || __builtin_expect (u2 > 0xdfff, 0))			\
+	  {								\
+	    /* This is no valid second word for a surrogate.  */	\
+	    inptr -= 2;							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
+	  }								\
+									\
+	put32 (outptr, ((u1 - 0xd7c0) << 10) + (u2 - 0xdc00));		\
+	inptr += 2;							\
+      }									\
+    outptr += 4;							\
+  }
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  /* Enlarge to UTF-32.  */				\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9:  .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  "    .short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 12f\n\t"					\
+		  "    vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    srl %[R_TMP2],1\n\t"				\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_OUTLEN],-4\n\t"				\
+		  "    j 16f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    je 97f\n\t" /* Only one byte available.  */	\
+		  "    jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 96f \n\t"					\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    jhe 15f\n\t"					\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],13b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 14b\n\t" /* Jump away if ch > 0xdfff.  */	\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "    jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "    slgfi %[R_INLEN],4\n\t"				\
+		  "    jl 97f\n\t" /* Big enough input?  */		\
+		  "    llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    slfi %[R_TMP],0xd7c0\n\t"			\
+		  "    sll %[R_TMP],10\n\t"				\
+		  "    risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "    nilf %[R_TMP3],0xfc00\n\t"			\
+		  "    clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    jne 98f\n\t"					\
+		  "    st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "96: \n\t" /* Return full output.  */			\
+		  "    lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "97: \n\t" /* Return incomplete input.  */		\
+		  "    lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "    lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
+
+/* Conversion from UTF-32 internal/BE to UTF-16.  */
+
+/* The software routine is copied from utf-16.c (minus bytes
+   swapping).  */
+#define BODY_TO_C							\
+  {									\
+    uint32_t c = get32 (inptr);						\
+									\
+    if (__builtin_expect (c <= 0xd7ff, 1)				\
+	|| (c >=0xdc00 && c <= 0xffff))					\
+      {									\
+	/* One UTF-16 code unit (two bytes).  */			\
+	put16 (outptr, c);						\
+      }									\
+    else if (__builtin_expect (c >= 0x10000, 1)				\
+	     && __builtin_expect (c <= 0x10ffff, 1))			\
+      {									\
+	/* Surrogate pair: two UTF-16 code units (four bytes).  */	\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t out;							\
+									\
+	/* Generate a surrogate character.  */				\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	out = 0xd800;							\
+	out |= (zabcd & 0xff) << 6;					\
+	out |= (c >> 10) & 0x3f;					\
+	put16 (outptr, out);						\
+	outptr += 2;							\
+									\
+	out = 0xdc00;							\
+	out |= c & 0x3ff;						\
+	put16 (outptr, out);						\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    outptr += 2;							\
+    inptr += 4;								\
+  }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "    vpkf %%v18,%%v16,%%v17\n\t"			\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t"					\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  "    .long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 20f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf8-utf16-z9.c b/sysdeps/s390/utf8-utf16-z9.c
new file mode 100644
index 0000000..d3dc9bd
--- /dev/null
+++ b/sysdeps/s390/utf8-utf16-z9.c
@@ -0,0 +1,818 @@
+/* Conversion between UTF-8 and UTF-16.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define MAX_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-16//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-16//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-16BE//") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__from_name, "UTF-16BE//") == 0
+	   && __strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0)
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v18,%%v19,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest \
+							 index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+
+/* The software implementation is based on the code in gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint16_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0	\
+	       or 0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search the end of this ill-formed UTF-8 character.  This	\
+	       is the next byte with (x & 0xc0) != 0x80.  */		\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	if (cnt == 4)							\
+	  {								\
+	    /* For 4 byte UTF-8 chars two UTF-16 chars (high and	\
+	       low) are needed.  */					\
+	    uint16_t zabcd, high, low;					\
+									\
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
+	      {								\
+		/* Overflow in the output buffer.  */			\
+		result = __GCONV_FULL_OUTPUT;				\
+		break;							\
+	      }								\
+									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
+	    /* See Principles of Operations cu12.  */			\
+	    zabcd = (((inptr[0] & 0x7) << 2) |				\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
+									\
+	    /* z-bit must be zero after subtracting 1.  */		\
+	    if (zabcd & 0x10)						\
+	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
+									\
+	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
+	    high |= zabcd << 6;                         /* abcd bits */	\
+	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
+	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
+									\
+	    low = (uint16_t)(0xdc << 8);         /* low surrogate id */ \
+	    low |= ((uint16_t)inptr[2] & 0xc) << 6;       /* kl bits */	\
+	    low |= (inptr[2] & 0x3) << 6;                 /* mn bits */	\
+	    low |= inptr[3] & 0x3f;                   /* opqrst bits */	\
+									\
+	    put16 (outptr, high);					\
+	    outptr += 2;						\
+	    put16 (outptr, low);					\
+	    outptr += 2;						\
+	    inptr += 4;							\
+	    continue;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Read the possible remaining bytes.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		uint16_t byte = inptr[i];				\
+									\
+		if ((byte & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  break;						\
+									\
+		ch <<= 6;						\
+		ch |= byte & 0x3f;					\
+	      }								\
+									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
+	  }								\
+      }									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint16_t *) outptr) = ch;					\
+    outptr += sizeof (uint16_t);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+/* Conversion from UTF-16 to UTF-8.  */
+
+/* The software routine is based on the functionality of the S/390
+   hardware instruction (cu21) as described in the Principles of
+   Operation.  */
+#define BODY_TO_C							\
+  {									\
+    uint16_t c = get16 (inptr);						\
+									\
+    if (__glibc_likely (c <= 0x007f))					\
+      {									\
+	/* Single byte UTF-8 char.  */					\
+	*outptr = c & 0xff;						\
+	outptr++;							\
+      }									\
+    else if (c >= 0x0080 && c <= 0x07ff)				\
+      {									\
+	/* Two byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 2 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+									\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
+									\
+	outptr += 2;							\
+      }									\
+    else if ((c >= 0x0800 && c <= 0xd7ff) || c > 0xdfff)		\
+      {									\
+	/* Three byte UTF-8 char.  */					\
+									\
+	if (__glibc_unlikely (outptr + 3 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	outptr[0] = 0xe0;						\
+	outptr[0] |= c >> 12;						\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (c >> 6) & 0x3f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= c & 0x3f;						\
+									\
+	outptr += 3;							\
+      }									\
+    else if (c >= 0xd800 && c <= 0xdbff)				\
+      {									\
+	/* Four byte UTF-8 char.  */					\
+	uint16_t low, uvwxy;						\
+									\
+	if (__glibc_unlikely (outptr + 4 > outend))			\
+	  {								\
+	    /* Overflow in the output buffer.  */			\
+	    result = __GCONV_FULL_OUTPUT;				\
+	    break;							\
+	  }								\
+	if (__glibc_unlikely (inptr + 4 > inend))			\
+	  {								\
+	    result = __GCONV_INCOMPLETE_INPUT;				\
+	    break;							\
+	  }								\
+									\
+	inptr += 2;							\
+	low = get16 (inptr);						\
+									\
+	if ((low & 0xfc00) != 0xdc00)					\
+	  {								\
+	    inptr -= 2;							\
+	    STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	  }								\
+	uvwxy = ((c >> 6) & 0xf) + 1;					\
+	outptr[0] = 0xf0;						\
+	outptr[0] |= uvwxy >> 2;					\
+									\
+	outptr[1] = 0x80;						\
+	outptr[1] |= (uvwxy << 4) & 0x30;				\
+	outptr[1] |= (c >> 2) & 0x0f;					\
+									\
+	outptr[2] = 0x80;						\
+	outptr[2] |= (c & 0x03) << 4;					\
+	outptr[2] |= (low >> 6) & 0x0f;					\
+									\
+	outptr[3] = 0x80;						\
+	outptr[3] |= low & 0x3f;					\
+									\
+	outptr += 4;							\
+      }									\
+    else								\
+      {									\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+      }									\
+    inptr += 2;								\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  "    vstrchs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9:  .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  "    .short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 13f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13: \n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "    lghi %[R_TMP2],16\n\t"				\
+		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    j 22f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "    jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "    jh 23f\n\t"					\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llill %[R_TMP3],0xc080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "    jh 24f\n\t"					\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xe08080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */ \
+		  "    clfi %[R_TMP],0xdbff\n\t"			\
+		  "    locghih %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jh 99f\n\t" /* Jump away if this is a low surrogate \
+				      without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    slgfi %[R_INLEN],2\n\t"				\
+		  "    locghil %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "    aghi %[R_TMP],0x40\n\t"				\
+		  "    risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "    llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "    nilf %[R_TMP],0xfc00\n\t"			\
+		  "    clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    locghine %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 20b\n\t"					\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
+
+#include <iconv/skeleton.c>
diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
new file mode 100644
index 0000000..e39e0a7
--- /dev/null
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -0,0 +1,820 @@
+/* Conversion between UTF-8 and UTF-32 BE/internal.
+
+   This module uses the Z9-109 variants of the Convert Unicode
+   instructions.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
+   Based on the work by Ulrich Drepper  <drepper@cygnus.com>, 1997.
+
+   Thanks to Daniel Appich who covered the relevant performance work
+   in his diploma thesis.
+
+   This is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   This is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <dl-procinfo.h>
+#include <gconv.h>
+
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
+/* Defines for skeleton.c.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MAX_NEEDED_FROM		6
+#define MIN_NEEDED_TO		4
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
+#define FROM_DIRECTION		(dir == from_utf8)
+#define ONE_DIRECTION           0
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
+
+/* Direction of the transformation.  */
+enum direction
+{
+  illegal_dir,
+  to_utf8,
+  from_utf8
+};
+
+struct utf8_data
+{
+  enum direction dir;
+  int emit_bom;
+};
+
+
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+  /* Determine which direction.  */
+  struct utf8_data *new_data;
+  enum direction dir = illegal_dir;
+  int emit_bom;
+  int result;
+
+  emit_bom = (__strcasecmp (step->__to_name, "UTF-32//") == 0);
+
+  if (__strcasecmp (step->__from_name, "ISO-10646/UTF8/") == 0
+      && (__strcasecmp (step->__to_name, "UTF-32//") == 0
+	  || __strcasecmp (step->__to_name, "UTF-32BE//") == 0
+	  || __strcasecmp (step->__to_name, "INTERNAL") == 0))
+    {
+      dir = from_utf8;
+    }
+  else if (__strcasecmp (step->__to_name, "ISO-10646/UTF8/") == 0
+	   && (__strcasecmp (step->__from_name, "UTF-32BE//") == 0
+	       || __strcasecmp (step->__from_name, "INTERNAL") == 0))
+    {
+      dir = to_utf8;
+    }
+
+  result = __GCONV_NOCONV;
+  if (dir != illegal_dir)
+    {
+      new_data = (struct utf8_data *) malloc (sizeof (struct utf8_data));
+
+      result = __GCONV_NOMEM;
+      if (new_data != NULL)
+	{
+	  new_data->dir = dir;
+	  new_data->emit_bom = emit_bom;
+	  step->__data = new_data;
+
+	  if (dir == from_utf8)
+	    {
+	      step->__min_needed_from = MIN_NEEDED_FROM;
+	      step->__max_needed_from = MIN_NEEDED_FROM;
+	      step->__min_needed_to = MIN_NEEDED_TO;
+	      step->__max_needed_to = MIN_NEEDED_TO;
+	    }
+	  else
+	    {
+	      step->__min_needed_from = MIN_NEEDED_TO;
+	      step->__max_needed_from = MIN_NEEDED_TO;
+	      step->__min_needed_to = MIN_NEEDED_FROM;
+	      step->__max_needed_to = MIN_NEEDED_FROM;
+	    }
+
+	  step->__stateful = 0;
+
+	  result = __GCONV_OK;
+	}
+    }
+
+  return result;
+}
+
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+  free (data->__data);
+}
+
+/* The macro for the hardware loop.  This is used for both
+   directions.  */
+#define HARDWARE_CONVERT(INSTRUCTION)					\
+  {									\
+    register const unsigned char* pInput __asm__ ("8") = inptr;		\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
+    register unsigned char* pOutput __asm__ ("10") = outptr;		\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
+									\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
+									\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+    cc >>= 28;								\
+									\
+    if (cc == 1)							\
+      {									\
+	result = __GCONV_FULL_OUTPUT;					\
+      }									\
+    else if (cc == 2)							\
+      {									\
+	result = __GCONV_ILLEGAL_INPUT;					\
+      }									\
+  }
+
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
+/* Conversion function from UTF-8 to UTF-32 internal/BE.  */
+
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
+    /* Next input byte.  */						\
+    uint32_t ch = *inptr;						\
+									\
+    if (__glibc_likely (ch < 0x80))					\
+      {									\
+	/* One byte sequence.  */					\
+	++inptr;							\
+      }									\
+    else								\
+      {									\
+	uint_fast32_t cnt;						\
+	uint_fast32_t i;						\
+									\
+	if (ch >= 0xc2 && ch < 0xe0)					\
+	  {								\
+	    /* We expect two bytes.  The first byte cannot be 0xc0 or	\
+	       0xc1, otherwise the wide character could have been	\
+	       represented using a single byte.  */			\
+	    cnt = 2;							\
+	    ch &= 0x1f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
+	  {								\
+	    /* We expect three bytes.  */				\
+	    cnt = 3;							\
+	    ch &= 0x0f;							\
+	  }								\
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
+	  {								\
+	    /* We expect four bytes.  */				\
+	    cnt = 4;							\
+	    ch &= 0x07;							\
+	  }								\
+	else								\
+	  {								\
+	    /* Search the end of this ill-formed UTF-8 character.  This	\
+	       is the next byte with (x & 0xc0) != 0x80.  */		\
+	    i = 0;							\
+	    do								\
+	      ++i;							\
+	    while (inptr + i < inend					\
+		   && (*(inptr + i) & 0xc0) == 0x80			\
+		   && i < 5);						\
+									\
+	  errout:							\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
+	  }								\
+									\
+	if (__glibc_unlikely (inptr + cnt > inend))			\
+	  {								\
+	    /* We don't have enough input.  But before we report	\
+	       that check that all the bytes are correct.  */		\
+	    for (i = 1; inptr + i < inend; ++i)				\
+	      if ((inptr[i] & 0xc0) != 0x80)				\
+		break;							\
+									\
+	    if (__glibc_likely (inptr + i == inend))			\
+	      {								\
+		result = __GCONV_INCOMPLETE_INPUT;			\
+		break;							\
+	      }								\
+									\
+	    goto errout;						\
+	  }								\
+									\
+	/* Read the possible remaining bytes.  */			\
+	for (i = 1; i < cnt; ++i)					\
+	  {								\
+	    uint32_t byte = inptr[i];					\
+									\
+	    if ((byte & 0xc0) != 0x80)					\
+	      /* This is an illegal encoding.  */			\
+	      break;							\
+									\
+	    ch <<= 6;							\
+	    ch |= byte & 0x3f;						\
+	  }								\
+									\
+	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
+	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
+	   have been represented with fewer than cnt bytes.  */		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
+	  {								\
+	    /* This is an illegal encoding.  */				\
+	    goto errout;						\
+	  }								\
+									\
+	inptr += cnt;							\
+      }									\
+									\
+    /* Now adjust the pointers and store the result.  */		\
+    *((uint32_t *) outptr) = ch;					\
+    outptr += sizeof (uint32_t);					\
+  }
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    aghi %[R_OUTLEN],-64\n\t"			\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "    vstm %%v20,%%v23,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10: \n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest \
+						     index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  "    vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11: \n\t"						\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
+/* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
+
+/* The software routine mimics the S/390 cu41 instruction.  */
+#define BODY_TO_C						\
+  {								\
+    uint32_t wc = *((const uint32_t *) inptr);			\
+								\
+    if (__glibc_likely (wc <= 0x7f))				\
+      {								\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
+	outptr++;						\
+      }								\
+    else if (wc <= 0x7ff)					\
+      {								\
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+								\
+	outptr[0] = 0xc0;					\
+	outptr[0] |= wc >> 6;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= wc & 0x3f;					\
+								\
+	outptr += 2;						\
+      }								\
+    else if (wc <= 0xffff)					\
+      {								\
+	/* Three UTF-8 chars.  */				\
+	if (__glibc_unlikely (outptr + 3 > outend))		\
+	  {							\
+	    /* Overflow in the output buffer.  */		\
+	    result = __GCONV_FULL_OUTPUT;			\
+	    break;						\
+	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
+	outptr[0] = 0xe0;					\
+	outptr[0] |= wc >> 12;					\
+								\
+	outptr[1] = 0x80;					\
+	outptr[1] |= (wc >> 6) & 0x3f;				\
+								\
+	outptr[2] = 0x80;					\
+	outptr[2] |= wc & 0x3f;					\
+								\
+	outptr += 3;						\
+      }								\
+      else if (wc <= 0x10ffff)					\
+	{							\
+	  /* Four UTF-8 chars.  */				\
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
+	    {							\
+	      /* Overflow in the output buffer.  */		\
+	      result = __GCONV_FULL_OUTPUT;			\
+	      break;						\
+	    }							\
+	  outptr[0] = 0xf0;					\
+	  outptr[0] |= wc >> 18;				\
+								\
+	  outptr[1] = 0x80;					\
+	  outptr[1] |= (wc >> 12) & 0x3f;			\
+								\
+	  outptr[2] = 0x80;					\
+	  outptr[2] |= (wc >> 6) & 0x3f;			\
+								\
+	  outptr[3] = 0x80;					\
+	  outptr[3] |= wc & 0x3f;				\
+								\
+	  outptr += 4;						\
+	}							\
+      else							\
+	{							\
+	  STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	}							\
+    inptr += 4;							\
+  }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "    vzero %%v21\n\t"					\
+		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP],0\n\t"				\
+		  /* Shorten to byte values.  */			\
+		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		  "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		  /* Checking for values > 0x7f.  */			\
+		  "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		  "    jno 11f\n\t"					\
+		  "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		  "    jno 12f\n\t"					\
+		  "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		  "    jno 13f\n\t"					\
+		  /* Store 16bytes to outptr.  */			\
+		  "    vst %%v23,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_INLEN],-64\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_IN],64(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "    srlg %[R_I],%[R_I],2\n\t"			\
+		  "    agr %[R_I],%[R_TMP]\n\t"				\
+		  "    je 20f\n\t"					\
+		  /* Store characters before invalid one...  */		\
+		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		  /* ... and update pointers.  */			\
+		  "    aghi %[R_I],1\n\t"				\
+		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
+		  "    sllg %[R_I],%[R_I],2\n\t"			\
+		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
+#include <iconv/skeleton.c>
-- 
2.5.5
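
The resolver at the end of the patch above picks one of three loop
variants from the hardware capability bits. The following is a minimal,
portable sketch of that selection order only. It uses a plain function
pointer instead of the GNU ifunc attribute, and the DEMO_HWCAP_* bit
values, loop_* stubs and select_loop name are assumptions made for
illustration, not the real HWCAP_S390_* constants or glibc functions.

```c
/* Hypothetical capability bits; the real HWCAP_S390_* values come from
   the system headers and are not reproduced here.  */
#define DEMO_HWCAP_ETF3EH  (1UL << 0)
#define DEMO_HWCAP_VX      (1UL << 1)

typedef const char *(*loop_fn) (void);

/* Stand-ins for the three generated loop functions.  */
static const char *loop_c (void)      { return "c"; }
static const char *loop_etf3eh (void) { return "etf3eh"; }
static const char *loop_vx (void)     { return "vx"; }

/* Prefer the vector variant, then the utf-convert instruction variant,
   and fall back to the plain C routine otherwise.  */
static loop_fn
select_loop (unsigned long int hwcap)
{
  if (hwcap & DEMO_HWCAP_VX)
    return loop_vx;
  if (hwcap & DEMO_HWCAP_ETF3EH)
    return loop_etf3eh;
  return loop_c;
}
```

With an ifunc, the dynamic linker performs this choice once at
relocation time, so callers pay no per-call dispatch cost; the sketch
above would instead be called once and its result cached by the caller.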


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PING] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-18 15:28               ` Stefan Liebler
@ 2016-05-24 15:02                 ` Stefan Liebler
  2016-05-25 15:29                   ` [COMMITTED] " Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-05-24 15:02 UTC (permalink / raw)
  To: libc-alpha

On 05/18/2016 05:20 PM, Stefan Liebler wrote:
> On 05/09/2016 04:15 PM, Stefan Liebler wrote:
>> On 05/04/2016 03:40 PM, Andreas Schwab wrote:
>>> Define a variable sysdep-gconv-modules that can be set by
>>> sysdeps/.../Makefile, and use it in iconvdata/Makefile to cat the files
>>> together.  Please also fix the rule in sysdeps/s390/s390-64/Makefile to
>>> use a temporary file to make the update atomic.  Since we no longer
>>> support empty objpfx the conditional test can be removed.
>>>
>>> Andreas.
>>>
>>
>> Okay. I will remove the objpfx conditional test in iconvdata/Makefile.
>>
>> I have to add the s390 specific modules before all the other ones in
>> <source>/iconvdata/gconv-modules.
>> (See my second patch: "S390: Mention s390-specific gconv-modues before
>> common ones.")
>> Thus simply concatenating would lead to something like that:
>> "
>> # GNU libc iconv configuration.
>> # Copyright (C) 1997-2016 Free Software Foundation, Inc.
>> #....
>>
>> s390-specific modules
>>
>> # GNU libc iconv configuration.
>> # Copyright (C) 1997-2016 Free Software Foundation, Inc.
>> #....
>>
>> common modules
>> "
>>
>> This doesn't look very nice. Or is it okay?
>>
>> Then I would prefer to create a file
>> <source>/sysdeps/s390/gconv-modules-s390 with the module-definitions,
>> set the variable sysdep-gconv-modules and omit the rule with "cp, echo,
>> echo ..." in sysdeps/s390/s390-64/Makefile at all.
>>
>> Bye
>> Stefan
>>
>>
> Here is an updated patch. It concatenates the s390-specific and the
> common gconv-modules files. The s390-specific gconv-modules
> file is specified with the variable sysdep-gconv-modules in
> sysdeps/s390/s390-64/Makefile.
>
> The second patch "[PATCH 02/14] S390: Mention s390-specific gconv-modues
> before common ones." can be removed since the s390 modules are already
> mentioned before the common ones with this patch.
>
> The patch "[PATCH 10/14] S390: Use s390-64 specific ionv-modules on
> s390-32,too.", which moves the iconvdata contents from
> sysdeps/s390/s390-64/Makefile to sysdeps/s390/Makefile has to be
> adjusted in order to reflect the Makefile-changes.
>
> Okay to commit with these changes?
>
> Bye
> Stefan
>
> ---
> ChangeLog:
>
>      * iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
>      Install file from $(objpfx)gconv-modules.
>      ($(objpfx)gconv-modules): Concatenate architecture specific file
>      in variable sysdeps-gconv-modules and gconv-modules in src dir.
>      * sysdeps/s390/gconv-modules: New file.
>      * sysdeps/s390/s390-64/Makefile: ($(inst_gconvdir)/gconv-modules):
>      Deleted.
>      ($(objpfx)gconv-modules-s390): Deleted.
>      (sysdeps-gconv-modules): New variable.

Any objection?
Otherwise I'll commit the patch series.

Bye
Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [COMMITTED] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-24 15:02                 ` Stefan Liebler
@ 2016-05-25 15:29                   ` Stefan Liebler
  2016-05-25 15:37                     ` Joseph Myers
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-05-25 15:29 UTC (permalink / raw)
  To: libc-alpha

On 05/24/2016 03:26 PM, Stefan Liebler wrote:
> On 05/18/2016 05:20 PM, Stefan Liebler wrote:
>> On 05/09/2016 04:15 PM, Stefan Liebler wrote:
>>> On 05/04/2016 03:40 PM, Andreas Schwab wrote:
>>>> Define a variable sysdep-gconv-modules that can be set by
>>>> sysdeps/.../Makefile, and use it in iconvdata/Makefile to cat the files
>>>> together.  Please also fix the rule in sysdeps/s390/s390-64/Makefile to
>>>> use a temporary file to make the update atomic.  Since we no longer
>>>> support empty objpfx the conditional test can be removed.
>>>>
>>>> Andreas.
>>>>
>>>
>>> Okay. I will remove the objpfx conditional test in iconvdata/Makefile.
>>>
>>> I have to add the s390 specific modules before all the other ones in
>>> <source>/iconvdata/gconv-modules.
>>> (See my second patch: "S390: Mention s390-specific gconv-modues before
>>> common ones.")
>>> Thus simply concatenating would lead to something like that:
>>> "
>>> # GNU libc iconv configuration.
>>> # Copyright (C) 1997-2016 Free Software Foundation, Inc.
>>> #....
>>>
>>> s390-specific modules
>>>
>>> # GNU libc iconv configuration.
>>> # Copyright (C) 1997-2016 Free Software Foundation, Inc.
>>> #....
>>>
>>> common modules
>>> "
>>>
>>> This doesn't look very nice. Or is it okay?
>>>
>>> Then I would prefer to create a file
>>> <source>/sysdeps/s390/gconv-modules-s390 with the module-definitions,
>>> set the variable sysdep-gconv-modules and omit the rule with "cp, echo,
>>> echo ..." in sysdeps/s390/s390-64/Makefile at all.
>>>
>>> Bye
>>> Stefan
>>>
>>>
>> Here is an updated patch. It concatenates the s390-specific and the
>> common gconv-modules files. The s390-specific gconv-modules
>> file is specified with the variable sysdep-gconv-modules in
>> sysdeps/s390/s390-64/Makefile.
>>
>> The second patch "[PATCH 02/14] S390: Mention s390-specific gconv-modues
>> before common ones." can be removed since the s390 modules are already
>> mentioned before the common ones with this patch.
>>
>> The patch "[PATCH 10/14] S390: Use s390-64 specific ionv-modules on
>> s390-32,too.", which moves the iconvdata contents from
>> sysdeps/s390/s390-64/Makefile to sysdeps/s390/Makefile has to be
>> adjusted in order to reflect the Makefile-changes.
>>
>> Okay to commit with these changes?
>>
>> Bye
>> Stefan
>>
>> ---
>> ChangeLog:
>>
>>      * iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
>>      Install file from $(objpfx)gconv-modules.
>>      ($(objpfx)gconv-modules): Concatenate architecture specific file
>>      in variable sysdeps-gconv-modules and gconv-modules in src dir.
>>      * sysdeps/s390/gconv-modules: New file.
>>      * sysdeps/s390/s390-64/Makefile: ($(inst_gconvdir)/gconv-modules):
>>      Deleted.
>>      ($(objpfx)gconv-modules-s390): Deleted.
>>      (sysdeps-gconv-modules): New variable.
>
> Any objection?
> Otherwise I'll commit the patch series.
>
> Bye
> Stefan
>
>
Hi,

I have committed the patchset as Andreas Schwab gave the okay for this 
last common-code patch. See off-mailing-list mail:
"
Stefan Liebler <stli@linux.vnet.ibm.com> writes:

 > can you please have a look at the latest patch (see forwarded mail) where
 > the two gconv-modules files are concatenated.
 >
 > Is this your intended way? If yes, then I'll commit the patches.

This looks good, please go on.

Thanks, Andreas.
"

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [COMMITTED] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-25 15:29                   ` [COMMITTED] " Stefan Liebler
@ 2016-05-25 15:37                     ` Joseph Myers
  2016-05-25 15:58                       ` Stefan Liebler
  0 siblings, 1 reply; 55+ messages in thread
From: Joseph Myers @ 2016-05-25 15:37 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

Does this commit fully fix bugs 19726 and 19727, or only partly?  If it 
fully fixes them, they should be resolved as FIXED with target milestone 
set to 2.24.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [COMMITTED] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-25 15:37                     ` Joseph Myers
@ 2016-05-25 15:58                       ` Stefan Liebler
  2016-05-25 16:32                         ` Joseph Myers
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Liebler @ 2016-05-25 15:58 UTC (permalink / raw)
  To: libc-alpha

On 05/25/2016 05:34 PM, Joseph Myers wrote:
> Does this commit fully fix bugs 19726 and 19727, or only partly?  If it
> fully fixes them, they should be resolved as FIXED with target milestone
> set to 2.24.
>
I've set the bugs to fixed, but I was not able to set the target milestone.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [COMMITTED] [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules.
  2016-05-25 15:58                       ` Stefan Liebler
@ 2016-05-25 16:32                         ` Joseph Myers
  0 siblings, 0 replies; 55+ messages in thread
From: Joseph Myers @ 2016-05-25 16:32 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: libc-alpha

On Wed, 25 May 2016, Stefan Liebler wrote:

> On 05/25/2016 05:34 PM, Joseph Myers wrote:
> > Does this commit fully fix bugs 19726 and 19727, or only partly?  If it
> > fully fixes them, they should be resolved as FIXED with target milestone
> > set to 2.24.
> > 
> I've set the bugs to fixed, but I was not able to set the target milestone.

I've added you to the editbugs group.  Please try milestone setting again.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2016-05-25 15:58 UTC | newest]

Thread overview: 55+ messages
2016-02-23  9:22 [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
2016-02-23  9:21 ` [PATCH 01/14] S390: Get rid of make warning: overriding recipe for target gconv-modules Stefan Liebler
2016-04-14 14:16   ` Stefan Liebler
2016-04-21 15:00     ` Stefan Liebler
2016-04-28  6:55       ` Stefan Liebler
2016-05-04 13:15         ` [PING] " Stefan Liebler
2016-05-04 13:40           ` Andreas Schwab
2016-05-09 14:33             ` Stefan Liebler
2016-05-18 15:28               ` Stefan Liebler
2016-05-24 15:02                 ` Stefan Liebler
2016-05-25 15:29                   ` [COMMITTED] " Stefan Liebler
2016-05-25 15:37                     ` Joseph Myers
2016-05-25 15:58                       ` Stefan Liebler
2016-05-25 16:32                         ` Joseph Myers
2016-02-23  9:21 ` [PATCH 13/14] Fix ucs4le_internal_loop in error case Stefan Liebler
2016-02-23 17:42   ` Joseph Myers
2016-02-25  9:00     ` Stefan Liebler
2016-03-18 13:04       ` Stefan Liebler
2016-03-31  9:20         ` Stefan Liebler
2016-03-31  9:45       ` Andreas Schwab
2016-02-23  9:21 ` [PATCH 02/14] S390: Mention s390-specific gconv-modues before common ones Stefan Liebler
2016-04-15 10:27   ` Florian Weimer
2016-04-21 14:50     ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 04/14] S390: Optimize 8bit-generic iconv modules Stefan Liebler
2016-04-15 13:05   ` Florian Weimer
2016-04-21 15:35     ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module Stefan Liebler
2016-04-21 15:05   ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 11/14] S390: Fix utf32 to utf8 handling of low surrogates (disable cu41) Stefan Liebler
2016-04-21 15:25   ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 03/14] S390: Configure check for vector support in gcc Stefan Liebler
2016-02-23  9:22 ` [PATCH 05/14] S390: Optimize builtin iconv-modules Stefan Liebler
2016-03-18 12:58   ` Stefan Liebler
2016-04-21 14:51     ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 12/14] S390: Fix utf32 to utf16 handling of low surrogates (disable cu42) Stefan Liebler
2016-04-21 15:30   ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 09/14] S390: Optimize utf16-utf32 module Stefan Liebler
2016-04-21 14:55   ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 07/14] S390: Optimize utf8-utf32 module Stefan Liebler
2016-04-21 15:15   ` Stefan Liebler
2016-02-23  9:22 ` [PATCH 08/14] S390: Optimize utf8-utf16 module Stefan Liebler
2016-04-21 15:20   ` Stefan Liebler
2016-02-23  9:23 ` [PATCH 10/14] S390: Use s390-64 specific ionv-modules on s390-32, too Stefan Liebler
2016-02-23 12:06   ` Stefan Liebler
2016-04-21 15:10   ` Stefan Liebler
2016-02-23  9:23 ` [PATCH 14/14] Fix UTF-16 surrogate handling Stefan Liebler
2016-02-23 17:57   ` Joseph Myers
2016-02-25 12:57     ` Stefan Liebler
2016-03-18 13:05       ` Stefan Liebler
2016-03-22 14:39         ` Stefan Liebler
2016-03-31  9:18           ` Stefan Liebler
2016-04-07 14:35             ` Stefan Liebler
2016-04-07 15:18           ` Andreas Schwab
2016-03-01 15:01 ` [PATCH 00/14] S390: Optimize iconv modules Stefan Liebler
2016-03-08 12:33   ` Stefan Liebler
