public inbox for libc-ports@sourceware.org
* [PATCH 00/26] ARM improvements
@ 2013-02-27  3:16 Richard Henderson
  2013-02-27  3:16 ` [PATCH 03/26] arm: Handle armv6 in preconfigure Richard Henderson
                   ` (28 more replies)
  0 siblings, 29 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:16 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

The first two patches are required to get glibc to build with gcc 4.8
on armv7.  Otherwise it doesn't actually notice armv7 and configures
for the default armv4.

The third patch I thought I was going to need for an armv6 (not t2)
implementation of addmul_1 (using umaal).  But in the end I managed
to match the speed of the umaal version with the default umlal version,
so I opted not to submit the umaal version at all.  The third patch
could be dropped, but I think it makes sense to keep it.

Patches 4-18 improve the ability to build libc as a thumb2 binary.
In the end, almost all assembly is done in thumb2 mode if -mthumb
is present in ASFLAGS.  It's that last bit that's the sticky part: by
default we copy only a couple of flags over from CFLAGS.  I'm not
sure why we're not passing them all to the assembler.  So at the
moment I'm just putting ASFLAGS on the make command-line to get
what I want.

Patches 19-23 add improved string routines for armv6t2.  I've had these
hanging around for almost 2 years without properly submitting them. 
Which is perhaps a bit silly, but the A8 host I was originally doing
testing on has a dreadfully low resolution clock, so it was hard to get
real numbers. Whereas the A15 has a 1ns resolution CLOCK_MONOTONIC_RAW.
I can post the benchmarks under separate cover if you like.

Patches 24-26 add improved gmp routines for armv4.  They're written
from scratch, as I understand that glibc is LGPL2.1, and gmp is GPL3.
They're significantly faster than the generic defaults, and they're
of similar performance to gmp on the A15 (though probably not on the
in-order cores).  For the sizes of multiplies that we're going to
encounter inside glibc, they're probably sufficient.


r~


Richard Henderson (26):
  Sync config.guess and config.sub with upstream
  arm: Update preconfigure fragment for gcc 4.8
  arm: Handle armv6 in preconfigure
  arm: Include libc-do-syscall in sysdep-rtld-routines
  arm: Introduce thumb helpers s and pc_ofs
  arm: Use pc_ofs
  arm: Introduce and use GET_TLS
  arm: Add IT insns for thumb mode
  arm: Mark assembly files that will not use thumb mode
  arm: Introduce and use LDST_PCREL
  arm: Introduce and use NEGOFF series of macros
  arm: Enable thumb2 mode in assembly files
  arm: Store lr in r2 around GET_TLS
  arm: Use push/pop mnemonics
  arm: Delete LOADREGS macro
  arm: Commonize BX conditionals
  arm: Unless arm4t, pop return address directly into pc
  arm: Use GET_TLS more often
  arm: Add optimized ffs for armv6t2
  arm: Implement armv6t2 optimized strlen
  arm: Implement armv6t2 optimized strcpy
  arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr
  arm: Rewrite armv6t2 memchr with uqadd8
  arm: Add optimized addmul_1
  arm: Add optimized submul_1
  arm: Add optimized add_n and sub_n

 ports/sysdeps/arm/__longjmp.S                      |   8 +-
 ports/sysdeps/arm/add_n.S                          |  83 ++++++++
 ports/sysdeps/arm/addmul_1.S                       |  60 ++++++
 ports/sysdeps/arm/arm-mcount.S                     |  19 +-
 ports/sysdeps/arm/armv6t2/ffs.S                    |  34 ++++
 ports/sysdeps/arm/armv6t2/ffsll.S                  |  49 +++++
 ports/sysdeps/arm/armv6t2/memchr.S                 | 216 ++++++++++-----------
 ports/sysdeps/arm/armv6t2/rawmemchr.S              |  81 ++++++++
 ports/sysdeps/arm/armv6t2/stpcpy.S                 |   1 +
 ports/sysdeps/arm/armv6t2/strchr.S                 | 138 +++++++++++++
 ports/sysdeps/arm/armv6t2/strcpy.S                 | 213 ++++++++++++++++++++
 ports/sysdeps/arm/armv6t2/strlen.S                 |  93 +++++++++
 ports/sysdeps/arm/armv6t2/strrchr.S                | 137 +++++++++++++
 ports/sysdeps/arm/crti.S                           |   6 +-
 ports/sysdeps/arm/crtn.S                           |  10 +-
 ports/sysdeps/arm/dl-tlsdesc.S                     |  49 +++--
 ports/sysdeps/arm/dl-trampoline.S                  |  15 +-
 ports/sysdeps/arm/memcpy.S                         |  60 +++---
 ports/sysdeps/arm/memmove.S                        |  60 +++---
 ports/sysdeps/arm/memset.S                         |   2 +
 ports/sysdeps/arm/preconfigure                     |   7 +-
 ports/sysdeps/arm/setjmp.S                         |   6 +-
 ports/sysdeps/arm/start.S                          |  10 +-
 ports/sysdeps/arm/strlen.S                         |   2 +
 ports/sysdeps/arm/sub_n.S                          |   2 +
 ports/sysdeps/arm/submul_1.S                       |  67 +++++++
 ports/sysdeps/arm/sysdep.h                         |  99 +++++++---
 ports/sysdeps/unix/arm/sysdep.S                    |  39 ++--
 ports/sysdeps/unix/sysv/linux/arm/Makefile         |   2 +-
 .../sysdeps/unix/sysv/linux/arm/____longjmp_chk.S  |   6 +-
 ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S  |   6 +
 ports/sysdeps/unix/sysv/linux/arm/clone.S          |  17 +-
 ports/sysdeps/unix/sysv/linux/arm/getcontext.S     |   2 +-
 ports/sysdeps/unix/sysv/linux/arm/mmap.S           |   9 +-
 ports/sysdeps/unix/sysv/linux/arm/mmap64.S         |  14 +-
 ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S  |  23 +--
 .../unix/sysv/linux/arm/nptl/sysdep-cancel.h       |  57 +++---
 .../unix/sysv/linux/arm/nptl/unwind-forcedunwind.c |   4 +-
 .../unix/sysv/linux/arm/nptl/unwind-resume.c       |   4 +-
 ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S     |  26 ++-
 ports/sysdeps/unix/sysv/linux/arm/setcontext.S     |   4 +-
 ports/sysdeps/unix/sysv/linux/arm/syscall.S        |   5 +-
 ports/sysdeps/unix/sysv/linux/arm/sysdep.h         |  53 +++--
 ports/sysdeps/unix/sysv/linux/arm/vfork.S          |   3 +-
 scripts/config.guess                               |  31 ++-
 scripts/config.sub                                 |  72 +++----
 46 files changed, 1462 insertions(+), 442 deletions(-)
 create mode 100644 ports/sysdeps/arm/add_n.S
 create mode 100644 ports/sysdeps/arm/addmul_1.S
 create mode 100644 ports/sysdeps/arm/armv6t2/ffs.S
 create mode 100644 ports/sysdeps/arm/armv6t2/ffsll.S
 create mode 100644 ports/sysdeps/arm/armv6t2/rawmemchr.S
 create mode 100644 ports/sysdeps/arm/armv6t2/stpcpy.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strchr.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strcpy.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strlen.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strrchr.S
 create mode 100644 ports/sysdeps/arm/sub_n.S
 create mode 100644 ports/sysdeps/arm/submul_1.S
 mode change 100755 => 100644 scripts/config.guess
 mode change 100755 => 100644 scripts/config.sub

-- 
1.8.1.2

* [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2013-02-27  3:16 ` [PATCH 01/26] Sync config.guess and config.sub with upstream Richard Henderson
@ 2013-02-27  3:16 ` Richard Henderson
  2013-02-28  0:15   ` Joseph S. Myers
  2013-02-27  3:16 ` [PATCH 09/26] arm: Mark assembly files that will not use thumb mode Richard Henderson
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:16 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

When compiling with -mthumb, ld.so itself also needs __libc_do_syscall.
---
	* sysdeps/unix/sysv/linux/arm/Makefile [elf] (sysdep-rtld-routines):
	Include libc-do-syscall.
---
 ports/sysdeps/unix/sysv/linux/arm/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ports/sysdeps/unix/sysv/linux/arm/Makefile b/ports/sysdeps/unix/sysv/linux/arm/Makefile
index be7946e..56ef159 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/Makefile
+++ b/ports/sysdeps/unix/sysv/linux/arm/Makefile
@@ -10,7 +10,7 @@ shared-only-routines += libc-aeabi_read_tp
 endif
 
 ifeq ($(subdir),elf)
-sysdep-rtld-routines += aeabi_read_tp
+sysdep-rtld-routines += aeabi_read_tp libc-do-syscall
 endif
 
 ifeq ($(subdir),misc)
-- 
1.8.1.2

* [PATCH 03/26] arm: Handle armv6 in preconfigure
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
@ 2013-02-27  3:16 ` Richard Henderson
  2013-02-27 18:02   ` Joseph S. Myers
  2013-02-27  3:16 ` [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs Richard Henderson
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:16 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

---
	* sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
---
 ports/sysdeps/arm/preconfigure | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ports/sysdeps/arm/preconfigure b/ports/sysdeps/arm/preconfigure
index b0c0540..d19e838 100644
--- a/ports/sysdeps/arm/preconfigure
+++ b/ports/sysdeps/arm/preconfigure
@@ -28,7 +28,10 @@ arm*)
 		  machine=armv6t2
 		  echo "Found compiler is configured for $machine"
 		  ;;
-
+		x__ARM_ARCH_6__)
+		  machine=armv6
+		  echo "Found compiler is configured for $machine"
+		  ;;
 		*)
 		  machine=arm
 		  echo 2>&1 "arm/preconfigure: Did not find ARM architecture type; using default"
-- 
1.8.1.2

* [PATCH 09/26] arm: Mark assembly files that will not use thumb mode
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2013-02-27  3:16 ` [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines Richard Henderson
@ 2013-02-27  3:16 ` Richard Henderson
  2013-02-28  0:58   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 11/26] arm: Introduce and use NEGOFF series of macros Richard Henderson
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:16 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Some routines are written with complex LDM/STM insns that cannot be
used in thumb mode, or are highly conditional requiring excessive
IT insns.

When a future patch goes in to enable thumb2 by default, this marker
will be used to override that default.
---
	* sysdeps/arm/__longjmp.S: Define NO_THUMB before <sysdep.h>.
	* sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
	* sysdeps/arm/dl-trampoline.S: Likewise.
	* sysdeps/arm/memcpy.S: Likewise.
	* sysdeps/arm/memmove.S: Likewise.
	* sysdeps/arm/memset.S: Likewise.
	* sysdeps/arm/setjmp.S: Likewise.
	* sysdeps/arm/strlen.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/setcontext.S: Likewise.
---
 ports/sysdeps/arm/__longjmp.S                       | 2 ++
 ports/sysdeps/arm/crti.S                            | 2 ++
 ports/sysdeps/arm/crtn.S                            | 2 ++
 ports/sysdeps/arm/dl-trampoline.S                   | 2 ++
 ports/sysdeps/arm/memcpy.S                          | 2 ++
 ports/sysdeps/arm/memmove.S                         | 2 ++
 ports/sysdeps/arm/memset.S                          | 2 ++
 ports/sysdeps/arm/setjmp.S                          | 2 ++
 ports/sysdeps/arm/strlen.S                          | 2 ++
 ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S | 2 ++
 ports/sysdeps/unix/sysv/linux/arm/setcontext.S      | 2 ++
 11 files changed, 22 insertions(+)

diff --git a/ports/sysdeps/arm/__longjmp.S b/ports/sysdeps/arm/__longjmp.S
index 28281d5..af4b963 100644
--- a/ports/sysdeps/arm/__longjmp.S
+++ b/ports/sysdeps/arm/__longjmp.S
@@ -16,6 +16,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* ??? Needs more rearrangement for the LDM to handle thumb mode.  */
+#define NO_THUMB
 #include <sysdep.h>
 #define _SETJMP_H
 #define _ASM
diff --git a/ports/sysdeps/arm/crti.S b/ports/sysdeps/arm/crti.S
index 44e20f0..1d55ae2 100644
--- a/ports/sysdeps/arm/crti.S
+++ b/ports/sysdeps/arm/crti.S
@@ -38,6 +38,8 @@
    they can be called as functions.  The symbols _init and _fini are
    magic and cause the linker to emit DT_INIT and DT_FINI.  */
 
+/* Always build .init and .fini sections in ARM mode.  */
+#define NO_THUMB
 #include <libc-symbols.h>
 #include <sysdep.h>
 
diff --git a/ports/sysdeps/arm/crtn.S b/ports/sysdeps/arm/crtn.S
index 5ff3661..a01eb01 100644
--- a/ports/sysdeps/arm/crtn.S
+++ b/ports/sysdeps/arm/crtn.S
@@ -33,6 +33,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* Always build .init and .fini sections in ARM mode.  */
+#define NO_THUMB
 #include <sysdep.h>
 
 /* crtn.S puts function epilogues in the .init and .fini sections
diff --git a/ports/sysdeps/arm/dl-trampoline.S b/ports/sysdeps/arm/dl-trampoline.S
index ebf221c..b9769cb 100644
--- a/ports/sysdeps/arm/dl-trampoline.S
+++ b/ports/sysdeps/arm/dl-trampoline.S
@@ -16,6 +16,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* ??? Needs more rearrangement for the LDM to handle thumb mode.  */
+#define NO_THUMB
 #include <sysdep.h>
 #include <libc-symbols.h>
 
diff --git a/ports/sysdeps/arm/memcpy.S b/ports/sysdeps/arm/memcpy.S
index d8164b4..98b9b47 100644
--- a/ports/sysdeps/arm/memcpy.S
+++ b/ports/sysdeps/arm/memcpy.S
@@ -17,6 +17,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* Thumb requires excessive IT insns here.  */
+#define NO_THUMB
 #include <sysdep.h>
 
 /*
diff --git a/ports/sysdeps/arm/memmove.S b/ports/sysdeps/arm/memmove.S
index d33c1ce..059ca7a 100644
--- a/ports/sysdeps/arm/memmove.S
+++ b/ports/sysdeps/arm/memmove.S
@@ -17,6 +17,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* Thumb requires excessive IT insns here.  */
+#define NO_THUMB
 #include <sysdep.h>
 
 /*
diff --git a/ports/sysdeps/arm/memset.S b/ports/sysdeps/arm/memset.S
index 3152a84..9924cb911 100644
--- a/ports/sysdeps/arm/memset.S
+++ b/ports/sysdeps/arm/memset.S
@@ -16,6 +16,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* Thumb requires excessive IT insns here.  */
+#define NO_THUMB
 #include <sysdep.h>
 
 /* void *memset (dstpp, c, len) */
diff --git a/ports/sysdeps/arm/setjmp.S b/ports/sysdeps/arm/setjmp.S
index 774c78a..39f2662 100644
--- a/ports/sysdeps/arm/setjmp.S
+++ b/ports/sysdeps/arm/setjmp.S
@@ -16,6 +16,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* ??? Needs more rearrangement for the STM to handle thumb mode.  */
+#define NO_THUMB
 #include <sysdep.h>
 #define _SETJMP_H
 #define _ASM
diff --git a/ports/sysdeps/arm/strlen.S b/ports/sysdeps/arm/strlen.S
index 15e9221..2b947e2 100644
--- a/ports/sysdeps/arm/strlen.S
+++ b/ports/sysdeps/arm/strlen.S
@@ -16,6 +16,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* Thumb requires excessive IT insns here.  */
+#define NO_THUMB
 #include <sysdep.h>
 
 /* size_t strlen(const char *S)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
index bdcfa20..29edec6 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
@@ -15,6 +15,8 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* ??? Needs more rearrangement for the LDM to handle thumb mode.  */
+#define NO_THUMB
 #include <sysdep.h>
 
 	.section .rodata.str1.1,"aMS",%progbits,1
diff --git a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
index 8d96c57..45e751b 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
@@ -15,6 +15,8 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+/* ??? Needs more rearrangement for the LDM to handle thumb mode.  */
+#define NO_THUMB
 #include <sysdep.h>
 #include <rtld-global-offsets.h>
 
-- 
1.8.1.2

* [PATCH 01/26] Sync config.guess and config.sub with upstream
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
  2013-02-27  3:16 ` [PATCH 03/26] arm: Handle armv6 in preconfigure Richard Henderson
  2013-02-27  3:16 ` [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs Richard Henderson
@ 2013-02-27  3:16 ` Richard Henderson
  2013-02-27 17:03   ` Joseph S. Myers
  2013-02-27  3:16 ` [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines Richard Henderson
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:16 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

---
	* scripts/config.guess: Merge upstream version 2013-02-12.
	* scripts/config.sub: Likewise.
---
 scripts/config.guess | 31 ++++++++++------------
 scripts/config.sub   | 72 +++++++++++++++++++++++++++-------------------------
 2 files changed, 51 insertions(+), 52 deletions(-)
 mode change 100755 => 100644 scripts/config.guess
 mode change 100755 => 100644 scripts/config.sub

diff --git a/scripts/config.guess b/scripts/config.guess
old mode 100755
new mode 100644
index 872b96a..f475ceb
--- a/scripts/config.guess
+++ b/scripts/config.guess
@@ -1,14 +1,12 @@
 #! /bin/sh
 # Attempt to guess a canonical system name.
-#   Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-#   2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
-#   2011, 2012 Free Software Foundation, Inc.
+#   Copyright 1992-2013 Free Software Foundation, Inc.
 
-timestamp='2012-09-25'
+timestamp='2013-02-12'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation; either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful, but
@@ -22,19 +20,17 @@ timestamp='2012-09-25'
 # As a special exception to the GNU General Public License, if you
 # distribute this file as part of a program that contains a
 # configuration script generated by Autoconf, you may include it under
-# the same distribution terms that you use for the rest of that program.
-
-
-# Originally written by Per Bothner.  Please send patches (context
-# diff format) to <config-patches@gnu.org> and include a ChangeLog
-# entry.
+# the same distribution terms that you use for the rest of that
+# program.  This Exception is an additional permission under section 7
+# of the GNU General Public License, version 3 ("GPLv3").
 #
-# This script attempts to guess a canonical system name similar to
-# config.sub.  If it succeeds, it prints the system name on stdout, and
-# exits with 0.  Otherwise, it exits with 1.
+# Originally written by Per Bothner.
 #
 # You can get the latest version of this script from:
 # http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
+#
+# Please send patches with a ChangeLog entry to config-patches@gnu.org.
+
 
 me=`echo "$0" | sed -e 's,.*/,,'`
 
@@ -54,9 +50,7 @@ version="\
 GNU config.guess ($timestamp)
 
 Originally written by Per Bothner.
-Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
-Free Software Foundation, Inc.
+Copyright 1992-2013 Free Software Foundation, Inc.
 
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -959,6 +953,9 @@ EOF
 	eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep '^CPU'`
 	test x"${CPU}" != x && { echo "${CPU}-unknown-linux-gnu"; exit; }
 	;;
+    or1k:Linux:*:*)
+	echo ${UNAME_MACHINE}-unknown-linux-gnu
+	exit ;;
     or32:Linux:*:*)
 	echo ${UNAME_MACHINE}-unknown-linux-gnu
 	exit ;;
diff --git a/scripts/config.sub b/scripts/config.sub
old mode 100755
new mode 100644
index bdda9e4..872199a
--- a/scripts/config.sub
+++ b/scripts/config.sub
@@ -1,24 +1,18 @@
 #! /bin/sh
 # Configuration validation subroutine script.
-#   Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-#   2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
-#   2011, 2012 Free Software Foundation, Inc.
+#   Copyright 1992-2013 Free Software Foundation, Inc.
 
-timestamp='2012-08-18'
+timestamp='2013-02-12'
 
-# This file is (in principle) common to ALL GNU software.
-# The presence of a machine in this file suggests that SOME GNU software
-# can handle that machine.  It does not imply ALL GNU software can.
-#
-# This file is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# This file is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
 # (at your option) any later version.
 #
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
 # along with this program; if not, see <http://www.gnu.org/licenses/>.
@@ -26,11 +20,12 @@ timestamp='2012-08-18'
 # As a special exception to the GNU General Public License, if you
 # distribute this file as part of a program that contains a
 # configuration script generated by Autoconf, you may include it under
-# the same distribution terms that you use for the rest of that program.
+# the same distribution terms that you use for the rest of that
+# program.  This Exception is an additional permission under section 7
+# of the GNU General Public License, version 3 ("GPLv3").
 
 
-# Please send patches to <config-patches@gnu.org>.  Submit a context
-# diff and a properly formatted GNU ChangeLog entry.
+# Please send patches with a ChangeLog entry to config-patches@gnu.org.
 #
 # Configuration subroutine to validate and canonicalize a configuration type.
 # Supply the specified configuration type as an argument.
@@ -73,9 +68,7 @@ Report bugs and patches to <config-patches@gnu.org>."
 version="\
 GNU config.sub ($timestamp)
 
-Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
-Free Software Foundation, Inc.
+Copyright 1992-2013 Free Software Foundation, Inc.
 
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -156,7 +149,7 @@ case $os in
 	-convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\
 	-c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \
 	-harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \
-	-apple | -axis | -knuth | -cray | -microblaze)
+	-apple | -axis | -knuth | -cray | -microblaze*)
 		os=
 		basic_machine=$1
 		;;
@@ -259,8 +252,10 @@ case $basic_machine in
 	| alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \
 	| alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | alpha64pca5[67] \
 	| am33_2.0 \
-	| arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr | avr32 \
-        | be32 | be64 \
+	| arc \
+	| arm | arm[bl]e | arme[lb] | armv[2-8] | armv[3-8][lb] | armv7[arm] \
+	| avr | avr32 \
+	| be32 | be64 \
 	| bfin \
 	| c4x | clipper \
 	| d10v | d30v | dlx | dsp16xx \
@@ -273,7 +268,7 @@ case $basic_machine in
 	| le32 | le64 \
 	| lm32 \
 	| m32c | m32r | m32rle | m68000 | m68k | m88k \
-	| maxq | mb | microblaze | mcore | mep | metag \
+	| maxq | mb | microblaze | microblazeel | mcore | mep | metag \
 	| mips | mipsbe | mipseb | mipsel | mipsle \
 	| mips16 \
 	| mips64 | mips64el \
@@ -291,16 +286,17 @@ case $basic_machine in
 	| mipsisa64r2 | mipsisa64r2el \
 	| mipsisa64sb1 | mipsisa64sb1el \
 	| mipsisa64sr71k | mipsisa64sr71kel \
+	| mipsr5900 | mipsr5900el \
 	| mipstx39 | mipstx39el \
 	| mn10200 | mn10300 \
 	| moxie \
 	| mt \
 	| msp430 \
 	| nds32 | nds32le | nds32be \
-	| nios | nios2 \
+	| nios | nios2 | nios2eb | nios2el \
 	| ns16k | ns32k \
 	| open8 \
-	| or32 \
+	| or1k | or32 \
 	| pdp10 | pdp11 | pj | pjl \
 	| powerpc | powerpc64 | powerpc64le | powerpcle \
 	| pyramid \
@@ -389,7 +385,8 @@ case $basic_machine in
 	| lm32-* \
 	| m32c-* | m32r-* | m32rle-* \
 	| m68000-* | m680[012346]0-* | m68360-* | m683?2-* | m68k-* \
-	| m88110-* | m88k-* | maxq-* | mcore-* | metag-* | microblaze-* \
+	| m88110-* | m88k-* | maxq-* | mcore-* | metag-* \
+	| microblaze-* | microblazeel-* \
 	| mips-* | mipsbe-* | mipseb-* | mipsel-* | mipsle-* \
 	| mips16-* \
 	| mips64-* | mips64el-* \
@@ -407,12 +404,13 @@ case $basic_machine in
 	| mipsisa64r2-* | mipsisa64r2el-* \
 	| mipsisa64sb1-* | mipsisa64sb1el-* \
 	| mipsisa64sr71k-* | mipsisa64sr71kel-* \
+	| mipsr5900-* | mipsr5900el-* \
 	| mipstx39-* | mipstx39el-* \
 	| mmix-* \
 	| mt-* \
 	| msp430-* \
 	| nds32-* | nds32le-* | nds32be-* \
-	| nios-* | nios2-* \
+	| nios-* | nios2-* | nios2eb-* | nios2el-* \
 	| none-* | np1-* | ns16k-* | ns32k-* \
 	| open8-* \
 	| orion-* \
@@ -788,7 +786,7 @@ case $basic_machine in
 		basic_machine=ns32k-utek
 		os=-sysv
 		;;
-	microblaze)
+	microblaze*)
 		basic_machine=microblaze-xilinx
 		;;
 	mingw64)
@@ -1023,7 +1021,11 @@ case $basic_machine in
 		basic_machine=i586-unknown
 		os=-pw32
 		;;
-	rdos)
+	rdos | rdos64)
+		basic_machine=x86_64-pc
+		os=-rdos
+		;;
+	rdos32)
 		basic_machine=i386-pc
 		os=-rdos
 		;;
@@ -1350,7 +1352,7 @@ case $os in
 	-gnu* | -bsd* | -mach* | -minix* | -genix* | -ultrix* | -irix* \
 	      | -*vms* | -sco* | -esix* | -isc* | -aix* | -cnk* | -sunos | -sunos[34]*\
 	      | -hpux* | -unos* | -osf* | -luna* | -dgux* | -auroraux* | -solaris* \
-	      | -sym* | -kopensolaris* \
+	      | -sym* | -kopensolaris* | -plan9* \
 	      | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \
 	      | -aos* | -aros* \
 	      | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \
@@ -1496,9 +1498,6 @@ case $os in
 	-aros*)
 		os=-aros
 		;;
-	-kaos*)
-		os=-kaos
-		;;
 	-zvmoe)
 		os=-zvmoe
 		;;
@@ -1590,6 +1589,9 @@ case $basic_machine in
 	mips*-*)
 		os=-elf
 		;;
+	or1k-*)
+		os=-elf
+		;;
 	or32-*)
 		os=-coff
 		;;
-- 
1.8.1.2

* [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
  2013-02-27  3:16 ` [PATCH 03/26] arm: Handle armv6 in preconfigure Richard Henderson
@ 2013-02-27  3:16 ` Richard Henderson
  2013-02-28  0:20   ` Joseph S. Myers
  2013-02-27  3:16 ` [PATCH 01/26] Sync config.guess and config.sub with upstream Richard Henderson
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:16 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

---
	* sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
---
 ports/sysdeps/arm/sysdep.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 0e6f645..4a9f05a 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -114,4 +114,19 @@
    the caller.  */
 	.eabi_attribute 24, 1
 
+/* We occasionally want to use the S form simply to achieve a smaller
+   instruction form in Thumb mode.  Never set the flags in ARM mode.  */
+#ifdef __thumb__
+# define s(insn)	insn##s
+#else
+# define s(insn)	insn
+#endif
+
+/* This number is the offset from the pc at the current location.  */
+#ifdef __thumb__
+# define pc_ofs		4
+#else
+# define pc_ofs		8
+#endif
+
 #endif	/* __ASSEMBLER__ */
-- 
1.8.1.2
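
The effect of the two helpers can be previewed with the C preprocessor
alone.  The sketch below is an illustration under stated assumptions, not
code from the patch: S_THUMB/S_ARM and PC_OFS_THUMB/PC_OFS_ARM are
hypothetical names standing for the two branches that the patch selects
with #ifdef __thumb__, and the literal-pool arithmetic shows only one
plausible way a constant like pc_ofs gets used.  It relies on the
architectural fact that reading pc yields the instruction address plus 8
in ARM state and plus 4 in Thumb state.

```c
#include <stdint.h>
#include <string.h>

/* Two-level stringification so the argument is macro-expanded first.  */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical names for the two bodies of s(); the patch picks one
   of them depending on __thumb__.  */
#define S_THUMB(insn)  insn##s   /* flag-setting form, shorter encoding */
#define S_ARM(insn)    insn      /* never touch the flags in ARM mode */

/* Hypothetical names for the two values of pc_ofs.  */
enum { PC_OFS_THUMB = 4, PC_OFS_ARM = 8 };

const char *mnemonic_thumb(void) { return STR(S_THUMB(mov)); }
const char *mnemonic_arm(void)   { return STR(S_ARM(mov)); }

/* Word an assembler would store for "target - label - pc_ofs".  */
int32_t pcrel_literal(uint32_t target, uint32_t label, int pc_ofs)
{
    return (int32_t)(target - label - (uint32_t)pc_ofs);
}

/* Value produced by "add rX, pc, rX" executed at address label.  */
uint32_t pcrel_resolve(uint32_t label, int pc_ofs, int32_t literal)
{
    return label + (uint32_t)pc_ofs + (uint32_t)literal;
}
```

Whatever mode is in effect, storing the literal with the matching offset
makes the run-time addition land on the intended target.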


* [PATCH 12/26] arm: Enable thumb2 mode in assembly files
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (22 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 16/26] arm: Commonize BX conditionals Richard Henderson
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

The preceding patches have allowed for the few incompatibilities
between arm and thumb2 mode, or have marked the file as not wanting
to use thumb2 mode.

Note that one still has to edit config.make in the build directory
to add -mthumb to ASFLAGS...
---
	* sysdeps/arm/sysdep.h [__ASSEMBLER__]: Enable thumb2 if __thumb2__.
---
 ports/sysdeps/arm/sysdep.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 2d40823..3459219 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -114,6 +114,17 @@
    the caller.  */
 	.eabi_attribute 24, 1
 
+/* The thumb2 encoding is reasonably complete.  Unless suppressed, use it.  */
+#ifdef NO_THUMB
+# undef __thumb__
+# undef __thumb2__
+	.arm
+#endif
+#ifdef __thumb2__
+	.syntax unified
+	.thumb
+#endif
+
 /* We occasionally want to use the S form simply to achieve a smaller
    instruction form in Thumb mode.  Never set the flags in ARM mode.  */
 #ifdef __thumb__
-- 
1.8.1.2
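
The decision made by the new preprocessor block can be restated as a small
C model.  This is a sketch for reasoning about the three cases, not part
of the patch; the function name and its two flag parameters are
hypothetical stand-ins for "the file defined NO_THUMB before including
<sysdep.h>" and "the compiler predefines __thumb2__".

```c
#include <string.h>

/* Hypothetical model of the directive selection in sysdep.h.
   no_thumb: the file defined NO_THUMB before including <sysdep.h>.
   thumb2:   the compiler predefines __thumb2__ for this build.  */
const char *mode_directives(int no_thumb, int thumb2)
{
    if (no_thumb) {
        /* NO_THUMB undefines __thumb__ and __thumb2__, so the second
           test can no longer fire; the file is assembled as ARM.  */
        return ".arm";
    }
    if (thumb2) {
        return ".syntax unified\n.thumb";
    }
    /* Neither case applies: no mode directive is emitted.  */
    return "";
}
```

The ordering matters: the opt-out is checked first, which is what lets
the files marked in patch 09 stay in ARM mode in a thumb2 build.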


* [PATCH 16/26] arm: Commonize BX conditionals
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (23 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 12/26] arm: Enable thumb2 mode in assembly files Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28 21:51   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
                   ` (3 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Also add a BLX macro and use it where appropriate.
---
	* sysdeps/arm/sysdep.h (BX, BXC, BLX): New macros.
	(DO_RET): Use BX.
	(RETINSTR): Use BXC.
	* sysdeps/arm/dl-tlsdesc.S (BX): Remove.
	* sysdeps/arm/dl-trampoline.S (BX): Remove.
	(_dl_runtime_profile): Use BLX.
---
 ports/sysdeps/arm/dl-tlsdesc.S    |  6 ------
 ports/sysdeps/arm/dl-trampoline.S |  9 +--------
 ports/sysdeps/arm/sysdep.h        | 29 +++++++++++++----------------
 3 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 15a0c21..417b8b3 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -20,12 +20,6 @@
 #include <tls.h>
 #include "tlsdesc.h"
 
-#ifdef __USE_BX__
-  #define BX(x)	bx x
-#else
-  #define BX(x)	mov pc, x
-#endif
-
 	.text
 	@ emit debug information with cfi
 	@ use arm-specific pseudos for unwinding itself
diff --git a/ports/sysdeps/arm/dl-trampoline.S b/ports/sysdeps/arm/dl-trampoline.S
index a13a4c3..c34c61e 100644
--- a/ports/sysdeps/arm/dl-trampoline.S
+++ b/ports/sysdeps/arm/dl-trampoline.S
@@ -21,12 +21,6 @@
 #include <sysdep.h>
 #include <libc-symbols.h>
 
-#if defined(__USE_BX__)
-#define BX(x) bx	x
-#else
-#define BX(x) mov	pc, x
-#endif
-
 	.text
 	.globl _dl_runtime_resolve
 	.type _dl_runtime_resolve, #function
@@ -192,8 +186,7 @@ _dl_runtime_profile:
 	add	ip, r7, #72
 	ldmia	ip, {r0-r3}
 	ldr	ip, [r7, #264]
-	mov	lr, pc
-	BX(ip)
+	BLX(ip)
 	stmia	r7, {r0-r3}
 
 	@ Call pltexit.
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index bfdba27..71abb7a 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -33,26 +33,23 @@
 
 #define PLTJMP(_x)	_x##(PLT)
 
-/* APCS-32 doesn't preserve the condition codes across function call. */
-#ifdef __APCS_32__
 #ifdef __USE_BX__
-#define RETINSTR(cond, reg)	\
-	bx##cond	reg
-#define DO_RET(_reg)		\
-	bx _reg
+# define BX(R)		bx	R
+# define BXC(C, R)	bx##C	R
+# ifdef __ARM_ARCH_4T__
+#  define BLX(R)	mov	lr, pc; bx R
+# else
+#  define BLX(R)	blx	R
+# endif
 #else
-#define RETINSTR(cond, reg)	\
-	mov##cond	pc, reg
-#define DO_RET(_reg)		\
-	mov pc, _reg
-#endif
-#else  /* APCS-26 */
-#define RETINSTR(cond, reg)	\
-	mov##cond##s	pc, reg
-#define DO_RET(_reg)		\
-	movs pc, _reg
+# define BX(R)		mov	pc, R
+# define BXC(C, R)	mov##C	pc, R
+# define BLX(R)		mov	lr, pc; mov pc, R
 #endif
 
+#define DO_RET(R)	BX(R)
+#define RETINSTR(C, R)	BXC(C, R)
+
 /* Define an entry point visible from C.  */
 #define	ENTRY(name)							      \
   .globl C_SYMBOL_NAME(name);						      \
-- 
1.8.1.2

* [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (20 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 26/26] arm: Add optimized add_n and sub_n Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27 17:54   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
                   ` (6 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

New defines from gcc 4.8:
#define __ARM_ARCH_ISA_ARM 1
#define __ARM_ARCH_PROFILE 65
#define __ARM_ARCH_ISA_THUMB 2
#define __ARM_ARCH 7

all of which got in the way of the one we wanted:
#define __ARM_ARCH_7A__ 1
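
To illustrate why the tighter pattern helps, here is a small C model of the new grep expression. It is only a sketch: matches_arch_macro is a made-up helper, and it mimics '__ARM_ARCH_.*__' with strstr rather than a real regex engine.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when LINE would be selected by grep '__ARM_ARCH_.*__',
   i.e. it contains "__ARM_ARCH_" followed somewhere later by "__".
   Checking the first occurrence is enough, since it leaves the longest
   tail in which to look for the closing "__".  */
bool matches_arch_macro(const char *line)
{
    static const char prefix[] = "__ARM_ARCH_";
    const char *p = strstr(line, prefix);

    return p != NULL && strstr(p + strlen(prefix), "__") != NULL;
}
```

The old pattern, a bare __ARM_ARCH, selects all five of the defines listed above, so the sed output was no longer a single macro name.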
---
	* sysdeps/arm/preconfigure: Adjust scan for __ARM_ARCH_* defines.
---
 ports/sysdeps/arm/preconfigure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ports/sysdeps/arm/preconfigure b/ports/sysdeps/arm/preconfigure
index 20f6d91..b0c0540 100644
--- a/ports/sysdeps/arm/preconfigure
+++ b/ports/sysdeps/arm/preconfigure
@@ -10,7 +10,7 @@ arm*)
 		# an appropriate directory exists in sysdeps/arm
 		archcppflag=`echo "" |
 		$CC $CFLAGS $CPPFLAGS -E -dM - |
-		  grep __ARM_ARCH |
+		  grep '__ARM_ARCH_.*__' |
 		  sed -e 's/^#define //' -e 's/ .*//'`
 
 		case x$archcppflag in
-- 
1.8.1.2

* [PATCH 18/26] arm: Use GET_TLS more often
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 13/26] arm: Store lr in r2 around GET_TLS Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28 21:59   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

---
	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_undefweak): Use GET_TLS,
	save LR in R1, and return directly from R1.
	(_dl_tlsdesc_dynamic): Use GET_TLS.
	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h 
	(SINGLE_THREAD_P): Use GET_TLS.
---
 ports/sysdeps/arm/dl-tlsdesc.S                     | 23 +++++++---------------
 .../unix/sysv/linux/arm/nptl/sysdep-cancel.h       |  2 +-
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 6c47743..12214f1 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -44,22 +44,13 @@ _dl_tlsdesc_return:
 	.fnstart
 	.align 2
 _dl_tlsdesc_undefweak:
-	@ Are we allowed a misaligned stack pointer calling read_tp?
-	.save	{lr}
-	push	{lr}
-	cfi_adjust_cfa_offset (4)
-	cfi_rel_offset (lr,0)
-	bl 	__aeabi_read_tp
+	@ ??? The only GET_TLS implementation in tree is Linux,
+	@ which is guaranteed to clobber only R0 and LR.
+	mov	r1, lr
+	cfi_register (lr, r1)
+	GET_TLS
 	rsb 	r0, r0, #0
-#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
-	pop	{lr}
-	cfi_adjust_cfa_offset (-4)
-	cfi_restore (lr)
-	bx	lr
-#else
-	pop	{pc}
-#endif
-
+	BX	(r1)
 	cfi_endproc
 	.fnend
 	.size	_dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
@@ -104,7 +95,7 @@ _dl_tlsdesc_dynamic:
 	cfi_rel_offset (r4,8)
 	cfi_rel_offset (lr,12)
 	ldr	r1, [r0] /* td */
-	bl	__aeabi_read_tp
+	GET_TLS
 	mov	r4, r0 /* r4 = tp */
 	ldr	r0, [r0]
 	ldr	r2, [r1, #8] /* gen_count */
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index f0f7043..c2ab0ce 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -220,7 +220,7 @@ extern int __local_multiple_threads attribute_hidden;
 	push	{r0, lr};						\
 	cfi_adjust_cfa_offset (8);					\
 	cfi_rel_offset (lr, 4);						\
-	bl	__aeabi_read_tp;					\
+	GET_TLS;							\
 	NEGOFF_ADJ_BASE(r0, MULTIPLE_THREADS_OFFSET);			\
 	ldr	ip, NEGOFF_OFF1(r0, MULTIPLE_THREADS_OFFSET);		\
 	pop	{r0, lr};						\
-- 
1.8.1.2

* [PATCH 11/26] arm: Introduce and use NEGOFF series of macros
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2013-02-27  3:16 ` [PATCH 09/26] arm: Mark assembly files that will not use thumb mode Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 13/26] arm: Store lr in r2 around GET_TLS Richard Henderson
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

There are several places in which we access negative offsets from
the thread-pointer, but thumb2 only supports positive offsets in
memory references.

Avoid duplicating the rather large macros in which these references
are embedded by abstracting out the operation.
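
As a sanity check on the idea, the address arithmetic of the two expansions can be modelled in C. This is a sketch only: arm_addr and thumb2_addr are made-up names, and any offsets fed to them are placeholders, not the real TID_OFFSET and PID_OFFSET values.

```c
/* ARM mode: NEGOFF_ADJ_BASE(R, OFFB) expands to nothing and
   NEGOFF_OFF2(R, OFFA, OFFB) expands to [R, $OFFA].  */
long arm_addr(long base, long offa, long offb)
{
    (void) offb;                    /* base register is left untouched */
    return base + offa;
}

/* Thumb2 mode: NEGOFF_ADJ_BASE(R, OFFB) expands to add R, R, $OFFB and
   NEGOFF_OFF2(R, OFFA, OFFB) expands to [R, $((OFFA) - (OFFB))].  */
long thumb2_addr(long base, long offa, long offb)
{
    base += offb;                   /* fold the negative part into R */
    return base + (offa - offb);    /* remaining offset must be >= 0 */
}
```

Both variants reach the same address; the thumb2 one only works when OFFA is not below OFFB, which is why clone.S adjusts by TID_OFFSET and then addresses PID_OFFSET relative to it.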
---
	* sysdeps/arm/sysdep.h (NEGOFF_ADJ_BASE): New macro.
	(NEGOFF_ADJ_BASE2, NEGOFF_OFF1, NEGOFF_OFF2): New macros.
	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Use them.
	* sysdeps/unix/sysv/linux/arm/nptl/vfork.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h (SINGLE_THREAD_P):
	Likewise.
---
 ports/sysdeps/arm/sysdep.h                            | 16 ++++++++++++++++
 ports/sysdeps/unix/sysv/linux/arm/clone.S             |  5 +++--
 ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S     | 11 ++++++-----
 .../sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h  | 19 ++++++++++---------
 ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S        | 14 ++++++++------
 5 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index b7ba9b1..2d40823 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -147,4 +147,20 @@
 99:	OP	R, [pc, T]
 #endif
 
+/* Cope with negative memory offsets, which thumb can't encode.
+   Use NEGOFF_ADJ_BASE to (conditionally) alter the base register,
+   and then NEGOFF_OFF1 to use 0 for thumb and the offset for arm,
+   or NEGOFF_OFF2 to use A-B for thumb and A for arm.  */
+#ifdef __thumb2__
+# define NEGOFF_ADJ_BASE(R, OFF)	add R, R, $OFF
+# define NEGOFF_ADJ_BASE2(D, S, OFF)	add D, S, $OFF
+# define NEGOFF_OFF1(R, OFF)		[R]
+# define NEGOFF_OFF2(R, OFFA, OFFB)	[R, $((OFFA) - (OFFB))]
+#else
+# define NEGOFF_ADJ_BASE(R, OFF)
+# define NEGOFF_ADJ_BASE2(D, S, OFF)	mov D, S
+# define NEGOFF_OFF1(R, OFF)		[R, $OFF]
+# define NEGOFF_OFF2(R, OFFA, OFFB)	[R, $OFFA]
+#endif
+
 #endif	/* __ASSEMBLER__ */
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index 9de37f2..58ee7b4 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -81,8 +81,9 @@ PSEUDO_END (__clone)
 	ite	ne
 	movne	r0, #-1
 	swieq	0x0
-	str	r0, [r1, #PID_OFFSET]
-	str	r0, [r1, #TID_OFFSET]
+	NEGOFF_ADJ_BASE(r1, TID_OFFSET)
+	str	r0, NEGOFF_OFF1(r1, TID_OFFSET)
+	str	r0, NEGOFF_OFF2(r1, PID_OFFSET, TID_OFFSET)
 3:
 #endif
 	@ pick the function arg and call address off the stack and execute
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
index 749aaab..bc0a771 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
@@ -26,14 +26,15 @@
 	ldr	lr, [sp], #4;		/* Restore LR.  */		\
 	cfi_adjust_cfa_offset (-4);					\
 	cfi_restore (lr);						\
-	mov	r2, r0;			/* Save the TLS addr in r2.  */	\
-	ldr	r3, [r2, #PID_OFFSET];	/* Load the saved PID.  */	\
-	rsb	r0, r3, #0;		/* Negate it.  */		\
-	str	r0, [r2, #PID_OFFSET]	/* Store the temporary PID.  */
+	NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2. */ \
+	ldr	r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID.  */  \
+	rsb	r0, r3, #0;		/* Negate it.  */		     \
+	str	r0, NEGOFF_OFF1(r2, PID_OFFSET); /* Store the temp PID.  */
 
 /* Restore the old PID value in the parent.  */
 #define RESTORE_PID \
 	cmp	r0, #0;			/* If we are the parent... */	\
-	strne	r3, [r2, #PID_OFFSET]	/* ... restore the saved PID.  */
+	it	ne;							\
+	strne	r3, NEGOFF_OFF1(r2, PID_OFFSET); /* restore the saved PID.  */
 
 #include "../vfork.S"
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index b6dc3e0..0c9e780 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -217,15 +217,16 @@ extern int __local_multiple_threads attribute_hidden;
 				   header.multiple_threads) == 0, 1)
 #  else
 #   define SINGLE_THREAD_P						\
-  stmfd	sp!, {r0, lr};							\
-  cfi_adjust_cfa_offset (8);						\
-  cfi_rel_offset (lr, 4);						\
-  bl	__aeabi_read_tp;						\
-  ldr	ip, [r0, #MULTIPLE_THREADS_OFFSET];				\
-  ldmfd	sp!, {r0, lr};							\
-  cfi_adjust_cfa_offset (-8);						\
-  cfi_restore (lr);							\
-  teq	ip, #0
+	stmfd	sp!, {r0, lr};						\
+	cfi_adjust_cfa_offset (8);					\
+	cfi_rel_offset (lr, 4);						\
+	bl	__aeabi_read_tp;					\
+	NEGOFF_ADJ_BASE(r0, MULTIPLE_THREADS_OFFSET);			\
+	ldr	ip, NEGOFF_OFF1(r0, MULTIPLE_THREADS_OFFSET);		\
+	ldmfd	sp!, {r0, lr};						\
+	cfi_adjust_cfa_offset (-8);					\
+	cfi_restore (lr);						\
+	teq	ip, #0
 #   define SINGLE_THREAD_P_PIC(x) SINGLE_THREAD_P
 #  endif
 # endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
index 1bbe5c6..3c0ef78 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
@@ -26,15 +26,17 @@
 	ldr	lr, [sp], #4;		/* Restore LR.  */		\
 	cfi_adjust_cfa_offset (-4);					\
 	cfi_restore (lr);						\
-	mov	r2, r0;			/* Save the TLS addr in r2.  */	\
-	ldr	r3, [r2, #PID_OFFSET];	/* Load the saved PID.  */	\
-	rsbs	r0, r3, #0;		/* Negate it.  */		\
-	moveq	r0, #0x80000000;	/* Use 0x80000000 if it was 0.  */ \
-	str	r0, [r2, #PID_OFFSET]	/* Store the temporary PID.  */
+	NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2.  */ \
+	ldr	r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID.  */   \
+	rsbs	r0, r3, #0;		/* Negate it.  */		      \
+	it	eq;							      \
+	moveq	r0, #0x80000000;	/* Use 0x80000000 if it was 0.  */    \
+	str	r0, NEGOFF_OFF1(r2, PID_OFFSET); /* Store the temp PID.  */
 
 /* Restore the old PID value in the parent.  */
 #define RESTORE_PID \
 	cmp	r0, #0;		/* If we are the parent... */		\
-	strne	r3, [r2, #PID_OFFSET]	/* ... restore the saved PID.  */
+	it	ne;							\
+	strne	r3, NEGOFF_OFF1(r2, PID_OFFSET); /* restore the saved PID.  */
 
 #include "../vfork.S"
-- 
1.8.1.2

* [PATCH 13/26] arm: Store lr in r2 around GET_TLS
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 11/26] arm: Introduce and use NEGOFF series of macros Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 18/26] arm: Use GET_TLS more often Richard Henderson
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Save lr in r2 rather than on the stack.
---
	* sysdeps/unix/sysv/linux/arm/nptl/vfork.S (SAVE_PID): Save lr to r2
	around the GET_TLS call.
	* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S (SAVE_PID): Likewise.
---
 ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S | 8 +++-----
 ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S    | 8 +++-----
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
index bc0a771..cd51122 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
@@ -19,12 +19,10 @@
 
 /* Save the PID value.  */
 #define SAVE_PID \
-	str	lr, [sp, #-4]!;		/* Save LR.  */			\
-	cfi_adjust_cfa_offset (4);					\
-	cfi_rel_offset (lr, 0);						\
+	mov	r2, lr;			/* Save LR.  */			\
+	cfi_register (lr, r2);						\
 	GET_TLS;							\
-	ldr	lr, [sp], #4;		/* Restore LR.  */		\
-	cfi_adjust_cfa_offset (-4);					\
+	mov	lr, r2;			/* Restore LR.  */		\
 	cfi_restore (lr);						\
 	NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2. */ \
 	ldr	r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID.  */  \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
index 3c0ef78..4007081 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
@@ -19,12 +19,10 @@
 
 /* Save the PID value.  */
 #define SAVE_PID \
-	str	lr, [sp, #-4]!;		/* Save LR.  */			\
-	cfi_adjust_cfa_offset (4);					\
-	cfi_rel_offset (lr, 0);						\
+	mov	r2, lr;			/* Save LR.  */			\
+	cfi_register (lr, r2);						\
 	GET_TLS;							\
-	ldr	lr, [sp], #4;		/* Restore LR.  */		\
-	cfi_adjust_cfa_offset (-4);					\
+	mov	lr, r2;			/* Restore LR.  */		\
 	cfi_restore (lr);						\
 	NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2.  */ \
 	ldr	r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID.  */   \
-- 
1.8.1.2

* [PATCH 20/26] arm: Implement armv6t2 optimized strlen
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 07/26] arm: Introduce and use GET_TLS Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27 17:12   ` Måns Rullgård
  2013-02-27  3:17 ` [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc Richard Henderson
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Twice as fast for long strings and 50% faster for short strings
over the armv4 version on A15.
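
For readers not familiar with the uqadd8 trick used in the word loop, here is a portable C model of it. It is a sketch for illustration only: uqadd8 below emulates the instruction byte by byte, and first_zero_byte_le is a made-up helper that assumes a little-endian word.

```c
#include <stdint.h>

/* Model of the ARMv6 UQADD8 instruction: four independent unsigned
   saturating byte additions.  */
uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i += 8) {
        uint32_t s = ((a >> i) & 0xff) + ((b >> i) & 0xff);
        if (s > 0xff)
            s = 0xff;                   /* saturate instead of wrapping */
        r |= s << i;
    }
    return r;
}

/* Index (0..3) of the first zero byte of a little-endian word, or -1 if
   there is none.  Adding 0xfe leaves 0xfe for a zero byte and 0xff for
   anything else, so after inversion bit 8*k is set iff byte k was zero.  */
int first_zero_byte_le(uint32_t word)
{
    uint32_t found = ~uqadd8(word, 0xfefefefeu);
    int bit = 0;

    if (found == 0)
        return -1;
    while ((found & 1u) == 0) {         /* count trailing zeros; the  */
        found >>= 1;                    /* assembly uses rbit + clz   */
        bit++;
    }
    return bit >> 3;                    /* bit offset -> byte offset  */
}
```

The assembly does the same thing on two words at once and only distinguishes first from second word after the loop exits.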
---
	* sysdeps/arm/armv6t2/strlen.S: New file.
---
 ports/sysdeps/arm/armv6t2/strlen.S | 93 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 ports/sysdeps/arm/armv6t2/strlen.S

diff --git a/ports/sysdeps/arm/armv6t2/strlen.S b/ports/sysdeps/arm/armv6t2/strlen.S
new file mode 100644
index 0000000..d7d6e1f
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strlen.S
@@ -0,0 +1,93 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+ENTRY(strlen)
+	@ r0 = start of string
+	pld	[r0]
+
+	@ To cater to long strings, we want to search through a few
+	@ characters until we reach an aligned pointer.  To cater to
+	@ small strings, we don't want to start doing word operations
+	@ immediately.  The compromise is a maximum of 16 bytes less
+	@ whatever is required to end with an aligned pointer.
+	@ r3 = number of characters to search in alignment loop
+	and	r3, r0, #7
+	s(mov)	r1, r0			@ Save the input pointer
+	rsb	r3, r3, #16
+
+	@ Loop until we find ...
+1:	ldrb	r2, [r0], #1
+	subs	r3, r3, #1		@ ... the alignment point
+	it	ne
+	cmpne	r2, #0			@ ... or EOS
+	bne	1b
+
+	@ Disambiguate the exit possibilities above
+	cmp	r2, #0			@ Found EOS
+	ittt	eq
+	subeq	r0, r0, #1		@ Undo post-inc above
+	subeq	r0, r0, r1		@ Subtract input to compute length
+	bxeq	lr
+
+	@ So now we're aligned.
+	ldrd	r2, r3, [r0], #8
+	movw	ip, #0xfefe
+	pld	[r0, #64]
+	movt	ip, #0xfefe
+	pld	[r0, #128]
+	pld	[r0, #192]
+
+	@ Loop searching for EOS, 8 bytes at a time.
+	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+	@ that was originally zero and 0xff otherwise.  Therefore we consider
+	@ the lsb of each byte the "found" bit, with 0 for a match.
+	.balign	16
+2:	uqadd8	r2, r2, ip		@ Find EOS
+	uqadd8	r3, r3, ip
+	pld	[r0, #256]		@ Prefetch 4 lines ahead
+	s(and)	r3, r3, r2		@ Combine the two words
+	mvns	r3, r3			@ Test for any found bit true
+	it	eq
+	ldrdeq	r2, r3, [r0], #8
+	beq	2b
+
+	@ Found something.  Disambiguate between first and second words.
+	@ Adjust r0 to point to the word containing the match.
+	@ Adjust r2 to the found bits for the word containing the match.
+	mvns	r2, r2
+	itee	ne
+	subne	r0, r0, #8
+	moveq	r2, r3
+	subeq	r0, r0, #4
+
+	@ Find the bit-offset of the match within the word.
+#ifdef __ARMEL__
+	rbit	r2, r2			@ For LE we need count-trailing-zeros
+#endif
+	clz	r2, r2
+	add	r0, r0, r2, lsr #3	@ Adjust the pointer to the found byte
+	s(sub)	r0, r0, r1		@ Subtract input to compute length
+	bx	lr
+
+END(strlen)
+
+libc_hidden_builtin_def (strlen)
-- 
1.8.1.2

* [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  1:31   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 25/26] arm: Add optimized submul_1 Richard Henderson
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Not specifically speed tested against the byte-by-byte versions,
but expected to be about as fast as the new strlen.
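
The character search reuses the strlen zero-byte test after an exclusive-or with the replicated character. A C sketch of the per-word step used by strchr, for illustration only (found_bits is a made-up name, and uqadd8 is a byte-by-byte emulation of the instruction):

```c
#include <stdint.h>

/* Emulates UQADD8: per-byte unsigned saturating addition.  */
static uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i += 8) {
        uint32_t s = ((a >> i) & 0xff) + ((b >> i) & 0xff);
        r |= (s > 0xff ? 0xffu : s) << i;
    }
    return r;
}

/* Returns a mask with bit 8*k set when byte k of WORD is NUL or equals C,
   following the strchr inner loop.  */
uint32_t found_bits(uint32_t word, unsigned char c)
{
    uint32_t rep = c;
    uint32_t eos, chr;

    rep |= rep << 8;                        /* replicate C to all bytes */
    rep |= rep << 16;
    eos = uqadd8(word, 0xfefefefeu);        /* lsb clear where byte == 0 */
    chr = uqadd8(word ^ rep, 0xfefefefeu);  /* lsb clear where byte == C */
    return ~(eos & chr);                    /* invert: 1 means found */
}
```

rawmemchr needs only the chr half, since the character is guaranteed to be present, while strrchr keeps the two masks separate so that it can discard matches past the terminator.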
---
	* sysdeps/arm/armv6t2/strchr.S: New file.
	* sysdeps/arm/armv6t2/strrchr.S: New file.
	* sysdeps/arm/armv6t2/rawmemchr.S: New file.
---
 ports/sysdeps/arm/armv6t2/rawmemchr.S |  81 ++++++++++++++++++++
 ports/sysdeps/arm/armv6t2/strchr.S    | 138 ++++++++++++++++++++++++++++++++++
 ports/sysdeps/arm/armv6t2/strrchr.S   | 137 +++++++++++++++++++++++++++++++++
 3 files changed, 356 insertions(+)
 create mode 100644 ports/sysdeps/arm/armv6t2/rawmemchr.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strchr.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strrchr.S

diff --git a/ports/sysdeps/arm/armv6t2/rawmemchr.S b/ports/sysdeps/arm/armv6t2/rawmemchr.S
new file mode 100644
index 0000000..eea7707
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/rawmemchr.S
@@ -0,0 +1,81 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+ENTRY(__rawmemchr)
+	@ r0 = start of string
+	@ r1 = character to match
+	@ returns a pointer to the match, which must be present.
+	uxtb	r1, r1
+
+	@ Loop until we find ...
+1:	ldrb	r2, [r0], #1
+	cmp	r2, r1			@ ... the character
+	it	ne
+	tstne	r0, #7			@ ... the alignment point
+	bne	1b
+
+	@ Disambiguate the exit possibilities above
+	cmp	r2, r1			@ Found the character
+	itt	eq
+	subeq	r0, r0, #1
+	bxeq	lr
+
+	@ So now we're aligned.
+	orr	r1, r1, r1, lsl #8	@ Replicate C to all bytes
+	movw	ip, #0xfefe
+	orr	r1, r1, r1, lsl #16
+	movt	ip, #0xfefe
+
+	@ Loop searching for C, 8 bytes at a time.
+	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+	@ that was originally zero and 0xff otherwise.  Therefore we consider
+	@ the lsb of each byte the "found" bit, with 0 for a match.
+2:	ldrd	r2, r3, [r0], #8
+	s(eor)	r2, r2, r1		@ Convert C bytes to 0
+	s(eor)	r3, r3, r1
+	uqadd8	r2, r2, ip		@ Find C
+	uqadd8	r3, r3, ip
+	s(and)	r3, r3, r2		@ Combine the two words
+	mvns	r3, r3			@ Test for any found bit true
+	beq	2b
+
+	@ Found something.  Disambiguate between first and second words.
+	@ Adjust r0 to point to the word containing the match.
+	@ Adjust r2 to the found bits for the word containing the match.
+	mvns	r2, r2
+	itee	ne
+	subne	r0, r0, #8
+	subeq	r0, r0, #4
+	moveq	r2, r3
+
+	@ Find the bit-offset of the match within the word.
+#ifdef __ARMEL__
+	rbit	r2, r2			@ For LE we need count-trailing-zeros
+#endif
+	clz	r2, r2
+	add	r0, r0, r2, lsr #3	@ Adjust the pointer to the found byte
+	bx	lr
+
+END(__rawmemchr)
+
+weak_alias (__rawmemchr, rawmemchr)
+libc_hidden_def (__rawmemchr)
diff --git a/ports/sysdeps/arm/armv6t2/strchr.S b/ports/sysdeps/arm/armv6t2/strchr.S
new file mode 100644
index 0000000..e7f5acf
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strchr.S
@@ -0,0 +1,138 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+ENTRY(strchr)
+	@ r0 = start of string
+	@ r1 = character to match
+	@ returns NULL for no match, or a pointer to the match
+
+	@ To cater to long strings, we want to search through a few
+	@ characters until we reach an aligned pointer.  To cater to
+	@ small strings, we don't want to start doing word operations
+	@ immediately.  The compromise is a maximum of 32 bytes less
+	@ whatever is required to end with an aligned pointer.
+	@ r3 = number of characters to search in alignment loop
+	and	r3, r0, #7
+	uxtb	r1, r1
+	rsb	r3, r3, #32
+
+	@ Loop until we find ...
+1:	ldrb	r2, [r0], #1
+	subs	r3, r3, #1		@ ... the alignment point
+	it	ne
+	cmpne	r2, r1			@ ... or the character
+	it	ne
+	cmpne	r2, #0			@ ... or EOS
+	bne	1b
+
+	@ Disambiguate the exit possibilities above
+	cmp	r2, r1			@ Found the character
+	itt	eq
+	subeq	r0, r0, #1
+	bxeq	lr
+
+	cmp	r2, #0			@ Found EOS
+	itt	eq
+	moveq	r0, #0
+	bxeq	lr
+
+	@ So now we're aligned.  Now we actually need a stack frame.
+	push	{ r4, r5, r6, r7 }
+	cfi_adjust_cfa_offset (16)
+	cfi_rel_offset (r4, 0)
+	cfi_rel_offset (r5, 4)
+	cfi_rel_offset (r6, 8)
+	cfi_rel_offset (r7, 12)
+
+	orr	r1, r1, r1, lsl #8	@ Replicate C to all bytes
+	movw	ip, #0xfefe
+	orr	r1, r1, r1, lsl #16
+	movt	ip, #0xfefe
+
+	@ Loop searching for EOS or C, 8 bytes at a time.
+2:	ldrd	r2, r3, [r0], #8
+	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+	@ that was originally zero and 0xff otherwise.  Therefore we consider
+	@ the lsb of each byte the "found" bit, with 0 for a match.
+	uqadd8	r4, r2, ip		@ Find EOS
+	uqadd8	r5, r3, ip
+	eor	r6, r2, r1		@ Convert C bytes to 0
+	eor	r7, r3, r1
+	uqadd8	r6, r6, ip		@ Find C
+	uqadd8	r7, r7, ip
+	s(and)	r4, r4, r6		@ Combine found for EOS and C
+	s(and)	r5, r5, r7
+	and	r6, r4, r5		@ Combine the two words
+	mvns	r6, r6			@ Test for any found bit true
+	beq	2b
+
+	@ Invert the sense of the found bits.  After this we have 1 in
+	@ any byte that contains a match, and 0 otherwise.
+	s(mvn)	r5, r5
+	mvns	r4, r4
+
+	@ Found something.  Disambiguate between first and second words.
+	@ Adjust r0 to point to the word containing the match.
+	@ Adjust r2 to the contents of the word containing the match.
+	@ Adjust r4 to the found bits for the word containing the match.
+	iteee	ne
+	subne	r0, r0, #8
+	subeq	r0, r0, #4
+	moveq	r4, r5
+	moveq	r2, r3
+
+	@ Find the bit-offset of the match within the word.
+#ifdef __ARMEL__
+	@ For little-endian, we only need to reverse the bits so that
+	@ count-leading-zeros becomes in effect count-trailing-zeros.
+	rbit	r4, r4
+	clz	r3, r4
+#else
+	@ For big-endian, we're matching 0x01 (not 0x80), and so the
+	@ bit offset is 7 too high.  Also, we byte-swap the word so
+	@ that we can shift down to extract the found byte.
+	clz	r3, r4
+	rev	r2, r2
+	s(sub)	r3, r3, #7
+#endif
+	s(lsr)	r2, r2, r3		@ Shift down found byte
+	add	r0, r0, r3, lsr #3	@ Adjust the pointer to the found byte
+	uxtb	r2, r2			@ Extract found byte
+	uxtb	r1, r1			@ Undo replication of C
+
+	pop	{ r4, r5, r6, r7 }
+	cfi_adjust_cfa_offset (-16)
+	cfi_restore (r4)
+	cfi_restore (r5)
+	cfi_restore (r6)
+	cfi_restore (r7)
+
+	@ Disambiguate between EOS and C.
+	cmp	r2, r1
+	it	ne
+	movne	r0, #0			@ Found EOS, return NULL
+	bx	lr
+
+END(strchr)
+
+weak_alias (strchr, index)
+libc_hidden_builtin_def (strchr)
diff --git a/ports/sysdeps/arm/armv6t2/strrchr.S b/ports/sysdeps/arm/armv6t2/strrchr.S
new file mode 100644
index 0000000..483e52a
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strrchr.S
@@ -0,0 +1,137 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+ENTRY(strrchr)
+	@ r0 = start of string
+	@ r1 = character to match
+	@ returns NULL for no match, or a pointer to the match
+
+	mov	r3, r0
+	s(mov)	r0, #0
+	uxtb	r1, r1
+
+	@ Loop a few times until we're aligned.
+	tst	r3, #7
+	beq	2f
+1:	ldrb	r2, [r3], #1
+	cmp	r2, r1			@ Find the character
+	it	eq
+	subeq	r0, r3, #1
+	cmp	r2, #0			@ Find EOS
+	it	eq
+	bxeq	lr
+	tst	r3, #7			@ Find the alignment point
+	bne	1b
+
+	@ So now we're aligned.  Now we actually need a stack frame.
+2:	push	{ r4, r5, r6, r7 }
+	cfi_adjust_cfa_offset (16)
+	cfi_rel_offset (r4, 0)
+	cfi_rel_offset (r5, 4)
+	cfi_rel_offset (r6, 8)
+	cfi_rel_offset (r7, 12)
+
+	orr	r1, r1, r1, lsl #8	@ Replicate C to all bytes
+	movw	ip, #0xfefe
+	orr	r1, r1, r1, lsl #16
+	movt	ip, #0xfefe
+	s(mov)	r2, #0			@ No found bits yet
+
+	@ Loop searching for EOS and C, 8 bytes at a time.
+	@ Any time we find a match in a word, we copy the address of
+	@ the word to r0, and the found bits to r2.
+3:	ldrd	r4, r5, [r3], #8
+	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+	@ that was originally zero and 0xff otherwise.  Therefore we consider
+	@ the lsb of each byte the "found" bit, with 0 for a match.
+	uqadd8	r6, r4, ip		@ Find EOS
+	uqadd8	r7, r5, ip
+	s(eor)	r4, r4, r1		@ Convert C bytes to 0
+	s(eor)	r5, r5, r1
+	uqadd8	r4, r4, ip		@ Find C
+	uqadd8	r5, r5, ip
+	mvns	r6, r6			@ Found EOS, first word
+	bne	4f
+	mvns	r4, r4			@ Handle C, first word
+	itt	ne
+	subne	r0, r3, #8
+	movne	r2, r4
+	mvns	r7, r7			@ Found EOS, second word
+	bne	5f
+	mvns	r5, r5			@ Handle C, second word
+	itt	ne
+	subne	r0, r3, #4
+	movne	r2, r5
+	b	3b
+
+	@ Found EOS in second word; fold to first word.
+5:	s(add)	r3, r3, #4		@ Dec pointer to 2nd word, with below
+	mov	r4, r5			@ Overwrite first word C found
+	mov	r6, r7			@ Overwrite first word EOS found
+
+	@ Found EOS.  Zap found C after EOS.
+4:	s(sub)	r3, r3, #8		@ Decrement pointer to first word
+	s(mvn)	r4, r4			@ Positive found bit for C
+#ifdef __ARMEL__
+	sub	r7, r6, #1		@ Toggle EOS lsb and below
+	s(eor)	r6, r6, r7		@ All bits below and including lsb
+	ands	r4, r4, r6		@ Zap C above EOS
+#else
+	clz	r6, r6			@ Find highest EOS bit set.
+	s(mvn)	r7, #0
+	s(add)	r6, r6, #1
+	s(lsr)	r7, r7, r6		@ All bits below msb
+	bics	r4, r4, r7		@ Zap C below EOS
+#endif
+	itt	ne
+	movne	r2, r4			@ Copy to result, if still non-zero
+	movne	r0, r3
+
+	pop	{ r4, r5, r6, r7 }
+	cfi_adjust_cfa_offset (-16)
+	cfi_restore (r4)
+	cfi_restore (r5)
+	cfi_restore (r6)
+	cfi_restore (r7)
+
+	@ Adjust the result pointer if we found a word containing C.
+	@ Rather than fight with thumb IT insn about how many insns
+	@ we'd like to conditionally execute, just jump over them here.
+#ifdef __thumb2__
+#define ne(insn)  insn
+	cbz	r2, 6f			@ Did we find any C?
+#else
+#define ne(insn)  insn##ne
+	cmp	r2, #0
+#endif
+#ifdef __ARMEB__
+	ne(rbit) r2, r2			@ BE needs count-trailing-zeros
+#endif
+	ne(clz)	r2, r2			@ Find the bit offset of the last C
+	ne(rsb)	r2, r2, #32		@ Convert to a count from the right
+	ne(add)	r0, r0, r2, lsr #3	@ Convert to byte offset and add.
+6:	bx	lr
+
+END(strrchr)
+
+weak_alias (strrchr, rindex)
+libc_hidden_builtin_def (strrchr)
-- 
1.8.1.2


* [PATCH 24/26] arm: Add optimized addmul_1
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 21/26] arm: Implement armv6t2 optimized strcpy Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28 13:58   ` Måns Rullgård
  2013-02-27  3:17 ` [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr Richard Henderson
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
GPL 3, but tested with the GMP testsuite.

This is 25% faster than the generic code as measured on Cortex-A15,
and the same speed as GMP on the same core.  It's probably slower
than GMP on the A8 and A9 cores though.
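
For reference, the operation being implemented can be modeled in a few
lines of portable C.  This is only a sketch for checking results, not
the generic glibc code; the name ref_addmul_1 and the fixed 32-bit limb
type are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of mpn_addmul_1 with 32-bit limbs:
   rp[0..n-1] += up[0..n-1] * vl, returning the carry-out limb.  */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t vl)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* ul * vl + rl + carry is at most 2^64 - 1, so it cannot
           overflow the 64-bit accumulator.  */
        uint64_t t = (uint64_t)up[i] * vl + rp[i] + carry;

        rp[i] = (uint32_t)t;          /* low half is the result limb */
        carry = (uint32_t)(t >> 32);  /* high half feeds the next limb */
    }
    return carry;
}
```

Note that the assembly loads the first limb before testing the count,
so it assumes size >= 1, whereas this model also accepts size == 0.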
---
	* sysdeps/arm/addmul_1.S: New file.
---
 ports/sysdeps/arm/addmul_1.S | 60 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 ports/sysdeps/arm/addmul_1.S

diff --git a/ports/sysdeps/arm/addmul_1.S b/ports/sysdeps/arm/addmul_1.S
new file mode 100644
index 0000000..ecb8983
--- /dev/null
+++ b/ports/sysdeps/arm/addmul_1.S
@@ -0,0 +1,60 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+@		cycles/limb
+@ StrongArm	   ?
+@ Cortex-A8	   ?
+@ Cortex-A9	   ?
+@ Cortex-A15	   4
+
+/* mp_limb_t mpn_addmul_1(res_ptr, src1_ptr, size, s2_limb) */
+
+ENTRY(__mpn_addmul_1)
+	push	{ r4, r5, r6 }
+	cfi_adjust_cfa_offset (12)
+	cfi_rel_offset (r4, 0)
+	cfi_rel_offset (r5, 4)
+	cfi_rel_offset (r6, 8)
+
+	ldr	r6, [r1], #4
+	ldr	r5, [r0]
+	mov	r4, #0			/* init carry in */
+	b	1f
+0:
+	ldr	r6, [r1], #4		/* load next ul */
+	adds	r4, r4, r5		/* (out, c) = cl + lpl */
+	ldr	r5, [r0, #4]		/* load next rl */
+	str	r4, [r0], #4
+	adc	r4, ip, #0		/* cl = hpl + c */
+1:
+	mov	ip, #0			/* zero-extend rl */
+	umlal	r5, ip, r6, r3		/* (hpl, lpl) = ul * vl + rl */
+	subs	r2, r2, #1
+	bne	0b
+
+	adds	r4, r4, r5		/* (out, c) = cl + lpl */
+	str	r4, [r0]
+	adc	r0, ip, #0		/* return hpl + c */
+
+	pop	{ r4, r5, r6 }
+	DO_RET(lr)
+END(__mpn_addmul_1)
-- 
1.8.1.2


* [PATCH 26/26] arm: Add optimized add_n and sub_n
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (19 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
GPL 3, but tested with the GMP testsuite.

This is 250% faster than the generic code as measured on Cortex-A15,
and the same speed as GMP on the same core, and probably everywhere.
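
As a cross-check for the carry handling, the two entry points can be
modeled in portable C.  This is a sketch only; the names ref_add_n and
ref_sub_n and the 32-bit limb type are assumptions for illustration,
and the code makes no attempt to mirror the unrolling in the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of mpn_add_n: rp = s1 + s2, returns carry out.  */
uint32_t ref_add_n(uint32_t *rp, const uint32_t *s1, const uint32_t *s2,
                   size_t n)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)s1[i] + s2[i] + carry;

        rp[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}

/* Hypothetical model of mpn_sub_n: rp = s1 - s2, returns borrow out.  */
uint32_t ref_sub_n(uint32_t *rp, const uint32_t *s1, const uint32_t *s2,
                   size_t n)
{
    uint32_t borrow = 0;

    for (size_t i = 0; i < n; i++) {
        /* Unsigned 64-bit arithmetic wraps; the top bit is set
           exactly when the limb subtraction went below zero.  */
        uint64_t t = (uint64_t)s1[i] - s2[i] - borrow;

        rp[i] = (uint32_t)t;
        borrow = (uint32_t)(t >> 63);
    }
    return borrow;
}
```

The assembly keeps the carry in the CPU flag across limbs instead of in
a variable, which is why the loop-end tests use teq rather than cmp.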
---
	* sysdeps/arm/add_n.S: New file.
	* sysdeps/arm/sub_n.S: New file.
---
 ports/sysdeps/arm/add_n.S | 83 +++++++++++++++++++++++++++++++++++++++++++++++
 ports/sysdeps/arm/sub_n.S |  2 ++
 2 files changed, 85 insertions(+)
 create mode 100644 ports/sysdeps/arm/add_n.S
 create mode 100644 ports/sysdeps/arm/sub_n.S

diff --git a/ports/sysdeps/arm/add_n.S b/ports/sysdeps/arm/add_n.S
new file mode 100644
index 0000000..bbfb701
--- /dev/null
+++ b/ports/sysdeps/arm/add_n.S
@@ -0,0 +1,83 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+#ifdef USE_AS_SUB_N
+# define INITC	cmp r0, r0
+# define OPC	sbcs
+# define RETC	s(sbc) r0, r0, r0; s(neg) r0, r0
+# define FUNC	__mpn_sub_n
+#else
+# define INITC	cmn r0, #0
+# define OPC	adcs
+# define RETC	s(mov) r0, #0; s(adc) r0, r0, r0
+# define FUNC	__mpn_add_n
+#endif
+
+/* mp_limb_t mpn_add_n(res_ptr, src1_ptr, src2_ptr, size) */
+
+ENTRY(FUNC)
+	push	{ r4, r5, r6, r7, r8, r9, lr }
+	cfi_adjust_cfa_offset (28)
+	cfi_rel_offset (r4, 0)
+	cfi_rel_offset (r5, 4)
+	cfi_rel_offset (r6, 8)
+	cfi_rel_offset (r7, 12)
+	cfi_rel_offset (r8, 16)
+	cfi_rel_offset (r9, 20)
+	cfi_rel_offset (lr, 24)
+
+	INITC				/* initialize carry flag */
+	tst	r3, #1			/* count & 1 == 1? */
+	add	lr, r1, r3, lsl #2	/* compute end src1 */
+	beq	1f
+
+	ldr	r4, [r1], #4		/* do one to make count even */
+	ldr	r5, [r2], #4
+	OPC	r4, r4, r5
+	teq	r1, lr			/* end of count? (preserve carry) */
+	str	r4, [r0], #4
+	beq	9f
+1:
+	tst	r3, #2			/* count & 2 == 2?  */
+	beq	2f
+	ldm	r1!, { r4, r5 }		/* do two to make count 0 mod 4 */
+	ldm	r2!, { r6, r7 }
+	OPC	r4, r4, r6
+	OPC	r5, r5, r7
+	teq	r1, lr			/* end of count? */
+	stm	r0!, { r4, r5 }
+	beq	9f
+2:
+	ldm	r1!, { r3, r4, r5, r6 }	/* do four each loop */
+	ldm	r2!, { r7, r8, r9, ip }
+	OPC	r3, r3, r7
+	OPC	r4, r4, r8
+	OPC	r5, r5, r9
+	OPC	r6, r6, ip
+	teq	r1, lr
+	stm	r0!, { r3, r4, r5, r6 }
+	bne	2b
+
+9:
+	RETC				/* copy carry out */
+	pop	{ r4, r5, r6, r7, r8, r9, pc }
+END(FUNC)
diff --git a/ports/sysdeps/arm/sub_n.S b/ports/sysdeps/arm/sub_n.S
new file mode 100644
index 0000000..8eafa41
--- /dev/null
+++ b/ports/sysdeps/arm/sub_n.S
@@ -0,0 +1,2 @@
+#define USE_AS_SUB_N
+#include "add_n.S"
-- 
1.8.1.2


* [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  7:04   ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
                   ` (18 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Not recently speed tested vs the Linaro version, but having a
common set of algorithms for all the *chr routines is surely
worth something for maintenance.  I do recall that it was once
a few percent faster on A8.
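
The found-bit trick shared by the *chr routines can be illustrated in
portable C.  This is a sketch only: model_uqadd8, found_bits and
first_match_le are hypothetical helper names, and the byte loop stands
in for the single uqadd8 instruction.

```c
#include <stdint.h>

/* Model of uqadd8: four independent unsigned saturating byte adds.  */
uint32_t model_uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i += 8) {
        uint32_t s = ((a >> i) & 0xff) + ((b >> i) & 0xff);

        if (s > 0xff)
            s = 0xff;               /* saturate instead of wrapping */
        r |= s << i;
    }
    return r;
}

/* Return a word whose byte lsbs are set where WORD contains C.
   XOR turns matching bytes into zero; adding 0xfe leaves 0xfe for a
   zero byte and saturates to 0xff for anything else; inverting makes
   the lsb a positive "found" bit.  */
uint32_t found_bits(uint32_t word, uint8_t c)
{
    uint32_t rep = c * 0x01010101u; /* replicate C to all bytes */

    return ~model_uqadd8(word ^ rep, 0xfefefefeu);
}

/* Byte index of the first match in a little-endian word, or -1.  */
int first_match_le(uint32_t found)
{
    for (int i = 0; i < 4; i++)
        if ((found >> (8 * i)) & 1)
            return i;
    return -1;
}
```

For example, the little-endian word 0x64636261 ("abcd") searched for
'c' yields found bits 0x00010000, i.e. byte index 2.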
---
	* sysdeps/arm/armv6t2/memchr.S: Rewrite.
---
 ports/sysdeps/arm/armv6t2/memchr.S | 216 +++++++++++++++++--------------------
 1 file changed, 101 insertions(+), 115 deletions(-)

diff --git a/ports/sysdeps/arm/armv6t2/memchr.S b/ports/sysdeps/arm/armv6t2/memchr.S
index 6d35f47..1739a4c 100644
--- a/ports/sysdeps/arm/armv6t2/memchr.S
+++ b/ports/sysdeps/arm/armv6t2/memchr.S
@@ -22,142 +22,128 @@
 @ and ARMv6T2 processors.  It has a fast path for short sizes, and has an
 @ optimised path for large data sets; the worst case is finding the match early
 @ in a large data set.
-@ Note: The use of cbz/cbnz means it's Thumb only
-
-@ 2011-07-15 david.gilbert@linaro.org
-@    Copy from Cortex strings release 21 and change license
-@ http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/src/linaro-a9/memchr.S
-@    Change function declarations/entry/exit
-@ 2011-12-01 david.gilbert@linaro.org
-@    Add some fixes from comments received (including use of ldrd instead ldm)
-@ 2011-12-07 david.gilbert@linaro.org
-@    Removed cbz from align loop - can't be taken
-
-@ this lets us check a flag in a 00/ff byte easily in either endianness
-#ifdef __ARMEB__
-#define CHARTSTMASK(c) 1<<(31-(c*8))
-#else
-#define CHARTSTMASK(c) 1<<(c*8)
-#endif
-	.syntax unified
 
+	.syntax unified
 	.text
-	.thumb
 
-@ ---------------------------------------------------------------------------
-	.thumb_func
-	.global memchr
-	.type memchr,%function
 ENTRY(memchr)
 	@ r0 = start of memory to scan
 	@ r1 = character to look for
 	@ r2 = length
 	@ returns r0 = pointer to character or NULL if not found
-	and	r1,r1,#0xff	@ Don't think we can trust the caller to actually pass a char
-
-	cmp	r2,#16		@ If it's short don't bother with anything clever
-	blt	20f
 
-	tst	r0, #7		@ If it's already aligned skip the next bit
-	beq	10f
+	uxtb	r1, r1
+	cmp	r2, #16			@ Is the buffer too short?
+	blo	.Lbuf_small
+	tst	r0, #7			@ Is the buffer already aligned?
+	beq	.Lbuf_aligned
 
 	@ Work up to an aligned point
-5:
-	ldrb	r3, [r0],#1
-	subs	r2, r2, #1
-	cmp	r3, r1
-	beq	50f		@ If it matches exit found
-	tst	r0, #7
-	bne	5b		@ If not aligned yet then do next byte
-
-10:
-	@ At this point, we are aligned, we know we have at least 8 bytes to work with
-	push	{r4,r5,r6,r7}
-	cfi_adjust_cfa_offset (16)
+0:	ldrb	r3, [r0], #1
+	s(sub)	r2, r2, #1
+	cmp	r3, r1			@ If found, adjust and return.
+	beq	.Lfound_minus1
+	tst	r0, #7			@ If not yet aligned, loop
+	bne	0b
+
+.Lbuf_aligned:
+	@ Here, we are aligned and we have at least 8 bytes to work with.
+	push	{ r4, r5 }
+	cfi_adjust_cfa_offset (8)
 	cfi_rel_offset (r4, 0)
 	cfi_rel_offset (r5, 4)
-	cfi_rel_offset (r6, 8)
-	cfi_rel_offset (r7, 12)
 
-	cfi_remember_state
-
-	orr	r1, r1, r1, lsl #8	@ expand the match word across to all bytes
+	orr	r1, r1, r1, lsl #8	@ Replicate C to all bytes
+	movw	ip, #0xfefe
 	orr	r1, r1, r1, lsl #16
-	bic	r4, r2, #7	@ Number of double words to work with * 8
-	mvns	r7, #0		@ all F's
-	movs	r3, #0
-
-15:
-	ldrd 	r5,r6, [r0],#8
-	subs	r4, r4, #8
-	eor	r5,r5, r1	@ Get it so that r5,r6 have 00's where the bytes match the target
-	eor	r6,r6, r1
-	uadd8	r5, r5, r7	@ Parallel add 0xff - sets the GE bits for anything that wasn't 0
-	sel	r5, r3, r7	@ bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
-	uadd8	r6, r6, r7	@ Parallel add 0xff - sets the GE bits for anything that wasn't 0
-	sel	r6, r5, r7	@ chained....bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
-	cbnz	r6, 60f
-	bne	15b		@ (Flags from the subs above) If not run out of bytes then go around again
-
-	pop	{r4,r5,r6,r7}
-	cfi_adjust_cfa_offset (-16)
+	movt	ip, #0xfefe
+
+1:	ldrd 	r4, r5, [r0], #8
+	s(eor)	r4, r4, r1		@ Convert C's to zeros
+	s(eor)	r5, r5, r1
+	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+	@ that was originally zero and 0xff otherwise.  Therefore we consider
+	@ the lsb of each byte the "found" bit, with 0 for a match.
+	uqadd8	r4, r4, ip
+	uqadd8	r5, r5, ip
+	cmp	r2, #8			@ Have we still a full 8 bytes left?
+	blo	.Lbuf_aligned_finish
+	s(and)	r5, r4, r5		@ Combine found bits between words.
+	mvns	r5, r5			@ Match within either word?
+	beq	1b
+
+	@ Here, we've found a match.  Disambiguate 1st or 2nd word.
+	mvns	r4, r4
+	itee	ne
+	subne	r0, r0, #8		@ Dec pointer to 1st word
+	subeq	r0, r0, #4		@ Dec pointer to 2nd word
+	moveq	r4, r5			@ Copy found bits from 2nd word.
+
+.Lfind_word:
+	@ Here we've found a match.
+	@ r0 = pointer to word containing the match
+	@ r4 = found bits for the word.
+#ifdef __ARMEL__
+	rbit	r4, r4			@ For LE we need count-trailing-zeros
+#endif
+	clz	r4, r4			@ Find the bit offset of the match.
+	add	r0, r0, r4, lsr #3	@ Adjust the pointer to the found byte
+
+	pop	{ r4, r5 }
+	cfi_remember_state
+	cfi_adjust_cfa_offset (-8)
 	cfi_restore (r4)
 	cfi_restore (r5)
-	cfi_restore (r6)
-	cfi_restore (r7)
-
-	and	r1,r1,#0xff	@ Get r1 back to a single character from the expansion above
-	and	r2,r2,#7	@ Leave the count remaining as the number after the double words have been done
-
-20:
-	cbz	r2, 40f		@ 0 length or hit the end already then not found
-
-21:  @ Post aligned section, or just a short call
-	ldrb	r3,[r0],#1
-	subs	r2,r2,#1
-	eor	r3,r3,r1	@ r3 = 0 if match - doesn't break flags from sub
-	cbz	r3, 50f
-	bne	21b		@ on r2 flags
-
-40:
-	movs	r0,#0		@ not found
-	DO_RET(lr)
+	bx	lr
 
-50:
-	subs	r0,r0,#1	@ found
-	DO_RET(lr)
-
-60:  @ We're here because the fast path found a hit - now we have to track down exactly which word it was
-     @ r0 points to the start of the double word after the one that was tested
-     @ r5 has the 00/ff pattern for the first word, r6 has the chained value
 	cfi_restore_state
-	cmp	r5, #0
-	itte	eq
-	moveq	r5, r6		@ the end is in the 2nd word
-	subeq	r0,r0,#3	@ Points to 2nd byte of 2nd word
-	subne	r0,r0,#7	@ or 2nd byte of 1st word
-
-	@ r0 currently points to the 2nd byte of the word containing the hit
-	tst	r5, # CHARTSTMASK(0)	@ 1st character
-	bne	61f
-	adds	r0,r0,#1
-	tst	r5, # CHARTSTMASK(1)	@ 2nd character
-	ittt	eq
-	addeq	r0,r0,#1
-	tsteq	r5, # (3<<15)		@ 2nd & 3rd character
-	@ If not the 3rd must be the last one
-	addeq	r0,r0,#1
-
-61:
-	pop	{r4,r5,r6,r7}
-	cfi_adjust_cfa_offset (-16)
+.Lbuf_aligned_finish:
+	@ Here we've read and computed found bits for 8 bytes, but not
+	@ all of those bytes are within the buffer.  Determine which
+	@ found bytes are really valid.
+	s(sub)	r0, r0, #8		@ Dec pointer to the 1st word
+	cmp	r2, #4			@ Do we have at least 4 bytes left?
+	blo	1f
+	mvns	r4, r4			@ Match within the 1st word?
+	bne	.Lfind_word
+	s(add)	r0, r0, #4		@ Inc pointer to the 2nd word
+	s(mvn)	r4, r5			@ Copy found bits from 2nd word
+	s(sub)	r2, r2, #4		@ Bytes remaining in 2nd word
+1:
+	lsls	r2, r2, #3		@ Convert remaining to bits
+	beq	2f			@ No bytes remaining?
+	mvn	r3, #0
+#ifdef __ARMEL__
+	s(lsl)	r3, r3, r2		@ Mask with 1s covering invalid bytes
+#else
+	s(lsr)	r3, r3, r2
+#endif
+	bics	r4, r4, r3		@ Clear found past end of buffer
+	bne	.Lfind_word
+2:
+	s(mov)	r0, #0			@ Not found
+	pop	{ r4, r5 }
+	cfi_adjust_cfa_offset (-8)
 	cfi_restore (r4)
 	cfi_restore (r5)
-	cfi_restore (r6)
-	cfi_restore (r7)
-
-	subs	r0,r0,#1
-	DO_RET(lr)
+	bx	lr
+
+.Lbuf_small:
+	@ Here we've a small buffer to be searched a byte at a time.
+0:	ldrb	r3, [r0], #1
+	cmp	r3, r1			@ If found, adjust and return.
+	beq	.Lfound_minus1
+	subs	r2, r2, #1		@ Any bytes left?
+	bne	0b
+
+	s(mov)	r0, #0			@ Not found
+	bx	lr
+
+.Lfound_minus1:
+	@ Here we've found a match in a byte loop
+	@ r0 = pointer, post-incremented past the byte
+	s(sub)	r0, r0, #1
+	bx	lr
 
 END(memchr)
 libc_hidden_builtin_def (memchr)
-- 
1.8.1.2


* [PATCH 19/26] arm: Add optimized ffs for armv6t2
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 18/26] arm: Use GET_TLS more often Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27 15:51   ` Måns Rullgård
  2013-02-27 17:49   ` Roland McGrath
  2013-02-27  3:17 ` [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8 Richard Henderson
                   ` (19 subsequent siblings)
  28 siblings, 2 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Half the size of gcc 4.8's output.
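
The rbit + clz sequence used here can be modeled in portable C.  This
is a sketch only; model_rbit, model_clz, model_ffs and model_ffsll are
hypothetical names, and the loops stand in for single instructions.

```c
#include <stdint.h>

/* Model of rbit: reverse the bit order of a 32-bit word.  */
static uint32_t model_rbit(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Model of clz: count leading zeros, 32 for a zero input.  */
static int model_clz(uint32_t x)
{
    int n = 0;

    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ffs: 1-based index of the least significant set bit, 0 if none.
   Reversing the bits turns count-trailing-zeros into clz.  */
int model_ffs(uint32_t x)
{
    if (x == 0)
        return 0;
    return model_clz(model_rbit(x)) + 1;
}

/* ffsll: use the low word if it is non-zero, else the high word
   with a bit offset of 32.  */
int model_ffsll(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    if (lo != 0)
        return model_ffs(lo);
    if (hi != 0)
        return model_ffs(hi) + 32;
    return 0;
}
```

The assembly reaches the same result without branches by predicating
the rbit, clz and add instructions on the non-zero test.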
---
	* sysdeps/arm/armv6t2/ffs.S: New file.
	* sysdeps/arm/armv6t2/ffsll.S: New file.
---
 ports/sysdeps/arm/armv6t2/ffs.S   | 34 +++++++++++++++++++++++++++
 ports/sysdeps/arm/armv6t2/ffsll.S | 49 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 ports/sysdeps/arm/armv6t2/ffs.S
 create mode 100644 ports/sysdeps/arm/armv6t2/ffsll.S

diff --git a/ports/sysdeps/arm/armv6t2/ffs.S b/ports/sysdeps/arm/armv6t2/ffs.S
new file mode 100644
index 0000000..765fb5d
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/ffs.S
@@ -0,0 +1,34 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+ENTRY(__ffs)
+	cmp	r0, #0
+	ittt	ne
+	rbitne	r0, r0
+	clzne	r0, r0
+	addne	r0, r0, #1
+	bx	lr
+END(__ffs)
+
+weak_alias (__ffs, ffs)
+weak_alias (__ffs, ffsl)
+libc_hidden_builtin_def (ffs)
diff --git a/ports/sysdeps/arm/armv6t2/ffsll.S b/ports/sysdeps/arm/armv6t2/ffsll.S
new file mode 100644
index 0000000..d428509
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/ffsll.S
@@ -0,0 +1,49 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+ENTRY(ffsll)
+	@ If low part is 0, operate on the high part.  Ensure that the
+	@ word on which we operate is in r0.  Set r2 to the bit offset
+	@ of the word being considered.  Set the flags for the word
+	@ being operated on.
+#ifdef __ARMEL__
+	cmp	r0, #0
+	itee	ne
+	movne	r2, #0
+	moveq	r2, #32
+	movseq	r0, r1
+#else
+	cmp	r1, #0
+	ittee	ne
+	movne	r2, #0
+	movne	r0, r1
+	moveq	r2, #32
+	cmpeq	r0, #0
+#endif
+	@ Perform the ffs on r0.
+	itttt	ne
+	rbitne	r0, r0
+	clzne	r0, r0
+	addne	r0, r0, #1
+	addne	r0, r0, r2
+	bx	lr
+END(ffsll)
-- 
1.8.1.2


* [PATCH 08/26] arm: Add IT insns for thumb mode
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (24 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 16/26] arm: Commonize BX conditionals Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  0:41   ` Joseph S. Myers
  2013-02-27 15:41 ` [PATCH 00/26] ARM improvements Måns Rullgård
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

These are ignored by the assembler in ARM mode, so by
default this has no effect on generated code.
---
	* sysdeps/arm/arm-mcount.S: Always use unified syntax and
	always add IT markup.
	* sysdeps/unix/sysv/linux/arm/mmap64.S (__mmap64): Likewise.
	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Add IT markup.
	* sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/arm/mmap.S (__mmap): Likewise.
	* sysdeps/unix/sysv/linux/arm/syscall.S (syscall): Likewise.
	* sysdeps/unix/sysv/linux/arm/sysdep.h (PSEUDO_RET): Likewise.
	* sysdeps/unix/sysv/linux/arm/vfork.S (__vfork): Likewise.
---
 ports/sysdeps/arm/arm-mcount.S              | 9 ++-------
 ports/sysdeps/arm/dl-tlsdesc.S              | 1 +
 ports/sysdeps/unix/arm/sysdep.S             | 5 +++--
 ports/sysdeps/unix/sysv/linux/arm/clone.S   | 4 +++-
 ports/sysdeps/unix/sysv/linux/arm/mmap.S    | 1 +
 ports/sysdeps/unix/sysv/linux/arm/mmap64.S  | 6 +++++-
 ports/sysdeps/unix/sysv/linux/arm/syscall.S | 1 +
 ports/sysdeps/unix/sysv/linux/arm/sysdep.h  | 1 +
 ports/sysdeps/unix/sysv/linux/arm/vfork.S   | 1 +
 9 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/ports/sysdeps/arm/arm-mcount.S b/ports/sysdeps/arm/arm-mcount.S
index 6c24271..679d042 100644
--- a/ports/sysdeps/arm/arm-mcount.S
+++ b/ports/sysdeps/arm/arm-mcount.S
@@ -24,8 +24,8 @@
 
 #ifdef __thumb2__
 	.thumb
-	.syntax unified
 #endif
+	.syntax unified
 
 
 /* Use an assembly stub with a special ABI.  The calling lr has been
@@ -77,15 +77,10 @@ ENTRY(_mcount)
 	cfi_rel_offset (r3, 12)
 	cfi_rel_offset (fp, 16)
 	cfi_rel_offset (lr, 20)
-#ifdef __thumb2__
 	movs r0, fp
 	ittt ne
 	ldrne r0, [r0, #-4]
-#else
-	movs fp, fp
-	ldrne r0, [fp, #-4]
-#endif
-	movnes r1, lr
+	movsne r1, lr
 	blne __mcount_internal
 #ifdef __thumb2__
 	ldmia sp!, {r0, r1, r2, r3, fp, pc}
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 0ae3abb..c3e2b3e 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -116,6 +116,7 @@ _dl_tlsdesc_dynamic:
 	ldr	r3, [r1]
 	ldr	r2, [r0, r3, lsl #3]
 	cmn	r2, #1
+	ittt	ne
 	ldrne	r3, [r1, #4]
 	addne	r3, r2, r3
 	rsbne	r0, r4, r3
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index 425f4ac..951642f 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -31,8 +31,9 @@ __syscall_error:
 	/* We translate the system's EWOULDBLOCK error into EAGAIN.
 	   The GNU C library always defines EWOULDBLOCK==EAGAIN.
 	   EWOULDBLOCK_sys is the original number.  */
-	cmp r0, $EWOULDBLOCK_sys /* Is it the old EWOULDBLOCK?  */
-	moveq r0, $EAGAIN	/* Yes; translate it to EAGAIN.  */
+	cmp	r0, $EWOULDBLOCK_sys /* Is it the old EWOULDBLOCK?  */
+	it	eq
+	moveq	r0, $EAGAIN	/* Yes; translate it to EAGAIN.  */
 #endif
 
 #ifndef IS_IN_rtld
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index 8807781..9de37f2 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -33,6 +33,7 @@
 ENTRY(__clone)
 	@ sanity check args
 	cmp	r0, #0
+	ite	ne
 	cmpne	r1, #0
 	moveq	r0, #-EINVAL
 	beq	PLTJMP(syscall_error)
@@ -76,8 +77,9 @@ PSEUDO_END (__clone)
 	GET_TLS
 	mov	r1, r0
 	tst	ip, #CLONE_VM
-	movne	r0, #-1
 	ldr	r7, =SYS_ify(getpid)
+	ite	ne
+	movne	r0, #-1
 	swieq	0x0
 	str	r0, [r1, #PID_OFFSET]
 	str	r0, [r1, #TID_OFFSET]
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap.S b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
index fa8a2b8..68560b0 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
@@ -51,6 +51,7 @@ ENTRY (__mmap)
 	cfi_restore (r5)
 
 	cmn	r0, $4096
+	it	cc
 	RETINSTR(cc, lr)
 	b	PLTJMP(syscall_error)
 
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
index 2eafd1b..dcbab3a 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
@@ -17,6 +17,8 @@
 
 #include <sysdep.h>
 
+	.syntax unified
+
 #define	EINVAL		22
 
 #ifdef __ARMEB__
@@ -42,7 +44,8 @@ ENTRY (__mmap64)
 	cfi_remember_state
 	movs	r4, ip, lsl $20		@ check that offset is page-aligned
 	mov	ip, ip, lsr $12
-	moveqs	r4, r5, lsr $12		@ check for overflow
+	it	eq
+	movseq	r4, r5, lsr $12		@ check for overflow
 	bne	.Linval
 	ldr	r4, [sp, $8]		@ load fd
 	orr	r5, ip, r5, lsl $20	@ compose page offset
@@ -52,6 +55,7 @@ ENTRY (__mmap64)
 	cfi_adjust_cfa_offset (-8)
 	cfi_restore (r4)
 	cfi_restore (r5)
+	it	cc
 	RETINSTR(cc, lr)
 	b	PLTJMP(syscall_error)
 
diff --git a/ports/sysdeps/unix/sysv/linux/arm/syscall.S b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
index c6dd57d..665ecb4 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/syscall.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
@@ -42,6 +42,7 @@ ENTRY (syscall)
 	cfi_restore (r6)
 	cfi_restore (r7)
 	cmn	r0, #4096
+	it	cc
 	RETINSTR(cc, lr)
 	b	PLTJMP(syscall_error)
 PSEUDO_END (syscall)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index dae9d98..c1f2c9e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -79,6 +79,7 @@
     cmn r0, $4096;
 
 #define PSEUDO_RET							      \
+    it cc;								      \
     RETINSTR(cc, lr);							      \
     b PLTJMP(SYSCALL_ERROR)
 #undef ret
diff --git a/ports/sysdeps/unix/sysv/linux/arm/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
index 4f84c57..ae931f7 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
@@ -51,6 +51,7 @@ ENTRY (__vfork)
 	RESTORE_PID
 #endif
 	cmn	a1, #4096
+	it	cc
 	RETINSTR(cc, lr)
 
 	b	PLTJMP(SYSCALL_ERROR)
-- 
1.8.1.2


* [PATCH 14/26] arm: Use push/pop mnemonics
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8 Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  1:03   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 07/26] arm: Introduce and use GET_TLS Richard Henderson
                   ` (17 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

For arm this makes no difference; the result is bit-for-bit identical.
For thumb this results in smaller encodings.  Perhaps it ought not, and
this is in fact an assembler bug, but I also think push/pop is clearer.
---
	* sysdeps/arm/arm-mcount.S (_mcount): Use push/pop mnemonics.
	* sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
	* sysdeps/arm/dl-tlsdesc.S: Likewise.
	* sysdeps/arm/dl-trampoline.S: Likewise.
	* sysdeps/arm/start.S: Likewise.
	* sysdeps/arm/memcpy.S (PULL): Rename macro from pull.
	(PUSH): Rename macro from push.
	(memcpy): Use push/pop mnemonics.
	* sysdeps/arm/memmove.S: Similarly.
	* sysdeps/arm/sysdep.h (CALL_MCOUNT): Use push/pop mnemonics.
	* sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/mmap.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/mmap64.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c: Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c: Likewise.
	* sysdeps/unix/sysv/linux/arm/syscall.S: Likewise.
	* sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
	* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
---
 ports/sysdeps/arm/arm-mcount.S                     |  6 +--
 ports/sysdeps/arm/crti.S                           |  4 +-
 ports/sysdeps/arm/crtn.S                           |  8 +--
 ports/sysdeps/arm/dl-tlsdesc.S                     | 20 ++++----
 ports/sysdeps/arm/dl-trampoline.S                  |  4 +-
 ports/sysdeps/arm/memcpy.S                         | 58 +++++++++++-----------
 ports/sysdeps/arm/memmove.S                        | 58 +++++++++++-----------
 ports/sysdeps/arm/start.S                          | 10 ++--
 ports/sysdeps/arm/sysdep.h                         |  6 +--
 .../sysdeps/unix/sysv/linux/arm/____longjmp_chk.S  |  4 +-
 ports/sysdeps/unix/sysv/linux/arm/clone.S          |  4 +-
 ports/sysdeps/unix/sysv/linux/arm/mmap.S           |  8 +--
 ports/sysdeps/unix/sysv/linux/arm/mmap64.S         |  8 +--
 .../unix/sysv/linux/arm/nptl/sysdep-cancel.h       | 32 ++++++------
 .../unix/sysv/linux/arm/nptl/unwind-forcedunwind.c |  4 +-
 .../unix/sysv/linux/arm/nptl/unwind-resume.c       |  4 +-
 ports/sysdeps/unix/sysv/linux/arm/syscall.S        |  4 +-
 ports/sysdeps/unix/sysv/linux/arm/sysdep.h         | 27 +++++-----
 ports/sysdeps/unix/sysv/linux/arm/vfork.S          |  2 +-
 19 files changed, 135 insertions(+), 136 deletions(-)

diff --git a/ports/sysdeps/arm/arm-mcount.S b/ports/sysdeps/arm/arm-mcount.S
index 679d042..b6e5ec7 100644
--- a/ports/sysdeps/arm/arm-mcount.S
+++ b/ports/sysdeps/arm/arm-mcount.S
@@ -69,7 +69,7 @@ END(__gnu_mcount_nc)
    code be compiled with APCS frame pointers.  */
 
 ENTRY(_mcount)
-	stmdb sp!, {r0, r1, r2, r3, fp, lr}
+	push	{r0, r1, r2, r3, fp, lr}
 	cfi_adjust_cfa_offset (24)
 	cfi_rel_offset (r0, 0)
 	cfi_rel_offset (r1, 4)
@@ -83,9 +83,9 @@ ENTRY(_mcount)
 	movsne r1, lr
 	blne __mcount_internal
 #ifdef __thumb2__
-	ldmia sp!, {r0, r1, r2, r3, fp, pc}
+	pop	{r0, r1, r2, r3, fp, pc}
 #else
-	ldmia sp!, {r0, r1, r2, r3, fp, lr}
+	pop	{r0, r1, r2, r3, fp, lr}
 	cfi_adjust_cfa_offset (-24)
 	cfi_restore (r0)
 	cfi_restore (r1)
diff --git a/ports/sysdeps/arm/crti.S b/ports/sysdeps/arm/crti.S
index 1d55ae2..be20a11 100644
--- a/ports/sysdeps/arm/crti.S
+++ b/ports/sysdeps/arm/crti.S
@@ -80,7 +80,7 @@ call_weak_fn:
 	.globl _init
 	.type _init, %function
 _init:
-	stmfd sp!, {r3, lr}
+	push	{r3, lr}
 #if PREINIT_FUNCTION_WEAK
 	bl call_weak_fn
 #else
@@ -92,4 +92,4 @@ _init:
 	.globl _fini
 	.type _fini, %function
 _fini:
-	stmfd sp!, {r3, lr}
+	push	{r3, lr}
diff --git a/ports/sysdeps/arm/crtn.S b/ports/sysdeps/arm/crtn.S
index a01eb01..ae7546c 100644
--- a/ports/sysdeps/arm/crtn.S
+++ b/ports/sysdeps/arm/crtn.S
@@ -42,16 +42,16 @@
 
 	.section .init,"ax",%progbits
 #ifdef __ARM_ARCH_4T__
-	ldmfd sp!, {r3, lr}
+	pop {r3, lr}
 	bx lr
 #else
-	ldmfd sp!, {r3, pc}
+	pop {r3, pc}
 #endif
 
 	.section .fini,"ax",%progbits
 #ifdef __ARM_ARCH_4T__
-	ldmfd sp!, {r3, lr}
+	pop {r3, lr}
 	bx lr
 #else
-	ldmfd sp!, {r3, pc}
+	pop {r3, pc}
 #endif
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index c3e2b3e..15a0c21 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -52,12 +52,12 @@ _dl_tlsdesc_return:
 _dl_tlsdesc_undefweak:
 	@ Are we allowed a misaligned stack pointer calling read_tp?
 	.save	{lr}
-	stmdb 	sp!, {lr}
+	push	{lr}
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (lr,0)
 	bl 	__aeabi_read_tp
 	rsb 	r0, r0, #0
-	ldmia 	sp!, {lr}
+	pop	{lr}
 	cfi_adjust_cfa_offset (-4)
 	cfi_restore (lr)
 	BX	(lr)
@@ -99,7 +99,7 @@ _dl_tlsdesc_dynamic:
 	/* Our calling convention is to clobber r0, r1 and the processor
 	   flags.  All others that are modified must be saved */
 	.save	{r2,r3,r4,lr}
-	stmdb   sp!, {r2,r3,r4,lr}
+	push	{r2,r3,r4,lr}
 	cfi_adjust_cfa_offset (16)
 	cfi_rel_offset (r2,0)
 	cfi_rel_offset (r3,4)
@@ -124,7 +124,7 @@ _dl_tlsdesc_dynamic:
 1:	mov	r0, r1
 	bl	__tls_get_addr
 	rsb	r0, r4, r0
-2:	ldmia	sp!, {r2,r3,r4, lr}
+2:	pop	{r2,r3,r4, lr}
 	cfi_adjust_cfa_offset (-16)
 	cfi_restore (lr)
 	cfi_restore (r4)
@@ -155,7 +155,7 @@ _dl_tlsdesc_lazy_resolver:
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (r2, 0)
 	.save	{r0,r1,r3,ip,lr}
-	stmdb	sp!, {r0, r1, r3, ip, lr}
+	push	{r0, r1, r3, ip, lr}
 	cfi_adjust_cfa_offset (20)
 	cfi_rel_offset (r0, 0)
 	cfi_rel_offset (r1, 4)
@@ -163,14 +163,14 @@ _dl_tlsdesc_lazy_resolver:
 	cfi_rel_offset (ip, 12)
 	cfi_rel_offset (lr, 16)
 	bl	_dl_tlsdesc_lazy_resolver_fixup
-	ldmia	sp!, {r0, r1, r3, ip, lr}
+	pop	{r0, r1, r3, ip, lr}
 	cfi_adjust_cfa_offset (-20)
 	cfi_restore (lr)
 	cfi_restore (ip)
 	cfi_restore (r3)
 	cfi_restore (r1)
 	cfi_restore (r0)
-	ldmia	sp!, {r2}
+	pop	{r2}
 	cfi_adjust_cfa_offset (-4)
 	cfi_restore (r2)
 	ldr	r1, [r0, #4]
@@ -193,7 +193,7 @@ _dl_tlsdesc_resolve_hold:
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (r2, 0)
 	.save	{r0,r1,r3,ip,lr}
-	stmdb   sp!, {r0, r1, r3, ip, lr}
+	push	{r0, r1, r3, ip, lr}
 	cfi_adjust_cfa_offset (20)
 	cfi_rel_offset (r0, 0)
 	cfi_rel_offset (r1, 4)
@@ -202,14 +202,14 @@ _dl_tlsdesc_resolve_hold:
 	cfi_rel_offset (lr, 16)
 	adr	r2, _dl_tlsdesc_resolve_hold
 	bl	_dl_tlsdesc_resolve_hold_fixup
-	ldmia   sp!, {r0, r1, r3, ip, lr}
+	pop	{r0, r1, r3, ip, lr}
 	cfi_adjust_cfa_offset (-20)
 	cfi_restore (lr)
 	cfi_restore (ip)
 	cfi_restore (r3)
 	cfi_restore (r1)
 	cfi_restore (r0)
-	ldmia   sp!, {r2}
+	pop	{r2}
 	cfi_adjust_cfa_offset (-4)
 	cfi_restore (r2)
 	ldr     r1, [r0, #4]
diff --git a/ports/sysdeps/arm/dl-trampoline.S b/ports/sysdeps/arm/dl-trampoline.S
index b9769cb..a13a4c3 100644
--- a/ports/sysdeps/arm/dl-trampoline.S
+++ b/ports/sysdeps/arm/dl-trampoline.S
@@ -43,7 +43,7 @@ _dl_runtime_resolve:
 	@	lr points to &GOT[2]
 
 	@ Save arguments.  We save r4 to realign the stack.
-	stmdb	sp!,{r0-r4}
+	push	{r0-r4}
 	cfi_adjust_cfa_offset (20)
 	cfi_rel_offset (r0, 0)
 	cfi_rel_offset (r1, 4)
@@ -67,7 +67,7 @@ _dl_runtime_resolve:
 
 	@ get arguments and return address back.  We restore r4
 	@ only to realign the stack.
-	ldmia	sp!, {r0-r4,lr}
+	pop	{r0-r4,lr}
 	cfi_adjust_cfa_offset (-24)
 
 	@ jump to the newly found address
diff --git a/ports/sysdeps/arm/memcpy.S b/ports/sysdeps/arm/memcpy.S
index 98b9b47..98981ef 100644
--- a/ports/sysdeps/arm/memcpy.S
+++ b/ports/sysdeps/arm/memcpy.S
@@ -45,11 +45,11 @@
  * Endian independent macros for shifting bytes within registers.
  */
 #ifndef __ARMEB__
-#define pull            lsr
-#define push            lsl
+#define PULL            lsr
+#define PUSH            lsl
 #else
-#define pull            lsl
-#define push            lsr
+#define PULL            lsl
+#define PUSH            lsr
 #endif
 
 		.text
@@ -58,7 +58,7 @@
 
 ENTRY(memcpy)
 
-		stmfd	sp!, {r0, r4, lr}
+		push	{r0, r4, lr}
 		cfi_adjust_cfa_offset (12)
 		cfi_rel_offset (r4, 4)
 		cfi_rel_offset (lr, 8)
@@ -74,7 +74,7 @@ ENTRY(memcpy)
 		bne	10f
 
 1:		subs	r2, r2, #(28)
-		stmfd	sp!, {r5 - r8}
+		push	{r5 - r8}
 		cfi_adjust_cfa_offset (16)
 		cfi_rel_offset (r5, 0)
 		cfi_rel_offset (r6, 4)
@@ -131,7 +131,7 @@ ENTRY(memcpy)
 
 	CALGN(	bcs	2b			)
 
-7:		ldmfd	sp!, {r5 - r8}
+7:		pop	{r5 - r8}
 		cfi_adjust_cfa_offset (-16)
 		cfi_restore (r5)
 		cfi_restore (r6)
@@ -147,13 +147,13 @@ ENTRY(memcpy)
 		strcsb	ip, [r0]
 
 #if defined (__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
-		ldmfd	sp!, {r0, r4, lr}
+		pop	{r0, r4, lr}
 		cfi_adjust_cfa_offset (-12)
 		cfi_restore (r4)
 		cfi_restore (lr)
 		bx      lr
 #else
-		ldmfd	sp!, {r0, r4, pc}
+		pop	{r0, r4, pc}
 #endif
 
 		cfi_restore_state
@@ -189,7 +189,7 @@ ENTRY(memcpy)
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
 
-11:		stmfd	sp!, {r5 - r9}
+11:		push	{r5 - r9}
 		cfi_adjust_cfa_offset (20)
 		cfi_rel_offset (r5, 0)
 		cfi_rel_offset (r6, 4)
@@ -206,30 +206,30 @@ ENTRY(memcpy)
 
 12:	PLD(	pld	[r1, #124]		)
 13:		ldmia	r1!, {r4, r5, r6, r7}
-		mov	r3, lr, pull #\pull
+		mov	r3, lr, PULL #\pull
 		subs	r2, r2, #32
 		ldmia	r1!, {r8, r9, ip, lr}
-		orr	r3, r3, r4, push #\push
-		mov	r4, r4, pull #\pull
-		orr	r4, r4, r5, push #\push
-		mov	r5, r5, pull #\pull
-		orr	r5, r5, r6, push #\push
-		mov	r6, r6, pull #\pull
-		orr	r6, r6, r7, push #\push
-		mov	r7, r7, pull #\pull
-		orr	r7, r7, r8, push #\push
-		mov	r8, r8, pull #\pull
-		orr	r8, r8, r9, push #\push
-		mov	r9, r9, pull #\pull
-		orr	r9, r9, ip, push #\push
-		mov	ip, ip, pull #\pull
-		orr	ip, ip, lr, push #\push
+		orr	r3, r3, r4, PUSH #\push
+		mov	r4, r4, PULL #\pull
+		orr	r4, r4, r5, PUSH #\push
+		mov	r5, r5, PULL #\pull
+		orr	r5, r5, r6, PUSH #\push
+		mov	r6, r6, PULL #\pull
+		orr	r6, r6, r7, PUSH #\push
+		mov	r7, r7, PULL #\pull
+		orr	r7, r7, r8, PUSH #\push
+		mov	r8, r8, PULL #\pull
+		orr	r8, r8, r9, PUSH #\push
+		mov	r9, r9, PULL #\pull
+		orr	r9, r9, ip, PUSH #\push
+		mov	ip, ip, PULL #\pull
+		orr	ip, ip, lr, PUSH #\push
 		stmia	r0!, {r3, r4, r5, r6, r7, r8, r9, ip}
 		bge	12b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	13b			)
 
-		ldmfd	sp!, {r5 - r9}
+		pop	{r5 - r9}
 		cfi_adjust_cfa_offset (-20)
 		cfi_restore (r5)
 		cfi_restore (r6)
@@ -240,10 +240,10 @@ ENTRY(memcpy)
 14:		ands	ip, r2, #28
 		beq	16f
 
-15:		mov	r3, lr, pull #\pull
+15:		mov	r3, lr, PULL #\pull
 		ldr	lr, [r1], #4
 		subs	ip, ip, #4
-		orr	r3, r3, lr, push #\push
+		orr	r3, r3, lr, PUSH #\push
 		str	r3, [r0], #4
 		bgt	15b
 	CALGN(	cmp	r2, #0			)
diff --git a/ports/sysdeps/arm/memmove.S b/ports/sysdeps/arm/memmove.S
index 059ca7a..d9fa0e3 100644
--- a/ports/sysdeps/arm/memmove.S
+++ b/ports/sysdeps/arm/memmove.S
@@ -45,11 +45,11 @@
  * Endian independent macros for shifting bytes within registers.
  */
 #ifndef __ARMEB__
-#define pull            lsr
-#define push            lsl
+#define PULL            lsr
+#define PUSH            lsl
 #else
-#define pull            lsl
-#define push            lsr
+#define PULL            lsl
+#define PUSH            lsr
 #endif
 
 		.text
@@ -73,7 +73,7 @@ ENTRY(memmove)
 		bls	HIDDEN_JUMPTARGET(memcpy)
 #endif
 
-		stmfd	sp!, {r0, r4, lr}
+		push	{r0, r4, lr}
 		cfi_adjust_cfa_offset (12)
 		cfi_rel_offset (r4, 4)
 		cfi_rel_offset (lr, 8)
@@ -91,7 +91,7 @@ ENTRY(memmove)
 		bne	10f
 
 1:		subs	r2, r2, #(28)
-		stmfd	sp!, {r5 - r8}
+		push	{r5 - r8}
 		cfi_adjust_cfa_offset (16)
 		cfi_rel_offset (r5, 0)
 		cfi_rel_offset (r6, 4)
@@ -147,7 +147,7 @@ ENTRY(memmove)
 
 	CALGN(	bcs	2b			)
 
-7:		ldmfd	sp!, {r5 - r8}
+7:		pop	{r5 - r8}
 		cfi_adjust_cfa_offset (-16)
 		cfi_restore (r5)
 		cfi_restore (r6)
@@ -163,13 +163,13 @@ ENTRY(memmove)
 		strcsb	ip, [r0, #-1]
 
 #if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
-		ldmfd	sp!, {r0, r4, lr}
+		pop	{r0, r4, lr}
 		cfi_adjust_cfa_offset (-12)
 		cfi_restore (r4)
 		cfi_restore (lr)
 		bx      lr
 #else
-		ldmfd	sp!, {r0, r4, pc}
+		pop	{r0, r4, pc}
 #endif
 
 		cfi_restore_state
@@ -204,7 +204,7 @@ ENTRY(memmove)
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
 
-11:		stmfd	sp!, {r5 - r9}
+11:		push	{r5 - r9}
 		cfi_adjust_cfa_offset (20)
 		cfi_rel_offset (r5, 0)
 		cfi_rel_offset (r6, 4)
@@ -221,30 +221,30 @@ ENTRY(memmove)
 
 12:	PLD(	pld	[r1, #-128]		)
 13:		ldmdb   r1!, {r7, r8, r9, ip}
-		mov     lr, r3, push #\push
+		mov     lr, r3, PUSH #\push
 		subs    r2, r2, #32
 		ldmdb   r1!, {r3, r4, r5, r6}
-		orr     lr, lr, ip, pull #\pull
-		mov     ip, ip, push #\push
-		orr     ip, ip, r9, pull #\pull
-		mov     r9, r9, push #\push
-		orr     r9, r9, r8, pull #\pull
-		mov     r8, r8, push #\push
-		orr     r8, r8, r7, pull #\pull
-		mov     r7, r7, push #\push
-		orr     r7, r7, r6, pull #\pull
-		mov     r6, r6, push #\push
-		orr     r6, r6, r5, pull #\pull
-		mov     r5, r5, push #\push
-		orr     r5, r5, r4, pull #\pull
-		mov     r4, r4, push #\push
-		orr     r4, r4, r3, pull #\pull
+		orr     lr, lr, ip, PULL #\pull
+		mov     ip, ip, PUSH #\push
+		orr     ip, ip, r9, PULL #\pull
+		mov     r9, r9, PUSH #\push
+		orr     r9, r9, r8, PULL #\pull
+		mov     r8, r8, PUSH #\push
+		orr     r8, r8, r7, PULL #\pull
+		mov     r7, r7, PUSH #\push
+		orr     r7, r7, r6, PULL #\pull
+		mov     r6, r6, PUSH #\push
+		orr     r6, r6, r5, PULL #\pull
+		mov     r5, r5, PUSH #\push
+		orr     r5, r5, r4, PULL #\pull
+		mov     r4, r4, PUSH #\push
+		orr     r4, r4, r3, PULL #\pull
 		stmdb   r0!, {r4 - r9, ip, lr}
 		bge	12b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	13b			)
 
-		ldmfd	sp!, {r5 - r9}
+		pop	{r5 - r9}
 		cfi_adjust_cfa_offset (-20)
 		cfi_restore (r5)
 		cfi_restore (r6)
@@ -255,10 +255,10 @@ ENTRY(memmove)
 14:		ands	ip, r2, #28
 		beq	16f
 
-15:		mov     lr, r3, push #\push
+15:		mov     lr, r3, PUSH #\push
 		ldr	r3, [r1, #-4]!
 		subs	ip, ip, #4
-		orr	lr, lr, r3, pull #\pull
+		orr	lr, lr, r3, PULL #\pull
 		str	lr, [r0, #-4]!
 		bgt	15b
 	CALGN(	cmp	r2, #0			)
diff --git a/ports/sysdeps/arm/start.S b/ports/sysdeps/arm/start.S
index a1d15b8..0a57b0b 100644
--- a/ports/sysdeps/arm/start.S
+++ b/ports/sysdeps/arm/start.S
@@ -80,14 +80,14 @@ _start:
 	mov lr, #0
 
 	/* Pop argc off the stack and save a pointer to argv */
-	ldr a2, [sp], #4
+	pop { a2 }
 	mov a3, sp
 
 	/* Push stack limit */
-	str a3, [sp, #-4]!
+	push { a3 }
 
 	/* Push rtld_fini */
-	str a1, [sp, #-4]!
+	push { a1 }
 
 #ifdef SHARED
 	ldr sl, .L_GOT
@@ -97,7 +97,7 @@ _start:
 	ldr ip, .L_GOT+4	/* __libc_csu_fini */
 	ldr ip, [sl, ip]
 
-	str ip, [sp, #-4]!	/* Push __libc_csu_fini */
+	push { ip }		/* Push __libc_csu_fini */
 
 	ldr a4, .L_GOT+8	/* __libc_csu_init */
 	ldr a4, [sl, a4]
@@ -113,7 +113,7 @@ _start:
 	ldr ip, =__libc_csu_fini
 
 	/* Push __libc_csu_fini */
-	str ip, [sp, #-4]!
+	push { ip }
 
 	/* Set up the other arguments in registers */
 	ldr a1, =main
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 3459219..fed3dfd 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -77,7 +77,7 @@
 /* Call __gnu_mcount_nc if GCC >= 4.4.  */
 #if __GNUC_PREREQ(4,4)
 #define CALL_MCOUNT \
-  str	lr,[sp, #-4]!; \
+  push { lr }; \
   cfi_adjust_cfa_offset (4); \
   cfi_rel_offset (lr, 0); \
   bl PLTJMP(mcount); \
@@ -85,11 +85,11 @@
   cfi_restore (lr)
 #else /* else call _mcount */
 #define CALL_MCOUNT \
-  str	lr,[sp, #-4]!; \
+  push { lr }; \
   cfi_adjust_cfa_offset (4); \
   cfi_rel_offset (lr, 0); \
   bl PLTJMP(mcount); \
-  ldr lr, [sp], #4; \
+  pop { lr }; \
   cfi_adjust_cfa_offset (-4); \
   cfi_restore (lr)
 #endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
index 29edec6..6ee7a1a 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
@@ -53,7 +53,7 @@ longjmp_msg:
 	cfi_remember_state;			\
 	cmp	sp, reg;			\
 	bls	.Lok;				\
-	str	r7, [sp, #-4]!;			\
+	push	{ r7 };				\
 	cfi_adjust_cfa_offset (4);		\
 	cfi_rel_offset (r7, 0);			\
 	mov	r5, r0;				\
@@ -79,7 +79,7 @@ longjmp_msg:
 .Lfail:						\
 	add	sp, sp, #12;			\
 	cfi_adjust_cfa_offset (-12);		\
-	ldr	r7, [sp], #4;			\
+	pop	{ r7 };				\
 	cfi_adjust_cfa_offset (-4);		\
 	cfi_restore (r7);			\
 	CALL_FAIL				\
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index 58ee7b4..2e8c61e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -49,7 +49,7 @@ ENTRY(__clone)
 	mov	ip, r2
 #endif
 	@ new sp is already in r1
-	stmfd	sp!, {r4, r7}
+	push	{r4, r7}
 	cfi_adjust_cfa_offset (8)
 	cfi_rel_offset (r4, 0)
 	cfi_rel_offset (r7, 4)
@@ -61,7 +61,7 @@ ENTRY(__clone)
 	cfi_endproc
 	cmp	r0, #0
 	beq	1f
-	ldmfd	sp!, {r4, r7}
+	pop	{r4, r7}
 	blt	PLTJMP(C_SYMBOL_NAME(__syscall_error))
 	RETINSTR(, lr)
 
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap.S b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
index 68560b0..06b737e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
@@ -23,11 +23,11 @@
 
 ENTRY (__mmap)
 	/* shuffle args */
-	str	r5, [sp, #-4]!
+	push	{ r5 }
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (r5, 0)
 	ldr	r5, [sp, #8]
-	str	r4, [sp, #-4]!
+	push	{ r4 }
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (r4, 0)
 	cfi_remember_state
@@ -43,10 +43,10 @@ ENTRY (__mmap)
 
 	/* restore registers */
 2:
-	ldr	r4, [sp], #4
+	pop	{ r4 }
 	cfi_adjust_cfa_offset (-4)
 	cfi_restore (r4)
-	ldr	r5, [sp], #4
+	pop	{ r5 }
 	cfi_adjust_cfa_offset (-4)
 	cfi_restore (r5)
 
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
index dcbab3a..d039129 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
@@ -34,11 +34,11 @@
 	.text
 ENTRY (__mmap64)
 	ldr	ip, [sp, $LOW_OFFSET]
-	str	r5, [sp, #-4]!
+	push	{ r5 }
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (r5, 0)
 	ldr	r5, [sp, $HIGH_OFFSET]
-	str	r4, [sp, #-4]!
+	push	{ r4 }
 	cfi_adjust_cfa_offset (4)
 	cfi_rel_offset (r4, 0)
 	cfi_remember_state
@@ -51,7 +51,7 @@ ENTRY (__mmap64)
 	orr	r5, ip, r5, lsl $20	@ compose page offset
 	DO_CALL (mmap2, 0)
 	cmn	r0, $4096
-	ldmfd	sp!, {r4, r5}
+	pop	{r4, r5}
 	cfi_adjust_cfa_offset (-8)
 	cfi_restore (r4)
 	cfi_restore (r5)
@@ -62,7 +62,7 @@ ENTRY (__mmap64)
 	cfi_restore_state
 .Linval:
 	mov	r0, $-EINVAL
-	ldmfd	sp!, {r4, r5}
+	pop	{r4, r5}
 	cfi_adjust_cfa_offset (-8)
 	cfi_restore (r4)
 	cfi_restore (r5)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index 0c9e780..f0f7043 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -80,19 +80,19 @@
 
 # define DOCARGS_0 \
   .save {r7}; \
-  str lr, [sp, #-4]!; \
+  push { lr }; \
   cfi_adjust_cfa_offset (4); \
   cfi_rel_offset (lr, 0); \
   .save {lr}
 # define UNDOCARGS_0
 # define RESTORE_LR_0 \
-  ldr lr, [sp], #4; \
+  pop { lr }; \
   cfi_adjust_cfa_offset (-4); \
   cfi_restore (lr)
 
 # define DOCARGS_1 \
   .save {r7}; \
-  stmfd sp!, {r0, r1, lr}; \
+  push {r0, r1, lr}; \
   cfi_adjust_cfa_offset (12); \
   cfi_rel_offset (lr, 8); \
   .save {lr}; \
@@ -106,13 +106,13 @@
 
 # define DOCARGS_2 \
   .save {r7}; \
-  stmfd sp!, {r0, r1, lr}; \
+  push {r0, r1, lr}; \
   cfi_adjust_cfa_offset (12); \
   cfi_rel_offset (lr, 8); \
   .save {lr}; \
   .pad #8
 # define UNDOCARGS_2 \
-  ldmfd sp!, {r0, r1}; \
+  pop {r0, r1}; \
   cfi_adjust_cfa_offset (-8); \
   RESTART_UNWIND
 # define RESTORE_LR_2 \
@@ -120,13 +120,13 @@
 
 # define DOCARGS_3 \
   .save {r7}; \
-  stmfd sp!, {r0, r1, r2, r3, lr}; \
+  push {r0, r1, r2, r3, lr}; \
   cfi_adjust_cfa_offset (20); \
   cfi_rel_offset (lr, 16); \
   .save {lr}; \
   .pad #16
 # define UNDOCARGS_3 \
-  ldmfd sp!, {r0, r1, r2, r3}; \
+  pop {r0, r1, r2, r3}; \
   cfi_adjust_cfa_offset (-16); \
   RESTART_UNWIND
 # define RESTORE_LR_3 \
@@ -134,13 +134,13 @@
 
 # define DOCARGS_4 \
   .save {r7}; \
-  stmfd sp!, {r0, r1, r2, r3, lr}; \
+  push {r0, r1, r2, r3, lr}; \
   cfi_adjust_cfa_offset (20); \
   cfi_rel_offset (lr, 16); \
   .save {lr}; \
   .pad #16
 # define UNDOCARGS_4 \
-  ldmfd sp!, {r0, r1, r2, r3}; \
+  pop {r0, r1, r2, r3}; \
   cfi_adjust_cfa_offset (-16); \
   RESTART_UNWIND
 # define RESTORE_LR_4 \
@@ -149,13 +149,13 @@
 /* r4 is only stmfd'ed for correct stack alignment.  */
 # define DOCARGS_5 \
   .save {r4, r7}; \
-  stmfd sp!, {r0, r1, r2, r3, r4, lr}; \
+  push {r0, r1, r2, r3, r4, lr}; \
   cfi_adjust_cfa_offset (24); \
   cfi_rel_offset (lr, 20); \
   .save {lr}; \
   .pad #20
 # define UNDOCARGS_5 \
-  ldmfd sp!, {r0, r1, r2, r3}; \
+  pop {r0, r1, r2, r3}; \
   cfi_adjust_cfa_offset (-16); \
   .fnend; \
   .fnstart; \
@@ -163,20 +163,20 @@
   .save {lr}; \
   .pad #4
 # define RESTORE_LR_5 \
-  ldmfd sp!, {r4, lr}; \
+  pop {r4, lr}; \
   cfi_adjust_cfa_offset (-8); \
   /* r4 will be marked as restored later.  */ \
   cfi_restore (lr)
 
 # define DOCARGS_6 \
   .save {r4, r5, r7}; \
-  stmfd sp!, {r0, r1, r2, r3, lr}; \
+  push {r0, r1, r2, r3, lr}; \
   cfi_adjust_cfa_offset (20); \
   cfi_rel_offset (lr, 16); \
   .save {lr}; \
   .pad #16
 # define UNDOCARGS_6 \
-  ldmfd sp!, {r0, r1, r2, r3}; \
+  pop {r0, r1, r2, r3}; \
   cfi_adjust_cfa_offset (-16); \
   .fnend; \
   .fnstart; \
@@ -217,13 +217,13 @@ extern int __local_multiple_threads attribute_hidden;
 				   header.multiple_threads) == 0, 1)
 #  else
 #   define SINGLE_THREAD_P						\
-	stmfd	sp!, {r0, lr};						\
+	push	{r0, lr};						\
 	cfi_adjust_cfa_offset (8);					\
 	cfi_rel_offset (lr, 4);						\
 	bl	__aeabi_read_tp;					\
 	NEGOFF_ADJ_BASE(r0, MULTIPLE_THREADS_OFFSET);			\
 	ldr	ip, NEGOFF_OFF1(r0, MULTIPLE_THREADS_OFFSET);		\
-	ldmfd	sp!, {r0, lr};						\
+	pop	{r0, lr};						\
 	cfi_adjust_cfa_offset (-8);					\
 	cfi_restore (lr);						\
 	teq	ip, #0
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
index 58ca9ac..44a0dc9 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
@@ -90,7 +90,7 @@ asm (
 "_Unwind_Resume:\n"
 "	.cfi_sections .debug_frame\n"
 "	" CFI_STARTPROC "\n"
-"	stmfd	sp!, {r4, r5, r6, lr}\n"
+"	push	{r4, r5, r6, lr}\n"
 "	" CFI_ADJUST_CFA_OFFSET (16)" \n"
 "	" CFI_REL_OFFSET (r4, 0) "\n"
 "	" CFI_REL_OFFSET (r5, 4) "\n"
@@ -105,7 +105,7 @@ asm (
 "	cmp	r3, #0\n"
 "	beq	4f\n"
 "5:	mov	r0, r6\n"
-"	ldmfd	sp!, {r4, r5, r6, lr}\n"
+"	pop	{r4, r5, r6, lr}\n"
 "	" CFI_ADJUST_CFA_OFFSET (-16) "\n"
 "	" CFI_RESTORE (r4) "\n"
 "	" CFI_RESTORE (r5) "\n"
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c
index 0a3ad95..4c15827 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c
@@ -53,7 +53,7 @@ asm (
 "_Unwind_Resume:\n"
 "	.cfi_sections .debug_frame\n"
 "	" CFI_STARTPROC "\n"
-"	stmfd	sp!, {r4, r5, r6, lr}\n"
+"	push	{r4, r5, r6, lr}\n"
 "	" CFI_ADJUST_CFA_OFFSET (16)" \n"
 "	" CFI_REL_OFFSET (r4, 0) "\n"
 "	" CFI_REL_OFFSET (r5, 4) "\n"
@@ -68,7 +68,7 @@ asm (
 "	cmp	r3, #0\n"
 "	beq	4f\n"
 "5:	mov	r0, r6\n"
-"	ldmfd	sp!, {r4, r5, r6, lr}\n"
+"	pop	{r4, r5, r6, lr}\n"
 "	" CFI_ADJUST_CFA_OFFSET (-16) "\n"
 "	" CFI_RESTORE (r4) "\n"
 "	" CFI_RESTORE (r5) "\n"
diff --git a/ports/sysdeps/unix/sysv/linux/arm/syscall.S b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
index 665ecb4..bdd5a52 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/syscall.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
@@ -23,7 +23,7 @@
 
 ENTRY (syscall)
 	mov	ip, sp
-	stmfd	sp!, {r4, r5, r6, r7}
+	push	{r4, r5, r6, r7}
 	cfi_adjust_cfa_offset (16)
 	cfi_rel_offset (r4, 0)
 	cfi_rel_offset (r5, 4)
@@ -35,7 +35,7 @@ ENTRY (syscall)
 	mov	r2, r3
 	ldmfd	ip, {r3, r4, r5, r6}
 	swi	0x0
-	ldmfd	sp!, {r4, r5, r6, r7}
+	pop	{r4, r5, r6, r7}
 	cfi_adjust_cfa_offset (-16)
 	cfi_restore (r4)
 	cfi_restore (r5)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index e448e61..f77af7f 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -132,23 +132,22 @@ __local_syscall_error:						\
 # else
 #  if defined(__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
 #   define POP_PC \
-  ldr lr, [sp], #4; \
+  pop { lr }; \
   cfi_adjust_cfa_offset (-4); \
   cfi_restore (lr); \
   bx lr
 #  else
-#   define POP_PC  \
-  ldr pc, [sp], #4
+#   define POP_PC  pop { pc }
 #  endif
 #  define SYSCALL_ERROR_HANDLER					\
 __local_syscall_error:						\
-	str	lr, [sp, #-4]!;					\
+	push	{ lr };						\
 	cfi_adjust_cfa_offset (4);				\
 	cfi_rel_offset (lr, 0);					\
-	str	r0, [sp, #-4]!;					\
+	push	{ r0 };	    					\
 	cfi_adjust_cfa_offset (4);				\
 	bl	PLTJMP(C_SYMBOL_NAME(__errno_location)); 	\
-	ldr	r1, [sp], #4;					\
+	pop	{ r1 };						\
 	cfi_adjust_cfa_offset (-4);				\
 	rsb	r1, r1, #0;					\
 	str	r1, [r0];					\
@@ -215,7 +214,7 @@ __local_syscall_error:						\
 #undef  DOARGS_0
 #define DOARGS_0 \
   .fnstart; \
-  str r7, [sp, #-4]!; \
+  push { r7 }; \
   cfi_adjust_cfa_offset (4); \
   cfi_rel_offset (r7, 0); \
   .save { r7 }
@@ -230,7 +229,7 @@ __local_syscall_error:						\
 #undef  DOARGS_5
 #define DOARGS_5 \
   .fnstart; \
-  stmfd sp!, {r4, r7}; \
+  push {r4, r7}; \
   cfi_adjust_cfa_offset (8); \
   cfi_rel_offset (r4, 0); \
   cfi_rel_offset (r7, 4); \
@@ -240,7 +239,7 @@ __local_syscall_error:						\
 #define DOARGS_6 \
   .fnstart; \
   mov ip, sp; \
-  stmfd sp!, {r4, r5, r7}; \
+  push {r4, r5, r7}; \
   cfi_adjust_cfa_offset (12); \
   cfi_rel_offset (r4, 0); \
   cfi_rel_offset (r5, 4); \
@@ -251,7 +250,7 @@ __local_syscall_error:						\
 #define DOARGS_7 \
   .fnstart; \
   mov ip, sp; \
-  stmfd sp!, {r4, r5, r6, r7}; \
+  push {r4, r5, r6, r7}; \
   cfi_adjust_cfa_offset (16); \
   cfi_rel_offset (r4, 0); \
   cfi_rel_offset (r5, 4); \
@@ -262,7 +261,7 @@ __local_syscall_error:						\
 
 #undef  UNDOARGS_0
 #define UNDOARGS_0 \
-  ldr r7, [sp], #4; \
+  pop { r7 }; \
   cfi_adjust_cfa_offset (-4); \
   cfi_restore (r7); \
   .fnend
@@ -276,14 +275,14 @@ __local_syscall_error:						\
 #define UNDOARGS_4 UNDOARGS_0
 #undef  UNDOARGS_5
 #define UNDOARGS_5 \
-  ldmfd sp!, {r4, r7}; \
+  pop {r4, r7}; \
   cfi_adjust_cfa_offset (-8); \
   cfi_restore (r4); \
   cfi_restore (r7); \
   .fnend
 #undef  UNDOARGS_6
 #define UNDOARGS_6 \
-  ldmfd sp!, {r4, r5, r7}; \
+  pop {r4, r5, r7}; \
   cfi_adjust_cfa_offset (-12); \
   cfi_restore (r4); \
   cfi_restore (r5); \
@@ -291,7 +290,7 @@ __local_syscall_error:						\
   .fnend
 #undef  UNDOARGS_7
 #define UNDOARGS_7 \
-  ldmfd sp!, {r4, r5, r6, r7}; \
+  pop {r4, r5, r6, r7}; \
   cfi_adjust_cfa_offset (-16); \
   cfi_restore (r4); \
   cfi_restore (r5); \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
index ae931f7..128a640 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
@@ -37,7 +37,7 @@ ENTRY (__vfork)
 	mov	ip, r7
 	cfi_register (r7, ip)
 	.fnstart
-	str r7, [sp, #-4]!
+	push	{ r7 }
 	cfi_adjust_cfa_offset (4)
 	.save { r7 }
 	ldr	r7, =SYS_ify (vfork)
-- 
1.8.1.2
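
The substitutions in the patch above rely on push and pop being alternate spellings of the sp-based store and load forms they replace. The following C model is a minimal sketch of that equivalence, not code from the patch: struct cpu, model_push, model_pop, str_predec and ldr_postinc are hypothetical names, and the stack is a toy word array. It shows that the lowest-numbered register lands at the lowest address, which is what the cfi_rel_offset annotations in the hunks assume (for example r4 at offset 0 and r7 at offset 4 after push {r4, r7}).

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine (hypothetical): 16 registers and a full-descending stack.
   sp is a word index into mem and decreases on every push.  */
struct cpu {
    uint32_t r[16];
    uint32_t mem[64];
    uint32_t sp;
};

/* push {list}, i.e. stmdb/stmfd sp!, {list}.  Registers are stored so
   that the lowest-numbered one ends up at the lowest address.  */
static void model_push(struct cpu *c, uint16_t mask)
{
    for (int i = 15; i >= 0; i--)
        if (mask & (1u << i))
            c->mem[--c->sp] = c->r[i];
}

/* pop {list}, i.e. ldmia/ldmfd sp!, {list}.  */
static void model_pop(struct cpu *c, uint16_t mask)
{
    for (int i = 0; i <= 15; i++)
        if (mask & (1u << i))
            c->r[i] = c->mem[c->sp++];
}

/* str rN, [sp, #-4]!  (single-register push).  */
static void str_predec(struct cpu *c, int n)
{
    c->mem[--c->sp] = c->r[n];
}

/* ldr rN, [sp], #4  (single-register pop).  */
static void ldr_postinc(struct cpu *c, int n)
{
    c->r[n] = c->mem[c->sp++];
}
```

With sp starting at 64, model_push with a mask for r4 and r7 leaves sp at 62, r4 in mem[62] and r7 in mem[63]; calling str_predec for r7 and then r4 produces the same layout. The rename of the pull/push shift macros to PULL/PUSH in memcpy.S and memmove.S presumably exists so that the C preprocessor does not rewrite the new push mnemonic, though the excerpt does not say so explicitly.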

* [PATCH 07/26] arm: Introduce and use GET_TLS
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  0:34   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
                   ` (16 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Factor out the sequence needed to call kuser_get_tls, since the
subtract-into-pc trick used in ARM mode is not available in Thumb mode.
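
Both instruction sequences in this patch must reach the same helper address. The C fragment below is a minimal sketch that checks the arithmetic only: the function names are hypothetical, and the constants are the ones that appear in the diff (0xffff0fff minus 31 for ARM mode, movw 0x0fe0 plus movt 0xffff for Thumb-2).

```c
#include <assert.h>
#include <stdint.h>

/* ARM-mode sequence:  mov r0, #0xffff0fff ; sub pc, r0, #31
   Returns the value that ends up in pc.  */
static uint32_t kuser_get_tls_arm(void)
{
    uint32_t r0 = 0xffff0fffu;
    return r0 - 31u;
}

/* Thumb-2 sequence:  movw r0, #0x0fe0 ; movt r0, #0xffff
   movw sets the low halfword and clears the high one;
   movt then replaces only the high halfword.  */
static uint32_t kuser_get_tls_thumb2(void)
{
    uint32_t r0 = 0x0fe0u;
    r0 = (r0 & 0xffffu) | (0xffffu << 16);
    return r0;
}
```

Both functions evaluate to 0xffff0fe0. The Thumb-2 form builds that address in a register and branches with blx (or bx in __aeabi_read_tp), so it never needs an arithmetic write to pc.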
---
	* sysdeps/unix/sysv/linux/arm/sysdep.h (GET_TLS): New macro.
	* sysdeps/unix/arm/sysdep.S (__syscall_error): Use it.
	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S (SAVE_PID): Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/vfork.S (SAVE_PID): Likewise.
	* sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S (__aeabi_read_tp):
	Add thumb2 alternative.
---
 ports/sysdeps/unix/arm/sysdep.S                   |  4 +---
 ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S |  6 ++++++
 ports/sysdeps/unix/sysv/linux/arm/clone.S         |  4 +---
 ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S |  4 +---
 ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S    |  4 +---
 ports/sysdeps/unix/sysv/linux/arm/sysdep.h        | 15 +++++++++++++++
 6 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index 76137b3..425f4ac 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -40,9 +40,7 @@ __syscall_error:
 	cfi_register (lr, ip)
 	mov r1, r0
 
-	mov r0, #0xffff0fff
-	mov lr, pc
-	sub pc, r0, #31
+	GET_TLS
 
 	ldr r2, 1f
 2:	ldr r2, [pc, r2]
diff --git a/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S b/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S
index c4ddbc6..ecdc322 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S
@@ -41,6 +41,12 @@
 
 	.hidden __aeabi_read_tp
 ENTRY (__aeabi_read_tp)
+#ifdef __thumb2__
+	movw	r0, #0x0fe0
+	movt	r0, #0xffff
+	bx	r0
+#else
 	mov	r0, #0xffff0fff
 	sub	pc, r0, #31
+#endif
 END (__aeabi_read_tp)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index de25db1..8807781 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -73,9 +73,7 @@ PSEUDO_END (__clone)
 #ifdef RESET_PID
 	tst	ip, #CLONE_THREAD
 	bne	3f
-	mov	r0, #0xffff0fff
-	mov	lr, pc
-	sub	pc, r0, #31
+	GET_TLS
 	mov	r1, r0
 	tst	ip, #CLONE_VM
 	movne	r0, #-1
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
index a38d564..749aaab 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
@@ -22,9 +22,7 @@
 	str	lr, [sp, #-4]!;		/* Save LR.  */			\
 	cfi_adjust_cfa_offset (4);					\
 	cfi_rel_offset (lr, 0);						\
-	mov	r0, #0xffff0fff;	/* Point to the high page.  */	\
-	mov	lr, pc;			/* Save our return address.  */	\
-	sub	pc, r0, #31;		/* Jump to the TLS entry.  */	\
+	GET_TLS;							\
 	ldr	lr, [sp], #4;		/* Restore LR.  */		\
 	cfi_adjust_cfa_offset (-4);					\
 	cfi_restore (lr);						\
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
index 3fce2d1..1bbe5c6 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
@@ -22,9 +22,7 @@
 	str	lr, [sp, #-4]!;		/* Save LR.  */			\
 	cfi_adjust_cfa_offset (4);					\
 	cfi_rel_offset (lr, 0);						\
-	mov	r0, #0xffff0fff;	/* Point to the high page.  */	\
-	mov	lr, pc;			/* Save our return address.  */	\
-	sub	pc, r0, #31;		/* Jump to the TLS entry.  */	\
+	GET_TLS;							\
 	ldr	lr, [sp], #4;		/* Restore LR.  */		\
 	cfi_adjust_cfa_offset (-4);					\
 	cfi_restore (lr);						\
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index cb237d9..dae9d98 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -45,6 +45,21 @@
 
 #ifdef __ASSEMBLER__
 
+/* Call the linux kernel kuser_get_tls helper.  Returns in R0, clobbers LR.
+   Note that in thumb mode, a constant pool break is often out of range, so
+   we always expand the constant inline.  */
+#ifdef __thumb2__
+# define GET_TLS \
+	movw	r0, #0x0fe0;	\
+	movt	r0, #0xffff;	\
+	blx r0
+#else
+# define GET_TLS \
+	mov	r0, #0xffff0fff;	/* Point to the high page.  */	\
+	mov	lr, pc;			/* Save our return address.  */	\
+	sub	pc, r0, #31		/* Jump to the TLS entry.  */
+#endif
+
 /* Linux uses a negative return value to indicate syscall errors,
    unlike most Unices, which use the condition codes' carry flag.
 
-- 
1.8.1.2

* [PATCH 10/26] arm: Introduce and use LDST_PCREL
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  1:00   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 26/26] arm: Add optimized add_n and sub_n Richard Henderson
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Macro-ize the few instances where we need to distinguish between
ARM and Thumb pc-relative memory operations.
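
The literal that LDST_PCREL emits, EXPR - 99f - pc_ofs, is chosen so that adding it to pc as read at label 99 yields EXPR again. The C fragment below is a minimal sketch of that arithmetic with hypothetical function names. It assumes pc reads as the instruction address plus 8 in ARM state (the pc_ofs value visible in the diff context) and plus 4 in Thumb state (taken from the ARM architecture, not shown in this excerpt).

```c
#include <assert.h>
#include <stdint.h>

/* How far ahead of the current instruction pc reads (assumed values).  */
enum { PC_OFS_ARM = 8, PC_OFS_THUMB = 4 };

/* The word emitted by ".word EXPR - 99f - pc_ofs".  */
static uint32_t pcrel_literal(uint32_t expr, uint32_t label99,
                              uint32_t pc_ofs)
{
    return expr - label99 - pc_ofs;
}

/* The address formed at label 99: pc as read there, plus the literal.
   ARM does this inside "OP R, [pc, T]"; Thumb-2 uses a separate
   "add T, T, pc" followed by "OP R, [T]".  */
static uint32_t pcrel_resolve(uint32_t literal, uint32_t label99,
                              uint32_t pc_ofs)
{
    uint32_t pc_as_read = label99 + pc_ofs;
    return pc_as_read + literal;
}
```

For any target and label, pcrel_resolve(pcrel_literal(e, l, ofs), l, ofs) returns e, including targets that lie before the label, because the unsigned arithmetic wraps modulo 2^32.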
---
	* sysdeps/arm/sysdep.h (LDST_PCREL): New macro.
	* sysdeps/unix/arm/sysdep.S (__syscall_error): Use LDST_PCREL.
	Fix up gottpoff load of errno for thumb2.
	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
	(SINGLE_THREAD_P): Use LDST_PCREL.
	(PSEUDO_PROLOGUE): Remove.
	(PSEUDO): Don't use it.
	* sysdeps/unix/sysv/linux/arm/sysdep.h (SYSCALL_ERROR_HANDLER):
	Use LDST_PCREL.
---
 ports/sysdeps/arm/sysdep.h                         | 18 +++++++++++++
 ports/sysdeps/unix/arm/sysdep.S                    | 30 +++++++++++-----------
 .../unix/sysv/linux/arm/nptl/sysdep-cancel.h       | 10 ++------
 ports/sysdeps/unix/sysv/linux/arm/sysdep.h         | 10 +++-----
 4 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 4a9f05a..b7ba9b1 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -129,4 +129,22 @@
 # define pc_ofs		8
 #endif
 
+/* Load or store to/from a pc-relative EXPR into/from R, using T.  */
+#ifdef __thumb2__
+# define LDST_PCREL(OP, R, T, EXPR) \
+	ldr	T, 98f;					\
+	.subsection 2;					\
+98:	.word	EXPR - 99f - pc_ofs;			\
+	.previous;					\
+99:	add	T, T, pc;				\
+	OP	R, [T]
+#else
+# define LDST_PCREL(OP, R, T, EXPR) \
+	ldr	T, 98f;					\
+	.subsection 2;					\
+98:	.word	EXPR - 99f - pc_ofs;			\
+	.previous;					\
+99:	OP	R, [pc, T]
+#endif
+
 #endif	/* __ASSEMBLER__ */
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index 951642f..969628a 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -37,26 +37,26 @@ __syscall_error:
 #endif
 
 #ifndef IS_IN_rtld
-	mov ip, lr
+	mov	ip, lr
 	cfi_register (lr, ip)
-	mov r1, r0
-
+	mov	r1, r0
 	GET_TLS
+	ldr	r2, 1f
+#ifdef __thumb__
+2:	add	r2, r2, pc
+	ldr	r2, [r2]
+#else
+2:	ldr	r2, [pc, r2]
+#endif
+	str	r1, [r0, r2]
+	mvn	r0, #0
+	DO_RET(ip)
 
-	ldr r2, 1f
-2:	ldr r2, [pc, r2]
-	str r1, [r0, r2]
-	mvn r0, #0
-	RETINSTR (, ip)
-
-1:	.word errno(gottpoff) + (. - 2b - pc_ofs)
+1:	.word	errno(gottpoff) + (. - 2b - pc_ofs)
 #elif RTLD_PRIVATE_ERRNO
-	ldr r1, 1f
-0:	str r0, [pc, r1]
-	mvn r0, $0
+	LDST_PCREL(str, r0, r1, C_SYMBOL_NAME(rtld_errno))
+	mvn	r0, #0
 	DO_RET(r14)
-
-1:	.word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs
 #else
 #error "Unsupported non-TLS case"
 #endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index 1745f9e..b6dc3e0 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -31,7 +31,6 @@
 # undef PSEUDO
 # define PSEUDO(name, syscall_name, args)				\
   .section ".text";							\
-    PSEUDO_PROLOGUE;							\
   .type __##syscall_name##_nocancel,%function;				\
   .globl __##syscall_name##_nocancel;					\
   __##syscall_name##_nocancel:						\
@@ -207,12 +206,8 @@ extern int __local_multiple_threads attribute_hidden;
 #   define SINGLE_THREAD_P __builtin_expect (__local_multiple_threads == 0, 1)
 #  else
 #   define SINGLE_THREAD_P						\
-  ldr ip, 1b;								\
-2:									\
-  ldr ip, [pc, ip];							\
-  teq ip, #0;
-#   define PSEUDO_PROLOGUE						\
-  1:  .word __local_multiple_threads - 2f - pc_ofs;
+	LDST_PCREL(ldr, ip, ip, __local_multiple_threads);		\
+	teq ip, #0
 #  endif
 # else
 /*  There is no __local_multiple_threads for librt, so use the TCB.  */
@@ -221,7 +216,6 @@ extern int __local_multiple_threads attribute_hidden;
   __builtin_expect (THREAD_GETMEM (THREAD_SELF,				\
 				   header.multiple_threads) == 0, 1)
 #  else
-#   define PSEUDO_PROLOGUE
 #   define SINGLE_THREAD_P						\
   stmfd	sp!, {r0, lr};							\
   cfi_adjust_cfa_offset (8);						\
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index c1f2c9e..e448e61 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -125,12 +125,10 @@
 # if RTLD_PRIVATE_ERRNO
 #  define SYSCALL_ERROR_HANDLER					\
 __local_syscall_error:						\
-       ldr     r1, 1f;						\
-       rsb     r0, r0, #0;					\
-0:     str     r0, [pc, r1];					\
-       mvn     r0, #0;						\
-       DO_RET(lr);						\
-1:     .word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs;
+	rsb	r0, r0, #0;					\
+	LDST_PCREL(str, r0, r1, C_SYMBOL_NAME(rtld_errno));	\
+	mvn	r0, #0;						\
+	DO_RET(lr)
 # else
 #  if defined(__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
 #   define POP_PC \
-- 
1.8.1.2

* [PATCH 21/26] arm: Implement armv6t2 optimized strcpy
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Four times faster than the byte-by-byte default version.
---
	* sysdeps/arm/armv6t2/strcpy.S: New file.
	* sysdeps/arm/armv6t2/stpcpy.S: New file.
---
 ports/sysdeps/arm/armv6t2/stpcpy.S |   1 +
 ports/sysdeps/arm/armv6t2/strcpy.S | 213 +++++++++++++++++++++++++++++++++++++
 2 files changed, 214 insertions(+)
 create mode 100644 ports/sysdeps/arm/armv6t2/stpcpy.S
 create mode 100644 ports/sysdeps/arm/armv6t2/strcpy.S

diff --git a/ports/sysdeps/arm/armv6t2/stpcpy.S b/ports/sysdeps/arm/armv6t2/stpcpy.S
new file mode 100644
index 0000000..21a4f38
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/stpcpy.S
@@ -0,0 +1 @@
+/* Defined in strcpy.S.  */
diff --git a/ports/sysdeps/arm/armv6t2/strcpy.S b/ports/sysdeps/arm/armv6t2/strcpy.S
new file mode 100644
index 0000000..a0e3bcc
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strcpy.S
@@ -0,0 +1,213 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+/* Endian independent macros for shifting bytes within registers.  */
+#ifdef __ARMEB__
+#define lsh_gt		lsr
+#define lsh_ls		lsl
+#else
+#define lsh_gt		lsl
+#define lsh_ls		lsr
+#endif
+
+#ifndef USE_AS_STPCPY
+# define STRCPY strcpy
+#endif
+
+	.syntax unified
+	.text
+
+ENTRY(__stpcpy)
+	@ Signal stpcpy with NULL in IP.
+	s(mov)	ip, #0
+	b	0f
+END(__stpcpy)
+
+weak_alias (__stpcpy, stpcpy)
+libc_hidden_def (__stpcpy)
+libc_hidden_builtin_def (stpcpy)
+
+ENTRY(strcpy)
+	@ Signal strcpy with DEST in IP.
+	mov	ip, r0
+0:
+	@ To cater to long strings, we want 8 byte alignment in the source.
+	@ To cater to small strings, we don't want to start that right away.
+	@ Loop up to 16 times, less whatever it takes to reach alignment.
+	and	r3, r1, #7
+	rsb	r3, r3, #16
+
+	@ Loop until we find ...
+1:	ldrb	r2, [r1], #1
+	subs	r3, r3, #1		@ ... the alignment point
+	strb	r2, [r0], #1
+	it	ne
+	cmpne	r2, #0			@ ... or EOS
+	bne	1b
+
+	@ Disambiguate the exit possibilities above
+	cmp	r2, #0			@ Found EOS
+	beq	.Lreturn
+
+	@ Load the next two words asap
+	ldrd	r2, r3, [r1], #8
+
+	@ For longer strings, we actually need a stack frame.
+	push	{ r4, r5, r6, r7 }
+	cfi_adjust_cfa_offset (16)
+	cfi_rel_offset (r4, 0)
+	cfi_rel_offset (r5, 4)
+	cfi_rel_offset (r6, 8)
+	cfi_rel_offset (r7, 12)
+
+	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+	@ that was originally zero and 0xff otherwise.  Therefore we consider
+	@ the lsb of each byte the "found" bit, with 0 for a match.
+	movw	r7, #0xfefe
+	tst	r0, #3			@ Test alignment of DEST
+	movt	r7, #0xfefe
+	bne	.Lunaligned
+
+	@ So now source (r1) is aligned to 8, and dest (r0) is aligned to 4.
+	@ Loop, reading 8 bytes at a time, searching for EOS.
+	.balign	16
+2:	uqadd8	r4, r2, r7		@ Find EOS
+	uqadd8	r5, r3, r7
+	pld	[r1, #256]
+	mvns	r4, r4			@ EOS in first word?
+
+	pld	[r0, #256]
+	bne	3f
+	str	r2, [r0], #4
+	mvns	r5, r5			@ EOS in second word?
+
+	bne	4f
+	str	r3, [r0], #4
+	ldrd	r2, r3, [r1], #8
+	b	2b
+
+3:	s(sub)	r1, r1, #4		@ backup to first word
+4:	s(sub)	r1, r1, #4		@ backup to second word
+
+	@ ... then finish up any tail a byte at a time.
+	@ Note that we generally back up and re-read source bytes,
+	@ but we'll not re-write dest bytes.
+.Lbyte_loop:
+	ldrb	r2, [r1], #1
+	cmp	r2, #0
+	strb	r2, [r0], #1
+	bne	.Lbyte_loop
+
+	pop	{ r4, r5, r6, r7 }
+	cfi_remember_state
+	cfi_adjust_cfa_offset (-16)
+	cfi_restore (r4)
+	cfi_restore (r5)
+	cfi_restore (r6)
+	cfi_restore (r7)
+
+.Lreturn:
+	cmp	ip, #0			@ Was this stpcpy or strcpy?
+	ite	eq
+	subeq	r0, r0, #1		@ stpcpy: undo post-inc from store
+	movne	r0, ip			@ strcpy: return original dest
+	bx	lr
+
+.Lunaligned:
+	cfi_restore_state
+	@ Here, source is aligned to 8, but the destination is not word
+	@ aligned.  Therefore we have to shift the data in order to be
+	@ able to perform aligned word stores.
+
+	@ Find out which misalignment we're dealing with.
+	tst	r0, #1
+	beq	.Lunaligned2
+	tst	r0, #2
+	bne	.Lunaligned3
+	@ Fallthru to .Lunaligned1.
+
+.macro unaligned_copy	unalign
+	@ Prologue to unaligned loop.  Seed shifted non-zero bytes.
+	uqadd8	r4, r2, r7		@ Find EOS
+	uqadd8	r5, r3, r7
+	mvns	r4, r4			@ EOS in first word?
+	it	ne
+	subne	r1, r1, #8
+	bne	.Lbyte_loop
+#ifdef __ARMEB__
+	rev	r2, r2			@ Byte stores below need LE data
+#endif
+	@ Store a few bytes from the first word.
+	@ At the same time we align r0 and shift out bytes from r2.
+.rept	4-\unalign
+	strb	r2, [r0], #1
+	s(lsr)	r2, r2, #8
+.endr
+#ifdef __ARMEB__
+	rev	r2, r2			@ Undo previous rev
+#endif
+	@ Rotated unaligned copy loop.  The tail of the prologue is
+	@ shared with the loop itself.
+	.balign 8
+1:	mvns	r5, r5			@ EOS in second word?
+	bne	4f
+	@ Combine first and second words
+	orr	r2, r2, r3, lsh_gt #(\unalign*8)
+	@ Save leftover bytes from the two words
+	lsh_ls	r6, r3, #((4-\unalign)*8)
+	str	r2, [r0], #4
+	@ The "real" start of the unaligned copy loop.
+	ldrd	r2, r3, [r1], #8	@ Load 8 more bytes
+	uqadd8	r4, r2, r7		@ Find EOS
+	pld	[r1, #256]
+	uqadd8	r5, r3, r7
+	pld	[r0, #256]
+	mvns	r4, r4			@ EOS in first word?
+	bne	3f
+	@ Combine the leftover and the first word
+	orr	r6, r6, r2, lsh_gt #(\unalign*8)
+	@ Discard used bytes from the first word.
+	lsh_ls	r2, r2, #((4-\unalign)*8)
+	str	r6, [r0], #4
+	b	1b
+	@ Found EOS in one of the words; adjust backward
+3:	s(sub)	r1, r1, #4
+	mov	r2, r6
+4:	s(sub)	r1, r1, #4
+	@ And store the remaining bytes from the leftover
+#ifdef __ARMEB__
+	rev	r2, r2
+#endif
+.rept	\unalign
+	strb	r2, [r0], #1
+	s(lsr)	r2, r2, #8
+.endr
+	b	.Lbyte_loop
+.endm
+
+.Lunaligned1:
+	unaligned_copy	1
+.Lunaligned2:
+	unaligned_copy	2
+.Lunaligned3:
+	unaligned_copy	3
+
+END(STRCPY)
+
+libc_hidden_builtin_def (strcpy)
-- 
1.8.1.2
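
The end-of-string test in the patch above relies on UQADD8 with the
constant 0xfefefefe.  As a rough illustration only, the following C
sketch models that instruction byte by byte; the helper names uqadd8
and has_zero_byte are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Portable model of the ARM UQADD8 instruction: four independent
   unsigned saturating byte additions.  */
static uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = ((a >> (8 * i)) & 0xff) + ((b >> (8 * i)) & 0xff);
        if (s > 0xff)
            s = 0xff;                   /* saturate at the byte maximum */
        r |= s << (8 * i);
    }
    return r;
}

/* Mirrors the uqadd8/mvns pair in the copy loop.  A zero byte becomes
   0xfe and every other byte saturates to 0xff, so after inversion only
   zero bytes leave a set bit (the lsb of that byte).  The result is
   nonzero exactly when WORD contains a zero byte.  */
static uint32_t has_zero_byte(uint32_t word)
{
    return ~uqadd8(word, 0xfefefefeu);
}
```

A nonzero return value corresponds to the "ne" condition that the
assembly branches on after mvns.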

* [PATCH 25/26] arm: Add optimized submul_1
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-27  3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
GPL 3, but tested with the GMP testsuite.

This is 50% faster than the generic code as measured on Cortex-A15.
It is 25% slower than the current GMP routine on the same core.
---
	* sysdeps/arm/submul_1.S: New file.
---
 ports/sysdeps/arm/submul_1.S | 67 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 ports/sysdeps/arm/submul_1.S

diff --git a/ports/sysdeps/arm/submul_1.S b/ports/sysdeps/arm/submul_1.S
new file mode 100644
index 0000000..35e1348
--- /dev/null
+++ b/ports/sysdeps/arm/submul_1.S
@@ -0,0 +1,67 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.syntax unified
+	.text
+
+@		cycles/limb
+@ StrongArm	   ?
+@ Cortex-A8	   ?
+@ Cortex-A9	   ?
+@ Cortex-A15	   4
+
+/* mp_limb_t mpn_submul_1(res_ptr, src1_ptr, size, s2_limb) */
+
+ENTRY(__mpn_submul_1)
+	push	{ r4, r5, r6, r7 }
+	cfi_adjust_cfa_offset (16)
+	cfi_rel_offset (r4, 0)
+	cfi_rel_offset (r5, 4)
+	cfi_rel_offset (r6, 8)
+	cfi_rel_offset (r7, 12)
+
+	ldr	r6, [r1], #4
+	ldr	r7, [r0]
+	mov	r4, #0			/* init carry in */
+	b	1f
+0:
+	ldr	r6, [r1], #4		/* load next ul */
+	adds	r5, r5, r4		/* (lpl, c) = lpl + cl */
+	adc	r4, ip, #0		/* cl = hpl + c */
+	subs	r5, r7, r5		/* (lpl, !c) = rl - lpl */
+	ldr	r7, [r0, #4]		/* load next rl */
+	it	cc
+	addcc	r4, r4, #1		/* cl += !c */
+	str	r5, [r0], #4
+1:
+	umull	r5, ip, r6, r3		/* (hpl, lpl) = ul * vl */
+	subs	r2, r2, #1
+	bne	0b
+
+	adds	r5, r5, r4		/* (lpl, c) = lpl + cl */
+	adc	r4, ip, #0		/* cl = hpl + c */
+	subs	r5, r7, r5		/* (lpl, !c) = rl - lpl */
+	str	r5, [r0], #4
+	ite	cc
+	addcc	r0, r4, #1		/* cl += !c */
+	movcs	r0, r4			/* return carry */
+
+	pop	{ r4, r5, r6, r7 }
+	DO_RET(lr)
+END(__mpn_submul_1)
-- 
1.8.1.2
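
For readers who want to check the assembly above against something
simpler, here is a minimal C reference model of the same operation.
It is a sketch under two assumptions: limbs are 32 bits wide, and
size is at least 1 (the assembly loads the first limb before testing
the count).  The name ref_submul_1 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference model: rp[0..n-1] -= up[0..n-1] * v, returning the borrow
   out of the most significant limb.  */
static uint32_t ref_submul_1(uint32_t *rp, const uint32_t *up,
                             size_t n, uint32_t v)
{
    uint32_t cl = 0;                    /* carry limb, as in the assembly */

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        /* (hpl, lpl) = ul * vl + cl; this cannot overflow 64 bits.  */
        uint64_t p = (uint64_t)up[i] * v + cl;
        uint32_t lpl = (uint32_t)p;
        uint32_t rl = rp[i];

        /* cl = hpl + borrow from (rl - lpl).  */
        cl = (uint32_t)(p >> 32) + (rl < lpl);
        rp[i] = rl - lpl;
    }
    return cl;
}
```

The 64-bit product folds the adds/adc pair of the assembly into one
step; the borrow test plays the role of the conditional addcc.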

* [PATCH 15/26] arm: Delete LOADREGS macro
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (21 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  1:24   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 12/26] arm: Enable thumb2 mode in assembly files Richard Henderson
                   ` (5 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

There was only one user.  Its "condition" argument was used
for "ia" rather than an actual condition.  The apcs26 syntax
is almost certainly not needed, given current binutils requirements.
---
	* sysdeps/arm/__longjmp.S (__longjmp): Use ldmia insn directly.
	* sysdeps/arm/sysdep.h (LOADREGS): Remove.
---
 ports/sysdeps/arm/__longjmp.S | 2 +-
 ports/sysdeps/arm/sysdep.h    | 4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/ports/sysdeps/arm/__longjmp.S b/ports/sysdeps/arm/__longjmp.S
index af4b963..050227b 100644
--- a/ports/sysdeps/arm/__longjmp.S
+++ b/ports/sysdeps/arm/__longjmp.S
@@ -37,7 +37,7 @@ ENTRY (__longjmp)
 	cfi_undefined (r4)
 	CHECK_SP (r4)
 #endif
-	LOADREGS(ia, ip!, {v1-v6, sl, fp, sp, lr})
+	ldmia	ip!, {v1-v6, sl, fp, sp, lr}
 	cfi_restore (v1)
 	cfi_restore (v2)
 	cfi_restore (v3)
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index fed3dfd..bfdba27 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -35,8 +35,6 @@
 
 /* APCS-32 doesn't preserve the condition codes across function call. */
 #ifdef __APCS_32__
-#define LOADREGS(cond, base, reglist...)\
-	ldm##cond	base,reglist
 #ifdef __USE_BX__
 #define RETINSTR(cond, reg)	\
 	bx##cond	reg
@@ -49,8 +47,6 @@
 	mov pc, _reg
 #endif
 #else  /* APCS-26 */
-#define LOADREGS(cond, base, reglist...)\
-	ldm##cond	base,reglist^
 #define RETINSTR(cond, reg)	\
 	mov##cond##s	pc, reg
 #define DO_RET(_reg)		\
-- 
1.8.1.2

* [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28 21:57   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 21/26] arm: Implement armv6t2 optimized strcpy Richard Henderson
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Unless we're trying old interworking, there's no point restoring to
LR first.  Everything from armv5 on handles pop as an interworking jump.
---
	* sysdeps/arm/arm-mcount.S (_mcount): Use pop into pc unless
	__ARM_ARCH_4T__ and __THUMB_INTERWORK__.
	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_undefweak): Likewise.
	(_dl_tlsdesc_dynamic): Likewise.
---
 ports/sysdeps/arm/arm-mcount.S |  6 +++---
 ports/sysdeps/arm/dl-tlsdesc.S | 15 ++++++++++++---
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/ports/sysdeps/arm/arm-mcount.S b/ports/sysdeps/arm/arm-mcount.S
index b6e5ec7..8ad0779 100644
--- a/ports/sysdeps/arm/arm-mcount.S
+++ b/ports/sysdeps/arm/arm-mcount.S
@@ -82,9 +82,7 @@ ENTRY(_mcount)
 	ldrne r0, [r0, #-4]
 	movsne r1, lr
 	blne __mcount_internal
-#ifdef __thumb2__
-	pop	{r0, r1, r2, r3, fp, pc}
-#else
+#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
 	pop	{r0, r1, r2, r3, fp, lr}
 	cfi_adjust_cfa_offset (-24)
 	cfi_restore (r0)
@@ -94,6 +92,8 @@ ENTRY(_mcount)
 	cfi_restore (fp)
 	cfi_restore (lr)
 	bx lr
+#else
+	pop	{r0, r1, r2, r3, fp, pc}
 #endif
 END(_mcount)
 
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 417b8b3..6c47743 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -51,10 +51,14 @@ _dl_tlsdesc_undefweak:
 	cfi_rel_offset (lr,0)
 	bl 	__aeabi_read_tp
 	rsb 	r0, r0, #0
+#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
 	pop	{lr}
 	cfi_adjust_cfa_offset (-4)
 	cfi_restore (lr)
-	BX	(lr)
+	bx	lr
+#else
+	pop	{pc}
+#endif
 
 	cfi_endproc
 	.fnend
@@ -118,13 +122,18 @@ _dl_tlsdesc_dynamic:
 1:	mov	r0, r1
 	bl	__tls_get_addr
 	rsb	r0, r4, r0
-2:	pop	{r2,r3,r4, lr}
+2:
+#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
+	pop	{r2,r3,r4, lr}
 	cfi_adjust_cfa_offset (-16)
 	cfi_restore (lr)
 	cfi_restore (r4)
 	cfi_restore (r3)
 	cfi_restore (r2)
-	BX      (lr)
+	bx	lr
+#else
+	pop	{r2,r3,r4, pc}
+#endif
 	.fnend
 	cfi_endproc
 	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
-- 
1.8.1.2

* [PATCH 06/26] arm: Use pc_ofs
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 25/26] arm: Add optimized submul_1 Richard Henderson
@ 2013-02-27  3:17 ` Richard Henderson
  2013-02-28  0:21   ` Joseph S. Myers
  2013-02-27  3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  3:17 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Scour the source for raw "-8" adjustments that are related to the
offset created by reading the pc.
---
	* sysdeps/arm/__longjmp.S (__longjmp): Use pc_ofs.
	* sysdeps/arm/setjmp.S (__sigsetjmp): Likewise.
	* sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
	* sysdeps/unix/sysv/linux/arm/getcontext.S (__getcontext): Likewise.
	* sysdeps/unix/sysv/linux/arm/setcontext.S (__startcontext): Likewise.
	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
	(SINGLE_THREAD_P): Likewise.
	* sysdeps/unix/sysv/linux/arm/sysdep.h
	(SYSCALL_ERROR_HANDLER): Likewise.
---
 ports/sysdeps/arm/__longjmp.S                          | 4 ++--
 ports/sysdeps/arm/setjmp.S                             | 4 ++--
 ports/sysdeps/unix/arm/sysdep.S                        | 4 ++--
 ports/sysdeps/unix/sysv/linux/arm/getcontext.S         | 2 +-
 ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h | 2 +-
 ports/sysdeps/unix/sysv/linux/arm/setcontext.S         | 2 +-
 ports/sysdeps/unix/sysv/linux/arm/sysdep.h             | 2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/ports/sysdeps/arm/__longjmp.S b/ports/sysdeps/arm/__longjmp.S
index 5c04f36..28281d5 100644
--- a/ports/sysdeps/arm/__longjmp.S
+++ b/ports/sysdeps/arm/__longjmp.S
@@ -105,12 +105,12 @@ ENTRY (__longjmp)
 
 #ifdef NEED_HWCAP
 # ifdef IS_IN_rtld
-1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - 8
+1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
 .Lrtld_local_ro:
 	.long	C_SYMBOL_NAME(_rtld_local_ro)(GOTOFF)
 # else
 #  ifdef PIC
-1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - 8
+1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
 .Lrtld_global_ro:
 	.long	C_SYMBOL_NAME(_rtld_global_ro)(GOT)
 #  else
diff --git a/ports/sysdeps/arm/setjmp.S b/ports/sysdeps/arm/setjmp.S
index 4b7542a..774c78a 100644
--- a/ports/sysdeps/arm/setjmp.S
+++ b/ports/sysdeps/arm/setjmp.S
@@ -91,12 +91,12 @@ ENTRY (__sigsetjmp)
 
 #ifdef NEED_HWCAP
 # ifdef IS_IN_rtld
-1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - 8
+1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
 .Lrtld_local_ro:
 	.long	C_SYMBOL_NAME(_rtld_local_ro)(GOTOFF)
 # else
 #  ifdef PIC
-1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - 8
+1:	.long	_GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
 .Lrtld_global_ro:
 	.long	C_SYMBOL_NAME(_rtld_global_ro)(GOT)
 #  else
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index da07d85..76137b3 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -50,14 +50,14 @@ __syscall_error:
 	mvn r0, #0
 	RETINSTR (, ip)
 
-1:	.word errno(gottpoff) + (. - 2b - 8)
+1:	.word errno(gottpoff) + (. - 2b - pc_ofs)
 #elif RTLD_PRIVATE_ERRNO
 	ldr r1, 1f
 0:	str r0, [pc, r1]
 	mvn r0, $0
 	DO_RET(r14)
 
-1:	.word C_SYMBOL_NAME(rtld_errno) - 0b - 8
+1:	.word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs
 #else
 #error "Unsupported non-TLS case"
 #endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/getcontext.S b/ports/sysdeps/unix/sysv/linux/arm/getcontext.S
index f7857c1..69cae48 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/getcontext.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/getcontext.S
@@ -103,7 +103,7 @@ ENTRY(__getcontext)
 END(__getcontext)
 
 #ifdef PIC
-1:      .long   _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1:      .long   _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
 .Lrtld_global_ro:
 	.long   C_SYMBOL_NAME(_rtld_global_ro)(GOT)
 #else
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index 1b0a244..1745f9e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -212,7 +212,7 @@ extern int __local_multiple_threads attribute_hidden;
   ldr ip, [pc, ip];							\
   teq ip, #0;
 #   define PSEUDO_PROLOGUE						\
-  1:  .word __local_multiple_threads - 2f - 8;
+  1:  .word __local_multiple_threads - 2f - pc_ofs;
 #  endif
 # else
 /*  There is no __local_multiple_threads for librt, so use the TCB.  */
diff --git a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
index 8e71f5b..8d96c57 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
@@ -93,7 +93,7 @@ ENTRY(__startcontext)
 END(__startcontext)
 
 #ifdef PIC
-1:      .long   _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1:      .long   _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
 .Lrtld_global_ro:
 	.long   C_SYMBOL_NAME(_rtld_global_ro)(GOT)
 #else
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index 6b5bb14..cb237d9 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -114,7 +114,7 @@ __local_syscall_error:						\
 0:     str     r0, [pc, r1];					\
        mvn     r0, #0;						\
        DO_RET(lr);						\
-1:     .word C_SYMBOL_NAME(rtld_errno) - 0b - 8;
+1:     .word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs;
 # else
 #  if defined(__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
 #   define POP_PC \
-- 
1.8.1.2
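
The arithmetic behind the "- pc_ofs" adjustment in the patch above
can be sketched in C.  The definition of pc_ofs is not part of this
patch, so the values below are assumptions: reading pc yields the
instruction address plus 8 in ARM state and plus 4 in Thumb state.
The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed pc read-ahead for each instruction set state.  */
enum { PC_OFS_ARM = 8, PC_OFS_THUMB = 4 };

/* Value stored in the literal word by "target - label - pc_ofs",
   where LABEL_ADDR is the address of the instruction that reads pc.  */
static int32_t literal_word(uint32_t target, uint32_t label_addr, int pc_ofs)
{
    return (int32_t)(target - label_addr - (uint32_t)pc_ofs);
}

/* Address formed at run time when that instruction adds the literal
   to the value it reads from pc.  */
static uint32_t effective_address(uint32_t label_addr, int pc_ofs,
                                  int32_t literal)
{
    return label_addr + (uint32_t)pc_ofs + (uint32_t)literal;
}
```

With matching pc_ofs on both sides the two adjustments cancel and the
instruction reaches the target; a mismatch would leave the address
off by the difference between the two values.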

* Re: [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8
  2013-02-27  3:17 ` [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8 Richard Henderson
@ 2013-02-27  7:04   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27  7:04 UTC (permalink / raw)
  To: libc-ports; +Cc: Joseph Myers

Ignore this one for now.  I've somehow lost the fixed, working version 
during rebasing and moving patches around.  This one's broken.

r~

* Re: [PATCH 00/26] ARM improvements
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (25 preceding siblings ...)
  2013-02-27  3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
@ 2013-02-27 15:41 ` Måns Rullgård
  2013-02-27 16:59 ` Joseph S. Myers
  2013-02-28 22:05 ` Joseph S. Myers
  28 siblings, 0 replies; 63+ messages in thread
From: Måns Rullgård @ 2013-02-27 15:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports, Joseph Myers

Richard Henderson <rth@twiddle.net> writes:

> Patches 19-23 add improved string routines for armv6t2.  I've had these
> hanging around for almost 2 years without properly submitting them. 
> Which is perhaps a bit silly, but the A8 host I was originally doing
> testing on has a dreadfully low resolution clock, so it was hard to get
> real numbers. Whereas the A15 has a 1ns resolution CLOCK_MONOTONIC_RAW.
> I can post the benchmarks under separate cover if you like.

With a suitable kernel hack, you can access the cycle counter from
userspace on any ARMv7 device.  This is probably more accurate and
avoids the syscall overhead.

-- 
Måns Rullgård
mans@mansr.com

* Re: [PATCH 19/26] arm: Add optimized ffs for armv6t2
  2013-02-27  3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
@ 2013-02-27 15:51   ` Måns Rullgård
  2013-02-27 16:34     ` Richard Henderson
  2013-02-27 17:49   ` Roland McGrath
  1 sibling, 1 reply; 63+ messages in thread
From: Måns Rullgård @ 2013-02-27 15:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports, Joseph Myers

Richard Henderson <rth@twiddle.net> writes:

> +ENTRY(__ffs)
> +	cmp	r0, #0
> +	ittt	ne
> +	rbitne	r0, r0
> +	clzne	r0, r0
> +	addne	r0, r0, #1
> +	bx	lr
> +END(__ffs)

Making the RBIT unconditional (bit-reverse of zero is still zero) is
better since it reduces dependencies between instructions.  Depending on
microarchitecture details, this might save a cycle.
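
As a sanity check of the suggestion above, the following C sketch
models RBIT and CLZ and applies the bit reverse unconditionally.  It
assumes, as on ARM, that CLZ of zero returns 32; the names rbit32,
clz32 and ffs_model are hypothetical.

```c
#include <stdint.h>

/* Model of RBIT: reverse the bit order of a 32-bit word.  */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Model of CLZ: count leading zero bits; 32 for a zero input.  */
static int clz32(uint32_t x)
{
    int n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ffs with the bit reverse hoisted out of the conditional part.
   For a zero input the reversed value is still zero, so it can be
   returned as is; only the clz/add pair stays conditional.  */
static int ffs_model(uint32_t x)
{
    uint32_t r = rbit32(x);             /* unconditional */
    if (x != 0)
        r = (uint32_t)clz32(r) + 1;     /* conditional on x != 0 */
    return (int)r;
}
```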

-- 
Måns Rullgård
mans@mansr.com

* Re: [PATCH 19/26] arm: Add optimized ffs for armv6t2
  2013-02-27 15:51   ` Måns Rullgård
@ 2013-02-27 16:34     ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 16:34 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: libc-ports, Joseph Myers

On 02/27/2013 07:51 AM, Måns Rullgård wrote:
> Richard Henderson <rth@twiddle.net> writes:
> 
>> +ENTRY(__ffs)
>> +	cmp	r0, #0
>> +	ittt	ne
>> +	rbitne	r0, r0
>> +	clzne	r0, r0
>> +	addne	r0, r0, #1
>> +	bx	lr
>> +END(__ffs)
> 
> Making the RBIT unconditional (bit-reverse of zero is still zero) is
> better since it reduces dependencies between instructions.  Depending on
> microarchitecture details, this might save a cycle.
> 

Fair enough.  Consider this change made for any next round.


r~

* Re: [PATCH 00/26] ARM improvements
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (26 preceding siblings ...)
  2013-02-27 15:41 ` [PATCH 00/26] ARM improvements Måns Rullgård
@ 2013-02-27 16:59 ` Joseph S. Myers
  2013-02-27 17:34   ` Richard Henderson
  2013-02-28 22:05 ` Joseph S. Myers
  28 siblings, 1 reply; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 16:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

Could you please clarify how these patches have been tested?  In 
particular, what testing has been done for big-endian (I think the string 
functions at least do need big-endian testing - it should be possible to 
run string tests with userspace QEMU without needing BE hardware).

> Patches 4-18 improve the ability to build libc as a thumb2 binary.
> In the end, almost all assembly is done in thumb2 mode if -mthumb
> is present in ASFLAGS.  It's that last that's the sticky part: by
> default we copy only a couple of flags over from CFLAGS.  I'm not
> sure why we're not passing them all to the assembler.  So at the
> moment I'm just putting ASFLAGS on the make command-line to get
> what I want.

I'd typically expect builds to be done with CC containing any relevant 
options for this sort of thing, rather than CFLAGS.

That also raises the question of dependencies between the patches.  Given 
a patch series like this, each subset 1-N of the patches should generally 
leave the tree in a working state.  But if a patch (say patch 6) makes 
changes to .S code for __thumb2__ that are only correct after that 
actually means the generated code is Thumb-2 (patch 12) that leaves a 
broken intermediate state (given a compiler that defaults to Thumb-2, 
whether because configured --with-mode=thumb or because of the options in 
$CC), meaning the changes can't quite go in the given order (patch 5 could 
define pc_ofs unconditionally to 8 in assembly code, for example, and only 
patch 12 change the value for Thumb-2 assembly).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 01/26] Sync config.guess and config.sub with upstream
  2013-02-27  3:16 ` [PATCH 01/26] Sync config.guess and config.sub with upstream Richard Henderson
@ 2013-02-27 17:03   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 17:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> 	* scripts/config.guess: Merge upstream version 2013-02-12.
> 	* scripts/config.sub: Likewise.

Such verbatim updates from a new upstream version of a file taken from 
upstream (minus any trailing whitespace in the upstream version, that 
glibc's git hooks will reject) should just be committed and posted to 
libc-alpha (not libc-ports).

<http://sourceware.org/glibc/wiki/Regeneration> has a list of files taken 
from upstream sources like that.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 20/26] arm: Implement armv6t2 optimized strlen
  2013-02-27  3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
@ 2013-02-27 17:12   ` Måns Rullgård
  2013-02-27 17:44     ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Måns Rullgård @ 2013-02-27 17:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports, Joseph Myers

Richard Henderson <rth@twiddle.net> writes:

> +ENTRY(strlen)
> +	@ r0 = start of string
> +	pld	[r0]
> +
> +	@ To cater to long strings, we want to search through a few
> +	@ characters until we reach an aligned pointer.  To cater to
> +	@ small strings, we don't want to start doing word operations
> +	@ immediately.  The compromise is a maximum of 16 bytes less
> +	@ whatever is required to end with an aligned pointer.
> +	@ r3 = number of characters to search in alignment loop
> +	and	r3, r0, #7
> +	s(mov)	r1, r0			@ Save the input pointer
> +	rsb	r3, r3, #16
> +
> +	@ Loop until we find ...
> +1:	ldrb	r2, [r0], #1
> +	subs	r3, r3, #1		@ ... the alignment point
> +	it	ne
> +	cmpne	r2, #0			@ ... or EOS
> +	bne	1b
> +
> +	@ Disambiguate the exit possibilities above
> +	cmp	r2, #0			@ Found EOS
> +	ittt	eq
> +	subeq	r0, r0, #1		@ Undo post-inc above
> +	subeq	r0, r0, r1		@ Subtract input to compute length
> +	bxeq	lr
> +
> +	@ So now we're aligned.
> +	ldrd	r2, r3, [r0], #8
> +	movw	ip, #0xfefe
> +	pld	[r0, #64]
> +	movt	ip, #0xfefe
> +	pld	[r0, #128]
> +	pld	[r0, #192]
> +
> +	@ Loop searching for EOS or C, 8 bytes at a time.

This comment seems to be for strchr().

> +	@ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
> +	@ that was originally zero and 0xff otherwise.  Therefore we consider
> +	@ the lsb of each byte the "found" bit, with 0 for a match.
> +	.balign	16
> +2:	uqadd8	r2, r2, ip		@ Find EOS
> +	uqadd8	r3, r3, ip
> +	pld	[r0, #256]		@ Prefetch 4 lines ahead
> +	s(and)	r3, r3, r2		@ Combine the two words
> +	mvns	r3, r3			@ Test for any found bit true
> +	it	eq
> +	ldrdeq	r2, r3, [r0], #8
> +	beq	2b

Subtracting the values from 1 instead (with UQSUB8) would give 0 for
any non-zero input byte and 1 for "found", i.e. the inverse of what
you have here.  Testing for a match anywhere in the double-word then
becomes a single ORRS instruction.  Unless I'm making some stupid mistake.
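
A small C model of the UQSUB8 variant, offered only as a way to check
the claim: each byte of 0x01010101 minus the input byte, saturating
at zero, is 1 for a zero input byte and 0 otherwise.  The names
uqsub8, found_bits and pair_has_zero are hypothetical.

```c
#include <stdint.h>

/* Portable model of the ARM UQSUB8 instruction: four independent
   unsigned saturating byte subtractions (a - b, floor of 0).  */
static uint32_t uqsub8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xff;
        uint32_t y = (b >> (8 * i)) & 0xff;
        uint32_t d = x > y ? x - y : 0; /* saturate at zero */
        r |= d << (8 * i);
    }
    return r;
}

/* One "found" bit per byte, set in the lsb of each zero byte.  */
static uint32_t found_bits(uint32_t word)
{
    return uqsub8(0x01010101u, word);
}

/* The double-word test collapses to a single OR of the two results,
   which is what ORRS would compute while setting the flags.  */
static int pair_has_zero(uint32_t w0, uint32_t w1)
{
    return (found_bits(w0) | found_bits(w1)) != 0;
}
```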

> +	@ Found something.  Disambiguate between first and second words.
> +	@ Adjust r0 to point to the word containing the match.
> +	@ Adjust r2 to the found bits for the word containing the match.
> +	mvns	r2, r2
> +	itee	ne
> +	subne	r0, r0, #8
> +	moveq	r2, r3
> +	subeq	r0, r0, #4
> +
> +	@ Find the bit-offset of the match within the word.
> +#ifdef __ARMEL__
> +	rbit	r2, r2			@ For LE we need count-trailing-zeros
> +#endif
> +	clz	r2, r2
> +	add	r0, r0, r2, lsr #3	@ Adjust the pointer to the found byte
> +	s(sub)	r0, r0, r1		@ Subtract input to compute length
> +	bx	lr
> +
> +END(strlen)

This code could be made to work for any ARMv6 by (conditionally)
replacing the MOVW/MOVT with some equivalent and the RBIT by REV.  REV
works since only the lsb in each byte can be set, so the result of CLZ
will simply be 7 more than we want, and the 3 low-order bits are shifted
out anyway.
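
The REV argument can be checked with a short C sketch.  It assumes a
little-endian "found" word in which only the lsb of each byte may be
set, as described above; rbit32, rev32, clz32 and the two
first_match helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of RBIT: reverse the bit order of a 32-bit word.  */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Model of REV: reverse the byte order of a 32-bit word.  */
static uint32_t rev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
           | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Model of CLZ: count leading zero bits; 32 for a zero input.  */
static int clz32(uint32_t x)
{
    int n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Byte index of the first match, computed as in the posted code.  */
static int first_match_rbit(uint32_t found)
{
    assert(found != 0 && (found & ~0x01010101u) == 0);
    return clz32(rbit32(found)) >> 3;
}

/* Same index using REV: clz comes out 7 larger, and the shift by 3
   discards those low-order bits.  */
static int first_match_rev(uint32_t found)
{
    assert(found != 0 && (found & ~0x01010101u) == 0);
    return clz32(rev32(found)) >> 3;
}
```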

-- 
Måns Rullgård
mans@mansr.com

* Re: [PATCH 00/26] ARM improvements
  2013-02-27 16:59 ` Joseph S. Myers
@ 2013-02-27 17:34   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 17:34 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-ports

On 02/27/2013 08:58 AM, Joseph S. Myers wrote:
> Could you please clarify how these patches have been tested? 

As make check on a Cortex-A15 LE, both in arm and thumb mode.

> what testing has been done for big-endian (I think the string 
> functions at least do need big-endian testing - it should be possible to 
> run string tests with userspace QEMU without needing BE hardware).

I haven't done that recently, but I can do that again.

> I'd typically expect builds to be done with CC containing any relevant 
> options for this sort of thing, rather than CFLAGS.

Point.  I'll give that a try in future.

> That also raises the question of dependencies between the patches.  Given 
> a patch series like this, each subset 1-N of the patches should generally 
> leave the tree in a working state.  But if a patch (say patch 6) makes 
> changes to .S code for __thumb2__ that are only correct once that macro 
> actually means the generated code is Thumb-2 (patch 12) that leaves a 
> broken intermediate state (given a compiler that defaults to Thumb-2, 
> whether because configured --with-mode=thumb or because of the options in 
> $CC), meaning the changes can't quite go in the given order (patch 5 could 
> define pc_ofs unconditionally to 8 in assembly code, for example, and only 
> patch 12 change the value for Thumb-2 assembly).

I was attempting such an order, but I see what you mean about cpp conditionals
not matching up with the actual assembly mode.

As far as I can remember, pc_ofs is the only such example.  The rest of the
changes -- I'm thinking of how pc is added to addresses and negative offset
addressing -- while conditioned on __thumb2__ still produce valid ARM insns.


r~

* Re: [PATCH 20/26] arm: Implement armv6t2 optimized strlen
  2013-02-27 17:12   ` Måns Rullgård
@ 2013-02-27 17:44     ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 17:44 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: libc-ports, Joseph Myers

On 02/27/2013 09:12 AM, Måns Rullgård wrote:
> Richard Henderson <rth@twiddle.net> writes:
> 
>> +ENTRY(strlen)
...
>> +	@ Loop searching for EOS or C, 8 bytes at a time.
> 
> This comment seems to be for strchr().

Whoops.  As you can imagine there's some amount of cut and paste here.  ;-)

> Subtracting the values (with UQSUB8) from 1 instead would result in a 0
> for any non-zero input and a 1 for "found", i.e. the inverse of what
> you have here.  Testing for a match anywhere in the double-word then
> becomes a single ORRS instruction.  Unless I'm making some stupid mistake.

Yes, this works.  And a good idea for improvement.

> This code could be made to work for any ARMv6 by (conditionally)
> replacing the MOVW/MOVT with some equivalent and the RBIT by REV.  REV
> works since only the lsb in each byte can be set, so the result of CLZ
> will simply be 7 more than we want, and the 3 low-order bits are shifted
> out anyway.

Ah, I'd mis-read the document the first time round and thought uqadd8 was an
armv6t2 instruction.  I'll rearrange all these so that armv6 can benefit.

Which makes patch 3 once again useful... ;-)


r~

* Re: [PATCH 19/26] arm: Add optimized ffs for armv6t2
  2013-02-27  3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
  2013-02-27 15:51   ` Måns Rullgård
@ 2013-02-27 17:49   ` Roland McGrath
  1 sibling, 0 replies; 63+ messages in thread
From: Roland McGrath @ 2013-02-27 17:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports, Joseph Myers

Space before paren.  Descriptive line at top of new files.

* Re: [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8
  2013-02-27  3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
@ 2013-02-27 17:54   ` Joseph S. Myers
  2013-02-27 18:11     ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 17:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> -		  grep __ARM_ARCH |
> +		  grep '__ARM_ARCH_.*__' |

I think this should be restricted more closely to those likely to be 
relevant, using [0-9] before the .* - OK with that change (presuming it 
works).  Just this particular change would be reasonable to cherry-pick to 
2.17 branch as well for the benefit of anyone using that release branch 
with 4.8.
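
The effect of the three patterns can be compared on a few sample lines.
The sketch below uses Python's re module as a stand-in for grep (the
patterns use only syntax that means the same in both), and the sample
#define lines are made up for illustration; they are not captured
compiler output.

```python
import re

# Illustrative input only; real "gcc -dM -E" output will differ.
SAMPLE = [
    "#define __ARM_ARCH_7A__ 1",
    "#define __ARM_ARCH 7",
    "#define __ARM_ARCH_EXT_IDIV__ 1",
]

def grep(pattern, lines):
    """Return the lines that contain a match for pattern, like grep does."""
    return [line for line in lines if re.search(pattern, line)]

for pattern in ("__ARM_ARCH", "__ARM_ARCH_.*__", "__ARM_ARCH_[0-9].*__"):
    print(pattern, "->", grep(pattern, SAMPLE))
```

On this input only the [0-9] form narrows the result to the single
architecture macro.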

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
  2013-02-27  3:16 ` [PATCH 03/26] arm: Handle armv6 in preconfigure Richard Henderson
@ 2013-02-27 18:02   ` Joseph S. Myers
  2013-02-27 18:04     ` Roland McGrath
  2013-02-27 18:34     ` Richard Henderson
  0 siblings, 2 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 18:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> 	* sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.

Shouldn't all of 6J, 6K, 6Z, 6ZK be handled the same way?  That is, 
__ARM_ARCH_6*__ unless you want to list all the variants explicitly?

(Do you get any warnings from configure about the armv6 directory not 
existing?)

(When such a directory is actually added, an Implies file in armv6t2 will 
be needed to make that fall back to armv6 versions as appropriate.)

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
  2013-02-27 18:02   ` Joseph S. Myers
@ 2013-02-27 18:04     ` Roland McGrath
  2013-02-27 18:08       ` Richard Henderson
  2013-02-27 18:34     ` Richard Henderson
  1 sibling, 1 reply; 63+ messages in thread
From: Roland McGrath @ 2013-02-27 18:04 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Henderson, libc-ports

> (When such a directory is actually added, an Implies file in armv6t2 will 
> be needed to make that fall back to armv6 versions as appropriate.)

Why wouldn't it be a subdirectory?

* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
  2013-02-27 18:04     ` Roland McGrath
@ 2013-02-27 18:08       ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 18:08 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Joseph S. Myers, libc-ports

On 02/27/2013 10:04 AM, Roland McGrath wrote:
>> (When such a directory is actually added, an Implies file in armv6t2 will 
>> be needed to make that fall back to armv6 versions as appropriate.)
> 
> Why wouldn't it be a subdirectory?
> 
*shrug*

For the same reason that we don't nest all incremental architecture extensions:
it's tedious to find armv7 code in arm/armv5/armv6/armv6t2/armv7/ as opposed to
arm/armv7/.


r~

* Re: [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8
  2013-02-27 17:54   ` Joseph S. Myers
@ 2013-02-27 18:11     ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 18:11 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-ports

On 02/27/2013 09:54 AM, Joseph S. Myers wrote:
> On Tue, 26 Feb 2013, Richard Henderson wrote:
> 
>> -		  grep __ARM_ARCH |
>> +		  grep '__ARM_ARCH_.*__' |
> 
> I think this should be restricted more closely to those likely to be 
> relevant, using [0-9] before the .* - OK with that change (presuming it 
> works).  Just this particular change would be reasonable to cherry-pick to 
> 2.17 branch as well for the benefit of anyone using that release branch 
> with 4.8.
> 

Yes, this [0-9] change works.  I'll commit that separately.


r~

* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
  2013-02-27 18:02   ` Joseph S. Myers
  2013-02-27 18:04     ` Roland McGrath
@ 2013-02-27 18:34     ` Richard Henderson
  2013-02-27 19:57       ` Joseph S. Myers
  1 sibling, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 18:34 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-ports

On 02/27/2013 10:01 AM, Joseph S. Myers wrote:
> On Tue, 26 Feb 2013, Richard Henderson wrote:
> 
>> 	* sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
> 
> Shouldn't all of 6J, 6K, 6Z, 6ZK be handled the same way?  That is, 
> __ARM_ARCH_6*__ unless you want to list all the variants explicitly?

Does 6* get us into trouble with 6M?  Or do we just assume that glibc is not
portable to non-A class cores?
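
To see which macros a wildcard would catch, here is a minimal sketch
using Python's fnmatch as a stand-in for a shell "case" glob.  It
assumes the preconfigure fragment would match the macro name against
__ARM_ARCH_6*__ in that style; matches_armv6 is a hypothetical helper,
not anything in glibc.

```python
from fnmatch import fnmatchcase

def matches_armv6(macro, pattern="__ARM_ARCH_6*__"):
    """True if the macro name matches the shell-style glob."""
    return fnmatchcase(macro, pattern)

MACROS = [
    "__ARM_ARCH_6__", "__ARM_ARCH_6J__", "__ARM_ARCH_6K__",
    "__ARM_ARCH_6Z__", "__ARM_ARCH_6ZK__", "__ARM_ARCH_6M__",
    "__ARM_ARCH_6T2__", "__ARM_ARCH_7A__",
]

for macro in MACROS:
    print(macro, matches_armv6(macro))
```

Under that assumption the glob catches 6M and also 6T2, so any
dedicated 6T2 case would presumably have to be tested first.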

> (Do you get any warnings from configure about the armv6 directory not 
> existing?)

No.

> (When such a directory is actually added, an Implies file in armv6t2 will 
> be needed to make that fall back to armv6 versions as appropriate.)

Yep.


r~

* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
  2013-02-27 18:34     ` Richard Henderson
@ 2013-02-27 19:57       ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 19:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Wed, 27 Feb 2013, Richard Henderson wrote:

> On 02/27/2013 10:01 AM, Joseph S. Myers wrote:
> > On Tue, 26 Feb 2013, Richard Henderson wrote:
> > 
> >> 	* sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
> > 
> > Shouldn't all of 6J, 6K, 6Z, 6ZK be handled the same way?  That is, 
> > __ARM_ARCH_6*__ unless you want to list all the variants explicitly?
> 
> Does 6* get us into trouble with 6M?  Or do we just assume that glibc is not
> portable to non-A class cores?

Only A class cores have an MMU and so are relevant to glibc (and I don't 
see a use for building glibc for such a core that it can't run on, even if 
theoretically one might build glibc for such a core but use it on a 
different core with an MMU).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines
  2013-02-27  3:16 ` [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines Richard Henderson
@ 2013-02-28  0:15   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  0:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> When compiling with -mthumb, ld.so itself also needs __libc_do_syscall.
> ---
> 	* sysdeps/unix/sysv/linux/arm/Makefile [elf] (sysdep-rtld-routines):
> 	Include libc-do-syscall.

OK, though I haven't observed such a need myself.  There really ought to 
be a better way of making things like this available to all libraries, 
though, rather than needing to list so many cases explicitly in this 
Makefile.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
  2013-02-27  3:16 ` [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs Richard Henderson
@ 2013-02-28  0:20   ` Joseph S. Myers
  2013-02-28  0:36     ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  0:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> 	* sysdeps/arm/sysdep.h (s, pc_ofs): New macros.

Other assembler syntax macros have names in uppercase, and a single 
character seems too short for a macro name to me.

> +/* This number is the offset from the pc at the current location.  */
> +#ifdef __thumb__
> +# define pc_ofs		4
> +#else
> +# define pc_ofs		8
> +#endif

As noted, I don't think this can have a conditional definition like this 
until .S files are actually built as Thumb.
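
The arithmetic behind the concern can be sketched in a few lines of
Python.  The model assumes the usual pc-relative idiom, where a literal
holds "target - (address of the add + pc_ofs)" and an add with pc
recovers the target at run time; the function names and the addresses
are made up for illustration.

```python
def pc_read(insn_addr, thumb):
    """Value an instruction at insn_addr sees when it reads pc."""
    return insn_addr + (4 if thumb else 8)

def literal_for(target, insn_addr, pc_ofs):
    """Assemble-time constant: target - (address of the add + pc_ofs)."""
    return target - (insn_addr + pc_ofs)

def resolved(target, insn_addr, pc_ofs, thumb):
    """Address produced at run time by adding pc to the literal."""
    return literal_for(target, insn_addr, pc_ofs) + pc_read(insn_addr, thumb)

target, insn = 0x9000, 0x8010
print(hex(resolved(target, insn, pc_ofs=8, thumb=False)))  # ARM code, ARM constant
print(hex(resolved(target, insn, pc_ofs=4, thumb=True)))   # Thumb code, Thumb constant
print(hex(resolved(target, insn, pc_ofs=4, thumb=False)))  # macro says Thumb, code is ARM
```

The first two cases land on the target; in the third, where the macro
follows __thumb__ but the file is still assembled as ARM, the result is
off by 4.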

However, there are existing cases in asm statements with conditionals for 
this (sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c and 
sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c).  So what I'd 
actually suggest is defining such a macro for both C and .S code, but 
always defining it to 8 for __ASSEMBLER__ at this point and only 
conditioning on __thumb__ for C, and making those two C files use the 
result of stringizing the macro.  Then, when .S files are built as Thumb 
the __ASSEMBLER__ conditionals on how to define this macro can be removed.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 06/26] arm: Use pc_ofs
  2013-02-27  3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
@ 2013-02-28  0:21   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  0:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> 	* sysdeps/arm/__longjmp.S (__longjmp): Use pc_ofs.
> 	* sysdeps/arm/setjmp.S (__sigsetjmp): Likewise.
> 	* sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/getcontext.S (__getcontext): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/setcontext.S (__startcontext): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
> 	(SINGLE_THREAD_P): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/sysdep.h
> 	(SYSCALL_ERROR_HANDLER): Likewise.

OK, once the macro itself is in, and updated as necessary for a renaming 
of the macro into uppercase.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 07/26] arm: Introduce and use GET_TLS
  2013-02-27  3:17 ` [PATCH 07/26] arm: Introduce and use GET_TLS Richard Henderson
@ 2013-02-28  0:34   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  0:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> Factor out the sequence needed to call kuser_get_tls,
> as we can't play subtract into pc games in thumb mode.
> ---
> 	* sysdeps/unix/sysv/linux/arm/sysdep.h (GET_TLS): New macro.
> 	* sysdeps/unix/arm/sysdep.S (__syscall_error): Use it.
> 	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S (SAVE_PID): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/nptl/vfork.S (SAVE_PID): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S (__aeabi_read_tp):
> 	Add thumb2 alternative.

OK - although for v6K, v6ZK and v7-A and above, GCC defaults to -mtp=hard, 
so it might make sense as a followup to define this macro to use 
corresponding hard-tp code on such architectures (and for that matter to 
make __aeabi_read_tp use the hard-tp instruction, for running code built 
for an older architecture with libc built for a newer architecture).

(In the hard-tp case you could then avoid the lr save/restore in the users 
of GET_TLS, though you'd need to define another macro to say that GET_TLS 
doesn't clobber lr.)

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
  2013-02-28  0:20   ` Joseph S. Myers
@ 2013-02-28  0:36     ` Richard Henderson
  2013-02-28  1:45       ` Måns Rullgård
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-28  0:36 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-ports

On 2013-02-27 16:20, Joseph S. Myers wrote:
>> 	* sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
> Other assembler syntax macros have names in uppercase, and a single
> character seems too short for a macro name to me.
>

I simply found that "s(and)" is easy to read, and it looks close to "ands" to 
which it expands.  I'd rather drop this change, and all uses thereof, than make 
the code less readable.


r~

* Re: [PATCH 08/26] arm: Add IT insns for thumb mode
  2013-02-27  3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
@ 2013-02-28  0:41   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  0:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> These are ignored by the assembler in ARM mode, so by
> default this has no effect on generated code.
> ---
> 	* ports/sysdeps/arm/arm-mcount.S: Always use unified syntax and
> 	always add IT markup.
> 	* sysdeps/unix/sysv/linux/arm/mmap64.S (__mmap64): Likewise.
> 	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Add IT markup.
> 	* sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/mmap.S (__mmap): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/syscall.S (syscall): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/sysdep.h (PSEUDO_RET): Likewise.
> 	* sysdeps/unix/sysv/linux/arm/vfork.S (__vfork): Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 09/26] arm: Mark assembly files that will not use thumb mode
  2013-02-27  3:16 ` [PATCH 09/26] arm: Mark assembly files that will not use thumb mode Richard Henderson
@ 2013-02-28  0:58   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  0:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> Some routines are written with complex LDM/STM insns that cannot be
> used in thumb mode, or are highly conditional requiring excessive
> IT insns.
> 
> When a future patch goes in to enable thumb2 by default, this marker
> will be used to override that default.
> ---
> 	* ports/sysdeps/arm/__longjmp.S: Define NO_THUMB before <sysdep.h>
> 	* sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
> 	* sysdeps/arm/dl-trampoline.S: Likewise.
> 	* sysdeps/arm/memcpy.S: Likewise.
> 	* sysdeps/arm/memmove.S: Likewise.
> 	* sysdeps/arm/memset.S: Likewise.
> 	* sysdeps/arm/setjmp.S: Likewise.
> 	* sysdeps/arm/strlen.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/setcontext.S: Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 10/26] arm: Introduce and use LDST_PCREL
  2013-02-27  3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
@ 2013-02-28  1:00   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  1:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> Macro-ising the few instances where we need to distinguish between
> arm and thumb pc-relative memory operations.
> ---
> 	* sysdeps/arm/sysdep.h (LDST_PCREL): New macro.
> 	* sysdeps/unix/arm/sysdep.S (__syscall_error): Use LDST_PCREL.
> 	Fix up gottpoff load of errno for thumb2.
> 	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
> 	(SINGLE_THREAD_P): Use LDST_PCREL.
> 	(PSEUDO_PROLOGUE): Remove.
> 	(PSEUDO): Don't use it.
> 	* sysdeps/unix/sysv/linux/arm/sysdep.h (SYSCALL_ERROR_HANDLER):
> 	Use LDST_PCREL.

This patch appears to include whitespace changes to otherwise unmodified 
code, as well as the substantive changes.  Please separate the whitespace 
and substantive changes and resubmit.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 14/26] arm: Use push/pop mnemonics
  2013-02-27  3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
@ 2013-02-28  1:03   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  1:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> For arm this makes no difference--the result is bit-for-bit identical;
> for thumb this results in smaller encodings.  Perhaps it ought not and
> this is in fact an assembler bug, but I also think it's clearer.
> ---
> 	* sysdeps/arm/arm-mcount.S (_mcount): Use push/pop mnemonics.
> 	* sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
> 	* sysdeps/arm/dl-tlsdesc.S: Likewise.
> 	* sysdeps/arm/dl-trampoline.S: Likewise.
> 	* sysdeps/arm/start.S: Likewise.
> 	* sysdeps/arm/memcpy.S (PULL): Rename macro from pull.
> 	(PUSH): Rename macro from push.
> 	(memcpy): Use push/pop mnemonics.
> 	* sysdeps/arm/memmove.S: Similarly.
> 	* sysdeps/arm/sysdep.h (CALL_MCOUNT): Use push/pop mnemonics.
> 	* sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/mmap.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/mmap64.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/syscall.S: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
> 	* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 15/26] arm: Delete LOADREGS macro
  2013-02-27  3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
@ 2013-02-28  1:24   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  1:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> There was only one user.  Its "condition" argument was used
> for "ia" rather than an actual condition.  The apcs26 syntax
> is almost certainly not needed, given current binutils requirements.
> ---
> 	* sysdeps/arm/__longjmp.S (__longjmp): Use ldmia insn directly.
> 	* sysdeps/arm/sysdep.h (LOADREGS): Remove.

OK.  The __APCS_32__ conditional can simply be removed; there's no 
practical support for 26-bit ARM (or anything older than v4) in glibc (and 
I don't know if v4 as opposed to v4t will actually work, although in 
principle it should work via linking with --fix-v4bx-interworking).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr
  2013-02-27  3:17 ` [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr Richard Henderson
@ 2013-02-28  1:31   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28  1:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> Not specifically speed tested against the byte-by-byte versions,
> but expected to be about as fast as the new strlen.

I think any such string function patch needs measured performance 
information compared to the previous version, not just "expected to be 
about as fast".

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
  2013-02-28  0:36     ` Richard Henderson
@ 2013-02-28  1:45       ` Måns Rullgård
  0 siblings, 0 replies; 63+ messages in thread
From: Måns Rullgård @ 2013-02-28  1:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Joseph S. Myers, libc-ports

Richard Henderson <rth@twiddle.net> writes:

> On 2013-02-27 16:20, Joseph S. Myers wrote:
>>> 	* sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
>> Other assembler syntax macros have names in uppercase, and a single
>> character seems too short for a macro name to me.
>
> I simply found that "s(and)" is easy to read, and it looks close to
> "ands" to which it expands.  I'd rather drop this change, and all uses
> thereof, than make the code less readable.

How many places is it used?  From what I can tell, the only purpose is
to enable use of the 16-bit thumb encoding, so if it's used in just a
few places, the savings might not be worth the obfuscation.

-- 
Måns Rullgård
mans@mansr.com

* Re: [PATCH 24/26] arm: Add optimized addmul_1
  2013-02-27  3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
@ 2013-02-28 13:58   ` Måns Rullgård
  2013-02-28 18:19     ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Måns Rullgård @ 2013-02-28 13:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports, Joseph Myers

Richard Henderson <rth@twiddle.net> writes:

> +ENTRY(__mpn_addmul_1)
> +	push	{ r4, r5, r6 }
> +	cfi_adjust_cfa_offset (12)
> +	cfi_rel_offset (r4, 0)
> +	cfi_rel_offset (r5, 4)
> +	cfi_rel_offset (r6, 8)
> +
> +	ldr	r6, [r1], #4
> +	ldr	r5, [r0]
> +	mov	r4, #0			/* init carry in */
> +	b	1f
> +0:
> +	ldr	r6, [r1], #4		/* load next ul */
> +	adds	r4, r4, r5		/* (out, c) = cl + lpl */
> +	ldr	r5, [r0, #4]		/* load next rl */
> +	str	r4, [r0], #4
> +	adc	r4, ip, #0		/* cl = hpl + c */

You might gain a cycle here on some cores by replacing r4 by something
else in the adds/str sequence and reversing the order of the last two
insns to better exploit dual-issue.  On most semi-modern cores you can
get another register for free by pushing one more to the stack
(load/store multiple instructions transfer registers pairwise).

I'd expect this to benefit the A8 and maybe A9.  On A15 it should make
no difference.

> +1:
> +	mov	ip, #0			/* zero-extend rl */
> +	umlal	r5, ip, r6, r3		/* (hpl, lpl) = ul * vl + rl */
> +	subs	r2, r2, #1
> +	bne	0b
> +
> +	adds	r4, r4, r5		/* (out, c) = cl + lpl */
> +	str	r4, [r0]
> +	adc	r0, ip, #0		/* return hpl + c */
> +
> +	pop	{ r4, r5, r6 }
> +	DO_RET(lr)
> +END(__mpn_addmul_1)

-- 
Måns Rullgård
mans@mansr.com

* Re: [PATCH 24/26] arm: Add optimized addmul_1
  2013-02-28 13:58   ` Måns Rullgård
@ 2013-02-28 18:19     ` Richard Henderson
  2013-02-28 19:37       ` Måns Rullgård
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-28 18:19 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: libc-ports, Joseph Myers

On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>> > +0:
>> > +	ldr	r6, [r1], #4		/* load next ul */
>> > +	adds	r4, r4, r5		/* (out, c) = cl + lpl */
>> > +	ldr	r5, [r0, #4]		/* load next rl */
>> > +	str	r4, [r0], #4
>> > +	adc	r4, ip, #0		/* cl = hpl + c */
> You might gain a cycle here on some cores by replacing r4 by something
> else in the adds/str sequence and reversing the order of the last two
> insns to better exploit dual-issue.  On most semi-modern cores you can
> get another register for free by pushing one more to the stack
> (load/store multiple instructions transfer registers pairwise).
> 
> I'd expect this to benefit the A8 and maybe A9.  On A15 it should make
> no difference.
> 

To swap the adc and str, I'd have to add another move insn too.  I guess the
intent is that would dual-issue with the store, giving us 6 insns in 3 cycles
as opposed to 5 insns in 4 cycles?

Fair enough.

I'm not willing to work *too* hard on this.  If someone cares about the last
cycle of performance on A[89], they should work on getting the real libgmp
routines re-licensed for glibc.  I'm not willing to do politics.


r~

* Re: [PATCH 24/26] arm: Add optimized addmul_1
  2013-02-28 18:19     ` Richard Henderson
@ 2013-02-28 19:37       ` Måns Rullgård
  0 siblings, 0 replies; 63+ messages in thread
From: Måns Rullgård @ 2013-02-28 19:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Måns Rullgård, libc-ports, Joseph Myers

Richard Henderson <rth@twiddle.net> writes:

> On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>>> > +0:
>>> > +	ldr	r6, [r1], #4		/* load next ul */
>>> > +	adds	r4, r4, r5		/* (out, c) = cl + lpl */
>>> > +	ldr	r5, [r0, #4]		/* load next rl */
>>> > +	str	r4, [r0], #4
>>> > +	adc	r4, ip, #0		/* cl = hpl + c */
>> You might gain a cycle here on some cores by replacing r4 by something
>> else in the adds/str sequence and reversing the order of the last two
>> insns to better exploit dual-issue.  On most semi-modern cores you can
>> get another register for free by pushing one more to the stack
>> (load/store multiple instructions transfer registers pairwise).
>> 
>> I'd expect this to benefit the A8 and maybe A9.  On A15 it should make
>> no difference.
>> 
>
> To swap the adc and str, I'd have to add another move insn too.  I guess the
> intent is that would dual-issue with the store, giving us 6 insns in 3 cycles
> as opposed to 5 insns in 4 cycles?

I meant like this:

	ldr	r6, [r1], #4		/* load next ul */
	adds	r7, r4, r5		/* (out, c) = cl + lpl */
	ldr	r5, [r0, #4]		/* load next rl */
	adc	r4, ip, #0		/* cl = hpl + c */
	str	r7, [r0], #4

It seems to me this leaves everything with the same values as your
version.  r7 can be pushed/popped for free since you're currently
preserving an odd number of registers.
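
A register-level walk through both orderings in Python supports this.
It is only a sketch: limbs are assumed to be 32 bits, the count must be
at least 1, r0 and r1 are modelled as list indices rather than
addresses, and addmul_1_ref / addmul_1_regs are names invented here.
It checks results only and says nothing about issue timing.

```python
MASK = 0xFFFFFFFF

def addmul_1_ref(rp, up, v):
    """Reference: rp[i] += up[i] * v; returns (limbs, carry-out limb)."""
    out, carry = [], 0
    for r, u in zip(rp, up):
        t = u * v + r + carry
        out.append(t & MASK)
        carry = t >> 32
    return out, carry

def addmul_1_regs(rp, up, v, reordered=False):
    """Step through the loop register by register; needs len(up) >= 1."""
    mem = list(rp)
    r0 = r1 = 0                     # limb indices standing in for pointers
    r2 = len(up)
    r6 = up[r1]; r1 += 1            # ldr r6, [r1], #4
    r5 = mem[r0]                    # ldr r5, [r0]
    r4 = 0                          # mov r4, #0
    while True:
        prod = r6 * v + r5          # mov ip, #0 ; umlal r5, ip, r6, r3
        r5, ip = prod & MASK, prod >> 32
        r2 -= 1                     # subs r2, r2, #1
        if r2 == 0:                 # bne 0b falls through
            break
        r6 = up[r1]; r1 += 1        # ldr r6, [r1], #4
        total = r4 + r5             # adds: low word plus carry flag
        low, c = total & MASK, total >> 32
        r5 = mem[r0 + 1]            # ldr r5, [r0, #4]
        if reordered:
            r7 = low                # adds r7, r4, r5
            r4 = (ip + c) & MASK    # adc r4, ip, #0
            mem[r0] = r7; r0 += 1   # str r7, [r0], #4
        else:
            r4 = low                # adds r4, r4, r5
            mem[r0] = r4; r0 += 1   # str r4, [r0], #4
            r4 = (ip + c) & MASK    # adc r4, ip, #0
    total = r4 + r5                 # adds r4, r4, r5
    mem[r0] = total & MASK          # str r4, [r0]
    return mem, (ip + (total >> 32)) & MASK   # adc r0, ip, #0

rp = [0xFFFFFFFF, 0xFFFFFFFF, 1]
up = [0xFFFFFFFF, 0xFFFFFFFF, 0x12345678]
print(addmul_1_regs(rp, up, 0xFFFFFFFF) == addmul_1_regs(rp, up, 0xFFFFFFFF, True))
```

In this model the two orderings leave the same limbs in memory and
return the same carry, and both agree with the reference loop.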

> Fair enough.
>
> I'm not willing to work *too* hard on this.  If someone cares about the last
> cycle of performance on A[89], they should work on getting the real libgmp
> routines re-licensed for glibc.  I'm not willing to do politics.

Nor am I.

-- 
Måns Rullgård
mans@mansr.com

* Re: [PATCH 16/26] arm: Commonize BX conditionals
  2013-02-27  3:17 ` [PATCH 16/26] arm: Commonize BX conditionals Richard Henderson
@ 2013-02-28 21:51   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 21:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> Add BLX macro in addition and use it where appropriate.
> ---
> 	* sysdeps/arm/sysdep.h (BX, BXC, BLX): New macros.
> 	(DO_RET): Use BX.
> 	(RETINSTR): Use BXC.
> 	* sysdeps/arm/dl-tlsdesc.S (BX): Remove.
> 	* sysdeps/arm/dl-trampoline.S (BX): Remove.
> 	(_dl_runtime_profile): Use BLX.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc
  2013-02-27  3:17 ` [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc Richard Henderson
@ 2013-02-28 21:57   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 21:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> Unless we're trying old interworking, there's no point restoring to
> LR first.  Everything from armv5 on handles pop as an interworking jump.
> ---
> 	* sysdeps/arm/arm-mcount.S (_mcount): Use pop into pc unless
> 	__ARM_ARCH_4T__ and __THUMB_INTERWORK__.
> 	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_undefweak): Likewise.
> 	(_dl_tlsdesc_dynamic): Likewise.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 18/26] arm: Use GET_TLS more often
  2013-02-27  3:17 ` [PATCH 18/26] arm: Use GET_TLS more often Richard Henderson
@ 2013-02-28 21:59   ` Joseph S. Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 21:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

On Tue, 26 Feb 2013, Richard Henderson wrote:

> ---
> 	* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_undefweak): Use GET_TLS,
> 	save LR in R1, and return directly from R1.
> 	(_dl_tlsdesc_dynamic): Use GET_TLS.
> 	* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h 
> 	(SINGLE_THREAD_P): Use GET_TLS.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 00/26] ARM improvements
  2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
                   ` (27 preceding siblings ...)
  2013-02-27 16:59 ` Joseph S. Myers
@ 2013-02-28 22:05 ` Joseph S. Myers
  28 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 22:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-ports

I think you should now put in all the approved patches (except if any 
don't make sense without a previous patch that hasn't been approved), 
subject to testing that they do work in the sequence in which they go in.  
Then repost the remaining patches, adapted as appropriate for the comments 
that have been posted on libc-ports so far.

-- 
Joseph S. Myers
joseph@codesourcery.com

Thread overview: 63+ messages
2013-02-27  3:16 [PATCH 00/26] ARM improvements Richard Henderson
2013-02-27  3:16 ` [PATCH 03/26] arm: Handle armv6 in preconfigure Richard Henderson
2013-02-27 18:02   ` Joseph S. Myers
2013-02-27 18:04     ` Roland McGrath
2013-02-27 18:08       ` Richard Henderson
2013-02-27 18:34     ` Richard Henderson
2013-02-27 19:57       ` Joseph S. Myers
2013-02-27  3:16 ` [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs Richard Henderson
2013-02-28  0:20   ` Joseph S. Myers
2013-02-28  0:36     ` Richard Henderson
2013-02-28  1:45       ` Måns Rullgård
2013-02-27  3:16 ` [PATCH 01/26] Sync config.guess and config.sub with upstream Richard Henderson
2013-02-27 17:03   ` Joseph S. Myers
2013-02-27  3:16 ` [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines Richard Henderson
2013-02-28  0:15   ` Joseph S. Myers
2013-02-27  3:16 ` [PATCH 09/26] arm: Mark assembly files that will not use thumb mode Richard Henderson
2013-02-28  0:58   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 11/26] arm: Introduce and use NEGOFF series of macros Richard Henderson
2013-02-27  3:17 ` [PATCH 13/26] arm: Store lr in r2 around GET_TLS Richard Henderson
2013-02-27  3:17 ` [PATCH 18/26] arm: Use GET_TLS more often Richard Henderson
2013-02-28 21:59   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
2013-02-27 15:51   ` Måns Rullgård
2013-02-27 16:34     ` Richard Henderson
2013-02-27 17:49   ` Roland McGrath
2013-02-27  3:17 ` [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8 Richard Henderson
2013-02-27  7:04   ` Richard Henderson
2013-02-27  3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
2013-02-28  1:03   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 07/26] arm: Introduce and use GET_TLS Richard Henderson
2013-02-28  0:34   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
2013-02-27 17:12   ` Måns Rullgård
2013-02-27 17:44     ` Richard Henderson
2013-02-27  3:17 ` [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc Richard Henderson
2013-02-28 21:57   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 21/26] arm: Implement armv6t2 optimized strcpy Richard Henderson
2013-02-27  3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
2013-02-28 13:58   ` Måns Rullgård
2013-02-28 18:19     ` Richard Henderson
2013-02-28 19:37       ` Måns Rullgård
2013-02-27  3:17 ` [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr Richard Henderson
2013-02-28  1:31   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 25/26] arm: Add optimized submul_1 Richard Henderson
2013-02-27  3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
2013-02-28  0:21   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
2013-02-28  1:00   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 26/26] arm: Add optimized add_n and sub_n Richard Henderson
2013-02-27  3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
2013-02-27 17:54   ` Joseph S. Myers
2013-02-27 18:11     ` Richard Henderson
2013-02-27  3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
2013-02-28  1:24   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 12/26] arm: Enable thumb2 mode in assembly files Richard Henderson
2013-02-27  3:17 ` [PATCH 16/26] arm: Commonize BX conditionals Richard Henderson
2013-02-28 21:51   ` Joseph S. Myers
2013-02-27  3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
2013-02-28  0:41   ` Joseph S. Myers
2013-02-27 15:41 ` [PATCH 00/26] ARM improvements Måns Rullgård
2013-02-27 16:59 ` Joseph S. Myers
2013-02-27 17:34   ` Richard Henderson
2013-02-28 22:05 ` Joseph S. Myers
