* [PATCH 03/26] arm: Handle armv6 in preconfigure
From: Richard Henderson @ 2013-02-27 3:16 UTC
To: libc-ports; +Cc: Joseph Myers
---
* sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
---
ports/sysdeps/arm/preconfigure | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/ports/sysdeps/arm/preconfigure b/ports/sysdeps/arm/preconfigure
index b0c0540..d19e838 100644
--- a/ports/sysdeps/arm/preconfigure
+++ b/ports/sysdeps/arm/preconfigure
@@ -28,7 +28,10 @@ arm*)
machine=armv6t2
echo "Found compiler is configured for $machine"
;;
-
+ x__ARM_ARCH_6__)
+ machine=armv6
+ echo "Found compiler is configured for $machine"
+ ;;
*)
machine=arm
echo 2>&1 "arm/preconfigure: Did not find ARM architecture type; using default"
--
1.8.1.2
* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
From: Joseph S. Myers @ 2013-02-27 18:02 UTC
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> * sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
Shouldn't all of 6J, 6K, 6Z, 6ZK be handled the same way? That is,
__ARM_ARCH_6*__ unless you want to list all the variants explicitly?
(Do you get any warnings from configure about the armv6 directory not
existing?)
(When such a directory is actually added, an Implies file in armv6t2 will
be needed to make that fall back to armv6 versions as appropriate.)
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
From: Roland McGrath @ 2013-02-27 18:04 UTC
To: Joseph S. Myers; +Cc: Richard Henderson, libc-ports
> (When such a directory is actually added, an Implies file in armv6t2 will
> be needed to make that fall back to armv6 versions as appropriate.)
Why wouldn't it be a subdirectory?
* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
From: Richard Henderson @ 2013-02-27 18:08 UTC
To: Roland McGrath; +Cc: Joseph S. Myers, libc-ports
On 02/27/2013 10:04 AM, Roland McGrath wrote:
>> (When such a directory is actually added, an Implies file in armv6t2 will
>> be needed to make that fall back to armv6 versions as appropriate.)
>
> Why wouldn't it be a subdirectory?
>
*shrug*
For the same reason that we don't nest all incremental architecture extensions:
it's tedious to find armv7 code in arm/armv5/armv6/armv6t2/armv7/ as opposed to
arm/armv7/.
r~
* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
From: Richard Henderson @ 2013-02-27 18:34 UTC
To: Joseph S. Myers; +Cc: libc-ports
On 02/27/2013 10:01 AM, Joseph S. Myers wrote:
> On Tue, 26 Feb 2013, Richard Henderson wrote:
>
>> * sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
>
> Shouldn't all of 6J, 6K, 6Z, 6ZK be handled the same way? That is,
> __ARM_ARCH_6*__ unless you want to list all the variants explicitly?
Does 6* get us into trouble with 6M? Or do we just assume that glibc is not
portable to non-A class cores?
> (Do you get any warnings from configure about the armv6 directory not
> existing?)
No.
> (When such a directory is actually added, an Implies file in armv6t2 will
> be needed to make that fall back to armv6 versions as appropriate.)
Yep.
r~
* Re: [PATCH 03/26] arm: Handle armv6 in preconfigure
From: Joseph S. Myers @ 2013-02-27 19:57 UTC
To: Richard Henderson; +Cc: libc-ports
On Wed, 27 Feb 2013, Richard Henderson wrote:
> On 02/27/2013 10:01 AM, Joseph S. Myers wrote:
> > On Tue, 26 Feb 2013, Richard Henderson wrote:
> >
> >> * sysdeps/arm/preconfigure: Handle __ARM_ARCH_6__.
> >
> > Shouldn't all of 6J, 6K, 6Z, 6ZK be handled the same way? That is,
> > __ARM_ARCH_6*__ unless you want to list all the variants explicitly?
>
> Does 6* get us into trouble with 6M? Or do we just assume that glibc is not
> portable to non-A class cores?
Only A class cores have an MMU and so are relevant to glibc (and I don't
see a use for building glibc for such a core that it can't run on, even if
theoretically one might build glibc for such a core but use it on a
different core with an MMU).
--
Joseph S. Myers
joseph@codesourcery.com
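A minimal sketch of the wildcard approach discussed in this subthread, not the patch as posted. The function name classify_arm_arch is hypothetical (the real preconfigure script inlines its case statement), and the armv6t2 arm is assumed to come first, as in the diff context of the original patch. Since the first matching pattern wins, __ARM_ARCH_6T2__ has to be tested before the __ARM_ARCH_6*__ wildcard, which would otherwise match it too.

```shell
#!/bin/sh
# Hypothetical helper for illustration only: map a compiler-defined
# architecture macro name to a sysdeps machine name.
classify_arm_arch () {
  case "x$1" in
    x__ARM_ARCH_6T2__)
      # Must precede the wildcard arm below; the first match wins.
      echo armv6t2
      ;;
    x__ARM_ARCH_6*__)
      # Matches 6, 6J, 6K, 6Z and 6ZK. It also matches 6M, which the
      # thread treats as irrelevant to glibc (no MMU).
      echo armv6
      ;;
    *)
      # Fallback used by this sketch for everything else.
      echo arm
      ;;
  esac
}

classify_arm_arch __ARM_ARCH_6ZK__
classify_arm_arch __ARM_ARCH_6T2__
```

Listing the variants explicitly (6|6J|6K|6Z|6ZK) would avoid matching 6M, at the cost of having to update the list when a new variant appears.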
* [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
From: Richard Henderson @ 2013-02-27 3:16 UTC
To: libc-ports; +Cc: Joseph Myers
---
* sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
---
ports/sysdeps/arm/sysdep.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 0e6f645..4a9f05a 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -114,4 +114,19 @@
the caller. */
.eabi_attribute 24, 1
+/* We occasionally want to use the S form simply to achieve a smaller
+ instruction form in Thumb mode. Never set the flags in ARM mode. */
+#ifdef __thumb__
+# define s(insn) insn##s
+#else
+# define s(insn) insn
+#endif
+
+/* This number is the offset from the pc at the current location. */
+#ifdef __thumb__
+# define pc_ofs 4
+#else
+# define pc_ofs 8
+#endif
+
#endif /* __ASSEMBLER__ */
--
1.8.1.2
* Re: [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
From: Joseph S. Myers @ 2013-02-28 0:20 UTC
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> * sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
Other assembler syntax macros have names in uppercase, and a single
character seems too short for a macro name to me.
> +/* This number is the offset from the pc at the current location. */
> +#ifdef __thumb__
> +# define pc_ofs 4
> +#else
> +# define pc_ofs 8
> +#endif
As noted, I don't think this can have a conditional definition like this
until .S files are actually built as Thumb.
However, there are existing cases in asm statements with conditionals for
this (sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c and
sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c). So what I'd
actually suggest is defining such a macro for both C and .S code, but
always defining it to 8 for __ASSEMBLER__ at this point and only
conditioning on __thumb__ for C, and making those two C files use the
result of stringizing the macro. Then, when .S files are built as Thumb
the __ASSEMBLER__ conditionals on how to define this macro can be removed.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
From: Richard Henderson @ 2013-02-28 0:36 UTC
To: Joseph S. Myers; +Cc: libc-ports
On 2013-02-27 16:20, Joseph S. Myers wrote:
>> * sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
> Other assembler syntax macros have names in uppercase, and a single
> character seems too short for a macro name to me.
>
I simply found that "s(and)" is easy to read, and it looks close to "ands" to
which it expands. I'd rather drop this change, and all uses thereof, than make
the code less readable.
r~
* Re: [PATCH 05/26] arm: Introduce thumb helpers s and pc_ofs
From: Måns Rullgård @ 2013-02-28 1:45 UTC
To: Richard Henderson; +Cc: Joseph S. Myers, libc-ports
Richard Henderson <rth@twiddle.net> writes:
> On 2013-02-27 16:20, Joseph S. Myers wrote:
>>> * sysdeps/arm/sysdep.h (s, pc_ofs): New macros.
>> Other assembler syntax macros have names in uppercase, and a single
>> character seems too short for a macro name to me.
>
> I simply found that "s(and)" is easy to read, and it looks close to
> "ands" to which it expands. I'd rather drop this change, and all uses
> thereof, than make the code less readable.
How many places is it used? From what I can tell, the only purpose is
to enable use of the 16-bit thumb encoding, so if it's used in just a
few places, the savings might not be worth the obfuscation.
--
Måns Rullgård
mans@mansr.com
* [PATCH 01/26] Sync config.guess and config.sub with upstream
From: Richard Henderson @ 2013-02-27 3:16 UTC
To: libc-ports; +Cc: Joseph Myers
---
* scripts/config.guess: Merge upstream version 2013-02-12.
* scripts/config.sub: Likewise.
---
scripts/config.guess | 31 ++++++++++------------
scripts/config.sub | 72 +++++++++++++++++++++++++++-------------------------
2 files changed, 51 insertions(+), 52 deletions(-)
mode change 100755 => 100644 scripts/config.guess
mode change 100755 => 100644 scripts/config.sub
diff --git a/scripts/config.guess b/scripts/config.guess
old mode 100755
new mode 100644
index 872b96a..f475ceb
--- a/scripts/config.guess
+++ b/scripts/config.guess
@@ -1,14 +1,12 @@
#! /bin/sh
# Attempt to guess a canonical system name.
-# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-# 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
-# 2011, 2012 Free Software Foundation, Inc.
+# Copyright 1992-2013 Free Software Foundation, Inc.
-timestamp='2012-09-25'
+timestamp='2013-02-12'
# This file is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
@@ -22,19 +20,17 @@ timestamp='2012-09-25'
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
-# the same distribution terms that you use for the rest of that program.
-
-
-# Originally written by Per Bothner. Please send patches (context
-# diff format) to <config-patches@gnu.org> and include a ChangeLog
-# entry.
+# the same distribution terms that you use for the rest of that
+# program. This Exception is an additional permission under section 7
+# of the GNU General Public License, version 3 ("GPLv3").
#
-# This script attempts to guess a canonical system name similar to
-# config.sub. If it succeeds, it prints the system name on stdout, and
-# exits with 0. Otherwise, it exits with 1.
+# Originally written by Per Bothner.
#
# You can get the latest version of this script from:
# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
+#
+# Please send patches with a ChangeLog entry to config-patches@gnu.org.
+
me=`echo "$0" | sed -e 's,.*/,,'`
@@ -54,9 +50,7 @@ version="\
GNU config.guess ($timestamp)
Originally written by Per Bothner.
-Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
-Free Software Foundation, Inc.
+Copyright 1992-2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -959,6 +953,9 @@ EOF
eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep '^CPU'`
test x"${CPU}" != x && { echo "${CPU}-unknown-linux-gnu"; exit; }
;;
+ or1k:Linux:*:*)
+ echo ${UNAME_MACHINE}-unknown-linux-gnu
+ exit ;;
or32:Linux:*:*)
echo ${UNAME_MACHINE}-unknown-linux-gnu
exit ;;
diff --git a/scripts/config.sub b/scripts/config.sub
old mode 100755
new mode 100644
index bdda9e4..872199a
--- a/scripts/config.sub
+++ b/scripts/config.sub
@@ -1,24 +1,18 @@
#! /bin/sh
# Configuration validation subroutine script.
-# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-# 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
-# 2011, 2012 Free Software Foundation, Inc.
+# Copyright 1992-2013 Free Software Foundation, Inc.
-timestamp='2012-08-18'
+timestamp='2013-02-12'
-# This file is (in principle) common to ALL GNU software.
-# The presence of a machine in this file suggests that SOME GNU software
-# can handle that machine. It does not imply ALL GNU software can.
-#
-# This file is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# This file is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# GNU General Public License for more details.
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.
@@ -26,11 +20,12 @@ timestamp='2012-08-18'
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
-# the same distribution terms that you use for the rest of that program.
+# the same distribution terms that you use for the rest of that
+# program. This Exception is an additional permission under section 7
+# of the GNU General Public License, version 3 ("GPLv3").
-# Please send patches to <config-patches@gnu.org>. Submit a context
-# diff and a properly formatted GNU ChangeLog entry.
+# Please send patches with a ChangeLog entry to config-patches@gnu.org.
#
# Configuration subroutine to validate and canonicalize a configuration type.
# Supply the specified configuration type as an argument.
@@ -73,9 +68,7 @@ Report bugs and patches to <config-patches@gnu.org>."
version="\
GNU config.sub ($timestamp)
-Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
-Free Software Foundation, Inc.
+Copyright 1992-2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -156,7 +149,7 @@ case $os in
-convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\
-c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \
-harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \
- -apple | -axis | -knuth | -cray | -microblaze)
+ -apple | -axis | -knuth | -cray | -microblaze*)
os=
basic_machine=$1
;;
@@ -259,8 +252,10 @@ case $basic_machine in
| alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \
| alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | alpha64pca5[67] \
| am33_2.0 \
- | arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr | avr32 \
- | be32 | be64 \
+ | arc \
+ | arm | arm[bl]e | arme[lb] | armv[2-8] | armv[3-8][lb] | armv7[arm] \
+ | avr | avr32 \
+ | be32 | be64 \
| bfin \
| c4x | clipper \
| d10v | d30v | dlx | dsp16xx \
@@ -273,7 +268,7 @@ case $basic_machine in
| le32 | le64 \
| lm32 \
| m32c | m32r | m32rle | m68000 | m68k | m88k \
- | maxq | mb | microblaze | mcore | mep | metag \
+ | maxq | mb | microblaze | microblazeel | mcore | mep | metag \
| mips | mipsbe | mipseb | mipsel | mipsle \
| mips16 \
| mips64 | mips64el \
@@ -291,16 +286,17 @@ case $basic_machine in
| mipsisa64r2 | mipsisa64r2el \
| mipsisa64sb1 | mipsisa64sb1el \
| mipsisa64sr71k | mipsisa64sr71kel \
+ | mipsr5900 | mipsr5900el \
| mipstx39 | mipstx39el \
| mn10200 | mn10300 \
| moxie \
| mt \
| msp430 \
| nds32 | nds32le | nds32be \
- | nios | nios2 \
+ | nios | nios2 | nios2eb | nios2el \
| ns16k | ns32k \
| open8 \
- | or32 \
+ | or1k | or32 \
| pdp10 | pdp11 | pj | pjl \
| powerpc | powerpc64 | powerpc64le | powerpcle \
| pyramid \
@@ -389,7 +385,8 @@ case $basic_machine in
| lm32-* \
| m32c-* | m32r-* | m32rle-* \
| m68000-* | m680[012346]0-* | m68360-* | m683?2-* | m68k-* \
- | m88110-* | m88k-* | maxq-* | mcore-* | metag-* | microblaze-* \
+ | m88110-* | m88k-* | maxq-* | mcore-* | metag-* \
+ | microblaze-* | microblazeel-* \
| mips-* | mipsbe-* | mipseb-* | mipsel-* | mipsle-* \
| mips16-* \
| mips64-* | mips64el-* \
@@ -407,12 +404,13 @@ case $basic_machine in
| mipsisa64r2-* | mipsisa64r2el-* \
| mipsisa64sb1-* | mipsisa64sb1el-* \
| mipsisa64sr71k-* | mipsisa64sr71kel-* \
+ | mipsr5900-* | mipsr5900el-* \
| mipstx39-* | mipstx39el-* \
| mmix-* \
| mt-* \
| msp430-* \
| nds32-* | nds32le-* | nds32be-* \
- | nios-* | nios2-* \
+ | nios-* | nios2-* | nios2eb-* | nios2el-* \
| none-* | np1-* | ns16k-* | ns32k-* \
| open8-* \
| orion-* \
@@ -788,7 +786,7 @@ case $basic_machine in
basic_machine=ns32k-utek
os=-sysv
;;
- microblaze)
+ microblaze*)
basic_machine=microblaze-xilinx
;;
mingw64)
@@ -1023,7 +1021,11 @@ case $basic_machine in
basic_machine=i586-unknown
os=-pw32
;;
- rdos)
+ rdos | rdos64)
+ basic_machine=x86_64-pc
+ os=-rdos
+ ;;
+ rdos32)
basic_machine=i386-pc
os=-rdos
;;
@@ -1350,7 +1352,7 @@ case $os in
-gnu* | -bsd* | -mach* | -minix* | -genix* | -ultrix* | -irix* \
| -*vms* | -sco* | -esix* | -isc* | -aix* | -cnk* | -sunos | -sunos[34]*\
| -hpux* | -unos* | -osf* | -luna* | -dgux* | -auroraux* | -solaris* \
- | -sym* | -kopensolaris* \
+ | -sym* | -kopensolaris* | -plan9* \
| -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \
| -aos* | -aros* \
| -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \
@@ -1496,9 +1498,6 @@ case $os in
-aros*)
os=-aros
;;
- -kaos*)
- os=-kaos
- ;;
-zvmoe)
os=-zvmoe
;;
@@ -1590,6 +1589,9 @@ case $basic_machine in
mips*-*)
os=-elf
;;
+ or1k-*)
+ os=-elf
+ ;;
or32-*)
os=-coff
;;
--
1.8.1.2
* [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines
From: Richard Henderson @ 2013-02-27 3:16 UTC
To: libc-ports; +Cc: Joseph Myers
When compiling with -mthumb, ld.so itself also needs __libc_do_syscall.
---
* sysdeps/unix/sysv/linux/arm/Makefile [elf] (sysdep-rtld-routines):
Include libc-do-syscall.
---
ports/sysdeps/unix/sysv/linux/arm/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/Makefile b/ports/sysdeps/unix/sysv/linux/arm/Makefile
index be7946e..56ef159 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/Makefile
+++ b/ports/sysdeps/unix/sysv/linux/arm/Makefile
@@ -10,7 +10,7 @@ shared-only-routines += libc-aeabi_read_tp
endif
ifeq ($(subdir),elf)
-sysdep-rtld-routines += aeabi_read_tp
+sysdep-rtld-routines += aeabi_read_tp libc-do-syscall
endif
ifeq ($(subdir),misc)
--
1.8.1.2
* Re: [PATCH 04/26] arm: Include libc-do-syscall in sysdep-rtld-routines
From: Joseph S. Myers @ 2013-02-28 0:15 UTC
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> When compiling with -mthumb, ld.so itself also needs __libc_do_syscall.
> ---
> * sysdeps/unix/sysv/linux/arm/Makefile [elf] (sysdep-rtld-routines):
> Include libc-do-syscall.
OK, though I haven't observed such a need myself. There really ought to
be a better way of making things like this available to all libraries,
though, rather than needing to list so many cases explicitly in this
Makefile.
--
Joseph S. Myers
joseph@codesourcery.com
* [PATCH 09/26] arm: Mark assembly files that will not use thumb mode
From: Richard Henderson @ 2013-02-27 3:16 UTC
To: libc-ports; +Cc: Joseph Myers
Some routines are written with complex LDM/STM insns that cannot be
used in thumb mode, or are highly conditional requiring excessive
IT insns.
When a future patch goes in to enable thumb2 by default, this marker
will be used to override that default.
---
* ports/sysdeps/arm/__longjmp.S: Define NO_THUMB before <sysdep.h>.
* sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
* sysdeps/arm/dl-trampoline.S: Likewise.
* sysdeps/arm/memcpy.S: Likewise.
* sysdeps/arm/memmove.S: Likewise.
* sysdeps/arm/memset.S: Likewise.
* sysdeps/arm/setjmp.S: Likewise.
* sysdeps/arm/strlen.S: Likewise.
* sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
* sysdeps/unix/sysv/linux/arm/setcontext.S: Likewise.
---
ports/sysdeps/arm/__longjmp.S | 2 ++
ports/sysdeps/arm/crti.S | 2 ++
ports/sysdeps/arm/crtn.S | 2 ++
ports/sysdeps/arm/dl-trampoline.S | 2 ++
ports/sysdeps/arm/memcpy.S | 2 ++
ports/sysdeps/arm/memmove.S | 2 ++
ports/sysdeps/arm/memset.S | 2 ++
ports/sysdeps/arm/setjmp.S | 2 ++
ports/sysdeps/arm/strlen.S | 2 ++
ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S | 2 ++
ports/sysdeps/unix/sysv/linux/arm/setcontext.S | 2 ++
11 files changed, 22 insertions(+)
diff --git a/ports/sysdeps/arm/__longjmp.S b/ports/sysdeps/arm/__longjmp.S
index 28281d5..af4b963 100644
--- a/ports/sysdeps/arm/__longjmp.S
+++ b/ports/sysdeps/arm/__longjmp.S
@@ -16,6 +16,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* ??? Needs more rearrangement for the LDM to handle thumb mode. */
+#define NO_THUMB
#include <sysdep.h>
#define _SETJMP_H
#define _ASM
diff --git a/ports/sysdeps/arm/crti.S b/ports/sysdeps/arm/crti.S
index 44e20f0..1d55ae2 100644
--- a/ports/sysdeps/arm/crti.S
+++ b/ports/sysdeps/arm/crti.S
@@ -38,6 +38,8 @@
they can be called as functions. The symbols _init and _fini are
magic and cause the linker to emit DT_INIT and DT_FINI. */
+/* Always build .init and .fini sections in ARM mode. */
+#define NO_THUMB
#include <libc-symbols.h>
#include <sysdep.h>
diff --git a/ports/sysdeps/arm/crtn.S b/ports/sysdeps/arm/crtn.S
index 5ff3661..a01eb01 100644
--- a/ports/sysdeps/arm/crtn.S
+++ b/ports/sysdeps/arm/crtn.S
@@ -33,6 +33,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* Always build .init and .fini sections in ARM mode. */
+#define NO_THUMB
#include <sysdep.h>
/* crtn.S puts function epilogues in the .init and .fini sections
diff --git a/ports/sysdeps/arm/dl-trampoline.S b/ports/sysdeps/arm/dl-trampoline.S
index ebf221c..b9769cb 100644
--- a/ports/sysdeps/arm/dl-trampoline.S
+++ b/ports/sysdeps/arm/dl-trampoline.S
@@ -16,6 +16,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* ??? Needs more rearrangement for the LDM to handle thumb mode. */
+#define NO_THUMB
#include <sysdep.h>
#include <libc-symbols.h>
diff --git a/ports/sysdeps/arm/memcpy.S b/ports/sysdeps/arm/memcpy.S
index d8164b4..98b9b47 100644
--- a/ports/sysdeps/arm/memcpy.S
+++ b/ports/sysdeps/arm/memcpy.S
@@ -17,6 +17,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* Thumb requires excessive IT insns here. */
+#define NO_THUMB
#include <sysdep.h>
/*
diff --git a/ports/sysdeps/arm/memmove.S b/ports/sysdeps/arm/memmove.S
index d33c1ce..059ca7a 100644
--- a/ports/sysdeps/arm/memmove.S
+++ b/ports/sysdeps/arm/memmove.S
@@ -17,6 +17,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* Thumb requires excessive IT insns here. */
+#define NO_THUMB
#include <sysdep.h>
/*
diff --git a/ports/sysdeps/arm/memset.S b/ports/sysdeps/arm/memset.S
index 3152a84..9924cb911 100644
--- a/ports/sysdeps/arm/memset.S
+++ b/ports/sysdeps/arm/memset.S
@@ -16,6 +16,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* Thumb requires excessive IT insns here. */
+#define NO_THUMB
#include <sysdep.h>
/* void *memset (dstpp, c, len) */
diff --git a/ports/sysdeps/arm/setjmp.S b/ports/sysdeps/arm/setjmp.S
index 774c78a..39f2662 100644
--- a/ports/sysdeps/arm/setjmp.S
+++ b/ports/sysdeps/arm/setjmp.S
@@ -16,6 +16,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* ??? Needs more rearrangement for the STM to handle thumb mode. */
+#define NO_THUMB
#include <sysdep.h>
#define _SETJMP_H
#define _ASM
diff --git a/ports/sysdeps/arm/strlen.S b/ports/sysdeps/arm/strlen.S
index 15e9221..2b947e2 100644
--- a/ports/sysdeps/arm/strlen.S
+++ b/ports/sysdeps/arm/strlen.S
@@ -16,6 +16,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* Thumb requires excessive IT insns here. */
+#define NO_THUMB
#include <sysdep.h>
/* size_t strlen(const char *S)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
index bdcfa20..29edec6 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
@@ -15,6 +15,8 @@
License along with the GNU C Library. If not, see
<http://www.gnu.org/licenses/>. */
+/* ??? Needs more rearrangement for the LDM to handle thumb mode. */
+#define NO_THUMB
#include <sysdep.h>
.section .rodata.str1.1,"aMS",%progbits,1
diff --git a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
index 8d96c57..45e751b 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
@@ -15,6 +15,8 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
+/* ??? Needs more rearrangement for the LDM to handle thumb mode. */
+#define NO_THUMB
#include <sysdep.h>
#include <rtld-global-offsets.h>
--
1.8.1.2
* Re: [PATCH 09/26] arm: Mark assembly files that will not use thumb mode
From: Joseph S. Myers @ 2013-02-28 0:58 UTC
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> Some routines are written with complex LDM/STM insns that cannot be
> used in thumb mode, or are highly conditional requiring excessive
> IT insns.
>
> When a future patch goes in to enable thumb2 by default, this marker
> will be used to override that default.
> ---
> * ports/sysdeps/arm/__longjmp.S: Define NO_THUMB before <sysdep.h>
> * sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
> * sysdeps/arm/dl-trampoline.S: Likewise.
> * sysdeps/arm/memcpy.S: Likewise.
> * sysdeps/arm/memmove.S: Likewise.
> * sysdeps/arm/memset.S: Likewise.
> * sysdeps/arm/setjmp.S: Likewise.
> * sysdeps/arm/strlen.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/setcontext.S: Likewise.
OK.
--
Joseph S. Myers
joseph@codesourcery.com
* [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8
From: Richard Henderson @ 2013-02-27 3:17 UTC
To: libc-ports; +Cc: Joseph Myers
Not recently speed tested vs the Linaro version, but having a
common set of algorithms for all the *chr routines is surely
worth something for maintenance. But I do recall it was once
a few percent faster on A8.
---
* sysdeps/arm/armv6t2/memchr.S: Rewrite.
---
ports/sysdeps/arm/armv6t2/memchr.S | 216 +++++++++++++++++--------------------
1 file changed, 101 insertions(+), 115 deletions(-)
diff --git a/ports/sysdeps/arm/armv6t2/memchr.S b/ports/sysdeps/arm/armv6t2/memchr.S
index 6d35f47..1739a4c 100644
--- a/ports/sysdeps/arm/armv6t2/memchr.S
+++ b/ports/sysdeps/arm/armv6t2/memchr.S
@@ -22,142 +22,128 @@
@ and ARMv6T2 processors. It has a fast path for short sizes, and has an
@ optimised path for large data sets; the worst case is finding the match early
@ in a large data set.
-@ Note: The use of cbz/cbnz means it's Thumb only
-
-@ 2011-07-15 david.gilbert@linaro.org
-@ Copy from Cortex strings release 21 and change license
-@ http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/src/linaro-a9/memchr.S
-@ Change function declarations/entry/exit
-@ 2011-12-01 david.gilbert@linaro.org
-@ Add some fixes from comments received (including use of ldrd instead ldm)
-@ 2011-12-07 david.gilbert@linaro.org
-@ Removed cbz from align loop - can't be taken
-
-@ this lets us check a flag in a 00/ff byte easily in either endianness
-#ifdef __ARMEB__
-#define CHARTSTMASK(c) 1<<(31-(c*8))
-#else
-#define CHARTSTMASK(c) 1<<(c*8)
-#endif
- .syntax unified
+ .syntax unified
.text
- .thumb
-@ ---------------------------------------------------------------------------
- .thumb_func
- .global memchr
- .type memchr,%function
ENTRY(memchr)
@ r0 = start of memory to scan
@ r1 = character to look for
@ r2 = length
@ returns r0 = pointer to character or NULL if not found
- and r1,r1,#0xff @ Don't think we can trust the caller to actually pass a char
-
- cmp r2,#16 @ If it's short don't bother with anything clever
- blt 20f
- tst r0, #7 @ If it's already aligned skip the next bit
- beq 10f
+ uxtb r1, r1
+ cmp r2, #16 @ Is the buffer too short?
+ blo .Lbuf_small
+ tst r0, #7 @ Is the buffer already aligned?
+ beq .Lbuf_aligned
@ Work up to an aligned point
-5:
- ldrb r3, [r0],#1
- subs r2, r2, #1
- cmp r3, r1
- beq 50f @ If it matches exit found
- tst r0, #7
- bne 5b @ If not aligned yet then do next byte
-
-10:
- @ At this point, we are aligned, we know we have at least 8 bytes to work with
- push {r4,r5,r6,r7}
- cfi_adjust_cfa_offset (16)
+0: ldrb r3, [r0], #1
+ s(sub) r2, r2, #1
+ cmp r3, r1 @ If found, adjust and return.
+ beq .Lfound_minus1
+ tst r0, #7 @ If not yet aligned, loop
+ bne 0b
+
+.Lbuf_aligned:
+ @ Here, we are aligned and we have at least 8 bytes to work with.
+ push { r4, r5 }
+ cfi_adjust_cfa_offset (8)
cfi_rel_offset (r4, 0)
cfi_rel_offset (r5, 4)
- cfi_rel_offset (r6, 8)
- cfi_rel_offset (r7, 12)
- cfi_remember_state
-
- orr r1, r1, r1, lsl #8 @ expand the match word across to all bytes
+ orr r1, r1, r1, lsl #8 @ Replicate C to all bytes
+ movw ip, #0xfefe
orr r1, r1, r1, lsl #16
- bic r4, r2, #7 @ Number of double words to work with * 8
- mvns r7, #0 @ all F's
- movs r3, #0
-
-15:
- ldrd r5,r6, [r0],#8
- subs r4, r4, #8
- eor r5,r5, r1 @ Get it so that r5,r6 have 00's where the bytes match the target
- eor r6,r6, r1
- uadd8 r5, r5, r7 @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
- sel r5, r3, r7 @ bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
- uadd8 r6, r6, r7 @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
- sel r6, r5, r7 @ chained....bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
- cbnz r6, 60f
- bne 15b @ (Flags from the subs above) If not run out of bytes then go around again
-
- pop {r4,r5,r6,r7}
- cfi_adjust_cfa_offset (-16)
+ movt ip, #0xfefe
+
+1: ldrd r4, r5, [r0], #8
+ s(eor) r4, r4, r1 @ Convert C's to zeros
+ s(eor) r5, r5, r1
+ @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+ @ that was originally zero and 0xff otherwise. Therefore we consider
+ @ the lsb of each byte the "found" bit, with 0 for a match.
+ uqadd8 r4, r4, ip
+ uqadd8 r5, r5, ip
+ cmp r2, #8 @ Have we still a full 8 bytes left?
+ blo .Lbuf_aligned_finish
+ s(and) r5, r4, r5 @ Combine found bits between words.
+ mvns r5, r5 @ Match within either word?
+ beq 1b
+
+ @ Here, we've found a match. Disambiguate 1st or 2nd word.
+ mvns r4, r4
+ itee ne
+ subne r0, r0, #8 @ Dec pointer to 1st word
+ subeq r0, r0, #4 @ Dec pointer to 2nd word
+ moveq r4, r5 @ Copy found bits from 2nd word.
+
+.Lfind_word:
+ @ Here we've found a match.
+ @ r0 = pointer to word containing the match
+ @ r4 = found bits for the word.
+#ifdef __ARMEL__
+ rbit r4, r4 @ For LE we need count-trailing-zeros
+#endif
+ clz r4, r4 @ Find the bit offset of the match.
+ add r0, r0, r4, lsr #3 @ Adjust the pointer to the found byte
+
+ pop { r4, r5 }
+ cfi_remember_state
+ cfi_adjust_cfa_offset (-8)
cfi_restore (r4)
cfi_restore (r5)
- cfi_restore (r6)
- cfi_restore (r7)
-
- and r1,r1,#0xff @ Get r1 back to a single character from the expansion above
- and r2,r2,#7 @ Leave the count remaining as the number after the double words have been done
-
-20:
- cbz r2, 40f @ 0 length or hit the end already then not found
-
-21: @ Post aligned section, or just a short call
- ldrb r3,[r0],#1
- subs r2,r2,#1
- eor r3,r3,r1 @ r3 = 0 if match - doesn't break flags from sub
- cbz r3, 50f
- bne 21b @ on r2 flags
-
-40:
- movs r0,#0 @ not found
- DO_RET(lr)
+ bx lr
-50:
- subs r0,r0,#1 @ found
- DO_RET(lr)
-
-60: @ We're here because the fast path found a hit - now we have to track down exactly which word it was
- @ r0 points to the start of the double word after the one that was tested
- @ r5 has the 00/ff pattern for the first word, r6 has the chained value
cfi_restore_state
- cmp r5, #0
- itte eq
- moveq r5, r6 @ the end is in the 2nd word
- subeq r0,r0,#3 @ Points to 2nd byte of 2nd word
- subne r0,r0,#7 @ or 2nd byte of 1st word
-
- @ r0 currently points to the 2nd byte of the word containing the hit
- tst r5, # CHARTSTMASK(0) @ 1st character
- bne 61f
- adds r0,r0,#1
- tst r5, # CHARTSTMASK(1) @ 2nd character
- ittt eq
- addeq r0,r0,#1
- tsteq r5, # (3<<15) @ 2nd & 3rd character
- @ If not the 3rd must be the last one
- addeq r0,r0,#1
-
-61:
- pop {r4,r5,r6,r7}
- cfi_adjust_cfa_offset (-16)
+.Lbuf_aligned_finish:
+ @ Here we've read and computed found bits for 8 bytes, but not
+ @ all of those bytes are within the buffer. Determine which
+ @ found bytes are really valid.
+ s(sub) r0, r0, #8 @ Dec pointer to the 1st word
+ cmp r2, #4 @ Do we have at least 4 bytes left?
+ blo 1f
+ mvns r4, r4 @ Match within the 1st word?
+ bne .Lfind_word
+ s(add) r0, r0, #4 @ Inc pointer to the 2nd word
+ s(mvn) r4, r5 @ Copy found bits from 2nd word
+ s(sub) r2, r2, #4 @ Bytes remaining in 2nd word
+1:
+ lsls r2, r2, #3 @ Convert remaining to bits
+ beq 2f @ No bytes remaining?
+ mvn r3, #0
+#ifdef __ARMEL__
+ s(lsl) r3, r3, r2 @ Mask with 1s covering invalid bytes
+#else
+ s(lsr) r3, r3, r2
+#endif
+ bics r4, r4, r3 @ Clear found past end of buffer
+ bne .Lfind_word
+2:
+ s(mov) r0, #0 @ No found
+ pop { r4, r5 }
+ cfi_adjust_cfa_offset (-8)
cfi_restore (r4)
cfi_restore (r5)
- cfi_restore (r6)
- cfi_restore (r7)
-
- subs r0,r0,#1
- DO_RET(lr)
+ bx lr
+
+.Lbuf_small:
+ @ Here we've a small buffer to be searched a byte at a time.
+0: ldrb r3, [r0], #1
+ cmp r3, r1 @ If found, adjust and return.
+ beq .Lfound_minus1
+ subs r2, r2, #1 @ Any bytes left?
+ bne 0b
+
+ s(mov) r0, #0 @ Not found
+ bx lr
+
+.Lfound_minus1:
+ @ Here we've found a match in a byte loop
+ @ r0 = pointer, post-incremented past the byte
+ s(sub) r0, r0, #1
+ bx lr
END(memchr)
libc_hidden_builtin_def (memchr)
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
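As a rough illustration of the "found bit" comment in the memchr rewrite above, the uqadd8 trick can be modelled in C. This is only a sketch under stated assumptions: the names uqadd8_model and find_byte_le are hypothetical, the model looks at a single 32-bit word in little-endian lane order, and it does not reproduce the alignment, doubleword, or tail handling of the assembly.

```c
#include <stdint.h>

/* Model of UQADD8: four independent unsigned saturating byte adds.
   (Hypothetical helper name, not a glibc function.)  */
static uint32_t uqadd8_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = ((a >> (8 * i)) & 0xffu) + ((b >> (8 * i)) & 0xffu);
        if (s > 0xffu)
            s = 0xffu;                    /* saturate instead of wrapping */
        r |= s << (8 * i);
    }
    return r;
}

/* Return the index (0..3, lowest-addressed byte first on a little-endian
   target) of the first byte in WORD equal to C, or -1 if there is none.
   (Hypothetical helper name.)  */
static int find_byte_le(uint32_t word, unsigned char c)
{
    uint32_t rep = c * 0x01010101u;       /* replicate C to all bytes */
    uint32_t x = word ^ rep;              /* matching bytes become 0x00 */
    uint32_t f = uqadd8_model(x, 0xfefefefeu); /* 0xfe if match, else 0xff */
    uint32_t found = ~f;                  /* 0x01 in each matching lane */
    int bit = 0;

    if (found == 0)
        return -1;
    while ((found & 1u) == 0) {           /* count trailing zeros */
        found >>= 1;
        bit++;
    }
    return bit >> 3;                      /* bit offset -> byte offset */
}
```

The count-trailing-zeros loop stands in for the rbit/clz pair used on little-endian in the patch; a big-endian model would count leading zeros instead.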
* [PATCH 07/26] arm: Introduce and use GET_TLS
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (5 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 23/26] arm: Rewrite armv6t2 memchr with uqadd8 Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 0:34 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
` (21 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Factor out the sequence needed to call kuser_get_tls,
as we can't play subtract into pc games in thumb mode.
---
* sysdeps/unix/sysv/linux/arm/sysdep.h (GET_TLS): New macro.
* sysdeps/unix/arm/sysdep.S (__syscall_error): Use it.
* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S (SAVE_PID): Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/vfork.S (SAVE_PID): Likewise.
* sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S (__aeabi_read_tp):
Add thumb2 alternative.
---
ports/sysdeps/unix/arm/sysdep.S | 4 +---
ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S | 6 ++++++
ports/sysdeps/unix/sysv/linux/arm/clone.S | 4 +---
ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S | 4 +---
ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S | 4 +---
ports/sysdeps/unix/sysv/linux/arm/sysdep.h | 15 +++++++++++++++
6 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index 76137b3..425f4ac 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -40,9 +40,7 @@ __syscall_error:
cfi_register (lr, ip)
mov r1, r0
- mov r0, #0xffff0fff
- mov lr, pc
- sub pc, r0, #31
+ GET_TLS
ldr r2, 1f
2: ldr r2, [pc, r2]
diff --git a/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S b/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S
index c4ddbc6..ecdc322 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S
@@ -41,6 +41,12 @@
.hidden __aeabi_read_tp
ENTRY (__aeabi_read_tp)
+#ifdef __thumb2__
+ movw r0, #0x0fe0
+ movt r0, #0xffff
+ bx r0
+#else
mov r0, #0xffff0fff
sub pc, r0, #31
+#endif
END (__aeabi_read_tp)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index de25db1..8807781 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -73,9 +73,7 @@ PSEUDO_END (__clone)
#ifdef RESET_PID
tst ip, #CLONE_THREAD
bne 3f
- mov r0, #0xffff0fff
- mov lr, pc
- sub pc, r0, #31
+ GET_TLS
mov r1, r0
tst ip, #CLONE_VM
movne r0, #-1
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
index a38d564..749aaab 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
@@ -22,9 +22,7 @@
str lr, [sp, #-4]!; /* Save LR. */ \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (lr, 0); \
- mov r0, #0xffff0fff; /* Point to the high page. */ \
- mov lr, pc; /* Save our return address. */ \
- sub pc, r0, #31; /* Jump to the TLS entry. */ \
+ GET_TLS; \
ldr lr, [sp], #4; /* Restore LR. */ \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr); \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
index 3fce2d1..1bbe5c6 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
@@ -22,9 +22,7 @@
str lr, [sp, #-4]!; /* Save LR. */ \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (lr, 0); \
- mov r0, #0xffff0fff; /* Point to the high page. */ \
- mov lr, pc; /* Save our return address. */ \
- sub pc, r0, #31; /* Jump to the TLS entry. */ \
+ GET_TLS; \
ldr lr, [sp], #4; /* Restore LR. */ \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr); \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index cb237d9..dae9d98 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -45,6 +45,21 @@
#ifdef __ASSEMBLER__
+/* Call the linux kernel kuser_get_tls helper. Returns in R0, clobbers LR.
+ Note that in thumb mode, a constant pool break is often out of range, so
+ we always expand the constant inline. */
+#ifdef __thumb2__
+# define GET_TLS \
+ movw r0, #0x0fe0; \
+ movt r0, #0xffff; \
+ blx r0
+#else
+# define GET_TLS \
+ mov r0, #0xffff0fff; /* Point to the high page. */ \
+ mov lr, pc; /* Save our return address. */ \
+ sub pc, r0, #31 /* Jump to the TLS entry. */
+#endif
+
/* Linux uses a negative return value to indicate syscall errors,
unlike most Unices, which use the condition codes' carry flag.
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
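The two GET_TLS expansions above are meant to reach the same helper address. A small C model of the register arithmetic, assuming nothing beyond the constants in the patch (the function names are hypothetical), makes that easy to check:

```c
#include <stdint.h>

/* ARM-mode sequence:  mov r0, #0xffff0fff ; sub pc, r0, #31
   Returns the value that ends up in pc.  (Hypothetical helper name.)  */
static uint32_t tls_helper_addr_arm(void)
{
    uint32_t r0 = 0xffff0fffu;
    return r0 - 31u;
}

/* Thumb-2 sequence:  movw r0, #0x0fe0 ; movt r0, #0xffff
   Returns the value that blx/bx then jumps to.  (Hypothetical helper name.)  */
static uint32_t tls_helper_addr_thumb2(void)
{
    uint32_t r0 = 0x0fe0u;                      /* movw: low half, high half cleared */
    r0 = (r0 & 0xffffu) | (0xffffu << 16);      /* movt: replace the high half only */
    return r0;
}
```

Both functions evaluate to 0xffff0fe0. The movw/movt pair builds the constant inline, which is the choice the macro comment describes for thumb mode.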
* Re: [PATCH 07/26] arm: Introduce and use GET_TLS
2013-02-27 3:17 ` [PATCH 07/26] arm: Introduce and use GET_TLS Richard Henderson
@ 2013-02-28 0:34 ` Joseph S. Myers
0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 0:34 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> Factor out the sequence needed to call kuser_get_tls,
> as we can't play subtract into pc games in thumb mode.
> ---
> * sysdeps/unix/sysv/linux/arm/sysdep.h (GET_TLS): New macro.
> * sysdeps/unix/arm/sysdep.S (__syscall_error): Use it.
> * sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
> * sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S (SAVE_PID): Likewise.
> * sysdeps/unix/sysv/linux/arm/nptl/vfork.S (SAVE_PID): Likewise.
> * sysdeps/unix/sysv/linux/arm/aeabi_read_tp.S (__aeabi_read_tp):
> Add thumb2 alternative.
OK - although for v6K, v6ZK and v7-A and above, GCC defaults to -mtp=hard,
so it might make sense as a followup to define this macro to use
corresponding hard-tp code on such architectures (and for that matter to
make __aeabi_read_tp use the hard-tp instruction, for running code built
for an older architecture with libc built for a newer architecture).
(In the hard-tp case you could then avoid the lr save/restore in the users
of GET_TLS, though you'd need to define another macro to say that GET_TLS
doesn't clobber lr.)
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 14/26] arm: Use push/pop mnemonics
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (6 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 07/26] arm: Introduce and use GET_TLS Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 1:03 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
` (20 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
For arm this makes no difference--the result is bit-for-bit identical;
for thumb this results in smaller encodings. Perhaps it ought not and
this is in fact an assembler bug, but I also think it's clearer.
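
To make the "bit-for-bit identical" claim for arm mode concrete, here is a small C sketch that builds the A32 instruction words involved. The function names are hypothetical and the condition field is fixed to "always"; it models the encodings only and says nothing about the thumb case.

```c
#include <stdint.h>

/* A32 word for "stmdb sp!, {list}" (same as stmfd sp!), which is also
   what "push {list}" assembles to for a multi-register list.  */
static uint32_t enc_push_multi(uint16_t reglist)
{
    return 0xe92d0000u | reglist;
}

/* A32 word for "ldmia sp!, {list}" (same as ldmfd sp!), which is also
   what "pop {list}" assembles to for a multi-register list.  */
static uint32_t enc_pop_multi(uint16_t reglist)
{
    return 0xe8bd0000u | reglist;
}

/* A32 word for "str rt, [sp, #-4]!", the single-register form of push.  */
static uint32_t enc_push_single(unsigned rt)
{
    return 0xe52d0004u | (rt << 12);
}

/* A32 word for "ldr rt, [sp], #4", the single-register form of pop.  */
static uint32_t enc_pop_single(unsigned rt)
{
    return 0xe49d0004u | (rt << 12);
}
```

For example, a register list with bits 3 and 14 set (r3 and lr) gives 0xe92d4008 for the push, and rt = 14 gives 0xe52de004 for the single-register store, whichever spelling is used in the source.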
---
* sysdeps/arm/arm-mcount.S (_mcount): Use push/pop mnemonics.
* sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
* sysdeps/arm/dl-tlsdesc.S: Likewise.
* sysdeps/arm/dl-trampoline.S: Likewise.
* sysdeps/arm/start.S: Likewise.
* sysdeps/arm/memcpy.S (PULL): Rename macro from pull.
(PUSH): Rename macro from push.
(memcpy): Use push/pop mnemonics.
* sysdeps/arm/memmove.S: Similarly.
* sysdeps/arm/sysdep.h (CALL_MCOUNT): Use push/pop mnemonics.
* sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
* sysdeps/unix/sysv/linux/arm/mmap.S: Likewise.
* sysdeps/unix/sysv/linux/arm/mmap64.S: Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c: Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c: Likewise.
* sysdeps/unix/sysv/linux/arm/syscall.S: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
---
ports/sysdeps/arm/arm-mcount.S | 6 +--
ports/sysdeps/arm/crti.S | 4 +-
ports/sysdeps/arm/crtn.S | 8 +--
ports/sysdeps/arm/dl-tlsdesc.S | 20 ++++----
ports/sysdeps/arm/dl-trampoline.S | 4 +-
ports/sysdeps/arm/memcpy.S | 58 +++++++++++-----------
ports/sysdeps/arm/memmove.S | 58 +++++++++++-----------
ports/sysdeps/arm/start.S | 10 ++--
ports/sysdeps/arm/sysdep.h | 6 +--
.../sysdeps/unix/sysv/linux/arm/____longjmp_chk.S | 4 +-
ports/sysdeps/unix/sysv/linux/arm/clone.S | 4 +-
ports/sysdeps/unix/sysv/linux/arm/mmap.S | 8 +--
ports/sysdeps/unix/sysv/linux/arm/mmap64.S | 8 +--
.../unix/sysv/linux/arm/nptl/sysdep-cancel.h | 32 ++++++------
.../unix/sysv/linux/arm/nptl/unwind-forcedunwind.c | 4 +-
.../unix/sysv/linux/arm/nptl/unwind-resume.c | 4 +-
ports/sysdeps/unix/sysv/linux/arm/syscall.S | 4 +-
ports/sysdeps/unix/sysv/linux/arm/sysdep.h | 27 +++++-----
ports/sysdeps/unix/sysv/linux/arm/vfork.S | 2 +-
19 files changed, 135 insertions(+), 136 deletions(-)
diff --git a/ports/sysdeps/arm/arm-mcount.S b/ports/sysdeps/arm/arm-mcount.S
index 679d042..b6e5ec7 100644
--- a/ports/sysdeps/arm/arm-mcount.S
+++ b/ports/sysdeps/arm/arm-mcount.S
@@ -69,7 +69,7 @@ END(__gnu_mcount_nc)
code be compiled with APCS frame pointers. */
ENTRY(_mcount)
- stmdb sp!, {r0, r1, r2, r3, fp, lr}
+ push {r0, r1, r2, r3, fp, lr}
cfi_adjust_cfa_offset (24)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
@@ -83,9 +83,9 @@ ENTRY(_mcount)
movsne r1, lr
blne __mcount_internal
#ifdef __thumb2__
- ldmia sp!, {r0, r1, r2, r3, fp, pc}
+ pop {r0, r1, r2, r3, fp, pc}
#else
- ldmia sp!, {r0, r1, r2, r3, fp, lr}
+ pop {r0, r1, r2, r3, fp, lr}
cfi_adjust_cfa_offset (-24)
cfi_restore (r0)
cfi_restore (r1)
diff --git a/ports/sysdeps/arm/crti.S b/ports/sysdeps/arm/crti.S
index 1d55ae2..be20a11 100644
--- a/ports/sysdeps/arm/crti.S
+++ b/ports/sysdeps/arm/crti.S
@@ -80,7 +80,7 @@ call_weak_fn:
.globl _init
.type _init, %function
_init:
- stmfd sp!, {r3, lr}
+ push {r3, lr}
#if PREINIT_FUNCTION_WEAK
bl call_weak_fn
#else
@@ -92,4 +92,4 @@ _init:
.globl _fini
.type _fini, %function
_fini:
- stmfd sp!, {r3, lr}
+ push {r3, lr}
diff --git a/ports/sysdeps/arm/crtn.S b/ports/sysdeps/arm/crtn.S
index a01eb01..ae7546c 100644
--- a/ports/sysdeps/arm/crtn.S
+++ b/ports/sysdeps/arm/crtn.S
@@ -42,16 +42,16 @@
.section .init,"ax",%progbits
#ifdef __ARM_ARCH_4T__
- ldmfd sp!, {r3, lr}
+ pop {r3, lr}
bx lr
#else
- ldmfd sp!, {r3, pc}
+ pop {r3, pc}
#endif
.section .fini,"ax",%progbits
#ifdef __ARM_ARCH_4T__
- ldmfd sp!, {r3, lr}
+ pop {r3, lr}
bx lr
#else
- ldmfd sp!, {r3, pc}
+ pop {r3, pc}
#endif
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index c3e2b3e..15a0c21 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -52,12 +52,12 @@ _dl_tlsdesc_return:
_dl_tlsdesc_undefweak:
@ Are we allowed a misaligned stack pointer calling read_tp?
.save {lr}
- stmdb sp!, {lr}
+ push {lr}
cfi_adjust_cfa_offset (4)
cfi_rel_offset (lr,0)
bl __aeabi_read_tp
rsb r0, r0, #0
- ldmia sp!, {lr}
+ pop {lr}
cfi_adjust_cfa_offset (-4)
cfi_restore (lr)
BX (lr)
@@ -99,7 +99,7 @@ _dl_tlsdesc_dynamic:
/* Our calling convention is to clobber r0, r1 and the processor
flags. All others that are modified must be saved */
.save {r2,r3,r4,lr}
- stmdb sp!, {r2,r3,r4,lr}
+ push {r2,r3,r4,lr}
cfi_adjust_cfa_offset (16)
cfi_rel_offset (r2,0)
cfi_rel_offset (r3,4)
@@ -124,7 +124,7 @@ _dl_tlsdesc_dynamic:
1: mov r0, r1
bl __tls_get_addr
rsb r0, r4, r0
-2: ldmia sp!, {r2,r3,r4, lr}
+2: pop {r2,r3,r4, lr}
cfi_adjust_cfa_offset (-16)
cfi_restore (lr)
cfi_restore (r4)
@@ -155,7 +155,7 @@ _dl_tlsdesc_lazy_resolver:
cfi_adjust_cfa_offset (4)
cfi_rel_offset (r2, 0)
.save {r0,r1,r3,ip,lr}
- stmdb sp!, {r0, r1, r3, ip, lr}
+ push {r0, r1, r3, ip, lr}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
@@ -163,14 +163,14 @@ _dl_tlsdesc_lazy_resolver:
cfi_rel_offset (ip, 12)
cfi_rel_offset (lr, 16)
bl _dl_tlsdesc_lazy_resolver_fixup
- ldmia sp!, {r0, r1, r3, ip, lr}
+ pop {r0, r1, r3, ip, lr}
cfi_adjust_cfa_offset (-20)
cfi_restore (lr)
cfi_restore (ip)
cfi_restore (r3)
cfi_restore (r1)
cfi_restore (r0)
- ldmia sp!, {r2}
+ pop {r2}
cfi_adjust_cfa_offset (-4)
cfi_restore (r2)
ldr r1, [r0, #4]
@@ -193,7 +193,7 @@ _dl_tlsdesc_resolve_hold:
cfi_adjust_cfa_offset (4)
cfi_rel_offset (r2, 0)
.save {r0,r1,r3,ip,lr}
- stmdb sp!, {r0, r1, r3, ip, lr}
+ push {r0, r1, r3, ip, lr}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
@@ -202,14 +202,14 @@ _dl_tlsdesc_resolve_hold:
cfi_rel_offset (lr, 16)
adr r2, _dl_tlsdesc_resolve_hold
bl _dl_tlsdesc_resolve_hold_fixup
- ldmia sp!, {r0, r1, r3, ip, lr}
+ pop {r0, r1, r3, ip, lr}
cfi_adjust_cfa_offset (-20)
cfi_restore (lr)
cfi_restore (ip)
cfi_restore (r3)
cfi_restore (r1)
cfi_restore (r0)
- ldmia sp!, {r2}
+ pop {r2}
cfi_adjust_cfa_offset (-4)
cfi_restore (r2)
ldr r1, [r0, #4]
diff --git a/ports/sysdeps/arm/dl-trampoline.S b/ports/sysdeps/arm/dl-trampoline.S
index b9769cb..a13a4c3 100644
--- a/ports/sysdeps/arm/dl-trampoline.S
+++ b/ports/sysdeps/arm/dl-trampoline.S
@@ -43,7 +43,7 @@ _dl_runtime_resolve:
@ lr points to &GOT[2]
@ Save arguments. We save r4 to realign the stack.
- stmdb sp!,{r0-r4}
+ push {r0-r4}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
@@ -67,7 +67,7 @@ _dl_runtime_resolve:
@ get arguments and return address back. We restore r4
@ only to realign the stack.
- ldmia sp!, {r0-r4,lr}
+ pop {r0-r4,lr}
cfi_adjust_cfa_offset (-24)
@ jump to the newly found address
diff --git a/ports/sysdeps/arm/memcpy.S b/ports/sysdeps/arm/memcpy.S
index 98b9b47..98981ef 100644
--- a/ports/sysdeps/arm/memcpy.S
+++ b/ports/sysdeps/arm/memcpy.S
@@ -45,11 +45,11 @@
* Endian independent macros for shifting bytes within registers.
*/
#ifndef __ARMEB__
-#define pull lsr
-#define push lsl
+#define PULL lsr
+#define PUSH lsl
#else
-#define pull lsl
-#define push lsr
+#define PULL lsl
+#define PUSH lsr
#endif
.text
@@ -58,7 +58,7 @@
ENTRY(memcpy)
- stmfd sp!, {r0, r4, lr}
+ push {r0, r4, lr}
cfi_adjust_cfa_offset (12)
cfi_rel_offset (r4, 4)
cfi_rel_offset (lr, 8)
@@ -74,7 +74,7 @@ ENTRY(memcpy)
bne 10f
1: subs r2, r2, #(28)
- stmfd sp!, {r5 - r8}
+ push {r5 - r8}
cfi_adjust_cfa_offset (16)
cfi_rel_offset (r5, 0)
cfi_rel_offset (r6, 4)
@@ -131,7 +131,7 @@ ENTRY(memcpy)
CALGN( bcs 2b )
-7: ldmfd sp!, {r5 - r8}
+7: pop {r5 - r8}
cfi_adjust_cfa_offset (-16)
cfi_restore (r5)
cfi_restore (r6)
@@ -147,13 +147,13 @@ ENTRY(memcpy)
strcsb ip, [r0]
#if defined (__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
- ldmfd sp!, {r0, r4, lr}
+ pop {r0, r4, lr}
cfi_adjust_cfa_offset (-12)
cfi_restore (r4)
cfi_restore (lr)
bx lr
#else
- ldmfd sp!, {r0, r4, pc}
+ pop {r0, r4, pc}
#endif
cfi_restore_state
@@ -189,7 +189,7 @@ ENTRY(memcpy)
CALGN( subcc r2, r2, ip )
CALGN( bcc 15f )
-11: stmfd sp!, {r5 - r9}
+11: push {r5 - r9}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r5, 0)
cfi_rel_offset (r6, 4)
@@ -206,30 +206,30 @@ ENTRY(memcpy)
12: PLD( pld [r1, #124] )
13: ldmia r1!, {r4, r5, r6, r7}
- mov r3, lr, pull #\pull
+ mov r3, lr, PULL #\pull
subs r2, r2, #32
ldmia r1!, {r8, r9, ip, lr}
- orr r3, r3, r4, push #\push
- mov r4, r4, pull #\pull
- orr r4, r4, r5, push #\push
- mov r5, r5, pull #\pull
- orr r5, r5, r6, push #\push
- mov r6, r6, pull #\pull
- orr r6, r6, r7, push #\push
- mov r7, r7, pull #\pull
- orr r7, r7, r8, push #\push
- mov r8, r8, pull #\pull
- orr r8, r8, r9, push #\push
- mov r9, r9, pull #\pull
- orr r9, r9, ip, push #\push
- mov ip, ip, pull #\pull
- orr ip, ip, lr, push #\push
+ orr r3, r3, r4, PUSH #\push
+ mov r4, r4, PULL #\pull
+ orr r4, r4, r5, PUSH #\push
+ mov r5, r5, PULL #\pull
+ orr r5, r5, r6, PUSH #\push
+ mov r6, r6, PULL #\pull
+ orr r6, r6, r7, PUSH #\push
+ mov r7, r7, PULL #\pull
+ orr r7, r7, r8, PUSH #\push
+ mov r8, r8, PULL #\pull
+ orr r8, r8, r9, PUSH #\push
+ mov r9, r9, PULL #\pull
+ orr r9, r9, ip, PUSH #\push
+ mov ip, ip, PULL #\pull
+ orr ip, ip, lr, PUSH #\push
stmia r0!, {r3, r4, r5, r6, r7, r8, r9, ip}
bge 12b
PLD( cmn r2, #96 )
PLD( bge 13b )
- ldmfd sp!, {r5 - r9}
+ pop {r5 - r9}
cfi_adjust_cfa_offset (-20)
cfi_restore (r5)
cfi_restore (r6)
@@ -240,10 +240,10 @@ ENTRY(memcpy)
14: ands ip, r2, #28
beq 16f
-15: mov r3, lr, pull #\pull
+15: mov r3, lr, PULL #\pull
ldr lr, [r1], #4
subs ip, ip, #4
- orr r3, r3, lr, push #\push
+ orr r3, r3, lr, PUSH #\push
str r3, [r0], #4
bgt 15b
CALGN( cmp r2, #0 )
diff --git a/ports/sysdeps/arm/memmove.S b/ports/sysdeps/arm/memmove.S
index 059ca7a..d9fa0e3 100644
--- a/ports/sysdeps/arm/memmove.S
+++ b/ports/sysdeps/arm/memmove.S
@@ -45,11 +45,11 @@
* Endian independent macros for shifting bytes within registers.
*/
#ifndef __ARMEB__
-#define pull lsr
-#define push lsl
+#define PULL lsr
+#define PUSH lsl
#else
-#define pull lsl
-#define push lsr
+#define PULL lsl
+#define PUSH lsr
#endif
.text
@@ -73,7 +73,7 @@ ENTRY(memmove)
bls HIDDEN_JUMPTARGET(memcpy)
#endif
- stmfd sp!, {r0, r4, lr}
+ push {r0, r4, lr}
cfi_adjust_cfa_offset (12)
cfi_rel_offset (r4, 4)
cfi_rel_offset (lr, 8)
@@ -91,7 +91,7 @@ ENTRY(memmove)
bne 10f
1: subs r2, r2, #(28)
- stmfd sp!, {r5 - r8}
+ push {r5 - r8}
cfi_adjust_cfa_offset (16)
cfi_rel_offset (r5, 0)
cfi_rel_offset (r6, 4)
@@ -147,7 +147,7 @@ ENTRY(memmove)
CALGN( bcs 2b )
-7: ldmfd sp!, {r5 - r8}
+7: pop {r5 - r8}
cfi_adjust_cfa_offset (-16)
cfi_restore (r5)
cfi_restore (r6)
@@ -163,13 +163,13 @@ ENTRY(memmove)
strcsb ip, [r0, #-1]
#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
- ldmfd sp!, {r0, r4, lr}
+ pop {r0, r4, lr}
cfi_adjust_cfa_offset (-12)
cfi_restore (r4)
cfi_restore (lr)
bx lr
#else
- ldmfd sp!, {r0, r4, pc}
+ pop {r0, r4, pc}
#endif
cfi_restore_state
@@ -204,7 +204,7 @@ ENTRY(memmove)
CALGN( subcc r2, r2, ip )
CALGN( bcc 15f )
-11: stmfd sp!, {r5 - r9}
+11: push {r5 - r9}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r5, 0)
cfi_rel_offset (r6, 4)
@@ -221,30 +221,30 @@ ENTRY(memmove)
12: PLD( pld [r1, #-128] )
13: ldmdb r1!, {r7, r8, r9, ip}
- mov lr, r3, push #\push
+ mov lr, r3, PUSH #\push
subs r2, r2, #32
ldmdb r1!, {r3, r4, r5, r6}
- orr lr, lr, ip, pull #\pull
- mov ip, ip, push #\push
- orr ip, ip, r9, pull #\pull
- mov r9, r9, push #\push
- orr r9, r9, r8, pull #\pull
- mov r8, r8, push #\push
- orr r8, r8, r7, pull #\pull
- mov r7, r7, push #\push
- orr r7, r7, r6, pull #\pull
- mov r6, r6, push #\push
- orr r6, r6, r5, pull #\pull
- mov r5, r5, push #\push
- orr r5, r5, r4, pull #\pull
- mov r4, r4, push #\push
- orr r4, r4, r3, pull #\pull
+ orr lr, lr, ip, PULL #\pull
+ mov ip, ip, PUSH #\push
+ orr ip, ip, r9, PULL #\pull
+ mov r9, r9, PUSH #\push
+ orr r9, r9, r8, PULL #\pull
+ mov r8, r8, PUSH #\push
+ orr r8, r8, r7, PULL #\pull
+ mov r7, r7, PUSH #\push
+ orr r7, r7, r6, PULL #\pull
+ mov r6, r6, PUSH #\push
+ orr r6, r6, r5, PULL #\pull
+ mov r5, r5, PUSH #\push
+ orr r5, r5, r4, PULL #\pull
+ mov r4, r4, PUSH #\push
+ orr r4, r4, r3, PULL #\pull
stmdb r0!, {r4 - r9, ip, lr}
bge 12b
PLD( cmn r2, #96 )
PLD( bge 13b )
- ldmfd sp!, {r5 - r9}
+ pop {r5 - r9}
cfi_adjust_cfa_offset (-20)
cfi_restore (r5)
cfi_restore (r6)
@@ -255,10 +255,10 @@ ENTRY(memmove)
14: ands ip, r2, #28
beq 16f
-15: mov lr, r3, push #\push
+15: mov lr, r3, PUSH #\push
ldr r3, [r1, #-4]!
subs ip, ip, #4
- orr lr, lr, r3, pull #\pull
+ orr lr, lr, r3, PULL #\pull
str lr, [r0, #-4]!
bgt 15b
CALGN( cmp r2, #0 )
diff --git a/ports/sysdeps/arm/start.S b/ports/sysdeps/arm/start.S
index a1d15b8..0a57b0b 100644
--- a/ports/sysdeps/arm/start.S
+++ b/ports/sysdeps/arm/start.S
@@ -80,14 +80,14 @@ _start:
mov lr, #0
/* Pop argc off the stack and save a pointer to argv */
- ldr a2, [sp], #4
+ pop { a2 }
mov a3, sp
/* Push stack limit */
- str a3, [sp, #-4]!
+ push { a3 }
/* Push rtld_fini */
- str a1, [sp, #-4]!
+ push { a1 }
#ifdef SHARED
ldr sl, .L_GOT
@@ -97,7 +97,7 @@ _start:
ldr ip, .L_GOT+4 /* __libc_csu_fini */
ldr ip, [sl, ip]
- str ip, [sp, #-4]! /* Push __libc_csu_fini */
+ push { ip } /* Push __libc_csu_fini */
ldr a4, .L_GOT+8 /* __libc_csu_init */
ldr a4, [sl, a4]
@@ -113,7 +113,7 @@ _start:
ldr ip, =__libc_csu_fini
/* Push __libc_csu_fini */
- str ip, [sp, #-4]!
+ push { ip }
/* Set up the other arguments in registers */
ldr a1, =main
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 3459219..fed3dfd 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -77,7 +77,7 @@
/* Call __gnu_mcount_nc if GCC >= 4.4. */
#if __GNUC_PREREQ(4,4)
#define CALL_MCOUNT \
- str lr,[sp, #-4]!; \
+ push { lr }; \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (lr, 0); \
bl PLTJMP(mcount); \
@@ -85,11 +85,11 @@
cfi_restore (lr)
#else /* else call _mcount */
#define CALL_MCOUNT \
- str lr,[sp, #-4]!; \
+ push { lr }; \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (lr, 0); \
bl PLTJMP(mcount); \
- ldr lr, [sp], #4; \
+ pop { lr }; \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr)
#endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
index 29edec6..6ee7a1a 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/____longjmp_chk.S
@@ -53,7 +53,7 @@ longjmp_msg:
cfi_remember_state; \
cmp sp, reg; \
bls .Lok; \
- str r7, [sp, #-4]!; \
+ push { r7 }; \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (r7, 0); \
mov r5, r0; \
@@ -79,7 +79,7 @@ longjmp_msg:
.Lfail: \
add sp, sp, #12; \
cfi_adjust_cfa_offset (-12); \
- ldr r7, [sp], #4; \
+ pop { r7 }; \
cfi_adjust_cfa_offset (-4); \
cfi_restore (r7); \
CALL_FAIL \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index 58ee7b4..2e8c61e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -49,7 +49,7 @@ ENTRY(__clone)
mov ip, r2
#endif
@ new sp is already in r1
- stmfd sp!, {r4, r7}
+ push {r4, r7}
cfi_adjust_cfa_offset (8)
cfi_rel_offset (r4, 0)
cfi_rel_offset (r7, 4)
@@ -61,7 +61,7 @@ ENTRY(__clone)
cfi_endproc
cmp r0, #0
beq 1f
- ldmfd sp!, {r4, r7}
+ pop {r4, r7}
blt PLTJMP(C_SYMBOL_NAME(__syscall_error))
RETINSTR(, lr)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap.S b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
index 68560b0..06b737e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
@@ -23,11 +23,11 @@
ENTRY (__mmap)
/* shuffle args */
- str r5, [sp, #-4]!
+ push { r5 }
cfi_adjust_cfa_offset (4)
cfi_rel_offset (r5, 0)
ldr r5, [sp, #8]
- str r4, [sp, #-4]!
+ push { r4 }
cfi_adjust_cfa_offset (4)
cfi_rel_offset (r4, 0)
cfi_remember_state
@@ -43,10 +43,10 @@ ENTRY (__mmap)
/* restore registers */
2:
- ldr r4, [sp], #4
+ pop { r4 }
cfi_adjust_cfa_offset (-4)
cfi_restore (r4)
- ldr r5, [sp], #4
+ pop { r5 }
cfi_adjust_cfa_offset (-4)
cfi_restore (r5)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
index dcbab3a..d039129 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
@@ -34,11 +34,11 @@
.text
ENTRY (__mmap64)
ldr ip, [sp, $LOW_OFFSET]
- str r5, [sp, #-4]!
+ push { r5 }
cfi_adjust_cfa_offset (4)
cfi_rel_offset (r5, 0)
ldr r5, [sp, $HIGH_OFFSET]
- str r4, [sp, #-4]!
+ push { r4 }
cfi_adjust_cfa_offset (4)
cfi_rel_offset (r4, 0)
cfi_remember_state
@@ -51,7 +51,7 @@ ENTRY (__mmap64)
orr r5, ip, r5, lsl $20 @ compose page offset
DO_CALL (mmap2, 0)
cmn r0, $4096
- ldmfd sp!, {r4, r5}
+ pop {r4, r5}
cfi_adjust_cfa_offset (-8)
cfi_restore (r4)
cfi_restore (r5)
@@ -62,7 +62,7 @@ ENTRY (__mmap64)
cfi_restore_state
.Linval:
mov r0, $-EINVAL
- ldmfd sp!, {r4, r5}
+ pop {r4, r5}
cfi_adjust_cfa_offset (-8)
cfi_restore (r4)
cfi_restore (r5)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index 0c9e780..f0f7043 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -80,19 +80,19 @@
# define DOCARGS_0 \
.save {r7}; \
- str lr, [sp, #-4]!; \
+ push { lr }; \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (lr, 0); \
.save {lr}
# define UNDOCARGS_0
# define RESTORE_LR_0 \
- ldr lr, [sp], #4; \
+ pop { lr }; \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr)
# define DOCARGS_1 \
.save {r7}; \
- stmfd sp!, {r0, r1, lr}; \
+ push {r0, r1, lr}; \
cfi_adjust_cfa_offset (12); \
cfi_rel_offset (lr, 8); \
.save {lr}; \
@@ -106,13 +106,13 @@
# define DOCARGS_2 \
.save {r7}; \
- stmfd sp!, {r0, r1, lr}; \
+ push {r0, r1, lr}; \
cfi_adjust_cfa_offset (12); \
cfi_rel_offset (lr, 8); \
.save {lr}; \
.pad #8
# define UNDOCARGS_2 \
- ldmfd sp!, {r0, r1}; \
+ pop {r0, r1}; \
cfi_adjust_cfa_offset (-8); \
RESTART_UNWIND
# define RESTORE_LR_2 \
@@ -120,13 +120,13 @@
# define DOCARGS_3 \
.save {r7}; \
- stmfd sp!, {r0, r1, r2, r3, lr}; \
+ push {r0, r1, r2, r3, lr}; \
cfi_adjust_cfa_offset (20); \
cfi_rel_offset (lr, 16); \
.save {lr}; \
.pad #16
# define UNDOCARGS_3 \
- ldmfd sp!, {r0, r1, r2, r3}; \
+ pop {r0, r1, r2, r3}; \
cfi_adjust_cfa_offset (-16); \
RESTART_UNWIND
# define RESTORE_LR_3 \
@@ -134,13 +134,13 @@
# define DOCARGS_4 \
.save {r7}; \
- stmfd sp!, {r0, r1, r2, r3, lr}; \
+ push {r0, r1, r2, r3, lr}; \
cfi_adjust_cfa_offset (20); \
cfi_rel_offset (lr, 16); \
.save {lr}; \
.pad #16
# define UNDOCARGS_4 \
- ldmfd sp!, {r0, r1, r2, r3}; \
+ pop {r0, r1, r2, r3}; \
cfi_adjust_cfa_offset (-16); \
RESTART_UNWIND
# define RESTORE_LR_4 \
@@ -149,13 +149,13 @@
/* r4 is only stmfd'ed for correct stack alignment. */
# define DOCARGS_5 \
.save {r4, r7}; \
- stmfd sp!, {r0, r1, r2, r3, r4, lr}; \
+ push {r0, r1, r2, r3, r4, lr}; \
cfi_adjust_cfa_offset (24); \
cfi_rel_offset (lr, 20); \
.save {lr}; \
.pad #20
# define UNDOCARGS_5 \
- ldmfd sp!, {r0, r1, r2, r3}; \
+ pop {r0, r1, r2, r3}; \
cfi_adjust_cfa_offset (-16); \
.fnend; \
.fnstart; \
@@ -163,20 +163,20 @@
.save {lr}; \
.pad #4
# define RESTORE_LR_5 \
- ldmfd sp!, {r4, lr}; \
+ pop {r4, lr}; \
cfi_adjust_cfa_offset (-8); \
/* r4 will be marked as restored later. */ \
cfi_restore (lr)
# define DOCARGS_6 \
.save {r4, r5, r7}; \
- stmfd sp!, {r0, r1, r2, r3, lr}; \
+ push {r0, r1, r2, r3, lr}; \
cfi_adjust_cfa_offset (20); \
cfi_rel_offset (lr, 16); \
.save {lr}; \
.pad #16
# define UNDOCARGS_6 \
- ldmfd sp!, {r0, r1, r2, r3}; \
+ pop {r0, r1, r2, r3}; \
cfi_adjust_cfa_offset (-16); \
.fnend; \
.fnstart; \
@@ -217,13 +217,13 @@ extern int __local_multiple_threads attribute_hidden;
header.multiple_threads) == 0, 1)
# else
# define SINGLE_THREAD_P \
- stmfd sp!, {r0, lr}; \
+ push {r0, lr}; \
cfi_adjust_cfa_offset (8); \
cfi_rel_offset (lr, 4); \
bl __aeabi_read_tp; \
NEGOFF_ADJ_BASE(r0, MULTIPLE_THREADS_OFFSET); \
ldr ip, NEGOFF_OFF1(r0, MULTIPLE_THREADS_OFFSET); \
- ldmfd sp!, {r0, lr}; \
+ pop {r0, lr}; \
cfi_adjust_cfa_offset (-8); \
cfi_restore (lr); \
teq ip, #0
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
index 58ca9ac..44a0dc9 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
@@ -90,7 +90,7 @@ asm (
"_Unwind_Resume:\n"
" .cfi_sections .debug_frame\n"
" " CFI_STARTPROC "\n"
-" stmfd sp!, {r4, r5, r6, lr}\n"
+" push {r4, r5, r6, lr}\n"
" " CFI_ADJUST_CFA_OFFSET (16)" \n"
" " CFI_REL_OFFSET (r4, 0) "\n"
" " CFI_REL_OFFSET (r5, 4) "\n"
@@ -105,7 +105,7 @@ asm (
" cmp r3, #0\n"
" beq 4f\n"
"5: mov r0, r6\n"
-" ldmfd sp!, {r4, r5, r6, lr}\n"
+" pop {r4, r5, r6, lr}\n"
" " CFI_ADJUST_CFA_OFFSET (-16) "\n"
" " CFI_RESTORE (r4) "\n"
" " CFI_RESTORE (r5) "\n"
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c
index 0a3ad95..4c15827 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c
@@ -53,7 +53,7 @@ asm (
"_Unwind_Resume:\n"
" .cfi_sections .debug_frame\n"
" " CFI_STARTPROC "\n"
-" stmfd sp!, {r4, r5, r6, lr}\n"
+" push {r4, r5, r6, lr}\n"
" " CFI_ADJUST_CFA_OFFSET (16)" \n"
" " CFI_REL_OFFSET (r4, 0) "\n"
" " CFI_REL_OFFSET (r5, 4) "\n"
@@ -68,7 +68,7 @@ asm (
" cmp r3, #0\n"
" beq 4f\n"
"5: mov r0, r6\n"
-" ldmfd sp!, {r4, r5, r6, lr}\n"
+" pop {r4, r5, r6, lr}\n"
" " CFI_ADJUST_CFA_OFFSET (-16) "\n"
" " CFI_RESTORE (r4) "\n"
" " CFI_RESTORE (r5) "\n"
diff --git a/ports/sysdeps/unix/sysv/linux/arm/syscall.S b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
index 665ecb4..bdd5a52 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/syscall.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
@@ -23,7 +23,7 @@
ENTRY (syscall)
mov ip, sp
- stmfd sp!, {r4, r5, r6, r7}
+ push {r4, r5, r6, r7}
cfi_adjust_cfa_offset (16)
cfi_rel_offset (r4, 0)
cfi_rel_offset (r5, 4)
@@ -35,7 +35,7 @@ ENTRY (syscall)
mov r2, r3
ldmfd ip, {r3, r4, r5, r6}
swi 0x0
- ldmfd sp!, {r4, r5, r6, r7}
+ pop {r4, r5, r6, r7}
cfi_adjust_cfa_offset (-16)
cfi_restore (r4)
cfi_restore (r5)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index e448e61..f77af7f 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -132,23 +132,22 @@ __local_syscall_error: \
# else
# if defined(__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
# define POP_PC \
- ldr lr, [sp], #4; \
+ pop { lr }; \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr); \
bx lr
# else
-# define POP_PC \
- ldr pc, [sp], #4
+# define POP_PC pop { pc }
# endif
# define SYSCALL_ERROR_HANDLER \
__local_syscall_error: \
- str lr, [sp, #-4]!; \
+ push { lr }; \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (lr, 0); \
- str r0, [sp, #-4]!; \
+ push { r0 }; \
cfi_adjust_cfa_offset (4); \
bl PLTJMP(C_SYMBOL_NAME(__errno_location)); \
- ldr r1, [sp], #4; \
+ pop { r1 }; \
cfi_adjust_cfa_offset (-4); \
rsb r1, r1, #0; \
str r1, [r0]; \
@@ -215,7 +214,7 @@ __local_syscall_error: \
#undef DOARGS_0
#define DOARGS_0 \
.fnstart; \
- str r7, [sp, #-4]!; \
+ push { r7 }; \
cfi_adjust_cfa_offset (4); \
cfi_rel_offset (r7, 0); \
.save { r7 }
@@ -230,7 +229,7 @@ __local_syscall_error: \
#undef DOARGS_5
#define DOARGS_5 \
.fnstart; \
- stmfd sp!, {r4, r7}; \
+ push {r4, r7}; \
cfi_adjust_cfa_offset (8); \
cfi_rel_offset (r4, 0); \
cfi_rel_offset (r7, 4); \
@@ -240,7 +239,7 @@ __local_syscall_error: \
#define DOARGS_6 \
.fnstart; \
mov ip, sp; \
- stmfd sp!, {r4, r5, r7}; \
+ push {r4, r5, r7}; \
cfi_adjust_cfa_offset (12); \
cfi_rel_offset (r4, 0); \
cfi_rel_offset (r5, 4); \
@@ -251,7 +250,7 @@ __local_syscall_error: \
#define DOARGS_7 \
.fnstart; \
mov ip, sp; \
- stmfd sp!, {r4, r5, r6, r7}; \
+ push {r4, r5, r6, r7}; \
cfi_adjust_cfa_offset (16); \
cfi_rel_offset (r4, 0); \
cfi_rel_offset (r5, 4); \
@@ -262,7 +261,7 @@ __local_syscall_error: \
#undef UNDOARGS_0
#define UNDOARGS_0 \
- ldr r7, [sp], #4; \
+ pop { r7 }; \
cfi_adjust_cfa_offset (-4); \
cfi_restore (r7); \
.fnend
@@ -276,14 +275,14 @@ __local_syscall_error: \
#define UNDOARGS_4 UNDOARGS_0
#undef UNDOARGS_5
#define UNDOARGS_5 \
- ldmfd sp!, {r4, r7}; \
+ pop {r4, r7}; \
cfi_adjust_cfa_offset (-8); \
cfi_restore (r4); \
cfi_restore (r7); \
.fnend
#undef UNDOARGS_6
#define UNDOARGS_6 \
- ldmfd sp!, {r4, r5, r7}; \
+ pop {r4, r5, r7}; \
cfi_adjust_cfa_offset (-12); \
cfi_restore (r4); \
cfi_restore (r5); \
@@ -291,7 +290,7 @@ __local_syscall_error: \
.fnend
#undef UNDOARGS_7
#define UNDOARGS_7 \
- ldmfd sp!, {r4, r5, r6, r7}; \
+ pop {r4, r5, r6, r7}; \
cfi_adjust_cfa_offset (-16); \
cfi_restore (r4); \
cfi_restore (r5); \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
index ae931f7..128a640 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
@@ -37,7 +37,7 @@ ENTRY (__vfork)
mov ip, r7
cfi_register (r7, ip)
.fnstart
- str r7, [sp, #-4]!
+ push { r7 }
cfi_adjust_cfa_offset (4)
.save { r7 }
ldr r7, =SYS_ify (vfork)
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 14/26] arm: Use push/pop mnemonics
2013-02-27 3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
@ 2013-02-28 1:03 ` Joseph S. Myers
0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 1:03 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> For arm this makes no difference--the result is bit-for-bit identical;
> for thumb this results in smaller encodings. Perhaps it ought not and
> this is in fact an assembler bug, but I also think it's clearer.
> ---
> * sysdeps/arm/arm-mcount.S (_mcount): Use push/pop mnemonics.
> * sysdeps/arm/crti.S, sysdeps/arm/crtn.S: Likewise.
> * sysdeps/arm/dl-tlsdesc.S: Likewise.
> * sysdeps/arm/dl-trampoline.S: Likewise.
> * sysdeps/arm/start.S: Likewise.
> * sysdeps/arm/memcpy.S (PULL): Rename macro from pull.
> (PUSH): Rename macro from push.
> (memcpy): Use push/pop mnemonics.
> * sysdeps/arm/memmove.S: Similarly.
> * sysdeps/arm/sysdep.h (CALL_MCOUNT): Use push/pop mnemonics.
> * sysdeps/unix/sysv/linux/arm/____longjmp_chk.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/mmap.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/mmap64.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h: Likewise.
> * sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c: Likewise.
> * sysdeps/unix/sysv/linux/arm/nptl/unwind-resume.c: Likewise.
> * sysdeps/unix/sysv/linux/arm/syscall.S: Likewise.
> * sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
> * sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
OK.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 20/26] arm: Implement armv6t2 optimized strlen
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (7 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 14/26] arm: Use push/pop mnemonics Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 17:12 ` Måns Rullgård
2013-02-27 3:17 ` [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc Richard Henderson
` (19 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Twice as fast for long strings and 50% faster for short strings
over the armv4 version on A15.
---
* sysdeps/arm/armv6t2/strlen.S: New file.
---
ports/sysdeps/arm/armv6t2/strlen.S | 93 ++++++++++++++++++++++++++++++++++++++
1 file changed, 93 insertions(+)
create mode 100644 ports/sysdeps/arm/armv6t2/strlen.S
diff --git a/ports/sysdeps/arm/armv6t2/strlen.S b/ports/sysdeps/arm/armv6t2/strlen.S
new file mode 100644
index 0000000..d7d6e1f
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strlen.S
@@ -0,0 +1,93 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+ENTRY(strlen)
+ @ r0 = start of string
+ pld [r0]
+
+ @ To cater to long strings, we want to search through a few
+ @ characters until we reach an aligned pointer. To cater to
+ @ small strings, we don't want to start doing word operations
+ @ immediately. The compromise is a maximum of 16 bytes less
+ @ whatever is required to end with an aligned pointer.
+ @ r3 = number of characters to search in alignment loop
+ and r3, r0, #7
+ s(mov) r1, r0 @ Save the input pointer
+ rsb r3, r3, #16
+
+ @ Loop until we find ...
+1: ldrb r2, [r0], #1
+ subs r3, r3, #1 @ ... the alignment point
+ it ne
+ cmpne r2, #0 @ ... or EOS
+ bne 1b
+
+ @ Disambiguate the exit possibilities above
+ cmp r2, #0 @ Found EOS
+ ittt eq
+ subeq r0, r0, #1 @ Undo post-inc above
+ subeq r0, r0, r1 @ Subtract input to compute length
+ bxeq lr
+
+ @ So now we're aligned.
+ ldrd r2, r3, [r0], #8
+ movw ip, #0xfefe
+ pld [r0, #64]
+ movt ip, #0xfefe
+ pld [r0, #128]
+ pld [r0, #192]
+
+ @ Loop searching for EOS or C, 8 bytes at a time.
+ @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+ @ that was originally zero and 0xff otherwise. Therefore we consider
+ @ the lsb of each byte the "found" bit, with 0 for a match.
+ .balign 16
+2: uqadd8 r2, r2, ip @ Find EOS
+ uqadd8 r3, r3, ip
+ pld [r0, #256] @ Prefetch 4 lines ahead
+ s(and) r3, r3, r2 @ Combine the two words
+ mvns r3, r3 @ Test for any found bit true
+ it eq
+ ldrdeq r2, r3, [r0], #8
+ beq 2b
+
+ @ Found something. Disambiguate between first and second words.
+ @ Adjust r0 to point to the word containing the match.
+ @ Adjust r2 to the found bits for the word containing the match.
+ mvns r2, r2
+ itee ne
+ subne r0, r0, #8
+ moveq r2, r3
+ subeq r0, r0, #4
+
+ @ Find the bit-offset of the match within the word.
+#ifdef __ARMEL__
+ rbit r2, r2 @ For LE we need count-trailing-zeros
+#endif
+ clz r2, r2
+ add r0, r0, r2, lsr #3 @ Adjust the pointer to the found byte
+ s(sub) r0, r0, r1 @ Subtract input to compute length
+ bx lr
+
+END(strlen)
+
+libc_hidden_builtin_def (strlen)
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 20/26] arm: Implement armv6t2 optimized strlen
2013-02-27 3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
@ 2013-02-27 17:12 ` Måns Rullgård
2013-02-27 17:44 ` Richard Henderson
0 siblings, 1 reply; 63+ messages in thread
From: Måns Rullgård @ 2013-02-27 17:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports, Joseph Myers
Richard Henderson <rth@twiddle.net> writes:
> +ENTRY(strlen)
> + @ r0 = start of string
> + pld [r0]
> +
> + @ To cater to long strings, we want to search through a few
> + @ characters until we reach an aligned pointer. To cater to
> + @ small strings, we don't want to start doing word operations
> + @ immediately. The compromise is a maximum of 16 bytes less
> + @ whatever is required to end with an aligned pointer.
> + @ r3 = number of characters to search in alignment loop
> + and r3, r0, #7
> + s(mov) r1, r0 @ Save the input pointer
> + rsb r3, r3, #16
> +
> + @ Loop until we find ...
> +1: ldrb r2, [r0], #1
> + subs r3, r3, #1 @ ... the alignment point
> + it ne
> + cmpne r2, #0 @ ... or EOS
> + bne 1b
> +
> + @ Disambiguate the exit possibilities above
> + cmp r2, #0 @ Found EOS
> + ittt eq
> + subeq r0, r0, #1 @ Undo post-inc above
> + subeq r0, r0, r1 @ Subtract input to compute length
> + bxeq lr
> +
> + @ So now we're aligned.
> + ldrd r2, r3, [r0], #8
> + movw ip, #0xfefe
> + pld [r0, #64]
> + movt ip, #0xfefe
> + pld [r0, #128]
> + pld [r0, #192]
> +
> + @ Loop searching for EOS or C, 8 bytes at a time.
This comment seems to be for strchr().
> + @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
> + @ that was originally zero and 0xff otherwise. Therefore we consider
> + @ the lsb of each byte the "found" bit, with 0 for a match.
> + .balign 16
> +2: uqadd8 r2, r2, ip @ Find EOS
> + uqadd8 r3, r3, ip
> + pld [r0, #256] @ Prefetch 4 lines ahead
> + s(and) r3, r3, r2 @ Combine the two words
> + mvns r3, r3 @ Test for any found bit true
> + it eq
> + ldrdeq r2, r3, [r0], #8
> + beq 2b
Subtracting the values (with UQSUB8) from 1 instead would result in a 0
for any non-zero input and a 1 for "found", i.e. the inverse of what
you have here. Testing for a match anywhere in the double-word then
becomes a single ORRS instruction. Unless I'm making some stupid mistake.
> + @ Found something. Disambiguate between first and second words.
> + @ Adjust r0 to point to the word containing the match.
> + @ Adjust r2 to the found bits for the word containing the match.
> + mvns r2, r2
> + itee ne
> + subne r0, r0, #8
> + moveq r2, r3
> + subeq r0, r0, #4
> +
> + @ Find the bit-offset of the match within the word.
> +#ifdef __ARMEL__
> + rbit r2, r2 @ For LE we need count-trailing-zeros
> +#endif
> + clz r2, r2
> + add r0, r0, r2, lsr #3 @ Adjust the pointer to the found byte
> + s(sub) r0, r0, r1 @ Subtract input to compute length
> + bx lr
> +
> +END(strlen)
This code could be made to work for any ARMv6 by (conditionally)
replacing the MOVW/MOVT with some equivalent and the RBIT by REV. REV
works since only the lsb in each byte can be set, so the result of CLZ
will simply be 7 more than we want, and the 3 low-order bits are shifted
out anyway.
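
A small C model of that argument, as a sketch only: rbit32, rev32 and
clz32 below are hypothetical portable stand-ins for RBIT, REV and CLZ,
and "found" is a word in which at most the lsb of each byte is set.

```c
#include <stdint.h>

/* Model of RBIT: reverse all 32 bits. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        r |= ((x >> i) & 1u) << (31 - i);
    return r;
}

/* Model of REV: reverse the four bytes. */
static uint32_t rev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Model of CLZ: count leading zeros, 32 for a zero input. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Byte index of the lowest flagged byte, RBIT form (armv6t2). */
static unsigned first_match_rbit(uint32_t found)
{
    return clz32(rbit32(found)) >> 3;
}

/* Same index, REV form (plain armv6): CLZ is 7 larger, but the
   three low-order bits are shifted out. */
static unsigned first_match_rev(uint32_t found)
{
    return clz32(rev32(found)) >> 3;
}
```

For a flag in byte i, RBIT moves it to bit 31 - 8i (CLZ gives 8i) while
REV moves it to bit 24 - 8i (CLZ gives 8i + 7); both shift down to i.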
--
Måns Rullgård
mans@mansr.com
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 20/26] arm: Implement armv6t2 optimized strlen
2013-02-27 17:12 ` Måns Rullgård
@ 2013-02-27 17:44 ` Richard Henderson
0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 17:44 UTC (permalink / raw)
To: Måns Rullgård; +Cc: libc-ports, Joseph Myers
On 02/27/2013 09:12 AM, Måns Rullgård wrote:
> Richard Henderson <rth@twiddle.net> writes:
>
>> +ENTRY(strlen)
...
>> + @ Loop searching for EOS or C, 8 bytes at a time.
>
> This comment seems to be for strchr().
Whoops. As you can imagine there's some amount of cut and paste here. ;-)
> Subtracting the values (with UQSUB8) from 1 instead would result in a 0
> for any non-zero input and a 1 for "found", i.e. the inverse of what
> you have here. Testing for a match anywhere in the double-word then
> becomes a single ORRS instruction. Unless I'm making some stupid mistake.
Yes, this works. And a good idea for improvement.
> This code could be made to work for any ARMv6 by (conditionally)
> replacing the MOVW/MOVT with some equivalent and the RBIT by REV. REV
> works since only the lsb in each byte can be set, so the result of CLZ
> will simply be 7 more than we want, and the 3 low-order bits are shifted
> out anyway.
Ah, I'd mis-read the document the first time round and thought uqadd8 was an
armv6t2 instruction. I'll rearrange all these so that armv6 can benefit.
Which makes patch 3 once again useful... ;-)
r~
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (8 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 20/26] arm: Implement armv6t2 optimized strlen Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 21:57 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 11/26] arm: Introduce and use NEGOFF series of macros Richard Henderson
` (18 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Unless we're trying old interworking, there's no point restoring to
LR first. Everything from armv5 on handles a pop into pc as an interworking jump.
---
* sysdeps/arm/arm-mcount.S (_mcount): Use pop into pc unless
__ARM_ARCH_4T__ and __THUMB_INTERWORK__.
* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_undefweak): Likewise.
(_dl_tlsdesc_dynamic): Likewise.
---
ports/sysdeps/arm/arm-mcount.S | 6 +++---
ports/sysdeps/arm/dl-tlsdesc.S | 15 ++++++++++++---
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/ports/sysdeps/arm/arm-mcount.S b/ports/sysdeps/arm/arm-mcount.S
index b6e5ec7..8ad0779 100644
--- a/ports/sysdeps/arm/arm-mcount.S
+++ b/ports/sysdeps/arm/arm-mcount.S
@@ -82,9 +82,7 @@ ENTRY(_mcount)
ldrne r0, [r0, #-4]
movsne r1, lr
blne __mcount_internal
-#ifdef __thumb2__
- pop {r0, r1, r2, r3, fp, pc}
-#else
+#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
pop {r0, r1, r2, r3, fp, lr}
cfi_adjust_cfa_offset (-24)
cfi_restore (r0)
@@ -94,6 +92,8 @@ ENTRY(_mcount)
cfi_restore (fp)
cfi_restore (lr)
bx lr
+#else
+ pop {r0, r1, r2, r3, fp, pc}
#endif
END(_mcount)
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 417b8b3..6c47743 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -51,10 +51,14 @@ _dl_tlsdesc_undefweak:
cfi_rel_offset (lr,0)
bl __aeabi_read_tp
rsb r0, r0, #0
+#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
pop {lr}
cfi_adjust_cfa_offset (-4)
cfi_restore (lr)
- BX (lr)
+ bx lr
+#else
+ pop {pc}
+#endif
cfi_endproc
.fnend
@@ -118,13 +122,18 @@ _dl_tlsdesc_dynamic:
1: mov r0, r1
bl __tls_get_addr
rsb r0, r4, r0
-2: pop {r2,r3,r4, lr}
+2:
+#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
+ pop {r2,r3,r4, lr}
cfi_adjust_cfa_offset (-16)
cfi_restore (lr)
cfi_restore (r4)
cfi_restore (r3)
cfi_restore (r2)
- BX (lr)
+ bx lr
+#else
+ pop {r2,r3,r4, pc}
+#endif
.fnend
cfi_endproc
.size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 11/26] arm: Introduce and use NEGOFF series of macros
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (9 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 17/26] arm: Unless arm4t, pop return address directly into pc Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 3:17 ` [PATCH 13/26] arm: Store lr in r2 around GET_TLS Richard Henderson
` (17 subsequent siblings)
28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
There are several places in which we access negative offsets from
the thread-pointer, but thumb2 only supports positive offsets in
memory references.
Avoid duplicating the rather large macros in which these references
are embedded by abstracting out the operation.
---
* sysdeps/arm/sysdep.h (NEGOFF_ADJ_BASE): New macro.
(NEGOFF_ADJ_BASE2, NEGOFF_OFF1, NEGOFF_OFF2): New macros.
* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Use them.
* sysdeps/unix/sysv/linux/arm/nptl/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h (SINGLE_THREAD_P):
Likewise.
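
The address arithmetic behind the macros can be sketched in C as below.
This is only an illustration: the offset values are hypothetical (the
real TID_OFFSET and PID_OFFSET come from the generated tcb-offsets
header), and the function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical negative offsets from the thread pointer. */
enum { TID_OFFSET = -1112, PID_OFFSET = -1108 };

/* ARM form: keep the base, use each negative offset directly. */
static void store_arm_form(unsigned char *tp, int32_t v)
{
    memcpy(tp + TID_OFFSET, &v, sizeof v);  /* str r0, [r1, #TID_OFFSET] */
    memcpy(tp + PID_OFFSET, &v, sizeof v);  /* str r0, [r1, #PID_OFFSET] */
}

/* Thumb form: NEGOFF_ADJ_BASE moves the base once, after which
   NEGOFF_OFF1 uses offset 0 and NEGOFF_OFF2 uses the difference A-B. */
static void store_thumb_form(unsigned char *tp, int32_t v)
{
    unsigned char *base = tp + TID_OFFSET;              /* add r1, r1, $OFF */
    memcpy(base, &v, sizeof v);                         /* [r1]             */
    memcpy(base + (PID_OFFSET - TID_OFFSET), &v, sizeof v); /* [r1, $(A-B)] */
}

/* Returns 1 if both forms write exactly the same bytes. */
static int negoff_forms_agree(int32_t v)
{
    static unsigned char a[2048], b[2048];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    store_arm_form(a + sizeof a, v);    /* thread pointer at buffer end */
    store_thumb_form(b + sizeof b, v);
    return memcmp(a, b, sizeof a) == 0;
}
```

The adjustment costs one extra add in the thumb build and nothing in
the arm build, where NEGOFF_ADJ_BASE expands to nothing.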
---
ports/sysdeps/arm/sysdep.h | 16 ++++++++++++++++
ports/sysdeps/unix/sysv/linux/arm/clone.S | 5 +++--
ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S | 11 ++++++-----
.../sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h | 19 ++++++++++---------
ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S | 14 ++++++++------
5 files changed, 43 insertions(+), 22 deletions(-)
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index b7ba9b1..2d40823 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -147,4 +147,20 @@
99: OP R, [pc, T]
#endif
+/* Cope with negative memory offsets, which thumb can't encode.
+ Use NEGOFF_ADJ_BASE to (conditionally) alter the base register,
+ and then NEGOFF_OFF1 to use 0 for thumb and the offset for arm,
+ or NEGOFF_OFF2 to use A-B for thumb and A for arm. */
+#ifdef __thumb2__
+# define NEGOFF_ADJ_BASE(R, OFF) add R, R, $OFF
+# define NEGOFF_ADJ_BASE2(D, S, OFF) add D, S, $OFF
+# define NEGOFF_OFF1(R, OFF) [R]
+# define NEGOFF_OFF2(R, OFFA, OFFB) [R, $((OFFA) - (OFFB))]
+#else
+# define NEGOFF_ADJ_BASE(R, OFF)
+# define NEGOFF_ADJ_BASE2(D, S, OFF) mov D, S
+# define NEGOFF_OFF1(R, OFF) [R, $OFF]
+# define NEGOFF_OFF2(R, OFFA, OFFB) [R, $OFFA]
+#endif
+
#endif /* __ASSEMBLER__ */
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index 9de37f2..58ee7b4 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -81,8 +81,9 @@ PSEUDO_END (__clone)
ite ne
movne r0, #-1
swieq 0x0
- str r0, [r1, #PID_OFFSET]
- str r0, [r1, #TID_OFFSET]
+ NEGOFF_ADJ_BASE(r1, TID_OFFSET)
+ str r0, NEGOFF_OFF1(r1, TID_OFFSET)
+ str r0, NEGOFF_OFF2(r1, PID_OFFSET, TID_OFFSET)
3:
#endif
@ pick the function arg and call address off the stack and execute
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
index 749aaab..bc0a771 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
@@ -26,14 +26,15 @@
ldr lr, [sp], #4; /* Restore LR. */ \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr); \
- mov r2, r0; /* Save the TLS addr in r2. */ \
- ldr r3, [r2, #PID_OFFSET]; /* Load the saved PID. */ \
- rsb r0, r3, #0; /* Negate it. */ \
- str r0, [r2, #PID_OFFSET] /* Store the temporary PID. */
+ NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2. */ \
+ ldr r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID. */ \
+ rsb r0, r3, #0; /* Negate it. */ \
+ str r0, NEGOFF_OFF1(r2, PID_OFFSET); /* Store the temp PID. */
/* Restore the old PID value in the parent. */
#define RESTORE_PID \
cmp r0, #0; /* If we are the parent... */ \
- strne r3, [r2, #PID_OFFSET] /* ... restore the saved PID. */
+ it ne; \
+ strne r3, NEGOFF_OFF1(r2, PID_OFFSET); /* restore the saved PID. */
#include "../vfork.S"
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index b6dc3e0..0c9e780 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -217,15 +217,16 @@ extern int __local_multiple_threads attribute_hidden;
header.multiple_threads) == 0, 1)
# else
# define SINGLE_THREAD_P \
- stmfd sp!, {r0, lr}; \
- cfi_adjust_cfa_offset (8); \
- cfi_rel_offset (lr, 4); \
- bl __aeabi_read_tp; \
- ldr ip, [r0, #MULTIPLE_THREADS_OFFSET]; \
- ldmfd sp!, {r0, lr}; \
- cfi_adjust_cfa_offset (-8); \
- cfi_restore (lr); \
- teq ip, #0
+ stmfd sp!, {r0, lr}; \
+ cfi_adjust_cfa_offset (8); \
+ cfi_rel_offset (lr, 4); \
+ bl __aeabi_read_tp; \
+ NEGOFF_ADJ_BASE(r0, MULTIPLE_THREADS_OFFSET); \
+ ldr ip, NEGOFF_OFF1(r0, MULTIPLE_THREADS_OFFSET); \
+ ldmfd sp!, {r0, lr}; \
+ cfi_adjust_cfa_offset (-8); \
+ cfi_restore (lr); \
+ teq ip, #0
# define SINGLE_THREAD_P_PIC(x) SINGLE_THREAD_P
# endif
# endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
index 1bbe5c6..3c0ef78 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
@@ -26,15 +26,17 @@
ldr lr, [sp], #4; /* Restore LR. */ \
cfi_adjust_cfa_offset (-4); \
cfi_restore (lr); \
- mov r2, r0; /* Save the TLS addr in r2. */ \
- ldr r3, [r2, #PID_OFFSET]; /* Load the saved PID. */ \
- rsbs r0, r3, #0; /* Negate it. */ \
- moveq r0, #0x80000000; /* Use 0x80000000 if it was 0. */ \
- str r0, [r2, #PID_OFFSET] /* Store the temporary PID. */
+ NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2. */ \
+ ldr r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID. */ \
+ rsbs r0, r3, #0; /* Negate it. */ \
+ it eq; \
+ moveq r0, #0x80000000; /* Use 0x80000000 if it was 0. */ \
+ str r0, NEGOFF_OFF1(r2, PID_OFFSET); /* Store the temp PID. */
/* Restore the old PID value in the parent. */
#define RESTORE_PID \
cmp r0, #0; /* If we are the parent... */ \
- strne r3, [r2, #PID_OFFSET] /* ... restore the saved PID. */
+ it ne; \
+ strne r3, NEGOFF_OFF1(r2, PID_OFFSET); /* restore the saved PID. */
#include "../vfork.S"
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 13/26] arm: Store lr in r2 around GET_TLS
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (10 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 11/26] arm: Introduce and use NEGOFF series of macros Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 3:17 ` [PATCH 18/26] arm: Use GET_TLS more often Richard Henderson
` (16 subsequent siblings)
28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Rather than on the stack.
---
* sysdeps/unix/sysv/linux/arm/nptl/vfork.S (SAVE_PID): Save lr to r2
around the GET_TLS call.
* sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S (SAVE_PID): Likewise.
---
ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S | 8 +++-----
ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S | 8 +++-----
2 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
index bc0a771..cd51122 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/pt-vfork.S
@@ -19,12 +19,10 @@
/* Save the PID value. */
#define SAVE_PID \
- str lr, [sp, #-4]!; /* Save LR. */ \
- cfi_adjust_cfa_offset (4); \
- cfi_rel_offset (lr, 0); \
+ mov r2, lr; /* Save LR. */ \
+ cfi_register (lr, r2); \
GET_TLS; \
- ldr lr, [sp], #4; /* Restore LR. */ \
- cfi_adjust_cfa_offset (-4); \
+ mov lr, r2; /* Restore LR. */ \
cfi_restore (lr); \
NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2. */ \
ldr r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID. */ \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
index 3c0ef78..4007081 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/vfork.S
@@ -19,12 +19,10 @@
/* Save the PID value. */
#define SAVE_PID \
- str lr, [sp, #-4]!; /* Save LR. */ \
- cfi_adjust_cfa_offset (4); \
- cfi_rel_offset (lr, 0); \
+ mov r2, lr; /* Save LR. */ \
+ cfi_register (lr, r2); \
GET_TLS; \
- ldr lr, [sp], #4; /* Restore LR. */ \
- cfi_adjust_cfa_offset (-4); \
+ mov lr, r2; /* Restore LR. */ \
cfi_restore (lr); \
NEGOFF_ADJ_BASE2(r2, r0, PID_OFFSET); /* Save the TLS addr in r2. */ \
ldr r3, NEGOFF_OFF1(r2, PID_OFFSET); /* Load the saved PID. */ \
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 18/26] arm: Use GET_TLS more often
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (11 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 13/26] arm: Store lr in r2 around GET_TLS Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 21:59 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
` (15 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
---
* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_undefweak): Use GET_TLS,
save LR in R1, and return directly from R1.
(_dl_tlsdesc_dynamic): Use GET_TLS.
* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
(SINGLE_THREAD_P): Use GET_TLS.
---
ports/sysdeps/arm/dl-tlsdesc.S | 23 +++++++---------------
.../unix/sysv/linux/arm/nptl/sysdep-cancel.h | 2 +-
2 files changed, 8 insertions(+), 17 deletions(-)
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 6c47743..12214f1 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -44,22 +44,13 @@ _dl_tlsdesc_return:
.fnstart
.align 2
_dl_tlsdesc_undefweak:
- @ Are we allowed a misaligned stack pointer calling read_tp?
- .save {lr}
- push {lr}
- cfi_adjust_cfa_offset (4)
- cfi_rel_offset (lr,0)
- bl __aeabi_read_tp
+ @ ??? The only GET_TLS implementation in tree is Linux,
+ @ which is guaranteed to clobber only R0 and LR.
+ mov r1, lr
+ cfi_register (lr, r1)
+ GET_TLS
rsb r0, r0, #0
-#if defined (__ARM_ARCH_4T__) && defined (__THUMB_INTERWORK__)
- pop {lr}
- cfi_adjust_cfa_offset (-4)
- cfi_restore (lr)
- bx lr
-#else
- pop {pc}
-#endif
-
+ BX (r1)
cfi_endproc
.fnend
.size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
@@ -104,7 +95,7 @@ _dl_tlsdesc_dynamic:
cfi_rel_offset (r4,8)
cfi_rel_offset (lr,12)
ldr r1, [r0] /* td */
- bl __aeabi_read_tp
+ GET_TLS
mov r4, r0 /* r4 = tp */
ldr r0, [r0]
ldr r2, [r1, #8] /* gen_count */
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index f0f7043..c2ab0ce 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -220,7 +220,7 @@ extern int __local_multiple_threads attribute_hidden;
push {r0, lr}; \
cfi_adjust_cfa_offset (8); \
cfi_rel_offset (lr, 4); \
- bl __aeabi_read_tp; \
+ GET_TLS; \
NEGOFF_ADJ_BASE(r0, MULTIPLE_THREADS_OFFSET); \
ldr ip, NEGOFF_OFF1(r0, MULTIPLE_THREADS_OFFSET); \
pop {r0, lr}; \
--
1.8.1.2
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 19/26] arm: Add optimized ffs for armv6t2
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (12 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 18/26] arm: Use GET_TLS more often Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 15:51 ` Måns Rullgård
2013-02-27 17:49 ` Roland McGrath
2013-02-27 3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
` (14 subsequent siblings)
28 siblings, 2 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Half the size of gcc 4.8's output.
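For reference, a minimal C sketch of what ffsll.S is meant to compute. It is not part of the patch, and the names ref_ffs32 and ref_ffsll are made up for illustration. It follows the comment in ffsll.S: pick the low word unless it is zero, remember a bit offset of 0 or 32, and run the 32-bit ffs on the chosen word.

```c
#include <stdint.h>

/* 1-based index of the least significant set bit, or 0 if X is zero.
   A plain loop stands in for the rbit/clz pair used in the assembly.  */
static int
ref_ffs32 (uint32_t x)
{
  int n = 1;
  if (x == 0)
    return 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of ffsll: operate on the low word unless it is zero, in which
   case operate on the high word with a bit offset of 32.  */
static int
ref_ffsll (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t word = lo != 0 ? lo : hi;	/* the word placed in r0 */
  int ofs = lo != 0 ? 0 : 32;		/* the offset placed in r2 */
  int r = ref_ffs32 (word);
  return r != 0 ? r + ofs : 0;		/* offset added only if a bit was found */
}
```

The endianness #ifdef in the assembly only decides which register holds the low word; the C model is independent of that.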
---
* sysdeps/arm/armv6t2/ffs.S: New file.
* sysdeps/arm/armv6t2/ffsll.S: New file.
---
ports/sysdeps/arm/armv6t2/ffs.S | 34 +++++++++++++++++++++++++++
ports/sysdeps/arm/armv6t2/ffsll.S | 49 +++++++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+)
create mode 100644 ports/sysdeps/arm/armv6t2/ffs.S
create mode 100644 ports/sysdeps/arm/armv6t2/ffsll.S
diff --git a/ports/sysdeps/arm/armv6t2/ffs.S b/ports/sysdeps/arm/armv6t2/ffs.S
new file mode 100644
index 0000000..765fb5d
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/ffs.S
@@ -0,0 +1,34 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+ENTRY(__ffs)
+ cmp r0, #0
+ ittt ne
+ rbitne r0, r0
+ clzne r0, r0
+ addne r0, r0, #1
+ bx lr
+END(__ffs)
+
+weak_alias (__ffs, ffs)
+weak_alias (__ffs, ffsl)
+libc_hidden_builtin_def (ffs)
diff --git a/ports/sysdeps/arm/armv6t2/ffsll.S b/ports/sysdeps/arm/armv6t2/ffsll.S
new file mode 100644
index 0000000..d428509
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/ffsll.S
@@ -0,0 +1,49 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+ENTRY(ffsll)
+ @ If low part is 0, operate on the high part. Ensure that the
+ @ word on which we operate is in r0. Set r2 to the bit offset
+ @ of the word being considered. Set the flags for the word
+ @ being operated on.
+#ifdef __ARMEL__
+ cmp r0, #0
+ itee ne
+ movne r2, #0
+ moveq r2, #32
+ movseq r0, r1
+#else
+ cmp r1, #0
+ ittee ne
+ movne r2, #0
+ movne r0, r1
+ moveq r2, #32
+ cmpeq r0, #0
+#endif
+ @ Perform the ffs on r0.
+ itttt ne
+ rbitne r0, r0
+ clzne r0, r0
+ addne r0, r0, #1
+ addne r0, r0, r2
+ bx lr
+END(ffsll)
--
1.8.1.2
* Re: [PATCH 19/26] arm: Add optimized ffs for armv6t2
2013-02-27 3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
@ 2013-02-27 15:51 ` Måns Rullgård
2013-02-27 16:34 ` Richard Henderson
2013-02-27 17:49 ` Roland McGrath
1 sibling, 1 reply; 63+ messages in thread
From: Måns Rullgård @ 2013-02-27 15:51 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports, Joseph Myers
Richard Henderson <rth@twiddle.net> writes:
> +ENTRY(__ffs)
> + cmp r0, #0
> + ittt ne
> + rbitne r0, r0
> + clzne r0, r0
> + addne r0, r0, #1
> + bx lr
> +END(__ffs)
Making the RBIT unconditional (bit-reverse of zero is still zero) is
better since it reduces dependencies between instructions. Depending on
microarchitecture details, this might save a cycle.
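As a rough C model of the two variants (illustrative only; rbit32, clz32, ffs_as_posted and ffs_uncond_rbit are made-up names, and the loops merely stand in for the RBIT and CLZ instructions), the results are the same because reversing zero gives zero:

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word, as RBIT does.  */
static uint32_t
rbit32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

/* Count leading zeros of a nonzero word, as CLZ does.  */
static int
clz32 (uint32_t x)
{
  int n = 0;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* As posted: rbit, clz and add are all predicated on x != 0.  */
static int
ffs_as_posted (uint32_t x)
{
  if (x != 0)
    return clz32 (rbit32 (x)) + 1;
  return 0;
}

/* As suggested: reverse unconditionally; only clz and add stay
   predicated.  For x == 0 the reversed value is already the result.  */
static int
ffs_uncond_rbit (uint32_t x)
{
  uint32_t r = rbit32 (x);
  if (x != 0)
    r = (uint32_t) clz32 (r) + 1;
  return (int) r;
}
```

In the second form the reversal no longer waits on the compare, which is the dependency being removed.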
--
Måns Rullgård
mans@mansr.com
* Re: [PATCH 19/26] arm: Add optimized ffs for armv6t2
2013-02-27 15:51 ` Måns Rullgård
@ 2013-02-27 16:34 ` Richard Henderson
0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 16:34 UTC (permalink / raw)
To: Måns Rullgård; +Cc: libc-ports, Joseph Myers
On 02/27/2013 07:51 AM, Måns Rullgård wrote:
> Richard Henderson <rth@twiddle.net> writes:
>
>> +ENTRY(__ffs)
>> + cmp r0, #0
>> + ittt ne
>> + rbitne r0, r0
>> + clzne r0, r0
>> + addne r0, r0, #1
>> + bx lr
>> +END(__ffs)
>
> Making the RBIT unconditional (bit-reverse of zero is still zero) is
> better since it reduces dependencies between instructions. Depending on
> microarchitecture details, this might save a cycle.
>
Fair enough. Consider this change made for any next round.
r~
* Re: [PATCH 19/26] arm: Add optimized ffs for armv6t2
2013-02-27 3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
2013-02-27 15:51 ` Måns Rullgård
@ 2013-02-27 17:49 ` Roland McGrath
1 sibling, 0 replies; 63+ messages in thread
From: Roland McGrath @ 2013-02-27 17:49 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports, Joseph Myers
Space before paren. Descriptive line at top of new files.
* [PATCH 15/26] arm: Delete LOADREGS macro
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (13 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 19/26] arm: Add optimized ffs for armv6t2 Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 1:24 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 12/26] arm: Enable thumb2 mode in assembly files Richard Henderson
` (13 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
There was only one user. Its "condition" argument was used
for "ia" rather than an actual condition. The apcs26 syntax
is almost certainly not needed, given current binutils requirements.
---
* sysdeps/arm/__longjmp.S (__longjmp): Use ldmia insn directly.
* sysdeps/arm/sysdep.h (LOADREGS): Remove.
---
ports/sysdeps/arm/__longjmp.S | 2 +-
ports/sysdeps/arm/sysdep.h | 4 ----
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/ports/sysdeps/arm/__longjmp.S b/ports/sysdeps/arm/__longjmp.S
index af4b963..050227b 100644
--- a/ports/sysdeps/arm/__longjmp.S
+++ b/ports/sysdeps/arm/__longjmp.S
@@ -37,7 +37,7 @@ ENTRY (__longjmp)
cfi_undefined (r4)
CHECK_SP (r4)
#endif
- LOADREGS(ia, ip!, {v1-v6, sl, fp, sp, lr})
+ ldmia ip!, {v1-v6, sl, fp, sp, lr}
cfi_restore (v1)
cfi_restore (v2)
cfi_restore (v3)
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index fed3dfd..bfdba27 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -35,8 +35,6 @@
/* APCS-32 doesn't preserve the condition codes across function call. */
#ifdef __APCS_32__
-#define LOADREGS(cond, base, reglist...)\
- ldm##cond base,reglist
#ifdef __USE_BX__
#define RETINSTR(cond, reg) \
bx##cond reg
@@ -49,8 +47,6 @@
mov pc, _reg
#endif
#else /* APCS-26 */
-#define LOADREGS(cond, base, reglist...)\
- ldm##cond base,reglist^
#define RETINSTR(cond, reg) \
mov##cond##s pc, reg
#define DO_RET(_reg) \
--
1.8.1.2
* Re: [PATCH 15/26] arm: Delete LOADREGS macro
2013-02-27 3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
@ 2013-02-28 1:24 ` Joseph S. Myers
0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 1:24 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> There was only one user. Its "condition" argument was used
> for "ia" rather than an actual condition. The apcs26 syntax
> is almost certainly not needed, given current binutils requirements.
> ---
> * sysdeps/arm/__longjmp.S (__longjmp): Use ldmia insn directly.
> * sysdeps/arm/sysdep.h (LOADREGS): Remove.
OK. The __APCS_32__ conditional can simply be removed; there's no
practical support for 26-bit ARM (or anything older than v4) in glibc (and
I don't know if v4 as opposed to v4t will actually work, although in
principle it should work via linking with --fix-v4bx-interworking).
--
Joseph S. Myers
joseph@codesourcery.com
* [PATCH 12/26] arm: Enable thumb2 mode in assembly files
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (14 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 15/26] arm: Delete LOADREGS macro Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
` (12 subsequent siblings)
28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
The preceding patches have allowed for the few incompatibilities
between arm and thumb2 mode, or have marked the file as not wanting
to use thumb2 mode.
Note that one still has to edit config.make in the build directory
and set ASFLAGS so that it includes -mthumb...
---
* sysdeps/arm/sysdep.h [__ASSEMBLER__]: Enable thumb2 if __thumb2__.
---
ports/sysdeps/arm/sysdep.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 2d40823..3459219 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -114,6 +114,17 @@
the caller. */
.eabi_attribute 24, 1
+/* The thumb2 encoding is reasonably complete. Unless suppressed, use it. */
+#ifdef NO_THUMB
+# undef __thumb__
+# undef __thumb2__
+ .arm
+#endif
+#ifdef __thumb2__
+ .syntax unified
+ .thumb
+#endif
+
/* We occasionally want to use the S form simply to achieve a smaller
instruction form in Thumb mode. Never set the flags in ARM mode. */
#ifdef __thumb__
--
1.8.1.2
* [PATCH 08/26] arm: Add IT insns for thumb mode
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (15 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 12/26] arm: Enable thumb2 mode in assembly files Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 0:41 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 16/26] arm: Commonize BX conditionals Richard Henderson
` (11 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
These are ignored by the assembler in ARM mode, so by
default this has no effect on generated code.
---
* ports/sysdeps/arm/arm-mcount.S: Always use unified syntax and
always add IT markup.
* sysdeps/unix/sysv/linux/arm/mmap64.S (__mmap64): Likewise.
* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Add IT markup.
* sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/arm/mmap.S (__mmap): Likewise.
* sysdeps/unix/sysv/linux/arm/syscall.S (syscall): Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h (PSEUDO_RET): Likewise.
* sysdeps/unix/sysv/linux/arm/vfork.S (__vfork): Likewise.
---
ports/sysdeps/arm/arm-mcount.S | 9 ++-------
ports/sysdeps/arm/dl-tlsdesc.S | 1 +
ports/sysdeps/unix/arm/sysdep.S | 5 +++--
ports/sysdeps/unix/sysv/linux/arm/clone.S | 4 +++-
ports/sysdeps/unix/sysv/linux/arm/mmap.S | 1 +
ports/sysdeps/unix/sysv/linux/arm/mmap64.S | 6 +++++-
ports/sysdeps/unix/sysv/linux/arm/syscall.S | 1 +
ports/sysdeps/unix/sysv/linux/arm/sysdep.h | 1 +
ports/sysdeps/unix/sysv/linux/arm/vfork.S | 1 +
9 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/ports/sysdeps/arm/arm-mcount.S b/ports/sysdeps/arm/arm-mcount.S
index 6c24271..679d042 100644
--- a/ports/sysdeps/arm/arm-mcount.S
+++ b/ports/sysdeps/arm/arm-mcount.S
@@ -24,8 +24,8 @@
#ifdef __thumb2__
.thumb
- .syntax unified
#endif
+ .syntax unified
/* Use an assembly stub with a special ABI. The calling lr has been
@@ -77,15 +77,10 @@ ENTRY(_mcount)
cfi_rel_offset (r3, 12)
cfi_rel_offset (fp, 16)
cfi_rel_offset (lr, 20)
-#ifdef __thumb2__
movs r0, fp
ittt ne
ldrne r0, [r0, #-4]
-#else
- movs fp, fp
- ldrne r0, [fp, #-4]
-#endif
- movnes r1, lr
+ movsne r1, lr
blne __mcount_internal
#ifdef __thumb2__
ldmia sp!, {r0, r1, r2, r3, fp, pc}
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 0ae3abb..c3e2b3e 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -116,6 +116,7 @@ _dl_tlsdesc_dynamic:
ldr r3, [r1]
ldr r2, [r0, r3, lsl #3]
cmn r2, #1
+ ittt ne
ldrne r3, [r1, #4]
addne r3, r2, r3
rsbne r0, r4, r3
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index 425f4ac..951642f 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -31,8 +31,9 @@ __syscall_error:
/* We translate the system's EWOULDBLOCK error into EAGAIN.
The GNU C library always defines EWOULDBLOCK==EAGAIN.
EWOULDBLOCK_sys is the original number. */
- cmp r0, $EWOULDBLOCK_sys /* Is it the old EWOULDBLOCK? */
- moveq r0, $EAGAIN /* Yes; translate it to EAGAIN. */
+ cmp r0, $EWOULDBLOCK_sys /* Is it the old EWOULDBLOCK? */
+ it eq
+ moveq r0, $EAGAIN /* Yes; translate it to EAGAIN. */
#endif
#ifndef IS_IN_rtld
diff --git a/ports/sysdeps/unix/sysv/linux/arm/clone.S b/ports/sysdeps/unix/sysv/linux/arm/clone.S
index 8807781..9de37f2 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/clone.S
@@ -33,6 +33,7 @@
ENTRY(__clone)
@ sanity check args
cmp r0, #0
+ ite ne
cmpne r1, #0
moveq r0, #-EINVAL
beq PLTJMP(syscall_error)
@@ -76,8 +77,9 @@ PSEUDO_END (__clone)
GET_TLS
mov r1, r0
tst ip, #CLONE_VM
- movne r0, #-1
ldr r7, =SYS_ify(getpid)
+ ite ne
+ movne r0, #-1
swieq 0x0
str r0, [r1, #PID_OFFSET]
str r0, [r1, #TID_OFFSET]
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap.S b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
index fa8a2b8..68560b0 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap.S
@@ -51,6 +51,7 @@ ENTRY (__mmap)
cfi_restore (r5)
cmn r0, $4096
+ it cc
RETINSTR(cc, lr)
b PLTJMP(syscall_error)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
index 2eafd1b..dcbab3a 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/mmap64.S
@@ -17,6 +17,8 @@
#include <sysdep.h>
+ .syntax unified
+
#define EINVAL 22
#ifdef __ARMEB__
@@ -42,7 +44,8 @@ ENTRY (__mmap64)
cfi_remember_state
movs r4, ip, lsl $20 @ check that offset is page-aligned
mov ip, ip, lsr $12
- moveqs r4, r5, lsr $12 @ check for overflow
+ it eq
+ movseq r4, r5, lsr $12 @ check for overflow
bne .Linval
ldr r4, [sp, $8] @ load fd
orr r5, ip, r5, lsl $20 @ compose page offset
@@ -52,6 +55,7 @@ ENTRY (__mmap64)
cfi_adjust_cfa_offset (-8)
cfi_restore (r4)
cfi_restore (r5)
+ it cc
RETINSTR(cc, lr)
b PLTJMP(syscall_error)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/syscall.S b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
index c6dd57d..665ecb4 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/syscall.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/syscall.S
@@ -42,6 +42,7 @@ ENTRY (syscall)
cfi_restore (r6)
cfi_restore (r7)
cmn r0, #4096
+ it cc
RETINSTR(cc, lr)
b PLTJMP(syscall_error)
PSEUDO_END (syscall)
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index dae9d98..c1f2c9e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -79,6 +79,7 @@
cmn r0, $4096;
#define PSEUDO_RET \
+ it cc; \
RETINSTR(cc, lr); \
b PLTJMP(SYSCALL_ERROR)
#undef ret
diff --git a/ports/sysdeps/unix/sysv/linux/arm/vfork.S b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
index 4f84c57..ae931f7 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/vfork.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/vfork.S
@@ -51,6 +51,7 @@ ENTRY (__vfork)
RESTORE_PID
#endif
cmn a1, #4096
+ it cc
RETINSTR(cc, lr)
b PLTJMP(SYSCALL_ERROR)
--
1.8.1.2
* Re: [PATCH 08/26] arm: Add IT insns for thumb mode
2013-02-27 3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
@ 2013-02-28 0:41 ` Joseph S. Myers
0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 0:41 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> These are ignored by the assembler in ARM mode, so by
> default this has no effect on generated code.
> ---
> * ports/sysdeps/arm/arm-mcount.S: Always use unified syntax and
> always add IT markup.
> * sysdeps/unix/sysv/linux/arm/mmap64.S (__mmap64): Likewise.
> * sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Add IT markup.
> * sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
> * sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
> * sysdeps/unix/sysv/linux/arm/mmap.S (__mmap): Likewise.
> * sysdeps/unix/sysv/linux/arm/syscall.S (syscall): Likewise.
> * sysdeps/unix/sysv/linux/arm/sysdep.h (PSEUDO_RET): Likewise.
> * sysdeps/unix/sysv/linux/arm/vfork.S (__vfork): Likewise.
OK.
--
Joseph S. Myers
joseph@codesourcery.com
* [PATCH 16/26] arm: Commonize BX conditionals
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (16 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 08/26] arm: Add IT insns for thumb mode Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 21:51 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 21/26] arm: Implement armv6t2 optimized strcpy Richard Henderson
` (10 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Add BLX macro in addition and use it where appropriate.
---
* sysdeps/arm/sysdep.h (BX, BXC, BLX): New macros.
(DO_RET): Use BX.
(RETINSTR): Use BXC.
* sysdeps/arm/dl-tlsdesc.S (BX): Remove.
* sysdeps/arm/dl-trampoline.S (BX): Remove.
(_dl_runtime_profile): Use BLX.
---
ports/sysdeps/arm/dl-tlsdesc.S | 6 ------
ports/sysdeps/arm/dl-trampoline.S | 9 +--------
ports/sysdeps/arm/sysdep.h | 29 +++++++++++++----------------
3 files changed, 14 insertions(+), 30 deletions(-)
diff --git a/ports/sysdeps/arm/dl-tlsdesc.S b/ports/sysdeps/arm/dl-tlsdesc.S
index 15a0c21..417b8b3 100644
--- a/ports/sysdeps/arm/dl-tlsdesc.S
+++ b/ports/sysdeps/arm/dl-tlsdesc.S
@@ -20,12 +20,6 @@
#include <tls.h>
#include "tlsdesc.h"
-#ifdef __USE_BX__
- #define BX(x) bx x
-#else
- #define BX(x) mov pc, x
-#endif
-
.text
@ emit debug information with cfi
@ use arm-specific pseudos for unwinding itself
diff --git a/ports/sysdeps/arm/dl-trampoline.S b/ports/sysdeps/arm/dl-trampoline.S
index a13a4c3..c34c61e 100644
--- a/ports/sysdeps/arm/dl-trampoline.S
+++ b/ports/sysdeps/arm/dl-trampoline.S
@@ -21,12 +21,6 @@
#include <sysdep.h>
#include <libc-symbols.h>
-#if defined(__USE_BX__)
-#define BX(x) bx x
-#else
-#define BX(x) mov pc, x
-#endif
-
.text
.globl _dl_runtime_resolve
.type _dl_runtime_resolve, #function
@@ -192,8 +186,7 @@ _dl_runtime_profile:
add ip, r7, #72
ldmia ip, {r0-r3}
ldr ip, [r7, #264]
- mov lr, pc
- BX(ip)
+ BLX(ip)
stmia r7, {r0-r3}
@ Call pltexit.
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index bfdba27..71abb7a 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -33,26 +33,23 @@
#define PLTJMP(_x) _x##(PLT)
-/* APCS-32 doesn't preserve the condition codes across function call. */
-#ifdef __APCS_32__
#ifdef __USE_BX__
-#define RETINSTR(cond, reg) \
- bx##cond reg
-#define DO_RET(_reg) \
- bx _reg
+# define BX(R) bx R
+# define BXC(C, R) bx##C R
+# ifdef __ARM_ARCH_4T__
+# define BLX(R) mov lr, pc; bx R
+# else
+# define BLX(R) blx R
+# endif
#else
-#define RETINSTR(cond, reg) \
- mov##cond pc, reg
-#define DO_RET(_reg) \
- mov pc, _reg
-#endif
-#else /* APCS-26 */
-#define RETINSTR(cond, reg) \
- mov##cond##s pc, reg
-#define DO_RET(_reg) \
- movs pc, _reg
+# define BX(R) mov pc, R
+# define BXC(C, R) mov##C pc, R
+# define BLX(R) mov lr, pc; mov pc, R
#endif
+#define DO_RET(R) BX(R)
+#define RETINSTR(C, R) BXC(C, R)
+
/* Define an entry point visible from C. */
#define ENTRY(name) \
.globl C_SYMBOL_NAME(name); \
--
1.8.1.2
* [PATCH 21/26] arm: Implement armv6t2 optimized strcpy
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (17 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 16/26] arm: Commonize BX conditionals Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
` (9 subsequent siblings)
28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Four times faster than the byte-by-byte default version.
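The word-at-a-time loops in strcpy.S rely on the UQADD8 trick described in the file's comments: adding 0xfe to each byte with unsigned saturation gives 0xfe for a zero byte and 0xff for anything else, so the complement is nonzero exactly when the word contains EOS. A small C model of that test, as a sketch only (uqadd8_model and has_zero_byte are illustrative names, not glibc functions):

```c
#include <stdint.h>

/* Per-byte unsigned saturating add of two 32-bit words, modelling
   the UQADD8 instruction.  */
static uint32_t
uqadd8_model (uint32_t a, uint32_t b)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint32_t s = ((a >> (8 * i)) & 0xff) + ((b >> (8 * i)) & 0xff);
      if (s > 0xff)
	s = 0xff;		/* saturate instead of wrapping */
      r |= s << (8 * i);
    }
  return r;
}

/* Nonzero iff some byte of WORD is zero.  Each zero byte yields 0x01
   in the same byte position of the result; other bytes yield 0x00.
   This corresponds to the uqadd8/mvns pair in the assembly.  */
static uint32_t
has_zero_byte (uint32_t word)
{
  return ~uqadd8_model (word, 0xfefefefeu);
}
```

For example, has_zero_byte (0x64636261) is 0, while has_zero_byte (0x00636261) is 0x01000000.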
---
* sysdeps/arm/armv6t2/strcpy.S: New file.
* sysdeps/arm/armv6t2/stpcpy.S: New file.
---
ports/sysdeps/arm/armv6t2/stpcpy.S | 1 +
ports/sysdeps/arm/armv6t2/strcpy.S | 213 +++++++++++++++++++++++++++++++++++++
2 files changed, 214 insertions(+)
create mode 100644 ports/sysdeps/arm/armv6t2/stpcpy.S
create mode 100644 ports/sysdeps/arm/armv6t2/strcpy.S
diff --git a/ports/sysdeps/arm/armv6t2/stpcpy.S b/ports/sysdeps/arm/armv6t2/stpcpy.S
new file mode 100644
index 0000000..21a4f38
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/stpcpy.S
@@ -0,0 +1 @@
+/* Defined in strcpy.S. */
diff --git a/ports/sysdeps/arm/armv6t2/strcpy.S b/ports/sysdeps/arm/armv6t2/strcpy.S
new file mode 100644
index 0000000..a0e3bcc
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strcpy.S
@@ -0,0 +1,213 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+/* Endian independent macros for shifting bytes within registers. */
+#ifdef __ARMEB__
+#define lsh_gt lsr
+#define lsh_ls lsl
+#else
+#define lsh_gt lsl
+#define lsh_ls lsr
+#endif
+
+#ifndef USE_AS_STPCPY
+# define STRCPY strcpy
+#endif
+
+ .syntax unified
+ .text
+
+ENTRY(__stpcpy)
+ @ Signal stpcpy with NULL in IP.
+ s(mov) ip, #0
+ b 0f
+END(__stpcpy)
+
+weak_alias (__stpcpy, stpcpy)
+libc_hidden_def (__stpcpy)
+libc_hidden_builtin_def (stpcpy)
+
+ENTRY(strcpy)
+ @ Signal strcpy with DEST in IP.
+ mov ip, r0
+0:
+ @ To cater to long strings, we want 8 byte alignment in the source.
+ @ To cater to small strings, we don't want to start that right away.
+ @ Loop up to 16 times, less whatever it takes to reach alignment.
+ and r3, r1, #7
+ rsb r3, r3, #16
+
+ @ Loop until we find ...
+1: ldrb r2, [r1], #1
+ subs r3, r3, #1 @ ... the alignment point
+ strb r2, [r0], #1
+ it ne
+ cmpne r2, #0 @ ... or EOS
+ bne 1b
+
+ @ Disambiguate the exit possibilities above
+ cmp r2, #0 @ Found EOS
+ beq .Lreturn
+
+ @ Load the next two words asap
+ ldrd r2, r3, [r1], #8
+
+ @ For longer strings, we actually need a stack frame.
+ push { r4, r5, r6, r7 }
+ cfi_adjust_cfa_offset (16)
+ cfi_rel_offset (r4, 0)
+ cfi_rel_offset (r5, 4)
+ cfi_rel_offset (r6, 8)
+ cfi_rel_offset (r7, 12)
+
+ @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+ @ that was originally zero and 0xff otherwise. Therefore we consider
+ @ the lsb of each byte the "found" bit, with 0 for a match.
+ movw r7, #0xfefe
+ tst r0, #3 @ Test alignment of DEST
+ movt r7, #0xfefe
+ bne .Lunaligned
+
+ @ So now source (r1) is aligned to 8, and dest (r0) is aligned to 4.
+ @ Loop, reading 8 bytes at a time, searching for EOS.
+ .balign 16
+2: uqadd8 r4, r2, r7 @ Find EOS
+ uqadd8 r5, r3, r7
+ pld [r1, #256]
+ mvns r4, r4 @ EOS in first word?
+
+ pld [r0, #256]
+ bne 3f
+ str r2, [r0], #4
+ mvns r5, r5 @ EOS in second word?
+
+ bne 4f
+ str r3, [r0], #4
+ ldrd r2, r3, [r1], #8
+ b 2b
+
+3: s(sub) r1, r1, #4 @ backup to first word
+4: s(sub) r1, r1, #4 @ backup to second word
+
+ @ ... then finish up any tail a byte at a time.
+ @ Note that we generally back up and re-read source bytes,
+ @ but we'll not re-write dest bytes.
+.Lbyte_loop:
+ ldrb r2, [r1], #1
+ cmp r2, #0
+ strb r2, [r0], #1
+ bne .Lbyte_loop
+
+ pop { r4, r5, r6, r7 }
+ cfi_remember_state
+ cfi_adjust_cfa_offset (-16)
+ cfi_restore (r4)
+ cfi_restore (r5)
+ cfi_restore (r6)
+ cfi_restore (r7)
+
+.Lreturn:
+ cmp ip, #0 @ Was this stpcpy or strcpy?
+ ite eq
+ subeq r0, r0, #1 @ stpcpy: undo post-inc from store
+ movne r0, ip @ strcpy: return original dest
+ bx lr
+
+.Lunaligned:
+ cfi_restore_state
+ @ Here, source is aligned to 8, but the destination is not word
+ @ aligned. Therefore we have to shift the data in order to be
+ @ able to perform aligned word stores.
+
+ @ Find out which misalignment we're dealing with.
+ tst r0, #1
+ beq .Lunaligned2
+ tst r0, #2
+ bne .Lunaligned3
+ @ Fallthru to .Lunaligned1.
+
+.macro unaligned_copy unalign
+ @ Prologue to unaligned loop. Seed shifted non-zero bytes.
+ uqadd8 r4, r2, r7 @ Find EOS
+ uqadd8 r5, r3, r7
+ mvns r4, r4 @ EOS in first word?
+ it ne
+ subne r1, r1, #8
+ bne .Lbyte_loop
+#ifdef __ARMEB__
+ rev r2, r2 @ Byte stores below need LE data
+#endif
+ @ Store a few bytes from the first word.
+ @ At the same time we align r0 and shift out bytes from r2.
+.rept 4-\unalign
+ strb r2, [r0], #1
+ s(lsr) r2, r2, #8
+.endr
+#ifdef __ARMEB__
+ rev r2, r2 @ Undo previous rev
+#endif
+ @ Rotated unaligned copy loop. The tail of the prologue is
+ @ shared with the loop itself.
+ .balign 8
+1: mvns r5, r5 @ EOS in second word?
+ bne 4f
+ @ Combine first and second words
+ orr r2, r2, r3, lsh_gt #(\unalign*8)
+ @ Save leftover bytes from the two words
+ lsh_ls r6, r3, #((4-\unalign)*8)
+ str r2, [r0], #4
+ @ The "real" start of the unaligned copy loop.
+ ldrd r2, r3, [r1], #8 @ Load 8 more bytes
+ uqadd8 r4, r2, r7 @ Find EOS
+ pld [r1, #256]
+ uqadd8 r5, r3, r7
+ pld [r0, #256]
+ mvns r4, r4 @ EOS in first word?
+ bne 3f
+ @ Combine the leftover and the first word
+ orr r6, r6, r2, lsh_gt #(\unalign*8)
+ @ Discard used bytes from the first word.
+ lsh_ls r2, r2, #((4-\unalign)*8)
+ str r6, [r0], #4
+ b 1b
+ @ Found EOS in one of the words; adjust backward
+3: s(sub) r1, r1, #4
+ mov r2, r6
+4: s(sub) r1, r1, #4
+ @ And store the remaining bytes from the leftover
+#ifdef __ARMEB__
+ rev r2, r2
+#endif
+.rept \unalign
+ strb r2, [r0], #1
+ s(lsr) r2, r2, #8
+.endr
+ b .Lbyte_loop
+.endm
+
+.Lunaligned1:
+ unaligned_copy 1
+.Lunaligned2:
+ unaligned_copy 2
+.Lunaligned3:
+ unaligned_copy 3
+
+END(STRCPY)
+
+libc_hidden_builtin_def (strcpy)
--
1.8.1.2
* [PATCH 24/26] arm: Add optimized addmul_1
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (18 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 21/26] arm: Implement armv6t2 optimized strcpy Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 13:58 ` Måns Rullgård
2013-02-27 3:17 ` [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr Richard Henderson
` (8 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
LGPL 3, but tested with the GMP testsuite.
This is 25% faster than the generic code as measured on Cortex-A15,
and the same speed as GMP on the same core. It's probably slower
than GMP on the A8 and A9 cores though.
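For anyone checking the assembly against the intended semantics, here is a plain C model of mpn_addmul_1 for 32-bit limbs. It is a sketch for comparison only, not the glibc generic code, and addmul_1_model is a made-up name. It multiplies each source limb by s2_limb, adds the product and the running carry into the result limb, and returns the final carry limb.

```c
#include <stddef.h>
#include <stdint.h>

/* res[i] += src[i] * limb for 0 <= i < n; returns the carry-out limb.
   The 64-bit intermediate cannot overflow:
   (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1.  */
static uint32_t
addmul_1_model (uint32_t *res, const uint32_t *src, size_t n,
		uint32_t limb)
{
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t t = (uint64_t) src[i] * limb + res[i] + carry;
      res[i] = (uint32_t) t;		/* low half is stored back */
      carry = (uint32_t) (t >> 32);	/* high half carries to the next limb */
    }
  return carry;
}
```

The assembly splits the same sum differently: umlal folds the result limb into the product, and the carry limb from the previous iteration is added with adds/adc.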
---
* sysdeps/arm/addmul_1.S: New file.
---
ports/sysdeps/arm/addmul_1.S | 60 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
create mode 100644 ports/sysdeps/arm/addmul_1.S
diff --git a/ports/sysdeps/arm/addmul_1.S b/ports/sysdeps/arm/addmul_1.S
new file mode 100644
index 0000000..ecb8983
--- /dev/null
+++ b/ports/sysdeps/arm/addmul_1.S
@@ -0,0 +1,60 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+@ cycles/limb
+@ StrongArm ?
+@ Cortex-A8 ?
+@ Cortex-A9 ?
+@ Cortex-A15 4
+
+/* mp_limb_t mpn_addmul_1(res_ptr, src1_ptr, size, s2_limb) */
+
+ENTRY(__mpn_addmul_1)
+ push { r4, r5, r6 }
+ cfi_adjust_cfa_offset (12)
+ cfi_rel_offset (r4, 0)
+ cfi_rel_offset (r5, 4)
+ cfi_rel_offset (r6, 8)
+
+ ldr r6, [r1], #4
+ ldr r5, [r0]
+ mov r4, #0 /* init carry in */
+ b 1f
+0:
+ ldr r6, [r1], #4 /* load next ul */
+ adds r4, r4, r5 /* (out, c) = cl + lpl */
+ ldr r5, [r0, #4] /* load next rl */
+ str r4, [r0], #4
+ adc r4, ip, #0 /* cl = hpl + c */
+1:
+ mov ip, #0 /* zero-extend rl */
+ umlal r5, ip, r6, r3 /* (hpl, lpl) = ul * vl + rl */
+ subs r2, r2, #1
+ bne 0b
+
+ adds r4, r4, r5 /* (out, c) = cl + lpl */
+ str r4, [r0]
+ adc r0, ip, #0 /* return hpl + c */
+
+ pop { r4, r5, r6 }
+ DO_RET(lr)
+END(__mpn_addmul_1)
--
1.8.1.2
* Re: [PATCH 24/26] arm: Add optimized addmul_1
2013-02-27 3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
@ 2013-02-28 13:58 ` Måns Rullgård
2013-02-28 18:19 ` Richard Henderson
0 siblings, 1 reply; 63+ messages in thread
From: Måns Rullgård @ 2013-02-28 13:58 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports, Joseph Myers
Richard Henderson <rth@twiddle.net> writes:
> +ENTRY(__mpn_addmul_1)
> + push { r4, r5, r6 }
> + cfi_adjust_cfa_offset (12)
> + cfi_rel_offset (r4, 0)
> + cfi_rel_offset (r5, 4)
> + cfi_rel_offset (r6, 8)
> +
> + ldr r6, [r1], #4
> + ldr r5, [r0]
> + mov r4, #0 /* init carry in */
> + b 1f
> +0:
> + ldr r6, [r1], #4 /* load next ul */
> + adds r4, r4, r5 /* (out, c) = cl + lpl */
> + ldr r5, [r0, #4] /* load next rl */
> + str r4, [r0], #4
> + adc r4, ip, #0 /* cl = hpl + c */
You might gain a cycle here on some cores by replacing r4 by something
else in the adds/str sequence and reversing the order of the last two
insns to better exploit dual-issue. On most semi-modern cores you can
get another register for free by pushing one more to the stack
(load/store multiple instructions transfer registers pairwise).
I'd expect this to benefit the A8 and maybe A9. On A15 it should make
no difference.
> +1:
> + mov ip, #0 /* zero-extend rl */
> + umlal r5, ip, r6, r3 /* (hpl, lpl) = ul * vl + rl */
> + subs r2, r2, #1
> + bne 0b
> +
> + adds r4, r4, r5 /* (out, c) = cl + lpl */
> + str r4, [r0]
> + adc r0, ip, #0 /* return hpl + c */
> +
> + pop { r4, r5, r6 }
> + DO_RET(lr)
> +END(__mpn_addmul_1)
--
Måns Rullgård
mans@mansr.com
* Re: [PATCH 24/26] arm: Add optimized addmul_1
2013-02-28 13:58 ` Måns Rullgård
@ 2013-02-28 18:19 ` Richard Henderson
2013-02-28 19:37 ` Måns Rullgård
0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-28 18:19 UTC (permalink / raw)
To: Måns Rullgård; +Cc: libc-ports, Joseph Myers
On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>> > +0:
>> > + ldr r6, [r1], #4 /* load next ul */
>> > + adds r4, r4, r5 /* (out, c) = cl + lpl */
>> > + ldr r5, [r0, #4] /* load next rl */
>> > + str r4, [r0], #4
>> > + adc r4, ip, #0 /* cl = hpl + c */
> You might gain a cycle here on some cores by replacing r4 by something
> else in the adds/str sequence and reversing the order of the last two
> insns to better exploit dual-issue. On most semi-modern cores you can
> get another register for free by pushing one more to the stack
> (load/store multiple instructions transfer registers pairwise).
>
> I'd expect this to benefit the A8 and maybe A9. On A15 it should make
> no difference.
>
To swap the adc and str, I'd have to add another move insn too. I guess the
intent is that would dual-issue with the store, giving us 6 insns in 3 cycles
as opposed to 5 insns in 4 cycles?
Fair enough.
I'm not willing to work *too* hard on this. If someone cares about the last
cycle of performance on A[89], they should work on getting the real libgmp
routines re-licensed for glibc. I'm not willing to do politics.
r~
* Re: [PATCH 24/26] arm: Add optimized addmul_1
2013-02-28 18:19 ` Richard Henderson
@ 2013-02-28 19:37 ` Måns Rullgård
0 siblings, 0 replies; 63+ messages in thread
From: Måns Rullgård @ 2013-02-28 19:37 UTC (permalink / raw)
To: Richard Henderson; +Cc: Måns Rullgård, libc-ports, Joseph Myers
Richard Henderson <rth@twiddle.net> writes:
> On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>>> > +0:
>>> > + ldr r6, [r1], #4 /* load next ul */
>>> > + adds r4, r4, r5 /* (out, c) = cl + lpl */
>>> > + ldr r5, [r0, #4] /* load next rl */
>>> > + str r4, [r0], #4
>>> > + adc r4, ip, #0 /* cl = hpl + c */
>> You might gain a cycle here on some cores by replacing r4 by something
>> else in the adds/str sequence and reversing the order of the last two
>> insns to better exploit dual-issue. On most semi-modern cores you can
>> get another register for free by pushing one more to the stack
>> (load/store multiple instructions transfer registers pairwise).
>>
>> I'd expect this to benefit the A8 and maybe A9. On A15 it should make
>> no difference.
>>
>
> To swap the adc and str, I'd have to add another move insn too. I guess the
> intent is that would dual-issue with the store, giving us 6 insns in 3 cycles
> as opposed to 5 insns in 4 cycles?
I meant like this:
ldr r6, [r1], #4 /* load next ul */
adds r7, r4, r5 /* (out, c) = cl + lpl */
ldr r5, [r0, #4] /* load next rl */
adc r4, ip, #0 /* cl = hpl + c */
str r7, [r0], #4
It seems to me this leaves everything with the same values as your
version. r7 can be pushed/popped for free since you're currently
preserving an odd number of registers.
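For readers following the register comments (ul, vl, rl, cl, hpl, lpl), here is a minimal C sketch of what the loop computes, assuming 32-bit limbs. The name ref_addmul_1 is hypothetical and is not part of the patch; it models only the arithmetic, not the instruction scheduling under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not glibc code): rp[i] += up[i] * vl
   over n 32-bit limbs, returning the final carry limb.  */
static uint32_t
ref_addmul_1 (uint32_t *rp, const uint32_t *up, size_t n, uint32_t vl)
{
  uint32_t cl = 0;              /* carry limb, r4 in the patch */

  assert (n > 0);               /* the assembly loop body runs at least once */
  for (size_t i = 0; i < n; i++)
    {
      /* umlal: (hpl, lpl) = ul * vl + rl.  This cannot overflow 64 bits.  */
      uint64_t p = (uint64_t) up[i] * vl + rp[i];
      uint32_t lpl = (uint32_t) p;
      uint32_t hpl = (uint32_t) (p >> 32);

      /* adds/str/adc: (out, c) = cl + lpl; cl = hpl + c.  */
      uint32_t out = cl + lpl;
      rp[i] = out;
      cl = hpl + (out < lpl);   /* out < lpl exactly when the add wrapped */
    }
  return cl;
}
```

Either ordering of the str and adc computes the same values; only the register holding the stored limb differs.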
> Fair enough.
>
> I'm not willing to work *too* hard on this. If someone cares about the last
> cycle of performance on A[89], they should work on getting the real libgmp
> routines re-licensed for glibc. I'm not willing to do politics.
Nor am I.
--
Måns Rullgård
mans@mansr.com
* [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (19 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 24/26] arm: Add optimized addmul_1 Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 1:31 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 25/26] arm: Add optimized submul_1 Richard Henderson
` (7 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Not specifically speed tested against the byte-by-byte versions,
but expected to be about as fast as the new strlen.
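The word loops in all three files rest on the same trick: a saturating byte-wise add of 0xfe leaves 0xfe only in bytes that were zero, so bit 0 of each byte acts as an inverted "found" flag. A small C sketch of that idea, under the assumption of a little-endian word; the helper names uqadd8, found_bits and first_match_le are hypothetical models, not code from the patch.

```c
#include <stdint.h>

/* Model of the ARM UQADD8 instruction: four independent unsigned
   saturating byte additions.  */
static uint32_t
uqadd8 (uint32_t a, uint32_t b)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i += 8)
    {
      uint32_t s = ((a >> i) & 0xff) + ((b >> i) & 0xff);
      r |= (s > 0xff ? 0xffu : s) << i;
    }
  return r;
}

/* Hypothetical helper: returns 0x01 in every byte of WORD that equals C
   and 0x00 elsewhere.  Pass C == 0 to look for the terminator.  */
static uint32_t
found_bits (uint32_t word, unsigned char c)
{
  uint32_t rep = c * 0x01010101u;               /* replicate C to all bytes */
  uint32_t t = uqadd8 (word ^ rep, 0xfefefefeu); /* 0xfe if match, else 0xff */
  return ~t;                                     /* invert the found sense */
}

/* Hypothetical helper: byte index of the first match in memory order on a
   little-endian machine, or -1.  Models rbit + clz + lsr #3.  */
static int
first_match_le (uint32_t found)
{
  for (int i = 0; i < 4; i++)
    if ((found >> (8 * i)) & 1)
      return i;
  return -1;
}
```

strchr and strrchr AND the EOS and C results together before inverting, so one test covers both exit conditions.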
---
* sysdeps/arm/armv6t2/strchr.S: New file.
* sysdeps/arm/armv6t2/strrchr.S: New file.
* sysdeps/arm/armv6t2/rawmemchr.S: New file.
---
ports/sysdeps/arm/armv6t2/rawmemchr.S | 81 ++++++++++++++++++++
ports/sysdeps/arm/armv6t2/strchr.S | 138 ++++++++++++++++++++++++++++++++++
ports/sysdeps/arm/armv6t2/strrchr.S | 137 +++++++++++++++++++++++++++++++++
3 files changed, 356 insertions(+)
create mode 100644 ports/sysdeps/arm/armv6t2/rawmemchr.S
create mode 100644 ports/sysdeps/arm/armv6t2/strchr.S
create mode 100644 ports/sysdeps/arm/armv6t2/strrchr.S
diff --git a/ports/sysdeps/arm/armv6t2/rawmemchr.S b/ports/sysdeps/arm/armv6t2/rawmemchr.S
new file mode 100644
index 0000000..eea7707
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/rawmemchr.S
@@ -0,0 +1,81 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+ENTRY(__rawmemchr)
+ @ r0 = start of string
+ @ r1 = character to match
+ @ returns a pointer to the match, which must be present.
+ uxtb r1, r1
+
+ @ Loop until we find ...
+1: ldrb r2, [r0], #1
+ cmp r2, r1 @ ... the character
+ it ne
+ tstne r0, #7 @ ... the alignment point
+ bne 1b
+
+ @ Disambiguate the exit possibilities above
+ cmp r2, r1 @ Found the character
+ itt eq
+ subeq r0, r0, #1
+ bxeq lr
+
+ @ So now we're aligned.
+ orr r1, r1, r1, lsl #8 @ Replicate C to all bytes
+ movw ip, #0xfefe
+ orr r1, r1, r1, lsl #16
+ movt ip, #0xfefe
+
+ @ Loop searching for C, 8 bytes at a time.
+ @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+ @ that was originally zero and 0xff otherwise. Therefore we consider
+ @ the lsb of each byte the "found" bit, with 0 for a match.
+2: ldrd r2, r3, [r0], #8
+ s(eor) r2, r2, r1 @ Convert C bytes to 0
+ s(eor) r3, r3, r1
+ uqadd8 r2, r2, ip @ Find C
+ uqadd8 r3, r3, ip
+ s(and) r3, r3, r2 @ Combine the two words
+ mvns r3, r3 @ Test for any found bit true
+ beq 2b
+
+ @ Found something. Disambiguate between first and second words.
+ @ Adjust r0 to point to the word containing the match.
+ @ Adjust r2 to the found bits for the word containing the match.
+ mvns r2, r2
+ itee ne
+ subne r0, r0, #8
+ subeq r0, r0, #4
+ moveq r2, r3
+
+ @ Find the bit-offset of the match within the word.
+#ifdef __ARMEL__
+ rbit r2, r2 @ For LE we need count-trailing-zeros
+#endif
+ clz r2, r2
+ add r0, r0, r2, lsr #3 @ Adjust the pointer to the found byte
+ bx lr
+
+END(__rawmemchr)
+
+weak_alias (__rawmemchr, rawmemchr)
+libc_hidden_def (__rawmemchr)
diff --git a/ports/sysdeps/arm/armv6t2/strchr.S b/ports/sysdeps/arm/armv6t2/strchr.S
new file mode 100644
index 0000000..e7f5acf
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strchr.S
@@ -0,0 +1,138 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+ENTRY(strchr)
+ @ r0 = start of string
+ @ r1 = character to match
+ @ returns NULL for no match, or a pointer to the match
+
+ @ To cater to long strings, we want to search through a few
+ @ characters until we reach an aligned pointer. To cater to
+ @ small strings, we don't want to start doing word operations
+ @ immediately. The compromise is a maximum of 32 bytes less
+ @ whatever is required to end with an aligned pointer.
+ @ r3 = number of characters to search in alignment loop
+ and r3, r0, #7
+ uxtb r1, r1
+ rsb r3, r3, #32
+
+ @ Loop until we find ...
+1: ldrb r2, [r0], #1
+ subs r3, r3, #1 @ ... the alignment point
+ it ne
+ cmpne r2, r1 @ ... or the character
+ it ne
+ cmpne r2, #0 @ ... or EOS
+ bne 1b
+
+ @ Disambiguate the exit possibilities above
+ cmp r2, r1 @ Found the character
+ itt eq
+ subeq r0, r0, #1
+ bxeq lr
+
+ cmp r2, #0 @ Found EOS
+ itt eq
+ moveq r0, #0
+ bxeq lr
+
+ @ So now we're aligned. Now we actually need a stack frame.
+ push { r4, r5, r6, r7 }
+ cfi_adjust_cfa_offset (16)
+ cfi_rel_offset (r4, 0)
+ cfi_rel_offset (r5, 4)
+ cfi_rel_offset (r6, 8)
+ cfi_rel_offset (r7, 12)
+
+ orr r1, r1, r1, lsl #8 @ Replicate C to all bytes
+ movw ip, #0xfefe
+ orr r1, r1, r1, lsl #16
+ movt ip, #0xfefe
+
+ @ Loop searching for EOS or C, 8 bytes at a time.
+2: ldrd r2, r3, [r0], #8
+ @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+ @ that was originally zero and 0xff otherwise. Therefore we consider
+ @ the lsb of each byte the "found" bit, with 0 for a match.
+ uqadd8 r4, r2, ip @ Find EOS
+ uqadd8 r5, r3, ip
+ eor r6, r2, r1 @ Convert C bytes to 0
+ eor r7, r3, r1
+ uqadd8 r6, r6, ip @ Find C
+ uqadd8 r7, r7, ip
+ s(and) r4, r4, r6 @ Combine found for EOS and C
+ s(and) r5, r5, r7
+ and r6, r4, r5 @ Combine the two words
+ mvns r6, r6 @ Test for any found bit true
+ beq 2b
+
+ @ Invert the sense of the found bits. After this we have 1 in
+ @ any byte that contains a match, and 0 otherwise.
+ s(mvn) r5, r5
+ mvns r4, r4
+
+ @ Found something. Disambiguate between first and second words.
+ @ Adjust r0 to point to the word containing the match.
+ @ Adjust r2 to the contents of the word containing the match.
+ @ Adjust r4 to the found bits for the word containing the match.
+ iteee ne
+ subne r0, r0, #8
+ subeq r0, r0, #4
+ moveq r4, r5
+ moveq r2, r3
+
+ @ Find the bit-offset of the match within the word.
+#ifdef __ARMEL__
+ @ For little-endian, we only need to reverse the bits so that
+ @ count-leading-zeros becomes in effect count-trailing-zeros.
+ rbit r4, r4
+ clz r3, r4
+#else
+ @ For big-endian, we're matching 0x01 (not 0x80), and so the
+ @ bit offset is 7 too high. Also, we byte-swap the word so
+ @ that we can shift down to extract the found byte.
+ clz r3, r4
+ rev r2, r2
+ s(sub) r3, r3, #7
+#endif
+ s(lsr) r2, r2, r3 @ Shift down found byte
+ add r0, r0, r3, lsr #3 @ Adjust the pointer to the found byte
+ uxtb r2, r2 @ Extract found byte
+ uxtb r1, r1 @ Undo replication of C
+
+ pop { r4, r5, r6, r7 }
+ cfi_adjust_cfa_offset (-16)
+ cfi_restore (r4)
+ cfi_restore (r5)
+ cfi_restore (r6)
+ cfi_restore (r7)
+
+ @ Disambiguate between EOS and C.
+ cmp r2, r1
+ it ne
+ movne r0, #0 @ Found EOS, return NULL
+ bx lr
+
+END(strchr)
+
+weak_alias (strchr, index)
+libc_hidden_builtin_def (strchr)
diff --git a/ports/sysdeps/arm/armv6t2/strrchr.S b/ports/sysdeps/arm/armv6t2/strrchr.S
new file mode 100644
index 0000000..483e52a
--- /dev/null
+++ b/ports/sysdeps/arm/armv6t2/strrchr.S
@@ -0,0 +1,137 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+ENTRY(strrchr)
+ @ r0 = start of string
+ @ r1 = character to match
+ @ returns NULL for no match, or a pointer to the match
+
+ mov r3, r0
+ s(mov) r0, #0
+ uxtb r1, r1
+
+ @ Loop a few times until we're aligned.
+ tst r3, #7
+ beq 2f
+1: ldrb r2, [r3], #1
+ cmp r2, r1 @ Find the character
+ it eq
+ subeq r0, r3, #1
+ cmp r2, #0 @ Find EOS
+ it eq
+ bxeq lr
+ tst r3, #7 @ Find the alignment point
+ bne 1b
+
+ @ So now we're aligned. Now we actually need a stack frame.
+2: push { r4, r5, r6, r7 }
+ cfi_adjust_cfa_offset (16)
+ cfi_rel_offset (r4, 0)
+ cfi_rel_offset (r5, 4)
+ cfi_rel_offset (r6, 8)
+ cfi_rel_offset (r7, 12)
+
+ orr r1, r1, r1, lsl #8 @ Replicate C to all bytes
+ movw ip, #0xfefe
+ orr r1, r1, r1, lsl #16
+ movt ip, #0xfefe
+ s(mov) r2, #0 @ No found bits yet
+
+ @ Loop searching for EOS and C, 8 bytes at a time.
+ @ Any time we find a match in a word, we copy the address of
+ @ the word to r0, and the found bits to r2.
+3: ldrd r4, r5, [r3], #8
+ @ Adding (unsigned saturating) 0xfe means result of 0xfe for any byte
+ @ that was originally zero and 0xff otherwise. Therefore we consider
+ @ the lsb of each byte the "found" bit, with 0 for a match.
+ uqadd8 r6, r4, ip @ Find EOS
+ uqadd8 r7, r5, ip
+ s(eor) r4, r4, r1 @ Convert C bytes to 0
+ s(eor) r5, r5, r1
+ uqadd8 r4, r4, ip @ Find C
+ uqadd8 r5, r5, ip
+ mvns r6, r6 @ Found EOS, first word
+ bne 4f
+ mvns r4, r4 @ Handle C, first word
+ itt ne
+ subne r0, r3, #8
+ movne r2, r4
+ mvns r7, r7 @ Found EOS, second word
+ bne 5f
+ mvns r5, r5 @ Handle C, second word
+ itt ne
+ subne r0, r3, #4
+ movne r2, r5
+ b 3b
+
+ @ Found EOS in second word; fold to first word.
+5: s(add) r3, r3, #4 @ Dec pointer to 2nd word, with below
+ mov r4, r5 @ Overwrite first word C found
+ mov r6, r7 @ Overwrite first word EOS found
+
+ @ Found EOS. Zap found C after EOS.
+4: s(sub) r3, r3, #8 @ Decrement pointer to first word
+ s(mvn) r4, r4 @ Positive found bit for C
+#ifdef __ARMEL__
+ sub r7, r6, #1 @ Toggle EOS lsb and below
+ s(eor) r6, r6, r7 @ All bits below and including lsb
+ ands r4, r4, r6 @ Zap C above EOS
+#else
+ clz r6, r6 @ Find highest EOS bit set.
+ s(mvn) r7, #0
+ s(add) r6, r6, #1
+ s(lsr) r7, r7, r6 @ All bits below msb
+ bics r4, r4, r7 @ Zap C below EOS
+#endif
+ itt ne
+ movne r2, r4 @ Copy to result, if still non-zero
+ movne r0, r3
+
+ pop { r4, r5, r6, r7 }
+ cfi_adjust_cfa_offset (-16)
+ cfi_restore (r4)
+ cfi_restore (r5)
+ cfi_restore (r6)
+ cfi_restore (r7)
+
+ @ Adjust the result pointer if we found a word containing C.
+ @ Rather than fight with thumb IT insn about how many insns
+ @ we'd like to conditionally execute, just jump over them here.
+#ifdef __thumb2__
+#define ne(insn) insn
+ cbz r2, 6f @ Did we find any C?
+#else
+#define ne(insn) insn##ne
+ cmp r2, #0
+#endif
+#ifdef __ARMEB__
+ ne(rbit) r2, r2 @ BE needs count-trailing-zeros
+#endif
+ ne(clz) r2, r2 @ Find the bit offset of the last C
+ ne(rsb) r2, r2, #32 @ Convert to a count from the right
+ ne(add) r0, r0, r2, lsr #3 @ Convert to byte offset and add.
+6: bx lr
+
+END(strrchr)
+
+weak_alias (strrchr, rindex)
+libc_hidden_builtin_def (strrchr)
--
1.8.1.2
* [PATCH 25/26] arm: Add optimized submul_1
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (20 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 22/26] arm: Implement armv6t2 optimized strchr, strrchr, rawmemchr Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
` (6 subsequent siblings)
28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
LGPL 3, but tested with the GMP testsuite.
This is 50% faster than the generic code as measured on Cortex-A15.
It is 25% slower than the current GMP routine on the same core.
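As a reading aid for the loop below, a minimal C sketch of the arithmetic, assuming 32-bit limbs. The name ref_submul_1 is hypothetical; this models the result and the returned borrow limb only, and says nothing about cycle counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not glibc code): rp[i] -= up[i] * vl
   over n 32-bit limbs, returning the final borrow limb.  */
static uint32_t
ref_submul_1 (uint32_t *rp, const uint32_t *up, size_t n, uint32_t vl)
{
  uint32_t cl = 0;              /* carry limb, r4 in the patch */

  assert (n > 0);               /* the assembly loop body runs at least once */
  for (size_t i = 0; i < n; i++)
    {
      /* umull, then adds/adc: (hpl, lpl) = ul * vl + cl.  */
      uint64_t p = (uint64_t) up[i] * vl + cl;
      uint32_t lpl = (uint32_t) p;
      uint32_t hpl = (uint32_t) (p >> 32);

      /* subs/addcc: rl - lpl, and bump the carry limb on borrow.  */
      uint32_t rl = rp[i];
      rp[i] = rl - lpl;
      cl = hpl + (rl < lpl);
    }
  return cl;
}
```

On ARM the C flag is clear after a borrowing subs, which is why the patch adds 1 under the cc condition.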
---
* sysdeps/arm/submul_1.S: New file.
---
ports/sysdeps/arm/submul_1.S | 67 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 67 insertions(+)
create mode 100644 ports/sysdeps/arm/submul_1.S
diff --git a/ports/sysdeps/arm/submul_1.S b/ports/sysdeps/arm/submul_1.S
new file mode 100644
index 0000000..35e1348
--- /dev/null
+++ b/ports/sysdeps/arm/submul_1.S
@@ -0,0 +1,67 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+@ cycles/limb
+@ StrongArm ?
+@ Cortex-A8 ?
+@ Cortex-A9 ?
+@ Cortex-A15 4
+
+/* mp_limb_t mpn_submul_1(res_ptr, src1_ptr, size, s2_limb) */
+
+ENTRY(__mpn_submul_1)
+ push { r4, r5, r6, r7 }
+ cfi_adjust_cfa_offset (16)
+ cfi_rel_offset (r4, 0)
+ cfi_rel_offset (r5, 4)
+ cfi_rel_offset (r6, 8)
+ cfi_rel_offset (r7, 12)
+
+ ldr r6, [r1], #4
+ ldr r7, [r0]
+ mov r4, #0 /* init carry in */
+ b 1f
+0:
+ ldr r6, [r1], #4 /* load next ul */
+ adds r5, r5, r4 /* (lpl, c) = lpl + cl */
+ adc r4, ip, #0 /* cl = hpl + c */
+ subs r5, r7, r5 /* (lpl, !c) = rl - lpl */
+ ldr r7, [r0, #4] /* load next rl */
+ it cc
+ addcc r4, r4, #1 /* cl += !c */
+ str r5, [r0], #4
+1:
+ umull r5, ip, r6, r3 /* (hpl, lpl) = ul * vl */
+ subs r2, r2, #1
+ bne 0b
+
+ adds r5, r5, r4 /* (lpl, c) = lpl + cl */
+ adc r4, ip, #0 /* cl = hpl + c */
+ subs r5, r7, r5 /* (lpl, !c) = rl - lpl */
+ str r5, [r0], #4
+ ite cc
+ addcc r0, r4, #1 /* cl += !c */
+ movcs r0, r4 /* return carry */
+
+ pop { r4, r5, r6, r7 }
+ DO_RET(lr)
+END(__mpn_submul_1)
--
1.8.1.2
* [PATCH 06/26] arm: Use pc_ofs
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (21 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 25/26] arm: Add optimized submul_1 Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 0:21 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
` (5 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Replace the raw "-8" adjustments that compensate for the offset seen
when reading the pc with pc_ofs.
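The arithmetic behind these literal words can be sketched in C. An instruction that reads the pc sees its own address plus 8 in ARM state and plus 4 in Thumb state, so the stored word has to subtract that bias. The constants and helper names below are hypothetical illustrations, and the Thumb value is an assumption based on the architecture rather than on anything shown in this patch.

```c
#include <stdint.h>

/* Bias observed when an instruction reads the pc.  These stand in for
   what pc_ofs names; the Thumb value is assumed, not taken from the patch.  */
enum { PC_OFS_ARM = 8, PC_OFS_THUMB = 4 };

/* Hypothetical helper: the value the assembler stores for
   ".long TARGET - ANCHOR - pc_ofs".  */
static uint32_t
literal_word (uint32_t target, uint32_t anchor, uint32_t pc_ofs)
{
  return target - anchor - pc_ofs;
}

/* Hypothetical helper: the address formed at run time when the
   instruction at ANCHOR adds the pc to the loaded literal.  */
static uint32_t
resolved_address (uint32_t anchor, uint32_t pc_ofs, uint32_t literal)
{
  return anchor + pc_ofs + literal;
}
```

With a hard-coded 8, code assembled for Thumb would resolve 4 bytes short of the target, which is the mistake a symbolic offset avoids.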
---
* sysdeps/arm/__longjmp.S (__longjmp): Use pc_ofs.
* sysdeps/arm/setjmp.S (__sigsetjmp): Likewise.
* sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
* sysdeps/unix/sysv/linux/arm/getcontext.S (__getcontext): Likewise.
* sysdeps/unix/sysv/linux/arm/setcontext.S (__startcontext): Likewise.
* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
(SINGLE_THREAD_P): Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h
(SYSCALL_ERROR_HANDLER): Likewise.
---
ports/sysdeps/arm/__longjmp.S | 4 ++--
ports/sysdeps/arm/setjmp.S | 4 ++--
ports/sysdeps/unix/arm/sysdep.S | 4 ++--
ports/sysdeps/unix/sysv/linux/arm/getcontext.S | 2 +-
ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h | 2 +-
ports/sysdeps/unix/sysv/linux/arm/setcontext.S | 2 +-
ports/sysdeps/unix/sysv/linux/arm/sysdep.h | 2 +-
7 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/ports/sysdeps/arm/__longjmp.S b/ports/sysdeps/arm/__longjmp.S
index 5c04f36..28281d5 100644
--- a/ports/sysdeps/arm/__longjmp.S
+++ b/ports/sysdeps/arm/__longjmp.S
@@ -105,12 +105,12 @@ ENTRY (__longjmp)
#ifdef NEED_HWCAP
# ifdef IS_IN_rtld
-1: .long _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1: .long _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
.Lrtld_local_ro:
.long C_SYMBOL_NAME(_rtld_local_ro)(GOTOFF)
# else
# ifdef PIC
-1: .long _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1: .long _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
.Lrtld_global_ro:
.long C_SYMBOL_NAME(_rtld_global_ro)(GOT)
# else
diff --git a/ports/sysdeps/arm/setjmp.S b/ports/sysdeps/arm/setjmp.S
index 4b7542a..774c78a 100644
--- a/ports/sysdeps/arm/setjmp.S
+++ b/ports/sysdeps/arm/setjmp.S
@@ -91,12 +91,12 @@ ENTRY (__sigsetjmp)
#ifdef NEED_HWCAP
# ifdef IS_IN_rtld
-1: .long _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1: .long _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
.Lrtld_local_ro:
.long C_SYMBOL_NAME(_rtld_local_ro)(GOTOFF)
# else
# ifdef PIC
-1: .long _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1: .long _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
.Lrtld_global_ro:
.long C_SYMBOL_NAME(_rtld_global_ro)(GOT)
# else
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index da07d85..76137b3 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -50,14 +50,14 @@ __syscall_error:
mvn r0, #0
RETINSTR (, ip)
-1: .word errno(gottpoff) + (. - 2b - 8)
+1: .word errno(gottpoff) + (. - 2b - pc_ofs)
#elif RTLD_PRIVATE_ERRNO
ldr r1, 1f
0: str r0, [pc, r1]
mvn r0, $0
DO_RET(r14)
-1: .word C_SYMBOL_NAME(rtld_errno) - 0b - 8
+1: .word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs
#else
#error "Unsupported non-TLS case"
#endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/getcontext.S b/ports/sysdeps/unix/sysv/linux/arm/getcontext.S
index f7857c1..69cae48 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/getcontext.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/getcontext.S
@@ -103,7 +103,7 @@ ENTRY(__getcontext)
END(__getcontext)
#ifdef PIC
-1: .long _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1: .long _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
.Lrtld_global_ro:
.long C_SYMBOL_NAME(_rtld_global_ro)(GOT)
#else
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index 1b0a244..1745f9e 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -212,7 +212,7 @@ extern int __local_multiple_threads attribute_hidden;
ldr ip, [pc, ip]; \
teq ip, #0;
# define PSEUDO_PROLOGUE \
- 1: .word __local_multiple_threads - 2f - 8;
+ 1: .word __local_multiple_threads - 2f - pc_ofs;
# endif
# else
/* There is no __local_multiple_threads for librt, so use the TCB. */
diff --git a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
index 8e71f5b..8d96c57 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
+++ b/ports/sysdeps/unix/sysv/linux/arm/setcontext.S
@@ -93,7 +93,7 @@ ENTRY(__startcontext)
END(__startcontext)
#ifdef PIC
-1: .long _GLOBAL_OFFSET_TABLE_ - 0b - 8
+1: .long _GLOBAL_OFFSET_TABLE_ - 0b - pc_ofs
.Lrtld_global_ro:
.long C_SYMBOL_NAME(_rtld_global_ro)(GOT)
#else
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index 6b5bb14..cb237d9 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -114,7 +114,7 @@ __local_syscall_error: \
0: str r0, [pc, r1]; \
mvn r0, #0; \
DO_RET(lr); \
-1: .word C_SYMBOL_NAME(rtld_errno) - 0b - 8;
+1: .word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs;
# else
# if defined(__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
# define POP_PC \
--
1.8.1.2
* Re: [PATCH 06/26] arm: Use pc_ofs
2013-02-27 3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
@ 2013-02-28 0:21 ` Joseph S. Myers
0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 0:21 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> * sysdeps/arm/__longjmp.S (__longjmp): Use pc_ofs.
> * sysdeps/arm/setjmp.S (__sigsetjmp): Likewise.
> * sysdeps/unix/arm/sysdep.S (__syscall_error): Likewise.
> * sysdeps/unix/sysv/linux/arm/getcontext.S (__getcontext): Likewise.
> * sysdeps/unix/sysv/linux/arm/setcontext.S (__startcontext): Likewise.
> * sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
> (SINGLE_THREAD_P): Likewise.
> * sysdeps/unix/sysv/linux/arm/sysdep.h
> (SYSCALL_ERROR_HANDLER): Likewise.
OK, once the macro itself is in, and updated as necessary for a renaming
of the macro into uppercase.
--
Joseph S. Myers
joseph@codesourcery.com
* [PATCH 10/26] arm: Introduce and use LDST_PCREL
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (22 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 06/26] arm: Use pc_ofs Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-28 1:00 ` Joseph S. Myers
2013-02-27 3:17 ` [PATCH 26/26] arm: Add optimized add_n and sub_n Richard Henderson
` (4 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Macro-ising the few instances where we need to distinguish between
arm and thumb pc-relative memory operations.
---
* sysdeps/arm/sysdep.h (LDST_PCREL): New macro.
* sysdeps/unix/arm/sysdep.S (__syscall_error): Use LDST_PCREL.
Fix up gottpoff load of errno for thumb2.
* sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
(SINGLE_THREAD_P): Use LDST_PCREL.
(PSEUDO_PROLOGUE): Remove.
(PSEUDO): Don't use it.
* sysdeps/unix/sysv/linux/arm/sysdep.h (SYSCALL_ERROR_HANDLER):
Use LDST_PCREL.
---
ports/sysdeps/arm/sysdep.h | 18 +++++++++++++
ports/sysdeps/unix/arm/sysdep.S | 30 +++++++++++-----------
.../unix/sysv/linux/arm/nptl/sysdep-cancel.h | 10 ++------
ports/sysdeps/unix/sysv/linux/arm/sysdep.h | 10 +++-----
4 files changed, 39 insertions(+), 29 deletions(-)
diff --git a/ports/sysdeps/arm/sysdep.h b/ports/sysdeps/arm/sysdep.h
index 4a9f05a..b7ba9b1 100644
--- a/ports/sysdeps/arm/sysdep.h
+++ b/ports/sysdeps/arm/sysdep.h
@@ -129,4 +129,22 @@
# define pc_ofs 8
#endif
+/* Load or store to/from a pc-relative EXPR into/from R, using T. */
+#ifdef __thumb2__
+# define LDST_PCREL(OP, R, T, EXPR) \
+ ldr T, 98f; \
+ .subsection 2; \
+98: .word EXPR - 99f - pc_ofs; \
+ .previous; \
+99: add T, T, pc; \
+ OP R, [T]
+#else
+# define LDST_PCREL(OP, R, T, EXPR) \
+ ldr T, 98f; \
+ .subsection 2; \
+98: .word EXPR - 99f - pc_ofs; \
+ .previous; \
+99: OP R, [pc, T]
+#endif
+
#endif /* __ASSEMBLER__ */
diff --git a/ports/sysdeps/unix/arm/sysdep.S b/ports/sysdeps/unix/arm/sysdep.S
index 951642f..969628a 100644
--- a/ports/sysdeps/unix/arm/sysdep.S
+++ b/ports/sysdeps/unix/arm/sysdep.S
@@ -37,26 +37,26 @@ __syscall_error:
#endif
#ifndef IS_IN_rtld
- mov ip, lr
+ mov ip, lr
cfi_register (lr, ip)
- mov r1, r0
-
+ mov r1, r0
GET_TLS
+ ldr r2, 1f
+#ifdef __thumb__
+2: add r2, r2, pc
+ ldr r2, [r2]
+#else
+2: ldr r2, [pc, r2]
+#endif
+ str r1, [r0, r2]
+ mvn r0, #0
+ DO_RET(ip)
- ldr r2, 1f
-2: ldr r2, [pc, r2]
- str r1, [r0, r2]
- mvn r0, #0
- RETINSTR (, ip)
-
-1: .word errno(gottpoff) + (. - 2b - pc_ofs)
+1: .word errno(gottpoff) + (. - 2b - pc_ofs)
#elif RTLD_PRIVATE_ERRNO
- ldr r1, 1f
-0: str r0, [pc, r1]
- mvn r0, $0
+ LDST_PCREL(str, r0, r1, C_SYMBOL_NAME(rtld_errno))
+ mvn r0, #0
DO_RET(r14)
-
-1: .word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs
#else
#error "Unsupported non-TLS case"
#endif
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
index 1745f9e..b6dc3e0 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
@@ -31,7 +31,6 @@
# undef PSEUDO
# define PSEUDO(name, syscall_name, args) \
.section ".text"; \
- PSEUDO_PROLOGUE; \
.type __##syscall_name##_nocancel,%function; \
.globl __##syscall_name##_nocancel; \
__##syscall_name##_nocancel: \
@@ -207,12 +206,8 @@ extern int __local_multiple_threads attribute_hidden;
# define SINGLE_THREAD_P __builtin_expect (__local_multiple_threads == 0, 1)
# else
# define SINGLE_THREAD_P \
- ldr ip, 1b; \
-2: \
- ldr ip, [pc, ip]; \
- teq ip, #0;
-# define PSEUDO_PROLOGUE \
- 1: .word __local_multiple_threads - 2f - pc_ofs;
+ LDST_PCREL(ldr, ip, ip, __local_multiple_threads); \
+ teq ip, #0
# endif
# else
/* There is no __local_multiple_threads for librt, so use the TCB. */
@@ -221,7 +216,6 @@ extern int __local_multiple_threads attribute_hidden;
__builtin_expect (THREAD_GETMEM (THREAD_SELF, \
header.multiple_threads) == 0, 1)
# else
-# define PSEUDO_PROLOGUE
# define SINGLE_THREAD_P \
stmfd sp!, {r0, lr}; \
cfi_adjust_cfa_offset (8); \
diff --git a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
index c1f2c9e..e448e61 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/sysdep.h
@@ -125,12 +125,10 @@
# if RTLD_PRIVATE_ERRNO
# define SYSCALL_ERROR_HANDLER \
__local_syscall_error: \
- ldr r1, 1f; \
- rsb r0, r0, #0; \
-0: str r0, [pc, r1]; \
- mvn r0, #0; \
- DO_RET(lr); \
-1: .word C_SYMBOL_NAME(rtld_errno) - 0b - pc_ofs;
+ rsb r0, r0, #0; \
+ LDST_PCREL(str, r0, r1, C_SYMBOL_NAME(rtld_errno)); \
+ mvn r0, #0; \
+ DO_RET(lr)
# else
# if defined(__ARM_ARCH_4T__) && defined(__THUMB_INTERWORK__)
# define POP_PC \
--
1.8.1.2
* Re: [PATCH 10/26] arm: Introduce and use LDST_PCREL
2013-02-27 3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
@ 2013-02-28 1:00 ` Joseph S. Myers
0 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 1:00 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> Macro-ising the few instances where we need to distinguish between
> arm and thumb pc-relative memory operations.
> ---
> * sysdeps/arm/sysdep.h (LDST_PCREL): New macro.
> * sysdeps/unix/arm/sysdep.S (__syscall_error): Use LDST_PCREL.
> Fix up gottpoff load of errno for thumb2.
> * sysdeps/unix/sysv/linux/arm/nptl/sysdep-cancel.h
> (SINGLE_THREAD_P): Use LDST_PCREL.
> (PSEUDO_PROLOGUE): Remove.
> (PSEUDO): Don't use it.
> * sysdeps/unix/sysv/linux/arm/sysdep.h (SYSCALL_ERROR_HANDLER):
> Use LDST_PCREL.
This patch appears to include whitespace changes to otherwise unmodified
code, as well as the substantive changes. Please separate the whitespace
and substantive changes and resubmit.
--
Joseph S. Myers
joseph@codesourcery.com
* [PATCH 26/26] arm: Add optimized add_n and sub_n
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (23 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 10/26] arm: Introduce and use LDST_PCREL Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
` (3 subsequent siblings)
28 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
LGPL 3, but tested with the GMP testsuite.
This is 250% faster than the generic code as measured on Cortex-A15,
and the same speed as GMP on the same core, and probably everywhere.
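For reference, the contract the assembly has to satisfy can be sketched in portable C. This is only an illustration, not the generic glibc code: limb_t, ref_add_n and ref_sub_n are hypothetical names, and a 32-bit limb is assumed as on ARM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical stand-in for mp_limb_t on 32-bit ARM */

/* res[i] = s1[i] + s2[i] + carry-in; returns the carry out of the top limb. */
limb_t ref_add_n(limb_t *res, const limb_t *s1, const limb_t *s2, size_t n)
{
    limb_t carry = 0;
    assert(n > 0);                  /* mpn functions expect at least one limb */
    for (size_t i = 0; i < n; i++) {
        limb_t a = s1[i], b = s2[i];
        limb_t sum = a + b;
        limb_t c1 = sum < a;        /* carry out of a + b */
        limb_t out = sum + carry;
        limb_t c2 = out < sum;      /* carry out of adding the carry-in */
        res[i] = out;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}

/* res[i] = s1[i] - s2[i] - borrow-in; returns the borrow out of the top limb. */
limb_t ref_sub_n(limb_t *res, const limb_t *s1, const limb_t *s2, size_t n)
{
    limb_t borrow = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        limb_t a = s1[i], b = s2[i];
        limb_t diff = a - b;
        limb_t b1 = a < b;          /* borrow out of a - b */
        limb_t out = diff - borrow;
        limb_t b2 = diff < borrow;  /* borrow out of subtracting the borrow-in */
        res[i] = out;
        borrow = b1 | b2;
    }
    return borrow;
}
```

The assembly keeps the carry in the CPU flag across the whole loop (adcs/sbcs, with teq used for the loop test because it leaves the carry alone), which is what the two comparisons per limb emulate here.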
---
* sysdeps/arm/add_n.S: New file.
* sysdeps/arm/sub_n.S: New file.
---
ports/sysdeps/arm/add_n.S | 83 +++++++++++++++++++++++++++++++++++++++++++++++
ports/sysdeps/arm/sub_n.S | 2 ++
2 files changed, 85 insertions(+)
create mode 100644 ports/sysdeps/arm/add_n.S
create mode 100644 ports/sysdeps/arm/sub_n.S
diff --git a/ports/sysdeps/arm/add_n.S b/ports/sysdeps/arm/add_n.S
new file mode 100644
index 0000000..bbfb701
--- /dev/null
+++ b/ports/sysdeps/arm/add_n.S
@@ -0,0 +1,83 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .syntax unified
+ .text
+
+#ifdef USE_AS_SUB_N
+# define INITC cmp r0, r0
+# define OPC sbcs
+# define RETC s(sbc) r0, r0, r0; s(neg) r0, r0
+# define FUNC __mpn_sub_n
+#else
+# define INITC cmn r0, #0
+# define OPC adcs
+# define RETC s(mov) r0, #0; s(adc) r0, r0, r0
+# define FUNC __mpn_add_n
+#endif
+
+/* mp_limb_t mpn_add_n(res_ptr, src1_ptr, src2_ptr, size) */
+
+ENTRY(FUNC)
+ push { r4, r5, r6, r7, r8, r9, lr }
+ cfi_adjust_cfa_offset (28)
+ cfi_rel_offset (r4, 0)
+ cfi_rel_offset (r5, 4)
+ cfi_rel_offset (r6, 8)
+ cfi_rel_offset (r7, 12)
+ cfi_rel_offset (r8, 16)
+ cfi_rel_offset (r9, 20)
+ cfi_rel_offset (lr, 24)
+
+ INITC /* initialize carry flag */
+ tst r3, #1 /* count & 1 == 1? */
+ add lr, r1, r3, lsl #2 /* compute end src1 */
+ beq 1f
+
+ ldr r4, [r1], #4 /* do one to make count even */
+ ldr r5, [r2], #4
+ OPC r4, r4, r5
+ teq r1, lr /* end of count? (preserve carry) */
+ str r4, [r0], #4
+ beq 9f
+1:
+ tst r3, #2 /* count & 2 == 2? */
+ beq 2f
+ ldm r1!, { r4, r5 } /* do two to make count 0 mod 4 */
+ ldm r2!, { r6, r7 }
+ OPC r4, r4, r6
+ OPC r5, r5, r7
+ teq r1, lr /* end of count? */
+ stm r0!, { r4, r5 }
+ beq 9f
+2:
+ ldm r1!, { r3, r4, r5, r6 } /* do four each loop */
+ ldm r2!, { r7, r8, r9, ip }
+ OPC r3, r3, r7
+ OPC r4, r4, r8
+ OPC r5, r5, r9
+ OPC r6, r6, ip
+ teq r1, lr
+ stm r0!, { r3, r4, r5, r6 }
+ bne 2b
+
+9:
+ RETC /* copy carry out */
+ pop { r4, r5, r6, r7, r8, r9, pc }
+END(FUNC)
diff --git a/ports/sysdeps/arm/sub_n.S b/ports/sysdeps/arm/sub_n.S
new file mode 100644
index 0000000..8eafa41
--- /dev/null
+++ b/ports/sysdeps/arm/sub_n.S
@@ -0,0 +1,2 @@
+#define USE_AS_SUB_N
+#include "add_n.S"
--
1.8.1.2
* [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (24 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 26/26] arm: Add optimized add_n and sub_n Richard Henderson
@ 2013-02-27 3:17 ` Richard Henderson
2013-02-27 17:54 ` Joseph S. Myers
2013-02-27 15:41 ` [PATCH 00/26] ARM improvements Måns Rullgård
` (2 subsequent siblings)
28 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 3:17 UTC (permalink / raw)
To: libc-ports; +Cc: Joseph Myers
New defines from gcc 4.8:
#define __ARM_ARCH_ISA_ARM 1
#define __ARM_ARCH_PROFILE 65
#define __ARM_ARCH_ISA_THUMB 2
#define __ARM_ARCH 7
all of which got in the way of the one we wanted:
#define __ARM_ARCH_7A__ 1
---
* sysdeps/arm/preconfigure: Adjust scan for __ARM_ARCH_* defines.
---
ports/sysdeps/arm/preconfigure | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ports/sysdeps/arm/preconfigure b/ports/sysdeps/arm/preconfigure
index 20f6d91..b0c0540 100644
--- a/ports/sysdeps/arm/preconfigure
+++ b/ports/sysdeps/arm/preconfigure
@@ -10,7 +10,7 @@ arm*)
# an appropriate directory exists in sysdeps/arm
archcppflag=`echo "" |
$CC $CFLAGS $CPPFLAGS -E -dM - |
- grep __ARM_ARCH |
+ grep '__ARM_ARCH_.*__' |
sed -e 's/^#define //' -e 's/ .*//'`
case x$archcppflag in
--
1.8.1.2
* Re: [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8
2013-02-27 3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
@ 2013-02-27 17:54 ` Joseph S. Myers
2013-02-27 18:11 ` Richard Henderson
0 siblings, 1 reply; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 17:54 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
On Tue, 26 Feb 2013, Richard Henderson wrote:
> - grep __ARM_ARCH |
> + grep '__ARM_ARCH_.*__' |
I think this should be restricted more closely to those likely to be
relevant, using [0-9] before the .* - OK with that change (presuming it
works). Just this particular change would be reasonable to cherry-pick to
2.17 branch as well for the benefit of anyone using that release branch
with 4.8.
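To see what each pattern selects, here is a small C sketch using POSIX regcomp, which with no flags uses the same basic regular expression syntax as plain grep. The first five sample lines are the defines quoted in the patch description; the __ARM_ARCH_EXT_IDIV__ line is an assumption added for illustration, standing in for any non-numeric macro of the __ARM_ARCH_*__ shape. count_matches and sample_defines are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Sample "gcc -E -dM" output lines.  The last entry is an illustrative
   assumption, not taken from the patch description. */
static const char *const sample_defines[] = {
    "#define __ARM_ARCH_ISA_ARM 1",
    "#define __ARM_ARCH_PROFILE 65",
    "#define __ARM_ARCH_ISA_THUMB 2",
    "#define __ARM_ARCH 7",
    "#define __ARM_ARCH_7A__ 1",
    "#define __ARM_ARCH_EXT_IDIV__ 1",
};

/* Count the lines selected by a POSIX basic regular expression. */
int count_matches(const char *pattern, const char *const *lines, size_t n)
{
    regex_t re;
    int hits = 0;
    int rc = regcomp(&re, pattern, REG_NOSUB);  /* flags without REG_EXTENDED: BRE */
    assert(rc == 0);
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            hits++;
    regfree(&re);
    return hits;
}
```

On these samples the old pattern __ARM_ARCH selects every line, __ARM_ARCH_.*__ still lets the non-numeric macro through, and __ARM_ARCH_[0-9].*__ leaves only __ARM_ARCH_7A__, which is the one the case statement is written for.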
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8
2013-02-27 17:54 ` Joseph S. Myers
@ 2013-02-27 18:11 ` Richard Henderson
0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 18:11 UTC (permalink / raw)
To: Joseph S. Myers; +Cc: libc-ports
On 02/27/2013 09:54 AM, Joseph S. Myers wrote:
> On Tue, 26 Feb 2013, Richard Henderson wrote:
>
>> - grep __ARM_ARCH |
>> + grep '__ARM_ARCH_.*__' |
>
> I think this should be restricted more closely to those likely to be
> relevant, using [0-9] before the .* - OK with that change (presuming it
> works). Just this particular change would be reasonable to cherry-pick to
> 2.17 branch as well for the benefit of anyone using that release branch
> with 4.8.
>
Yes, this [0-9] change works. I'll commit that separately.
r~
* Re: [PATCH 00/26] ARM improvements
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (25 preceding siblings ...)
2013-02-27 3:17 ` [PATCH 02/26] arm: Update preconfigure fragment for gcc 4.8 Richard Henderson
@ 2013-02-27 15:41 ` Måns Rullgård
2013-02-27 16:59 ` Joseph S. Myers
2013-02-28 22:05 ` Joseph S. Myers
28 siblings, 0 replies; 63+ messages in thread
From: Måns Rullgård @ 2013-02-27 15:41 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports, Joseph Myers
Richard Henderson <rth@twiddle.net> writes:
> Patches 19-23 add improved string routines for armv6t2. I've had these
> hanging around for almost 2 years without properly submitting them.
> Which is perhaps a bit silly, but the A8 host I was originally doing
> testing on has a dreadfully low resolution clock, so it was hard to get
> real numbers. Whereas the A15 has a 1ns resolution CLOCK_MONOTONIC_RAW.
> I can post the benchmarks under separate cover if you like.
With a suitable kernel hack, you can access the cycle counter from
userspace on any ARMv7 device. This is probably more accurate and
avoids the syscall overhead.
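For comparison, the clock-based approach mentioned in the quoted text can be sketched as below. This is a minimal sketch, not the benchmark harness actually used: bench_now_ns, bench_resolution_ns and bench_best_ns are hypothetical names, and CLOCK_MONOTONIC is assumed as a fallback where CLOCK_MONOTONIC_RAW is not available. The cycle-counter route is not shown, since it depends on the kernel change described above.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifdef CLOCK_MONOTONIC_RAW
# define BENCH_CLOCK CLOCK_MONOTONIC_RAW
#else
# define BENCH_CLOCK CLOCK_MONOTONIC   /* assumed fallback */
#endif

/* Current reading of the benchmark clock, in nanoseconds. */
int64_t bench_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(BENCH_CLOCK, &ts);
    assert(rc == 0);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Resolution the kernel reports for the benchmark clock, in nanoseconds. */
int64_t bench_resolution_ns(void)
{
    struct timespec ts;
    int rc = clock_getres(BENCH_CLOCK, &ts);
    assert(rc == 0);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Run fn `runs` times and return the shortest observed duration, which
   filters out most scheduling noise. */
int64_t bench_best_ns(void (*fn)(void), int runs)
{
    int64_t best = INT64_MAX;
    for (int i = 0; i < runs; i++) {
        int64_t t0 = bench_now_ns();
        fn();
        int64_t t1 = bench_now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Checking bench_resolution_ns() first shows whether a host's clock is fine enough for short string routines; each bench_now_ns() call still pays the clock_gettime cost that a directly readable cycle counter would avoid.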
--
Måns Rullgård
mans@mansr.com
* Re: [PATCH 00/26] ARM improvements
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (26 preceding siblings ...)
2013-02-27 15:41 ` [PATCH 00/26] ARM improvements Måns Rullgård
@ 2013-02-27 16:59 ` Joseph S. Myers
2013-02-27 17:34 ` Richard Henderson
2013-02-28 22:05 ` Joseph S. Myers
28 siblings, 1 reply; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-27 16:59 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
Could you please clarify how these patches have been tested? In
particular, what testing has been done for big-endian (I think the string
functions at least do need big-endian testing - it should be possible to
run string tests with userspace QEMU without needing BE hardware).
> Patches 4-18 improve the ability to build libc as a thumb2 binary.
> In the end, almost all assembly is done in thumb2 mode if -mthumb
> is present in ASFLAGS. It's that last that's the sticky part: by
> default we copy only a couple of flags over from CFLAGS. I'm not
> sure why we're not passing them all to the assembler. So at the
> moment I'm just putting ASFLAGS on the make command-line to get
> what I want.
I'd typically expect builds to be done with CC containing any relevant
options for this sort of thing, rather than CFLAGS.
That also raises the question of dependencies between the patches. Given
a patch series like this, each subset 1-N of the patches should generally
leave the tree in a working state. But if a patch (say patch 6) makes
changes to .S code for __thumb2__ that are only correct after that
actually means the generated code is Thumb-2 (patch 12) that leaves a
broken intermediate state (given a compiler that defaults to Thumb-2,
whether because configured --with-mode=thumb or because of the options in
$CC), meaning the changes can't quite go in the given order (patch 5 could
define pc_ofs unconditionally to 8 in assembly code, for example, and only
patch 12 change the value for Thumb-2 assembly).
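The arithmetic behind that concern can be sketched in C. The series uses literals of the form ".word sym - 0b - pc_ofs", where 0b labels the instruction that reads pc. In ARM state pc reads as the instruction address plus 8, in Thumb state as plus 4 (this sketch ignores the word alignment of pc that some Thumb instructions apply). pcrel_literal and pcrel_effective are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { PC_BIAS_ARM = 8, PC_BIAS_THUMB = 4 };

/* Value the assembler stores for ".word sym - 0b - pc_ofs". */
uint32_t pcrel_literal(uint32_t sym, uint32_t insn_addr, uint32_t pc_ofs)
{
    return sym - insn_addr - pc_ofs;
}

/* Address actually reached when the instruction at insn_addr adds the
   literal to pc, given the bias of the mode the code was assembled in. */
uint32_t pcrel_effective(uint32_t insn_addr, uint32_t actual_bias,
                         uint32_t literal)
{
    return insn_addr + actual_bias + literal;
}
```

When pc_ofs matches the mode the code is really assembled in, the effective address is sym. If the cpp conditional picks 4 because __thumb2__ is defined while the file is still assembled as ARM, or the reverse, the access lands 4 bytes away from sym.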
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH 00/26] ARM improvements
2013-02-27 16:59 ` Joseph S. Myers
@ 2013-02-27 17:34 ` Richard Henderson
0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-02-27 17:34 UTC (permalink / raw)
To: Joseph S. Myers; +Cc: libc-ports
On 02/27/2013 08:58 AM, Joseph S. Myers wrote:
> Could you please clarify how these patches have been tested?
As make check on a Cortex-A15 LE, both in arm and thumb mode.
> what testing has been done for big-endian (I think the string
> functions at least do need big-endian testing - it should be possible to
> run string tests with userspace QEMU without needing BE hardware).
I haven't done that recently, but I can do that again.
> I'd typically expect builds to be done with CC containing any relevant
> options for this sort of thing, rather than CFLAGS.
Point. I'll give that a try in future.
> That also raises the question of dependencies between the patches. Given
> a patch series like this, each subset 1-N of the patches should generally
> leave the tree in a working state. But if a patch (say patch 6) makes
> changes to .S code for __thumb2__ that are only correct after that
> actually means the generated code is Thumb-2 (patch 12) that leaves a
> broken intermediate state (given a compiler that defaults to Thumb-2,
> whether because configured --with-mode=thumb or because of the options in
> $CC), meaning the changes can't quite go in the given order (patch 5 could
> define pc_ofs unconditionally to 8 in assembly code, for example, and only
> patch 12 change the value for Thumb-2 assembly).
I was attempting such an order, but I see what you mean about cpp conditionals
not matching up with the actual assembly mode.
As far as I can remember, pc_ofs is the only such example. The rest of the
changes -- I'm thinking of how pc is added to addresses and negative offset
addressing -- while conditioned on __thumb2__ still produce valid ARM insns.
r~
* Re: [PATCH 00/26] ARM improvements
2013-02-27 3:16 [PATCH 00/26] ARM improvements Richard Henderson
` (27 preceding siblings ...)
2013-02-27 16:59 ` Joseph S. Myers
@ 2013-02-28 22:05 ` Joseph S. Myers
28 siblings, 0 replies; 63+ messages in thread
From: Joseph S. Myers @ 2013-02-28 22:05 UTC (permalink / raw)
To: Richard Henderson; +Cc: libc-ports
I think you should now put in all the approved patches (except if any
don't make sense without a previous patch that hasn't been approved),
subject to testing that they do work in the sequence in which they go in.
Then repost the remaining patches, adapted as appropriate for the comments
that have been posted on libc-ports so far.
--
Joseph S. Myers
joseph@codesourcery.com