* [PATCH 0/11] Improve Mips target
@ 2025-01-23 13:42 Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 00/11] " Aleksandar Rakic
` (11 more replies)
0 siblings, 12 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:42 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu
This patch series improves support for the MIPS target in glibc,
adding several enhancements and bug fixes.
These patches are cherry-picked from the mips_rel/2_28/master branch of
the MIPS repository: https://github.com/MIPS/glibc.
Further details on the individual changes are included in the respective
patches.
* [PATCH 00/11] Improve Mips target
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
@ 2025-01-23 13:42 ` Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 01/11] Updates for microMIPS Release 6 Aleksandar Rakic
` (10 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:42 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu
Aleksandar Rakic (11):
Updates for microMIPS Release 6
Fix rtld link_map initialization issues
Fix issues with removing no-reorder directives
Add C implementation of memcpy/memset
Add optimized assembly for strcmp
Fix prefetching beyond copied memory
Fix strcmp bug for little endian target
Add script to run tests through a qemu wrapper
Avoid warning from -Wbuiltin-declaration-mismatch
Avoid GCC 11 warning from -Wmaybe-uninitialized
Prevent turning memset into self-recursion
elf/rtld.c | 14 +-
scripts/cross-test-qemu.sh | 152 ++++
sysdeps/ieee754/dbl-64/s_modf.c | 4 +
sysdeps/ieee754/dbl-64/s_sincos.c | 4 +
sysdeps/ieee754/soft-fp/s_fdiv.c | 1 +
sysdeps/mips/Makefile | 5 +
sysdeps/mips/add_n.S | 12 +-
sysdeps/mips/addmul_1.S | 11 +-
sysdeps/mips/bsd-setjmp.S | 2 +-
sysdeps/mips/dl-machine.h | 15 +-
sysdeps/mips/dl-trampoline.c | 4 -
sysdeps/mips/lshift.S | 12 +-
sysdeps/mips/machine-gmon.h | 82 ++
sysdeps/mips/memcpy.S | 868 -------------------
sysdeps/mips/memcpy.c | 449 ++++++++++
sysdeps/mips/memset.S | 426 ---------
sysdeps/mips/memset.c | 187 ++++
sysdeps/mips/mips32/crtn.S | 12 +-
sysdeps/mips/mips64/__longjmp.c | 2 +-
sysdeps/mips/mips64/add_n.S | 12 +-
sysdeps/mips/mips64/addmul_1.S | 11 +-
sysdeps/mips/mips64/lshift.S | 12 +-
sysdeps/mips/mips64/mul_1.S | 11 +-
sysdeps/mips/mips64/n32/crtn.S | 12 +-
sysdeps/mips/mips64/n64/crtn.S | 12 +-
sysdeps/mips/mips64/rshift.S | 12 +-
sysdeps/mips/mips64/sub_n.S | 12 +-
sysdeps/mips/mips64/submul_1.S | 11 +-
sysdeps/mips/mul_1.S | 11 +-
sysdeps/mips/rshift.S | 12 +-
sysdeps/mips/strcmp.S | 229 +++--
sysdeps/mips/sub_n.S | 12 +-
sysdeps/mips/submul_1.S | 11 +-
sysdeps/mips/sys/asm.h | 20 +-
sysdeps/unix/mips/mips32/sysdep.h | 4 -
sysdeps/unix/mips/mips64/sysdep.h | 4 -
sysdeps/unix/mips/sysdep.h | 2 -
sysdeps/unix/sysv/linux/mips/mips32/sysdep.h | 10 -
sysdeps/unix/sysv/linux/mips/mips64/sysdep.h | 14 -
39 files changed, 1108 insertions(+), 1588 deletions(-)
create mode 100755 scripts/cross-test-qemu.sh
delete mode 100644 sysdeps/mips/memcpy.S
create mode 100644 sysdeps/mips/memcpy.c
delete mode 100644 sysdeps/mips/memset.S
create mode 100644 sysdeps/mips/memset.c
--
2.34.1
* [PATCH 01/11] Updates for microMIPS Release 6
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 00/11] " Aleksandar Rakic
@ 2025-01-23 13:42 ` Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 02/11] Fix rtld link_map initialization issues Aleksandar Rakic
` (9 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:42 UTC (permalink / raw)
To: libc-alpha
Cc: aleksandar.rakic, djordje.todorovic, cfu, Matthew Fortune,
Andrew Bennett, Faraz Shahbazker
* Remove noreorder directives (the reordering pattern is sketched below)
* Fix PC-relative code label calculations for microMIPS R6
* Add special versions of code that would be de-optimised by removing
  noreorder
* Avoid use of the unaligned ADDIUPC instruction for address calculation.
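The delay-slot cleanup follows the same pattern in every .S file touched
below: once .set noreorder is dropped, the instruction that used to sit
in a branch delay slot is moved ahead of the branch, explicit nops
disappear, and the assembler is left to schedule the slots itself. A
minimal schematic of the transformation (illustrative only, not taken
verbatim from any single file; the add_n.S hunks below show real
instances):

    # Before: manual delay-slot scheduling under noreorder
            .set    noreorder
            beq     $9,$0,L(L0)     # branch
            move    $2,$0           # executed in the delay slot
            ...
            j       $31             # return
            or      $2,$2,$8        # delay slot of the return

    # After: reorder mode, assembler fills the delay slots
            move    $2,$0
            beq     $9,$0,L(L0)
            ...
            or      $2,$2,$8
            jr      $31             # explicit register-jump mnemonic

For the microMIPS R6 jump tables, ADDIUPC/lapc needs 4-byte alignment to
produce a valid address, so the tables are instead addressed with an
auipc/addiu %pcrel_hi/%pcrel_lo pair, the entries become 2-byte bc16
branches (PTR_BC), and the table index is scaled by 1 instead of 2.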
Cherry-picked 94a52199502361be4a5b1cc616661e287416cc8d
from https://github.com/MIPS/glibc
Signed-off-by: Matthew Fortune <matthew.fortune@imgtec.com>
Signed-off-by: Andrew Bennett <andrew.bennett@imgtec.com>
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/mips/add_n.S | 12 +-
sysdeps/mips/addmul_1.S | 11 +-
sysdeps/mips/dl-machine.h | 15 ++-
sysdeps/mips/dl-trampoline.c | 4 -
sysdeps/mips/lshift.S | 12 +-
sysdeps/mips/machine-gmon.h | 82 +++++++++++++
sysdeps/mips/memcpy.S | 120 +++++++++++--------
sysdeps/mips/memset.S | 62 +++++-----
sysdeps/mips/mips32/crtn.S | 12 +-
sysdeps/mips/mips64/__longjmp.c | 2 +-
sysdeps/mips/mips64/add_n.S | 12 +-
sysdeps/mips/mips64/addmul_1.S | 11 +-
sysdeps/mips/mips64/lshift.S | 12 +-
sysdeps/mips/mips64/mul_1.S | 11 +-
sysdeps/mips/mips64/n32/crtn.S | 12 +-
sysdeps/mips/mips64/n64/crtn.S | 12 +-
sysdeps/mips/mips64/rshift.S | 12 +-
sysdeps/mips/mips64/sub_n.S | 12 +-
sysdeps/mips/mips64/submul_1.S | 11 +-
sysdeps/mips/mul_1.S | 11 +-
sysdeps/mips/rshift.S | 12 +-
sysdeps/mips/sub_n.S | 12 +-
sysdeps/mips/submul_1.S | 11 +-
sysdeps/mips/sys/asm.h | 20 +---
sysdeps/unix/mips/mips32/sysdep.h | 4 -
sysdeps/unix/mips/mips64/sysdep.h | 4 -
sysdeps/unix/mips/sysdep.h | 2 -
sysdeps/unix/sysv/linux/mips/mips32/sysdep.h | 10 --
sysdeps/unix/sysv/linux/mips/mips64/sysdep.h | 14 ---
29 files changed, 260 insertions(+), 277 deletions(-)
diff --git a/sysdeps/mips/add_n.S b/sysdeps/mips/add_n.S
index 234e1e3c8d..f4d98fa38c 100644
--- a/sysdeps/mips/add_n.S
+++ b/sysdeps/mips/add_n.S
@@ -31,19 +31,16 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_add_n)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
lw $10,0($5)
lw $11,0($6)
addiu $7,$7,-1
and $9,$7,4-1 /* number of limbs in first loop */
- beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */
move $2,$0
+ beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */
subu $7,$7,$9
@@ -61,11 +58,10 @@ L(Loop0): addiu $9,$9,-1
addiu $6,$6,4
move $10,$12
move $11,$13
- bne $9,$0,L(Loop0)
addiu $4,$4,4
+ bne $9,$0,L(Loop0)
L(L0): beq $7,$0,L(end)
- nop
L(Loop): addiu $7,$7,-4
@@ -108,14 +104,14 @@ L(Loop): addiu $7,$7,-4
addiu $5,$5,16
addiu $6,$6,16
- bne $7,$0,L(Loop)
addiu $4,$4,16
+ bne $7,$0,L(Loop)
L(end): addu $11,$11,$2
sltu $8,$11,$2
addu $11,$10,$11
sltu $2,$11,$10
sw $11,0($4)
- j $31
or $2,$2,$8
+ jr $31
END (__mpn_add_n)
diff --git a/sysdeps/mips/addmul_1.S b/sysdeps/mips/addmul_1.S
index 523478d7e8..eea26630fc 100644
--- a/sysdeps/mips/addmul_1.S
+++ b/sysdeps/mips/addmul_1.S
@@ -31,12 +31,9 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_addmul_1)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
/* warm up phase 0 */
lw $8,0($5)
@@ -50,12 +47,12 @@ ENTRY (__mpn_addmul_1)
#endif
addiu $6,$6,-1
- beq $6,$0,L(LC0)
move $2,$0 /* zero cy2 */
+ beq $6,$0,L(LC0)
addiu $6,$6,-1
- beq $6,$0,L(LC1)
lw $8,0($5) /* load new s1 limb as early as possible */
+ beq $6,$0,L(LC1)
L(Loop): lw $10,0($4)
#if __mips_isa_rev < 6
@@ -81,8 +78,8 @@ L(Loop): lw $10,0($4)
addu $2,$2,$10
sw $3,0($4)
addiu $4,$4,4
- bne $6,$0,L(Loop) /* should be "bnel" */
addu $2,$9,$2 /* add high product limb and carry from addition */
+ bne $6,$0,L(Loop) /* should be "bnel" */
/* cool down phase 1 */
L(LC1): lw $10,0($4)
@@ -123,6 +120,6 @@ L(LC0): lw $10,0($4)
sltu $10,$3,$10
addu $2,$2,$10
sw $3,0($4)
- j $31
addu $2,$9,$2 /* add high product limb and carry from addition */
+ jr $31
END (__mpn_addmul_1)
diff --git a/sysdeps/mips/dl-machine.h b/sysdeps/mips/dl-machine.h
index 10e30f1e90..a360dfcd63 100644
--- a/sysdeps/mips/dl-machine.h
+++ b/sysdeps/mips/dl-machine.h
@@ -127,16 +127,13 @@ elf_machine_load_address (void)
{
ElfW(Addr) addr;
#ifndef __mips16
- asm (" .set noreorder\n"
- " " STRINGXP (PTR_LA) " %0, 0f\n"
+ asm (" " STRINGXP (PTR_LA) " %0, 0f\n"
# if !defined __mips_isa_rev || __mips_isa_rev < 6
" bltzal $0, 0f\n"
- " nop\n"
+#else
+ " bal 0f\n"
+#endif
"0: " STRINGXP (PTR_SUBU) " %0, $31, %0\n"
-# else
- "0: addiupc $31, 0\n"
- " " STRINGXP (PTR_SUBU) " %0, $31, %0\n"
-# endif
" .set reorder\n"
: "=r" (addr)
: /* No inputs */
@@ -237,7 +234,9 @@ do { \
and not just plain _start. */
#ifndef __mips16
-# if !defined __mips_isa_rev || __mips_isa_rev < 6
+/* Although microMIPSr6 has an ADDIUPC instruction, it must be 4-byte aligned
+ for the address calculation to be valid. */
+# if !defined __mips_isa_rev || __mips_isa_rev < 6 || defined __mips_micromips
# define LCOFF STRINGXP(.Lcof2)
# define LOAD_31 STRINGXP(bltzal $8) "," STRINGXP(.Lcof2)
# else
diff --git a/sysdeps/mips/dl-trampoline.c b/sysdeps/mips/dl-trampoline.c
index 603ee2d2f8..915e1da6ad 100644
--- a/sysdeps/mips/dl-trampoline.c
+++ b/sysdeps/mips/dl-trampoline.c
@@ -301,7 +301,6 @@ asm ("\n\
.ent _dl_runtime_resolve\n\
_dl_runtime_resolve:\n\
.frame $29, " STRINGXP(ELF_DL_FRAME_SIZE) ", $31\n\
- .set noreorder\n\
# Save GP.\n\
1: move $3, $28\n\
# Save arguments and sp value in stack.\n\
@@ -311,7 +310,6 @@ _dl_runtime_resolve:\n\
# Compute GP.\n\
2: " STRINGXP(SETUP_GP) "\n\
" STRINGXV(SETUP_GP64 (0, _dl_runtime_resolve)) "\n\
- .set reorder\n\
# Save slot call pc.\n\
move $2, $31\n\
" IFABIO32(STRINGXP(CPRESTORE(32))) "\n\
@@ -358,7 +356,6 @@ asm ("\n\
.ent _dl_runtime_pltresolve\n\
_dl_runtime_pltresolve:\n\
.frame $29, " STRINGXP(ELF_DL_PLT_FRAME_SIZE) ", $31\n\
- .set noreorder\n\
# Save arguments and sp value in stack.\n\
1: " STRINGXP(PTR_SUBIU) " $29, " STRINGXP(ELF_DL_PLT_FRAME_SIZE) "\n\
" IFABIO32(STRINGXP(PTR_L) " $13, " STRINGXP(PTRSIZE) "($28)") "\n\
@@ -368,7 +365,6 @@ _dl_runtime_pltresolve:\n\
# Compute GP.\n\
2: " STRINGXP(SETUP_GP) "\n\
" STRINGXV(SETUP_GP64 (0, _dl_runtime_pltresolve)) "\n\
- .set reorder\n\
" IFABIO32(STRINGXP(CPRESTORE(32))) "\n\
" ELF_DL_PLT_SAVE_ARG_REGS "\
move $4, $13\n\
diff --git a/sysdeps/mips/lshift.S b/sysdeps/mips/lshift.S
index 04caa76a84..c6c42aa1f5 100644
--- a/sysdeps/mips/lshift.S
+++ b/sysdeps/mips/lshift.S
@@ -30,12 +30,9 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_lshift)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
sll $2,$6,2
addu $5,$5,$2 /* make r5 point at end of src */
lw $10,-4($5) /* load first limb */
@@ -43,8 +40,8 @@ ENTRY (__mpn_lshift)
addu $4,$4,$2 /* make r4 point at end of res */
addiu $6,$6,-1
and $9,$6,4-1 /* number of limbs in first loop */
- beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */
srl $2,$10,$13 /* compute function result */
+ beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */
subu $6,$6,$9
@@ -56,11 +53,10 @@ L(Loop0): lw $3,-8($5)
srl $12,$3,$13
move $10,$3
or $8,$11,$12
- bne $9,$0,L(Loop0)
sw $8,0($4)
+ bne $9,$0,L(Loop0)
L(L0): beq $6,$0,L(Lend)
- nop
L(Loop): lw $3,-8($5)
addiu $4,$4,-16
@@ -88,10 +84,10 @@ L(Loop): lw $3,-8($5)
addiu $5,$5,-16
or $8,$14,$9
- bgtz $6,L(Loop)
sw $8,0($4)
+ bgtz $6,L(Loop)
L(Lend): sll $8,$10,$7
- j $31
sw $8,-4($4)
+ jr $31
END (__mpn_lshift)
diff --git a/sysdeps/mips/machine-gmon.h b/sysdeps/mips/machine-gmon.h
index e2e0756575..d890e5ec19 100644
--- a/sysdeps/mips/machine-gmon.h
+++ b/sysdeps/mips/machine-gmon.h
@@ -34,6 +34,42 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
# define CPRESTORE
#endif
+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+#define MCOUNT asm(\
+ ".globl _mcount;\n\t" \
+ ".align 2;\n\t" \
+ ".set push;\n\t" \
+ ".set nomips16;\n\t" \
+ ".type _mcount,@function;\n\t" \
+ ".ent _mcount\n\t" \
+ "_mcount:\n\t" \
+ ".frame $sp,44,$31\n\t" \
+ ".set noat;\n\t" \
+ CPLOAD \
+ "subu $29,$29,48;\n\t" \
+ CPRESTORE \
+ "sw $4,24($29);\n\t" \
+ "sw $5,28($29);\n\t" \
+ "sw $6,32($29);\n\t" \
+ "sw $7,36($29);\n\t" \
+ "sw $2,40($29);\n\t" \
+ "sw $1,16($29);\n\t" \
+ "sw $31,20($29);\n\t" \
+ "move $5,$31;\n\t" \
+ "move $4,$1;\n\t" \
+ "balc __mcount;\n\t" \
+ "lw $4,24($29);\n\t" \
+ "lw $5,28($29);\n\t" \
+ "lw $6,32($29);\n\t" \
+ "lw $7,36($29);\n\t" \
+ "lw $2,40($29);\n\t" \
+ "lw $1,20($29);\n\t" \
+ "lw $31,16($29);\n\t" \
+ "addu $29,$29,56;\n\t" \
+ "jrc $1;\n\t" \
+ ".end _mcount;\n\t" \
+ ".set pop");
+#else
#define MCOUNT asm(\
".globl _mcount;\n\t" \
".align 2;\n\t" \
@@ -71,6 +107,7 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
"move $31,$1;\n\t" \
".end _mcount;\n\t" \
".set pop");
+#endif
#else
@@ -97,6 +134,50 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
# error "Unknown ABI"
#endif
+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+#define MCOUNT asm(\
+ ".globl _mcount;\n\t" \
+ ".align 3;\n\t" \
+ ".set push;\n\t" \
+ ".set nomips16;\n\t" \
+ ".type _mcount,@function;\n\t" \
+ ".ent _mcount\n\t" \
+ "_mcount:\n\t" \
+ ".frame $sp,88,$31\n\t" \
+ ".set noat;\n\t" \
+ PTR_SUBU_STRING " $29,$29,96;\n\t" \
+ CPSETUP \
+ "sd $4,24($29);\n\t" \
+ "sd $5,32($29);\n\t" \
+ "sd $6,40($29);\n\t" \
+ "sd $7,48($29);\n\t" \
+ "sd $8,56($29);\n\t" \
+ "sd $9,64($29);\n\t" \
+ "sd $10,72($29);\n\t" \
+ "sd $11,80($29);\n\t" \
+ "sd $2,16($29);\n\t" \
+ "sd $1,0($29);\n\t" \
+ "sd $31,8($29);\n\t" \
+ "move $5,$31;\n\t" \
+ "move $4,$1;\n\t" \
+ "balc __mcount;\n\t" \
+ "ld $4,24($29);\n\t" \
+ "ld $5,32($29);\n\t" \
+ "ld $6,40($29);\n\t" \
+ "ld $7,48($29);\n\t" \
+ "ld $8,56($29);\n\t" \
+ "ld $9,64($29);\n\t" \
+ "ld $10,72($29);\n\t" \
+ "ld $11,80($29);\n\t" \
+ "ld $2,16($29);\n\t" \
+ "ld $1,8($29);\n\t" \
+ "ld $31,0($29);\n\t" \
+ CPRETURN \
+ PTR_ADDU_STRING " $29,$29,96;\n\t" \
+ "jrc $1;\n\t" \
+ ".end _mcount;\n\t" \
+ ".set pop");
+#else
#define MCOUNT asm(\
".globl _mcount;\n\t" \
".align 3;\n\t" \
@@ -142,5 +223,6 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
"move $31,$1;\n\t" \
".end _mcount;\n\t" \
".set pop");
+#endif
#endif
diff --git a/sysdeps/mips/memcpy.S b/sysdeps/mips/memcpy.S
index 5b277e07c5..96d1c92d89 100644
--- a/sysdeps/mips/memcpy.S
+++ b/sysdeps/mips/memcpy.S
@@ -86,6 +86,12 @@
# endif
#endif
+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+# define PTR_BC bc16
+#else
+# define PTR_BC bc
+#endif
+
/*
* Using PREFETCH_HINT_LOAD_STREAMED instead of PREFETCH_LOAD on load
* prefetches appear to offer a slight performance advantage.
@@ -272,7 +278,6 @@ LEAF(MEMCPY_NAME, 0)
LEAF(MEMCPY_NAME)
#endif
.set nomips16
- .set noreorder
/*
* Below we handle the case where memcpy is called with overlapping src and dst.
* Although memcpy is not required to handle this case, some parts of Android
@@ -284,10 +289,9 @@ LEAF(MEMCPY_NAME)
xor t1,t0,t2
PTR_SUBU t0,t1,t2
sltu t2,t0,a2
- beq t2,zero,L(memcpy)
la t9,memmove
+ beq t2,zero,L(memcpy)
jr t9
- nop
L(memcpy):
#endif
/*
@@ -295,12 +299,12 @@ L(memcpy):
* size, copy dst pointer to v0 for the return value.
*/
slti t2,a2,(2 * NSIZE)
- bne t2,zero,L(lasts)
#if defined(RETURN_FIRST_PREFETCH) || defined(RETURN_LAST_PREFETCH)
move v0,zero
#else
move v0,a0
#endif
+ bne t2,zero,L(lasts)
#ifndef R6_CODE
@@ -312,12 +316,12 @@ L(memcpy):
*/
xor t8,a1,a0
andi t8,t8,(NSIZE-1) /* t8 is a0/a1 word-displacement */
- bne t8,zero,L(unaligned)
PTR_SUBU a3, zero, a0
+ bne t8,zero,L(unaligned)
andi a3,a3,(NSIZE-1) /* copy a3 bytes to align a0/a1 */
+ PTR_SUBU a2,a2,a3 /* a2 is the remaining bytes count */
beq a3,zero,L(aligned) /* if a3=0, it is already aligned */
- PTR_SUBU a2,a2,a3 /* a2 is the remaining bytes count */
C_LDHI t8,0(a1)
PTR_ADDU a1,a1,a3
@@ -332,18 +336,24 @@ L(memcpy):
* align instruction.
*/
andi t8,a0,7
+#ifdef __mips_micromips
+ auipc t9,%pcrel_hi(L(atable))
+ addiu t9,t9,%pcrel_lo(L(atable)+4)
+ PTR_LSA t9,t8,t9,1
+#else
lapc t9,L(atable)
PTR_LSA t9,t8,t9,2
+#endif
jrc t9
L(atable):
- bc L(lb0)
- bc L(lb7)
- bc L(lb6)
- bc L(lb5)
- bc L(lb4)
- bc L(lb3)
- bc L(lb2)
- bc L(lb1)
+ PTR_BC L(lb0)
+ PTR_BC L(lb7)
+ PTR_BC L(lb6)
+ PTR_BC L(lb5)
+ PTR_BC L(lb4)
+ PTR_BC L(lb3)
+ PTR_BC L(lb2)
+ PTR_BC L(lb1)
L(lb7):
lb a3, 6(a1)
sb a3, 6(a0)
@@ -374,20 +384,26 @@ L(lb1):
L(lb0):
andi t8,a1,(NSIZE-1)
+#ifdef __mips_micromips
+ auipc t9,%pcrel_hi(L(jtable))
+ addiu t9,t9,%pcrel_lo(L(jtable)+4)
+ PTR_LSA t9,t8,t9,1
+#else
lapc t9,L(jtable)
PTR_LSA t9,t8,t9,2
+#endif
jrc t9
L(jtable):
- bc L(aligned)
- bc L(r6_unaligned1)
- bc L(r6_unaligned2)
- bc L(r6_unaligned3)
-# ifdef USE_DOUBLE
- bc L(r6_unaligned4)
- bc L(r6_unaligned5)
- bc L(r6_unaligned6)
- bc L(r6_unaligned7)
-# endif
+ PTR_BC L(aligned)
+ PTR_BC L(r6_unaligned1)
+ PTR_BC L(r6_unaligned2)
+ PTR_BC L(r6_unaligned3)
+#ifdef USE_DOUBLE
+ PTR_BC L(r6_unaligned4)
+ PTR_BC L(r6_unaligned5)
+ PTR_BC L(r6_unaligned6)
+ PTR_BC L(r6_unaligned7)
+#endif
#endif /* R6_CODE */
L(aligned):
@@ -401,8 +417,8 @@ L(aligned):
*/
andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
- beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */
PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */
+ beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */
PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */
/* When in the loop we may prefetch with the 'prepare to store' hint,
@@ -428,7 +444,6 @@ L(aligned):
# if PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE
sltu v1,t9,a0
bgtz v1,L(skip_set)
- nop
PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4)
L(skip_set):
# else
@@ -444,11 +459,16 @@ L(skip_set):
#endif
L(loop16w):
C_LD t0,UNIT(0)(a1)
+/* We need to separate out the C_LD instruction here so that it will work
+ both when it is used by itself and when it is used with the branch
+ instruction. */
#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
sltu v1,t9,a0 /* If a0 > t9 don't use next prefetch */
+ C_LD t1,UNIT(1)(a1)
bgtz v1,L(skip_pref)
-#endif
+#else
C_LD t1,UNIT(1)(a1)
+#endif
#ifdef R6_CODE
PREFETCH_FOR_STORE (2, a0)
#else
@@ -502,8 +522,8 @@ L(skip_pref):
C_ST REG6,UNIT(14)(a0)
C_ST REG7,UNIT(15)(a0)
PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */
- bne a0,a3,L(loop16w)
PTR_ADDIU a1,a1,UNIT(16) /* adding 64/128 to src */
+ bne a0,a3,L(loop16w)
move a2,t8
/* Here we have src and dest word-aligned but less than 64-bytes or
@@ -517,7 +537,6 @@ L(chkw):
andi t8,a2,NSIZEMASK /* Is there a 32-byte/64-byte chunk. */
/* The t8 is the reminder count past 32-bytes */
beq a2,t8,L(chk1w) /* When a2=t8, no 32-byte chunk */
- nop
C_LD t0,UNIT(0)(a1)
C_LD t1,UNIT(1)(a1)
C_LD REG2,UNIT(2)(a1)
@@ -546,8 +565,8 @@ L(chkw):
*/
L(chk1w):
andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */
- beq a2,t8,L(lastw)
PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */
+ beq a2,t8,L(lastw)
PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */
/* copying in words (4-byte or 8-byte chunks) */
@@ -555,8 +574,8 @@ L(wordCopy_loop):
C_LD REG3,UNIT(0)(a1)
PTR_ADDIU a0,a0,UNIT(1)
PTR_ADDIU a1,a1,UNIT(1)
- bne a0,a3,L(wordCopy_loop)
C_ST REG3,UNIT(-1)(a0)
+ bne a0,a3,L(wordCopy_loop)
/* If we have been copying double words, see if we can copy a single word
before doing byte copies. We can have, at most, one word to copy. */
@@ -574,17 +593,16 @@ L(lastw):
/* Copy the last 8 (or 16) bytes */
L(lastb):
- blez a2,L(leave)
PTR_ADDU a3,a0,a2 /* a3 is the last dst address */
+ blez a2,L(leave)
L(lastbloop):
lb v1,0(a1)
PTR_ADDIU a0,a0,1
PTR_ADDIU a1,a1,1
- bne a0,a3,L(lastbloop)
sb v1,-1(a0)
+ bne a0,a3,L(lastbloop)
L(leave):
- j ra
- nop
+ jr ra
/* We jump here with a memcpy of less than 8 or 16 bytes, depending on
whether or not USE_DOUBLE is defined. Instead of just doing byte
@@ -625,8 +643,8 @@ L(wcopy_loop):
L(unaligned):
andi a3,a3,(NSIZE-1) /* copy a3 bytes to align a0/a1 */
+ PTR_SUBU a2,a2,a3 /* a2 is the remaining bytes count */
beqz a3,L(ua_chk16w) /* if a3=0, it is already aligned */
- PTR_SUBU a2,a2,a3 /* a2 is the remaining bytes count */
C_LDHI v1,UNIT(0)(a1)
C_LDLO v1,UNITM1(1)(a1)
@@ -644,8 +662,8 @@ L(unaligned):
L(ua_chk16w):
andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
- beq a2,t8,L(ua_chkw) /* if a2==t8, no 64-byte/128-byte chunks */
PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */
+ beq a2,t8,L(ua_chkw) /* if a2==t8, no 64-byte/128-byte chunks */
PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */
# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
@@ -664,7 +682,6 @@ L(ua_chk16w):
# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
sltu v1,t9,a0
bgtz v1,L(ua_skip_set)
- nop
PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4)
L(ua_skip_set):
# else
@@ -676,11 +693,16 @@ L(ua_loop16w):
C_LDHI t0,UNIT(0)(a1)
C_LDHI t1,UNIT(1)(a1)
C_LDHI REG2,UNIT(2)(a1)
+/* We need to separate out the C_LDHI instruction here so that it will work
+ both when it is used by itself and when it is used with the branch
+ instruction. */
# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
sltu v1,t9,a0
+ C_LDHI REG3,UNIT(3)(a1)
bgtz v1,L(ua_skip_pref)
-# endif
+# else
C_LDHI REG3,UNIT(3)(a1)
+# endif
PREFETCH_FOR_STORE (4, a0)
PREFETCH_FOR_STORE (5, a0)
L(ua_skip_pref):
@@ -731,8 +753,8 @@ L(ua_skip_pref):
C_ST REG6,UNIT(14)(a0)
C_ST REG7,UNIT(15)(a0)
PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */
- bne a0,a3,L(ua_loop16w)
PTR_ADDIU a1,a1,UNIT(16) /* adding 64/128 to src */
+ bne a0,a3,L(ua_loop16w)
move a2,t8
/* Here we have src and dest word-aligned but less than 64-bytes or
@@ -745,7 +767,6 @@ L(ua_chkw):
andi t8,a2,NSIZEMASK /* Is there a 32-byte/64-byte chunk. */
/* t8 is the reminder count past 32-bytes */
beq a2,t8,L(ua_chk1w) /* When a2=t8, no 32-byte chunk */
- nop
C_LDHI t0,UNIT(0)(a1)
C_LDHI t1,UNIT(1)(a1)
C_LDHI REG2,UNIT(2)(a1)
@@ -778,8 +799,8 @@ L(ua_chkw):
*/
L(ua_chk1w):
andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */
- beq a2,t8,L(ua_smallCopy)
PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */
+ beq a2,t8,L(ua_smallCopy)
PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */
/* copying in words (4-byte or 8-byte chunks) */
@@ -788,22 +809,21 @@ L(ua_wordCopy_loop):
C_LDLO v1,UNITM1(1)(a1)
PTR_ADDIU a0,a0,UNIT(1)
PTR_ADDIU a1,a1,UNIT(1)
- bne a0,a3,L(ua_wordCopy_loop)
C_ST v1,UNIT(-1)(a0)
+ bne a0,a3,L(ua_wordCopy_loop)
/* Copy the last 8 (or 16) bytes */
L(ua_smallCopy):
- beqz a2,L(leave)
PTR_ADDU a3,a0,a2 /* a3 is the last dst address */
+ beqz a2,L(leave)
L(ua_smallCopy_loop):
lb v1,0(a1)
PTR_ADDIU a0,a0,1
PTR_ADDIU a1,a1,1
- bne a0,a3,L(ua_smallCopy_loop)
sb v1,-1(a0)
+ bne a0,a3,L(ua_smallCopy_loop)
- j ra
- nop
+ jr ra
#else /* R6_CODE */
@@ -816,9 +836,9 @@ L(ua_smallCopy_loop):
# endif
# define R6_UNALIGNED_WORD_COPY(BYTEOFFSET) \
andi REG7, a2, (NSIZE-1);/* REG7 is # of bytes to by bytes. */ \
- beq REG7, a2, L(lastb); /* Check for bytes to copy by word */ \
PTR_SUBU a3, a2, REG7; /* a3 is number of bytes to be copied in */ \
/* (d)word chunks. */ \
+ beq REG7, a2, L(lastb); /* Check for bytes to copy by word */ \
move a2, REG7; /* a2 is # of bytes to copy byte by byte */ \
/* after word loop is finished. */ \
PTR_ADDU REG6, a0, a3; /* REG6 is the dst address after loop. */ \
@@ -831,10 +851,9 @@ L(r6_ua_wordcopy##BYTEOFFSET): \
PTR_ADDIU a0, a0, UNIT(1); /* Increment destination pointer. */ \
PTR_ADDIU REG2, REG2, UNIT(1); /* Increment aligned source pointer.*/ \
move t0, t1; /* Move second part of source to first. */ \
- bne a0, REG6,L(r6_ua_wordcopy##BYTEOFFSET); \
C_ST REG3, UNIT(-1)(a0); \
+ bne a0, REG6,L(r6_ua_wordcopy##BYTEOFFSET); \
j L(lastb); \
- nop
/* We are generating R6 code, the destination is 4 byte aligned and
the source is not 4 byte aligned. t8 is 1, 2, or 3 depending on the
@@ -859,7 +878,6 @@ L(r6_unaligned7):
#endif /* R6_CODE */
.set at
- .set reorder
END(MEMCPY_NAME)
#ifndef ANDROID_CHANGES
# ifdef _LIBC
diff --git a/sysdeps/mips/memset.S b/sysdeps/mips/memset.S
index 466599b9f4..0c8375c9f5 100644
--- a/sysdeps/mips/memset.S
+++ b/sysdeps/mips/memset.S
@@ -82,6 +82,12 @@
# endif
#endif
+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+# define PTR_BC bc16
+#else
+# define PTR_BC bc
+#endif
+
/* Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE
or PREFETCH_STORE_STREAMED offers a large performance advantage
but PREPAREFORSTORE has some special restrictions to consider.
@@ -205,17 +211,16 @@ LEAF(MEMSET_NAME)
#endif
.set nomips16
- .set noreorder
-/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb). Regardless of
+/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb). Regardless of
size, copy dst pointer to v0 for the return value. */
- slti t2,a2,(2 * NSIZE)
- bne t2,zero,L(lastb)
+ slti t2,a2,(4 * NSIZE)
move v0,a0
+ bne t2,zero,L(lastb)
/* If memset value is not zero, we copy it to all the bytes in a 32 or 64
bit word. */
- beq a1,zero,L(set0) /* If memset value is zero no smear */
PTR_SUBU a3,zero,a0
+ beq a1,zero,L(set0) /* If memset value is zero no smear */
nop
/* smear byte into 32 or 64 bit word */
@@ -251,26 +256,30 @@ LEAF(MEMSET_NAME)
L(set0):
#ifndef R6_CODE
andi t2,a3,(NSIZE-1) /* word-unaligned address? */
- beq t2,zero,L(aligned) /* t2 is the unalignment count */
PTR_SUBU a2,a2,t2
+ beq t2,zero,L(aligned) /* t2 is the unalignment count */
C_STHI a1,0(a0)
PTR_ADDU a0,a0,t2
#else /* R6_CODE */
- andi t2,a0,(NSIZE-1)
+ andi t2,a0,7
+# ifdef __mips_micromips
+ auipc t9,%pcrel_hi(L(atable))
+ addiu t9,t9,%pcrel_lo(L(atable)+4)
+ PTR_LSA t9,t2,t9,1
+# else
lapc t9,L(atable)
PTR_LSA t9,t2,t9,2
+# endif
jrc t9
L(atable):
- bc L(aligned)
-# ifdef USE_DOUBLE
- bc L(lb7)
- bc L(lb6)
- bc L(lb5)
- bc L(lb4)
-# endif
- bc L(lb3)
- bc L(lb2)
- bc L(lb1)
+ PTR_BC L(aligned)
+ PTR_BC L(lb7)
+ PTR_BC L(lb6)
+ PTR_BC L(lb5)
+ PTR_BC L(lb4)
+ PTR_BC L(lb3)
+ PTR_BC L(lb2)
+ PTR_BC L(lb1)
L(lb7):
sb a1,6(a0)
L(lb6):
@@ -300,8 +309,8 @@ L(aligned):
left to store or we would have jumped to L(lastb) earlier in the code. */
#ifdef DOUBLE_ALIGN
andi t2,a3,4
- beq t2,zero,L(double_aligned)
PTR_SUBU a2,a2,t2
+ beq t2,zero,L(double_aligned)
sw a1,0(a0)
PTR_ADDU a0,a0,t2
L(double_aligned):
@@ -313,8 +322,8 @@ L(double_aligned):
chunks have been copied. We will loop, incrementing a0 until it equals
a3. */
andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
- beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */
PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */
+ beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */
PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */
/* When in the loop we may prefetch with the 'prepare to store' hint,
@@ -339,7 +348,6 @@ L(loop16w):
&& (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
sltu v1,t9,a0 /* If a0 > t9 don't use next prefetch */
bgtz v1,L(skip_pref)
- nop
#endif
#ifdef R6_CODE
PREFETCH_FOR_STORE (2, a0)
@@ -366,7 +374,6 @@ L(skip_pref):
C_ST a1,UNIT(15)(a0)
PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */
bne a0,a3,L(loop16w)
- nop
move a2,t8
/* Here we have dest word-aligned but less than 64-bytes or 128 bytes to go.
@@ -376,7 +383,6 @@ L(chkw):
andi t8,a2,NSIZEMASK /* is there a 32-byte/64-byte chunk. */
/* the t8 is the reminder count past 32-bytes */
beq a2,t8,L(chk1w)/* when a2==t8, no 32-byte chunk */
- nop
C_ST a1,UNIT(0)(a0)
C_ST a1,UNIT(1)(a0)
C_ST a1,UNIT(2)(a0)
@@ -394,30 +400,28 @@ L(chkw):
been copied. We will loop, incrementing a0 until a0 equals a3. */
L(chk1w):
andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */
- beq a2,t8,L(lastb)
PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */
+ beq a2,t8,L(lastb)
PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */
/* copying in words (4-byte or 8 byte chunks) */
L(wordCopy_loop):
PTR_ADDIU a0,a0,UNIT(1)
- bne a0,a3,L(wordCopy_loop)
C_ST a1,UNIT(-1)(a0)
+ bne a0,a3,L(wordCopy_loop)
/* Copy the last 8 (or 16) bytes */
L(lastb):
- blez a2,L(leave)
PTR_ADDU a3,a0,a2 /* a3 is the last dst address */
+ blez a2,L(leave)
L(lastbloop):
PTR_ADDIU a0,a0,1
- bne a0,a3,L(lastbloop)
sb a1,-1(a0)
+ bne a0,a3,L(lastbloop)
L(leave):
- j ra
- nop
+ jr ra
.set at
- .set reorder
END(MEMSET_NAME)
#ifndef ANDROID_CHANGES
# ifdef _LIBC
diff --git a/sysdeps/mips/mips32/crtn.S b/sysdeps/mips/mips32/crtn.S
index 89ecbd9882..568aabd86e 100644
--- a/sysdeps/mips/mips32/crtn.S
+++ b/sysdeps/mips/mips32/crtn.S
@@ -40,18 +40,10 @@
.section .init,"ax",@progbits
lw $31,28($sp)
- .set noreorder
- .set nomacro
- j $31
addiu $sp,$sp,32
- .set macro
- .set reorder
+ jr $31
.section .fini,"ax",@progbits
lw $31,28($sp)
- .set noreorder
- .set nomacro
- j $31
addiu $sp,$sp,32
- .set macro
- .set reorder
+ jr $31
diff --git a/sysdeps/mips/mips64/__longjmp.c b/sysdeps/mips/mips64/__longjmp.c
index 4a93e884c0..1a9bb7b23e 100644
--- a/sysdeps/mips/mips64/__longjmp.c
+++ b/sysdeps/mips/mips64/__longjmp.c
@@ -87,7 +87,7 @@ __longjmp (__jmp_buf env_arg, int val_arg)
else
asm volatile ("move $2, %0" : : "r" (val));
- asm volatile ("j $31");
+ asm volatile ("jr $31");
/* Avoid `volatile function does return' warnings. */
for (;;);
diff --git a/sysdeps/mips/mips64/add_n.S b/sysdeps/mips/mips64/add_n.S
index 345d62fbc5..bab523fd5a 100644
--- a/sysdeps/mips/mips64/add_n.S
+++ b/sysdeps/mips/mips64/add_n.S
@@ -37,16 +37,13 @@ ENTRY (__mpn_add_n)
#ifdef __PIC__
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
ld $10,0($5)
ld $11,0($6)
daddiu $7,$7,-1
and $9,$7,4-1 # number of limbs in first loop
- beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
move $2,$0
+ beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
dsubu $7,$7,$9
@@ -64,11 +61,10 @@ L(Loop0): daddiu $9,$9,-1
daddiu $6,$6,8
move $10,$12
move $11,$13
- bne $9,$0,L(Loop0)
daddiu $4,$4,8
+ bne $9,$0,L(Loop0)
L(L0): beq $7,$0,L(Lend)
- nop
L(Loop): daddiu $7,$7,-4
@@ -111,15 +107,15 @@ L(Loop): daddiu $7,$7,-4
daddiu $5,$5,32
daddiu $6,$6,32
- bne $7,$0,L(Loop)
daddiu $4,$4,32
+ bne $7,$0,L(Loop)
L(Lend): daddu $11,$11,$2
sltu $8,$11,$2
daddu $11,$10,$11
sltu $2,$11,$10
sd $11,0($4)
- j $31
or $2,$2,$8
+ jr $31
END (__mpn_add_n)
diff --git a/sysdeps/mips/mips64/addmul_1.S b/sysdeps/mips/mips64/addmul_1.S
index d105938f00..d84edd76a0 100644
--- a/sysdeps/mips/mips64/addmul_1.S
+++ b/sysdeps/mips/mips64/addmul_1.S
@@ -36,9 +36,6 @@ ENTRY (__mpn_addmul_1)
#ifdef PIC
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
# warm up phase 0
ld $8,0($5)
@@ -52,12 +49,12 @@ ENTRY (__mpn_addmul_1)
#endif
daddiu $6,$6,-1
- beq $6,$0,L(LC0)
move $2,$0 # zero cy2
+ beq $6,$0,L(LC0)
daddiu $6,$6,-1
- beq $6,$0,L(LC1)
ld $8,0($5) # load new s1 limb as early as possible
+ beq $6,$0,L(LC1)
L(Loop): ld $10,0($4)
#if __mips_isa_rev < 6
@@ -83,8 +80,8 @@ L(Loop): ld $10,0($4)
daddu $2,$2,$10
sd $3,0($4)
daddiu $4,$4,8
- bne $6,$0,L(Loop)
daddu $2,$9,$2 # add high product limb and carry from addition
+ bne $6,$0,L(Loop)
# cool down phase 1
L(LC1): ld $10,0($4)
@@ -125,7 +122,7 @@ L(LC0): ld $10,0($4)
sltu $10,$3,$10
daddu $2,$2,$10
sd $3,0($4)
- j $31
daddu $2,$9,$2 # add high product limb and carry from addition
+ jr $31
END (__mpn_addmul_1)
diff --git a/sysdeps/mips/mips64/lshift.S b/sysdeps/mips/mips64/lshift.S
index 2ea2e58b85..ca84385998 100644
--- a/sysdeps/mips/mips64/lshift.S
+++ b/sysdeps/mips/mips64/lshift.S
@@ -36,9 +36,6 @@ ENTRY (__mpn_lshift)
#ifdef __PIC__
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
dsll $2,$6,3
daddu $5,$5,$2 # make r5 point at end of src
ld $10,-8($5) # load first limb
@@ -46,8 +43,8 @@ ENTRY (__mpn_lshift)
daddu $4,$4,$2 # make r4 point at end of res
daddiu $6,$6,-1
and $9,$6,4-1 # number of limbs in first loop
- beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
dsrl $2,$10,$13 # compute function result
+ beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
dsubu $6,$6,$9
@@ -59,11 +56,10 @@ L(Loop0): ld $3,-16($5)
dsrl $12,$3,$13
move $10,$3
or $8,$11,$12
- bne $9,$0,L(Loop0)
sd $8,0($4)
+ bne $9,$0,L(Loop0)
L(L0): beq $6,$0,L(Lend)
- nop
L(Loop): ld $3,-16($5)
daddiu $4,$4,-32
@@ -91,10 +87,10 @@ L(Loop): ld $3,-16($5)
daddiu $5,$5,-32
or $8,$14,$9
- bgtz $6,L(Loop)
sd $8,0($4)
+ bgtz $6,L(Loop)
L(Lend): dsll $8,$10,$7
- j $31
sd $8,-8($4)
+ jr $31
END (__mpn_lshift)
diff --git a/sysdeps/mips/mips64/mul_1.S b/sysdeps/mips/mips64/mul_1.S
index 321789b345..7604bac3a2 100644
--- a/sysdeps/mips/mips64/mul_1.S
+++ b/sysdeps/mips/mips64/mul_1.S
@@ -37,9 +37,6 @@ ENTRY (__mpn_mul_1)
#ifdef __PIC__
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
# warm up phase 0
ld $8,0($5)
@@ -53,12 +50,12 @@ ENTRY (__mpn_mul_1)
#endif
daddiu $6,$6,-1
- beq $6,$0,L(LC0)
move $2,$0 # zero cy2
+ beq $6,$0,L(LC0)
daddiu $6,$6,-1
- beq $6,$0,L(LC1)
ld $8,0($5) # load new s1 limb as early as possible
+ beq $6,$0,L(LC1)
#if __mips_isa_rev < 6
L(Loop): mflo $10
@@ -80,8 +77,8 @@ L(Loop): move $10,$11
sltu $2,$10,$2 # carry from previous addition -> $2
sd $10,0($4)
daddiu $4,$4,8
- bne $6,$0,L(Loop)
daddu $2,$9,$2 # add high product limb and carry from addition
+ bne $6,$0,L(Loop)
# cool down phase 1
#if __mips_isa_rev < 6
@@ -114,7 +111,7 @@ L(LC0): move $10,$11
daddu $10,$10,$2
sltu $2,$10,$2
sd $10,0($4)
- j $31
daddu $2,$9,$2 # add high product limb and carry from addition
+ jr $31
END (__mpn_mul_1)
diff --git a/sysdeps/mips/mips64/n32/crtn.S b/sysdeps/mips/mips64/n32/crtn.S
index 633d79cfad..8d4c83381c 100644
--- a/sysdeps/mips/mips64/n32/crtn.S
+++ b/sysdeps/mips/mips64/n32/crtn.S
@@ -41,19 +41,11 @@
.section .init,"ax",@progbits
ld $31,8($sp)
ld $28,0($sp)
- .set noreorder
- .set nomacro
- j $31
addiu $sp,$sp,16
- .set macro
- .set reorder
+ jr $31
.section .fini,"ax",@progbits
ld $31,8($sp)
ld $28,0($sp)
- .set noreorder
- .set nomacro
- j $31
addiu $sp,$sp,16
- .set macro
- .set reorder
+ jr $31
diff --git a/sysdeps/mips/mips64/n64/crtn.S b/sysdeps/mips/mips64/n64/crtn.S
index 99ed1e3263..110040c9fc 100644
--- a/sysdeps/mips/mips64/n64/crtn.S
+++ b/sysdeps/mips/mips64/n64/crtn.S
@@ -41,19 +41,11 @@
.section .init,"ax",@progbits
ld $31,8($sp)
ld $28,0($sp)
- .set noreorder
- .set nomacro
- j $31
daddiu $sp,$sp,16
- .set macro
- .set reorder
+ jr $31
.section .fini,"ax",@progbits
ld $31,8($sp)
ld $28,0($sp)
- .set noreorder
- .set nomacro
- j $31
daddiu $sp,$sp,16
- .set macro
- .set reorder
+ jr $31
diff --git a/sysdeps/mips/mips64/rshift.S b/sysdeps/mips/mips64/rshift.S
index 1f6e3a2a12..153aacfd86 100644
--- a/sysdeps/mips/mips64/rshift.S
+++ b/sysdeps/mips/mips64/rshift.S
@@ -36,15 +36,12 @@ ENTRY (__mpn_rshift)
#ifdef __PIC__
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
ld $10,0($5) # load first limb
dsubu $13,$0,$7
daddiu $6,$6,-1
and $9,$6,4-1 # number of limbs in first loop
- beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
dsll $2,$10,$13 # compute function result
+ beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
dsubu $6,$6,$9
@@ -56,11 +53,10 @@ L(Loop0): ld $3,8($5)
dsll $12,$3,$13
move $10,$3
or $8,$11,$12
- bne $9,$0,L(Loop0)
sd $8,-8($4)
+ bne $9,$0,L(Loop0)
L(L0): beq $6,$0,L(Lend)
- nop
L(Loop): ld $3,8($5)
daddiu $4,$4,32
@@ -88,10 +84,10 @@ L(Loop): ld $3,8($5)
daddiu $5,$5,32
or $8,$14,$9
- bgtz $6,L(Loop)
sd $8,-8($4)
+ bgtz $6,L(Loop)
L(Lend): dsrl $8,$10,$7
- j $31
sd $8,0($4)
+ jr $31
END (__mpn_rshift)
diff --git a/sysdeps/mips/mips64/sub_n.S b/sysdeps/mips/mips64/sub_n.S
index b83d5ccab6..5b7337472f 100644
--- a/sysdeps/mips/mips64/sub_n.S
+++ b/sysdeps/mips/mips64/sub_n.S
@@ -37,16 +37,13 @@ ENTRY (__mpn_sub_n)
#ifdef __PIC__
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
ld $10,0($5)
ld $11,0($6)
daddiu $7,$7,-1
and $9,$7,4-1 # number of limbs in first loop
- beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
move $2,$0
+ beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop
dsubu $7,$7,$9
@@ -64,11 +61,10 @@ L(Loop0): daddiu $9,$9,-1
daddiu $6,$6,8
move $10,$12
move $11,$13
- bne $9,$0,L(Loop0)
daddiu $4,$4,8
+ bne $9,$0,L(Loop0)
L(L0): beq $7,$0,L(Lend)
- nop
L(Loop): daddiu $7,$7,-4
@@ -111,15 +107,15 @@ L(Loop): daddiu $7,$7,-4
daddiu $5,$5,32
daddiu $6,$6,32
- bne $7,$0,L(Loop)
daddiu $4,$4,32
+ bne $7,$0,L(Loop)
L(Lend): daddu $11,$11,$2
sltu $8,$11,$2
dsubu $11,$10,$11
sltu $2,$10,$11
sd $11,0($4)
- j $31
or $2,$2,$8
+ jr $31
END (__mpn_sub_n)
diff --git a/sysdeps/mips/mips64/submul_1.S b/sysdeps/mips/mips64/submul_1.S
index 46f26e8dde..121433d232 100644
--- a/sysdeps/mips/mips64/submul_1.S
+++ b/sysdeps/mips/mips64/submul_1.S
@@ -37,9 +37,6 @@ ENTRY (__mpn_submul_1)
#ifdef __PIC__
SETUP_GP /* ??? unused */
#endif
- .set noreorder
- .set nomacro
-
# warm up phase 0
ld $8,0($5)
@@ -53,12 +50,12 @@ ENTRY (__mpn_submul_1)
#endif
daddiu $6,$6,-1
- beq $6,$0,L(LC0)
move $2,$0 # zero cy2
+ beq $6,$0,L(LC0)
daddiu $6,$6,-1
- beq $6,$0,L(LC1)
ld $8,0($5) # load new s1 limb as early as possible
+ beq $6,$0,L(LC1)
L(Loop): ld $10,0($4)
#if __mips_isa_rev < 6
@@ -84,8 +81,8 @@ L(Loop): ld $10,0($4)
daddu $2,$2,$10
sd $3,0($4)
daddiu $4,$4,8
- bne $6,$0,L(Loop)
daddu $2,$9,$2 # add high product limb and carry from addition
+ bne $6,$0,L(Loop)
# cool down phase 1
L(LC1): ld $10,0($4)
@@ -126,7 +123,7 @@ L(LC0): ld $10,0($4)
sgtu $10,$3,$10
daddu $2,$2,$10
sd $3,0($4)
- j $31
daddu $2,$9,$2 # add high product limb and carry from addition
+ jr $31
END (__mpn_submul_1)
diff --git a/sysdeps/mips/mul_1.S b/sysdeps/mips/mul_1.S
index cfd4cc7cd5..ae65ebe79d 100644
--- a/sysdeps/mips/mul_1.S
+++ b/sysdeps/mips/mul_1.S
@@ -31,12 +31,9 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_mul_1)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
/* warm up phase 0 */
lw $8,0($5)
@@ -50,12 +47,12 @@ ENTRY (__mpn_mul_1)
#endif
addiu $6,$6,-1
- beq $6,$0,L(LC0)
move $2,$0 /* zero cy2 */
+ beq $6,$0,L(LC0)
addiu $6,$6,-1
- beq $6,$0,L(LC1)
lw $8,0($5) /* load new s1 limb as early as possible */
+ beq $6,$0,L(LC1)
#if __mips_isa_rev < 6
@@ -78,8 +75,8 @@ L(Loop): move $10,$11
sltu $2,$10,$2 /* carry from previous addition -> $2 */
sw $10,0($4)
addiu $4,$4,4
- bne $6,$0,L(Loop) /* should be "bnel" */
addu $2,$9,$2 /* add high product limb and carry from addition */
+ bne $6,$0,L(Loop) /* should be "bnel" */
/* cool down phase 1 */
#if __mips_isa_rev < 6
@@ -112,6 +109,6 @@ L(LC0): move $10,$11
addu $10,$10,$2
sltu $2,$10,$2
sw $10,0($4)
- j $31
addu $2,$9,$2 /* add high product limb and carry from addition */
+ jr $31
END (__mpn_mul_1)
diff --git a/sysdeps/mips/rshift.S b/sysdeps/mips/rshift.S
index e19fa41234..b453ca2ba7 100644
--- a/sysdeps/mips/rshift.S
+++ b/sysdeps/mips/rshift.S
@@ -30,18 +30,15 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_rshift)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
lw $10,0($5) /* load first limb */
subu $13,$0,$7
addiu $6,$6,-1
and $9,$6,4-1 /* number of limbs in first loop */
+ sll $2,$10,$13 /* compute function result */
beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop*/
- sll $2,$10,$13 /* compute function result */
subu $6,$6,$9
@@ -53,11 +50,10 @@ L(Loop0): lw $3,4($5)
sll $12,$3,$13
move $10,$3
or $8,$11,$12
+ sw $8,-4($4)
bne $9,$0,L(Loop0)
- sw $8,-4($4)
L(L0): beq $6,$0,L(Lend)
- nop
L(Loop): lw $3,4($5)
addiu $4,$4,16
@@ -85,10 +81,10 @@ L(Loop): lw $3,4($5)
addiu $5,$5,16
or $8,$14,$9
+ sw $8,-4($4)
bgtz $6,L(Loop)
- sw $8,-4($4)
L(Lend): srl $8,$10,$7
- j $31
sw $8,0($4)
+ jr $31
END (__mpn_rshift)
diff --git a/sysdeps/mips/sub_n.S b/sysdeps/mips/sub_n.S
index 3e988ecbb4..9f7cb5458d 100644
--- a/sysdeps/mips/sub_n.S
+++ b/sysdeps/mips/sub_n.S
@@ -31,19 +31,16 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_sub_n)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
lw $10,0($5)
lw $11,0($6)
addiu $7,$7,-1
and $9,$7,4-1 /* number of limbs in first loop */
- beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */
move $2,$0
+ beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */
subu $7,$7,$9
@@ -61,11 +58,10 @@ L(Loop0): addiu $9,$9,-1
addiu $6,$6,4
move $10,$12
move $11,$13
- bne $9,$0,L(Loop0)
addiu $4,$4,4
+ bne $9,$0,L(Loop0)
L(L0): beq $7,$0,L(Lend)
- nop
L(Loop): addiu $7,$7,-4
@@ -108,14 +104,14 @@ L(Loop): addiu $7,$7,-4
addiu $5,$5,16
addiu $6,$6,16
- bne $7,$0,L(Loop)
addiu $4,$4,16
+ bne $7,$0,L(Loop)
L(Lend): addu $11,$11,$2
sltu $8,$11,$2
subu $11,$10,$11
sltu $2,$10,$11
sw $11,0($4)
- j $31
or $2,$2,$8
+ jr $31
END (__mpn_sub_n)
diff --git a/sysdeps/mips/submul_1.S b/sysdeps/mips/submul_1.S
index be8e2844ef..8405801c57 100644
--- a/sysdeps/mips/submul_1.S
+++ b/sysdeps/mips/submul_1.S
@@ -31,12 +31,9 @@ along with the GNU MP Library. If not, see
.option pic2
#endif
ENTRY (__mpn_submul_1)
- .set noreorder
#ifdef __PIC__
.cpload t9
#endif
- .set nomacro
-
/* warm up phase 0 */
lw $8,0($5)
@@ -50,12 +47,12 @@ ENTRY (__mpn_submul_1)
#endif
addiu $6,$6,-1
- beq $6,$0,L(LC0)
move $2,$0 /* zero cy2 */
+ beq $6,$0,L(LC0)
addiu $6,$6,-1
- beq $6,$0,L(LC1)
lw $8,0($5) /* load new s1 limb as early as possible */
+ beq $6,$0,L(LC1)
L(Loop): lw $10,0($4)
#if __mips_isa_rev < 6
@@ -81,8 +78,8 @@ L(Loop): lw $10,0($4)
addu $2,$2,$10
sw $3,0($4)
addiu $4,$4,4
- bne $6,$0,L(Loop) /* should be "bnel" */
addu $2,$9,$2 /* add high product limb and carry from addition */
+ bne $6,$0,L(Loop) /* should be "bnel" */
/* cool down phase 1 */
L(LC1): lw $10,0($4)
@@ -123,6 +120,6 @@ L(LC0): lw $10,0($4)
sgtu $10,$3,$10
addu $2,$2,$10
sw $3,0($4)
- j $31
addu $2,$9,$2 /* add high product limb and carry from addition */
+ jr $31
END (__mpn_submul_1)
diff --git a/sysdeps/mips/sys/asm.h b/sysdeps/mips/sys/asm.h
index e43eb39ca3..62f9e549c6 100644
--- a/sysdeps/mips/sys/asm.h
+++ b/sysdeps/mips/sys/asm.h
@@ -71,23 +71,21 @@
.set reorder
/* Set gp when not at 1st instruction */
# define SETUP_GPX(r) \
- .set noreorder; \
move r, $31; /* Save old ra. */ \
bal 10f; /* Find addr of cpload. */ \
- nop; \
10: \
+ .set noreorder; \
.cpload $31; \
- move $31, r; \
- .set reorder
+ .set reorder; \
+ move $31, r;
# define SETUP_GPX_L(r, l) \
- .set noreorder; \
move r, $31; /* Save old ra. */ \
bal l; /* Find addr of cpload. */ \
- nop; \
l: \
+ .set noreorder; \
.cpload $31; \
- move $31, r; \
- .set reorder
+ .set reorder; \
+ move $31, r;
# define SAVE_GP(x) \
.cprestore x /* Save gp trigger t9/jalr conversion. */
# define SETUP_GP64(a, b)
@@ -108,20 +106,14 @@ l: \
.cpsetup $25, gpoffset, proc
# define SETUP_GPX64(cp_reg, ra_save) \
move ra_save, $31; /* Save old ra. */ \
- .set noreorder; \
bal 10f; /* Find addr of .cpsetup. */ \
- nop; \
10: \
- .set reorder; \
.cpsetup $31, cp_reg, 10b; \
move $31, ra_save
# define SETUP_GPX64_L(cp_reg, ra_save, l) \
move ra_save, $31; /* Save old ra. */ \
- .set noreorder; \
bal l; /* Find addr of .cpsetup. */ \
- nop; \
l: \
- .set reorder; \
.cpsetup $31, cp_reg, l; \
move $31, ra_save
# define RESTORE_GP64 \
diff --git a/sysdeps/unix/mips/mips32/sysdep.h b/sysdeps/unix/mips/mips32/sysdep.h
index c515b94540..df3f73a4eb 100644
--- a/sysdeps/unix/mips/mips32/sysdep.h
+++ b/sysdeps/unix/mips/mips32/sysdep.h
@@ -38,18 +38,14 @@
L(syse1):
#else
#define PSEUDO(name, syscall_name, args) \
- .set noreorder; \
.set nomips16; \
.align 2; \
cfi_startproc; \
99: j __syscall_error; \
- nop; \
cfi_endproc; \
ENTRY(name) \
- .set noreorder; \
li v0, SYS_ify(syscall_name); \
syscall; \
- .set reorder; \
bne a3, zero, 99b; \
L(syse1):
#endif
diff --git a/sysdeps/unix/mips/mips64/sysdep.h b/sysdeps/unix/mips/mips64/sysdep.h
index 6565b84e3a..c0772002e6 100644
--- a/sysdeps/unix/mips/mips64/sysdep.h
+++ b/sysdeps/unix/mips/mips64/sysdep.h
@@ -45,18 +45,14 @@
L(syse1):
#else
#define PSEUDO(name, syscall_name, args) \
- .set noreorder; \
.align 2; \
.set nomips16; \
cfi_startproc; \
99: j __syscall_error; \
- nop; \
cfi_endproc; \
ENTRY(name) \
- .set noreorder; \
li v0, SYS_ify(syscall_name); \
syscall; \
- .set reorder; \
bne a3, zero, 99b; \
L(syse1):
#endif
diff --git a/sysdeps/unix/mips/sysdep.h b/sysdeps/unix/mips/sysdep.h
index d1e0460260..07cd5c4a06 100644
--- a/sysdeps/unix/mips/sysdep.h
+++ b/sysdeps/unix/mips/sysdep.h
@@ -48,7 +48,6 @@
.align 2; \
ENTRY(name) \
.set nomips16; \
- .set noreorder; \
li v0, SYS_ify(syscall_name); \
syscall
@@ -61,7 +60,6 @@
.align 2; \
ENTRY(name) \
.set nomips16; \
- .set noreorder; \
li v0, SYS_ify(syscall_name); \
syscall
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
index 47a1b97351..647a66ee1f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
@@ -140,10 +140,8 @@ union __mips_syscall_return
register long int __v0 asm ("$2"); \
register long int __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set reorder" \
: "=r" (__v0), "=r" (__a3) \
: input \
: __SYSCALL_CLOBBERS); \
@@ -164,10 +162,8 @@ union __mips_syscall_return
register long int __a0 asm ("$4") = _arg1; \
register long int __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set reorder" \
: "=r" (__v0), "=r" (__a3) \
: input, "r" (__a0) \
: __SYSCALL_CLOBBERS); \
@@ -190,10 +186,8 @@ union __mips_syscall_return
register long int __a1 asm ("$5") = _arg2; \
register long int __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "=r" (__a3) \
: input, "r" (__a0), "r" (__a1) \
: __SYSCALL_CLOBBERS); \
@@ -219,10 +213,8 @@ union __mips_syscall_return
register long int __a2 asm ("$6") = _arg3; \
register long int __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "=r" (__a3) \
: input, "r" (__a0), "r" (__a1), "r" (__a2) \
: __SYSCALL_CLOBBERS); \
@@ -249,10 +241,8 @@ union __mips_syscall_return
register long int __a2 asm ("$6") = _arg3; \
register long int __a3 asm ("$7") = _arg4; \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "+r" (__a3) \
: input, "r" (__a0), "r" (__a1), "r" (__a2) \
: __SYSCALL_CLOBBERS); \
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h
index 0438bed23d..8f4787352a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h
@@ -95,10 +95,8 @@
register __syscall_arg_t __v0 asm ("$2"); \
register __syscall_arg_t __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set reorder" \
: "=r" (__v0), "=r" (__a3) \
: input \
: __SYSCALL_CLOBBERS); \
@@ -119,10 +117,8 @@
register __syscall_arg_t __a0 asm ("$4") = _arg1; \
register __syscall_arg_t __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set reorder" \
: "=r" (__v0), "=r" (__a3) \
: input, "r" (__a0) \
: __SYSCALL_CLOBBERS); \
@@ -145,10 +141,8 @@
register __syscall_arg_t __a1 asm ("$5") = _arg2; \
register __syscall_arg_t __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "=r" (__a3) \
: input, "r" (__a0), "r" (__a1) \
: __SYSCALL_CLOBBERS); \
@@ -173,10 +167,8 @@
register __syscall_arg_t __a2 asm ("$6") = _arg3; \
register __syscall_arg_t __a3 asm ("$7"); \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "=r" (__a3) \
: input, "r" (__a0), "r" (__a1), "r" (__a2) \
: __SYSCALL_CLOBBERS); \
@@ -203,10 +195,8 @@
register __syscall_arg_t __a2 asm ("$6") = _arg3; \
register __syscall_arg_t __a3 asm ("$7") = _arg4; \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "+r" (__a3) \
: input, "r" (__a0), "r" (__a1), "r" (__a2) \
: __SYSCALL_CLOBBERS); \
@@ -235,10 +225,8 @@
register __syscall_arg_t __a3 asm ("$7") = _arg4; \
register __syscall_arg_t __a4 asm ("$8") = _arg5; \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "+r" (__a3) \
: input, "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a4) \
: __SYSCALL_CLOBBERS); \
@@ -269,10 +257,8 @@
register __syscall_arg_t __a4 asm ("$8") = _arg5; \
register __syscall_arg_t __a5 asm ("$9") = _arg6; \
__asm__ volatile ( \
- ".set\tnoreorder\n\t" \
v0_init \
"syscall\n\t" \
- ".set\treorder" \
: "=r" (__v0), "+r" (__a3) \
: input, "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a4), \
"r" (__a5) \
--
2.34.1
* [PATCH 02/11] Fix rtld link_map initialization issues
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 00/11] " Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 01/11] Updates for microMIPS Release 6 Aleksandar Rakic
@ 2025-01-23 13:42 ` Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 03/11] Fix issues with removing no-reorder directives Aleksandar Rakic
` (8 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:42 UTC (permalink / raw)
To: libc-alpha
Cc: aleksandar.rakic, djordje.todorovic, cfu, Matthew Fortune,
Faraz Shahbazker
Import patch fixing rtld link_map initialization issues from:
https://sourceware.org/ml/libc-alpha/2015-03/msg00704.html
Author: Sandra Loosemore
Cherry-picked 1507c7be47ef07d4b264168ab031d8c2ed4678f2
from https://github.com/MIPS/glibc
Signed-off-by: Matthew Fortune <matthew.fortune@imgtec.com>
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
elf/rtld.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/elf/rtld.c b/elf/rtld.c
index 1e2e9ad5a8..252f4d6666 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -522,7 +522,7 @@ _dl_start (void *arg)
rtld_timer_start (&info.start_time);
#endif
- /* Partly clean the `bootstrap_map' structure up. Don't use
+ /* Zero-initialize the `bootstrap_map' structure. Don't use
`memset' since it might not be built in or inlined and we cannot
make function calls at this point. Use '__builtin_memset' if we
know it is available. We do not have to clear the memory if we
@@ -530,12 +530,14 @@ _dl_start (void *arg)
are initialized to zero by default. */
#ifndef DONT_USE_BOOTSTRAP_MAP
# ifdef HAVE_BUILTIN_MEMSET
- __builtin_memset (bootstrap_map.l_info, '\0', sizeof (bootstrap_map.l_info));
+ __builtin_memset (&bootstrap_map, '\0', sizeof (struct link_map));
# else
- for (size_t cnt = 0;
- cnt < sizeof (bootstrap_map.l_info) / sizeof (bootstrap_map.l_info[0]);
- ++cnt)
- bootstrap_map.l_info[cnt] = 0;
+ {
+ char *p = (char *) &bootstrap_map;
+ char *pend = p + sizeof (struct link_map);
+ while (p < pend)
+ *(p++) = '\0';
+ }
# endif
#endif
--
2.34.1
* [PATCH 03/11] Fix issues with removing no-reorder directives
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (2 preceding siblings ...)
2025-01-23 13:42 ` [PATCH 02/11] Fix rtld link_map initialization issues Aleksandar Rakic
@ 2025-01-23 13:42 ` Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 04/11] Add C implementation of memcpy/memset Aleksandar Rakic
` (7 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:42 UTC (permalink / raw)
To: libc-alpha
Cc: aleksandar.rakic, djordje.todorovic, cfu, Andrew Bennett,
Faraz Shahbazker
1. Add -O2 to the Makefile to ensure that assembly sources have
their delay slots filled.
2. Move the noreorder directive into the PIC section of the
setjmp code.
Cherry-picked 4e451260675b2e54535eafc2df35d92653acd084
from https://github.com/MIPS/glibc
Signed-off-by: Andrew Bennett <andrew.bennett@imgtec.com>
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/mips/Makefile | 2 ++
sysdeps/mips/bsd-setjmp.S | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/sysdeps/mips/Makefile b/sysdeps/mips/Makefile
index d189973aa0..17ddc2a97c 100644
--- a/sysdeps/mips/Makefile
+++ b/sysdeps/mips/Makefile
@@ -18,9 +18,11 @@ CPPFLAGS-crtn.S += $(pic-ccflag)
endif
ASFLAGS-.os += $(pic-ccflag)
+
# libc.a and libc_p.a must be compiled with -fPIE/-fpie for static PIE.
ASFLAGS-.o += $(pie-default)
ASFLAGS-.op += $(pie-default)
+ASFLAGS += -O2
ifeq ($(subdir),elf)
diff --git a/sysdeps/mips/bsd-setjmp.S b/sysdeps/mips/bsd-setjmp.S
index 7e4d7dcb0b..8c06b9957c 100644
--- a/sysdeps/mips/bsd-setjmp.S
+++ b/sysdeps/mips/bsd-setjmp.S
@@ -28,8 +28,8 @@
.option pic2
#endif
ENTRY (setjmp)
- .set noreorder
#ifdef __PIC__
+ .set noreorder
.cpload t9
.set reorder
la t9, C_SYMBOL_NAME (__sigsetjmp)
--
2.34.1
* [PATCH 04/11] Add C implementation of memcpy/memset
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (3 preceding siblings ...)
2025-01-23 13:42 ` [PATCH 03/11] Fix issues with removing no-reorder directives Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 05/11] Add optimized assembly for strcmp Aleksandar Rakic
` (6 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
Add an improved C implementation of memcpy/memset and remove the
corresponding .S files.
Cherry-picked 6b74133706246af94b71e4154e4ca09482828c9f
from https://github.com/MIPS/glibc
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/mips/memcpy.S | 886 ------------------------------------------
sysdeps/mips/memcpy.c | 415 ++++++++++++++++++++
sysdeps/mips/memset.S | 430 --------------------
sysdeps/mips/memset.c | 187 +++++++++
4 files changed, 602 insertions(+), 1316 deletions(-)
delete mode 100644 sysdeps/mips/memcpy.S
create mode 100644 sysdeps/mips/memcpy.c
delete mode 100644 sysdeps/mips/memset.S
create mode 100644 sysdeps/mips/memset.c
diff --git a/sysdeps/mips/memcpy.S b/sysdeps/mips/memcpy.S
deleted file mode 100644
index 96d1c92d89..0000000000
--- a/sysdeps/mips/memcpy.S
+++ /dev/null
@@ -1,886 +0,0 @@
-/* Copyright (C) 2012-2024 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library. If not, see
- <https://www.gnu.org/licenses/>. */
-
-#ifdef ANDROID_CHANGES
-# include "machine/asm.h"
-# include "machine/regdef.h"
-# define USE_MEMMOVE_FOR_OVERLAP
-# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD_STREAMED
-# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE
-#elif _LIBC
-# include <sysdep.h>
-# include <regdef.h>
-# include <sys/asm.h>
-# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD_STREAMED
-# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE
-#elif defined _COMPILING_NEWLIB
-# include "machine/asm.h"
-# include "machine/regdef.h"
-# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD_STREAMED
-# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE
-#else
-# include <regdef.h>
-# include <sys/asm.h>
-#endif
-
-#if (_MIPS_ISA == _MIPS_ISA_MIPS4) || (_MIPS_ISA == _MIPS_ISA_MIPS5) || \
- (_MIPS_ISA == _MIPS_ISA_MIPS32) || (_MIPS_ISA == _MIPS_ISA_MIPS64)
-# ifndef DISABLE_PREFETCH
-# define USE_PREFETCH
-# endif
-#endif
-
-#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABI64) || (_MIPS_SIM == _ABIN32))
-# ifndef DISABLE_DOUBLE
-# define USE_DOUBLE
-# endif
-#endif
-
-/* Some asm.h files do not have the L macro definition. */
-#ifndef L
-# if _MIPS_SIM == _ABIO32
-# define L(label) $L ## label
-# else
-# define L(label) .L ## label
-# endif
-#endif
-
-/* Some asm.h files do not have the PTR_ADDIU macro definition. */
-#ifndef PTR_ADDIU
-# ifdef USE_DOUBLE
-# define PTR_ADDIU daddiu
-# else
-# define PTR_ADDIU addiu
-# endif
-#endif
-
-/* Some asm.h files do not have the PTR_SRA macro definition. */
-#ifndef PTR_SRA
-# ifdef USE_DOUBLE
-# define PTR_SRA dsra
-# else
-# define PTR_SRA sra
-# endif
-#endif
-
-/* New R6 instructions that may not be in asm.h. */
-#ifndef PTR_LSA
-# if _MIPS_SIM == _ABI64
-# define PTR_LSA dlsa
-# else
-# define PTR_LSA lsa
-# endif
-#endif
-
-#if __mips_isa_rev > 5 && defined (__mips_micromips)
-# define PTR_BC bc16
-#else
-# define PTR_BC bc
-#endif
-
-/*
- * Using PREFETCH_HINT_LOAD_STREAMED instead of PREFETCH_LOAD on load
- * prefetches appear to offer a slight performance advantage.
- *
- * Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE
- * or PREFETCH_STORE_STREAMED offers a large performance advantage
- * but PREPAREFORSTORE has some special restrictions to consider.
- *
- * Prefetch with the 'prepare for store' hint does not copy a memory
- * location into the cache, it just allocates a cache line and zeros
- * it out. This means that if you do not write to the entire cache
- * line before writing it out to memory some data will get zero'ed out
- * when the cache line is written back to memory and data will be lost.
- *
- * Also if you are using this memcpy to copy overlapping buffers it may
- * not behave correctly when using the 'prepare for store' hint. If you
- * use the 'prepare for store' prefetch on a memory area that is in the
- * memcpy source (as well as the memcpy destination), then you will get
- * some data zero'ed out before you have a chance to read it and data will
- * be lost.
- *
- * If you are going to use this memcpy routine with the 'prepare for store'
- * prefetch you may want to set USE_MEMMOVE_FOR_OVERLAP in order to avoid
- * the problem of running memcpy on overlapping buffers.
- *
- * There are ifdef'ed sections of this memcpy to make sure that it does not
- * do prefetches on cache lines that are not going to be completely written.
- * This code is only needed and only used when PREFETCH_STORE_HINT is set to
- * PREFETCH_HINT_PREPAREFORSTORE. This code assumes that cache lines are
- * 32 bytes and if the cache line is larger it will not work correctly.
- */
-
-#ifdef USE_PREFETCH
-# define PREFETCH_HINT_LOAD 0
-# define PREFETCH_HINT_STORE 1
-# define PREFETCH_HINT_LOAD_STREAMED 4
-# define PREFETCH_HINT_STORE_STREAMED 5
-# define PREFETCH_HINT_LOAD_RETAINED 6
-# define PREFETCH_HINT_STORE_RETAINED 7
-# define PREFETCH_HINT_WRITEBACK_INVAL 25
-# define PREFETCH_HINT_PREPAREFORSTORE 30
-
-/*
- * If we have not picked out what hints to use at this point use the
- * standard load and store prefetch hints.
- */
-# ifndef PREFETCH_STORE_HINT
-# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE
-# endif
-# ifndef PREFETCH_LOAD_HINT
-# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD
-# endif
-
-/*
- * We double everything when USE_DOUBLE is true so we do 2 prefetches to
- * get 64 bytes in that case. The assumption is that each individual
- * prefetch brings in 32 bytes.
- */
-
-# ifdef USE_DOUBLE
-# define PREFETCH_CHUNK 64
-# define PREFETCH_FOR_LOAD(chunk, reg) \
- pref PREFETCH_LOAD_HINT, (chunk)*64(reg); \
- pref PREFETCH_LOAD_HINT, ((chunk)*64)+32(reg)
-# define PREFETCH_FOR_STORE(chunk, reg) \
- pref PREFETCH_STORE_HINT, (chunk)*64(reg); \
- pref PREFETCH_STORE_HINT, ((chunk)*64)+32(reg)
-# else
-# define PREFETCH_CHUNK 32
-# define PREFETCH_FOR_LOAD(chunk, reg) \
- pref PREFETCH_LOAD_HINT, (chunk)*32(reg)
-# define PREFETCH_FOR_STORE(chunk, reg) \
- pref PREFETCH_STORE_HINT, (chunk)*32(reg)
-# endif
-/* MAX_PREFETCH_SIZE is the maximum size of a prefetch, it must not be less
- * than PREFETCH_CHUNK, the assumed size of each prefetch. If the real size
- * of a prefetch is greater than MAX_PREFETCH_SIZE and the PREPAREFORSTORE
- * hint is used, the code will not work correctly. If PREPAREFORSTORE is not
- * used then MAX_PREFETCH_SIZE does not matter. */
-# define MAX_PREFETCH_SIZE 128
-/* PREFETCH_LIMIT is set based on the fact that we never use an offset greater
- * than 5 on a STORE prefetch and that a single prefetch can never be larger
- * than MAX_PREFETCH_SIZE. We add the extra 32 when USE_DOUBLE is set because
- * we actually do two prefetches in that case, one 32 bytes after the other. */
-# ifdef USE_DOUBLE
-# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + 32 + MAX_PREFETCH_SIZE
-# else
-# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + MAX_PREFETCH_SIZE
-# endif
-# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) \
- && ((PREFETCH_CHUNK * 4) < MAX_PREFETCH_SIZE)
-/* We cannot handle this because the initial prefetches may fetch bytes that
- * are before the buffer being copied. We start copies with an offset
- * of 4 so avoid this situation when using PREPAREFORSTORE. */
-#error "PREFETCH_CHUNK is too large and/or MAX_PREFETCH_SIZE is too small."
-# endif
-#else /* USE_PREFETCH not defined */
-# define PREFETCH_FOR_LOAD(offset, reg)
-# define PREFETCH_FOR_STORE(offset, reg)
-#endif
-
-#if __mips_isa_rev > 5
-# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
-# undef PREFETCH_STORE_HINT
-# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE_STREAMED
-# endif
-# define R6_CODE
-#endif
-
-/* Allow the routine to be named something else if desired. */
-#ifndef MEMCPY_NAME
-# define MEMCPY_NAME memcpy
-#endif
-
-/* We use these 32/64 bit registers as temporaries to do the copying. */
-#define REG0 t0
-#define REG1 t1
-#define REG2 t2
-#define REG3 t3
-#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABIO32) || (_MIPS_SIM == _ABIO64))
-# define REG4 t4
-# define REG5 t5
-# define REG6 t6
-# define REG7 t7
-#else
-# define REG4 ta0
-# define REG5 ta1
-# define REG6 ta2
-# define REG7 ta3
-#endif
-
-/* We load/store 64 bits at a time when USE_DOUBLE is true.
- * The C_ prefix stands for CHUNK and is used to avoid macro name
- * conflicts with system header files. */
-
-#ifdef USE_DOUBLE
-# define C_ST sd
-# define C_LD ld
-# ifdef __MIPSEB
-# define C_LDHI ldl /* high part is left in big-endian */
-# define C_STHI sdl /* high part is left in big-endian */
-# define C_LDLO ldr /* low part is right in big-endian */
-# define C_STLO sdr /* low part is right in big-endian */
-# else
-# define C_LDHI ldr /* high part is right in little-endian */
-# define C_STHI sdr /* high part is right in little-endian */
-# define C_LDLO ldl /* low part is left in little-endian */
-# define C_STLO sdl /* low part is left in little-endian */
-# endif
-# define C_ALIGN dalign /* r6 align instruction */
-#else
-# define C_ST sw
-# define C_LD lw
-# ifdef __MIPSEB
-# define C_LDHI lwl /* high part is left in big-endian */
-# define C_STHI swl /* high part is left in big-endian */
-# define C_LDLO lwr /* low part is right in big-endian */
-# define C_STLO swr /* low part is right in big-endian */
-# else
-# define C_LDHI lwr /* high part is right in little-endian */
-# define C_STHI swr /* high part is right in little-endian */
-# define C_LDLO lwl /* low part is left in little-endian */
-# define C_STLO swl /* low part is left in little-endian */
-# endif
-# define C_ALIGN align /* r6 align instruction */
-#endif
-
-/* Bookkeeping values for 32 vs. 64 bit mode. */
-#ifdef USE_DOUBLE
-# define NSIZE 8
-# define NSIZEMASK 0x3f
-# define NSIZEDMASK 0x7f
-#else
-# define NSIZE 4
-# define NSIZEMASK 0x1f
-# define NSIZEDMASK 0x3f
-#endif
-#define UNIT(unit) ((unit)*NSIZE)
-#define UNITM1(unit) (((unit)*NSIZE)-1)
-
-#ifdef ANDROID_CHANGES
-LEAF(MEMCPY_NAME, 0)
-#else
-LEAF(MEMCPY_NAME)
-#endif
- .set nomips16
-/*
- * Below we handle the case where memcpy is called with overlapping src and dst.
- * Although memcpy is not required to handle this case, some parts of Android
- * like Skia rely on such usage. We call memmove to handle such cases.
- */
-#ifdef USE_MEMMOVE_FOR_OVERLAP
- PTR_SUBU t0,a0,a1
- PTR_SRA t2,t0,31
- xor t1,t0,t2
- PTR_SUBU t0,t1,t2
- sltu t2,t0,a2
- la t9,memmove
- beq t2,zero,L(memcpy)
- jr t9
-L(memcpy):
-#endif
-/*
- * If the size is less than 2*NSIZE (8 or 16), go to L(lastb). Regardless of
- * size, copy dst pointer to v0 for the return value.
- */
- slti t2,a2,(2 * NSIZE)
-#if defined(RETURN_FIRST_PREFETCH) || defined(RETURN_LAST_PREFETCH)
- move v0,zero
-#else
- move v0,a0
-#endif
- bne t2,zero,L(lasts)
-
-#ifndef R6_CODE
-
-/*
- * If src and dst have different alignments, go to L(unaligned), if they
- * have the same alignment (but are not actually aligned) do a partial
- * load/store to make them aligned. If they are both already aligned
- * we can start copying at L(aligned).
- */
- xor t8,a1,a0
- andi t8,t8,(NSIZE-1) /* t8 is a0/a1 word-displacement */
- PTR_SUBU a3, zero, a0
- bne t8,zero,L(unaligned)
-
- andi a3,a3,(NSIZE-1) /* copy a3 bytes to align a0/a1 */
- PTR_SUBU a2,a2,a3 /* a2 is the remining bytes count */
- beq a3,zero,L(aligned) /* if a3=0, it is already aligned */
-
- C_LDHI t8,0(a1)
- PTR_ADDU a1,a1,a3
- C_STHI t8,0(a0)
- PTR_ADDU a0,a0,a3
-
-#else /* R6_CODE */
-
-/*
- * Align the destination and hope that the source gets aligned too. If it
- * doesn't we jump to L(r6_unaligned*) to do unaligned copies using the r6
- * align instruction.
- */
- andi t8,a0,7
-#ifdef __mips_micromips
- auipc t9,%pcrel_hi(L(atable))
- addiu t9,t9,%pcrel_lo(L(atable)+4)
- PTR_LSA t9,t8,t9,1
-#else
- lapc t9,L(atable)
- PTR_LSA t9,t8,t9,2
-#endif
- jrc t9
-L(atable):
- PTR_BC L(lb0)
- PTR_BC L(lb7)
- PTR_BC L(lb6)
- PTR_BC L(lb5)
- PTR_BC L(lb4)
- PTR_BC L(lb3)
- PTR_BC L(lb2)
- PTR_BC L(lb1)
-L(lb7):
- lb a3, 6(a1)
- sb a3, 6(a0)
-L(lb6):
- lb a3, 5(a1)
- sb a3, 5(a0)
-L(lb5):
- lb a3, 4(a1)
- sb a3, 4(a0)
-L(lb4):
- lb a3, 3(a1)
- sb a3, 3(a0)
-L(lb3):
- lb a3, 2(a1)
- sb a3, 2(a0)
-L(lb2):
- lb a3, 1(a1)
- sb a3, 1(a0)
-L(lb1):
- lb a3, 0(a1)
- sb a3, 0(a0)
-
- li t9,8
- subu t8,t9,t8
- PTR_SUBU a2,a2,t8
- PTR_ADDU a0,a0,t8
- PTR_ADDU a1,a1,t8
-L(lb0):
-
- andi t8,a1,(NSIZE-1)
-#ifdef __mips_micromips
- auipc t9,%pcrel_hi(L(jtable))
- addiu t9,t9,%pcrel_lo(L(jtable)+4)
- PTR_LSA t9,t8,t9,1
-#else
- lapc t9,L(jtable)
- PTR_LSA t9,t8,t9,2
-#endif
- jrc t9
-L(jtable):
- PTR_BC L(aligned)
- PTR_BC L(r6_unaligned1)
- PTR_BC L(r6_unaligned2)
- PTR_BC L(r6_unaligned3)
-#ifdef USE_DOUBLE
- PTR_BC L(r6_unaligned4)
- PTR_BC L(r6_unaligned5)
- PTR_BC L(r6_unaligned6)
- PTR_BC L(r6_unaligned7)
-#endif
-#endif /* R6_CODE */
-
-L(aligned):
-
-/*
- * Now dst/src are both aligned to (word or double word) aligned addresses
- * Set a2 to count how many bytes we have to copy after all the 64/128 byte
- * chunks are copied and a3 to the dst pointer after all the 64/128 byte
- * chunks have been copied. We will loop, incrementing a0 and a1 until a0
- * equals a3.
- */
-
- andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
- PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */
- beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */
- PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */
-
-/* When in the loop we may prefetch with the 'prepare to store' hint,
- * in this case the a0+x should not be past the "t0-32" address. This
- * means: for x=128 the last "safe" a0 address is "t0-160". Alternatively,
- * for x=64 the last "safe" a0 address is "t0-96" In the current version we
- * will use "prefetch hint,128(a0)", so "t0-160" is the limit.
- */
-#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- PTR_ADDU t0,a0,a2 /* t0 is the "past the end" address */
- PTR_SUBU t9,t0,PREFETCH_LIMIT /* t9 is the "last safe pref" address */
-#endif
- PREFETCH_FOR_LOAD (0, a1)
- PREFETCH_FOR_LOAD (1, a1)
- PREFETCH_FOR_LOAD (2, a1)
- PREFETCH_FOR_LOAD (3, a1)
-#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE)
- PREFETCH_FOR_STORE (1, a0)
- PREFETCH_FOR_STORE (2, a0)
- PREFETCH_FOR_STORE (3, a0)
-#endif
-#if defined(RETURN_FIRST_PREFETCH) && defined(USE_PREFETCH)
-# if PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE
- sltu v1,t9,a0
- bgtz v1,L(skip_set)
- PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4)
-L(skip_set):
-# else
- PTR_ADDIU v0,a0,(PREFETCH_CHUNK*1)
-# endif
-#endif
-#if defined(RETURN_LAST_PREFETCH) && defined(USE_PREFETCH) \
- && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE)
- PTR_ADDIU v0,a0,(PREFETCH_CHUNK*3)
-# ifdef USE_DOUBLE
- PTR_ADDIU v0,v0,32
-# endif
-#endif
-L(loop16w):
- C_LD t0,UNIT(0)(a1)
-/* We need to separate out the C_LD instruction here so that it will work
- both when it is used by itself and when it is used with the branch
- instruction. */
-#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- sltu v1,t9,a0 /* If a0 > t9 don't use next prefetch */
- C_LD t1,UNIT(1)(a1)
- bgtz v1,L(skip_pref)
-#else
- C_LD t1,UNIT(1)(a1)
-#endif
-#ifdef R6_CODE
- PREFETCH_FOR_STORE (2, a0)
-#else
- PREFETCH_FOR_STORE (4, a0)
- PREFETCH_FOR_STORE (5, a0)
-#endif
-#if defined(RETURN_LAST_PREFETCH) && defined(USE_PREFETCH)
- PTR_ADDIU v0,a0,(PREFETCH_CHUNK*5)
-# ifdef USE_DOUBLE
- PTR_ADDIU v0,v0,32
-# endif
-#endif
-L(skip_pref):
- C_LD REG2,UNIT(2)(a1)
- C_LD REG3,UNIT(3)(a1)
- C_LD REG4,UNIT(4)(a1)
- C_LD REG5,UNIT(5)(a1)
- C_LD REG6,UNIT(6)(a1)
- C_LD REG7,UNIT(7)(a1)
-#ifdef R6_CODE
- PREFETCH_FOR_LOAD (3, a1)
-#else
- PREFETCH_FOR_LOAD (4, a1)
-#endif
- C_ST t0,UNIT(0)(a0)
- C_ST t1,UNIT(1)(a0)
- C_ST REG2,UNIT(2)(a0)
- C_ST REG3,UNIT(3)(a0)
- C_ST REG4,UNIT(4)(a0)
- C_ST REG5,UNIT(5)(a0)
- C_ST REG6,UNIT(6)(a0)
- C_ST REG7,UNIT(7)(a0)
-
- C_LD t0,UNIT(8)(a1)
- C_LD t1,UNIT(9)(a1)
- C_LD REG2,UNIT(10)(a1)
- C_LD REG3,UNIT(11)(a1)
- C_LD REG4,UNIT(12)(a1)
- C_LD REG5,UNIT(13)(a1)
- C_LD REG6,UNIT(14)(a1)
- C_LD REG7,UNIT(15)(a1)
-#ifndef R6_CODE
- PREFETCH_FOR_LOAD (5, a1)
-#endif
- C_ST t0,UNIT(8)(a0)
- C_ST t1,UNIT(9)(a0)
- C_ST REG2,UNIT(10)(a0)
- C_ST REG3,UNIT(11)(a0)
- C_ST REG4,UNIT(12)(a0)
- C_ST REG5,UNIT(13)(a0)
- C_ST REG6,UNIT(14)(a0)
- C_ST REG7,UNIT(15)(a0)
- PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */
- PTR_ADDIU a1,a1,UNIT(16) /* adding 64/128 to src */
- bne a0,a3,L(loop16w)
- move a2,t8
-
-/* Here we have src and dest word-aligned but less than 64-bytes or
- * 128 bytes to go. Check for a 32(64) byte chunk and copy if there
- * is one. Otherwise jump down to L(chk1w) to handle the tail end of
- * the copy.
- */
-
-L(chkw):
- PREFETCH_FOR_LOAD (0, a1)
- andi t8,a2,NSIZEMASK /* Is there a 32-byte/64-byte chunk. */
- /* The t8 is the reminder count past 32-bytes */
- beq a2,t8,L(chk1w) /* When a2=t8, no 32-byte chunk */
- C_LD t0,UNIT(0)(a1)
- C_LD t1,UNIT(1)(a1)
- C_LD REG2,UNIT(2)(a1)
- C_LD REG3,UNIT(3)(a1)
- C_LD REG4,UNIT(4)(a1)
- C_LD REG5,UNIT(5)(a1)
- C_LD REG6,UNIT(6)(a1)
- C_LD REG7,UNIT(7)(a1)
- PTR_ADDIU a1,a1,UNIT(8)
- C_ST t0,UNIT(0)(a0)
- C_ST t1,UNIT(1)(a0)
- C_ST REG2,UNIT(2)(a0)
- C_ST REG3,UNIT(3)(a0)
- C_ST REG4,UNIT(4)(a0)
- C_ST REG5,UNIT(5)(a0)
- C_ST REG6,UNIT(6)(a0)
- C_ST REG7,UNIT(7)(a0)
- PTR_ADDIU a0,a0,UNIT(8)
-
-/*
- * Here we have less than 32(64) bytes to copy. Set up for a loop to
- * copy one word (or double word) at a time. Set a2 to count how many
- * bytes we have to copy after all the word (or double word) chunks are
- * copied and a3 to the dst pointer after all the (d)word chunks have
- * been copied. We will loop, incrementing a0 and a1 until a0 equals a3.
- */
-L(chk1w):
- andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */
- PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */
- beq a2,t8,L(lastw)
- PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */
-
-/* copying in words (4-byte or 8-byte chunks) */
-L(wordCopy_loop):
- C_LD REG3,UNIT(0)(a1)
- PTR_ADDIU a0,a0,UNIT(1)
- PTR_ADDIU a1,a1,UNIT(1)
- C_ST REG3,UNIT(-1)(a0)
- bne a0,a3,L(wordCopy_loop)
-
-/* If we have been copying double words, see if we can copy a single word
- before doing byte copies. We can have, at most, one word to copy. */
-
-L(lastw):
-#ifdef USE_DOUBLE
- andi t8,a2,3 /* a2 is the remainder past 4 byte chunks. */
- beq t8,a2,L(lastb)
- move a2,t8
- lw REG3,0(a1)
- sw REG3,0(a0)
- PTR_ADDIU a0,a0,4
- PTR_ADDIU a1,a1,4
-#endif
-
-/* Copy the last 8 (or 16) bytes */
-L(lastb):
- PTR_ADDU a3,a0,a2 /* a3 is the last dst address */
- blez a2,L(leave)
-L(lastbloop):
- lb v1,0(a1)
- PTR_ADDIU a0,a0,1
- PTR_ADDIU a1,a1,1
- sb v1,-1(a0)
- bne a0,a3,L(lastbloop)
-L(leave):
- jr ra
-
-/* We jump here with a memcpy of less than 8 or 16 bytes, depending on
- whether or not USE_DOUBLE is defined. Instead of just doing byte
- copies, check the alignment and size and use lw/sw if possible.
- Otherwise, do byte copies. */
-
-L(lasts):
- andi t8,a2,3
- beq t8,a2,L(lastb)
-
- andi t9,a0,3
- bne t9,zero,L(lastb)
- andi t9,a1,3
- bne t9,zero,L(lastb)
-
- PTR_SUBU a3,a2,t8
- PTR_ADDU a3,a0,a3
-
-L(wcopy_loop):
- lw REG3,0(a1)
- PTR_ADDIU a0,a0,4
- PTR_ADDIU a1,a1,4
- bne a0,a3,L(wcopy_loop)
- sw REG3,-4(a0)
-
- b L(lastb)
- move a2,t8
-
-#ifndef R6_CODE
-/*
- * UNALIGNED case, got here with a3 = "negu a0"
- * This code is nearly identical to the aligned code above
- * but only the destination (not the source) gets aligned
- * so we need to do partial loads of the source followed
- * by normal stores to the destination (once we have aligned
- * the destination).
- */
-
-L(unaligned):
- andi a3,a3,(NSIZE-1) /* copy a3 bytes to align a0/a1 */
- PTR_SUBU a2,a2,a3 /* a2 is the remining bytes count */
- beqz a3,L(ua_chk16w) /* if a3=0, it is already aligned */
-
- C_LDHI v1,UNIT(0)(a1)
- C_LDLO v1,UNITM1(1)(a1)
- PTR_ADDU a1,a1,a3
- C_STHI v1,UNIT(0)(a0)
- PTR_ADDU a0,a0,a3
-
-/*
- * Now the destination (but not the source) is aligned
- * Set a2 to count how many bytes we have to copy after all the 64/128 byte
- * chunks are copied and a3 to the dst pointer after all the 64/128 byte
- * chunks have been copied. We will loop, incrementing a0 and a1 until a0
- * equals a3.
- */
-
-L(ua_chk16w):
- andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
- PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */
- beq a2,t8,L(ua_chkw) /* if a2==t8, no 64-byte/128-byte chunks */
- PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */
-
-# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- PTR_ADDU t0,a0,a2 /* t0 is the "past the end" address */
- PTR_SUBU t9,t0,PREFETCH_LIMIT /* t9 is the "last safe pref" address */
-# endif
- PREFETCH_FOR_LOAD (0, a1)
- PREFETCH_FOR_LOAD (1, a1)
- PREFETCH_FOR_LOAD (2, a1)
-# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE)
- PREFETCH_FOR_STORE (1, a0)
- PREFETCH_FOR_STORE (2, a0)
- PREFETCH_FOR_STORE (3, a0)
-# endif
-# if defined(RETURN_FIRST_PREFETCH) && defined(USE_PREFETCH)
-# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- sltu v1,t9,a0
- bgtz v1,L(ua_skip_set)
- PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4)
-L(ua_skip_set):
-# else
- PTR_ADDIU v0,a0,(PREFETCH_CHUNK*1)
-# endif
-# endif
-L(ua_loop16w):
- PREFETCH_FOR_LOAD (3, a1)
- C_LDHI t0,UNIT(0)(a1)
- C_LDHI t1,UNIT(1)(a1)
- C_LDHI REG2,UNIT(2)(a1)
-/* We need to separate out the C_LDHI instruction here so that it will work
- both when it is used by itself and when it is used with the branch
- instruction. */
-# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- sltu v1,t9,a0
- C_LDHI REG3,UNIT(3)(a1)
- bgtz v1,L(ua_skip_pref)
-# else
- C_LDHI REG3,UNIT(3)(a1)
-# endif
- PREFETCH_FOR_STORE (4, a0)
- PREFETCH_FOR_STORE (5, a0)
-L(ua_skip_pref):
- C_LDHI REG4,UNIT(4)(a1)
- C_LDHI REG5,UNIT(5)(a1)
- C_LDHI REG6,UNIT(6)(a1)
- C_LDHI REG7,UNIT(7)(a1)
- C_LDLO t0,UNITM1(1)(a1)
- C_LDLO t1,UNITM1(2)(a1)
- C_LDLO REG2,UNITM1(3)(a1)
- C_LDLO REG3,UNITM1(4)(a1)
- C_LDLO REG4,UNITM1(5)(a1)
- C_LDLO REG5,UNITM1(6)(a1)
- C_LDLO REG6,UNITM1(7)(a1)
- C_LDLO REG7,UNITM1(8)(a1)
- PREFETCH_FOR_LOAD (4, a1)
- C_ST t0,UNIT(0)(a0)
- C_ST t1,UNIT(1)(a0)
- C_ST REG2,UNIT(2)(a0)
- C_ST REG3,UNIT(3)(a0)
- C_ST REG4,UNIT(4)(a0)
- C_ST REG5,UNIT(5)(a0)
- C_ST REG6,UNIT(6)(a0)
- C_ST REG7,UNIT(7)(a0)
- C_LDHI t0,UNIT(8)(a1)
- C_LDHI t1,UNIT(9)(a1)
- C_LDHI REG2,UNIT(10)(a1)
- C_LDHI REG3,UNIT(11)(a1)
- C_LDHI REG4,UNIT(12)(a1)
- C_LDHI REG5,UNIT(13)(a1)
- C_LDHI REG6,UNIT(14)(a1)
- C_LDHI REG7,UNIT(15)(a1)
- C_LDLO t0,UNITM1(9)(a1)
- C_LDLO t1,UNITM1(10)(a1)
- C_LDLO REG2,UNITM1(11)(a1)
- C_LDLO REG3,UNITM1(12)(a1)
- C_LDLO REG4,UNITM1(13)(a1)
- C_LDLO REG5,UNITM1(14)(a1)
- C_LDLO REG6,UNITM1(15)(a1)
- C_LDLO REG7,UNITM1(16)(a1)
- PREFETCH_FOR_LOAD (5, a1)
- C_ST t0,UNIT(8)(a0)
- C_ST t1,UNIT(9)(a0)
- C_ST REG2,UNIT(10)(a0)
- C_ST REG3,UNIT(11)(a0)
- C_ST REG4,UNIT(12)(a0)
- C_ST REG5,UNIT(13)(a0)
- C_ST REG6,UNIT(14)(a0)
- C_ST REG7,UNIT(15)(a0)
- PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */
- PTR_ADDIU a1,a1,UNIT(16) /* adding 64/128 to src */
- bne a0,a3,L(ua_loop16w)
- move a2,t8
-
-/* Here we have src and dest word-aligned but less than 64-bytes or
- * 128 bytes to go. Check for a 32(64) byte chunk and copy if there
- * is one. Otherwise jump down to L(ua_chk1w) to handle the tail end of
- * the copy. */
-
-L(ua_chkw):
- PREFETCH_FOR_LOAD (0, a1)
- andi t8,a2,NSIZEMASK /* Is there a 32-byte/64-byte chunk. */
- /* t8 is the reminder count past 32-bytes */
- beq a2,t8,L(ua_chk1w) /* When a2=t8, no 32-byte chunk */
- C_LDHI t0,UNIT(0)(a1)
- C_LDHI t1,UNIT(1)(a1)
- C_LDHI REG2,UNIT(2)(a1)
- C_LDHI REG3,UNIT(3)(a1)
- C_LDHI REG4,UNIT(4)(a1)
- C_LDHI REG5,UNIT(5)(a1)
- C_LDHI REG6,UNIT(6)(a1)
- C_LDHI REG7,UNIT(7)(a1)
- C_LDLO t0,UNITM1(1)(a1)
- C_LDLO t1,UNITM1(2)(a1)
- C_LDLO REG2,UNITM1(3)(a1)
- C_LDLO REG3,UNITM1(4)(a1)
- C_LDLO REG4,UNITM1(5)(a1)
- C_LDLO REG5,UNITM1(6)(a1)
- C_LDLO REG6,UNITM1(7)(a1)
- C_LDLO REG7,UNITM1(8)(a1)
- PTR_ADDIU a1,a1,UNIT(8)
- C_ST t0,UNIT(0)(a0)
- C_ST t1,UNIT(1)(a0)
- C_ST REG2,UNIT(2)(a0)
- C_ST REG3,UNIT(3)(a0)
- C_ST REG4,UNIT(4)(a0)
- C_ST REG5,UNIT(5)(a0)
- C_ST REG6,UNIT(6)(a0)
- C_ST REG7,UNIT(7)(a0)
- PTR_ADDIU a0,a0,UNIT(8)
-/*
- * Here we have less than 32(64) bytes to copy. Set up for a loop to
- * copy one word (or double word) at a time.
- */
-L(ua_chk1w):
- andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */
- PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */
- beq a2,t8,L(ua_smallCopy)
- PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */
-
-/* copying in words (4-byte or 8-byte chunks) */
-L(ua_wordCopy_loop):
- C_LDHI v1,UNIT(0)(a1)
- C_LDLO v1,UNITM1(1)(a1)
- PTR_ADDIU a0,a0,UNIT(1)
- PTR_ADDIU a1,a1,UNIT(1)
- C_ST v1,UNIT(-1)(a0)
- bne a0,a3,L(ua_wordCopy_loop)
-
-/* Copy the last 8 (or 16) bytes */
-L(ua_smallCopy):
- PTR_ADDU a3,a0,a2 /* a3 is the last dst address */
- beqz a2,L(leave)
-L(ua_smallCopy_loop):
- lb v1,0(a1)
- PTR_ADDIU a0,a0,1
- PTR_ADDIU a1,a1,1
- sb v1,-1(a0)
- bne a0,a3,L(ua_smallCopy_loop)
-
- jr ra
-
-#else /* R6_CODE */
-
-# ifdef __MIPSEB
-# define SWAP_REGS(X,Y) X, Y
-# define ALIGN_OFFSET(N) (N)
-# else
-# define SWAP_REGS(X,Y) Y, X
-# define ALIGN_OFFSET(N) (NSIZE-N)
-# endif
-# define R6_UNALIGNED_WORD_COPY(BYTEOFFSET) \
- andi REG7, a2, (NSIZE-1);/* REG7 is # of bytes to by bytes. */ \
- PTR_SUBU a3, a2, REG7; /* a3 is number of bytes to be copied in */ \
- /* (d)word chunks. */ \
- beq REG7, a2, L(lastb); /* Check for bytes to copy by word */ \
- move a2, REG7; /* a2 is # of bytes to copy byte by byte */ \
- /* after word loop is finished. */ \
- PTR_ADDU REG6, a0, a3; /* REG6 is the dst address after loop. */ \
- PTR_SUBU REG2, a1, t8; /* REG2 is the aligned src address. */ \
- PTR_ADDU a1, a1, a3; /* a1 is addr of source after word loop. */ \
- C_LD t0, UNIT(0)(REG2); /* Load first part of source. */ \
-L(r6_ua_wordcopy##BYTEOFFSET): \
- C_LD t1, UNIT(1)(REG2); /* Load second part of source. */ \
- C_ALIGN REG3, SWAP_REGS(t1,t0), ALIGN_OFFSET(BYTEOFFSET); \
- PTR_ADDIU a0, a0, UNIT(1); /* Increment destination pointer. */ \
- PTR_ADDIU REG2, REG2, UNIT(1); /* Increment aligned source pointer.*/ \
- move t0, t1; /* Move second part of source to first. */ \
- C_ST REG3, UNIT(-1)(a0); \
- bne a0, REG6,L(r6_ua_wordcopy##BYTEOFFSET); \
- j L(lastb); \
-
- /* We are generating R6 code, the destination is 4 byte aligned and
- the source is not 4 byte aligned. t8 is 1, 2, or 3 depending on the
- alignment of the source. */
-
-L(r6_unaligned1):
- R6_UNALIGNED_WORD_COPY(1)
-L(r6_unaligned2):
- R6_UNALIGNED_WORD_COPY(2)
-L(r6_unaligned3):
- R6_UNALIGNED_WORD_COPY(3)
-# ifdef USE_DOUBLE
-L(r6_unaligned4):
- R6_UNALIGNED_WORD_COPY(4)
-L(r6_unaligned5):
- R6_UNALIGNED_WORD_COPY(5)
-L(r6_unaligned6):
- R6_UNALIGNED_WORD_COPY(6)
-L(r6_unaligned7):
- R6_UNALIGNED_WORD_COPY(7)
-# endif
-#endif /* R6_CODE */
-
- .set at
-END(MEMCPY_NAME)
-#ifndef ANDROID_CHANGES
-# ifdef _LIBC
-libc_hidden_builtin_def (MEMCPY_NAME)
-# endif
-#endif
diff --git a/sysdeps/mips/memcpy.c b/sysdeps/mips/memcpy.c
new file mode 100644
index 0000000000..8c3aec7b36
--- /dev/null
+++ b/sysdeps/mips/memcpy.c
@@ -0,0 +1,415 @@
+/*
+ * Copyright (C) 2024 MIPS Tech, LLC
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
+ * and/or other materials provided with the distribution.
+ * 3. Neither the name of the copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived from this
+ * software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifdef __GNUC__
+
+#undef memcpy
+
+/* Typical observed latency in cycles in fetching from DRAM. */
+#define LATENCY_CYCLES 63
+
+/* Pre-fetch performance is subject to accurate prefetch ahead,
+ which in turn depends on both the cache-line size and the amount
+ of look-ahead. Since cache-line size is not nominally fixed in
+ a typically library built for multiple platforms, we make conservative
+ assumptions in the default case. This code will typically operate
+ on such conservative assumptions, but if compiled with the correct
+ -mtune=xx options, will perform even better on those specific
+ platforms. */
+#if defined(_MIPS_TUNE_OCTEON2) || defined(_MIPS_TUNE_OCTEON3)
+ #define CACHE_LINE 128
+ #define BLOCK_CYCLES 30
+ #undef LATENCY_CYCLES
+ #define LATENCY_CYCLES 150
+#elif defined(_MIPS_TUNE_I6400) || defined(_MIPS_TUNE_I6500)
+ #define CACHE_LINE 64
+ #define BLOCK_CYCLES 16
+#elif defined(_MIPS_TUNE_P6600)
+ #define CACHE_LINE 32
+ #define BLOCK_CYCLES 12
+#elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2)
+ #define CACHE_LINE 32
+ #define BLOCK_CYCLES 30
+#else
+ #define CACHE_LINE 32
+ #define BLOCK_CYCLES 11
+#endif
+
+/* Pre-fetch look ahead = ceil (latency / block-cycles) */
+#define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \
+ + ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1))
+
+/* Unroll-factor, controls how many words at a time in the core loop. */
+#define BLOCK (CACHE_LINE == 128 ? 16 : 8)
+
+#define __overloadable
+#if !defined(UNALIGNED_INSTR_SUPPORT)
+/* does target have unaligned lw/ld/ualw/uald instructions? */
+ #define UNALIGNED_INSTR_SUPPORT 0
+#if (__mips_isa_rev < 6 && !defined(__mips1))
+ #undef UNALIGNED_INSTR_SUPPORT
+ #define UNALIGNED_INSTR_SUPPORT 1
+ #endif
+#endif
+#if !defined(HW_UNALIGNED_SUPPORT)
+/* Does target have hardware support for unaligned accesses? */
+ #define HW_UNALIGNED_SUPPORT 0
+ #if __mips_isa_rev >= 6
+ #undef HW_UNALIGNED_SUPPORT
+ #define HW_UNALIGNED_SUPPORT 1
+ #endif
+#endif
+#define ENABLE_PREFETCH 1
+#if ENABLE_PREFETCH
+ #define PREFETCH(addr) __builtin_prefetch (addr, 0, 0)
+#else
+ #define PREFETCH(addr)
+#endif
+
+#include <string.h>
+
+#ifdef __mips64
+typedef unsigned long long reg_t;
+typedef struct
+{
+ reg_t B0:8, B1:8, B2:8, B3:8, B4:8, B5:8, B6:8, B7:8;
+} bits_t;
+#else
+typedef unsigned long reg_t;
+typedef struct
+{
+ reg_t B0:8, B1:8, B2:8, B3:8;
+} bits_t;
+#endif
+
+#define CACHE_LINES_PER_BLOCK ((BLOCK * sizeof (reg_t) > CACHE_LINE) ? \
+ (BLOCK * sizeof (reg_t) / CACHE_LINE) \
+ : 1)
+
+typedef union
+{
+ reg_t v;
+ bits_t b;
+} bitfields_t;
+
+#define DO_BYTE(a, i) \
+ a[i] = bw.b.B##i; \
+ len--; \
+ if(!len) return ret; \
+
+/* This code is called when aligning a pointer, there are remaining bytes
+ after doing word compares, or architecture does not have some form
+ of unaligned support. */
+static inline void * __attribute__ ((always_inline))
+do_bytes (void *a, const void *b, unsigned long len, void *ret)
+{
+ unsigned char *x = (unsigned char *) a;
+ unsigned char *y = (unsigned char *) b;
+ unsigned long i;
+ /* 'len' might be zero here, so preloading the first two values
+ before the loop may access unallocated memory. */
+ for (i = 0; i < len; i++)
+ {
+ *x = *y;
+ x++;
+ y++;
+ }
+ return ret;
+}
+
+/* This code is called to copy only remaining bytes within word or doubleword */
+static inline void * __attribute__ ((always_inline))
+do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
+{
+ unsigned char *x = (unsigned char *) a;
+ bitfields_t bw;
+ if(len > 0)
+ {
+ bw.v = *(reg_t *)b;
+ DO_BYTE(x, 0);
+ DO_BYTE(x, 1);
+ DO_BYTE(x, 2);
+#ifdef __mips64
+ DO_BYTE(x, 3);
+ DO_BYTE(x, 4);
+ DO_BYTE(x, 5);
+ DO_BYTE(x, 6);
+#endif
+ }
+ return ret;
+}
+
+static inline void * __attribute__ ((always_inline))
+do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
+ unsigned long bytes, void *ret)
+{
+ /* Use a set-back so that load/stores have incremented addresses in
+ order to promote bonding. */
+ int off = (BLOCK - words);
+ a -= off;
+ b -= off;
+ switch (off)
+ {
+ case 1: a[1] = b[1]; // Fall through
+ case 2: a[2] = b[2]; // Fall through
+ case 3: a[3] = b[3]; // Fall through
+ case 4: a[4] = b[4]; // Fall through
+ case 5: a[5] = b[5]; // Fall through
+ case 6: a[6] = b[6]; // Fall through
+ case 7: a[7] = b[7]; // Fall through
+#if BLOCK==16
+ case 8: a[8] = b[8]; // Fall through
+ case 9: a[9] = b[9]; // Fall through
+ case 10: a[10] = b[10]; // Fall through
+ case 11: a[11] = b[11]; // Fall through
+ case 12: a[12] = b[12]; // Fall through
+ case 13: a[13] = b[13]; // Fall through
+ case 14: a[14] = b[14]; // Fall through
+ case 15: a[15] = b[15];
+#endif
+ }
+ return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+}
+
+#if !HW_UNALIGNED_SUPPORT
+#if UNALIGNED_INSTR_SUPPORT
+/* For MIPS GCC, there are no unaligned builtins - so this struct forces
+ the compiler to treat the pointer access as unaligned. */
+struct ulw
+{
+ reg_t uli;
+} __attribute__ ((packed));
+static inline void * __attribute__ ((always_inline))
+do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
+ unsigned long bytes, void *ret)
+{
+ /* Use a set-back so that load/stores have incremented addresses in
+ order to promote bonding. */
+ int off = (BLOCK - words);
+ a -= off;
+ b -= off;
+ switch (off)
+ {
+ case 1: a[1].uli = b[1]; // Fall through
+ case 2: a[2].uli = b[2]; // Fall through
+ case 3: a[3].uli = b[3]; // Fall through
+ case 4: a[4].uli = b[4]; // Fall through
+ case 5: a[5].uli = b[5]; // Fall through
+ case 6: a[6].uli = b[6]; // Fall through
+ case 7: a[7].uli = b[7]; // Fall through
+#if BLOCK==16
+ case 8: a[8].uli = b[8]; // Fall through
+ case 9: a[9].uli = b[9]; // Fall through
+ case 10: a[10].uli = b[10]; // Fall through
+ case 11: a[11].uli = b[11]; // Fall through
+ case 12: a[12].uli = b[12]; // Fall through
+ case 13: a[13].uli = b[13]; // Fall through
+ case 14: a[14].uli = b[14]; // Fall through
+ case 15: a[15].uli = b[15];
+#endif
+ }
+ return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+}
+
+/* The first pointer is not aligned while second pointer is. */
+static void *
+unaligned_words (struct ulw *a, const reg_t * b,
+ unsigned long words, unsigned long bytes, void *ret)
+{
+ unsigned long i, words_by_block, words_by_1;
+ words_by_1 = words % BLOCK;
+ words_by_block = words / BLOCK;
+ for (; words_by_block > 0; words_by_block--)
+ {
+ if (words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+ for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
+ PREFETCH (b + (BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i));
+
+ reg_t y0 = b[0], y1 = b[1], y2 = b[2], y3 = b[3];
+ reg_t y4 = b[4], y5 = b[5], y6 = b[6], y7 = b[7];
+ a[0].uli = y0;
+ a[1].uli = y1;
+ a[2].uli = y2;
+ a[3].uli = y3;
+ a[4].uli = y4;
+ a[5].uli = y5;
+ a[6].uli = y6;
+ a[7].uli = y7;
+#if BLOCK==16
+ y0 = b[8], y1 = b[9], y2 = b[10], y3 = b[11];
+ y4 = b[12], y5 = b[13], y6 = b[14], y7 = b[15];
+ a[8].uli = y0;
+ a[9].uli = y1;
+ a[10].uli = y2;
+ a[11].uli = y3;
+ a[12].uli = y4;
+ a[13].uli = y5;
+ a[14].uli = y6;
+ a[15].uli = y7;
+#endif
+ a += BLOCK;
+ b += BLOCK;
+ }
+
+ /* Mop up any remaining bytes. */
+ return do_uwords_remaining (a, b, words_by_1, bytes, ret);
+}
+
+#else
+
+/* No HW support or unaligned lw/ld/ualw/uald instructions. */
+static void *
+unaligned_words (reg_t * a, const reg_t * b,
+ unsigned long words, unsigned long bytes, void *ret)
+{
+ unsigned long i;
+ unsigned char *x;
+ for (i = 0; i < words; i++)
+ {
+ bitfields_t bw;
+ bw.v = *((reg_t*) b);
+ x = (unsigned char *) a;
+ x[0] = bw.b.B0;
+ x[1] = bw.b.B1;
+ x[2] = bw.b.B2;
+ x[3] = bw.b.B3;
+#ifdef __mips64
+ x[4] = bw.b.B4;
+ x[5] = bw.b.B5;
+ x[6] = bw.b.B6;
+ x[7] = bw.b.B7;
+#endif
+ a += 1;
+ b += 1;
+ }
+ /* Mop up any remaining bytes. */
+ return do_bytes_remaining (a, b, bytes, ret);
+}
+
+#endif /* UNALIGNED_INSTR_SUPPORT */
+#endif /* HW_UNALIGNED_SUPPORT */
+
+/* both pointers are aligned, or first isn't and HW support for unaligned. */
+static void *
+aligned_words (reg_t * a, const reg_t * b,
+ unsigned long words, unsigned long bytes, void *ret)
+{
+ unsigned long i, words_by_block, words_by_1;
+ words_by_1 = words % BLOCK;
+ words_by_block = words / BLOCK;
+ for (; words_by_block > 0; words_by_block--)
+ {
+ if(words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+ for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
+ PREFETCH (b + ((BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i)));
+
+ reg_t x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3];
+ reg_t x4 = b[4], x5 = b[5], x6 = b[6], x7 = b[7];
+ a[0] = x0;
+ a[1] = x1;
+ a[2] = x2;
+ a[3] = x3;
+ a[4] = x4;
+ a[5] = x5;
+ a[6] = x6;
+ a[7] = x7;
+#if BLOCK==16
+ x0 = b[8], x1 = b[9], x2 = b[10], x3 = b[11];
+ x4 = b[12], x5 = b[13], x6 = b[14], x7 = b[15];
+ a[8] = x0;
+ a[9] = x1;
+ a[10] = x2;
+ a[11] = x3;
+ a[12] = x4;
+ a[13] = x5;
+ a[14] = x6;
+ a[15] = x7;
+#endif
+ a += BLOCK;
+ b += BLOCK;
+ }
+
+ /* mop up any remaining bytes. */
+ return do_words_remaining (a, b, words_by_1, bytes, ret);
+}
+
+void *
+memcpy (void *a, const void *b, size_t len) __overloadable
+{
+ unsigned long bytes, words, i;
+ void *ret = a;
+ /* shouldn't hit that often. */
+ if (len <= 8)
+ return do_bytes (a, b, len, a);
+
+ /* Start pre-fetches ahead of time. */
+ if (len > CACHE_LINE * (PREF_AHEAD - 1))
+ for (i = 1; i < PREF_AHEAD - 1; i++)
+ PREFETCH ((char *)b + CACHE_LINE * i);
+ else
+ for (i = 1; i < len / CACHE_LINE; i++)
+ PREFETCH ((char *)b + CACHE_LINE * i);
+
+ /* Align the second pointer to word/dword alignment.
+ Note that the pointer is only 32-bits for o32/n32 ABIs. For
+ n32, loads are done as 64-bit while address remains 32-bit. */
+ bytes = ((unsigned long) b) % (sizeof (reg_t));
+
+ if (bytes)
+ {
+ bytes = (sizeof (reg_t)) - bytes;
+ if (bytes > len)
+ bytes = len;
+ do_bytes (a, b, bytes, ret);
+ if (len == bytes)
+ return ret;
+ len -= bytes;
+ a = (void *) (((unsigned char *) a) + bytes);
+ b = (const void *) (((unsigned char *) b) + bytes);
+ }
+
+ /* Second pointer now aligned. */
+ words = len / sizeof (reg_t);
+ bytes = len % sizeof (reg_t);
+
+#if HW_UNALIGNED_SUPPORT
+ /* treat possible unaligned first pointer as aligned. */
+ return aligned_words (a, b, words, bytes, ret);
+#else
+ if (((unsigned long) a) % sizeof (reg_t) == 0)
+ return aligned_words (a, b, words, bytes, ret);
+ /* need to use unaligned instructions on first pointer. */
+ return unaligned_words (a, b, words, bytes, ret);
+#endif
+}
+
+libc_hidden_builtin_def (memcpy)
+
+#else
+#include <string/memcpy.c>
+#endif
diff --git a/sysdeps/mips/memset.S b/sysdeps/mips/memset.S
deleted file mode 100644
index 0c8375c9f5..0000000000
--- a/sysdeps/mips/memset.S
+++ /dev/null
@@ -1,430 +0,0 @@
-/* Copyright (C) 2013-2024 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library. If not, see
- <https://www.gnu.org/licenses/>. */
-
-#ifdef ANDROID_CHANGES
-# include "machine/asm.h"
-# include "machine/regdef.h"
-# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE
-#elif _LIBC
-# include <sysdep.h>
-# include <regdef.h>
-# include <sys/asm.h>
-# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE
-#elif defined _COMPILING_NEWLIB
-# include "machine/asm.h"
-# include "machine/regdef.h"
-# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE
-#else
-# include <regdef.h>
-# include <sys/asm.h>
-#endif
-
-/* Check to see if the MIPS architecture we are compiling for supports
- prefetching. */
-
-#if (__mips == 4) || (__mips == 5) || (__mips == 32) || (__mips == 64)
-# ifndef DISABLE_PREFETCH
-# define USE_PREFETCH
-# endif
-#endif
-
-#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABI64) || (_MIPS_SIM == _ABIN32))
-# ifndef DISABLE_DOUBLE
-# define USE_DOUBLE
-# endif
-#endif
-
-#ifndef USE_DOUBLE
-# ifndef DISABLE_DOUBLE_ALIGN
-# define DOUBLE_ALIGN
-# endif
-#endif
-
-
-/* Some asm.h files do not have the L macro definition. */
-#ifndef L
-# if _MIPS_SIM == _ABIO32
-# define L(label) $L ## label
-# else
-# define L(label) .L ## label
-# endif
-#endif
-
-/* Some asm.h files do not have the PTR_ADDIU macro definition. */
-#ifndef PTR_ADDIU
-# ifdef USE_DOUBLE
-# define PTR_ADDIU daddiu
-# else
-# define PTR_ADDIU addiu
-# endif
-#endif
-
-/* New R6 instructions that may not be in asm.h. */
-#ifndef PTR_LSA
-# if _MIPS_SIM == _ABI64
-# define PTR_LSA dlsa
-# else
-# define PTR_LSA lsa
-# endif
-#endif
-
-#if __mips_isa_rev > 5 && defined (__mips_micromips)
-# define PTR_BC bc16
-#else
-# define PTR_BC bc
-#endif
-
-/* Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE
- or PREFETCH_STORE_STREAMED offers a large performance advantage
- but PREPAREFORSTORE has some special restrictions to consider.
-
- Prefetch with the 'prepare for store' hint does not copy a memory
- location into the cache, it just allocates a cache line and zeros
- it out. This means that if you do not write to the entire cache
- line before writing it out to memory some data will get zero'ed out
- when the cache line is written back to memory and data will be lost.
-
- There are ifdef'ed sections of this memcpy to make sure that it does not
- do prefetches on cache lines that are not going to be completely written.
- This code is only needed and only used when PREFETCH_STORE_HINT is set to
- PREFETCH_HINT_PREPAREFORSTORE. This code assumes that cache lines are
- less than MAX_PREFETCH_SIZE bytes and if the cache line is larger it will
- not work correctly. */
-
-#ifdef USE_PREFETCH
-# define PREFETCH_HINT_STORE 1
-# define PREFETCH_HINT_STORE_STREAMED 5
-# define PREFETCH_HINT_STORE_RETAINED 7
-# define PREFETCH_HINT_PREPAREFORSTORE 30
-
-/* If we have not picked out what hints to use at this point use the
- standard load and store prefetch hints. */
-# ifndef PREFETCH_STORE_HINT
-# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE
-# endif
-
-/* We double everything when USE_DOUBLE is true so we do 2 prefetches to
- get 64 bytes in that case. The assumption is that each individual
- prefetch brings in 32 bytes. */
-# ifdef USE_DOUBLE
-# define PREFETCH_CHUNK 64
-# define PREFETCH_FOR_STORE(chunk, reg) \
- pref PREFETCH_STORE_HINT, (chunk)*64(reg); \
- pref PREFETCH_STORE_HINT, ((chunk)*64)+32(reg)
-# else
-# define PREFETCH_CHUNK 32
-# define PREFETCH_FOR_STORE(chunk, reg) \
- pref PREFETCH_STORE_HINT, (chunk)*32(reg)
-# endif
-
-/* MAX_PREFETCH_SIZE is the maximum size of a prefetch, it must not be less
- than PREFETCH_CHUNK, the assumed size of each prefetch. If the real size
- of a prefetch is greater than MAX_PREFETCH_SIZE and the PREPAREFORSTORE
- hint is used, the code will not work correctly. If PREPAREFORSTORE is not
- used than MAX_PREFETCH_SIZE does not matter. */
-# define MAX_PREFETCH_SIZE 128
-/* PREFETCH_LIMIT is set based on the fact that we never use an offset greater
- than 5 on a STORE prefetch and that a single prefetch can never be larger
- than MAX_PREFETCH_SIZE. We add the extra 32 when USE_DOUBLE is set because
- we actually do two prefetches in that case, one 32 bytes after the other. */
-# ifdef USE_DOUBLE
-# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + 32 + MAX_PREFETCH_SIZE
-# else
-# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + MAX_PREFETCH_SIZE
-# endif
-
-# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) \
- && ((PREFETCH_CHUNK * 4) < MAX_PREFETCH_SIZE)
-/* We cannot handle this because the initial prefetches may fetch bytes that
- are before the buffer being copied. We start copies with an offset
- of 4 so avoid this situation when using PREPAREFORSTORE. */
-# error "PREFETCH_CHUNK is too large and/or MAX_PREFETCH_SIZE is too small."
-# endif
-#else /* USE_PREFETCH not defined */
-# define PREFETCH_FOR_STORE(offset, reg)
-#endif
-
-#if __mips_isa_rev > 5
-# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
-# undef PREFETCH_STORE_HINT
-# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE_STREAMED
-# endif
-# define R6_CODE
-#endif
-
-/* Allow the routine to be named something else if desired. */
-#ifndef MEMSET_NAME
-# define MEMSET_NAME memset
-#endif
-
-/* We load/store 64 bits at a time when USE_DOUBLE is true.
- The C_ prefix stands for CHUNK and is used to avoid macro name
- conflicts with system header files. */
-
-#ifdef USE_DOUBLE
-# define C_ST sd
-# ifdef __MIPSEB
-# define C_STHI sdl /* high part is left in big-endian */
-# else
-# define C_STHI sdr /* high part is right in little-endian */
-# endif
-#else
-# define C_ST sw
-# ifdef __MIPSEB
-# define C_STHI swl /* high part is left in big-endian */
-# else
-# define C_STHI swr /* high part is right in little-endian */
-# endif
-#endif
-
-/* Bookkeeping values for 32 vs. 64 bit mode. */
-#ifdef USE_DOUBLE
-# define NSIZE 8
-# define NSIZEMASK 0x3f
-# define NSIZEDMASK 0x7f
-#else
-# define NSIZE 4
-# define NSIZEMASK 0x1f
-# define NSIZEDMASK 0x3f
-#endif
-#define UNIT(unit) ((unit)*NSIZE)
-#define UNITM1(unit) (((unit)*NSIZE)-1)
-
-#ifdef ANDROID_CHANGES
-LEAF(MEMSET_NAME,0)
-#else
-LEAF(MEMSET_NAME)
-#endif
-
- .set nomips16
-/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb). Regardless of
- size, copy dst pointer to v0 for the return value. */
- slti t2,a2,(4 * NSIZE)
- move v0,a0
- bne t2,zero,L(lastb)
-
-/* If memset value is not zero, we copy it to all the bytes in a 32 or 64
- bit word. */
- PTR_SUBU a3,zero,a0
- beq a1,zero,L(set0) /* If memset value is zero no smear */
- nop
-
- /* smear byte into 32 or 64 bit word */
-#if ((__mips == 64) || (__mips == 32)) && (__mips_isa_rev >= 2)
-# ifdef USE_DOUBLE
- dins a1, a1, 8, 8 /* Replicate fill byte into half-word. */
- dins a1, a1, 16, 16 /* Replicate fill byte into word. */
- dins a1, a1, 32, 32 /* Replicate fill byte into dbl word. */
-# else
- ins a1, a1, 8, 8 /* Replicate fill byte into half-word. */
- ins a1, a1, 16, 16 /* Replicate fill byte into word. */
-# endif
-#else
-# ifdef USE_DOUBLE
- and a1,0xff
- dsll t2,a1,8
- or a1,t2
- dsll t2,a1,16
- or a1,t2
- dsll t2,a1,32
- or a1,t2
-# else
- and a1,0xff
- sll t2,a1,8
- or a1,t2
- sll t2,a1,16
- or a1,t2
-# endif
-#endif
-
-/* If the destination address is not aligned do a partial store to get it
- aligned. If it is already aligned just jump to L(aligned). */
-L(set0):
-#ifndef R6_CODE
- andi t2,a3,(NSIZE-1) /* word-unaligned address? */
- PTR_SUBU a2,a2,t2
- beq t2,zero,L(aligned) /* t2 is the unalignment count */
- C_STHI a1,0(a0)
- PTR_ADDU a0,a0,t2
-#else /* R6_CODE */
- andi t2,a0,7
-# ifdef __mips_micromips
- auipc t9,%pcrel_hi(L(atable))
- addiu t9,t9,%pcrel_lo(L(atable)+4)
- PTR_LSA t9,t2,t9,1
-# else
- lapc t9,L(atable)
- PTR_LSA t9,t2,t9,2
-# endif
- jrc t9
-L(atable):
- PTR_BC L(aligned)
- PTR_BC L(lb7)
- PTR_BC L(lb6)
- PTR_BC L(lb5)
- PTR_BC L(lb4)
- PTR_BC L(lb3)
- PTR_BC L(lb2)
- PTR_BC L(lb1)
-L(lb7):
- sb a1,6(a0)
-L(lb6):
- sb a1,5(a0)
-L(lb5):
- sb a1,4(a0)
-L(lb4):
- sb a1,3(a0)
-L(lb3):
- sb a1,2(a0)
-L(lb2):
- sb a1,1(a0)
-L(lb1):
- sb a1,0(a0)
-
- li t9,NSIZE
- subu t2,t9,t2
- PTR_SUBU a2,a2,t2
- PTR_ADDU a0,a0,t2
-#endif /* R6_CODE */
-
-L(aligned):
-/* If USE_DOUBLE is not set we may still want to align the data on a 16
- byte boundary instead of an 8 byte boundary to maximize the opportunity
- of proAptiv chips to do memory bonding (combining two sequential 4
- byte stores into one 8 byte store). We know there are at least 4 bytes
- left to store or we would have jumped to L(lastb) earlier in the code. */
-#ifdef DOUBLE_ALIGN
- andi t2,a3,4
- PTR_SUBU a2,a2,t2
- beq t2,zero,L(double_aligned)
- sw a1,0(a0)
- PTR_ADDU a0,a0,t2
-L(double_aligned):
-#endif
-
-/* Now the destination is aligned to (word or double word) aligned address
- Set a2 to count how many bytes we have to copy after all the 64/128 byte
- chunks are copied and a3 to the dest pointer after all the 64/128 byte
- chunks have been copied. We will loop, incrementing a0 until it equals
- a3. */
- andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
- PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */
- beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */
- PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */
-
-/* When in the loop we may prefetch with the 'prepare to store' hint,
- in this case the a0+x should not be past the "t0-32" address. This
- means: for x=128 the last "safe" a0 address is "t0-160". Alternatively,
- for x=64 the last "safe" a0 address is "t0-96" In the current version we
- will use "prefetch hint,128(a0)", so "t0-160" is the limit. */
-#if defined(USE_PREFETCH) \
- && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- PTR_ADDU t0,a0,a2 /* t0 is the "past the end" address */
- PTR_SUBU t9,t0,PREFETCH_LIMIT /* t9 is the "last safe pref" address */
-#endif
-#if defined(USE_PREFETCH) \
- && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE)
- PREFETCH_FOR_STORE (1, a0)
- PREFETCH_FOR_STORE (2, a0)
- PREFETCH_FOR_STORE (3, a0)
-#endif
-
-L(loop16w):
-#if defined(USE_PREFETCH) \
- && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
- sltu v1,t9,a0 /* If a0 > t9 don't use next prefetch */
- bgtz v1,L(skip_pref)
-#endif
-#ifdef R6_CODE
- PREFETCH_FOR_STORE (2, a0)
-#else
- PREFETCH_FOR_STORE (4, a0)
- PREFETCH_FOR_STORE (5, a0)
-#endif
-L(skip_pref):
- C_ST a1,UNIT(0)(a0)
- C_ST a1,UNIT(1)(a0)
- C_ST a1,UNIT(2)(a0)
- C_ST a1,UNIT(3)(a0)
- C_ST a1,UNIT(4)(a0)
- C_ST a1,UNIT(5)(a0)
- C_ST a1,UNIT(6)(a0)
- C_ST a1,UNIT(7)(a0)
- C_ST a1,UNIT(8)(a0)
- C_ST a1,UNIT(9)(a0)
- C_ST a1,UNIT(10)(a0)
- C_ST a1,UNIT(11)(a0)
- C_ST a1,UNIT(12)(a0)
- C_ST a1,UNIT(13)(a0)
- C_ST a1,UNIT(14)(a0)
- C_ST a1,UNIT(15)(a0)
- PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */
- bne a0,a3,L(loop16w)
- move a2,t8
-
-/* Here we have dest word-aligned but less than 64-bytes or 128 bytes to go.
- Check for a 32(64) byte chunk and copy if there is one. Otherwise
- jump down to L(chk1w) to handle the tail end of the copy. */
-L(chkw):
- andi t8,a2,NSIZEMASK /* is there a 32-byte/64-byte chunk. */
- /* the t8 is the reminder count past 32-bytes */
- beq a2,t8,L(chk1w)/* when a2==t8, no 32-byte chunk */
- C_ST a1,UNIT(0)(a0)
- C_ST a1,UNIT(1)(a0)
- C_ST a1,UNIT(2)(a0)
- C_ST a1,UNIT(3)(a0)
- C_ST a1,UNIT(4)(a0)
- C_ST a1,UNIT(5)(a0)
- C_ST a1,UNIT(6)(a0)
- C_ST a1,UNIT(7)(a0)
- PTR_ADDIU a0,a0,UNIT(8)
-
-/* Here we have less than 32(64) bytes to set. Set up for a loop to
- copy one word (or double word) at a time. Set a2 to count how many
- bytes we have to copy after all the word (or double word) chunks are
- copied and a3 to the dest pointer after all the (d)word chunks have
- been copied. We will loop, incrementing a0 until a0 equals a3. */
-L(chk1w):
- andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */
- PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */
- beq a2,t8,L(lastb)
- PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */
-
-/* copying in words (4-byte or 8 byte chunks) */
-L(wordCopy_loop):
- PTR_ADDIU a0,a0,UNIT(1)
- C_ST a1,UNIT(-1)(a0)
- bne a0,a3,L(wordCopy_loop)
-
-/* Copy the last 8 (or 16) bytes */
-L(lastb):
- PTR_ADDU a3,a0,a2 /* a3 is the last dst address */
- blez a2,L(leave)
-L(lastbloop):
- PTR_ADDIU a0,a0,1
- sb a1,-1(a0)
- bne a0,a3,L(lastbloop)
-L(leave):
- jr ra
-
- .set at
-END(MEMSET_NAME)
-#ifndef ANDROID_CHANGES
-# ifdef _LIBC
-libc_hidden_builtin_def (MEMSET_NAME)
-# endif
-#endif
diff --git a/sysdeps/mips/memset.c b/sysdeps/mips/memset.c
new file mode 100644
index 0000000000..813b3bc0e6
--- /dev/null
+++ b/sysdeps/mips/memset.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2024 MIPS Tech, LLC
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
+ * and/or other materials provided with the distribution.
+ * 3. Neither the name of the copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived from this
+ * software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifdef __GNUC__
+
+#undef memset
+
+#include <string.h>
+
+#if _MIPS_SIM == _ABIO32
+#define SIZEOF_reg_t 4
+typedef unsigned long reg_t;
+#else
+#define SIZEOF_reg_t 8
+typedef unsigned long long reg_t;
+#endif
+
+typedef struct bits8
+{
+ reg_t B0:8, B1:8, B2:8, B3:8;
+#if SIZEOF_reg_t == 8
+ reg_t B4:8, B5:8, B6:8, B7:8;
+#endif
+} bits8_t;
+typedef struct bits16
+{
+ reg_t B0:16, B1:16;
+#if SIZEOF_reg_t == 8
+ reg_t B2:16, B3:16;
+#endif
+} bits16_t;
+typedef struct bits32
+{
+ reg_t B0:32;
+#if SIZEOF_reg_t == 8
+ reg_t B1:32;
+#endif
+} bits32_t;
+
+/* This union assumes that small structures can be in registers. If
+ not, then memory accesses will be done - not optimal, but ok. */
+typedef union
+{
+ reg_t v;
+ bits8_t b8;
+ bits16_t b16;
+ bits32_t b32;
+} bitfields_t;
+
+/* This code is called when aligning a pointer or there are remaining bytes
+ after doing word sets. */
+static inline void * __attribute__ ((always_inline))
+do_bytes (void *a, void *retval, unsigned char fill, const unsigned long len)
+{
+ unsigned char *x = ((unsigned char *) a);
+ unsigned long i;
+
+ for (i = 0; i < len; i++)
+ *x++ = fill;
+
+ return retval;
+}
+
+/* Pointer is aligned. */
+static void *
+do_aligned_words (reg_t * a, void * retval, reg_t fill,
+ unsigned long words, unsigned long bytes)
+{
+ unsigned long i, words_by_1, words_by_16;
+
+ words_by_1 = words % 16;
+ words_by_16 = words / 16;
+
+ /*
+ * Note: prefetching the store memory is not beneficial on most
+ * cores since the ls/st unit has store buffers that will be filled
+ * before the cache line is actually needed.
+ *
+ * Also, using prepare-for-store cache op is problematic since we
+ * don't know the implementation-defined cache line length and we
+ * don't want to touch unintended memory.
+ */
+ for (i = 0; i < words_by_16; i++)
+ {
+ a[0] = fill;
+ a[1] = fill;
+ a[2] = fill;
+ a[3] = fill;
+ a[4] = fill;
+ a[5] = fill;
+ a[6] = fill;
+ a[7] = fill;
+ a[8] = fill;
+ a[9] = fill;
+ a[10] = fill;
+ a[11] = fill;
+ a[12] = fill;
+ a[13] = fill;
+ a[14] = fill;
+ a[15] = fill;
+ a += 16;
+ }
+
+ /* do remaining words. */
+ for (i = 0; i < words_by_1; i++)
+ *a++ = fill;
+
+ /* mop up any remaining bytes. */
+ return do_bytes (a, retval, fill, bytes);
+}
+
+void *
+memset (void *a, int ifill, size_t len)
+{
+ unsigned long bytes, words;
+ bitfields_t fill;
+ void *retval = (void *) a;
+
+ /* shouldn't hit that often. */
+ if (len < 16)
+ return do_bytes (a, retval, ifill, len);
+
+ /* Align the pointer to word/dword alignment.
+ Note that the pointer is only 32-bits for o32/n32 ABIs. For
+ n32, loads are done as 64-bit while address remains 32-bit. */
+ bytes = ((unsigned long) a) % (sizeof (reg_t) * 2);
+ if (bytes)
+ {
+ bytes = (sizeof (reg_t) * 2 - bytes);
+ if (bytes > len)
+ bytes = len;
+ do_bytes (a, retval, ifill, bytes);
+ if (len == bytes)
+ return retval;
+ len -= bytes;
+ a = (void *) (((unsigned char *) a) + bytes);
+ }
+
+ /* Create correct fill value for reg_t sized variable. */
+ if (ifill != 0)
+ {
+ fill.b8.B0 = (unsigned char) ifill;
+ fill.b8.B1 = fill.b8.B0;
+ fill.b16.B1 = fill.b16.B0;
+#if SIZEOF_reg_t == 8
+ fill.b32.B1 = fill.b32.B0;
+#endif
+ }
+ else
+ fill.v = 0;
+
+ words = len / sizeof (reg_t);
+ bytes = len % sizeof (reg_t);
+ return do_aligned_words (a, retval, fill.v, words, bytes);
+}
+
+
+libc_hidden_builtin_def (memset)
+
+#else
+#include <string/memset.c>
+#endif
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 05/11] Add optimized assembly for strcmp
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (4 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 04/11] Add C implementation of memcpy/memset Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 06/11] Fix prefetching beyond copied memory Aleksandar Rakic
` (5 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
Cherry-picked ff356419673a5d122335dd81bd5726de7bc5e08f
from https://github.com/MIPS/glibc
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
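Note (illustration only, kept below the cut so it is not part of the
commit message): the STRCMP32 macro in this patch finds a NUL byte in a
32-bit word with the classic (x - 0x01010101) & ~x & 0x80808080 test.
A rough C equivalent of the non-DSP sequence (subu/nor/and against
t8 = 0x01010101 and t9 = 0x7f7f7f7f) is:

    /* Nonzero iff some byte of x is zero (32-bit sketch only).  */
    static unsigned int
    has_zero_byte (unsigned int x)
    {
      return (x - 0x01010101U) & ~x & 0x80808080U;
    }

The DSP variant reaches the same zero-byte test with a single
saturating per-byte subtraction (subu_s.qb) against 0x01010101.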
sysdeps/mips/strcmp.S | 228 +++++++++++++++++++++++++-----------------
1 file changed, 137 insertions(+), 91 deletions(-)
diff --git a/sysdeps/mips/strcmp.S b/sysdeps/mips/strcmp.S
index 36379be021..4878cd3aac 100644
--- a/sysdeps/mips/strcmp.S
+++ b/sysdeps/mips/strcmp.S
@@ -1,4 +1,5 @@
/* Copyright (C) 2014-2024 Free Software Foundation, Inc.
+ Optimized strcmp for MIPS
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -22,9 +23,6 @@
# include <sysdep.h>
# include <regdef.h>
# include <sys/asm.h>
-#elif defined _COMPILING_NEWLIB
-# include "machine/asm.h"
-# include "machine/regdef.h"
#else
# include <regdef.h>
# include <sys/asm.h>
@@ -46,6 +44,10 @@
performance loss, so we are not turning it on by default. */
#if defined(ENABLE_CLZ) && (__mips_isa_rev > 1)
# define USE_CLZ
+#elif (__mips_isa_rev >= 2)
+# define USE_EXT 1
+#else
+# define USE_EXT 0
#endif
/* Some asm.h files do not have the L macro definition. */
@@ -66,6 +68,10 @@
# endif
#endif
+/* Haven't yet found a configuration where DSP code outperforms
+ normal assembly. */
+#define __mips_using_dsp 0
+
/* Allow the routine to be named something else if desired. */
#ifndef STRCMP_NAME
# define STRCMP_NAME strcmp
@@ -77,28 +83,35 @@ LEAF(STRCMP_NAME, 0)
LEAF(STRCMP_NAME)
#endif
.set nomips16
- .set noreorder
-
or t0, a0, a1
- andi t0,0x3
+ andi t0, t0, 0x3
bne t0, zero, L(byteloop)
/* Both strings are 4 byte aligned at this point. */
+ li t8, 0x01010101
+#if !__mips_using_dsp
+ li t9, 0x7f7f7f7f
+#endif
- lui t8, 0x0101
- ori t8, t8, 0x0101
- lui t9, 0x7f7f
- ori t9, 0x7f7f
-
-#define STRCMP32(OFFSET) \
- lw v0, OFFSET(a0); \
- lw v1, OFFSET(a1); \
- subu t0, v0, t8; \
- bne v0, v1, L(worddiff); \
- nor t1, v0, t9; \
- and t0, t0, t1; \
+#if __mips_using_dsp
+# define STRCMP32(OFFSET) \
+ lw a2, OFFSET(a0); \
+ lw a3, OFFSET(a1); \
+ subu_s.qb t0, t8, a2; \
+ bne a2, a3, L(worddiff); \
bne t0, zero, L(returnzero)
+#else /* !__mips_using_dsp */
+# define STRCMP32(OFFSET) \
+ lw a2, OFFSET(a0); \
+ lw a3, OFFSET(a1); \
+ subu t0, a2, t8; \
+ nor t1, a2, t9; \
+ bne a2, a3, L(worddiff); \
+ and t1, t0, t1; \
+ bne t1, zero, L(returnzero)
+#endif /* __mips_using_dsp */
+ .align 2
L(wordloop):
STRCMP32(0)
DELAY_READ
@@ -113,112 +126,143 @@ L(wordloop):
STRCMP32(20)
DELAY_READ
STRCMP32(24)
- DELAY_READ
- STRCMP32(28)
+ lw a2, 28(a0)
+ lw a3, 28(a1)
+#if __mips_using_dsp
+ subu_s.qb t0, t8, a2
+#else
+ subu t0, a2, t8
+ nor t1, a2, t9
+ and t1, t0, t1
+#endif
+
PTR_ADDIU a0, a0, 32
- b L(wordloop)
+ bne a2, a3, L(worddiff)
PTR_ADDIU a1, a1, 32
+ beq t1, zero, L(wordloop)
L(returnzero):
- j ra
move v0, zero
+ jr ra
+ .align 2
L(worddiff):
#ifdef USE_CLZ
- subu t0, v0, t8
- nor t1, v0, t9
- and t1, t0, t1
- xor t0, v0, v1
+ xor t0, a2, a3
or t0, t0, t1
# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
wsbh t0, t0
rotr t0, t0, 16
-# endif
+# endif /* LITTLE_ENDIAN */
clz t1, t0
- and t1, 0xf8
-# if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
- neg t1
- addu t1, 24
+ or t0, t1, 24 /* Only care about multiples of 8. */
+ xor t1, t1, t0 /* {0,8,16,24} => {24,16,8,0} */
+# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ sllv a2,a2,t1
+ sllv a3,a3,t1
+# else
+ srlv a2,a2,t1
+ srlv a3,a3,t1
# endif
- rotrv v0, v0, t1
- rotrv v1, v1, t1
- and v0, v0, 0xff
- and v1, v1, 0xff
- j ra
- subu v0, v0, v1
+ subu v0, a2, a3
+ jr ra
#else /* USE_CLZ */
# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
- andi t0, v0, 0xff
- beq t0, zero, L(wexit01)
- andi t1, v1, 0xff
- bne t0, t1, L(wexit01)
-
- srl t8, v0, 8
- srl t9, v1, 8
- andi t8, t8, 0xff
+ andi a0, a2, 0xff /* abcd => d */
+ andi a1, a3, 0xff
+ beq a0, zero, L(wexit01)
+# if USE_EXT
+ ext t8, a2, 8, 8
+ bne a0, a1, L(wexit01)
+ ext t9, a3, 8, 8
beq t8, zero, L(wexit89)
+ ext a0, a2, 16, 8
+ bne t8, t9, L(wexit89)
+ ext a1, a3, 16, 8
+# else /* !USE_EXT */
+ srl t8, a2, 8
+ bne a0, a1, L(wexit01)
+ srl t9, a3, 8
+ andi t8, t8, 0xff
andi t9, t9, 0xff
+ beq t8, zero, L(wexit89)
+ srl a0, a2, 16
bne t8, t9, L(wexit89)
+ srl a1, a3, 16
+ andi a0, a0, 0xff
+ andi a1, a1, 0xff
+# endif /* !USE_EXT */
- srl t0, v0, 16
- srl t1, v1, 16
- andi t0, t0, 0xff
- beq t0, zero, L(wexit01)
- andi t1, t1, 0xff
- bne t0, t1, L(wexit01)
-
- srl t8, v0, 24
- srl t9, v1, 24
# else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
- srl t0, v0, 24
- beq t0, zero, L(wexit01)
- srl t1, v1, 24
- bne t0, t1, L(wexit01)
+ srl a0, a2, 24 /* abcd => a */
+ srl a1, a3, 24
+ beq a0, zero, L(wexit01)
- srl t8, v0, 16
- srl t9, v1, 16
- andi t8, t8, 0xff
+# if USE_EXT
+ ext t8, a2, 16, 8
+ bne a0, a1, L(wexit01)
+ ext t9, a3, 16, 8
beq t8, zero, L(wexit89)
+ ext a0, a2, 8, 8
+ bne t8, t9, L(wexit89)
+ ext a1, a3, 8, 8
+# else /* ! USE_EXT */
+ srl t8, a2, 8
+ bne a0, a1, L(wexit01)
+ srl t9, a3, 8
+ andi t8, t8, 0xff
andi t9, t9, 0xff
+ beq t8, zero, L(wexit89)
+ srl a0, a2, 16
bne t8, t9, L(wexit89)
+ srl a1, a3, 16
+ andi a0, a0, 0xff
+ andi a1, a1, 0xff
+# endif /* USE_EXT */
- srl t0, v0, 8
- srl t1, v1, 8
- andi t0, t0, 0xff
- beq t0, zero, L(wexit01)
- andi t1, t1, 0xff
- bne t0, t1, L(wexit01)
-
- andi t8, v0, 0xff
- andi t9, v1, 0xff
# endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
+ beq a0, zero, L(wexit01)
+ bne a0, a1, L(wexit01)
+
+ /* The other bytes are identical, so just subract the 2 words
+ and return the difference. */
+ move a0, a2
+ move a1, a3
+
+L(wexit01):
+ subu v0, a0, a1
+ jr ra
+
L(wexit89):
- j ra
subu v0, t8, t9
-L(wexit01):
- j ra
- subu v0, t0, t1
+ jr ra
+
#endif /* USE_CLZ */
+#define DELAY_NOP nop
+
/* It might seem better to do the 'beq' instruction between the two 'lbu'
instructions so that the nop is not needed but testing showed that this
code is actually faster (based on glibc strcmp test). */
-#define BYTECMP01(OFFSET) \
- lbu v0, OFFSET(a0); \
- lbu v1, OFFSET(a1); \
- beq v0, zero, L(bexit01); \
- nop; \
- bne v0, v1, L(bexit01)
-
-#define BYTECMP89(OFFSET) \
- lbu t8, OFFSET(a0); \
+
+#define BYTECMP01(OFFSET) \
+ lbu a3, OFFSET(a1); \
+ DELAY_NOP; \
+ beq a2, zero, L(bexit01); \
+ lbu t8, OFFSET+1(a0); \
+ bne a2, a3, L(bexit01)
+
+#define BYTECMP89(OFFSET) \
lbu t9, OFFSET(a1); \
+ DELAY_NOP; \
beq t8, zero, L(bexit89); \
- nop; \
+ lbu a2, OFFSET+1(a0); \
bne t8, t9, L(bexit89)
+ .align 2
L(byteloop):
+ lbu a2, 0(a0)
BYTECMP01(0)
BYTECMP89(1)
BYTECMP01(2)
@@ -226,20 +270,22 @@ L(byteloop):
BYTECMP01(4)
BYTECMP89(5)
BYTECMP01(6)
- BYTECMP89(7)
+ lbu t9, 7(a1)
+
PTR_ADDIU a0, a0, 8
- b L(byteloop)
+ beq t8, zero, L(bexit89)
PTR_ADDIU a1, a1, 8
+ beq t8, t9, L(byteloop)
-L(bexit01):
- j ra
- subu v0, v0, v1
L(bexit89):
- j ra
subu v0, t8, t9
+ jr ra
+
+L(bexit01):
+ subu v0, a2, a3
+ jr ra
.set at
- .set reorder
END(STRCMP_NAME)
#ifndef ANDROID_CHANGES
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 06/11] Fix prefetching beyond copied memory
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (5 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 05/11] Add optimized assembly for strcmp Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 07/11] Fix strcmp bug for little endian target Aleksandar Rakic
` (4 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
GTM18-287/PP118771: memcpy prefetches beyond copied memory.
Fix prefetching in the core loop so that it cannot reach beyond the
memory region being operated on. Revert the accidentally changed
prefetch hint back to streaming mode. Refactor various bits and add
preprocessor checks so that the tuning parameters can be overridden
from the compiler command line.
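A minimal sketch of the look-ahead arithmetic, assuming the LATENCY_CYCLES
and BLOCK_CYCLES values that appear in the patch; the 64-byte block size and
the copy_blocks helper are purely illustrative, not glibc code:

  #include <stdio.h>

  #define LATENCY_CYCLES 63      /* assumed DRAM fetch latency */
  #define BLOCK_CYCLES   15      /* assumed cycles per unrolled block */
  /* Look-ahead = ceil (latency / block-cycles).  */
  #define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \
                      + ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1))

  static void
  copy_blocks (const char *src, unsigned long blocks)
  {
    for (; blocks > 0; blocks--)
      {
        /* Conservative guard from the patch: prefetch only while at least
           PREF_AHEAD blocks remain, so the prefetch address never runs
           past the end of the source region.  */
        if (blocks > PREF_AHEAD)
          __builtin_prefetch (src + PREF_AHEAD * 64, 0, 1);
        /* ... copy one 64-byte block here ... */
        src += 64;
      }
  }

  int
  main (void)
  {
    char buf[64 * 32] = { 0 };
    copy_blocks (buf, 32);
    printf ("look-ahead = %d blocks\n", PREF_AHEAD);  /* ceil (63/15) = 5 */
    return 0;
  }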
Cherry-picked 132e0bbbbed01f95ec88b68b5f7f2056f6125531
from https://github.com/MIPS/glibc
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/mips/memcpy.c | 188 +++++++++++++++++++++++++-----------------
1 file changed, 111 insertions(+), 77 deletions(-)
diff --git a/sysdeps/mips/memcpy.c b/sysdeps/mips/memcpy.c
index 8c3aec7b36..798e991f6d 100644
--- a/sysdeps/mips/memcpy.c
+++ b/sysdeps/mips/memcpy.c
@@ -1,37 +1,29 @@
-/*
- * Copyright (C) 2024 MIPS Tech, LLC
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are met:
- *
- * 1. Redistributions of source code must retain the above copyright notice,
- * this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright notice,
- * this list of conditions and the following disclaimer in the documentation
- * and/or other materials provided with the distribution.
- * 3. Neither the name of the copyright holder nor the names of its
- * contributors may be used to endorse or promote products derived from this
- * software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
-*/
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Wave Computing
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <http://www.gnu.org/licenses/>. */
#ifdef __GNUC__
#undef memcpy
/* Typical observed latency in cycles in fetching from DRAM. */
-#define LATENCY_CYCLES 63
+#ifndef LATENCY_CYCLES
+ #define LATENCY_CYCLES 63
+#endif
/* Pre-fetch performance is subject to accurate prefetch ahead,
which in turn depends on both the cache-line size and the amount
@@ -48,30 +40,42 @@
#define LATENCY_CYCLES 150
#elif defined(_MIPS_TUNE_I6400) || defined(_MIPS_TUNE_I6500)
#define CACHE_LINE 64
- #define BLOCK_CYCLES 16
+ #define BLOCK_CYCLES 15
#elif defined(_MIPS_TUNE_P6600)
#define CACHE_LINE 32
- #define BLOCK_CYCLES 12
+ #define BLOCK_CYCLES 15
#elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2)
#define CACHE_LINE 32
#define BLOCK_CYCLES 30
#else
- #define CACHE_LINE 32
- #define BLOCK_CYCLES 11
+ #ifndef CACHE_LINE
+ #define CACHE_LINE 32
+ #endif
+ #ifndef BLOCK_CYCLES
+ #ifdef __nanomips__
+ #define BLOCK_CYCLES 20
+ #else
+ #define BLOCK_CYCLES 11
+ #endif
+ #endif
#endif
/* Pre-fetch look ahead = ceil (latency / block-cycles) */
#define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \
+ ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1))
-/* Unroll-factor, controls how many words at a time in the core loop. */
-#define BLOCK (CACHE_LINE == 128 ? 16 : 8)
+/* The unroll-factor controls how many words at a time in the core loop. */
+#ifndef BLOCK_SIZE
+ #define BLOCK_SIZE (CACHE_LINE == 128 ? 16 : 8)
+#elif BLOCK_SIZE != 8 && BLOCK_SIZE != 16
+ #error "BLOCK_SIZE must be 8 or 16"
+#endif
#define __overloadable
#if !defined(UNALIGNED_INSTR_SUPPORT)
/* does target have unaligned lw/ld/ualw/uald instructions? */
#define UNALIGNED_INSTR_SUPPORT 0
-#if (__mips_isa_rev < 6 && !defined(__mips1))
+#if (__mips_isa_rev < 6 && !defined(__mips1)) || defined(__nanomips__)
#undef UNALIGNED_INSTR_SUPPORT
#define UNALIGNED_INSTR_SUPPORT 1
#endif
@@ -79,17 +83,35 @@
#if !defined(HW_UNALIGNED_SUPPORT)
/* Does target have hardware support for unaligned accesses? */
#define HW_UNALIGNED_SUPPORT 0
- #if __mips_isa_rev >= 6
+ #if __mips_isa_rev >= 6 && !defined(__nanomips__)
#undef HW_UNALIGNED_SUPPORT
#define HW_UNALIGNED_SUPPORT 1
#endif
#endif
-#define ENABLE_PREFETCH 1
+
+#ifndef ENABLE_PREFETCH
+ #define ENABLE_PREFETCH 1
+#endif
+
+#ifndef ENABLE_PREFETCH_CHECK
+ #define ENABLE_PREFETCH_CHECK 0
+#endif
+
#if ENABLE_PREFETCH
- #define PREFETCH(addr) __builtin_prefetch (addr, 0, 0)
-#else
+ #if ENABLE_PREFETCH_CHECK
+#include <assert.h>
+static char *limit;
+#define PREFETCH(addr) \
+ do { \
+ assert ((char *)(addr) < limit); \
+ __builtin_prefetch ((addr), 0, 1); \
+ } while (0)
+#else /* ENABLE_PREFETCH_CHECK */
+ #define PREFETCH(addr) __builtin_prefetch (addr, 0, 1)
+ #endif /* ENABLE_PREFETCH_CHECK */
+#else /* ENABLE_PREFETCH */
#define PREFETCH(addr)
-#endif
+#endif /* ENABLE_PREFETCH */
#include <string.h>
@@ -99,17 +121,18 @@ typedef struct
{
reg_t B0:8, B1:8, B2:8, B3:8, B4:8, B5:8, B6:8, B7:8;
} bits_t;
-#else
+#else /* __mips64 */
typedef unsigned long reg_t;
typedef struct
{
reg_t B0:8, B1:8, B2:8, B3:8;
} bits_t;
-#endif
+#endif /* __mips64 */
-#define CACHE_LINES_PER_BLOCK ((BLOCK * sizeof (reg_t) > CACHE_LINE) ? \
- (BLOCK * sizeof (reg_t) / CACHE_LINE) \
- : 1)
+#define CACHE_LINES_PER_BLOCK \
+ ((BLOCK_SIZE * sizeof (reg_t) > CACHE_LINE) \
+ ? (BLOCK_SIZE * sizeof (reg_t) / CACHE_LINE) \
+ : 1)
typedef union
{
@@ -120,7 +143,7 @@ typedef union
#define DO_BYTE(a, i) \
a[i] = bw.b.B##i; \
len--; \
- if(!len) return ret; \
+ if (!len) return ret; \
/* This code is called when aligning a pointer, there are remaining bytes
after doing word compares, or architecture does not have some form
@@ -148,7 +171,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
{
unsigned char *x = (unsigned char *) a;
bitfields_t bw;
- if(len > 0)
+ if (len > 0)
{
bw.v = *(reg_t *)b;
DO_BYTE(x, 0);
@@ -159,7 +182,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
DO_BYTE(x, 4);
DO_BYTE(x, 5);
DO_BYTE(x, 6);
-#endif
+#endif /* __mips64 */
}
return ret;
}
@@ -170,7 +193,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
{
/* Use a set-back so that load/stores have incremented addresses in
order to promote bonding. */
- int off = (BLOCK - words);
+ int off = (BLOCK_SIZE - words);
a -= off;
b -= off;
switch (off)
@@ -182,7 +205,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
case 5: a[5] = b[5]; // Fall through
case 6: a[6] = b[6]; // Fall through
case 7: a[7] = b[7]; // Fall through
-#if BLOCK==16
+#if BLOCK_SIZE==16
case 8: a[8] = b[8]; // Fall through
case 9: a[9] = b[9]; // Fall through
case 10: a[10] = b[10]; // Fall through
@@ -191,9 +214,9 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
case 13: a[13] = b[13]; // Fall through
case 14: a[14] = b[14]; // Fall through
case 15: a[15] = b[15];
-#endif
+#endif /* BLOCK_SIZE==16 */
}
- return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+ return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret);
}
#if !HW_UNALIGNED_SUPPORT
@@ -210,7 +233,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
{
/* Use a set-back so that load/stores have incremented addresses in
order to promote bonding. */
- int off = (BLOCK - words);
+ int off = (BLOCK_SIZE - words);
a -= off;
b -= off;
switch (off)
@@ -222,7 +245,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
case 5: a[5].uli = b[5]; // Fall through
case 6: a[6].uli = b[6]; // Fall through
case 7: a[7].uli = b[7]; // Fall through
-#if BLOCK==16
+#if BLOCK_SIZE==16
case 8: a[8].uli = b[8]; // Fall through
case 9: a[9].uli = b[9]; // Fall through
case 10: a[10].uli = b[10]; // Fall through
@@ -231,9 +254,9 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
case 13: a[13].uli = b[13]; // Fall through
case 14: a[14].uli = b[14]; // Fall through
case 15: a[15].uli = b[15];
-#endif
+#endif /* BLOCK_SIZE==16 */
}
- return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+ return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret);
}
/* The first pointer is not aligned while second pointer is. */
@@ -242,13 +265,19 @@ unaligned_words (struct ulw *a, const reg_t * b,
unsigned long words, unsigned long bytes, void *ret)
{
unsigned long i, words_by_block, words_by_1;
- words_by_1 = words % BLOCK;
- words_by_block = words / BLOCK;
+ words_by_1 = words % BLOCK_SIZE;
+ words_by_block = words / BLOCK_SIZE;
+
for (; words_by_block > 0; words_by_block--)
{
- if (words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+ /* This condition is deliberately conservative. One could theoretically
+ pre-fetch another time around in some cases without crossing the page
+ boundary at the limit, but checking for the right conditions here is
+ too expensive to be worth it. */
+ if (words_by_block > PREF_AHEAD)
for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
- PREFETCH (b + (BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i));
+ PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK)
+ * (PREF_AHEAD + i)));
reg_t y0 = b[0], y1 = b[1], y2 = b[2], y3 = b[3];
reg_t y4 = b[4], y5 = b[5], y6 = b[6], y7 = b[7];
@@ -260,7 +289,7 @@ unaligned_words (struct ulw *a, const reg_t * b,
a[5].uli = y5;
a[6].uli = y6;
a[7].uli = y7;
-#if BLOCK==16
+#if BLOCK_SIZE==16
y0 = b[8], y1 = b[9], y2 = b[10], y3 = b[11];
y4 = b[12], y5 = b[13], y6 = b[14], y7 = b[15];
a[8].uli = y0;
@@ -271,16 +300,16 @@ unaligned_words (struct ulw *a, const reg_t * b,
a[13].uli = y5;
a[14].uli = y6;
a[15].uli = y7;
-#endif
- a += BLOCK;
- b += BLOCK;
+#endif /* BLOCK_SIZE==16 */
+ a += BLOCK_SIZE;
+ b += BLOCK_SIZE;
}
/* Mop up any remaining bytes. */
return do_uwords_remaining (a, b, words_by_1, bytes, ret);
}
-#else
+#else /* !UNALIGNED_INSTR_SUPPORT */
/* No HW support or unaligned lw/ld/ualw/uald instructions. */
static void *
@@ -320,13 +349,15 @@ aligned_words (reg_t * a, const reg_t * b,
unsigned long words, unsigned long bytes, void *ret)
{
unsigned long i, words_by_block, words_by_1;
- words_by_1 = words % BLOCK;
- words_by_block = words / BLOCK;
+ words_by_1 = words % BLOCK_SIZE;
+ words_by_block = words / BLOCK_SIZE;
+
for (; words_by_block > 0; words_by_block--)
{
- if(words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+ if (words_by_block > PREF_AHEAD)
for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
- PREFETCH (b + ((BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i)));
+ PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK)
+ * (PREF_AHEAD + i)));
reg_t x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3];
reg_t x4 = b[4], x5 = b[5], x6 = b[6], x7 = b[7];
@@ -338,7 +369,7 @@ aligned_words (reg_t * a, const reg_t * b,
a[5] = x5;
a[6] = x6;
a[7] = x7;
-#if BLOCK==16
+#if BLOCK_SIZE==16
x0 = b[8], x1 = b[9], x2 = b[10], x3 = b[11];
x4 = b[12], x5 = b[13], x6 = b[14], x7 = b[15];
a[8] = x0;
@@ -349,9 +380,9 @@ aligned_words (reg_t * a, const reg_t * b,
a[13] = x5;
a[14] = x6;
a[15] = x7;
-#endif
- a += BLOCK;
- b += BLOCK;
+#endif /* BLOCK_SIZE==16 */
+ a += BLOCK_SIZE;
+ b += BLOCK_SIZE;
}
/* mop up any remaining bytes. */
@@ -363,13 +394,16 @@ memcpy (void *a, const void *b, size_t len) __overloadable
{
unsigned long bytes, words, i;
void *ret = a;
+#if ENABLE_PREFETCH_CHECK
+ limit = (char *)b + len;
+#endif /* ENABLE_PREFETCH_CHECK */
/* shouldn't hit that often. */
if (len <= 8)
return do_bytes (a, b, len, a);
/* Start pre-fetches ahead of time. */
- if (len > CACHE_LINE * (PREF_AHEAD - 1))
- for (i = 1; i < PREF_AHEAD - 1; i++)
+ if (len > CACHE_LINE * PREF_AHEAD)
+ for (i = 1; i < PREF_AHEAD; i++)
PREFETCH ((char *)b + CACHE_LINE * i);
else
for (i = 1; i < len / CACHE_LINE; i++)
@@ -400,12 +434,12 @@ memcpy (void *a, const void *b, size_t len) __overloadable
#if HW_UNALIGNED_SUPPORT
/* treat possible unaligned first pointer as aligned. */
return aligned_words (a, b, words, bytes, ret);
-#else
+#else /* !HW_UNALIGNED_SUPPORT */
if (((unsigned long) a) % sizeof (reg_t) == 0)
return aligned_words (a, b, words, bytes, ret);
/* need to use unaligned instructions on first pointer. */
return unaligned_words (a, b, words, bytes, ret);
-#endif
+#endif /* HW_UNALIGNED_SUPPORT */
}
libc_hidden_builtin_def (memcpy)
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 07/11] Fix strcmp bug for little endian target
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (6 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 06/11] Fix prefetching beyond copied memory Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 16:20 ` Joseph Myers
2025-01-23 18:23 ` Adhemerval Zanella Netto
2025-01-23 13:43 ` [PATCH 08/11] Add script to run tests through a qemu wrapper Aleksandar Rakic
` (3 subsequent siblings)
11 siblings, 2 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
Strcmp gives an incorrect result for little-endian targets under
the following conditions:
1. The length of the first string is 1 less than a multiple of 4
   (i.e. len % 4 == 3).
2. The first string is a prefix of the second string.
3. The first differing character in the second string is extended
   ASCII (that is, > 127).
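A reproduction sketch for these conditions (illustrative only, not part of
the patch or the glibc testsuite): s1 has length 3 (len % 4 == 3), s1 is a
prefix of s2, and the first differing byte of s2 is 0x80.  Since strcmp
compares bytes as unsigned char, the result must be negative here:

  #include <assert.h>
  #include <stdio.h>
  #include <string.h>

  int
  main (void)
  {
    /* Word-aligned buffers so the word-at-a-time loop is exercised.  */
    static const char s1[8] __attribute__ ((aligned (4))) = "abc";
    static const char s2[8] __attribute__ ((aligned (4))) = "abc\x80";

    int r = strcmp (s1, s2);
    printf ("strcmp (\"abc\", \"abc\\x80\") = %d\n", r);
    assert (r < 0);   /* '\0' (0) must compare below 0x80 (128).  */
    return 0;
  }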
Cherry-picked 7c709e878f836069bbdbf42979937794623cfa68
from https://github.com/MIPS/glibc
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/mips/strcmp.S | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/sysdeps/mips/strcmp.S b/sysdeps/mips/strcmp.S
index 4878cd3aac..8d1bab12ec 100644
--- a/sysdeps/mips/strcmp.S
+++ b/sysdeps/mips/strcmp.S
@@ -225,10 +225,13 @@ L(worddiff):
beq a0, zero, L(wexit01)
bne a0, a1, L(wexit01)
- /* The other bytes are identical, so just subract the 2 words
- and return the difference. */
+# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ srl a0, a2, 24
+ srl a1, a3, 24
+# else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
move a0, a2
move a1, a3
+# endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
L(wexit01):
subu v0, a0, a1
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 08/11] Add script to run tests through a qemu wrapper
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (7 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 07/11] Fix strcmp bug for little endian target Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch Aleksandar Rakic
` (2 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
GTM19-545: Add script to run tests through a qemu wrapper
Cherry-picked 9f9923a4f14406026426d857acf9c2babe2908bf
from https://github.com/MIPS/glibc
Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
scripts/cross-test-qemu.sh | 152 +++++++++++++++++++++++++++++++++++++
1 file changed, 152 insertions(+)
create mode 100755 scripts/cross-test-qemu.sh
diff --git a/scripts/cross-test-qemu.sh b/scripts/cross-test-qemu.sh
new file mode 100755
index 0000000000..7636414141
--- /dev/null
+++ b/scripts/cross-test-qemu.sh
@@ -0,0 +1,152 @@
+#!/bin/bash
+# Run a testcase on a remote system, via qemu.
+# Copyright (C) 2024 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# usage: cross-test-qemu.sh [OPTIONS] EMULATOR COMMAND ...
+# Run with --help flag to get more detailed help.
+
+progname="$(basename $0)"
+
+usage="usage: ${progname} [--timeoutfactor N] [--addon-libpath PATH] EMULATOR COMMAND ..."
+timeoutfactor=$TIMEOUTFACTOR
+addon_libpath=""
+while [ $# -gt 0 ]; do
+ case "$1" in
+
+ "--timeoutfactor")
+ shift
+ if [ $# -lt 1 ]; then
+ break
+ fi
+ timeoutfactor="$1"
+ ;;
+
+ "--addon-libpath")
+ shift
+ if [ $# -lt 1 ]; then
+ break
+ fi
+ addon_libpath="$1"
+ ;;
+
+ "--help")
+ echo "$usage"
+ echo "$help"
+ exit 0
+ ;;
+
+ *)
+ break
+ ;;
+ esac
+ shift
+done
+
+if [ $# -lt 1 ]; then
+ echo "$usage" >&2
+ echo "Type '${progname} --help' for more detailed help." >&2
+ exit 1
+fi
+
+emulator="$1"; shift
+envpat="[:alpha:]*=.*"
+ldpat=".*/.*ld.*\.so.*"
+lgccpat="libgcc_s.so.1"
+libpat="--library-path"
+ldpath=""
+lgccpath=""
+envlist=""
+liblist=""
+command=""
+toolchain=`dirname \`dirname $emulator\``
+target=`ls $toolchain | grep -e linux-gnu`
+# Print the sequence of arguments as strings properly quoted for the
+# Bourne shell, separated by spaces.
+bourne_quote ()
+{
+ local arg qarg libflag variant
+ libflag=0
+
+ for arg in $@; do
+ if [ "x$done" != "x" ]; then
+ command="$command $arg"
+ elif [[ $arg =~ $envpat ]]; then
+ if [ -z $envlist ]; then
+ envlist="$arg"
+ else
+ envlist="$arg,$envlist"
+ fi
+ elif [[ $arg =~ $ldpat ]]; then
+ ldfile=`basename $arg`
+ variant=`basename \`dirname \\\`dirname $arg\\\`\``
+ libdir=${variant##*_}
+ variant=${variant%_*}
+ variant=${variant#obj_}
+ ldpath=$toolchain/sysroot/$variant
+ if [ ! -f $ldpath/$libdir/$ldfile ]; then
+ ldpath=`dirname $arg`
+ fi
+ lgccpath=$toolchain/$target/lib/$variant/$libdir
+ liblist="$ldpath:$lgccpath:$liblist"
+ elif [[ $arg =~ $libpat ]]; then
+ libflag=1
+ elif [ $libflag -ne 0 ]; then
+ liblist="$arg:$liblist"
+ libflag=0
+ elif [ "x$arg" != "xenv" ]; then
+ if [[ $arg =~ "tst-" ]]; then
+ if [ -f $arg ]; then
+ done=1
+ fi
+ fi
+ command="$command $arg"
+ fi
+ done
+}
+
+# Transform the current argument list into a properly quoted Bourne shell
+# command string.
+bourne_quote "$@"
+
+liblist=$addon_libpath:$liblist
+liblist=`tr -s : <<< $liblist`
+liblist=${liblist#:*}
+liblist=${liblist%*:}
+
+if [ "x$liblist" != "x" ]; then
+ LIBPATH_OPT="-E LD_LIBRARY_PATH=$liblist"
+fi
+
+if [ "x$envlist" != "x" ]; then
+ ENV_OPT="-E $envlist"
+fi
+
+if [ "x$ldpath" != "x" ]; then
+ LDPATH_OPT="-L $ldpath"
+fi
+
+if [ "x$timeoutfactor" != "x" ]; then
+ $emulator $LDPATH_OPT $LIBPATH_OPT $ENV_OPT $command &
+ pid=$!
+ trap "kill -SIGINT $pid" SIGALRM
+ sleep $timeoutfactor && kill -SIGALRM $$
+ exit 1
+else
+ $emulator $LDPATH_OPT $LIBPATH_OPT $ENV_OPT $command
+fi
+
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (8 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 08/11] Add script to run tests through a qemu wrapper Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 16:16 ` Joseph Myers
2025-01-23 13:43 ` [PATCH 10/11] Avoid GCC 11 warning from -Wmaybe-uninitialized Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 11/11] Prevent turning memset into self-recursion Aleksandar Rakic
11 siblings, 1 reply; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu
Avoid GCC 11 warning from -Wbuiltin-declaration-mismatch for modfl and
sincosl under MIPS o32 ABI.
Cherry-picked 056065bbe644d396a6fadd7c759f91bba1855bd6
from https://github.com/MIPS/glibc
Signed-off-by: Chao-ying Fu <cfu@mips.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/ieee754/dbl-64/s_modf.c | 4 ++++
sysdeps/ieee754/dbl-64/s_sincos.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/sysdeps/ieee754/dbl-64/s_modf.c b/sysdeps/ieee754/dbl-64/s_modf.c
index 0de2084caf..eda2d65b51 100644
--- a/sysdeps/ieee754/dbl-64/s_modf.c
+++ b/sysdeps/ieee754/dbl-64/s_modf.c
@@ -23,6 +23,7 @@
#include <math_private.h>
#include <libm-alias-double.h>
#include <stdint.h>
+#include <libc-diag.h>
static const double one = 1.0;
@@ -60,5 +61,8 @@ __modf(double x, double *iptr)
}
}
#ifndef __modf
+DIAG_PUSH_NEEDS_COMMENT;
+DIAG_IGNORE_NEEDS_COMMENT (11, "-Wbuiltin-declaration-mismatch");
libm_alias_double (__modf, modf)
+DIAG_POP_NEEDS_COMMENT;
#endif
diff --git a/sysdeps/ieee754/dbl-64/s_sincos.c b/sysdeps/ieee754/dbl-64/s_sincos.c
index adbc57af28..531940d4c8 100644
--- a/sysdeps/ieee754/dbl-64/s_sincos.c
+++ b/sysdeps/ieee754/dbl-64/s_sincos.c
@@ -23,6 +23,7 @@
#include <fenv_private.h>
#include <math-underflow.h>
#include <libm-alias-double.h>
+#include <libc-diag.h>
#ifndef SECTION
# define SECTION
@@ -106,5 +107,8 @@ __sincos (double x, double *sinx, double *cosx)
*sinx = *cosx = x / x;
}
#ifndef __sincos
+DIAG_PUSH_NEEDS_COMMENT;
+DIAG_IGNORE_NEEDS_COMMENT (11, "-Wbuiltin-declaration-mismatch");
libm_alias_double (__sincos, sincos)
+DIAG_POP_NEEDS_COMMENT;
#endif
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 10/11] Avoid GCC 11 warning from -Wmaybe-uninitialized
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (9 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 11/11] Prevent turning memset into self-recursion Aleksandar Rakic
11 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu
Cherry-picked 4dad697124b3bc82d9f4fbad62f30224216ab996
from https://github.com/MIPS/glibc
Signed-off-by: Chao-ying Fu <cfu@mips.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/ieee754/soft-fp/s_fdiv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sysdeps/ieee754/soft-fp/s_fdiv.c b/sysdeps/ieee754/soft-fp/s_fdiv.c
index 8c92aa6fb2..d02da4ca71 100644
--- a/sysdeps/ieee754/soft-fp/s_fdiv.c
+++ b/sysdeps/ieee754/soft-fp/s_fdiv.c
@@ -35,6 +35,7 @@
may be where the macro is defined. This happens only with -O1. */
DIAG_PUSH_NEEDS_COMMENT;
DIAG_IGNORE_NEEDS_COMMENT (8, "-Wmaybe-uninitialized");
+DIAG_IGNORE_NEEDS_COMMENT (11, "-Wmaybe-uninitialized");
#include <soft-fp.h>
#include <single.h>
#include <double.h>
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 11/11] Prevent turning memset into self-recursion
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
` (10 preceding siblings ...)
2025-01-23 13:43 ` [PATCH 10/11] Avoid GCC 11 warning from -Wmaybe-uninitialized Aleksandar Rakic
@ 2025-01-23 13:43 ` Aleksandar Rakic
2025-01-23 16:19 ` Joseph Myers
11 siblings, 1 reply; 17+ messages in thread
From: Aleksandar Rakic @ 2025-01-23 13:43 UTC (permalink / raw)
To: libc-alpha; +Cc: aleksandar.rakic, djordje.todorovic, cfu, Dragan Mladjenovic
Prevent GCC 11 from turning memset into self-recursion.
GCC 11 transforms the byte-by-byte set loop in memset.c into a call
to memset, causing runtime failures. Apply -fno-builtin to both
memset.c and memcpy.c to prevent similar bugs in the future.
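An illustrative sketch of the failure mode; toy_memset is a made-up
function, not glibc's memset.c.  At -O2 the -ftree-loop-distribute-patterns
pass may recognize the byte loop and replace it with a call to memset, and
when the surrounding file is memset.c itself that call becomes
self-recursion at run time.  Building with -fno-builtin (this patch)
suppresses the transformation:

  void *
  toy_memset (void *dst, int c, unsigned long len)
  {
    unsigned char *p = dst;
    while (len--)
      *p++ = (unsigned char) c;   /* GCC can recognize this as a memset idiom */
    return dst;
  }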
Cherry-picked 31906b3556bc18cfdb7a3d84a669d95486450704
from https://github.com/MIPS/glibc
Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
---
sysdeps/mips/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/sysdeps/mips/Makefile b/sysdeps/mips/Makefile
index 17ddc2a97c..4464d73902 100644
--- a/sysdeps/mips/Makefile
+++ b/sysdeps/mips/Makefile
@@ -24,6 +24,9 @@ ASFLAGS-.o += $(pie-default)
ASFLAGS-.op += $(pie-default)
ASFLAGS += -O2
+CFLAGS-memset.c += -fno-builtin
+CFLAGS-memcpy.c += -fno-builtin
+
ifeq ($(subdir),elf)
# These tests fail on all mips configurations (BZ 29404)
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch
2025-01-23 13:43 ` [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch Aleksandar Rakic
@ 2025-01-23 16:16 ` Joseph Myers
0 siblings, 0 replies; 17+ messages in thread
From: Joseph Myers @ 2025-01-23 16:16 UTC (permalink / raw)
To: Aleksandar Rakic; +Cc: libc-alpha, aleksandar.rakic, djordje.todorovic, cfu
On Thu, 23 Jan 2025, Aleksandar Rakic wrote:
> Avoid GCC 11 warning from -Wbuiltin-declaration-mismatch for modfl and
> sincosl under MIPS o32 ABI.
This should not be needed. math/Makefile has
CFLAGS-s_modf.c += -fno-builtin-modfl
CFLAGS-s_sincos.c += -fno-builtin-sincosl
which are supposed to avoid such warnings. (It wouldn't surprise me if
we're missing some such -fno-builtin-* for functions not currently
supported as built-in functions in GCC, but if such built-in functions get
added in future and result in glibc build failures, we can add the
corresponding options that were previously missed - the options work fine
even when there is no such built-in function.)
--
Joseph S. Myers
josmyers@redhat.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 11/11] Prevent turning memset into self-recursion
2025-01-23 13:43 ` [PATCH 11/11] Prevent turning memset into self-recursion Aleksandar Rakic
@ 2025-01-23 16:19 ` Joseph Myers
0 siblings, 0 replies; 17+ messages in thread
From: Joseph Myers @ 2025-01-23 16:19 UTC (permalink / raw)
To: Aleksandar Rakic
Cc: libc-alpha, aleksandar.rakic, djordje.todorovic, cfu, Dragan Mladjenovic
On Thu, 23 Jan 2025, Aleksandar Rakic wrote:
> Prevent GCC 11 from turning memset into self-recursion.
> GCC 11 transforms the byte-by-byte set loop in memset.c into a call
> to memset, causing runtime failures. Apply -fno-builtin to both
> memset.c and memcpy.c to prevent similar bugs in the future.
We use inhibit_loop_to_libcall to provide __attribute__ ((__optimize__
("-fno-tree-loop-distribute-patterns"))) in such cases.
--
Joseph S. Myers
josmyers@redhat.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 07/11] Fix strcmp bug for little endian target
2025-01-23 13:43 ` [PATCH 07/11] Fix strcmp bug for little endian target Aleksandar Rakic
@ 2025-01-23 16:20 ` Joseph Myers
2025-01-23 18:23 ` Adhemerval Zanella Netto
1 sibling, 0 replies; 17+ messages in thread
From: Joseph Myers @ 2025-01-23 16:20 UTC (permalink / raw)
To: Aleksandar Rakic
Cc: libc-alpha, aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
On Thu, 23 Jan 2025, Aleksandar Rakic wrote:
> Strcmp gives an incorrect result for little-endian targets under
> the following conditions:
> 1. The length of the first string is 1 less than a multiple of 4
>    (i.e. len % 4 == 3).
> 2. The first string is a prefix of the second string.
> 3. The first differing character in the second string is extended
>    ASCII (that is, > 127).
Is there a test in the glibc testsuite that fails before and passes after
this patch? If not, one should be added.
--
Joseph S. Myers
josmyers@redhat.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 07/11] Fix strcmp bug for little endian target
2025-01-23 13:43 ` [PATCH 07/11] Fix strcmp bug for little endian target Aleksandar Rakic
2025-01-23 16:20 ` Joseph Myers
@ 2025-01-23 18:23 ` Adhemerval Zanella Netto
1 sibling, 0 replies; 17+ messages in thread
From: Adhemerval Zanella Netto @ 2025-01-23 18:23 UTC (permalink / raw)
To: Aleksandar Rakic, libc-alpha
Cc: aleksandar.rakic, djordje.todorovic, cfu, Faraz Shahbazker
On 23/01/25 10:43, Aleksandar Rakic wrote:
> Strcmp gives an incorrect result for little-endian targets under
> the following conditions:
> 1. The length of the first string is 1 less than a multiple of 4
>    (i.e. len % 4 == 3).
> 2. The first string is a prefix of the second string.
> 3. The first differing character in the second string is extended
>    ASCII (that is, > 127).
>
> Cherry-picked 7c709e878f836069bbdbf42979937794623cfa68
> from https://github.com/MIPS/glibc
>
> Signed-off-by: Faraz Shahbazker <fshahbazker@wavecomp.com>
> Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
> ---
> sysdeps/mips/strcmp.S | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/mips/strcmp.S b/sysdeps/mips/strcmp.S
> index 4878cd3aac..8d1bab12ec 100644
> --- a/sysdeps/mips/strcmp.S
> +++ b/sysdeps/mips/strcmp.S
> @@ -225,10 +225,13 @@ L(worddiff):
> beq a0, zero, L(wexit01)
> bne a0, a1, L(wexit01)
>
> - /* The other bytes are identical, so just subract the 2 words
> - and return the difference. */
> +# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> + srl a0, a2, 24
> + srl a1, a3, 24
> +# else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
> move a0, a2
> move a1, a3
> +# endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
>
> L(wexit01):
> subu v0, a0, a1
Can't you use the generic implementation instead? If I understand correctly,
mips optimizes only the aligned case, while the generic code also does
word-sized reads for the unaligned case (with the MERGE and shift tricks).
The only trick I see the mips implementation adding is loop unrolling, which
I think you could get by adding some compiler flags to the mips Makefile.
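For reference, a sketch of the zero-byte test that a word-at-a-time C
implementation typically relies on (32-bit constants shown; the 64-bit case
just widens them).  It is the same check the MIPS strcmp.S above performs
with the 0x01010101 and 0x7f7f7f7f constants in t8/t9:

  #include <assert.h>
  #include <stdint.h>

  static int
  has_zero_byte (uint32_t w)
  {
    /* Non-zero iff some byte of w is zero.  */
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
  }

  int
  main (void)
  {
    assert (has_zero_byte (0x61620063u));   /* word containing a NUL byte */
    assert (!has_zero_byte (0x61626364u));  /* "abcd", no NUL byte */
    return 0;
  }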
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-01-23 18:23 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-23 13:42 [PATCH 0/11] Improve Mips target Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 00/11] " Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 01/11] Updates for microMIPS Release 6 Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 02/11] Fix rtld link_map initialization issues Aleksandar Rakic
2025-01-23 13:42 ` [PATCH 03/11] Fix issues with removing no-reorder directives Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 04/11] Add C implementation of memcpy/memset Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 05/11] Add optimized assembly for strcmp Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 06/11] Fix prefetching beyond copied memory Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 07/11] Fix strcmp bug for little endian target Aleksandar Rakic
2025-01-23 16:20 ` Joseph Myers
2025-01-23 18:23 ` Adhemerval Zanella Netto
2025-01-23 13:43 ` [PATCH 08/11] Add script to run tests through a qemu wrapper Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch Aleksandar Rakic
2025-01-23 16:16 ` Joseph Myers
2025-01-23 13:43 ` [PATCH 10/11] Avoid GCC 11 warning from -Wmaybe-uninitialized Aleksandar Rakic
2025-01-23 13:43 ` [PATCH 11/11] Prevent turning memset into self-recursion Aleksandar Rakic
2025-01-23 16:19 ` Joseph Myers