public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations
@ 2019-09-30 14:55 Christos Gentsos
  2019-09-30 15:31 ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 5+ messages in thread
From: Christos Gentsos @ 2019-09-30 14:55 UTC (permalink / raw)
  To: newlib

The same pop instruction that is used to restore registers can be used
to return from the function (as it is already done in other function
implementations).
---
 newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
 newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
index 61a72581..a0aad852 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
@@ -49,9 +49,7 @@ __aeabi_memmove:
 	subs	r3, r3, #1
 	bcs	1b
 2:
-	pop	{r4}
-	pop	{r1}
-	bx	r1
+	pop	{r4, pc}
 3:
 	movs	r3, #0
 	cmp	r2, #0
diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
index aa8f2719..5bb80b20 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
@@ -110,9 +110,7 @@ __aeabi_memset:
 	cmp	r4, r3
 	bne	8b
 9:
-	pop	{r4, r5, r6}
-	pop	{r1}
-	bx	r1
+	pop	{r4, r5, r6, pc}
 10:
 	movs	r3, r0
 	movs	r4, r1
-- 
2.23.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations
  2019-09-30 14:55 [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations Christos Gentsos
@ 2019-09-30 15:31 ` Richard Earnshaw (lists)
  2019-09-30 15:37   ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-30 15:31 UTC (permalink / raw)
  To: Christos Gentsos, newlib

On 30/09/2019 15:55, Christos Gentsos wrote:
> The same pop instruction that is used to restore registers can be used
> to return from the function (as it is already done in other function
> implementations).
> ---
>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>   2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> index 61a72581..a0aad852 100644
> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> @@ -49,9 +49,7 @@ __aeabi_memmove:
>   	subs	r3, r3, #1
>   	bcs	1b
>   2:
> -	pop	{r4}
> -	pop	{r1}
> -	bx	r1
> +	pop	{r4, pc}
>   3:
>   	movs	r3, #0
>   	cmp	r2, #0
> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> index aa8f2719..5bb80b20 100644
> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> @@ -110,9 +110,7 @@ __aeabi_memset:
>   	cmp	r4, r3
>   	bne	8b
>   9:
> -	pop	{r4, r5, r6}
> -	pop	{r1}
> -	bx	r1
> +	pop	{r4, r5, r6, pc}
>   10:
>   	movs	r3, r0
>   	movs	r4, r1
> 

No.  That isn't interworking clean on armv4t, which we still need to 
support.

Sorry.

R.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations
  2019-09-30 15:31 ` Richard Earnshaw (lists)
@ 2019-09-30 15:37   ` Richard Earnshaw (lists)
  2019-09-30 17:17     ` Christos Gentsos
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-30 15:37 UTC (permalink / raw)
  To: Christos Gentsos, newlib

On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:
> On 30/09/2019 15:55, Christos Gentsos wrote:
>> The same pop instruction that is used to restore registers can be used
>> to return from the function (as it is already done in other function
>> implementations).
>> ---
>>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S 
>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> index 61a72581..a0aad852 100644
>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> @@ -49,9 +49,7 @@ __aeabi_memmove:
>>       subs    r3, r3, #1
>>       bcs    1b
>>   2:
>> -    pop    {r4}
>> -    pop    {r1}
>> -    bx    r1
>> +    pop    {r4, pc}
>>   3:
>>       movs    r3, #0
>>       cmp    r2, #0
>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S 
>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> index aa8f2719..5bb80b20 100644
>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> @@ -110,9 +110,7 @@ __aeabi_memset:
>>       cmp    r4, r3
>>       bne    8b
>>   9:
>> -    pop    {r4, r5, r6}
>> -    pop    {r1}
>> -    bx    r1
>> +    pop    {r4, r5, r6, pc}
>>   10:
>>       movs    r3, r0
>>       movs    r4, r1
>>
> 
> No.  That isn't interworking clean on armv4t, which we still need to 
> support.
> 
> Sorry.
> 
> R.

However, a patch that tests __ARM_ARCH >=5 and uses your improved 
sequence only in that case (preserving the old code otherwise) would 
probably be OK :-)

R.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations
  2019-09-30 15:37   ` Richard Earnshaw (lists)
@ 2019-09-30 17:17     ` Christos Gentsos
  2019-10-07 14:31       ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 5+ messages in thread
From: Christos Gentsos @ 2019-09-30 17:17 UTC (permalink / raw)
  To: Richard Earnshaw (lists), newlib

On Mon, Sep 30 2019 at 16:37:26 +0100, Richard Earnshaw (lists) wrote:
> On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:
>> On 30/09/2019 15:55, Christos Gentsos wrote:
>>> The same pop instruction that is used to restore registers can be used
>>> to return from the function (as it is already done in other function
>>> implementations).
>>> ---
>>>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>>>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S 
>>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> index 61a72581..a0aad852 100644
>>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> @@ -49,9 +49,7 @@ __aeabi_memmove:
>>>       subs    r3, r3, #1
>>>       bcs    1b
>>>   2:
>>> -    pop    {r4}
>>> -    pop    {r1}
>>> -    bx    r1
>>> +    pop    {r4, pc}
>>>   3:
>>>       movs    r3, #0
>>>       cmp    r2, #0
>>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S 
>>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> index aa8f2719..5bb80b20 100644
>>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> @@ -110,9 +110,7 @@ __aeabi_memset:
>>>       cmp    r4, r3
>>>       bne    8b
>>>   9:
>>> -    pop    {r4, r5, r6}
>>> -    pop    {r1}
>>> -    bx    r1
>>> +    pop    {r4, r5, r6, pc}
>>>   10:
>>>       movs    r3, r0
>>>       movs    r4, r1
>>>
>> 
>> No.  That isn't interworking clean on armv4t, which we still need to 
>> support.
>> 
>> Sorry.
>> 
>> R.
>
> However, a patch that tests __ARM_ARCH >=5 and uses your improved 
> sequence only in that case (preserving the old code otherwise) would 
> probably be OK :-)
>
> R.

Oh sorry then, I wasn't aware of that, thanks for the correction. I
re-made the patch such that it now checks for __ARM_ARCH, as per your
suggestion. Does it look better?

Thanks,
Christos
---
 newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 ++++
 newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
index 61a72581..465a5a19 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
@@ -49,9 +49,13 @@ __aeabi_memmove:
 	subs	r3, r3, #1
 	bcs	1b
 2:
+#if __ARM_ARCH >= 5
+	pop	{r4, pc}
+#else
 	pop	{r4}
 	pop	{r1}
 	bx	r1
+#endif
 3:
 	movs	r3, #0
 	cmp	r2, #0
diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
index aa8f2719..52094a7b 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
@@ -110,9 +110,13 @@ __aeabi_memset:
 	cmp	r4, r3
 	bne	8b
 9:
+#if __ARM_ARCH >= 5
+	pop	{r4, r5, r6, pc}
+#else
 	pop	{r4, r5, r6}
 	pop	{r1}
 	bx	r1
+#endif
 10:
 	movs	r3, r0
 	movs	r4, r1
-- 
2.23.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations
  2019-09-30 17:17     ` Christos Gentsos
@ 2019-10-07 14:31       ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Earnshaw (lists) @ 2019-10-07 14:31 UTC (permalink / raw)
  To: Christos Gentsos, newlib

Thanks.  I tweaked this slightly to include the compatibility header 
acle-compat.h and put this in.

R.

On 30/09/2019 18:16, Christos Gentsos wrote:
> On Mon, Sep 30 2019 at 16:37:26 +0100, Richard Earnshaw (lists) wrote:
>> On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:
>>> On 30/09/2019 15:55, Christos Gentsos wrote:
>>>> The same pop instruction that is used to restore registers can be used
>>>> to return from the function (as it is already done in other function
>>>> implementations).
>>>> ---
>>>>    newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>>>>    newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>>>>    2 files changed, 2 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> index 61a72581..a0aad852 100644
>>>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> @@ -49,9 +49,7 @@ __aeabi_memmove:
>>>>        subs    r3, r3, #1
>>>>        bcs    1b
>>>>    2:
>>>> -    pop    {r4}
>>>> -    pop    {r1}
>>>> -    bx    r1
>>>> +    pop    {r4, pc}
>>>>    3:
>>>>        movs    r3, #0
>>>>        cmp    r2, #0
>>>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> index aa8f2719..5bb80b20 100644
>>>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> @@ -110,9 +110,7 @@ __aeabi_memset:
>>>>        cmp    r4, r3
>>>>        bne    8b
>>>>    9:
>>>> -    pop    {r4, r5, r6}
>>>> -    pop    {r1}
>>>> -    bx    r1
>>>> +    pop    {r4, r5, r6, pc}
>>>>    10:
>>>>        movs    r3, r0
>>>>        movs    r4, r1
>>>>
>>>
>>> No.  That isn't interworking clean on armv4t, which we still need to
>>> support.
>>>
>>> Sorry.
>>>
>>> R.
>>
>> However, a patch that tests __ARM_ARCH >=5 and uses your improved
>> sequence only in that case (preserving the old code otherwise) would
>> probably be OK :-)
>>
>> R.
> 
> Oh sorry then, I wasn't aware of that, thanks for the correction. I
> re-made the patch such that it now checks for __ARM_ARCH, as per your
> suggestion. Does it look better?
> 
> Thanks,
> Christos
> ---
>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 ++++
>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 ++++
>   2 files changed, 8 insertions(+)
> 
> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> index 61a72581..465a5a19 100644
> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> @@ -49,9 +49,13 @@ __aeabi_memmove:
>   	subs	r3, r3, #1
>   	bcs	1b
>   2:
> +#if __ARM_ARCH >= 5
> +	pop	{r4, pc}
> +#else
>   	pop	{r4}
>   	pop	{r1}
>   	bx	r1
> +#endif
>   3:
>   	movs	r3, #0
>   	cmp	r2, #0
> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> index aa8f2719..52094a7b 100644
> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> @@ -110,9 +110,13 @@ __aeabi_memset:
>   	cmp	r4, r3
>   	bne	8b
>   9:
> +#if __ARM_ARCH >= 5
> +	pop	{r4, r5, r6, pc}
> +#else
>   	pop	{r4, r5, r6}
>   	pop	{r1}
>   	bx	r1
> +#endif
>   10:
>   	movs	r3, r0
>   	movs	r4, r1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-10-07 14:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-30 14:55 [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations Christos Gentsos
2019-09-30 15:31 ` Richard Earnshaw (lists)
2019-09-30 15:37   ` Richard Earnshaw (lists)
2019-09-30 17:17     ` Christos Gentsos
2019-10-07 14:31       ` Richard Earnshaw (lists)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).