[PATCH] Fix i686 memchr overflow calculation (BZ#21182)

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
@ 2017-03-14 17:28 Adhemerval Zanella
  2017-03-14 19:48 ` Wainer dos Santos Moschetta
  2017-03-28 19:22 ` Adhemerval Zanella
  0 siblings, 2 replies; 9+ messages in thread
From: Adhemerval Zanella @ 2017-03-14 17:28 UTC (permalink / raw)
  To: libc-alpha

This patch fixes the regression added by 23d2770 for final address
overflow calculation.  The subtraction of the considered size (16)
at line 120 is at wrong place, for sizes less than 16 subsequent
overflow check will not take in consideration an invalid size (since
the subtraction will be negative).  Also, the lea instruction also
does not raise the carry flag (CF) that is used in subsequent jbe
to check for overflow.

The fix is to follow x86_64 logic from 3daef2c where the overflow
is first check and a sub instruction is issued.  In case of resulting
negative size, CF will be set by the sub instruction and a NULL
result will be returned.  The patch also add similar tests reported
in bug report.

Checked on i686-linux-gnu and x86_64-linux-gnu.

	[BZ# 21182]
	* string/test-memchr.c (do_test): Add BZ#21182 checks for address
	near end of a page.
	* sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
	overflow calculation.
---
 ChangeLog                                 | 7 +++++++
 string/test-memchr.c                      | 6 ++++++
 sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/string/test-memchr.c b/string/test-memchr.c
index d64d10c..87a077b 100644
--- a/string/test-memchr.c
+++ b/string/test-memchr.c
@@ -210,6 +210,12 @@ test_main (void)
       do_test (0, i, i + 1, i + 1, 0);
     }
 
+  /* BZ#21182 - wrong overflow calculation for i686 implementation
+     with address near end of the page.  */
+  for (i = 2; i < 16; ++i)
+    /* page_size is in fact getpagesize() * 2.  */
+    do_test (page_size/2 - i, i, i, 1, 0x9B);
+
   do_random_tests ();
   return ret;
 }
diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
index 910679c..e41f324 100644
--- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
+++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
@@ -117,7 +117,6 @@ L(crosscache):
 
 # ifndef USE_AS_RAWMEMCHR
 	jnz	L(match_case2_prolog1)
-	lea	-16(%edx), %edx
         /* Calculate the last acceptable address and check for possible
            addition overflow by using satured math:
            edx = ecx + edx
@@ -125,6 +124,7 @@ L(crosscache):
 	add	%ecx, %edx
 	sbb	%eax, %eax
 	or	%eax, %edx
+	sub	$16, %edx
 	jbe	L(return_null)
 	lea	16(%edi), %edi
 # else
-- 
2.7.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-03-14 17:28 [PATCH] Fix i686 memchr overflow calculation (BZ#21182) Adhemerval Zanella
@ 2017-03-14 19:48 ` Wainer dos Santos Moschetta
  2017-03-28 19:22 ` Adhemerval Zanella
  1 sibling, 0 replies; 9+ messages in thread
From: Wainer dos Santos Moschetta @ 2017-03-14 19:48 UTC (permalink / raw)
  To: libc-alpha


On 03/14/2017 02:28 PM, Adhemerval Zanella wrote:
> This patch fixes the regression added by 23d2770 for final address
> overflow calculation.  The subtraction of the considered size (16)
> at line 120 is at wrong place, for sizes less than 16 subsequent
> overflow check will not take in consideration an invalid size (since
> the subtraction will be negative).  Also, the lea instruction also
> does not raise the carry flag (CF) that is used in subsequent jbe
> to check for overflow.
>
> The fix is to follow x86_64 logic from 3daef2c where the overflow
> is first check and a sub instruction is issued.  In case of resulting
> negative size, CF will be set by the sub instruction and a NULL
> result will be returned.  The patch also add similar tests reported
> in bug report.
>
> Checked on i686-linux-gnu and x86_64-linux-gnu.
>
> 	[BZ# 21182]
> 	* string/test-memchr.c (do_test): Add BZ#21182 checks for address
> 	near end of a page.
> 	* sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
> 	overflow calculation.
> ---
>  ChangeLog                                 | 7 +++++++
>  string/test-memchr.c                      | 6 ++++++
>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>  3 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/string/test-memchr.c b/string/test-memchr.c
> index d64d10c..87a077b 100644
> --- a/string/test-memchr.c
> +++ b/string/test-memchr.c
> @@ -210,6 +210,12 @@ test_main (void)
>        do_test (0, i, i + 1, i + 1, 0);
>      }
>
> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
> +     with address near end of the page.  */
> +  for (i = 2; i < 16; ++i)
> +    /* page_size is in fact getpagesize() * 2.  */
> +    do_test (page_size/2 - i, i, i, 1, 0x9B);

page_size / 2

> +
>    do_random_tests ();
>    return ret;
>  }
> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
> index 910679c..e41f324 100644
> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
> @@ -117,7 +117,6 @@ L(crosscache):
>
>  # ifndef USE_AS_RAWMEMCHR
>  	jnz	L(match_case2_prolog1)
> -	lea	-16(%edx), %edx
>          /* Calculate the last acceptable address and check for possible
>             addition overflow by using satured math:
>             edx = ecx + edx
> @@ -125,6 +124,7 @@ L(crosscache):
>  	add	%ecx, %edx
>  	sbb	%eax, %eax
>  	or	%eax, %edx
> +	sub	$16, %edx
>  	jbe	L(return_null)
>  	lea	16(%edi), %edi
>  # else

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-03-14 17:28 [PATCH] Fix i686 memchr overflow calculation (BZ#21182) Adhemerval Zanella
  2017-03-14 19:48 ` Wainer dos Santos Moschetta
@ 2017-03-28 19:22 ` Adhemerval Zanella
  2017-03-28 21:47   ` H.J. Lu
  1 sibling, 1 reply; 9+ messages in thread
From: Adhemerval Zanella @ 2017-03-28 19:22 UTC (permalink / raw)
  To: libc-alpha

Ping (with the page_size/2 fix included).

On 14/03/2017 14:28, Adhemerval Zanella wrote:
> This patch fixes the regression added by 23d2770 for final address
> overflow calculation.  The subtraction of the considered size (16)
> at line 120 is at wrong place, for sizes less than 16 subsequent
> overflow check will not take in consideration an invalid size (since
> the subtraction will be negative).  Also, the lea instruction also
> does not raise the carry flag (CF) that is used in subsequent jbe
> to check for overflow.
> 
> The fix is to follow x86_64 logic from 3daef2c where the overflow
> is first check and a sub instruction is issued.  In case of resulting
> negative size, CF will be set by the sub instruction and a NULL
> result will be returned.  The patch also add similar tests reported
> in bug report.
> 
> Checked on i686-linux-gnu and x86_64-linux-gnu.
> 
> 	[BZ# 21182]
> 	* string/test-memchr.c (do_test): Add BZ#21182 checks for address
> 	near end of a page.
> 	* sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
> 	overflow calculation.
> ---
>  ChangeLog                                 | 7 +++++++
>  string/test-memchr.c                      | 6 ++++++
>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/string/test-memchr.c b/string/test-memchr.c
> index d64d10c..87a077b 100644
> --- a/string/test-memchr.c
> +++ b/string/test-memchr.c
> @@ -210,6 +210,12 @@ test_main (void)
>        do_test (0, i, i + 1, i + 1, 0);
>      }
>  
> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
> +     with address near end of the page.  */
> +  for (i = 2; i < 16; ++i)
> +    /* page_size is in fact getpagesize() * 2.  */
> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
> +
>    do_random_tests ();
>    return ret;
>  }
> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
> index 910679c..e41f324 100644
> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
> @@ -117,7 +117,6 @@ L(crosscache):
>  
>  # ifndef USE_AS_RAWMEMCHR
>  	jnz	L(match_case2_prolog1)
> -	lea	-16(%edx), %edx
>          /* Calculate the last acceptable address and check for possible
>             addition overflow by using satured math:
>             edx = ecx + edx
> @@ -125,6 +124,7 @@ L(crosscache):
>  	add	%ecx, %edx
>  	sbb	%eax, %eax
>  	or	%eax, %edx
> +	sub	$16, %edx
>  	jbe	L(return_null)
>  	lea	16(%edi), %edi
>  # else
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-03-28 19:22 ` Adhemerval Zanella
@ 2017-03-28 21:47   ` H.J. Lu
  2017-05-19 13:44     ` H.J. Lu
  0 siblings, 1 reply; 9+ messages in thread
From: H.J. Lu @ 2017-03-28 21:47 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library

On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> Ping (with the page_size/2 fix included).
>
> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>> This patch fixes the regression added by 23d2770 for final address
>> overflow calculation.  The subtraction of the considered size (16)
>> at line 120 is at wrong place, for sizes less than 16 subsequent
>> overflow check will not take in consideration an invalid size (since
>> the subtraction will be negative).  Also, the lea instruction also
>> does not raise the carry flag (CF) that is used in subsequent jbe
>> to check for overflow.
>>
>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>> is first check and a sub instruction is issued.  In case of resulting
>> negative size, CF will be set by the sub instruction and a NULL
>> result will be returned.  The patch also add similar tests reported
>> in bug report.
>>
>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>
>>       [BZ# 21182]
>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>       near end of a page.
>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>       overflow calculation.
>> ---
>>  ChangeLog                                 | 7 +++++++
>>  string/test-memchr.c                      | 6 ++++++
>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>> index d64d10c..87a077b 100644
>> --- a/string/test-memchr.c
>> +++ b/string/test-memchr.c
>> @@ -210,6 +210,12 @@ test_main (void)
>>        do_test (0, i, i + 1, i + 1, 0);
>>      }
>>
>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>> +     with address near end of the page.  */
>> +  for (i = 2; i < 16; ++i)
>> +    /* page_size is in fact getpagesize() * 2.  */
>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>> +
>>    do_random_tests ();
>>    return ret;
>>  }
>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>> index 910679c..e41f324 100644
>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>> @@ -117,7 +117,6 @@ L(crosscache):
>>
>>  # ifndef USE_AS_RAWMEMCHR
>>       jnz     L(match_case2_prolog1)
>> -     lea     -16(%edx), %edx
>>          /* Calculate the last acceptable address and check for possible
>>             addition overflow by using satured math:
>>             edx = ecx + edx
>> @@ -125,6 +124,7 @@ L(crosscache):
>>       add     %ecx, %edx
>>       sbb     %eax, %eax
>>       or      %eax, %edx
>> +     sub     $16, %edx
>>       jbe     L(return_null)
>>       lea     16(%edi), %edi
>>  # else
>>

Looks good.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-03-28 21:47   ` H.J. Lu
@ 2017-05-19 13:44     ` H.J. Lu
  2017-05-19 14:45       ` Adhemerval Zanella
  0 siblings, 1 reply; 9+ messages in thread
From: H.J. Lu @ 2017-05-19 13:44 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library

On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>> Ping (with the page_size/2 fix included).
>>
>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>> This patch fixes the regression added by 23d2770 for final address
>>> overflow calculation.  The subtraction of the considered size (16)
>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>> overflow check will not take in consideration an invalid size (since
>>> the subtraction will be negative).  Also, the lea instruction also
>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>> to check for overflow.
>>>
>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>> is first check and a sub instruction is issued.  In case of resulting
>>> negative size, CF will be set by the sub instruction and a NULL
>>> result will be returned.  The patch also add similar tests reported
>>> in bug report.
>>>
>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>
>>>       [BZ# 21182]
>>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>       near end of a page.
>>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>       overflow calculation.
>>> ---
>>>  ChangeLog                                 | 7 +++++++
>>>  string/test-memchr.c                      | 6 ++++++
>>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>> index d64d10c..87a077b 100644
>>> --- a/string/test-memchr.c
>>> +++ b/string/test-memchr.c
>>> @@ -210,6 +210,12 @@ test_main (void)
>>>        do_test (0, i, i + 1, i + 1, 0);
>>>      }
>>>
>>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>>> +     with address near end of the page.  */
>>> +  for (i = 2; i < 16; ++i)
>>> +    /* page_size is in fact getpagesize() * 2.  */
>>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>>> +
>>>    do_random_tests ();
>>>    return ret;
>>>  }
>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>> index 910679c..e41f324 100644
>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>
>>>  # ifndef USE_AS_RAWMEMCHR
>>>       jnz     L(match_case2_prolog1)
>>> -     lea     -16(%edx), %edx
>>>          /* Calculate the last acceptable address and check for possible
>>>             addition overflow by using satured math:
>>>             edx = ecx + edx
>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>       add     %ecx, %edx
>>>       sbb     %eax, %eax
>>>       or      %eax, %edx
>>> +     sub     $16, %edx
>>>       jbe     L(return_null)
>>>       lea     16(%edi), %edi
>>>  # else
>>>
>
> Looks good.
>
>

Just a thought.  Is this approach

        /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
           instead of "(edx + ecx) - 16" to void possible addition
           overflow.  */
        neg     %ecx
        add     $16, %ecx
        sub     %ecx, %edx
        jbe     L(return_null)

a little bit better?


-- 
H.J.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-05-19 13:44     ` H.J. Lu
@ 2017-05-19 14:45       ` Adhemerval Zanella
  2017-05-19 15:10         ` Adhemerval Zanella
  0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella @ 2017-05-19 14:45 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library



On 19/05/2017 10:44, H.J. Lu wrote:
> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>> Ping (with the page_size/2 fix included).
>>>
>>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>>> This patch fixes the regression added by 23d2770 for final address
>>>> overflow calculation.  The subtraction of the considered size (16)
>>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>>> overflow check will not take in consideration an invalid size (since
>>>> the subtraction will be negative).  Also, the lea instruction also
>>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>>> to check for overflow.
>>>>
>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>>> is first check and a sub instruction is issued.  In case of resulting
>>>> negative size, CF will be set by the sub instruction and a NULL
>>>> result will be returned.  The patch also add similar tests reported
>>>> in bug report.
>>>>
>>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>>
>>>>       [BZ# 21182]
>>>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>>       near end of a page.
>>>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>>       overflow calculation.
>>>> ---
>>>>  ChangeLog                                 | 7 +++++++
>>>>  string/test-memchr.c                      | 6 ++++++
>>>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>>> index d64d10c..87a077b 100644
>>>> --- a/string/test-memchr.c
>>>> +++ b/string/test-memchr.c
>>>> @@ -210,6 +210,12 @@ test_main (void)
>>>>        do_test (0, i, i + 1, i + 1, 0);
>>>>      }
>>>>
>>>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>>>> +     with address near end of the page.  */
>>>> +  for (i = 2; i < 16; ++i)
>>>> +    /* page_size is in fact getpagesize() * 2.  */
>>>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>>>> +
>>>>    do_random_tests ();
>>>>    return ret;
>>>>  }
>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>> index 910679c..e41f324 100644
>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>>
>>>>  # ifndef USE_AS_RAWMEMCHR
>>>>       jnz     L(match_case2_prolog1)
>>>> -     lea     -16(%edx), %edx
>>>>          /* Calculate the last acceptable address and check for possible
>>>>             addition overflow by using satured math:
>>>>             edx = ecx + edx
>>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>>       add     %ecx, %edx
>>>>       sbb     %eax, %eax
>>>>       or      %eax, %edx
>>>> +     sub     $16, %edx
>>>>       jbe     L(return_null)
>>>>       lea     16(%edi), %edi
>>>>  # else
>>>>
>>
>> Looks good.
>>
>>
> 
> Just a thought.  Is this approach
> 
>         /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
>            instead of "(edx + ecx) - 16" to void possible addition
>            overflow.  */
>         neg     %ecx
>         add     $16, %ecx
>         sub     %ecx, %edx
>         jbe     L(return_null)
> 
> a little bit better?

I do not have a strong preference here, although imho it is simpler to
understand why the saturated math would work here and it is the same
strategy used on other arch implementations.  May you might outline in
comment that ecx is always in the range of [0,64) so '-ecx + 16' can
possible underflow.  Is is this change just for micro-optimization?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-05-19 14:45       ` Adhemerval Zanella
@ 2017-05-19 15:10         ` Adhemerval Zanella
  2017-05-19 16:14           ` H.J. Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Adhemerval Zanella @ 2017-05-19 15:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library



On 19/05/2017 11:45, Adhemerval Zanella wrote:
> 
> 
> On 19/05/2017 10:44, H.J. Lu wrote:
>> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>> Ping (with the page_size/2 fix included).
>>>>
>>>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>>>> This patch fixes the regression added by 23d2770 for final address
>>>>> overflow calculation.  The subtraction of the considered size (16)
>>>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>>>> overflow check will not take in consideration an invalid size (since
>>>>> the subtraction will be negative).  Also, the lea instruction also
>>>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>>>> to check for overflow.
>>>>>
>>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>>>> is first check and a sub instruction is issued.  In case of resulting
>>>>> negative size, CF will be set by the sub instruction and a NULL
>>>>> result will be returned.  The patch also add similar tests reported
>>>>> in bug report.
>>>>>
>>>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>>>
>>>>>       [BZ# 21182]
>>>>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>>>       near end of a page.
>>>>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>>>       overflow calculation.
>>>>> ---
>>>>>  ChangeLog                                 | 7 +++++++
>>>>>  string/test-memchr.c                      | 6 ++++++
>>>>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>>>> index d64d10c..87a077b 100644
>>>>> --- a/string/test-memchr.c
>>>>> +++ b/string/test-memchr.c
>>>>> @@ -210,6 +210,12 @@ test_main (void)
>>>>>        do_test (0, i, i + 1, i + 1, 0);
>>>>>      }
>>>>>
>>>>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>>>>> +     with address near end of the page.  */
>>>>> +  for (i = 2; i < 16; ++i)
>>>>> +    /* page_size is in fact getpagesize() * 2.  */
>>>>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>>>>> +
>>>>>    do_random_tests ();
>>>>>    return ret;
>>>>>  }
>>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>> index 910679c..e41f324 100644
>>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>>>
>>>>>  # ifndef USE_AS_RAWMEMCHR
>>>>>       jnz     L(match_case2_prolog1)
>>>>> -     lea     -16(%edx), %edx
>>>>>          /* Calculate the last acceptable address and check for possible
>>>>>             addition overflow by using satured math:
>>>>>             edx = ecx + edx
>>>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>>>       add     %ecx, %edx
>>>>>       sbb     %eax, %eax
>>>>>       or      %eax, %edx
>>>>> +     sub     $16, %edx
>>>>>       jbe     L(return_null)
>>>>>       lea     16(%edi), %edi
>>>>>  # else
>>>>>
>>>
>>> Looks good.
>>>
>>>
>>
>> Just a thought.  Is this approach
>>
>>         /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
>>            instead of "(edx + ecx) - 16" to void possible addition
>>            overflow.  */
>>         neg     %ecx
>>         add     $16, %ecx
>>         sub     %ecx, %edx
>>         jbe     L(return_null)
>>
>> a little bit better?
> 
> I do not have a strong preference here, although imho it is simpler to
> understand why the saturated math would work here and it is the same
> strategy used on other arch implementations.  May you might outline in
> comment that ecx is always in the range of [0,64) so '-ecx + 16' can
> possible underflow.  Is is this change just for micro-optimization?
> 

Sorry, my bad here.  I made some confusion, in fact what I meant was ecx
was indeed in range [0,16) (is the and instruction for alignment that 
uses 64) and your suggestion *can't* overflow.  I would recommend
you to just outline the possible value of 'edx' in comments. 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-05-19 15:10         ` Adhemerval Zanella
@ 2017-05-19 16:14           ` H.J. Lu
  2017-05-19 17:24             ` Adhemerval Zanella
  0 siblings, 1 reply; 9+ messages in thread
From: H.J. Lu @ 2017-05-19 16:14 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 4430 bytes --]

On Fri, May 19, 2017 at 8:10 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 19/05/2017 11:45, Adhemerval Zanella wrote:
>>
>>
>> On 19/05/2017 10:44, H.J. Lu wrote:
>>> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>> Ping (with the page_size/2 fix included).
>>>>>
>>>>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>>>>> This patch fixes the regression added by 23d2770 for final address
>>>>>> overflow calculation.  The subtraction of the considered size (16)
>>>>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>>>>> overflow check will not take in consideration an invalid size (since
>>>>>> the subtraction will be negative).  Also, the lea instruction also
>>>>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>>>>> to check for overflow.
>>>>>>
>>>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>>>>> is first check and a sub instruction is issued.  In case of resulting
>>>>>> negative size, CF will be set by the sub instruction and a NULL
>>>>>> result will be returned.  The patch also add similar tests reported
>>>>>> in bug report.
>>>>>>
>>>>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>>>>
>>>>>>       [BZ# 21182]
>>>>>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>>>>       near end of a page.
>>>>>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>>>>       overflow calculation.
>>>>>> ---
>>>>>>  ChangeLog                                 | 7 +++++++
>>>>>>  string/test-memchr.c                      | 6 ++++++
>>>>>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>>>>> index d64d10c..87a077b 100644
>>>>>> --- a/string/test-memchr.c
>>>>>> +++ b/string/test-memchr.c
>>>>>> @@ -210,6 +210,12 @@ test_main (void)
>>>>>>        do_test (0, i, i + 1, i + 1, 0);
>>>>>>      }
>>>>>>
>>>>>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>>>>>> +     with address near end of the page.  */
>>>>>> +  for (i = 2; i < 16; ++i)
>>>>>> +    /* page_size is in fact getpagesize() * 2.  */
>>>>>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>>>>>> +
>>>>>>    do_random_tests ();
>>>>>>    return ret;
>>>>>>  }
>>>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>> index 910679c..e41f324 100644
>>>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>>>>
>>>>>>  # ifndef USE_AS_RAWMEMCHR
>>>>>>       jnz     L(match_case2_prolog1)
>>>>>> -     lea     -16(%edx), %edx
>>>>>>          /* Calculate the last acceptable address and check for possible
>>>>>>             addition overflow by using satured math:
>>>>>>             edx = ecx + edx
>>>>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>>>>       add     %ecx, %edx
>>>>>>       sbb     %eax, %eax
>>>>>>       or      %eax, %edx
>>>>>> +     sub     $16, %edx
>>>>>>       jbe     L(return_null)
>>>>>>       lea     16(%edi), %edi
>>>>>>  # else
>>>>>>
>>>>
>>>> Looks good.
>>>>
>>>>
>>>
>>> Just a thought.  Is this approach
>>>
>>>         /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
>>>            instead of "(edx + ecx) - 16" to void possible addition
>>>            overflow.  */
>>>         neg     %ecx
>>>         add     $16, %ecx
>>>         sub     %ecx, %edx
>>>         jbe     L(return_null)
>>>
>>> a little bit better?
>>
>> I do not have a strong preference here, although imho it is simpler to
>> understand why the saturated math would work here and it is the same
>> strategy used on other arch implementations.  May you might outline in
>> comment that ecx is always in the range of [0,64) so '-ecx + 16' can
>> possible underflow.  Is is this change just for micro-optimization?
>>
>
> Sorry, my bad here.  I made some confusion, in fact what I meant was ecx
> was indeed in range [0,16) (is the and instruction for alignment that
> uses 64) and your suggestion *can't* overflow.  I would recommend
> you to just outline the possible value of 'edx' in comments.

Here is a patch for SSE2 memchr.   OK for master?

-- 
H.J.

[-- Attachment #2: 0001-x86-Optimize-SSE2-memchr-overflow-calculation.patch --]
[-- Type: text/x-patch, Size: 2506 bytes --]

From 70c5d56431e54c7ef8dc8b8eb300a343a97ab72d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 19 May 2017 09:04:23 -0700
Subject: [PATCH] x86: Optimize SSE2 memchr overflow calculation

SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16.  Use
"edx - (16 - ecx)", instead of satured math, to avoid possible addition
overflow.  This replaces

	add	%ecx, %edx
	sbb	%eax, %eax
	or	%eax, %edx
	sub	$16, %edx

with

	neg	%ecx
	add	$16, %ecx
	sub	%ecx, %edx

It is the same for x86_64, except for rcx/rdx, instead of ecx/edx.

	* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use
	"edx + ecx - 16" to avoid possible addition overflow.
	* sysdeps/x86_64/memchr.S (memchr): Likewise.
---
 sysdeps/i386/i686/multiarch/memchr-sse2.S | 14 ++++++--------
 sysdeps/x86_64/memchr.S                   | 14 ++++++--------
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
index e41f324..172d70d 100644
--- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
+++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
@@ -117,14 +117,12 @@ L(crosscache):
 
 # ifndef USE_AS_RAWMEMCHR
 	jnz	L(match_case2_prolog1)
-        /* Calculate the last acceptable address and check for possible
-           addition overflow by using satured math:
-           edx = ecx + edx
-           edx |= -(edx < ecx)  */
-	add	%ecx, %edx
-	sbb	%eax, %eax
-	or	%eax, %edx
-	sub	$16, %edx
+        /* "ecx" is less than 16.  Calculate "edx + ecx - 16" by using
+	   "edx - (16 - ecx)" instead of "(edx + ecx) - 16" to void
+	   possible addition overflow.  */
+	neg	%ecx
+	add	$16, %ecx
+	sub	%ecx, %edx
 	jbe	L(return_null)
 	lea	16(%edi), %edi
 # else
diff --git a/sysdeps/x86_64/memchr.S b/sysdeps/x86_64/memchr.S
index a205a25..f82e1c5 100644
--- a/sysdeps/x86_64/memchr.S
+++ b/sysdeps/x86_64/memchr.S
@@ -76,14 +76,12 @@ L(crosscache):
 
 	.p2align 4
 L(unaligned_no_match):
-        /* Calculate the last acceptable address and check for possible
-           addition overflow by using satured math:
-           rdx = rcx + rdx
-           rdx |= -(rdx < rcx)  */
-	add	%rcx, %rdx
-	sbb	%rax, %rax
-	or	%rax, %rdx
-	sub	$16, %rdx
+        /* "rcx" is less than 16.  Calculate "rdx + rcx - 16" by using
+	   "rdx - (16 - rcx)" instead of "(rdx + rcx) - 16" to void
+	   possible addition overflow.  */
+	neg	%rcx
+	add	$16, %rcx
+	sub	%rcx, %rdx
 	jbe	L(return_null)
 	add	$16, %rdi
 	sub	$64, %rdx
-- 
2.9.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
  2017-05-19 16:14           ` H.J. Lu
@ 2017-05-19 17:24             ` Adhemerval Zanella
  0 siblings, 0 replies; 9+ messages in thread
From: Adhemerval Zanella @ 2017-05-19 17:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library



On 19/05/2017 13:14, H.J. Lu wrote:
> On Fri, May 19, 2017 at 8:10 AM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>> On 19/05/2017 11:45, Adhemerval Zanella wrote:
>>>
>>>
>>> On 19/05/2017 10:44, H.J. Lu wrote:
>>>> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>> Ping (with the page_size/2 fix included).
>>>>>>
>>>>>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>>>>>> This patch fixes the regression added by 23d2770 for final address
>>>>>>> overflow calculation.  The subtraction of the considered size (16)
>>>>>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>>>>>> overflow check will not take in consideration an invalid size (since
>>>>>>> the subtraction will be negative).  Also, the lea instruction also
>>>>>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>>>>>> to check for overflow.
>>>>>>>
>>>>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>>>>>> is first check and a sub instruction is issued.  In case of resulting
>>>>>>> negative size, CF will be set by the sub instruction and a NULL
>>>>>>> result will be returned.  The patch also add similar tests reported
>>>>>>> in bug report.
>>>>>>>
>>>>>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>>>>>
>>>>>>>       [BZ# 21182]
>>>>>>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>>>>>       near end of a page.
>>>>>>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>>>>>       overflow calculation.
>>>>>>> ---
>>>>>>>  ChangeLog                                 | 7 +++++++
>>>>>>>  string/test-memchr.c                      | 6 ++++++
>>>>>>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>>>>>> index d64d10c..87a077b 100644
>>>>>>> --- a/string/test-memchr.c
>>>>>>> +++ b/string/test-memchr.c
>>>>>>> @@ -210,6 +210,12 @@ test_main (void)
>>>>>>>        do_test (0, i, i + 1, i + 1, 0);
>>>>>>>      }
>>>>>>>
>>>>>>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>>>>>>> +     with address near end of the page.  */
>>>>>>> +  for (i = 2; i < 16; ++i)
>>>>>>> +    /* page_size is in fact getpagesize() * 2.  */
>>>>>>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>>>>>>> +
>>>>>>>    do_random_tests ();
>>>>>>>    return ret;
>>>>>>>  }
>>>>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>>> index 910679c..e41f324 100644
>>>>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>>>>>
>>>>>>>  # ifndef USE_AS_RAWMEMCHR
>>>>>>>       jnz     L(match_case2_prolog1)
>>>>>>> -     lea     -16(%edx), %edx
>>>>>>>          /* Calculate the last acceptable address and check for possible
>>>>>>>             addition overflow by using satured math:
>>>>>>>             edx = ecx + edx
>>>>>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>>>>>       add     %ecx, %edx
>>>>>>>       sbb     %eax, %eax
>>>>>>>       or      %eax, %edx
>>>>>>> +     sub     $16, %edx
>>>>>>>       jbe     L(return_null)
>>>>>>>       lea     16(%edi), %edi
>>>>>>>  # else
>>>>>>>
>>>>>
>>>>> Looks good.
>>>>>
>>>>>
>>>>
>>>> Just a thought.  Is this approach
>>>>
>>>>         /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
>>>>            instead of "(edx + ecx) - 16" to void possible addition
>>>>            overflow.  */
>>>>         neg     %ecx
>>>>         add     $16, %ecx
>>>>         sub     %ecx, %edx
>>>>         jbe     L(return_null)
>>>>
>>>> a little bit better?
>>>
>>> I do not have a strong preference here, although imho it is simpler to
>>> understand why the saturated math would work here and it is the same
>>> strategy used on other arch implementations.  May you might outline in
>>> comment that ecx is always in the range of [0,64) so '-ecx + 16' can
>>> possible underflow.  Is is this change just for micro-optimization?
>>>
>>
>> Sorry, my bad here.  I made some confusion, in fact what I meant was ecx
>> was indeed in range [0,16) (is the and instruction for alignment that
>> uses 64) and your suggestion *can't* overflow.  I would recommend
>> you to just outline the possible value of 'edx' in comments.
> 
> Here is a patch for SSE2 memchr.   OK for master?
> 

LGTM, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-05-19 17:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-14 17:28 [PATCH] Fix i686 memchr overflow calculation (BZ#21182) Adhemerval Zanella
2017-03-14 19:48 ` Wainer dos Santos Moschetta
2017-03-28 19:22 ` Adhemerval Zanella
2017-03-28 21:47   ` H.J. Lu
2017-05-19 13:44     ` H.J. Lu
2017-05-19 14:45       ` Adhemerval Zanella
2017-05-19 15:10         ` Adhemerval Zanella
2017-05-19 16:14           ` H.J. Lu
2017-05-19 17:24             ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).