* [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation.
@ 2015-05-25 5:10 Ondřej Bílka
2015-05-25 11:58 ` [RFC PATCH v2] " Ondřej Bílka
2015-05-28 16:51 ` [RFC PATCH] " Joseph Myers
0 siblings, 2 replies; 9+ messages in thread
From: Ondřej Bílka @ 2015-05-25 5:10 UTC (permalink / raw)
To: libc-alpha
Hi,
I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
mean its around three times slower than libcall by using rep cmpsb which
even intel manual says that shouldn't be used.
So I decided to write a strcmp inline that works. As reencoding it as
gcc pass would be extra effort without any benefit I skipped it.
This adds x64 specific implementation for constant arguments less than
16 bytes. These are quite common as programmer often does checks like
if (strcmp (x, "foo"))
Extending this for 32 bytes and more would be straightforward but
wouldn't help much as there aren't lot of that large words.
As same trick could be used for strncmp and memcmp with size <= 16
with few extra checks as we exploited alignment of string literals.
It could be optimized more with cooperation from gcc. A page-cross check
could be omitted in most cases using dataflow that gcc already does in
fortify_source. A CROSS_PAGE macro could first check for
__builtin_valid_range_p(x, x+16) which evaluates to true if gcc can
prove that x is more than 16 bytes large.
A possible issue would be introducing sse with string.h. How detect gcc
-no-sse flag?
* sysdeps/x86_64/bits/string.h: New file.
diff --git a/sysdeps/x86_64/bits/string.h b/sysdeps/x86_64/bits/string.h
index 5893676..c8c5d2d 100644
--- a/sysdeps/x86_64/bits/string.h
+++ b/sysdeps/x86_64/bits/string.h
@@ -14,6 +14,9 @@
#ifdef _USE_GNU
# if __GNUC_PREREQ (3, 2)
# define _HAVE_STRING_ARCH_strcmp
+# define _HAVE_STRING_ARCH_strncmp
+# define _HAVE_STRING_ARCH_memcmp
+
# include <stdint.h>
# include <emmintrin.h>
# define __LOAD(x) _mm_load_si128 ((__tp_vector *) (x))
@@ -23,17 +26,30 @@
typedef __m128i __tp_vector;
typedef uint64_t __tp_mask;
-static inline __attribute__ ((always_inline)) int
-__strcmp_c (char *s, char *c, int n)
+#define CROSS_PAGE(p) __builtin_expect (((uintptr_t) s) % 4096 \
+ > 4096 - sizeof (__tp_vector) , 0)
+
+static inline __attribute__ ((always_inline))
+int
+__memcmp_small_a (char *s, char *c, int n)
{
- if (__builtin_expect (((uintptr_t) s) % 4096 > 4096 - sizeof (__tp_vector) , 0))
- return strcmp (s, c);
- __tp_mask m = get_mask (__EQ (__LOADU (s), __LOAD(c))) | 1UL << n;
+ if (CROSS_PAGE (s))
+ return memcmp (s, c, n);
+ __tp_mask m = get_mask (__EQ (__LOADU (s), __LOAD (c))) | 1UL << n;
int found = __builtin_ctzl (m);
return s[found] - c[found];
}
-
-#define __strcmp_cs(s1, s2) -strcmp_c (s2, s1)
+static inline __attribute__ ((always_inline))
+int
+__memcmp_small (char *s, char *c, int n)
+{
+ if (CROSS_PAGE (s) || CROSS_PAGE (c))
+ return memcmp (s, c, n);
+ __tp_mask m = get_mask (__EQ (__LOADU (s), __LOADU (c))) | 1UL << n;
+ int found = __builtin_ctzl (m);
+ return s[found] - c[found];
+}
+#define __min(x,y) (x < y ? x : y)
/* Dereferencing a pointer arg to run sizeof on it fails for the void
pointer case, so we use this instead.
@@ -43,12 +59,27 @@ __strcmp_c (char *s, char *c, int n)
# define strcmp(s1, s2) \
- (__extension__ \
- (__builtin_constant_p (s1) && sizeof (s1) <= 16 \
- ? __strcmp_c (s1, s2, sizeof (s1)) \
- : (__builtin_constant_p (s2) && sizeof (s2) <= 16 \
- ? __strcmp_cs (s1, s2, sizeof (s2)) \
+ (__extension__ \
+ (__builtin_constant_p (s1) && sizeof (s1) <= 16 \
+ ? __memcmp_small_a (s1, s2, sizeof (s1)) \
+ : (__builtin_constant_p (s2) && sizeof (s2) <= 16 \
+ ? - __memcmp_small_a (s2, s1, sizeof (s2)) \
: strcmp (s1, s2))))
+
+# define strncmp(s1, s2, n) \
+ (__extension__ \
+ (__builtin_constant_p (s1) && sizeof (s1) <= 16 \
+ ? __memcmp_small_a (s1, s2, min (n, sizeof (s1))) \
+ : (__builtin_constant_p (s2) && sizeof (s2) <= 16 \
+ ? - __memcmp_small_a (s2, s1, min (n, sizeof (s2))) \
+ : strncmp (s1, s2, n))))
+
+# define memcmp(s1, s2, n) \
+ (__extension__ \
+ (__builtin_constant_p (n <= 16) && n <= 16 \
+ ? n == 0 ? 0 : __memcmp_small (s1, s2, n - 1)) \
+ : memcmp (s1, s2, n))
+
# undef __string2_1bptr_p
# endif
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH v2] Add strcmp, strncmp, memcmp inline implementation.
2015-05-25 5:10 [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation Ondřej Bílka
@ 2015-05-25 11:58 ` Ondřej Bílka
2015-05-25 12:04 ` Andrew Pinski
2015-05-28 16:56 ` Joseph Myers
2015-05-28 16:51 ` [RFC PATCH] " Joseph Myers
1 sibling, 2 replies; 9+ messages in thread
From: Ondřej Bílka @ 2015-05-25 11:58 UTC (permalink / raw)
To: libc-alpha
Sorry for noise,
I wanted to ask if one needs to surround strlen with
__builtin_constant_p to avoid runtime overhead, and when I read patch I
realized that I by mistake included older partial patch. What I intended
is here, please ignore previous one. I know that it had bug when you
have string literal with zero in middle.
So do I need that to be safe? And other comments?
On Mon, May 25, 2015 at 03:58:26AM +0200, OndÅej BÃlka wrote:
> Hi,
>
> I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
> mean its around three times slower than libcall by using rep cmpsb which
> even intel manual says that shouldn't be used.
>
> So I decided to write a strcmp inline that works. As reencoding it as
> gcc pass would be extra effort without any benefit I skipped it.
>
> This adds x64 specific implementation for constant arguments less than
> 16 bytes. These are quite common as programmer often does checks like
>
> if (strcmp (x, "foo"))
>
> Extending this for 32 bytes and more would be straightforward but
> wouldn't help much as there aren't lot of that large words.
>
> As same trick could be used for strncmp and memcmp with size <= 16
> with few extra checks as we exploited alignment of string literals.
>
> It could be optimized more with cooperation from gcc. A page-cross check
> could be omitted in most cases using dataflow that gcc already does in
> fortify_source. A CROSS_PAGE macro could first check for
> __builtin_valid_range_p(x, x+16) which evaluates to true if gcc can
> prove that x is more than 16 bytes large.
>
>
> A possible issue would be introducing sse with string.h. How detect gcc
> -no-sse flag?
>
* sysdeps/x86_64/bits/string.h: New file.
diff --git a/sysdeps/x86_64/bits/string.h b/sysdeps/x86_64/bits/string.h
new file mode 100644
index 0000000..c4d154b
--- /dev/null
+++ b/sysdeps/x86_64/bits/string.h
@@ -0,0 +1,87 @@
+/* This file should provide inline versions of string functions.
+
+ Surround GCC-specific parts with #ifdef __GNUC__, and use `__extern_inline'.
+
+ This file should define __STRING_INLINES if functions are actually defined
+ as inlines. */
+
+#ifndef _BITS_STRING_H
+#define _BITS_STRING_H 1
+
+/* Define if architecture can access unaligned multi-byte variables. */
+#define _STRING_ARCH_unaligned 0
+
+#ifdef _USE_GNU
+# if __GNUC_PREREQ (3, 2)
+# define _HAVE_STRING_ARCH_strcmp
+# define _HAVE_STRING_ARCH_strncmp
+# define _HAVE_STRING_ARCH_memcmp
+
+# include <stdint.h>
+# include <emmintrin.h>
+# define __LOAD(x) _mm_load_si128 ((__tp_vector *) (x))
+# define __LOADU(x) _mm_loadu_si128 ((__tp_vector *) (x))
+# define get_mask(x) ((uint64_t) _mm_movemask_epi8 (x))
+# define __EQ _mm_cmpeq_epi8
+typedef __m128i __tp_vector;
+typedef uint64_t __tp_mask;
+
+#define CROSS_PAGE(p) __builtin_expect (((uintptr_t) s) % 4096 \
+ > 4096 - sizeof (__tp_vector) , 0)
+
+static inline __attribute__ ((always_inline))
+int
+__memcmp_small_a (char *s, char *c, int n)
+{
+ if (CROSS_PAGE (s))
+ return memcmp (s, c, n);
+ __tp_mask m = get_mask (__EQ (__LOADU (s), __LOAD (c))) | 1UL << n;
+ int found = __builtin_ctzl (m);
+ return s[found] - c[found];
+}
+static inline __attribute__ ((always_inline))
+int
+__memcmp_small (char *s, char *c, int n)
+{
+ if (CROSS_PAGE (s) || CROSS_PAGE (c))
+ return memcmp (s, c, n);
+ __tp_mask m = get_mask (__EQ (__LOADU (s), __LOADU (c))) | 1UL << n;
+ int found = __builtin_ctzl (m);
+ return s[found] - c[found];
+}
+#define __min(x,y) (x < y ? x : y)
+
+/* Dereferencing a pointer arg to run sizeof on it fails for the void
+ pointer case, so we use this instead.
+ Note that __x is evaluated twice. */
+#define __string2_1bptr_p(__x) __builtin_constant_p (__x) && \
+ ((size_t)(const void *)((__x) + 1) - (size_t)(const void *)(__x) == 1)
+
+
+# define strcmp(s1, s2) \
+ (__extension__ \
+ (__string2_1bptr_p (s1) && strlen (s1) <= 16 \
+ ? __memcmp_small_a (s1, s2, strlen (s1)) \
+ : (__string2_1bptr_p (s2) && strlen (s2) <= 16 \
+ ? - __memcmp_small_a (s2, s1, strlen (s2)) \
+ : strcmp (s1, s2))))
+
+# define strncmp(s1, s2, n) \
+ (__extension__ \
+ (__string2_1bptr_p (s1) && strlen (s1) <= 16 \
+ ? __memcmp_small_a (s1, s2, min (n, strlen (s1))) \
+ : (__string2_1bptr_p (s2) && strlen (s2) <= 16 \
+ ? - __memcmp_small_a (s2, s1, min (n, strlen (s2))) \
+ : strncmp (s1, s2, n))))
+
+# define memcmp(s1, s2, n) \
+ (__extension__ \
+ (__builtin_constant_p (n <= 16) && n <= 16 \
+ ? n == 0 ? 0 : __memcmp_small (s1, s2, n - 1)) \
+ : memcmp (s1, s2, n))
+
+
+# undef __string2_1bptr_p
+# endif
+#endif
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH v2] Add strcmp, strncmp, memcmp inline implementation.
2015-05-25 11:58 ` [RFC PATCH v2] " Ondřej Bílka
@ 2015-05-25 12:04 ` Andrew Pinski
2015-05-25 12:16 ` Ondřej Bílka
2015-05-28 16:56 ` Joseph Myers
1 sibling, 1 reply; 9+ messages in thread
From: Andrew Pinski @ 2015-05-25 12:04 UTC (permalink / raw)
To: Ondřej Bílka; +Cc: GNU C Library
On Mon, May 25, 2015 at 4:11 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
> Sorry for noise,
>
> I wanted to ask if one needs to surround strlen with
> __builtin_constant_p to avoid runtime overhead, and when I read patch I
> realized that I by mistake included older partial patch. What I intended
> is here, please ignore previous one. I know that it had bug when you
> have string literal with zero in middle.
>
> So do I need that to be safe? And other comments?
Yes what will it take to get this into GCC instead of adding one more
hack to glibc for this?
Sorry but I feel like this would be better in GCC than doing this in glibc.
Thanks,
Andrew
>
> On Mon, May 25, 2015 at 03:58:26AM +0200, Ondřej Bílka wrote:
>> Hi,
>>
>> I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
>> mean its around three times slower than libcall by using rep cmpsb which
>> even intel manual says that shouldn't be used.
>>
>> So I decided to write a strcmp inline that works. As reencoding it as
>> gcc pass would be extra effort without any benefit I skipped it.
>>
>> This adds x64 specific implementation for constant arguments less than
>> 16 bytes. These are quite common as programmer often does checks like
>>
>> if (strcmp (x, "foo"))
>>
>> Extending this for 32 bytes and more would be straightforward but
>> wouldn't help much as there aren't lot of that large words.
>>
>> As same trick could be used for strncmp and memcmp with size <= 16
>> with few extra checks as we exploited alignment of string literals.
>>
>> It could be optimized more with cooperation from gcc. A page-cross check
>> could be omitted in most cases using dataflow that gcc already does in
>> fortify_source. A CROSS_PAGE macro could first check for
>> __builtin_valid_range_p(x, x+16) which evaluates to true if gcc can
>> prove that x is more than 16 bytes large.
>>
>>
>> A possible issue would be introducing sse with string.h. How detect gcc
>> -no-sse flag?
>>
>
> * sysdeps/x86_64/bits/string.h: New file.
>
> diff --git a/sysdeps/x86_64/bits/string.h b/sysdeps/x86_64/bits/string.h
> new file mode 100644
> index 0000000..c4d154b
> --- /dev/null
> +++ b/sysdeps/x86_64/bits/string.h
> @@ -0,0 +1,87 @@
> +/* This file should provide inline versions of string functions.
> +
> + Surround GCC-specific parts with #ifdef __GNUC__, and use `__extern_inline'.
> +
> + This file should define __STRING_INLINES if functions are actually defined
> + as inlines. */
> +
> +#ifndef _BITS_STRING_H
> +#define _BITS_STRING_H 1
> +
> +/* Define if architecture can access unaligned multi-byte variables. */
> +#define _STRING_ARCH_unaligned 0
> +
> +#ifdef _USE_GNU
> +# if __GNUC_PREREQ (3, 2)
> +# define _HAVE_STRING_ARCH_strcmp
> +# define _HAVE_STRING_ARCH_strncmp
> +# define _HAVE_STRING_ARCH_memcmp
> +
> +# include <stdint.h>
> +# include <emmintrin.h>
> +# define __LOAD(x) _mm_load_si128 ((__tp_vector *) (x))
> +# define __LOADU(x) _mm_loadu_si128 ((__tp_vector *) (x))
> +# define get_mask(x) ((uint64_t) _mm_movemask_epi8 (x))
> +# define __EQ _mm_cmpeq_epi8
> +typedef __m128i __tp_vector;
> +typedef uint64_t __tp_mask;
> +
> +#define CROSS_PAGE(p) __builtin_expect (((uintptr_t) s) % 4096 \
> + > 4096 - sizeof (__tp_vector) , 0)
> +
> +static inline __attribute__ ((always_inline))
> +int
> +__memcmp_small_a (char *s, char *c, int n)
> +{
> + if (CROSS_PAGE (s))
> + return memcmp (s, c, n);
> + __tp_mask m = get_mask (__EQ (__LOADU (s), __LOAD (c))) | 1UL << n;
> + int found = __builtin_ctzl (m);
> + return s[found] - c[found];
> +}
> +static inline __attribute__ ((always_inline))
> +int
> +__memcmp_small (char *s, char *c, int n)
> +{
> + if (CROSS_PAGE (s) || CROSS_PAGE (c))
> + return memcmp (s, c, n);
> + __tp_mask m = get_mask (__EQ (__LOADU (s), __LOADU (c))) | 1UL << n;
> + int found = __builtin_ctzl (m);
> + return s[found] - c[found];
> +}
> +#define __min(x,y) (x < y ? x : y)
> +
> +/* Dereferencing a pointer arg to run sizeof on it fails for the void
> + pointer case, so we use this instead.
> + Note that __x is evaluated twice. */
> +#define __string2_1bptr_p(__x) __builtin_constant_p (__x) && \
> + ((size_t)(const void *)((__x) + 1) - (size_t)(const void *)(__x) == 1)
> +
> +
> +# define strcmp(s1, s2) \
> + (__extension__ \
> + (__string2_1bptr_p (s1) && strlen (s1) <= 16 \
> + ? __memcmp_small_a (s1, s2, strlen (s1)) \
> + : (__string2_1bptr_p (s2) && strlen (s2) <= 16 \
> + ? - __memcmp_small_a (s2, s1, strlen (s2)) \
> + : strcmp (s1, s2))))
> +
> +# define strncmp(s1, s2, n) \
> + (__extension__ \
> + (__string2_1bptr_p (s1) && strlen (s1) <= 16 \
> + ? __memcmp_small_a (s1, s2, min (n, strlen (s1))) \
> + : (__string2_1bptr_p (s2) && strlen (s2) <= 16 \
> + ? - __memcmp_small_a (s2, s1, min (n, strlen (s2))) \
> + : strncmp (s1, s2, n))))
> +
> +# define memcmp(s1, s2, n) \
> + (__extension__ \
> + (__builtin_constant_p (n <= 16) && n <= 16 \
> + ? n == 0 ? 0 : __memcmp_small (s1, s2, n - 1)) \
> + : memcmp (s1, s2, n))
> +
> +
> +# undef __string2_1bptr_p
> +# endif
> +#endif
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH v2] Add strcmp, strncmp, memcmp inline implementation.
2015-05-25 12:04 ` Andrew Pinski
@ 2015-05-25 12:16 ` Ondřej Bílka
0 siblings, 0 replies; 9+ messages in thread
From: Ondřej Bílka @ 2015-05-25 12:16 UTC (permalink / raw)
To: Andrew Pinski; +Cc: GNU C Library
On Mon, May 25, 2015 at 04:23:52PM +0800, Andrew Pinski wrote:
> On Mon, May 25, 2015 at 4:11 PM, OndÅej BÃlka <neleai@seznam.cz> wrote:
> > Sorry for noise,
> >
> > I wanted to ask if one needs to surround strlen with
> > __builtin_constant_p to avoid runtime overhead, and when I read patch I
> > realized that I by mistake included older partial patch. What I intended
> > is here, please ignore previous one. I know that it had bug when you
> > have string literal with zero in middle.
> >
> > So do I need that to be safe? And other comments?
>
>
> Yes what will it take to get this into GCC instead of adding one more
> hack to glibc for this?
> Sorry but I feel like this would be better in GCC than doing this in glibc.
>
Sorry but it goes both ways. I could also argue why you should add one
more hack to gcc? See.
Try to convince me. My argument is easier maintainability. Try write a
gcc patch and post it. Then compare these side-by-side how are they easy
to read (I am not big fan of lisp).
Also if you want to rewrite this to gcc you are welcome. You will have
lot of work to do.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation.
2015-05-25 5:10 [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation Ondřej Bílka
2015-05-25 11:58 ` [RFC PATCH v2] " Ondřej Bílka
@ 2015-05-28 16:51 ` Joseph Myers
2015-05-28 17:17 ` Alexander Monakov
1 sibling, 1 reply; 9+ messages in thread
From: Joseph Myers @ 2015-05-28 16:51 UTC (permalink / raw)
To: Ondřej Bílka; +Cc: libc-alpha
[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]
On Mon, 25 May 2015, Ondøej BĂlka wrote:
> Hi,
>
> I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
> mean its around three times slower than libcall by using rep cmpsb which
> even intel manual says that shouldn't be used.
GCC bug report number? It's not helpful to say "suck a lot" without
reporting the issues to the other project (identifying the compiler
options etc. in use). We need to cooperate appropriately with other free
software projects in developing glibc.
> So I decided to write a strcmp inline that works. As reencoding it as
> gcc pass would be extra effort without any benefit I skipped it.
There is obvious benefit to compiler implementations e.g. for calls to
strcmp in kernel space.
> * sysdeps/x86_64/bits/string.h: New file.
No installed headers should ever be in x86_64 or i386 sysdeps directories.
The installed headers should always come from sysdeps/x86 directories and
be usable for both 32-bit and 64-bit compilations by containing
appropriate conditionals on e.g. __x86_64__. Don't regress on HJ's
implementation of this for 2.16.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH v2] Add strcmp, strncmp, memcmp inline implementation.
2015-05-25 11:58 ` [RFC PATCH v2] " Ondřej Bílka
2015-05-25 12:04 ` Andrew Pinski
@ 2015-05-28 16:56 ` Joseph Myers
1 sibling, 0 replies; 9+ messages in thread
From: Joseph Myers @ 2015-05-28 16:56 UTC (permalink / raw)
To: Ondřej Bílka; +Cc: libc-alpha
[-- Attachment #1: Type: text/plain, Size: 299 bytes --]
On Mon, 25 May 2015, OndÅej BÃlka wrote:
> +# define get_mask(x) ((uint64_t) _mm_movemask_epi8 (x))
Even inside __USE_GNU, it's inappropriate to use names such as get_mask,
CROSS_PAGE, s, c, n, m, found, always_inline from the user's namespace.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation.
2015-05-28 16:51 ` [RFC PATCH] " Joseph Myers
@ 2015-05-28 17:17 ` Alexander Monakov
2015-05-28 22:57 ` Joseph Myers
0 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2015-05-28 17:17 UTC (permalink / raw)
To: Joseph Myers; +Cc: Ondřej Bílka, libc-alpha
[-- Attachment #1: Type: TEXT/PLAIN, Size: 670 bytes --]
On Thu, 28 May 2015, Joseph Myers wrote:
> On Mon, 25 May 2015, Ondøej BĂlka wrote:
>
> > Hi,
> >
> > I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
> > mean its around three times slower than libcall by using rep cmpsb which
> > even intel manual says that shouldn't be used.
>
> GCC bug report number? It's not helpful to say "suck a lot" without
> reporting the issues to the other project (identifying the compiler
> options etc. in use). We need to cooperate appropriately with other free
> software projects in developing glibc.
I think this report should serve well:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
Alexander
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation.
2015-05-28 17:17 ` Alexander Monakov
@ 2015-05-28 22:57 ` Joseph Myers
2015-05-28 23:44 ` Ondřej Bílka
0 siblings, 1 reply; 9+ messages in thread
From: Joseph Myers @ 2015-05-28 22:57 UTC (permalink / raw)
To: Alexander Monakov; +Cc: Ondřej Bílka, libc-alpha
[-- Attachment #1: Type: text/plain, Size: 952 bytes --]
On Thu, 28 May 2015, Alexander Monakov wrote:
> On Thu, 28 May 2015, Joseph Myers wrote:
> > On Mon, 25 May 2015, Ondøej BĂlka wrote:
> >
> > > Hi,
> > >
> > > I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
> > > mean its around three times slower than libcall by using rep cmpsb which
> > > even intel manual says that shouldn't be used.
> >
> > GCC bug report number? It's not helpful to say "suck a lot" without
> > reporting the issues to the other project (identifying the compiler
> > options etc. in use). We need to cooperate appropriately with other free
> > software projects in developing glibc.
>
> I think this report should serve well:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
Thanks, that's a report for one specific issue with memcmp. Each other
issue with each function should also have bugs filed (with a meta-bug
depending on them all).
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation.
2015-05-28 22:57 ` Joseph Myers
@ 2015-05-28 23:44 ` Ondřej Bílka
0 siblings, 0 replies; 9+ messages in thread
From: Ondřej Bílka @ 2015-05-28 23:44 UTC (permalink / raw)
To: Joseph Myers; +Cc: Alexander Monakov, libc-alpha
On Thu, May 28, 2015 at 09:06:19PM +0000, Joseph Myers wrote:
> On Thu, 28 May 2015, Alexander Monakov wrote:
>
> > On Thu, 28 May 2015, Joseph Myers wrote:
> > > On Mon, 25 May 2015, OndÅej BÃlka wrote:
> > >
> > > > Hi,
> > > >
> > > > I just found that on x64 gcc __builtin_strcmp suck a lot. And by lot I
> > > > mean its around three times slower than libcall by using rep cmpsb which
> > > > even intel manual says that shouldn't be used.
> > >
> > > GCC bug report number? It's not helpful to say "suck a lot" without
> > > reporting the issues to the other project (identifying the compiler
> > > options etc. in use). We need to cooperate appropriately with other free
> > > software projects in developing glibc.
> >
> > I think this report should serve well:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
>
> Thanks, that's a report for one specific issue with memcmp. Each other
> issue with each function should also have bugs filed (with a meta-bug
> depending on them all).
>
strcmp/strncmp would be duplicates of that bug. Its result of correct
transformantion strcmp(s, c) -> memcmp (s, c, strlen(c) + 1) when c is
string literal.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-05-28 22:19 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-25 5:10 [RFC PATCH] Add strcmp, strncmp, memcmp inline implementation Ondřej Bílka
2015-05-25 11:58 ` [RFC PATCH v2] " Ondřej Bílka
2015-05-25 12:04 ` Andrew Pinski
2015-05-25 12:16 ` Ondřej Bílka
2015-05-28 16:56 ` Joseph Myers
2015-05-28 16:51 ` [RFC PATCH] " Joseph Myers
2015-05-28 17:17 ` Alexander Monakov
2015-05-28 22:57 ` Joseph Myers
2015-05-28 23:44 ` Ondřej Bílka
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).