public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug stdio/17577] New: sscanf extremely slow on large strings
@ 2014-11-11 13:49 shabbyx at gmail dot com
2014-11-11 13:50 ` [Bug stdio/17577] " shabbyx at gmail dot com
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: shabbyx at gmail dot com @ 2014-11-11 13:49 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Bug ID: 17577
Summary: sscanf extremely slow on large strings
Product: glibc
Version: 2.19
Status: NEW
Severity: normal
Priority: P2
Component: stdio
Assignee: unassigned at sourceware dot org
Reporter: shabbyx at gmail dot com
The bug is that `sscanf` (`vsscanf` actually) is very slow on large strings. It
_seems_ like `vsscanf` is first trying to find the end of the string (with
`rawmemchr`), which is expensive given that most of the string is not going to
be read.
Here's a test code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#define N 50000
static int _debug_helper(const char *src, int *a, int *n)
{
#if 1
return sscanf(src, "%d%n", a, n);
#else
char *end;
errno = 0;
*a = strtol(src, &end, 10);
*n = end - src;
return errno == 0 && *n > 0;
#endif
}
int main()
{
int i;
int a;
int n;
int so_far = 0;
char *big_string = malloc(N * 4 + 1);
for (i = 0; i < N; ++i)
strcpy(big_string + i * 4, "123 ");
big_string[N * 4] = '\0';
while (1)
{
if (_debug_helper(big_string + so_far, &a, &n) != 1)
break;
so_far += n;
}
return 0;
}
Compile with: gcc -Wall -g -O0 main.c
Running this code with `N = 50000` and using `sscanf`, I get:
$ time ./a.out
real 0m1.602s
user 0m1.596s
sys 0m0.000s
Running it with `N = 50000` and using the substitute code, I get:
$ time ./a.out
real 0m0.002s
user 0m0.000s
sys 0m0.000s
Which is considrably smaller. Note that this shows that the part with `malloc`
and initialization take very small time. Running callgrind shows that almost
all of the time when using `sscanf` is spent in `rawmemchr`. Indeed, using gdb
and randomly hitting CTRL+C, you always end up with a stack trace like this:
#0 __rawmemchr_ia32 () at ../sysdeps/i386/i686/multiarch/../../rawmemchr.S:167
#1 0xb7e78a06 in _IO_str_init_static_internal () at strops.c:44
#2 0xb7e5c857 in __GI___isoc99_vsscanf () at isoc99_vsscanf.c:41
#3 0xb7e5c7cf in __isoc99_sscanf () at isoc99_sscanf.c:31
#4 0x08048494 in _debug_helper () at main.c:11
#5 0x08048517 in main () at main.c:41
This means that `rawmemchr` is slowing down `sscanf` by an unnecessary degree.
To further prove this point (and to confirm my guess that `rawmemchr` is
reading the whole string), here are a couple more tests:
With `N = 25000` and using `sscanf`:
$ time ./a.out
real 0m0.407s
user 0m0.404s
sys 0m0.000s
With `N = 12500` and using `sscanf`:
$ time ./a.out
real 0m0.106s
user 0m0.104s
sys 0m0.000s
This clearly shows an `O(N^2)` behavior. The main loop of the program is
`O(N)`, which means `sscanf` is running at `O(N)`. For large `N`, this is
significant. On the other hand, the actual behavior of `sscanf` should be to
read from the string according to the format string and no more, which in this
case (using `%d` and "123" as values) is of constant time.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
@ 2014-11-11 13:50 ` shabbyx at gmail dot com
2021-03-01 12:56 ` marat at slonopotamus dot org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: shabbyx at gmail dot com @ 2014-11-11 13:50 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Shahbaz Youssefi <shabbyx at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Host| |Linux (Ubuntu 14.04)
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
2014-11-11 13:50 ` [Bug stdio/17577] " shabbyx at gmail dot com
@ 2021-03-01 12:56 ` marat at slonopotamus dot org
2021-03-02 12:46 ` amonakov at gmail dot com
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: marat at slonopotamus dot org @ 2021-03-01 12:56 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Marat Radchenko <marat at slonopotamus dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |marat at slonopotamus dot org
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
2014-11-11 13:50 ` [Bug stdio/17577] " shabbyx at gmail dot com
2021-03-01 12:56 ` marat at slonopotamus dot org
@ 2021-03-02 12:46 ` amonakov at gmail dot com
2021-03-03 2:22 ` benfrantzdale at gmail dot com
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: amonakov at gmail dot com @ 2021-03-02 12:46 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Alexander Monakov <amonakov at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gmail dot com
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
` (2 preceding siblings ...)
2021-03-02 12:46 ` amonakov at gmail dot com
@ 2021-03-03 2:22 ` benfrantzdale at gmail dot com
2021-03-03 2:26 ` benfrantzdale at gmail dot com
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: benfrantzdale at gmail dot com @ 2021-03-03 2:22 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
BenFrantzDale <benfrantzdale at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |benfrantzdale at gmail dot com
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
` (3 preceding siblings ...)
2021-03-03 2:22 ` benfrantzdale at gmail dot com
@ 2021-03-03 2:26 ` benfrantzdale at gmail dot com
2021-03-03 3:35 ` fweimer at redhat dot com
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: benfrantzdale at gmail dot com @ 2021-03-03 2:26 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
--- Comment #1 from BenFrantzDale <benfrantzdale at gmail dot com> ---
Does the C standard prescribe a big-O for sscanf? That is, while clearly this
could be faster, is the current implementation compliant (in which case it’s a
bug in the C standard) or is it just an implementation that’s compliant but
could be better?
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
` (4 preceding siblings ...)
2021-03-03 2:26 ` benfrantzdale at gmail dot com
@ 2021-03-03 3:35 ` fweimer at redhat dot com
2021-03-03 19:31 ` bugdal at aerifal dot cx
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: fweimer at redhat dot com @ 2021-03-03 3:35 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fweimer at redhat dot com
--- Comment #2 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to BenFrantzDale from comment #1)
> Does the C standard prescribe a big-O for sscanf? That is, while clearly
> this could be faster, is the current implementation compliant (in which case
> it’s a bug in the C standard) or is it just an implementation that’s
> compliant but could be better?
There is no complexity class requirement in the standard.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
` (5 preceding siblings ...)
2021-03-03 3:35 ` fweimer at redhat dot com
@ 2021-03-03 19:31 ` bugdal at aerifal dot cx
2021-03-15 20:38 ` dkk089 at gmail dot com
2021-03-15 21:56 ` amonakov at gmail dot com
8 siblings, 0 replies; 10+ messages in thread
From: bugdal at aerifal dot cx @ 2021-03-03 19:31 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Rich Felker <bugdal at aerifal dot cx> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bugdal at aerifal dot cx
--- Comment #3 from Rich Felker <bugdal at aerifal dot cx> ---
Of course the standard does not make any guarantees anywhere about complexity
class, but this is a pretty serious QoI issue that's being hit in the real
world and makes sscanf largely unusable for parsers. I'd like to see it fixed.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
` (6 preceding siblings ...)
2021-03-03 19:31 ` bugdal at aerifal dot cx
@ 2021-03-15 20:38 ` dkk089 at gmail dot com
2021-03-15 21:56 ` amonakov at gmail dot com
8 siblings, 0 replies; 10+ messages in thread
From: dkk089 at gmail dot com @ 2021-03-15 20:38 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
Daniel Kozar <dkk089 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dkk089 at gmail dot com
--- Comment #4 from Daniel Kozar <dkk089 at gmail dot com> ---
This seems to be related to the CPU architecture. Running the posted code on a
virtualised pristine installation of Ubuntu 14.04 in i386 mode gives the
following results :
$ time ./a.out
real 0m0.718s
user 0m0.716s
sys 0m0.000s
However, running the same code on a separate installation of 14.04, but in
x86_64 mode gives drastically different results :
$ time ./a.out
real 0m0.086s
user 0m0.085s
sys 0m0.001s
This is reproducible with current versions of GCC and glibc (10.2.0 and 2.33
respectively), this time on actual hardware :
i386 (-m32) :
$ time ./a.out
real 0m0.688s
user 0m0.688s
sys 0m0.000s
x86_64 :
$ time ./a.out
real 0m0.042s
user 0m0.042s
sys 0m0.000s
The CPU is Ryzen 3700X.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug stdio/17577] sscanf extremely slow on large strings
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
` (7 preceding siblings ...)
2021-03-15 20:38 ` dkk089 at gmail dot com
@ 2021-03-15 21:56 ` amonakov at gmail dot com
8 siblings, 0 replies; 10+ messages in thread
From: amonakov at gmail dot com @ 2021-03-15 21:56 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17577
--- Comment #5 from Alexander Monakov <amonakov at gmail dot com> ---
That is a difference in constant factor, on 64-bit it uses AVX2 rawmemchr
(selected by ifunc), but on 32-bit sscanf invokes basic ia32 rawmemchr
implementation directly.
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-03-15 21:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-11 13:49 [Bug stdio/17577] New: sscanf extremely slow on large strings shabbyx at gmail dot com
2014-11-11 13:50 ` [Bug stdio/17577] " shabbyx at gmail dot com
2021-03-01 12:56 ` marat at slonopotamus dot org
2021-03-02 12:46 ` amonakov at gmail dot com
2021-03-03 2:22 ` benfrantzdale at gmail dot com
2021-03-03 2:26 ` benfrantzdale at gmail dot com
2021-03-03 3:35 ` fweimer at redhat dot com
2021-03-03 19:31 ` bugdal at aerifal dot cx
2021-03-15 20:38 ` dkk089 at gmail dot com
2021-03-15 21:56 ` amonakov at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).