* Remove string length check from sscanf()
@ 2021-03-04 11:20 Dominic Letz
2021-03-04 12:01 ` Florian Weimer
0 siblings, 1 reply; 2+ messages in thread
From: Dominic Letz @ 2021-03-04 11:20 UTC (permalink / raw)
To: libc-help
You might or might not have seen this article on the bad performance of
sscanf() when used repeatedly in scanning a big file.
https://news.ycombinator.com/item?id=26296339
It boils down to sscanf("%f", JSON) needing to call strlen (or in glibc
case __rawmemchr) on a 10MB string. Now the article is entertaining and
all but I was wondering why not fix the root cause in sscanf() itself
instead of having workarounds. Especially as this is not limited to GTA
but also other apps: https://www.mattkeeter.com/blog/2021-03-01-happen/
So why do check the size at all?
I thought one could go char by char until the pattern (e.g. "%f") is
fulfilled and then return. That would solve the issue on it's root --
and who knows how many programs would suddenly get faster...
So looking at glibc sources I've found the culprit in abstraction. Looks
like a FILE* like string-buffer object is created around the c-string:
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/sscanf.c;h=75daedd2aebe392e7f0d9e5d8816c1524b28f6ec;hb=HEAD#l34
And that abstraction does call __rawmemchr when initializing to know
it's bounds here:
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/strops.c;h=6208518cdd861b770787c3f64cbfa3b6b9fb9afd;hb=HEAD#l41
I'm reading this source the first time, but I guess to not break
anything I could introduce a new type of FILE* string-buffer let's say
in 'strops_incr.c' that is working incrementally reading one char at the
time from the underlying string skipping the strlen()...
So that brings me to my question. Is the incremental approach something
that would get accepted when I prepare a patch? And how / where to
submit that? (Sorry GitHub generation speaking).
Happy to hear any feedback!
Best
Dominic
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Remove string length check from sscanf()
2021-03-04 11:20 Remove string length check from sscanf() Dominic Letz
@ 2021-03-04 12:01 ` Florian Weimer
0 siblings, 0 replies; 2+ messages in thread
From: Florian Weimer @ 2021-03-04 12:01 UTC (permalink / raw)
To: Dominic Letz; +Cc: libc-help
* Dominic Letz:
> I'm reading this source the first time, but I guess to not break
> anything I could introduce a new type of FILE* string-buffer let's say
> in 'strops_incr.c' that is working incrementally reading one char at
> the time from the underlying string skipping the strlen()...
And maybe special-case the char-by-char check for small buffer sizes
based on the results of a strnlen call. I agree that this should work.
Even today, applications can use fmemopen and fscanf to avoid the
overhead.
> So that brings me to my question. Is the incremental approach
> something that would get accepted when I prepare a patch? And how /
> where to submit that? (Sorry GitHub generation speaking).
You could send a patch to libc-alpha (git send-email should work). Note
that the change will exceed the size threshold for FSF copyright
assignment, so you'll have to follow that process as well:
Contribution Checklist: FSF copyright assignment
<https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_assignment>
Thanks,
Florian
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2021-03-04 12:01 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-04 11:20 Remove string length check from sscanf() Dominic Letz
2021-03-04 12:01 ` Florian Weimer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).