From: Jonathan Wakely <jwakely.gcc@gmail.com>
To: "François Dumont" <frs.dumont@gmail.com>
Cc: Lewis Hyatt <lhyatt@gmail.com>,
"libstdc++" <libstdc++@gcc.gnu.org>,
Dietmar Kuehl <dietmar_kuehl@yahoo.com>
Subject: Re: ostream::operator<<() and sputn()
Date: Mon, 10 Jan 2022 16:07:02 +0000 [thread overview]
Message-ID: <CAH6eHdSejdzFPVP+v_J8XWgd6AY9eoJ+x4Y4c5mC_bUcO=MJjg@mail.gmail.com> (raw)
In-Reply-To: <1ee79d3c-a373-cb2a-f975-e62d182f1882@gmail.com>
On Thu, 15 Jul 2021 at 18:11, François Dumont wrote:
>
> On 14/07/21 11:45 pm, Lewis Hyatt via Libstdc++ wrote:
> > On Wed, Jul 14, 2021 at 5:31 PM Jonathan Wakely<jwakely.gcc@gmail.com> wrote:
> >> On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++
> >> <libstdc++@gcc.gnu.org> wrote:
> >>> Hello-
> >>>
> >>> I noticed that libstdc++'s implementation of ostream::operator<<() prefers
> >>> to call sputn() on the underlying streambuf for all char, char*, and string
> >>> output operations, including single characters, rather than manipulate the
> >>> buffer directly. I am curious why it works this way, it feels perhaps
> >>> suboptimal to me because sputn() is mandated to call the virtual function
> >>> xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
> >>> only needs a virtual call when the buffer is full. I always thought that the
> >>> buffer abstraction and the resulting avoidance of virtual calls for the
> >>> majority of operations was the main point of streambuf's design, and that
> >>> sputn() was meant for cases when the output would be large enough to
> >>> overflow the buffer anyway, if it may be possible to skip the buffer and
> >>> flush directly instead?
> >>>
> >>> It seems to me that for most typical use cases, xsputn() is still going to
> >>> want to use the buffer if the output fits into it; libstdc++ does this in
> >>> basic_filebuf, for example. So then it would seem to be beneficial to try
> >>> the buffer prior to making the virtual function call, instead of after --
> >>> especially because the typical char instantiation of __ostream_insert that
> >>> makes this call for operator<<() is hidden inside the .so, and is not
> >>> inlined or eligible for devirtualization optimizations.
> >>>
> >>> FWIW, here is a small test case.
> >>>
> >>> ---------
> >>> #include <ostream>
> >>> #include <iostream>
> >>> #include <fstream>
> >>> #include <sstream>
> >>> #include <chrono>
> >>> #include <random>
> >>> using namespace std;
> >>>
> >>> int main() {
> >>> constexpr size_t N = 500000000;
> >>> string s(N, 'x');
> >>>
> >>> ofstream of{"/dev/null"};
> >>> ostringstream os;
> >>> ostream* streams[] = {&of, &os};
> >>> mt19937 rng{random_device{}()};
> >>>
> >>> const auto timed_run = [&](const char* label, auto&& callback) {
> >>> const auto t1 = chrono::steady_clock::now();
> >>> for(char c: s) callback(*streams[rng() % 2], c);
> >>> const auto t2 = chrono::steady_clock::now();
> >>> cout << label << " took: "
> >>> << chrono::duration<double>(t2-t1).count()
> >>> << " seconds" << endl;
> >>> };
> >>>
> >>> timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
> >>> timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
> >>> }
> >>> ---------
> >>>
> >>> This is what I get with the current trunk:
> >>> ---------
> >>> insert with put() took: 6.12152 seconds
> >>> insert with op<< took: 13.4437 seconds
> >>> ---------
> >>>
> >>> And this is what I get with the attached patch:
> >>> ---------
> >>> insert with put() took: 6.08313 seconds
> >>> insert with op<< took: 8.24565 seconds
> >>> ---------
> >>>
> >>> So the overhead of calling operator<< vs calling put() was reduced by more
> >>> than 3X.
> >>>
> >>> The prototype patch calls an internal alternate to sputn(), which tries the
> >>> buffer prior to calling xsputn().
> >> This won't work if a user provides an explicit specialization of
> >> basic_streambuf<char, MyTraits>. std::basic_ostream<char, MyTraits>
> >> will still try to call your new function, but it won't be present in
> >> the user's specialization, so will fail to compile. The basic_ostream
> >> primary template can only use the standard API of basic_streambuf. The
> >> std::basic_ostream<char> specialization can use non-standard members
> >> of std::basic_streambuf<char> because we know users can't specialize
> >> that.
> > Thanks, makes sense, this was more just a quick proof of concept. I
> > guess a real version could work around this, well it could be
> > implemented purely in terms of sputc() too. Am curious if you think
> > the overall idea is worthwhile though? Partly I am trying to
> > understand it better, like it was a bit surprising to me that the
> > standard says that sputn() *must* call xsputn(). Feels like calling
> > it, only if a call to overflow() would otherwise be necessary, makes
> > more sense to me...
> >
> >
> > -Lewis
> > .
>
> I think that the issue you spotted can be summarize by the
> implementation of operator<< in <ostream>:
>
> template<typename _CharT, typename _Traits>
> inline basic_ostream<_CharT, _Traits>&
> operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
> { return __ostream_insert(__out, &__c, 1); }
>
> To output a single _CharT is treated as to output a C string.
>
> If you add the plumbing to have a __ostream_insert(__out, __c) then
> buffering should take place normally as it will end-up into a call to sputc.
Indeed. This improves the performance of Lewis's testcase so there is
almost no overhead compared to using ostream::put(c) directly:
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -505,7 +505,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template<typename _CharT, typename _Traits>
inline basic_ostream<_CharT, _Traits>&
operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
- { return __ostream_insert(__out, &__c, 1); }
+ {
+ if (__out.width() != 0)
+ return __ostream_insert(__out, &__c, 1);
+ __out.put(__c);
+ return __out;
+ }
template<typename _CharT, typename _Traits>
inline basic_ostream<_CharT, _Traits>&
@@ -516,7 +521,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template<typename _Traits>
inline basic_ostream<char, _Traits>&
operator<<(basic_ostream<char, _Traits>& __out, char __c)
- { return __ostream_insert(__out, &__c, 1); }
+ {
+ if (__out.width() != 0)
+ return __ostream_insert(__out, &__c, 1);
+ __out.put(__c);
+ return __out;
+ }
// Signed and unsigned
template<typename _Traits>
I think the compiler will optimize this more aggressively than if the
new code is inside __ostream_write or __ostream_insert.
We could also use put(__c) when __out.width() == 1 but then we'd need
to also call width(0) if put(__c) succeeds, and that extra logic
defeats some of the optimization. Using width()==1 is rather pointless
when inserting single chars (as it is no different to width(0)) so I'm
not too worried about not optimizing that case.
> Either it is worthwhile or not, I would say that if you need it and
> eventually implement it then do not hesitate to submit it here !
I think this is simple enough that it's worth doing. It improves the
default case where there's no padding, and shouldn't add much overhead
when there is padding.
If it turns out that requiring an observable call to xsputn is not
required for string views and strings, we can consider something like
Lewis's patch as well (which would help when writing many strings
where most of them fit in the buffer).
next prev parent reply other threads:[~2022-01-10 16:07 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-14 21:26 Lewis Hyatt
2021-07-14 21:30 ` Jonathan Wakely
2021-07-14 21:45 ` Lewis Hyatt
2021-07-15 17:11 ` François Dumont
2022-01-10 16:07 ` Jonathan Wakely [this message]
2022-01-10 23:18 ` Lewis Hyatt
2022-01-11 1:09 ` Jonathan Wakely
2021-07-14 21:54 ` Dietmar Kühl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAH6eHdSejdzFPVP+v_J8XWgd6AY9eoJ+x4Y4c5mC_bUcO=MJjg@mail.gmail.com' \
--to=jwakely.gcc@gmail.com \
--cc=dietmar_kuehl@yahoo.com \
--cc=frs.dumont@gmail.com \
--cc=lhyatt@gmail.com \
--cc=libstdc++@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).