public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: "François Dumont" <frs.dumont@gmail.com>
To: Lewis Hyatt <lhyatt@gmail.com>, Jonathan Wakely <jwakely.gcc@gmail.com>
Cc: libstdc++ <libstdc++@gcc.gnu.org>
Subject: Re: ostream::operator<<() and sputn()
Date: Thu, 15 Jul 2021 19:11:50 +0200	[thread overview]
Message-ID: <1ee79d3c-a373-cb2a-f975-e62d182f1882@gmail.com> (raw)
In-Reply-To: <CAA_5UQ4D5zk8WFjLFtasWicbUhTCKUDUiyDKHzACpvyT7Lfgpw@mail.gmail.com>

On 14/07/21 11:45 pm, Lewis Hyatt via Libstdc++ wrote:
> On Wed, Jul 14, 2021 at 5:31 PM Jonathan Wakely<jwakely.gcc@gmail.com>  wrote:
>> On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++
>> <libstdc++@gcc.gnu.org>  wrote:
>>> Hello-
>>>
>>> I noticed that libstdc++'s implementation of ostream::operator<<() prefers
>>> to call sputn() on the underlying streambuf for all char, char*, and string
>>> output operations, including single characters, rather than manipulate the
>>> buffer directly. I am curious why it works this way, it feels perhaps
>>> suboptimal to me because sputn() is mandated to call the virtual function
>>> xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
>>> only needs a virtual call when the buffer is full. I always thought that the
>>> buffer abstraction and the resulting avoidance of virtual calls for the
>>> majority of operations was the main point of streambuf's design, and that
>>> sputn() was meant for cases when the output would be large enough to
>>> overflow the buffer anyway, if it may be possible to skip the buffer and
>>> flush directly instead?
>>>
>>> It seems to me that for most typical use cases, xsputn() is still going to
>>> want to use the buffer if the output fits into it; libstdc++ does this in
>>> basic_filebuf, for example. So then it would seem to be beneficial to try
>>> the buffer prior to making the virtual function call, instead of after --
>>> especially because the typical char instantiation of __ostream_insert that
>>> makes this call for operator<<() is hidden inside the .so, and is not
>>> inlined or eligible for devirtualization optimizations.
>>>
>>> FWIW, here is a small test case.
>>>
>>> ---------
>>> #include <ostream>
>>> #include <iostream>
>>> #include <fstream>
>>> #include <sstream>
>>> #include <chrono>
>>> #include <random>
>>> using namespace std;
>>>
>>> int main() {
>>>      constexpr size_t N = 500000000;
>>>      string s(N, 'x');
>>>
>>>      ofstream of{"/dev/null"};
>>>      ostringstream os;
>>>      ostream* streams[] = {&of, &os};
>>>      mt19937 rng{random_device{}()};
>>>
>>>      const auto timed_run = [&](const char* label, auto&& callback) {
>>>          const auto t1 = chrono::steady_clock::now();
>>>          for(char c: s) callback(*streams[rng() % 2], c);
>>>          const auto t2 = chrono::steady_clock::now();
>>>          cout << label << " took: "
>>>               << chrono::duration<double>(t2-t1).count()
>>>               << " seconds" << endl;
>>>      };
>>>
>>>      timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
>>>      timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
>>> }
>>> ---------
>>>
>>> This is what I get with the current trunk:
>>> ---------
>>> insert with put() took: 6.12152 seconds
>>> insert with op<<  took: 13.4437 seconds
>>> ---------
>>>
>>> And this is what I get with the attached patch:
>>> ---------
>>> insert with put() took: 6.08313 seconds
>>> insert with op<<  took: 8.24565 seconds
>>> ---------
>>>
>>> So the overhead of calling operator<< vs calling put() was reduced by more
>>> than 3X.
>>>
>>> The prototype patch calls an internal alternate to sputn(), which tries the
>>> buffer prior to calling xsputn().
>> This won't work if a user provides an explicit specialization of
>> basic_streambuf<char, MyTraits>. std::basic_ostream<char, MyTraits>
>> will still try to call your new function, but it won't be present in
>> the user's specialization, so will fail to compile. The basic_ostream
>> primary template can only use the standard API of basic_streambuf. The
>> std::basic_ostream<char> specialization can use non-standard members
>> of std::basic_streambuf<char> because we know users can't specialize
>> that.
> Thanks, makes sense, this was more just a quick proof of concept. I
> guess a real version could work around this, well it could be
> implemented purely in terms of sputc() too. Am curious if you think
> the overall idea is worthwhile though? Partly I am trying to
> understand it better, like it was a bit surprising to me that the
> standard says that sputn() *must* call xsputn(). Feels like calling
> it, only if a call to overflow() would otherwise be necessary, makes
> more sense to me...
>
>
> -Lewis
> .

I think that the issue you spotted can be summarize by the 
implementation of operator<< in <ostream>:

   template<typename _CharT, typename _Traits>
     inline basic_ostream<_CharT, _Traits>&
     operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
     { return __ostream_insert(__out, &__c, 1); }

To output a single _CharT is treated as to output a C string.

If you add the plumbing to have a __ostream_insert(__out, __c) then 
buffering should take place normally as it will end-up into a call to sputc.

Either it is worthwhile or not, I would say that if you need it and 
eventually implement it then do not hesitate to submit it here !

François



  reply	other threads:[~2021-07-15 17:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14 21:26 Lewis Hyatt
2021-07-14 21:30 ` Jonathan Wakely
2021-07-14 21:45   ` Lewis Hyatt
2021-07-15 17:11     ` François Dumont [this message]
2022-01-10 16:07       ` Jonathan Wakely
2022-01-10 23:18         ` Lewis Hyatt
2022-01-11  1:09           ` Jonathan Wakely
2021-07-14 21:54   ` Dietmar Kühl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1ee79d3c-a373-cb2a-f975-e62d182f1882@gmail.com \
    --to=frs.dumont@gmail.com \
    --cc=jwakely.gcc@gmail.com \
    --cc=lhyatt@gmail.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).