public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Lewis Hyatt <lhyatt@gmail.com>
To: Jonathan Wakely <jwakely.gcc@gmail.com>
Cc: "François Dumont" <frs.dumont@gmail.com>,
	libstdc++ <libstdc++@gcc.gnu.org>,
	"Dietmar Kuehl" <dietmar_kuehl@yahoo.com>
Subject: Re: ostream::operator<<() and sputn()
Date: Mon, 10 Jan 2022 18:18:40 -0500	[thread overview]
Message-ID: <CAA_5UQ6FbJedoAj8phxyPjJKVvkikM9zD6ug1kk-fw12fyvDCg@mail.gmail.com> (raw)
In-Reply-To: <CAH6eHdSejdzFPVP+v_J8XWgd6AY9eoJ+x4Y4c5mC_bUcO=MJjg@mail.gmail.com>

On Mon, Jan 10, 2022 at 11:07 AM Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> On Thu, 15 Jul 2021 at 18:11, François Dumont wrote:
> >
> > On 14/07/21 11:45 pm, Lewis Hyatt via Libstdc++ wrote:
> > > On Wed, Jul 14, 2021 at 5:31 PM Jonathan Wakely<jwakely.gcc@gmail.com>  wrote:
> > >> On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++
> > >> <libstdc++@gcc.gnu.org>  wrote:
> > >>> Hello-
> > >>>
> > >>> I noticed that libstdc++'s implementation of ostream::operator<<() prefers
> > >>> to call sputn() on the underlying streambuf for all char, char*, and string
> > >>> output operations, including single characters, rather than manipulate the
> > >>> buffer directly. I am curious why it works this way, it feels perhaps
> > >>> suboptimal to me because sputn() is mandated to call the virtual function
> > >>> xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
> > >>> only needs a virtual call when the buffer is full. I always thought that the
> > >>> buffer abstraction and the resulting avoidance of virtual calls for the
> > >>> majority of operations was the main point of streambuf's design, and that
> > >>> sputn() was meant for cases when the output would be large enough to
> > >>> overflow the buffer anyway, if it may be possible to skip the buffer and
> > >>> flush directly instead?
> > >>>
> > >>> It seems to me that for most typical use cases, xsputn() is still going to
> > >>> want to use the buffer if the output fits into it; libstdc++ does this in
> > >>> basic_filebuf, for example. So then it would seem to be beneficial to try
> > >>> the buffer prior to making the virtual function call, instead of after --
> > >>> especially because the typical char instantiation of __ostream_insert that
> > >>> makes this call for operator<<() is hidden inside the .so, and is not
> > >>> inlined or eligible for devirtualization optimizations.
> > >>>
> > >>> FWIW, here is a small test case.
> > >>>
> > >>> ---------
> > >>> #include <ostream>
> > >>> #include <iostream>
> > >>> #include <fstream>
> > >>> #include <sstream>
> > >>> #include <chrono>
> > >>> #include <random>
> > >>> using namespace std;
> > >>>
> > >>> int main() {
> > >>>      constexpr size_t N = 500000000;
> > >>>      string s(N, 'x');
> > >>>
> > >>>      ofstream of{"/dev/null"};
> > >>>      ostringstream os;
> > >>>      ostream* streams[] = {&of, &os};
> > >>>      mt19937 rng{random_device{}()};
> > >>>
> > >>>      const auto timed_run = [&](const char* label, auto&& callback) {
> > >>>          const auto t1 = chrono::steady_clock::now();
> > >>>          for(char c: s) callback(*streams[rng() % 2], c);
> > >>>          const auto t2 = chrono::steady_clock::now();
> > >>>          cout << label << " took: "
> > >>>               << chrono::duration<double>(t2-t1).count()
> > >>>               << " seconds" << endl;
> > >>>      };
> > >>>
> > >>>      timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
> > >>>      timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
> > >>> }
> > >>> ---------
> > >>>
> > >>> This is what I get with the current trunk:
> > >>> ---------
> > >>> insert with put() took: 6.12152 seconds
> > >>> insert with op<<  took: 13.4437 seconds
> > >>> ---------
> > >>>
> > >>> And this is what I get with the attached patch:
> > >>> ---------
> > >>> insert with put() took: 6.08313 seconds
> > >>> insert with op<<  took: 8.24565 seconds
> > >>> ---------
> > >>>
> > >>> So the overhead of calling operator<< vs calling put() was reduced by more
> > >>> than 3X.
> > >>>
> > >>> The prototype patch calls an internal alternate to sputn(), which tries the
> > >>> buffer prior to calling xsputn().
> > >> This won't work if a user provides an explicit specialization of
> > >> basic_streambuf<char, MyTraits>. std::basic_ostream<char, MyTraits>
> > >> will still try to call your new function, but it won't be present in
> > >> the user's specialization, so will fail to compile. The basic_ostream
> > >> primary template can only use the standard API of basic_streambuf. The
> > >> std::basic_ostream<char> specialization can use non-standard members
> > >> of std::basic_streambuf<char> because we know users can't specialize
> > >> that.
> > > Thanks, makes sense, this was more just a quick proof of concept. I
> > > guess a real version could work around this, well it could be
> > > implemented purely in terms of sputc() too. Am curious if you think
> > > the overall idea is worthwhile though? Partly I am trying to
> > > understand it better, like it was a bit surprising to me that the
> > > standard says that sputn() *must* call xsputn(). Feels like calling
> > > it, only if a call to overflow() would otherwise be necessary, makes
> > > more sense to me...
> > >
> > >
> > > -Lewis
> > > .
> >
> > I think that the issue you spotted can be summarize by the
> > implementation of operator<< in <ostream>:
> >
> >    template<typename _CharT, typename _Traits>
> >      inline basic_ostream<_CharT, _Traits>&
> >      operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
> >      { return __ostream_insert(__out, &__c, 1); }
> >
> > To output a single _CharT is treated as to output a C string.
> >
> > If you add the plumbing to have a __ostream_insert(__out, __c) then
> > buffering should take place normally as it will end-up into a call to sputc.
>
> Indeed. This improves the performance of Lewis's testcase so there is
> almost no overhead compared to using ostream::put(c) directly:
>
> --- a/libstdc++-v3/include/std/ostream
> +++ b/libstdc++-v3/include/std/ostream
> @@ -505,7 +505,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   template<typename _CharT, typename _Traits>
>     inline basic_ostream<_CharT, _Traits>&
>     operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
> -    { return __ostream_insert(__out, &__c, 1); }
> +    {
> +      if (__out.width() != 0)
> +       return __ostream_insert(__out, &__c, 1);
> +      __out.put(__c);
> +      return __out;
> +    }
>
>   template<typename _CharT, typename _Traits>
>     inline basic_ostream<_CharT, _Traits>&
> @@ -516,7 +521,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   template<typename _Traits>
>     inline basic_ostream<char, _Traits>&
>     operator<<(basic_ostream<char, _Traits>& __out, char __c)
> -    { return __ostream_insert(__out, &__c, 1); }
> +    {
> +      if (__out.width() != 0)
> +       return __ostream_insert(__out, &__c, 1);
> +      __out.put(__c);
> +      return __out;
> +    }
>
>   // Signed and unsigned
>   template<typename _Traits>
>
> I think the compiler will optimize this more aggressively than if the
> new code is inside __ostream_write or __ostream_insert.
>
> We could also use put(__c) when __out.width() == 1 but then we'd need
> to also call width(0) if put(__c) succeeds, and that extra logic
> defeats some of the optimization. Using width()==1 is rather pointless
> when inserting single chars (as it is no different to width(0)) so I'm
> not too worried about not optimizing that case.
>
> > Either it is worthwhile or not, I would say that if you need it and
> > eventually implement it then do not hesitate to submit it here !
>
> I think this is simple enough that it's worth doing. It improves the
> default case where there's no padding, and shouldn't add much overhead
> when there is padding.
>
> If it turns out that requiring an observable call to xsputn is not
> required for string views and strings, we can consider something like
> Lewis's patch as well (which would help when writing many strings
> where most of them fit in the buffer).

Thanks a lot for circling back to this. I feel like your change is a
pure improvement with little downside, so it would be great to have
it. Let me know please if I should do anything more there.

To me, the whole situation with sputn() is just unfortunate, because I
think it should not have been specified to always make the virtual
call to xsputn(), but rather only make it if needed. Then everything
would work as well as possible. Like after this change to the
single-char overload, a similar unexpected performance issue will
affect something like:  cout << "a"  compared to cout << 'a', where
now the former will make a virtual call every time and the latter
won't. libstdc++ could handle the char* overload case as well, but
then it still would not be permitted to handle a string or
string_view, so at some level the standard specification is the
constraint... Seems unlikely it will change though, so I think it
means that avoiding sputn() wherever permitted will always be an
avenue towards improving performance.

-Lewis


-Lewis

  reply	other threads:[~2022-01-10 23:18 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14 21:26 Lewis Hyatt
2021-07-14 21:30 ` Jonathan Wakely
2021-07-14 21:45   ` Lewis Hyatt
2021-07-15 17:11     ` François Dumont
2022-01-10 16:07       ` Jonathan Wakely
2022-01-10 23:18         ` Lewis Hyatt [this message]
2022-01-11  1:09           ` Jonathan Wakely
2021-07-14 21:54   ` Dietmar Kühl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAA_5UQ6FbJedoAj8phxyPjJKVvkikM9zD6ug1kk-fw12fyvDCg@mail.gmail.com \
    --to=lhyatt@gmail.com \
    --cc=dietmar_kuehl@yahoo.com \
    --cc=frs.dumont@gmail.com \
    --cc=jwakely.gcc@gmail.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).