From: Lewis Hyatt
Date: Wed, 14 Jul 2021 17:45:52 -0400
Subject: Re: ostream::operator<<() and sputn()
To: Jonathan Wakely
Cc: "libstdc++"
List-Id: Libstdc++ mailing list

On Wed, Jul 14, 2021 at 5:31 PM Jonathan Wakely wrote:
>
> On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++ wrote:
> >
> > Hello-
> >
> > I noticed that libstdc++'s implementation of ostream::operator<<() prefers
> > to call sputn() on the underlying streambuf for all char, char*, and string
> > output operations, including single characters, rather than manipulate the
> > buffer directly. I am curious why it works this way, it feels perhaps
> > suboptimal to me because sputn() is mandated to call the virtual function
> > xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
> > only needs a virtual call when the buffer is full. I always thought that the
> > buffer abstraction and the resulting avoidance of virtual calls for the
> > majority of operations was the main point of streambuf's design, and that
> > sputn() was meant for cases when the output would be large enough to
> > overflow the buffer anyway, if it may be possible to skip the buffer and
> > flush directly instead?
> >
> > It seems to me that for most typical use cases, xsputn() is still going to
> > want to use the buffer if the output fits into it; libstdc++ does this in
> > basic_filebuf, for example. So then it would seem to be beneficial to try
> > the buffer prior to making the virtual function call, instead of after --
> > especially because the typical char instantiation of __ostream_insert that
> > makes this call for operator<<() is hidden inside the .so, and is not
> > inlined or eligible for devirtualization optimizations.
> >
> > FWIW, here is a small test case.
> >
> > ---------
> > #include <iostream>
> > #include <fstream>
> > #include <sstream>
> > #include <string>
> > #include <random>
> > #include <chrono>
> > using namespace std;
> >
> > int main() {
> >     constexpr size_t N = 500000000;
> >     string s(N, 'x');
> >
> >     ofstream of{"/dev/null"};
> >     ostringstream os;
> >     ostream* streams[] = {&of, &os};
> >     mt19937 rng{random_device{}()};
> >
> >     const auto timed_run = [&](const char* label, auto&& callback) {
> >         const auto t1 = chrono::steady_clock::now();
> >         for(char c: s) callback(*streams[rng() % 2], c);
> >         const auto t2 = chrono::steady_clock::now();
> >         cout << label << " took: "
> >              << chrono::duration<double>(t2-t1).count()
> >              << " seconds" << endl;
> >     };
> >
> >     timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
> >     timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
> > }
> > ---------
> >
> > This is what I get with the current trunk:
> > ---------
> > insert with put() took: 6.12152 seconds
> > insert with op<<  took: 13.4437 seconds
> > ---------
> >
> > And this is what I get with the attached patch:
> > ---------
> > insert with put() took: 6.08313 seconds
> > insert with op<<  took: 8.24565 seconds
> > ---------
> >
> > So the overhead of calling operator<< vs calling put() was reduced by more
> > than 3X.
> >
> > The prototype patch calls an internal alternate to sputn(), which tries the
> > buffer prior to calling xsputn().
>
> This won't work if a user provides an explicit specialization of
> basic_streambuf. std::basic_ostream
> will still try to call your new function, but it won't be present in
> the user's specialization, so will fail to compile. The basic_ostream
> primary template can only use the standard API of basic_streambuf. The
> std::basic_ostream<char> specialization can use non-standard members
> of std::basic_streambuf<char> because we know users can't specialize
> that.

Thanks, makes sense, this was more just a quick proof of concept.
I guess a real version could work around this; it could also be implemented
purely in terms of sputc(). I am curious whether you think the overall idea is
worthwhile, though. Partly I am trying to understand it better: it was a bit
surprising to me that the standard says that sputn() *must* call xsputn().
Calling it only when a call to overflow() would otherwise be necessary makes
more sense to me...

-Lewis