From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-il1-x12d.google.com (mail-il1-x12d.google.com [IPv6:2607:f8b0:4864:20::12d]) by sourceware.org (Postfix) with ESMTPS id 74528385802E for ; Wed, 14 Jul 2021 21:26:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 74528385802E Received: by mail-il1-x12d.google.com with SMTP id b14so3018655ilf.7 for ; Wed, 14 Jul 2021 14:26:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:subject:message-id:mime-version :content-disposition; bh=vQll561Rha8n3XBv4n6/aEDckDwsm3/IvCD+ISyiN/w=; b=qb8fhk3BubrsKO0Yhr993h+m2rY8Dc9weMedRzOnjpngsi0aLVy1xCExbVTwwWoANH 4/ZdAF9vOyZQz0iLP2fYvTU3TvcMOuhwjeb2a6bFgTO3QVZmUCv6rz37sOZ8z1lcZSt2 ZByzwhLzeL5GBZolLy2KrqB2SfEeTXCx/OfZHAGirCExHk3Xov95FFf8RpKVD1WrR/sZ nKQar8xyVC0do9SzMVAc3UB3HMIEkvo18sh8B438j3QCIhgu1PPVqu/daBtaLIoTlWIr PgvsPzW7ofnDk34iNAJ02Tdlu2ZwjXEDxTiO5YKu+zArZq5MJay7eAGZ8UR1Diz4OwI5 WEyQ== X-Gm-Message-State: AOAM530KazlxPTQaT4w8IoIm+kFZ7o3tL5EMAKGs0zoUhr8nesFbahcY D47xrzPmkjIMnpgw9wUd5waDSCfcS2Q= X-Google-Smtp-Source: ABdhPJzlYqDg3tug8bj7E4XtVWU58ZeyB5rDURdwweCzNOQhern1B8TopRlF5KrgWjVwc1nYZo/0pw== X-Received: by 2002:a92:6c11:: with SMTP id h17mr7800009ilc.96.1626297972550; Wed, 14 Jul 2021 14:26:12 -0700 (PDT) Received: from ldh-imac.local (96-67-140-173-static.hfc.comcastbusiness.net. [96.67.140.173]) by smtp.gmail.com with ESMTPSA id p25sm1983984ioj.18.2021.07.14.14.26.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Jul 2021 14:26:11 -0700 (PDT) Date: Wed, 14 Jul 2021 17:26:09 -0400 From: Lewis Hyatt To: libstdc++@gcc.gnu.org Subject: ostream::operator<<() and sputn() Message-ID: <20210714212609.GA78610@ldh-imac.local> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="Kj7319i9nmIyA2yE" Content-Disposition: inline X-Spam-Status: No, score=-3039.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Jul 2021 21:26:15 -0000 --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hello- I noticed that libstdc++'s implementation of ostream::operator<<() prefers to call sputn() on the underlying streambuf for all char, char*, and string output operations, including single characters, rather than manipulate the buffer directly. I am curious why it works this way, it feels perhaps suboptimal to me because sputn() is mandated to call the virtual function xsputn() on every call, while e.g. sputc() simply manipulates the buffer and only needs a virtual call when the buffer is full. I always thought that the buffer abstraction and the resulting avoidance of virtual calls for the majority of operations was the main point of streambuf's design, and that sputn() was meant for cases when the output would be large enough to overflow the buffer anyway, if it may be possible to skip the buffer and flush directly instead? It seems to me that for most typical use cases, xsputn() is still going to want to use the buffer if the output fits into it; libstdc++ does this in basic_filebuf, for example. So then it would seem to be beneficial to try the buffer prior to making the virtual function call, instead of after -- especially because the typical char instantiation of __ostream_insert that makes this call for operator<<() is hidden inside the .so, and is not inlined or eligible for devirtualization optimizations. FWIW, here is a small test case. --------- #include #include #include #include #include #include using namespace std; int main() { constexpr size_t N = 500000000; string s(N, 'x'); ofstream of{"/dev/null"}; ostringstream os; ostream* streams[] = {&of, &os}; mt19937 rng{random_device{}()}; const auto timed_run = [&](const char* label, auto&& callback) { const auto t1 = chrono::steady_clock::now(); for(char c: s) callback(*streams[rng() % 2], c); const auto t2 = chrono::steady_clock::now(); cout << label << " took: " << chrono::duration(t2-t1).count() << " seconds" << endl; }; timed_run("insert with put()", [](ostream& o, char c) {o.put(c);}); timed_run("insert with op<< ", [](ostream& o, char c) {o << c;}); } --------- This is what I get with the current trunk: --------- insert with put() took: 6.12152 seconds insert with op<< took: 13.4437 seconds --------- And this is what I get with the attached patch: --------- insert with put() took: 6.08313 seconds insert with op<< took: 8.24565 seconds --------- So the overhead of calling operator<< vs calling put() was reduced by more than 3X. The prototype patch calls an internal alternate to sputn(), which tries the buffer prior to calling xsputn(). All the tests still pass with this patch in place, but I think it would not be suitable as-is, because it affects the overload for std::string_view, and I think the standard may actually mandate that xsputn() be called always for this case. (It says it should be output as-if sputn() were called, which I think makes xsputn() should be an observable side-effect.) Still it would work fine outside the string case AFAIK, and I thought it was maybe an interesting result worth sharing, and am curious what others think here, or if there is a reason I am missing why it's better the current way. Thanks! -Lewis --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline; filename="buffered_sputn.txt" diff --git a/libstdc++-v3/include/bits/ostream.tcc b/libstdc++-v3/include/bits/ostream.tcc index 06b2217d216..614b347736b 100644 --- a/libstdc++-v3/include/bits/ostream.tcc +++ b/libstdc++-v3/include/bits/ostream.tcc @@ -195,7 +195,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION ios_base::iostate __err = ios_base::goodbit; __try { - if (this->rdbuf()->sputn(__s, __n) != __n) + if (this->rdbuf()->__buffered_sputn(__s, __n) != __n) __err = ios_base::badbit; } __catch(__cxxabiv1::__forced_unwind&) diff --git a/libstdc++-v3/include/bits/ostream_insert.h b/libstdc++-v3/include/bits/ostream_insert.h index 72eea8b1d05..650ea1afd9e 100644 --- a/libstdc++-v3/include/bits/ostream_insert.h +++ b/libstdc++-v3/include/bits/ostream_insert.h @@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typedef basic_ostream<_CharT, _Traits> __ostream_type; typedef typename __ostream_type::ios_base __ios_base; - const streamsize __put = __out.rdbuf()->sputn(__s, __n); + const streamsize __put = __out.rdbuf()->__buffered_sputn(__s, __n); if (__put != __n) __out.setstate(__ios_base::badbit); } diff --git a/libstdc++-v3/include/bits/streambuf.tcc b/libstdc++-v3/include/bits/streambuf.tcc index 22464c4401c..7e803e4fc08 100644 --- a/libstdc++-v3/include/bits/streambuf.tcc +++ b/libstdc++-v3/include/bits/streambuf.tcc @@ -108,6 +108,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __ret; } + template + streamsize + basic_streambuf<_CharT, _Traits>:: + __buffered_sputn(const char_type* __s, streamsize __n) + { + const streamsize __buf_len = this->epptr() - this->pptr(); + if (__buf_len >= __n) + { + traits_type::copy(this->pptr(), __s, __n); + this->__safe_pbump(__n); + return __n; + } + else + return this->xsputn(__s, __n); + } + // Conceivably, this could be used to implement buffer-to-buffer // copies, if this was ever desired in an un-ambiguous way by the // standard. diff --git a/libstdc++-v3/include/std/streambuf b/libstdc++-v3/include/std/streambuf index e213104c4f4..a1a2277adc7 100644 --- a/libstdc++-v3/include/std/streambuf +++ b/libstdc++-v3/include/std/streambuf @@ -455,6 +455,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION sputn(const char_type* __s, streamsize __n) { return this->xsputn(__s, __n); } + /** + * @brief Entry point for all single-character output functions. + * @param __s A buffer read area. + * @param __n A count. + * + * Internal output function. + * + * + * Has the same effect as sputn(__s,__n), but attempts to use the + * existing buffer space rather than make a virtual call, if possible. + */ + streamsize + __buffered_sputn(const char_type* __s, streamsize __n); + protected: /** * @brief Base constructor. --Kj7319i9nmIyA2yE--