From mboxrd@z Thu Jan  1 00:00:00 1970
From: François Dumont
Subject: Re: ostream::operator<<() and sputn()
To: Lewis Hyatt, Jonathan Wakely
Cc: libstdc++
References: <20210714212609.GA78610@ldh-imac.local>
Message-ID: <1ee79d3c-a373-cb2a-f975-e62d182f1882@gmail.com>
Date: Thu, 15 Jul 2021 19:11:50 +0200
List-Id: Libstdc++ mailing list

On 14/07/21 11:45 pm, Lewis Hyatt via Libstdc++ wrote:
> On Wed, Jul 14, 2021 at 5:31 PM Jonathan Wakely wrote:
>> On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++
>> wrote:
>>> Hello-
>>>
>>> I noticed that libstdc++'s implementation of ostream::operator<<() prefers
>>> to call sputn() on the underlying streambuf for all char, char*, and string
>>> output operations, including single characters, rather than manipulate the
>>> buffer directly. I am curious why it works this way, it feels perhaps
>>> suboptimal to me because sputn() is mandated to call the virtual function
>>> xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
>>> only needs a virtual call when the buffer is full. I always thought that the
>>> buffer abstraction and the resulting avoidance of virtual calls for the
>>> majority of operations was the main point of streambuf's design, and that
>>> sputn() was meant for cases when the output would be large enough to
>>> overflow the buffer anyway, if it may be possible to skip the buffer and
>>> flush directly instead?
>>>
>>> It seems to me that for most typical use cases, xsputn() is still going to
>>> want to use the buffer if the output fits into it; libstdc++ does this in
>>> basic_filebuf, for example. So then it would seem to be beneficial to try
>>> the buffer prior to making the virtual function call, instead of after --
>>> especially because the typical char instantiation of __ostream_insert that
>>> makes this call for operator<<() is hidden inside the .so, and is not
>>> inlined or eligible for devirtualization optimizations.
>>>
>>> FWIW, here is a small test case.
>>>
>>> ---------
>>> #include <chrono>
>>> #include <fstream>
>>> #include <iostream>
>>> #include <random>
>>> #include <sstream>
>>> #include <string>
>>> using namespace std;
>>>
>>> int main() {
>>>     constexpr size_t N = 500000000;
>>>     string s(N, 'x');
>>>
>>>     ofstream of{"/dev/null"};
>>>     ostringstream os;
>>>     ostream* streams[] = {&of, &os};
>>>     mt19937 rng{random_device{}()};
>>>
>>>     const auto timed_run = [&](const char* label, auto&& callback) {
>>>         const auto t1 = chrono::steady_clock::now();
>>>         for(char c: s) callback(*streams[rng() % 2], c);
>>>         const auto t2 = chrono::steady_clock::now();
>>>         cout << label << " took: "
>>>              << chrono::duration<double>(t2-t1).count()
>>>              << " seconds" << endl;
>>>     };
>>>
>>>     timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
>>>     timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
>>> }
>>> ---------
>>>
>>> This is what I get with the current trunk:
>>> ---------
>>> insert with put() took: 6.12152 seconds
>>> insert with op<<  took: 13.4437 seconds
>>> ---------
>>>
>>> And this is what I get with the attached patch:
>>> ---------
>>> insert with put() took: 6.08313 seconds
>>> insert with op<<  took: 8.24565 seconds
>>> ---------
>>>
>>> So the overhead of calling operator<< vs calling put() was reduced by more
>>> than 3X.
>>>
>>> The prototype patch calls an internal alternate to sputn(), which tries the
>>> buffer prior to calling xsputn().
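
To make the "try the buffer first" idea above concrete, here is a minimal
sketch. It is not the attached patch and not libstdc++ code: counting_buf,
write_buffer_first() and the 8-byte put area are all hypothetical names and
choices made up for illustration. The helper copies into the put area when the
data fits and only falls back to the virtual xsputn() otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <streambuf>
#include <string>

// Hypothetical streambuf for illustration only; none of these names come
// from libstdc++ or from the patch discussed in this thread.
class counting_buf : public std::streambuf {
public:
    counting_buf() { setp(area_, area_ + sizeof area_); }

    // Non-virtual "buffer first" write: copy into the put area when the
    // data fits, and call the virtual xsputn() only when it does not.
    std::streamsize write_buffer_first(const char* s, std::streamsize n)
    {
        if (n <= epptr() - pptr()) {
            std::memcpy(pptr(), s, static_cast<std::size_t>(n));
            pbump(static_cast<int>(n));
            return n;
        }
        return xsputn(s, n);
    }

    int xsputn_calls() const { return xsputn_calls_; }

    // Drains the put area and returns everything written so far.
    std::string contents() { drain(); return sink_; }

protected:
    // Counts how often the virtual path is taken, then writes straight to
    // the sink after draining whatever is still buffered.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        ++xsputn_calls_;
        drain();
        sink_.append(s, static_cast<std::size_t>(n));
        return n;
    }

    int_type overflow(int_type ch) override
    {
        drain();
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            sink_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

private:
    // Moves the buffered characters to the sink and resets the put area.
    void drain()
    {
        sink_.append(pbase(), static_cast<std::size_t>(pptr() - pbase()));
        setp(area_, area_ + sizeof area_);
    }

    char area_[8];          // deliberately tiny put area
    std::string sink_;      // stands in for the real output device
    int xsputn_calls_ = 0;
};

// Small self-check: short writes stay in the buffer, a long one does not.
void check_buffer_first()
{
    counting_buf buf;
    buf.write_buffer_first("abc", 3);
    buf.write_buffer_first("de", 2);
    assert(buf.xsputn_calls() == 0);            // both fit, no virtual call
    buf.write_buffer_first("0123456789", 10);   // larger than the free space
    assert(buf.xsputn_calls() == 1);
    assert(buf.contents() == "abcde0123456789");
}
```

The helper has to be a member because pptr(), epptr() and pbump() are
protected, which is also why such a function cannot simply be added outside
the streambuf.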
>> This won't work if a user provides an explicit specialization of
>> basic_streambuf. std::basic_ostream
>> will still try to call your new function, but it won't be present in
>> the user's specialization, so will fail to compile. The basic_ostream
>> primary template can only use the standard API of basic_streambuf. The
>> std::basic_ostream specialization can use non-standard members
>> of std::basic_streambuf because we know users can't specialize
>> that.
> Thanks, makes sense, this was more just a quick proof of concept. I
> guess a real version could work around this, well it could be
> implemented purely in terms of sputc() too. Am curious if you think
> the overall idea is worthwhile though? Partly I am trying to
> understand it better, like it was a bit surprising to me that the
> standard says that sputn() *must* call xsputn(). Feels like calling
> it, only if a call to overflow() would otherwise be necessary, makes
> more sense to me...
>
>
> -Lewis

I think that the issue you spotted can be summarized by the implementation
of operator<< in <ostream>:

  template<typename _CharT, typename _Traits>
    inline basic_ostream<_CharT, _Traits>&
    operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
    { return __ostream_insert(__out, &__c, 1); }

Outputting a single _CharT is treated the same way as outputting a C string.

If you add the plumbing for a __ostream_insert(__out, __c) overload, then
buffering should take place normally, as it will end up in a call to sputc.

Whether or not it is worthwhile, I would say that if you need it and
eventually implement it, then do not hesitate to submit it here!

François
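
For what it is worth, here is a rough sketch of what a single-character path
ending in sputc() might look like when written against the public API only.
insert_char() is a hypothetical free function, not the __ostream_insert
overload itself and not anything that exists in libstdc++. It leaves out the
exception handling a real formatted inserter needs, and it simply defers to the
existing operator<< whenever padding is required.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration, not libstdc++ API):
// inserts one character through sputc(), so the put area is used directly
// and a virtual call only happens when the buffer is full.
std::ostream& insert_char(std::ostream& out, char c)
{
    // Padding needed: let the regular formatted inserter deal with it.
    if (out.width() > 1)
        return out << c;

    std::ostream::sentry guard(out);
    if (guard) {
        using traits = std::ostream::traits_type;
        if (traits::eq_int_type(out.rdbuf()->sputc(c), traits::eof()))
            out.setstate(std::ios_base::badbit);
        out.width(0);   // formatted inserters reset the field width
    }
    return out;
}

// Small self-check of the unpadded path.
void check_insert_char()
{
    std::ostringstream os;
    insert_char(os, 'a');
    insert_char(os, 'b');
    assert(os.str() == "ab");
    assert(os.good());
}
```

Because it only uses sentry, rdbuf() and sputc(), a sketch like this does not
depend on any non-standard streambuf member, so it would not run into the
user-specialization problem Jonathan pointed out.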