From: Jonathan Wakely
Date: Wed, 14 Jul 2021 22:30:48 +0100
Subject: Re: ostream::operator<<() and sputn()
To: Lewis Hyatt
Cc: "libstdc++"
In-Reply-To: <20210714212609.GA78610@ldh-imac.local>
List-Id: Libstdc++ mailing list

On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++ wrote:
>
> Hello-
>
> I noticed that libstdc++'s implementation of ostream::operator<<() prefers
> to call sputn() on the underlying streambuf for all char, char*, and string
> output operations, including single characters, rather than manipulate the
> buffer directly. I am curious why it works this way, it feels perhaps
> suboptimal to me because sputn() is mandated to call the virtual function
> xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
> only needs a virtual call when the buffer is full. I always thought that the
> buffer abstraction and the resulting avoidance of virtual calls for the
> majority of operations was the main point of streambuf's design, and that
> sputn() was meant for cases when the output would be large enough to
> overflow the buffer anyway, if it may be possible to skip the buffer and
> flush directly instead?
>
> It seems to me that for most typical use cases, xsputn() is still going to
> want to use the buffer if the output fits into it; libstdc++ does this in
> basic_filebuf, for example. So then it would seem to be beneficial to try
> the buffer prior to making the virtual function call, instead of after --
> especially because the typical char instantiation of __ostream_insert that
> makes this call for operator<<() is hidden inside the .so, and is not
> inlined or eligible for devirtualization optimizations.
>
> FWIW, here is a small test case.
>
> ---------
> #include <chrono>
> #include <fstream>
> #include <iostream>
> #include <random>
> #include <sstream>
> #include <string>
> using namespace std;
>
> int main() {
>     constexpr size_t N = 500000000;
>     string s(N, 'x');
>
>     ofstream of{"/dev/null"};
>     ostringstream os;
>     ostream* streams[] = {&of, &os};
>     mt19937 rng{random_device{}()};
>
>     const auto timed_run = [&](const char* label, auto&& callback) {
>         const auto t1 = chrono::steady_clock::now();
>         for(char c: s) callback(*streams[rng() % 2], c);
>         const auto t2 = chrono::steady_clock::now();
>         cout << label << " took: "
>              << chrono::duration<double>(t2-t1).count()
>              << " seconds" << endl;
>     };
>
>     timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
>     timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
> }
> ---------
>
> This is what I get with the current trunk:
> ---------
> insert with put() took: 6.12152 seconds
> insert with op<<  took: 13.4437 seconds
> ---------
>
> And this is what I get with the attached patch:
> ---------
> insert with put() took: 6.08313 seconds
> insert with op<<  took: 8.24565 seconds
> ---------
>
> So the overhead of calling operator<< vs calling put() was reduced by more
> than 3X.
>
> The prototype patch calls an internal alternate to sputn(), which tries the
> buffer prior to calling xsputn().

This won't work if a user provides an explicit specialization of
basic_streambuf. std::basic_ostream will still try to call your new
function, but it won't be present in the user's specialization, so
will fail to compile.

The basic_ostream primary template can only use the standard API of
basic_streambuf. The std::basic_ostream<char> specialization can use
non-standard members of std::basic_streambuf<char> because we know
users can't specialize that.
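
To make the buffer-first idea concrete, here is a minimal sketch. It is not the attached patch, which is not shown in this message. The names counting_buf, sputn_buffered and virtual_calls_for are hypothetical and exist only for illustration. The sketch assumes a streambuf with a small fixed put area and counts how often the virtual xsputn() is reached when single characters are written through the standard sputn() versus through a non-virtual member that tries the put area first.

```cpp
#include <cassert>
#include <cstring>
#include <streambuf>
#include <string>

// Hypothetical streambuf with a 64-byte put area that counts xsputn() calls.
class counting_buf : public std::streambuf {
public:
    counting_buf() { setp(buf_, buf_ + sizeof buf_); }

    // Hypothetical non-standard member: copy into the put area when the data
    // fits, and only fall back to the virtual xsputn() when it does not.
    std::streamsize sputn_buffered(const char* s, std::streamsize n) {
        assert(n >= 0);
        if (n <= epptr() - pptr()) {
            std::memcpy(pptr(), s, static_cast<std::size_t>(n));
            pbump(static_cast<int>(n));
            return n;
        }
        return xsputn(s, n);
    }

    int virtual_calls() const { return virtual_calls_; }

    // Everything written so far: flushed data plus the pending put area.
    std::string data() const { return sink_ + std::string(pbase(), pptr()); }

protected:
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        ++virtual_calls_;
        return std::streambuf::xsputn(s, n);
    }

    // Flush the put area into sink_, then store the pending character.
    int_type overflow(int_type ch) override {
        sink_.append(pbase(), pptr());
        setp(buf_, buf_ + sizeof buf_);
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

private:
    char buf_[64];
    std::string sink_;
    int virtual_calls_ = 0;
};

// Write `count` single characters and report how many reached xsputn().
int virtual_calls_for(int count, bool use_fast_path) {
    counting_buf buf;
    const char c = 'x';
    for (int i = 0; i < count; ++i) {
        if (use_fast_path) buf.sputn_buffered(&c, 1);
        else buf.sputn(&c, 1);
    }
    return buf.virtual_calls();
}
```

With the standard sputn(), every single-character write reaches xsputn(). With the buffer-first member, only the writes that find the put area full do. Note that sputn_buffered() exists only because this particular class defines it, which is the constraint described above: a primary template cannot assume such a member is present in a user-provided specialization.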