From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id B63F63895FFC for ; Thu, 6 Oct 2022 13:29:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B63F63895FFC Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665062962; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E42wKdnn/POXkWRDo0w7Ja8UkqSCqk34Ye65nkPf8bs=; b=jRs9Ash1sNTvNFULb6MaBTaeHrH5mcNVYTDJsFD1zdggRIpARFhzDxuUZwboOHKBe7KsaU ysgXApBMIaFmdBPzuC4dqeRvVBZpYUM2ZVp5ma+sPQAboPgmZg5z8Q08LJFB/p66sKC0zW cxvqYlJCjTqzEtbGqRqBdtgydrmfcAM= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-300-3J-76iMWPP29U6TlcMpAlw-1; Thu, 06 Oct 2022 09:29:21 -0400 X-MC-Unique: 3J-76iMWPP29U6TlcMpAlw-1 Received: by mail-qt1-f197.google.com with SMTP id fy20-20020a05622a5a1400b0035bef08641dso1050786qtb.18 for ; Thu, 06 Oct 2022 06:29:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date; bh=E42wKdnn/POXkWRDo0w7Ja8UkqSCqk34Ye65nkPf8bs=; b=d3r675jMhkvZymT01N5oVNRLzHpzBsBCGmeooMfvHfNTnjp7P62fiGTyj1yX+iJT9o K8rUwdT2xj4sXMRaAlqCsx3WNvQ23J9FsUBTbxzTzs3uMO7VYCS/aYAxkcgfbAIytdaN s8dT6LsDY1cOLsQfYsbWVOIlAE69DmDgPtYyJWJd+FHSqxTAwjRVOnfrtgKV78aCx7j4 9h1b9LUFkq3Vb03qp3l6pH+xZDxKyKa4pJCnzGLretozkCZQlEfWHbeAwQ2+XP2XXJfq m4dW6cpc24jVvj4Iy3oPAHBx5EZgjqusBUPh91E+dUMCf8blhdRvcLfqjpUc3VhrLN3k Fbig== X-Gm-Message-State: ACrzQf020EXltHv1x9vKmPJBdIm+BbR16itQKnV50yn7QZNAJR+Md4vQ 2XmQAoV9AtRqXyA3oBJqjWmPFsQfkPb23p8gWmLG5D0YkSECaiLCqlj4toppB2x1qHjBemz9QTH /48bMR3b5uLZPTzbO68J1vbHVquzskns= X-Received: by 2002:a0c:9a0d:0:b0:4b1:982e:96d4 with SMTP id p13-20020a0c9a0d000000b004b1982e96d4mr3844952qvd.114.1665062960964; Thu, 06 Oct 2022 06:29:20 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5NGomhiQZiM9g9rGkmOjDbK2bA15MhdKGBpN+vkQL8h60yJG+21X3l5oZlO3eFW0rLIQNzEDKUZy22q1CkHjE= X-Received: by 2002:a0c:9a0d:0:b0:4b1:982e:96d4 with SMTP id p13-20020a0c9a0d000000b004b1982e96d4mr3844920qvd.114.1665062960622; Thu, 06 Oct 2022 06:29:20 -0700 (PDT) MIME-Version: 1.0 References: <20220905225046.193799-1-cf.natali@gmail.com> In-Reply-To: From: Jonathan Wakely Date: Thu, 6 Oct 2022 14:29:09 +0100 Message-ID: Subject: Re: [PING 2] [PATCH] libstdc++: basic_filebuf: don't flush more often than necessary. To: =?UTF-8?Q?Charles=2DFran=C3=A7ois_Natali?= Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Sorry for the lack of review. I've been trying to remember (and find) some previous discussions related to this topic, but haven't managed to find it yet. The patch does look sensible (and is the same as the one attached to PR 63746) so I'll make sure to review it in time for the GCC 13 cut-off. I noticed that you contributed your test case with a FSF copyright assignment header. Do you actually have a copyright assignment for GCC contributions? If not, you would either need to complete that paperwork with the FSF, or alternatively just contribute under the DCO terms instead: https://gcc.gnu.org/dco.html On Thu, 6 Oct 2022 at 14:24, Charles-Fran=C3=A7ois Natali via Libstdc++ wrote: > > On Thu, Sep 22, 2022, 17:51 Charles-Fran=C3=A7ois Natali > wrote: > > > > > On Mon, Sep 5, 2022, 23:51 Charles-Francois Natali > > wrote: > > > >> `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of > >> size 1024 and above, seemingly as an optimisation. > >> > >> This can have a significant performance impact if the overhead of a > >> `write` syscall is non-negligible, e.g. on a slow disk, on network > >> filesystems, or simply during IO contention because instead of flushin= g > >> every `BUFSIZ` (by default), we can flush every 1024 char. > >> The impact is even greater with custom larger buffers, e.g. for networ= k > >> filesystems, because the code could issue `write` for example 1000X mo= re > >> often than necessary with respect to the buffer size. > >> It also introduces a significant discontinuity in performance when > >> writing chunks of size 1024 and above. > >> > >> See this reproducer which writes down a fixed number of chunks to a fi= le > >> open with `O_SYNC` - to replicate high-latency `write` - for varying > >> size of chunks: > >> > >> ``` > >> $ cat test_fstream_flush.cpp > >> > >> int > >> main(int argc, char* argv[]) > >> { > >> assert(argc =3D=3D 3); > >> > >> const auto* path =3D argv[1]; > >> const auto chunk_size =3D std::stoul(argv[2]); > >> > >> const auto fd =3D > >> open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666= ); > >> assert(fd >=3D 0); > >> > >> auto filebuf =3D __gnu_cxx::stdio_filebuf(fd, std::ios_base::o= ut); > >> auto stream =3D std::ostream(&filebuf); > >> > >> const auto chunk =3D std::vector(chunk_size); > >> > >> for (auto i =3D 0; i < 1'000; ++i) { > >> stream.write(chunk.data(), chunk.size()); > >> } > >> > >> return 0; > >> } > >> ``` > >> > >> ``` > >> $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=3Dc++17 > >> $ for i in $(seq 1021 1025); do echo -e "\n$i"; time > >> /tmp/test_fstream_flush /tmp/foo $i; done > >> > >> 1021 > >> > >> real 0m0.997s > >> user 0m0.000s > >> sys 0m0.038s > >> > >> 1022 > >> > >> real 0m0.939s > >> user 0m0.005s > >> sys 0m0.032s > >> > >> 1023 > >> > >> real 0m0.954s > >> user 0m0.005s > >> sys 0m0.034s > >> > >> 1024 > >> > >> real 0m7.102s > >> user 0m0.040s > >> sys 0m0.192s > >> > >> 1025 > >> > >> real 0m7.204s > >> user 0m0.025s > >> sys 0m0.209s > >> ``` > >> > >> See the huge drop in performance at the 1024-boundary. > >> > >> An `strace` confirms that from size 1024 we effectively defeat > >> buffering: > >> 1023-sized writes > >> ``` > >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > >> /tmp/foo 1023 2>&1 | head -n5 > >> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC= , > >> 0666) =3D 3 > >> writev(3, > >> [{iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0= \0\0\0\0"..., > >> iov_len=3D8184}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1023}], 2) =3D 9207 > >> writev(3, > >> [{iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0= \0\0\0\0"..., > >> iov_len=3D8184}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1023}], 2) =3D 9207 > >> writev(3, > >> [{iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0= \0\0\0\0"..., > >> iov_len=3D8184}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1023}], 2) =3D 9207 > >> writev(3, > >> [{iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0= \0\0\0\0"..., > >> iov_len=3D8184}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1023}], 2) =3D 9207 > >> ``` > >> > >> vs 1024-sized writes > >> ``` > >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > >> /tmp/foo 1024 2>&1 | head -n5 > >> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC= , > >> 0666) =3D 3 > >> writev(3, [{iov_base=3DNULL, iov_len=3D0}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1024}], 2) =3D 1024 > >> writev(3, [{iov_base=3D"", iov_len=3D0}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1024}], 2) =3D 1024 > >> writev(3, [{iov_base=3D"", iov_len=3D0}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1024}], 2) =3D 1024 > >> writev(3, [{iov_base=3D"", iov_len=3D0}, > >> {iov_base=3D"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\= 0\0\0\0"..., > >> iov_len=3D1024}], 2) =3D 1024 > >> ``` > >> > >> Instead, it makes sense to only bypass the buffer if the amount of dat= a > >> to be written is larger than the buffer capacity. > >> > >> Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D63746 > >> --- > >> libstdc++-v3/include/bits/fstream.tcc | 9 +-- > >> .../27_io/basic_filebuf/sputn/char/63746.cc | 55 ++++++++++++++++++= + > >> 2 files changed, 58 insertions(+), 6 deletions(-) > >> create mode 100644 > >> libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> > >> diff --git a/libstdc++-v3/include/bits/fstream.tcc > >> b/libstdc++-v3/include/bits/fstream.tcc > >> index 7ccc887b8..2e9369628 100644 > >> --- a/libstdc++-v3/include/bits/fstream.tcc > >> +++ b/libstdc++-v3/include/bits/fstream.tcc > >> @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> { > >> streamsize __ret =3D 0; > >> // Optimization in the always_noconv() case, to be generalized = in > >> the > >> - // future: when __n is sufficiently large we write directly > >> instead of > >> - // using the buffer. > >> + // future: when __n is larger than the available capacity we wr= ite > >> + // directly instead of using the buffer. > >> const bool __testout =3D (_M_mode & ios_base::out > >> || _M_mode & ios_base::app); > >> if (__check_facet(_M_codecvt).always_noconv() > >> && __testout && !_M_reading) > >> { > >> - // Measurement would reveal the best choice. > >> - const streamsize __chunk =3D 1ul << 10; > >> streamsize __bufavail =3D this->epptr() - this->pptr(); > >> > >> // Don't mistake 'uncommitted' mode buffered with unbuffered= . > >> if (!_M_writing && _M_buf_size > 1) > >> __bufavail =3D _M_buf_size - 1; > >> > >> - const streamsize __limit =3D std::min(__chunk, __bufavail); > >> - if (__n >=3D __limit) > >> + if (__n >=3D __bufavail) > >> { > >> const streamsize __buffill =3D this->pptr() - this->pbas= e(); > >> const char* __buf =3D reinterpret_cast >> char*>(this->pbase()); > >> diff --git > >> a/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> new file mode 100644 > >> index 000000000..36448e049 > >> --- /dev/null > >> +++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> @@ -0,0 +1,55 @@ > >> +// Copyright (C) 2013-2022 Free Software Foundation, Inc. > >> +// > >> +// This file is part of the GNU ISO C++ Library. This library is fre= e > >> +// software; you can redistribute it and/or modify it under the > >> +// terms of the GNU General Public License as published by the > >> +// Free Software Foundation; either version 3, or (at your option) > >> +// any later version. > >> + > >> +// This library is distributed in the hope that it will be useful, > >> +// but WITHOUT ANY WARRANTY; without even the implied warranty of > >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >> +// GNU General Public License for more details. > >> + > >> +// You should have received a copy of the GNU General Public License > >> along > >> +// with this library; see the file COPYING3. If not see > >> +// . > >> + > >> +// { dg-require-fileio "" } > >> + > >> +#include > >> +#include > >> + > >> +class testbuf : public std::filebuf { > >> +public: > >> + char_type* pub_pprt() const > >> + { > >> + return this->pptr(); > >> + } > >> + > >> + char_type* pub_pbase() const > >> + { > >> + return this->pbase(); > >> + } > >> +}; > >> + > >> +void test01() > >> +{ > >> + using namespace std; > >> + > >> + // Leave capacity to avoid flush. > >> + const streamsize chunk_size =3D BUFSIZ - 1 - 1; > >> + const char data[chunk_size] =3D {}; > >> + > >> + testbuf a_f; > >> + VERIFY( a_f.open("tmp_63746_sputn", ios_base::out) ); > >> + VERIFY( chunk_size =3D=3D a_f.sputn(data, chunk_size) ); > >> + VERIFY( (a_f.pub_pprt() - a_f.pub_pbase()) =3D=3D chunk_size ); > >> + VERIFY( a_f.close() ); > >> +} > >> + > >> +int main() > >> +{ > >> + test01(); > >> + return 0; > >> +} > >> -- > >> 2.30.2 > >> > >> >