public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely@redhat.com>
To: Jonathan Wakely <jwakely@redhat.com>
Cc: "Jannik Glückert" <jannik.glueckert@gmail.com>,
	libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v2 1/2] libstdc++: also use sendfile for big files
Date: Wed, 22 Mar 2023 12:14:54 +0000	[thread overview]
Message-ID: <CACb0b4mJS8Bq_6e0OHkSFME6SQ21E4jA+qXQM3krLH3RA=We2g@mail.gmail.com> (raw)
In-Reply-To: <ZBjegJuddjPghBd2@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 1564 bytes --]

On Mon, 20 Mar 2023 at 22:30, Jonathan Wakely via Libstdc++ <
libstdc++@gcc.gnu.org> wrote:

> On 20/03/23 22:27 +0000, Jonathan Wakely wrote:
> >On 06/03/23 20:52 +0100, Jannik Glückert wrote:
> >>we were previously only using sendfile for files smaller than 2GB, as
> >>sendfile needs to be called repeatedly for files bigger than that.
> >>
> >>some quick numbers, copying a 16GB file, average of 10 repetitions:
> >>   old:
> >>       real: 13.4s
> >>       user: 0.14s
> >>       sys : 7.43s
> >>   new:
> >>       real: 8.90s
> >>       user: 0.00s
> >>       sys : 3.68s
> >>
> >>Additionally, this fixes
> >>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178
> >>
> >>libstdc++-v3/ChangeLog:
> >>
> >>       * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
> >>       * config.h.in: Regenerate.
> >>       * configure: Regenerate.
> >>       * src/filesystem/ops-common.h: enable sendfile for files
> >>         >2GB in std::filesystem::copy_file, skip zero-length files
>
> Also, the ChangeLog entry needs to be indented with tabs, name the
> changed functions, and should be complete sentences, e.g.
>
>         * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
>         * config.h.in: Regenerate.
>         * configure: Regenerate.
>         * src/filesystem/ops-common.h (copy_file_sendfile): Define new
>         function for sendfile logic. Loop to support large files. Skip
>         zero-length files.
>         (do_copy_file): Use it.
>
>
Here's what I plan to commit in a few weeks when GCC 14 Stage 1 opens.

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 6862 bytes --]

commit c177825f952bb353cdf412f46f45539b8992abe1
Author: Jannik Glückert <jannik.glueckert@gmail.com>
Date:   Mon Mar 6 19:52:08 2023

    libstdc++: Also use sendfile for big files
    
    We were previously only using sendfile for files smaller than 2GB, as
    sendfile needs to be called repeatedly for files bigger than that.
    
    Some quick numbers, copying a 16GB file, average of 10 repetitions:
        old:
            real: 13.4s
            user: 0.14s
            sys : 7.43s
        new:
            real: 8.90s
            user: 0.00s
            sys : 3.68s
    
    libstdc++-v3/ChangeLog:
    
            * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
            * config.h.in: Regenerate.
            * configure: Regenerate.
            * src/filesystem/ops-common.h (copy_file_sendfile): Define new
            function for sendfile logic. Loop to support large files. Skip
            zero-length files.
            (do_copy_file): Use it.
    
    Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 5136c0571e8..85a09a5a869 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4583,6 +4583,7 @@ dnl  _GLIBCXX_USE_FCHMOD
 dnl  _GLIBCXX_USE_FCHMODAT
 dnl  _GLIBCXX_USE_SENDFILE
 dnl  HAVE_LINK
+dnl  HAVE_LSEEK
 dnl  HAVE_READLINK
 dnl  HAVE_SYMLINK
 dnl
@@ -4718,25 +4719,6 @@ dnl
   if test $glibcxx_cv_fchmodat = yes; then
     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
   fi
-dnl
-  AC_CACHE_CHECK([for sendfile that can copy files],
-    glibcxx_cv_sendfile, [dnl
-    case "${target_os}" in
-      gnu* | linux* | solaris* | uclinux*)
-	GCC_TRY_COMPILE_OR_LINK(
-	  [#include <sys/sendfile.h>],
-	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
-	  [glibcxx_cv_sendfile=yes],
-	  [glibcxx_cv_sendfile=no])
-	;;
-      *)
-	glibcxx_cv_sendfile=no
-	;;
-    esac
-  ])
-  if test $glibcxx_cv_sendfile = yes; then
-    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
-  fi
 dnl
   AC_CACHE_CHECK([for link],
     glibcxx_cv_link, [dnl
@@ -4749,6 +4731,18 @@ dnl
   if test $glibcxx_cv_link = yes; then
     AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for lseek],
+    glibcxx_cv_lseek, [dnl
+    GCC_TRY_COMPILE_OR_LINK(
+      [#include <unistd.h>],
+      [lseek(1, 0, SEEK_SET);],
+      [glibcxx_cv_lseek=yes],
+      [glibcxx_cv_lseek=no])
+  ])
+  if test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for readlink],
     glibcxx_cv_readlink, [dnl
@@ -4785,6 +4779,25 @@ dnl
   if test $glibcxx_cv_truncate = yes; then
     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for sendfile that can copy files],
+    glibcxx_cv_sendfile, [dnl
+    case "${target_os}" in
+      gnu* | linux* | solaris* | uclinux*)
+	GCC_TRY_COMPILE_OR_LINK(
+	  [#include <sys/sendfile.h>],
+	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
+	  [glibcxx_cv_sendfile=yes],
+	  [glibcxx_cv_sendfile=no])
+	;;
+      *)
+	glibcxx_cv_sendfile=no
+	;;
+    esac
+  ])
+  if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for fdopendir],
     glibcxx_cv_fdopendir, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index c95511b5c95..7874a95488a 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -51,6 +51,7 @@
 # include <ext/stdio_filebuf.h>
 # ifdef _GLIBCXX_USE_SENDFILE
 #  include <sys/sendfile.h> // sendfile
+#  include <unistd.h> // lseek
 # endif
 #endif
 
@@ -358,6 +359,34 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  bool
+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+  {
+    // a zero-length file is either empty, or not copyable by this syscall
+    // return early to avoid the syscall cost
+    if (length == 0)
+      {
+	errno = EINVAL;
+	return false;
+      }
+    size_t bytes_left = length;
+    off_t offset = 0;
+    ssize_t bytes_copied;
+    do
+      {
+	bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+	bytes_left -= bytes_copied;
+      }
+    while (bytes_left > 0 && bytes_copied > 0);
+    if (bytes_copied < 0)
+      {
+	::lseek(fd_out, 0, SEEK_SET);
+	return false;
+      }
+    return true;
+  }
+#endif
   bool
   do_copy_file(const char_type* from, const char_type* to,
 	       std::filesystem::copy_options_existing_file options,
@@ -498,28 +527,31 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 	return false;
       }
 
-    size_t count = from_st->st_size;
+    bool has_copied = false;
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
-    off_t offset = 0;
-    ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
-    if (n < 0 && errno != ENOSYS && errno != EINVAL)
+    if (!has_copied)
+      has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+    if (!has_copied)
       {
-	ec.assign(errno, std::generic_category());
-	return false;
-      }
-    if ((size_t)n == count)
-      {
-	if (!out.close() || !in.close())
+	if (errno != ENOSYS && errno != EINVAL)
 	  {
 	    ec.assign(errno, std::generic_category());
 	    return false;
 	  }
-	ec.clear();
-	return true;
       }
-    else if (n > 0)
-      count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
+#endif
+
+    if (has_copied)
+      {
+        if (!out.close() || !in.close())
+          {
+	    ec.assign(errno, std::generic_category());
+	    return false;
+          }
+	ec.clear();
+        return true;
+      }
 
     using std::ios;
     __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +562,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
     if (sbout.is_open())
       out.fd = -1;
 
-#ifdef _GLIBCXX_USE_SENDFILE
-    if (n != 0)
-      {
-	if (n < 0)
-	  n = 0;
-
-	const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
-	const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
-	const std::streampos errpos(std::streamoff(-1));
-	if (p1 == errpos || p2 == errpos)
-	  {
-	    ec = std::make_error_code(std::errc::io_error);
-	    return false;
-	  }
-      }
-#endif
-
-    if (count && !(std::ostream(&sbout) << &sbin))
+    if (from_st->st_size && !(std::ostream(&sbout) << &sbin))
       {
 	ec = std::make_error_code(std::errc::io_error);
 	return false;
       }
+
     if (!sbout.close() || !sbin.close())
       {
 	ec.assign(errno, std::generic_category());

  reply	other threads:[~2023-03-22 12:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-15 19:29 [PATCH v2 1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file Jannik Glückert
2023-03-20 15:16 ` Jonathan Wakely
2023-03-20 15:18   ` Jonathan Wakely
2023-03-20 22:27 ` [PATCH v2 1/2] libstdc++: also use sendfile for big files Jonathan Wakely
2023-03-20 22:30   ` Jonathan Wakely
2023-03-22 12:14     ` Jonathan Wakely [this message]
2023-03-22 12:18       ` Jonathan Wakely
2023-03-22 12:20         ` Jonathan Wakely
2023-06-06 11:37           ` Jonathan Wakely
2023-06-06 11:36         ` Jonathan Wakely
2023-06-06 11:35       ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACb0b4mJS8Bq_6e0OHkSFME6SQ21E4jA+qXQM3krLH3RA=We2g@mail.gmail.com' \
    --to=jwakely@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jannik.glueckert@gmail.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).