public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file
@ 2023-03-15 19:29 Jannik Glückert
  2023-03-20 15:16 ` Jonathan Wakely
  2023-03-20 22:27 ` [PATCH v2 1/2] libstdc++: also use sendfile for big files Jonathan Wakely
  0 siblings, 2 replies; 11+ messages in thread
From: Jannik Glückert @ 2023-03-15 19:29 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 515 bytes --]

This iteration improves error handling for copy_file_range,
particularly around undocumented error codes in earlier kernel
versions.
Additionally this fixes the userspace copy fallback to handle
zero-length files such as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178.

Lastly, the case "src gets resized during the copy loop" is now
considered and will return true once the loop hits EOF (this is the
only situation, aside from a zero-length src, where sendfile and
copy_file_range return 0).

Best
Jannik

[-- Attachment #2: 0001-libstdc-also-use-sendfile-for-big-files.patch --]
[-- Type: text/x-patch, Size: 7391 bytes --]

From b55eb8dccaa44f07d8acbe6294326a46c920b04f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jannik=20Gl=C3=BCckert?= <jannik.glueckert@gmail.com>
Date: Mon, 6 Mar 2023 20:52:08 +0100
Subject: [PATCH 1/2] libstdc++: also use sendfile for big files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

we were previously only using sendfile for files smaller than 2GB, as
sendfile needs to be called repeatedly for files bigger than that.

some quick numbers, copying a 16GB file, average of 10 repetitions:
    old:
        real: 13.4s
        user: 0.14s
        sys : 7.43s
    new:
        real: 8.90s
        user: 0.00s
        sys : 3.68s

Additionally, this fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178

libstdc++-v3/ChangeLog:

        * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
        * config.h.in: Regenerate.
        * configure: Regenerate.
        * src/filesystem/ops-common.h: enable sendfile for files
          >2GB in std::filesystem::copy_file, skip zero-length files

Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
---
 libstdc++-v3/acinclude.m4                |  51 +++++----
 libstdc++-v3/config.h.in                 |   3 +
 libstdc++-v3/configure                   | 127 ++++++++++++++++-------
 libstdc++-v3/src/filesystem/ops-common.h |  86 ++++++++-------
 4 files changed, 175 insertions(+), 92 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 5136c0571e8..85a09a5a869 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4583,6 +4583,7 @@ dnl  _GLIBCXX_USE_FCHMOD
 dnl  _GLIBCXX_USE_FCHMODAT
 dnl  _GLIBCXX_USE_SENDFILE
 dnl  HAVE_LINK
+dnl  HAVE_LSEEK
 dnl  HAVE_READLINK
 dnl  HAVE_SYMLINK
 dnl
@@ -4718,25 +4719,6 @@ dnl
   if test $glibcxx_cv_fchmodat = yes; then
     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
   fi
-dnl
-  AC_CACHE_CHECK([for sendfile that can copy files],
-    glibcxx_cv_sendfile, [dnl
-    case "${target_os}" in
-      gnu* | linux* | solaris* | uclinux*)
-	GCC_TRY_COMPILE_OR_LINK(
-	  [#include <sys/sendfile.h>],
-	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
-	  [glibcxx_cv_sendfile=yes],
-	  [glibcxx_cv_sendfile=no])
-	;;
-      *)
-	glibcxx_cv_sendfile=no
-	;;
-    esac
-  ])
-  if test $glibcxx_cv_sendfile = yes; then
-    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
-  fi
 dnl
   AC_CACHE_CHECK([for link],
     glibcxx_cv_link, [dnl
@@ -4749,6 +4731,18 @@ dnl
   if test $glibcxx_cv_link = yes; then
     AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for lseek],
+    glibcxx_cv_lseek, [dnl
+    GCC_TRY_COMPILE_OR_LINK(
+      [#include <unistd.h>],
+      [lseek(1, 0, SEEK_SET);],
+      [glibcxx_cv_lseek=yes],
+      [glibcxx_cv_lseek=no])
+  ])
+  if test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for readlink],
     glibcxx_cv_readlink, [dnl
@@ -4785,6 +4779,25 @@ dnl
   if test $glibcxx_cv_truncate = yes; then
     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for sendfile that can copy files],
+    glibcxx_cv_sendfile, [dnl
+    case "${target_os}" in
+      gnu* | linux* | solaris* | uclinux*)
+	GCC_TRY_COMPILE_OR_LINK(
+	  [#include <sys/sendfile.h>],
+	  [sendfile(1, 2, (off_t*)0, sizeof 1);],
+	  [glibcxx_cv_sendfile=yes],
+	  [glibcxx_cv_sendfile=no])
+	;;
+      *)
+	glibcxx_cv_sendfile=no
+	;;
+    esac
+  ])
+  if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for fdopendir],
     glibcxx_cv_fdopendir, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index abbfca43e5c..9e1b1d41dc5 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -51,6 +51,7 @@
 # include <ext/stdio_filebuf.h>
 # ifdef _GLIBCXX_USE_SENDFILE
 #  include <sys/sendfile.h> // sendfile
+#  include <unistd.h> // lseek
 # endif
 #endif
 
@@ -358,6 +359,32 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  bool
+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+  {
+    // a zero-length file is either empty, or not copyable by this syscall
+    // return early to avoid the syscall cost
+    if (length == 0)
+      {
+        errno = EINVAL;
+        return false;
+      }
+    size_t bytes_left = length;
+    off_t offset = 0;
+    ssize_t bytes_copied;
+    do {
+      bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+      bytes_left -= bytes_copied;
+    } while (bytes_left > 0 && bytes_copied > 0);
+    if (bytes_copied < 0)
+      {
+        ::lseek(fd_out, 0, SEEK_SET);
+        return false;
+      }
+    return true;
+  }
+#endif
   bool
   do_copy_file(const char_type* from, const char_type* to,
 	       std::filesystem::copy_options_existing_file options,
@@ -498,28 +525,30 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 	return false;
       }
 
-    size_t count = from_st->st_size;
+    bool has_copied = false;
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
-    off_t offset = 0;
-    ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
-    if (n < 0 && errno != ENOSYS && errno != EINVAL)
+    if (!has_copied)
+      has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+    if (!has_copied)
       {
-	ec.assign(errno, std::generic_category());
-	return false;
+      if (errno != ENOSYS && errno != EINVAL)
+        {
+          ec.assign(errno, std::generic_category());
+          return false;
+        }
       }
-    if ((size_t)n == count)
+#endif
+
+    if (has_copied)
       {
-	if (!out.close() || !in.close())
-	  {
-	    ec.assign(errno, std::generic_category());
-	    return false;
-	  }
-	ec.clear();
-	return true;
+        if (!out.close() || !in.close())
+          {
+	          ec.assign(errno, std::generic_category());
+	          return false;
+          }
+        return true;
       }
-    else if (n > 0)
-      count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
 
     using std::ios;
     __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +559,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
     if (sbout.is_open())
       out.fd = -1;
 
-#ifdef _GLIBCXX_USE_SENDFILE
-    if (n != 0)
+    if (!(std::ostream(&sbout) << &sbin))
       {
-	if (n < 0)
-	  n = 0;
-
-	const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
-	const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
-	const std::streampos errpos(std::streamoff(-1));
-	if (p1 == errpos || p2 == errpos)
-	  {
-	    ec = std::make_error_code(std::errc::io_error);
-	    return false;
-	  }
+  ec = std::make_error_code(std::errc::io_error);
+  return false;
       }
-#endif
 
-    if (count && !(std::ostream(&sbout) << &sbin))
-      {
-	ec = std::make_error_code(std::errc::io_error);
-	return false;
-      }
     if (!sbout.close() || !sbin.close())
       {
 	ec.assign(errno, std::generic_category());
-- 
2.39.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-06-06 11:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-15 19:29 [PATCH v2 1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file Jannik Glückert
2023-03-20 15:16 ` Jonathan Wakely
2023-03-20 15:18   ` Jonathan Wakely
2023-03-20 22:27 ` [PATCH v2 1/2] libstdc++: also use sendfile for big files Jonathan Wakely
2023-03-20 22:30   ` Jonathan Wakely
2023-03-22 12:14     ` Jonathan Wakely
2023-03-22 12:18       ` Jonathan Wakely
2023-03-22 12:20         ` Jonathan Wakely
2023-06-06 11:37           ` Jonathan Wakely
2023-06-06 11:36         ` Jonathan Wakely
2023-06-06 11:35       ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).