From: "Jannik Glückert" <jannik.glueckert@gmail.com>
To: libstdc++@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH v2 1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file
Date: Wed, 15 Mar 2023 20:29:58 +0100
Message-ID: <CAFqe=zKDb8AV9dubh6Jqokg_qynXWfVsENxhDd45Nm8bi7oyZQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 515 bytes --]
This iteration improves error handling for copy_file_range,
particularly for undocumented error codes returned by earlier kernel
versions.
It also fixes the userspace copy fallback so that it handles
zero-length files, such as those described in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178.
Lastly, the case where src is resized during the copy loop is now
handled: the copy returns true once the loop hits EOF (this is the
only situation, aside from a zero-length src, in which sendfile and
copy_file_range return 0).
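For illustration only, the EOF handling described above can be sketched as a small loop. This is not the code from the patch: copy_loop and its transfer parameter are hypothetical names, and transfer(n) merely stands in for a call such as ::sendfile(fd_out, fd_in, &offset, n). The sketch assumes the call returns the number of bytes moved, 0 at EOF, or a negative value on error, and it leaves out the special handling of a zero-length src.

```cpp
#include <cstddef>

// Hypothetical helper for illustration; not part of the patch.
// `transfer(n)` stands in for a call such as
//   ::sendfile(fd_out, fd_in, &offset, n)
// and is assumed to return the number of bytes moved, 0 at EOF,
// or a negative value on error.
template<typename Transfer>
bool copy_loop(std::size_t length, Transfer transfer)
{
  std::size_t bytes_left = length;
  while (bytes_left > 0)
    {
      const auto n = transfer(bytes_left);
      if (n < 0)
        return false;   // real error; the caller would inspect errno
      if (n == 0)
        break;          // EOF: src shrank after its size was read
      bytes_left -= static_cast<std::size_t>(n);
    }
  return true;          // everything that was still there got copied
}
```

Treating a return value of 0 as success rather than as an error is the design choice here: a src that shrinks mid-copy is copied up to its new end instead of failing the whole operation.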
Best
Jannik
[-- Attachment #2: 0001-libstdc-also-use-sendfile-for-big-files.patch --]
[-- Type: text/x-patch, Size: 7391 bytes --]
From b55eb8dccaa44f07d8acbe6294326a46c920b04f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jannik=20Gl=C3=BCckert?= <jannik.glueckert@gmail.com>
Date: Mon, 6 Mar 2023 20:52:08 +0100
Subject: [PATCH 1/2] libstdc++: also use sendfile for big files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We previously used sendfile only for files smaller than 2GB, because
sendfile needs to be called repeatedly for files bigger than that.
Some quick numbers, copying a 16GB file, average of 10 repetitions:
old:
real: 13.4s
user: 0.14s
sys : 7.43s
new:
real: 8.90s
user: 0.00s
sys : 3.68s
Additionally, this fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178
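As a rough illustration of why big files need repeated calls: on Linux, the sendfile(2) man page states that a single call transfers at most 0x7ffff000 bytes. The sketch below computes the minimum number of calls for a given size. The names max_per_call and min_sendfile_calls are hypothetical and do not appear in the patch, and the sketch assumes every call transfers as much as it is allowed to.

```cpp
#include <cstdint>

// Largest byte count a single Linux sendfile(2) call transfers,
// according to the man page.
constexpr std::uint64_t max_per_call = 0x7ffff000;

// Hypothetical helper: minimum number of sendfile calls needed to copy
// `size` bytes, assuming each call moves the maximum it is allowed to.
constexpr std::uint64_t min_sendfile_calls(std::uint64_t size)
{
  return (size + max_per_call - 1) / max_per_call;  // ceiling division
}
```

In practice a call may transfer fewer bytes than requested, so a real copy loop has to keep going until nothing is left rather than rely on this count.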
libstdc++-v3/ChangeLog:
* acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/filesystem/ops-common.h: Enable sendfile for files
>2GB in std::filesystem::copy_file, skip zero-length files.
Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
---
libstdc++-v3/acinclude.m4 | 51 +++++----
libstdc++-v3/config.h.in | 3 +
libstdc++-v3/configure | 127 ++++++++++++++++-------
libstdc++-v3/src/filesystem/ops-common.h | 86 ++++++++-------
4 files changed, 175 insertions(+), 92 deletions(-)
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 5136c0571e8..85a09a5a869 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4583,6 +4583,7 @@ dnl _GLIBCXX_USE_FCHMOD
dnl _GLIBCXX_USE_FCHMODAT
dnl _GLIBCXX_USE_SENDFILE
dnl HAVE_LINK
+dnl HAVE_LSEEK
dnl HAVE_READLINK
dnl HAVE_SYMLINK
dnl
@@ -4718,25 +4719,6 @@ dnl
if test $glibcxx_cv_fchmodat = yes; then
AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
fi
-dnl
- AC_CACHE_CHECK([for sendfile that can copy files],
- glibcxx_cv_sendfile, [dnl
- case "${target_os}" in
- gnu* | linux* | solaris* | uclinux*)
- GCC_TRY_COMPILE_OR_LINK(
- [#include <sys/sendfile.h>],
- [sendfile(1, 2, (off_t*)0, sizeof 1);],
- [glibcxx_cv_sendfile=yes],
- [glibcxx_cv_sendfile=no])
- ;;
- *)
- glibcxx_cv_sendfile=no
- ;;
- esac
- ])
- if test $glibcxx_cv_sendfile = yes; then
- AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
- fi
dnl
AC_CACHE_CHECK([for link],
glibcxx_cv_link, [dnl
@@ -4749,6 +4731,18 @@ dnl
if test $glibcxx_cv_link = yes; then
AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
fi
+dnl
+ AC_CACHE_CHECK([for lseek],
+ glibcxx_cv_lseek, [dnl
+ GCC_TRY_COMPILE_OR_LINK(
+ [#include <unistd.h>],
+ [lseek(1, 0, SEEK_SET);],
+ [glibcxx_cv_lseek=yes],
+ [glibcxx_cv_lseek=no])
+ ])
+ if test $glibcxx_cv_lseek = yes; then
+ AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
+ fi
dnl
AC_CACHE_CHECK([for readlink],
glibcxx_cv_readlink, [dnl
@@ -4785,6 +4779,25 @@ dnl
if test $glibcxx_cv_truncate = yes; then
AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
fi
+dnl
+ AC_CACHE_CHECK([for sendfile that can copy files],
+ glibcxx_cv_sendfile, [dnl
+ case "${target_os}" in
+ gnu* | linux* | solaris* | uclinux*)
+ GCC_TRY_COMPILE_OR_LINK(
+ [#include <sys/sendfile.h>],
+ [sendfile(1, 2, (off_t*)0, sizeof 1);],
+ [glibcxx_cv_sendfile=yes],
+ [glibcxx_cv_sendfile=no])
+ ;;
+ *)
+ glibcxx_cv_sendfile=no
+ ;;
+ esac
+ ])
+ if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
+ AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
+ fi
dnl
AC_CACHE_CHECK([for fdopendir],
glibcxx_cv_fdopendir, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index abbfca43e5c..9e1b1d41dc5 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -51,6 +51,7 @@
# include <ext/stdio_filebuf.h>
# ifdef _GLIBCXX_USE_SENDFILE
# include <sys/sendfile.h> // sendfile
+# include <unistd.h> // lseek
# endif
#endif
@@ -358,6 +359,32 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
}
#ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+ bool
+ copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+ {
+ // a zero-length file is either empty, or not copyable by this syscall
+ // return early to avoid the syscall cost
+ if (length == 0)
+ {
+ errno = EINVAL;
+ return false;
+ }
+ size_t bytes_left = length;
+ off_t offset = 0;
+ ssize_t bytes_copied;
+ do {
+ bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+ bytes_left -= bytes_copied;
+ } while (bytes_left > 0 && bytes_copied > 0);
+ if (bytes_copied < 0)
+ {
+ ::lseek(fd_out, 0, SEEK_SET);
+ return false;
+ }
+ return true;
+ }
+#endif
bool
do_copy_file(const char_type* from, const char_type* to,
std::filesystem::copy_options_existing_file options,
@@ -498,28 +525,30 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
return false;
}
- size_t count = from_st->st_size;
+ bool has_copied = false;
+
#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
- off_t offset = 0;
- ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
- if (n < 0 && errno != ENOSYS && errno != EINVAL)
+ if (!has_copied)
+ has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+ if (!has_copied)
{
- ec.assign(errno, std::generic_category());
- return false;
+ if (errno != ENOSYS && errno != EINVAL)
+ {
+ ec.assign(errno, std::generic_category());
+ return false;
+ }
}
- if ((size_t)n == count)
+#endif
+
+ if (has_copied)
{
- if (!out.close() || !in.close())
- {
- ec.assign(errno, std::generic_category());
- return false;
- }
- ec.clear();
- return true;
+ if (!out.close() || !in.close())
+ {
+ ec.assign(errno, std::generic_category());
+ return false;
+ }
+ return true;
}
- else if (n > 0)
- count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
using std::ios;
__gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +559,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
if (sbout.is_open())
out.fd = -1;
-#ifdef _GLIBCXX_USE_SENDFILE
- if (n != 0)
+ if (!(std::ostream(&sbout) << &sbin))
{
- if (n < 0)
- n = 0;
-
- const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
- const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
- const std::streampos errpos(std::streamoff(-1));
- if (p1 == errpos || p2 == errpos)
- {
- ec = std::make_error_code(std::errc::io_error);
- return false;
- }
+ ec = std::make_error_code(std::errc::io_error);
+ return false;
}
-#endif
- if (count && !(std::ostream(&sbout) << &sbin))
- {
- ec = std::make_error_code(std::errc::io_error);
- return false;
- }
if (!sbout.close() || !sbin.close())
{
ec.assign(errno, std::generic_category());
--
2.39.2