public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] libstdc++: use copy_file_range
@ 2023-03-07 11:01 Jonathan Wakely
  0 siblings, 0 replies; only message in thread
From: Jonathan Wakely @ 2023-03-07 11:01 UTC (permalink / raw)
  To: Jannik Glückert

On 06/03/23 23:11 +0100, Jannik Glückert wrote:
>copy_file_range is a recent-ish syscall for copying files. It is similar
>to sendfile but allows filesystem-specific optimizations. Common are:
>Reflinks: BTRFS, XFS, ZFS (does not implement the syscall yet)
>Server-side copy: NFS, SMB
>
>If copy_file_range is not available for the given files, fall back to
>sendfile / userspace copy.

Thanks for the patch!

Please note the legal prerequisites for GCC contributions:
https://gcc.gnu.org/contribute.html#legal

>libstdc++-v3/ChangeLog:
>
>	* acinclude.m4 (_GLIBCXX_USE_COPY_FILE_RANGE): define
>        * config.h.in: Regenerate.
>	* configure: Regenerate.
>	* src/filesystem/ops-common.h: use copy_file_range in
>          std::filesystem::copy_file
>---
> libstdc++-v3/acinclude.m4                | 20 ++++++++
> libstdc++-v3/config.h.in                 |  3 ++
> libstdc++-v3/configure                   | 62 ++++++++++++++++++++++++
> libstdc++-v3/src/filesystem/ops-common.h | 34 +++++++++++++
> 4 files changed, 119 insertions(+)
>
>diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
>index 5136c0571e8..ca09e1d22db 100644
>--- a/libstdc++-v3/acinclude.m4
>+++ b/libstdc++-v3/acinclude.m4
>@@ -4581,6 +4581,7 @@ dnl  _GLIBCXX_USE_UTIMENSAT
> dnl  _GLIBCXX_USE_ST_MTIM
> dnl  _GLIBCXX_USE_FCHMOD
> dnl  _GLIBCXX_USE_FCHMODAT
>+dnl  _GLIBCXX_USE_COPY_FILE_RANGE
> dnl  _GLIBCXX_USE_SENDFILE
> dnl  HAVE_LINK
> dnl  HAVE_READLINK
>@@ -4718,6 +4719,25 @@ dnl
>   if test $glibcxx_cv_fchmodat = yes; then
>     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
>   fi
>+dnl
>+  AC_CACHE_CHECK([for copy_file_range that can copy files],
>+    glibcxx_cv_copy_file_range, [dnl
>+    case "${target_os}" in
>+      linux*)
>+	GCC_TRY_COMPILE_OR_LINK(
>+	  [#include <unistd.h>],
>+	  [copy_file_range(1, NULL, 2, NULL, 1, 0);],

This should either include <stddef.h> for NULL, or use nullptr.

>+	  [glibcxx_cv_copy_file_range=yes],
>+	  [glibcxx_cv_copy_file_range=no])
>+	;;
>+      *)
>+	glibcxx_cv_copy_file_range=no
>+	;;
>+    esac
>+  ])
>+  if test $glibcxx_cv_copy_file_range = yes; then
>+    AC_DEFINE(_GLIBCXX_USE_COPY_FILE_RANGE, 1, [Define if copy_file_range is available in <unistd.h>.])
>+  fi
> dnl
>   AC_CACHE_CHECK([for sendfile that can copy files],
>     glibcxx_cv_sendfile, [dnl
>diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
>index d8afc6a4d64..0491dc8d811 100644
>--- a/libstdc++-v3/src/filesystem/ops-common.h
>+++ b/libstdc++-v3/src/filesystem/ops-common.h
>@@ -49,6 +49,9 @@
> #ifdef NEED_DO_COPY_FILE
> # include <filesystem>
> # include <ext/stdio_filebuf.h>
>+# ifdef _GLIBCXX_USE_COPY_FILE_RANGE
>+#  include <unistd.h> // copy_file_range
>+# endif
> # ifdef _GLIBCXX_USE_SENDFILE
> #  include <sys/sendfile.h> // sendfile
> # endif
>@@ -358,6 +361,24 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
>   }
>
> #ifdef NEED_DO_COPY_FILE
>+#ifdef _GLIBCXX_USE_COPY_FILE_RANGE
>+  bool
>+  copy_file_copy_file_range(int fd_in, int fd_out, size_t length) noexcept
>+  {
>+    size_t bytes_left = length;
>+    off_t offset = 0;
>+    ssize_t bytes_copied;
>+    do {
>+      bytes_copied = ::copy_file_range(fd_in, &offset, fd_out, NULL, bytes_left, 0);

If there's a good reason to use &offset here instead of nullptr then I
think we need a comment explaining it. I initially thought the point
was to not adjust the file offset for fd_in, so that if
copy_file_range fails after the first call, we retry using sendfile or
filebufs and restart copying from the beginning again. But
copy_file_range will have updated the file offset of fd_out in that
case, and so sendfile/filebuf would start copying from the beginning
of fd_in but write to the end of fd_out, producing an incorrect copy.

Either we should pass &off_in and &off_out, so that neither file
offset is updated, or we should rewind the output file offset before
retrying with sendfile/filebuf. Your first patch removes the existing
code that seeks the filebuf positions to account for any partial
sendfile copying. I'm not sure the code is correct after removing
that.

>+      if (bytes_copied < 0)
>+        {
>+          return false;
>+        }
>+      bytes_left -= bytes_copied;
>+    } while (bytes_left > 0 && bytes_copied > 0);

This will break out of the loop of copy_file_range returns zero, even
if bytes_left != 0. It's not clear from the man page whether that can
happen (it says it will return zero if the file offset of fd_in is at
EOF, but it doesn't say it *won't* return zero otherwise). This does
match the loop condition in the man page's example though.


>+    return true;
>+  }
>+#endif
> #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
>   bool
>   copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
>@@ -518,6 +539,19 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
>
>     bool has_copied = false;
>
>+#ifdef _GLIBCXX_USE_COPY_FILE_RANGE
>+    if (!has_copied)
>+      has_copied = copy_file_copy_file_range(in.fd, out.fd, from_st->st_size);
>+    if (!has_copied)
>+      {
>+        if (errno != EFBIG && errno != EOPNOTSUPP && errno != EOVERFLOW && errno != EXDEV)

Shouldn't this be checking ENOSYS not EOPNOTSUPP?

If copy_file_range fails with EFBIG of EOVERFLOW, won't sendfile and
the filebuf copy fail in the same way?

Are there actually error conditions where copy_file_range will succeed
on the first call, then fail on a subsequent call (e.g. with EFBIG),
but then sendfile or filebuf would succeed? If not, we can just report
an error for EFBIG and EOVERFLOW.


>+          {
>+            ec.assign(errno, std::generic_category());
>+            return false;
>+          }
>+      }
>+#endif
>+
> #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
>     if (!has_copied)
>       has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
>-- 
>2.39.2


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-03-07 13:13 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-07 11:01 [PATCH 2/2] libstdc++: use copy_file_range Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).