public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Florian Weimer <fweimer@redhat.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org,
	 "Andreas K . Huettel" <dilfridge@gentoo.org>,
	 Paul Eggert <eggert@cs.ucla.edu>
Subject: Re: [PATCH v5 2/5] linux: Set internal DIR filepos as off64_t (BZ #23960, BZ #24050)
Date: Mon, 20 Feb 2023 19:39:58 +0100	[thread overview]
Message-ID: <87ilfwdvup.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <20230127172834.391311-3-adhemerval.zanella@linaro.org> (Adhemerval Zanella's message of "Fri, 27 Jan 2023 14:28:31 -0300")

* Adhemerval Zanella:

> diff --git a/dirent/tst-seekdir.c b/dirent/tst-seekdir.c
> index dcdd699b09..187eda7584 100644
> --- a/dirent/tst-seekdir.c
> +++ b/dirent/tst-seekdir.c
> @@ -41,6 +41,14 @@ do_test (void)
>        if (i == 400)
>  	break;
>      }
> +  if (i < 3)
> +    {
> +      /* Non-lfs opendir skips entries that can not be represented (for
> +	 instance if d_off is not an offset but rather an internal filesystem
> +	 representation.  For this case there is no point in continue the
> +	 testcase.  */
> +      return 77;
> +    }
>  
>    printf ("going back past 4-th entry...\n");

I think this needs to check errno properly (set to zero before readdir,
and if readdir fails, report a test error if errno is not zero).

> index adcf8234f1..8f58a1c3a6 100644
> --- a/sysdeps/unix/sysv/linux/dirstream.h
> +++ b/sysdeps/unix/sysv/linux/dirstream.h
> @@ -22,6 +22,7 @@
>  #include <sys/types.h>
>  
>  #include <libc-lock.h>
> +#include <telldir.h>
>  
>  /* Directory stream type.
>  
> @@ -38,13 +39,16 @@ struct __dirstream
>      size_t size;		/* Total valid data in the block.  */
>      size_t offset;		/* Current offset into the block.  */
>  
> -    off_t filepos;		/* Position of next entry to read.  */
> +    off64_t filepos;		/* Position of next entry to read.  */
>  
>      int errcode;		/* Delayed error code.  */
>  
>  #if !defined __OFF_T_MATCHES_OFF64_T || !defined __INO_T_MATCHES_INO64_T
>      struct dirent tdp;
>  #endif
> +#if _DIRENT_OFFSET_TRANSLATION
> +    struct dirstream_loc_t locs; /* off64_t to long int map for telldir.  */
> +#endif

Please add a comment explaining the purpose of the new field.

> diff --git a/sysdeps/unix/sysv/linux/readdir.c b/sysdeps/unix/sysv/linux/readdir.c
> index cd0ccaf33a..7a7f484c36 100644
> --- a/sysdeps/unix/sysv/linux/readdir.c
> +++ b/sysdeps/unix/sysv/linux/readdir.c
> @@ -36,6 +36,15 @@ dirstream_entry (struct __dirstream *ds, const struct dirent64 *dp64)
>    if (dp64->d_reclen - offsetof (struct dirent64, d_name) > NAME_MAX)
>      return false;
>  
> +  /* telldir can not return an error, so preallocate the map if the entry can
> +     not be packed directly.  */
> +  if (telldir_need_dirstream (dp64->d_off))
> +    {
> +      dirstream_loc_add (&ds->locs, dp64->d_off);
> +      if (dirstream_loc_has_failed (&ds->locs))
> +	return false;
> +    }

I recommend to reserve the space here (but only for d_off values that
need it), and perform the assignment only if there is an actual telldir
call.  This avoids all the additional processing and allocations for the
common case that telldir is never called.  This will matter quite a bit
for large directories.

Reservation can be implemented by adding a 0 dummy value (offset 0 will
never be recorded), unless the last element is already 0.  I don't think
we need to change dynarray for this.


> diff --git a/sysdeps/unix/sysv/linux/seekdir.c b/sysdeps/unix/sysv/linux/seekdir.c
> index 939ccc4447..30cce691a4 100644
> --- a/sysdeps/unix/sysv/linux/seekdir.c
> +++ b/sysdeps/unix/sysv/linux/seekdir.c

> +
> +  if (dirp->filepos != filepos)
> +    {
> +      __lseek64 (dirp->fd, filepos, SEEK_SET);
> +      dirp->filepos = filepos;
> +      dirp->offset = 0;
> +      dirp->size = 0;
> +    }

I think we should seek unconditionally, just in case someone messed with
dirfd directly.

> diff --git a/sysdeps/unix/sysv/linux/telldir.c b/sysdeps/unix/sysv/linux/telldir.c
> index 1e5c129e9f..c3ef14f3da 100644
> --- a/sysdeps/unix/sysv/linux/telldir.c
> +++ b/sysdeps/unix/sysv/linux/telldir.c
> @@ -15,9 +15,11 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */

> +  else
> +    {
> +      dsp.l = -1;
> +
> +      size_t i;
> +      for (i = 0; i < dirstream_loc_size (&dirp->locs); i++)
> +	if (*dirstream_loc_at (&dirp->locs, i) == dirp->filepos)
> +	  break;
> +      /* It should be pre-allocated on readdir.  */
> +      assert (i != dirstream_loc_size (&dirp->locs));

This should overwrite the final 0 element if the offset could not be
found.

I think the quadratic behavior is acceptable in this case, given that we
do not really expect telldir to be called.

> diff --git a/sysdeps/unix/sysv/linux/telldir.h b/sysdeps/unix/sysv/linux/telldir.h
> new file mode 100644
> index 0000000000..758bcb0eb3
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/telldir.h
> @@ -0,0 +1,65 @@
> +/* Linux internal telldir definitions.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _TELLDIR_H
> +#define _TELLDIR_H 1
> +
> +#if _DIRENT_OFFSET_TRANSLATION
> +/* On platforms where 'long int' is smaller than 'off64_t' this is how the
> +   returned value is encoded and returned by 'telldir'.  If the directory
> +   offset can be enconded in 31 bits it is returned in the 'info' member
> +   with 'is_packed' set to 1.
> +
> +   Otherwise, the 'info' member describes an index in a dynamic array at
> +   'DIR' structure.  */
> +
> +union dirstream_packed
> +{
> +  long int l;
> +  struct
> +  {
> +    unsigned long int is_packed:1;
> +    unsigned long int info:31;
> +  } p;
> +};

Good explanation in the comment, but … can't you use negative values for
indices, and ~ for encoding/decoding?  I think that's a bit clearer.

> +static __always_inline bool
> +telldir_need_dirstream (__off64_t d_off)
> +{
> +  return d_off >= 1UL << 31;
> +}

Hmm, can d_off be negative?

> diff --git a/sysdeps/unix/sysv/linux/tst-opendir-nolfs.c b/sysdeps/unix/sysv/linux/tst-opendir-nolfs.c
> new file mode 100644
> index 0000000000..52e18171a7

> +static int
> +do_test (void)
> +{
> +  DIR *dirp = opendir (dirname);
> +  TEST_VERIFY_EXIT (dirp != NULL);
> +
> +  long int *tdirp = xmalloc (nfiles * sizeof (long int));
> +  struct dirent **ddirp = xmalloc (nfiles * sizeof (struct dirent *));

xreallocarray?

> +  /* For non-LFS, the entry is skipped if it can not be converted.  */
> +  int count = 0;
> +  for (; count < nfiles; count++)
> +    {
> +      tdirp[count] = telldir (dirp);
> +      struct dirent *dp = readdir (dirp);
> +      if (dp == NULL)
> +	break;

Should probably verify the count variable here (and check errno too?).

> +      ddirp[count] = xmalloc (dp->d_reclen);
> +      memcpy (ddirp[count], dp, dp->d_reclen);
> +    }
> +
> +  closedir (dirp);
> +
> +  /* Check against the getdents64 syscall.  */
> +  int fd = xopen (dirname, O_RDONLY | O_DIRECTORY, 0);
> +  int i = 0;
> +  while (true)

If we create predictable file names (maybe just numbers?), this kind of
checking becomes way easier.

Thanks,
Florian


  reply	other threads:[~2023-02-20 18:40 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-27 17:28 [PATCH v5 0/5] Fix opendir regression on some FS Adhemerval Zanella
2023-01-27 17:28 ` [PATCH v5 1/5] linux: Use getdents64 on non-LFS readdir Adhemerval Zanella
2023-01-27 17:38   ` Andreas Schwab
2023-02-20 13:31   ` Florian Weimer
2023-02-20 17:55     ` Florian Weimer
2023-03-01 16:34       ` Adhemerval Zanella Netto
2023-01-27 17:28 ` [PATCH v5 2/5] linux: Set internal DIR filepos as off64_t (BZ #23960, BZ #24050) Adhemerval Zanella
2023-02-20 18:39   ` Florian Weimer [this message]
2023-03-01 17:40     ` Adhemerval Zanella Netto
2023-01-27 17:28 ` [PATCH v5 3/5] linux: Add __readdir64_unlocked Adhemerval Zanella
2023-01-27 17:28 ` [PATCH v5 4/5] linux: Add __old_readdir64_unlocked Adhemerval Zanella
2023-01-27 17:28 ` [PATCH v5 5/5] linux: Use getdents64 on readdir64 compat implementation Adhemerval Zanella
2023-02-09 20:14 ` [PATCH v5 0/5] Fix opendir regression on some FS Andreas K. Huettel
2023-02-09 20:14 ` Andreas K. Huettel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ilfwdvup.fsf@oldenburg.str.redhat.com \
    --to=fweimer@redhat.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=dilfridge@gentoo.org \
    --cc=eggert@cs.ucla.edu \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).