public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
@ 2016-04-07 15:53 Florian Weimer
  2016-04-08 18:44 ` Roland McGrath
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2016-04-07 15:53 UTC (permalink / raw)
  To: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 387 bytes --]

Previously, application code had to set up d_ino and d_namlen members
if the platform supported them, involving conditional compilation.
The changes cause glob to ignore d_ino in GLOB_ALTDIRFUNC mode, and
always rely on the string length instead of using d_namlen.  DT_UNKNOWN
is zero, so zero-initialization of struct dirent will set the d_type
member to a conservative value.

Florian

[-- Attachment #2: 0001-glob-Simplify-the-interface-for-the-GLOB_ALTDIRFUNC-.patch --]
[-- Type: text/x-patch, Size: 6647 bytes --]

2016-04-07  Florian Weimer  <fweimer@redhat.com>

	glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
	callback function gl_readdir.
	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
	(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
	(glob_in_dir): Remove len.  Only skip non-real entries if
	GLOB_ALTDIRFUNC is not set.  Use strdup instead of malloc and
	memcpy to copy the name.
	* posix/bug-glob2.c (my_readdir): Clear struct dirent object.  Do
	not set d_ino explicitly.
	* posix/tst-gnuglob.c (my_readdir): Likewise.
	* manual/pattern.texi (Calling Glob): Document requirements for
	implementations of the gl_readdir callback function.

diff --git a/manual/pattern.texi b/manual/pattern.texi
index d1b9275..c026e68 100644
--- a/manual/pattern.texi
+++ b/manual/pattern.texi
@@ -237,6 +237,17 @@ function used to read the contents of a directory.  It is used if the
 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 this field is @w{@code{struct dirent *(*) (void *)}}.
 
+An implementation of @code{gl_readdir} should initialize the
+@code{struct dirent} object to zero, up to the @code{d_name} member.  As
+an optimization, it may set the @code{d_type} member if the file type of
+the entry is known.  The @code{d_name} member must be null-terminated;
+the entire string is used by @code{glob}.
+
+The @code{struct dirent} object can be overwritten by a subsequent call
+to the @code{gl_readdir} callback function on the same directory stream
+created by the @code{gl_opendir} callback function.  It should be
+deallocated by the @code{gl_closedir} callback function.
+
 This is a GNU extension.
 
 @item gl_opendir
diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c
index ddf5ec9..3726f83 100644
--- a/posix/bug-glob2.c
+++ b/posix/bug-glob2.c
@@ -193,7 +193,7 @@ my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  memset (&dir->d, 0, offsetof (struct dirent, d_name));
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;
@@ -202,13 +202,11 @@ my_readdir (void *gdir)
   strcpy (dir->d.d_name, filesystem[dir->idx].name);
 
 #ifdef _DIRENT_HAVE_D_TYPE
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_type: %d, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino, dir->d.d_type,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_type: %d, d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_type, dir->d.d_name);
 #else
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_name);
 #endif
 
   ++dir->idx;
diff --git a/posix/glob.c b/posix/glob.c
index 0c04c3c..21fd242 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -57,10 +57,8 @@
 
 #if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
 # include <dirent.h>
-# define NAMLEN(dirent) strlen((dirent)->d_name)
 #else
 # define dirent direct
-# define NAMLEN(dirent) (dirent)->d_namlen
 # ifdef HAVE_SYS_NDIR_H
 #  include <sys/ndir.h>
 # endif
@@ -76,12 +74,6 @@
 #endif
 
 
-/* In GNU systems, <dirent.h> defines this macro for us.  */
-#ifdef _D_NAMLEN
-# undef NAMLEN
-# define NAMLEN(d) _D_NAMLEN(d)
-#endif
-
 /* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available
    if the `d_type' member for `struct dirent' is available.
    HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
@@ -105,12 +97,6 @@
 
 /* If the system has the `struct dirent64' type we use it internally.  */
 #if defined _LIBC && !defined COMPILE_GLOB64
-# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
-#  define CONVERT_D_NAMLEN(d64, d32)
-# else
-#  define CONVERT_D_NAMLEN(d64, d32) \
-  (d64)->d_namlen = (d32)->d_namlen;
-# endif
 
 # if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
 #  define CONVERT_D_INO(d64, d32)
@@ -127,8 +113,7 @@
 # endif
 
 # define CONVERT_DIRENT_DIRENT64(d64, d32) \
-  memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1);		      \
-  CONVERT_D_NAMLEN (d64, d32)						      \
+  strcpy ((d64)->d_name, (d32)->d_name);				      \
   CONVERT_D_INO (d64, d32)						      \
   CONVERT_D_TYPE (d64, d32)
 #endif
@@ -1554,7 +1539,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 	  while (1)
 	    {
 	      const char *name;
-	      size_t len;
 #if defined _LIBC && !defined COMPILE_GLOB64
 	      struct dirent64 *d;
 	      union
@@ -1586,7 +1570,10 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 #endif
 	      if (d == NULL)
 		break;
-	      if (! REAL_DIR_ENTRY (d))
+	      /* If the GLOB_ALTDIRFUNC callback is used, we expect
+		 the callback to return only existing entries.  */
+	      if (! REAL_DIR_ENTRY (d)
+		  && __builtin_expect (flags & GLOB_ALTDIRFUNC, 0) == 0)
 		continue;
 
 	      /* If we shall match only directories use the information
@@ -1622,12 +1609,10 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 			  names = newnames;
 			  cur = 0;
 			}
-		      len = NAMLEN (d);
-		      names->name[cur] = (char *) malloc (len + 1);
+		      names->name[cur] = strdup (d->d_name);
 		      if (names->name[cur] == NULL)
 			goto memory_error;
-		      *((char *) mempcpy (names->name[cur++], name, len))
-			= '\0';
+		      ++cur;
 		      ++nfound;
 		    }
 		}
diff --git a/posix/tst-gnuglob.c b/posix/tst-gnuglob.c
index 992b997..4eae689 100644
--- a/posix/tst-gnuglob.c
+++ b/posix/tst-gnuglob.c
@@ -211,7 +211,7 @@ my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  memset (&dir->d, 0, offsetof (struct dirent, d_name));
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;
@@ -220,13 +220,11 @@ my_readdir (void *gdir)
   strcpy (dir->d.d_name, filesystem[dir->idx].name);
 
 #ifdef _DIRENT_HAVE_D_TYPE
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_type: %d, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino, dir->d.d_type,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_type: %d, d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_type, dir->d.d_name);
 #else
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_name);
 #endif
 
   ++dir->idx;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-07 15:53 [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir Florian Weimer
@ 2016-04-08 18:44 ` Roland McGrath
  2016-04-11 16:57   ` Florian Weimer
  0 siblings, 1 reply; 10+ messages in thread
From: Roland McGrath @ 2016-04-08 18:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

This is a potentially breaking ABI and API change.  The documentation does
not say anything specific about what the gl_readdir function's protocol is.
It just says "an alternative implementation of readdir".  The way I'd read
that, and the reality of the status quo ante, is that a trivial wrapper
around readdir will behave identically to not using GLOB_ALTDIRFUNC at all.
That means that d_ino==0 results must be ignored.

IMHO this means we need symbol versioning to make a change to the treatment
of d_ino.

It's a sufficiently conservative change to start ignoring d_namlen,
so you could do that separately.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-08 18:44 ` Roland McGrath
@ 2016-04-11 16:57   ` Florian Weimer
  2016-04-11 21:19     ` Roland McGrath
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2016-04-11 16:57 UTC (permalink / raw)
  To: Roland McGrath; +Cc: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]

On 04/08/2016 08:44 PM, Roland McGrath wrote:
> This is a potentially breaking ABI and API change.  The documentation does
> not say anything specific about what the gl_readdir function's protocol is.
> It just says "an alternative implementation of readdir".  The way I'd read
> that, and the reality of the status quo ante, is that a trivial wrapper
> around readdir will behave identically to not using GLOB_ALTDIRFUNC at all.
> That means that d_ino==0 results must be ignored.
>
> IMHO this means we need symbol versioning to make a change to the treatment
> of d_ino.

I'm trying to get clarification if getdents on Linux can ever return 
zero d_ino values.  It's going to be difficult to work this out due to 
the VFS layer.  My hunch is that inode 0 is more likely to mean “I can't 
tell you the inode number right now here”, and not “please skip this entry”.

In a sense, with glob64, supplying readdir64 as the callback does not 
actually change behavior if the d_ino == 0 is left out because that 
readdir64 implementation would not be POSIX-compliant (d_ino has to be 
the file serial number if the member exists).

I think we have to filter in readdir in order to comply with POSIX if 
the alleged historic race (d_ino is 0 during removal) can happen on 
Linux or Hurd.

> It's a sufficiently conservative change to start ignoring d_namlen,
> so you could do that separately.

Indeed, this should be a different conversation.  I tried to word the 
new documentation in a way that leaves the door open for that.

Florian

[-- Attachment #2: 0001-glob-Simplify-the-interface-for-the-GLOB_ALTDIRFUNC-.patch --]
[-- Type: text/x-patch, Size: 6762 bytes --]

glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir

Previously, application code had to set up d_ino and d_namlen members
if the platform supported them, involving conditional compilation.
DT_UNKNOWN is zero, so zero-initialization of struct dirent will set
the d_type member to a conservative value.

Changing the behavior with regards to d_ino is left to a future
cleanup.

2016-04-11  Florian Weimer  <fweimer@redhat.com>

	glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
	callback function gl_readdir.
	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
	(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
	(glob_in_dir): Remove len.  Use strdup instead of malloc and
	memcpy to copy the name.
	* posix/bug-glob2.c (my_readdir): Clear struct dirent object.  Set
	d_ino to 1.
	* posix/tst-gnuglob.c (my_readdir): Likewise.
	* manual/pattern.texi (Calling Glob): Document requirements for
	implementations of the gl_readdir callback function.

diff --git a/manual/pattern.texi b/manual/pattern.texi
index d1b9275..fa5ecd7 100644
--- a/manual/pattern.texi
+++ b/manual/pattern.texi
@@ -237,6 +237,19 @@ function used to read the contents of a directory.  It is used if the
 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 this field is @w{@code{struct dirent *(*) (void *)}}.
 
+An implementation of @code{gl_readdir} should initialize the
+@code{struct dirent} object to zero, up to the @code{d_name} member.
+The @code{d_ino} member must be set to a non-zero value, otherwise
+@code{glob} may ignore the returned directory entry.  As an
+optimization, it may set the @code{d_type} member if the file type of
+the entry is known.  The @code{d_name} member must be null-terminated;
+the entire string is used by @code{glob}.
+
+The @code{struct dirent} object can be overwritten by a subsequent call
+to the @code{gl_readdir} callback function on the same directory stream
+created by the @code{gl_opendir} callback function.  It should be
+deallocated by the @code{gl_closedir} callback function.
+
 This is a GNU extension.
 
 @item gl_opendir
diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c
index ddf5ec9..8c377b6 100644
--- a/posix/bug-glob2.c
+++ b/posix/bug-glob2.c
@@ -193,7 +193,8 @@ my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  memset (&dir->d, 0, offsetof (struct dirent, d_name));
+  dir->d.d_ino = 1;		/* glob should not skip this entry.  */
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;
@@ -202,13 +203,11 @@ my_readdir (void *gdir)
   strcpy (dir->d.d_name, filesystem[dir->idx].name);
 
 #ifdef _DIRENT_HAVE_D_TYPE
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_type: %d, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino, dir->d.d_type,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_type: %d, d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_type, dir->d.d_name);
 #else
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_name);
 #endif
 
   ++dir->idx;
diff --git a/posix/glob.c b/posix/glob.c
index 0c04c3c..9ae76ac 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -57,10 +57,8 @@
 
 #if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
 # include <dirent.h>
-# define NAMLEN(dirent) strlen((dirent)->d_name)
 #else
 # define dirent direct
-# define NAMLEN(dirent) (dirent)->d_namlen
 # ifdef HAVE_SYS_NDIR_H
 #  include <sys/ndir.h>
 # endif
@@ -76,12 +74,6 @@
 #endif
 
 
-/* In GNU systems, <dirent.h> defines this macro for us.  */
-#ifdef _D_NAMLEN
-# undef NAMLEN
-# define NAMLEN(d) _D_NAMLEN(d)
-#endif
-
 /* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available
    if the `d_type' member for `struct dirent' is available.
    HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
@@ -105,12 +97,6 @@
 
 /* If the system has the `struct dirent64' type we use it internally.  */
 #if defined _LIBC && !defined COMPILE_GLOB64
-# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
-#  define CONVERT_D_NAMLEN(d64, d32)
-# else
-#  define CONVERT_D_NAMLEN(d64, d32) \
-  (d64)->d_namlen = (d32)->d_namlen;
-# endif
 
 # if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
 #  define CONVERT_D_INO(d64, d32)
@@ -127,8 +113,7 @@
 # endif
 
 # define CONVERT_DIRENT_DIRENT64(d64, d32) \
-  memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1);		      \
-  CONVERT_D_NAMLEN (d64, d32)						      \
+  strcpy ((d64)->d_name, (d32)->d_name);				      \
   CONVERT_D_INO (d64, d32)						      \
   CONVERT_D_TYPE (d64, d32)
 #endif
@@ -1554,7 +1539,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 	  while (1)
 	    {
 	      const char *name;
-	      size_t len;
 #if defined _LIBC && !defined COMPILE_GLOB64
 	      struct dirent64 *d;
 	      union
@@ -1622,12 +1606,10 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 			  names = newnames;
 			  cur = 0;
 			}
-		      len = NAMLEN (d);
-		      names->name[cur] = (char *) malloc (len + 1);
+		      names->name[cur] = strdup (d->d_name);
 		      if (names->name[cur] == NULL)
 			goto memory_error;
-		      *((char *) mempcpy (names->name[cur++], name, len))
-			= '\0';
+		      ++cur;
 		      ++nfound;
 		    }
 		}
diff --git a/posix/tst-gnuglob.c b/posix/tst-gnuglob.c
index 992b997..184e030 100644
--- a/posix/tst-gnuglob.c
+++ b/posix/tst-gnuglob.c
@@ -211,7 +211,8 @@ my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  memset (&dir->d, 0, offsetof (struct dirent, d_name));
+  dir->d.d_ino = 1;		/* glob should not skip this entry.  */
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;
@@ -220,13 +221,11 @@ my_readdir (void *gdir)
   strcpy (dir->d.d_name, filesystem[dir->idx].name);
 
 #ifdef _DIRENT_HAVE_D_TYPE
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_type: %d, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino, dir->d.d_type,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_type: %d, d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_type, dir->d.d_name);
 #else
-  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_ino: %ld, d_name: \"%s\" }\n",
-	  dir->level, (long int) dir->idx, dir->d.d_ino,
-	  dir->d.d_name);
+  PRINTF ("my_readdir ({ level: %d, idx: %ld }) = { d_name: \"%s\" }\n",
+	  dir->level, (long int) dir->idx, dir->d.d_name);
 #endif
 
   ++dir->idx;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-11 16:57   ` Florian Weimer
@ 2016-04-11 21:19     ` Roland McGrath
  2016-04-11 22:27       ` Paul Eggert
  2016-04-12 14:01       ` Florian Weimer
  0 siblings, 2 replies; 10+ messages in thread
From: Roland McGrath @ 2016-04-11 21:19 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

> I'm trying to get clarification if getdents on Linux can ever return 
> zero d_ino values.  It's going to be difficult to work this out due to 
> the VFS layer.  My hunch is that inode 0 is more likely to mean  I can't 
> tell you the inode number right now here , and not  please skip this entry .

I don't know where your hunch comes from, but I don't believe it for a
second.  The d_ino==0 convention has been universal in Unix since before
Linux existed.  Any filesystem that returns d_ino==0 entries for getdents
will have those entries completely ignored by userland, so I doubt anyone
has implemented a filesystem that way.

> In a sense, with glob64, supplying readdir64 as the callback does not 
> actually change behavior if the d_ino == 0 is left out because that 
> readdir64 implementation would not be POSIX-compliant (d_ino has to be 
> the file serial number if the member exists).
>
> I think we have to filter in readdir in order to comply with POSIX if 
> the alleged historic race (d_ino is 0 during removal) can happen on 
> Linux or Hurd.

It was never a race.  In filesystems where it happened, it was the case
that removing an entry often consisted of nothing but clearing its d_ino
field in the on-disk directory.  Only some subsequent operation on the same
directory, like creating a new entry, might overwrite deleted entries or
cause a compaction.  And originally, the on-disk directory format was just
read directly using 'read' (there was no separate getdirentries/getdents
system call), so it was a userland responsibility to ignore the deleted
entries.

Hmm.  Perhaps no readdir function has actually returned d_ino==0 entries
since before 1003.1-1988.  The GNU implementations have indeed always
filtered out such entries.  But code using readdir has always had the
checks.

I don't have the ancient 1003.1 editions in front of me right now (can
probably check next week when I'm back home).  I'm not sure whether there
was ever a standard for readdir that said anything about d_ino different
from what the current POSIX.1 says (which indeed disallows d_ino==0
entries).  My recollection is that early versions of 1003.1 did not specify
d_ino/d_fileno at all.

> +An implementation of @code{gl_readdir} should initialize the
> +@code{struct dirent} object to zero, up to the @code{d_name} member.
> +The @code{d_ino} member must be set to a non-zero value, otherwise
> +@code{glob} may ignore the returned directory entry.  

This wording is awkward.  It seems to contradict itself, as it first
implies that d_ino should be zero and then says it must be nonzero.  I'm
not convinced this "zero the object" advice is the way to go anyway.  If
there is struct padding in struct dirent, it doesn't matter that it be
zero.  If it has d_namlen, it (now) doesn't matter what it contains at all.
Perhaps it's better to say just which fields glob examines and that the
protocol requires only that those fields be set.

Regardless of what we settle on as best practice, I think we want a little
code example here that demonstrates it.

> +The @code{struct dirent} object can be overwritten by a subsequent call
> +to the @code{gl_readdir} callback function on the same directory stream
> +created by the @code{gl_opendir} callback function.  It should be
> +deallocated by the @code{gl_closedir} callback function.

I think it would be better to word this in terms of what glob does, rather
than what the callback should do.  e.g.

@code{glob} examines the @code{struct dirent} object only immediately after
each call to your @code{gl_readdir} callback function and does not store
the pointer returned.  That pointer is not expected to remain valid after a
subsequent call to the @code{gl_readdir} or @code{gl_closedir} callback.  A
common way to implement these callbacks it to reuse the same buffer each
time the @code{gl_readdir} function is called and free the buffer in the
@code{gl_closedir} function.


The code changes look fine to me.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-11 21:19     ` Roland McGrath
@ 2016-04-11 22:27       ` Paul Eggert
  2016-04-12 12:16         ` Florian Weimer
  2016-04-12 14:01       ` Florian Weimer
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2016-04-11 22:27 UTC (permalink / raw)
  To: Roland McGrath, Florian Weimer; +Cc: GNU C Library

On 04/11/2016 02:19 PM, Roland McGrath wrote:
> I'm not sure whether there
> was ever a standard for readdir that said anything about d_ino different
> from what the current POSIX.1 says (which indeed disallows d_ino==0
> entries).

I don't think there was either.

> My recollection is that early versions of 1003.1 did not specify
> d_ino/d_fileno at all.

Yes, that's right. I just checked, and 1003.1-1988 and 1003.1b-1993 both 
specify only d_name.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-11 22:27       ` Paul Eggert
@ 2016-04-12 12:16         ` Florian Weimer
  2016-04-13  0:01           ` Paul Eggert
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2016-04-12 12:16 UTC (permalink / raw)
  To: Paul Eggert, Roland McGrath; +Cc: GNU C Library

On 04/12/2016 12:27 AM, Paul Eggert wrote:
> On 04/11/2016 02:19 PM, Roland McGrath wrote:
>> I'm not sure whether there
>> was ever a standard for readdir that said anything about d_ino different
>> from what the current POSIX.1 says (which indeed disallows d_ino==0
>> entries).
>
> I don't think there was either.

Can we remove the d_ino checking in glob from glibc at least (in a 
future patch)?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-11 21:19     ` Roland McGrath
  2016-04-11 22:27       ` Paul Eggert
@ 2016-04-12 14:01       ` Florian Weimer
  2016-04-26 14:17         ` Florian Weimer
  1 sibling, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2016-04-12 14:01 UTC (permalink / raw)
  To: Roland McGrath; +Cc: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

On 04/11/2016 11:19 PM, Roland McGrath wrote:
>> I'm trying to get clarification if getdents on Linux can ever return
>> zero d_ino values.  It's going to be difficult to work this out due to
>> the VFS layer.  My hunch is that inode 0 is more likely to mean  I can't
>> tell you the inode number right now here , and not  please skip this entry .
>
> I don't know where your hunch comes from, but I don't believe it for a
> second.  The d_ino==0 convention has been universal in Unix since before
> Linux existed.  Any filesystem that returns d_ino==0 entries for getdents
> will have those entries completely ignored by userland, so I doubt anyone
> has implemented a filesystem that way.

Non-GNU userland would work, so it could be a file system restricted to 
embedded applications.

>> +An implementation of @code{gl_readdir} should initialize the
>> +@code{struct dirent} object to zero, up to the @code{d_name} member.
>> +The @code{d_ino} member must be set to a non-zero value, otherwise
>> +@code{glob} may ignore the returned directory entry.
>
> This wording is awkward.  It seems to contradict itself, as it first
> implies that d_ino should be zero and then says it must be nonzero.  I'm
> not convinced this "zero the object" advice is the way to go anyway.  If
> there is struct padding in struct dirent, it doesn't matter that it be
> zero.  If it has d_namlen, it (now) doesn't matter what it contains at all.
> Perhaps it's better to say just which fields glob examines and that the
> protocol requires only that those fields be set.
>
> Regardless of what we settle on as best practice, I think we want a little
> code example here that demonstrates it.

Right.  I realized that d_type and d_ino are unconditionally available 
in glibc, so I got rid of the memset and spelled out all relevant 
members explicitly.  I added the mkdirent example function in the new 
version of the patch.

> @code{glob} examines the @code{struct dirent} object only immediately after
> each call to your @code{gl_readdir} callback function and does not store
> the pointer returned.  That pointer is not expected to remain valid after a
> subsequent call to the @code{gl_readdir} or @code{gl_closedir} callback.  A
> common way to implement these callbacks it to reuse the same buffer each
> time the @code{gl_readdir} function is called and free the buffer in the
> @code{gl_closedir} function.

I expanded a bit on that in the new version of the patch, too.  I hope 
the table nesting isn't too awkward.  To me, it looks okay in Info, HTML 
and PDF.

Florian

[-- Attachment #2: 0001-glob-Simplify-the-interface-for-the-GLOB_ALTDIRFUNC-.patch --]
[-- Type: text/x-patch, Size: 7786 bytes --]

glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir

Previously, application code had to set up the d_namlen member if
the target supported it, involving conditional compilation.  After
this change, glob will use the length of the string in d_name instead
of d_namlen to determine the file name length.  All glibc targets
provide the d_type and d_ino members, and setting the as needed for
gl_readdir is straightforward.

Changing the behavior with regards to d_ino is left to a future
cleanup.

2016-04-12  Florian Weimer  <fweimer@redhat.com>

	glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
	callback function gl_readdir.
	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
	(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
	(glob_in_dir): Remove len.  Use strdup instead of malloc and
	memcpy to copy the name.
	* manual/pattern.texi (Calling Glob): Document requirements for
	implementations of the gl_readdir callback function.
	* manual/examples/mkdirent.c: New example.
	* posix/bug-glob2.c (my_readdir): Set d_ino to 1 unconditionally,
	per the manual guidance.
	* posix/tst-gnuglob.c (my_readdir): Likewise.

diff --git a/manual/examples/mkdirent.c b/manual/examples/mkdirent.c
new file mode 100644
index 0000000..eea794d
--- /dev/null
+++ b/manual/examples/mkdirent.c
@@ -0,0 +1,42 @@
+/* Example for creating a struct dirent object for use with glob.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <dirent.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+
+struct dirent *
+mkdirent (const char *name)
+{
+  size_t dirent_size = offsetof (struct dirent, d_name) + 1;
+  size_t name_length = strlen (name);
+  size_t total_size = dirent_size + name_length;
+  if (total_size < dirent_size)
+    {
+      errno = ENOMEM;
+      return NULL;
+    }
+  struct dirent *result = malloc (total_size);
+  if (result == NULL)
+    return NULL;
+  result->d_type = DT_UNKNOWN;
+  result->d_ino = 1;            /* Do not skip this entry.  */
+  memcpy (result->d_name, name, name_length + 1);
+  return result;
+}
diff --git a/manual/pattern.texi b/manual/pattern.texi
index d1b9275..565e7eb 100644
--- a/manual/pattern.texi
+++ b/manual/pattern.texi
@@ -237,7 +237,44 @@ function used to read the contents of a directory.  It is used if the
 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 this field is @w{@code{struct dirent *(*) (void *)}}.
 
-This is a GNU extension.
+An implementation of @code{gl_readdir} needs to initialize the following
+members of the @code{struct dirent} object:
+
+@table @code
+@item d_type
+This member should be set to the file type of the entry if it is known.
+Otherwise, the value @code{DT_UNKNOWN} can be used.  The @code{glob}
+function may use the specified file type to avoid callbacks in cases
+where the file type indicates that the data is not required.
+
+@item d_ino
+This member needs to be non-zero, otherwise @code{glob} may skip the
+current entry and call the @code{gl_readdir} callback function again to
+retrieve another entry.
+
+@item d_name
+This member must be set to the name of the entry.  It must be
+null-terminated.
+@end table
+
+The example below shows how to allocate a @code{struct dirent} object
+containing a given name.
+
+@smallexample
+@include mkdirent.c.texi
+@end smallexample
+
+The @code{glob} function reads the @code{struct dirent} members listed
+above and makes a copy of the file name in the @code{d_name} member
+immediately after the @code{gl_readdir} callback function returns.
+Future invocations of any of the callback functions may dealloacte or
+reuse the buffer.  It is the responsibility of the caller of the
+@code{glob} function to allocate and deallocate the buffer, around the
+call to @code{glob} or using the callback functions.  For example, an
+application could allocate the buffer in the @code{gl_readdir} callback
+function, and deallocate it in the @code{gl_closedir} callback function.
+
+The @code{gl_readdir} member is a GNU extension.
 
 @item gl_opendir
 The address of an alternative implementation of the @code{opendir}
diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c
index ddf5ec9..0fdc5d0 100644
--- a/posix/bug-glob2.c
+++ b/posix/bug-glob2.c
@@ -193,7 +193,7 @@ my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  dir->d.d_ino = 1;		/* glob should not skip this entry.  */
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;
diff --git a/posix/glob.c b/posix/glob.c
index 0c04c3c..9ae76ac 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -57,10 +57,8 @@
 
 #if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
 # include <dirent.h>
-# define NAMLEN(dirent) strlen((dirent)->d_name)
 #else
 # define dirent direct
-# define NAMLEN(dirent) (dirent)->d_namlen
 # ifdef HAVE_SYS_NDIR_H
 #  include <sys/ndir.h>
 # endif
@@ -76,12 +74,6 @@
 #endif
 
 
-/* In GNU systems, <dirent.h> defines this macro for us.  */
-#ifdef _D_NAMLEN
-# undef NAMLEN
-# define NAMLEN(d) _D_NAMLEN(d)
-#endif
-
 /* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available
    if the `d_type' member for `struct dirent' is available.
    HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
@@ -105,12 +97,6 @@
 
 /* If the system has the `struct dirent64' type we use it internally.  */
 #if defined _LIBC && !defined COMPILE_GLOB64
-# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
-#  define CONVERT_D_NAMLEN(d64, d32)
-# else
-#  define CONVERT_D_NAMLEN(d64, d32) \
-  (d64)->d_namlen = (d32)->d_namlen;
-# endif
 
 # if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
 #  define CONVERT_D_INO(d64, d32)
@@ -127,8 +113,7 @@
 # endif
 
 # define CONVERT_DIRENT_DIRENT64(d64, d32) \
-  memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1);		      \
-  CONVERT_D_NAMLEN (d64, d32)						      \
+  strcpy ((d64)->d_name, (d32)->d_name);				      \
   CONVERT_D_INO (d64, d32)						      \
   CONVERT_D_TYPE (d64, d32)
 #endif
@@ -1554,7 +1539,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 	  while (1)
 	    {
 	      const char *name;
-	      size_t len;
 #if defined _LIBC && !defined COMPILE_GLOB64
 	      struct dirent64 *d;
 	      union
@@ -1622,12 +1606,10 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 			  names = newnames;
 			  cur = 0;
 			}
-		      len = NAMLEN (d);
-		      names->name[cur] = (char *) malloc (len + 1);
+		      names->name[cur] = strdup (d->d_name);
 		      if (names->name[cur] == NULL)
 			goto memory_error;
-		      *((char *) mempcpy (names->name[cur++], name, len))
-			= '\0';
+		      ++cur;
 		      ++nfound;
 		    }
 		}
diff --git a/posix/tst-gnuglob.c b/posix/tst-gnuglob.c
index 992b997..b7b859b 100644
--- a/posix/tst-gnuglob.c
+++ b/posix/tst-gnuglob.c
@@ -211,7 +211,7 @@ my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  dir->d.d_ino = 1;		/* glob should not skip this entry.  */
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-12 12:16         ` Florian Weimer
@ 2016-04-13  0:01           ` Paul Eggert
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Eggert @ 2016-04-13  0:01 UTC (permalink / raw)
  To: Florian Weimer, Roland McGrath; +Cc: GNU C Library

On 04/12/2016 05:16 AM, Florian Weimer wrote:
> Can we remove the d_ino checking in glob from glibc at least (in a 
> future patch)? 
I have no objection.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-12 14:01       ` Florian Weimer
@ 2016-04-26 14:17         ` Florian Weimer
  2016-04-28 21:55           ` Roland McGrath
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2016-04-26 14:17 UTC (permalink / raw)
  To: Roland McGrath; +Cc: GNU C Library

On 04/12/2016 04:01 PM, Florian Weimer wrote:
> glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
>
> Previously, application code had to set up the d_namlen member if
> the target supported it, involving conditional compilation.  After
> this change, glob will use the length of the string in d_name instead
> of d_namlen to determine the file name length.  All glibc targets
> provide the d_type and d_ino members, and setting the as needed for
> gl_readdir is straightforward.
>
> Changing the behavior with regards to d_ino is left to a future
> cleanup.
>
> 2016-04-12  Florian Weimer<fweimer@redhat.com>
>
> 	glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
> 	callback function gl_readdir.
> 	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
> 	(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
> 	(glob_in_dir): Remove len.  Use strdup instead of malloc and
> 	memcpy to copy the name.
> 	* manual/pattern.texi (Calling Glob): Document requirements for
> 	implementations of the gl_readdir callback function.
> 	* manual/examples/mkdirent.c: New example.
> 	* posix/bug-glob2.c (my_readdir): Set d_ino to 1 unconditionally,
> 	per the manual guidance.
> 	* posix/tst-gnuglob.c (my_readdir): Likewise.

Ping?

I'd like to move forward with these cleanups and eventually fix 
CVE-2016-1234.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
  2016-04-26 14:17         ` Florian Weimer
@ 2016-04-28 21:55           ` Roland McGrath
  0 siblings, 0 replies; 10+ messages in thread
From: Roland McGrath @ 2016-04-28 21:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

Approved

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-04-28 21:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-07 15:53 [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir Florian Weimer
2016-04-08 18:44 ` Roland McGrath
2016-04-11 16:57   ` Florian Weimer
2016-04-11 21:19     ` Roland McGrath
2016-04-11 22:27       ` Paul Eggert
2016-04-12 12:16         ` Florian Weimer
2016-04-13  0:01           ` Paul Eggert
2016-04-12 14:01       ` Florian Weimer
2016-04-26 14:17         ` Florian Weimer
2016-04-28 21:55           ` Roland McGrath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).