public inbox for libc-hacker@sourceware.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Ulrich Drepper <drepper@redhat.com>
Cc: Glibc hackers <libc-hacker@sources.redhat.com>
Subject: [RFC] *scanf and %as, %aS, %a[ and %ms, %mS, %m[
Date: Sat, 28 Jul 2007 11:10:00 -0000	[thread overview]
Message-ID: <20070728111430.GT4603@sunsite.mff.cuni.cz> (raw)

Hi!

We leak memory in our %as, %aS and %a[ implementation:
when we successfully malloc a buffer for %as, %aS, %a[, but some kind of
error (e.g. input_error () or conv_error ()) happens before the
corresponding ++done, the memory is not freed and the caller has no way
to find out it should do that (if that input_error () happens on the
very first %as argument, *scanf will return EOF resp. otherwise will
return number of before that argument assigned arguments).  This is solvable
just by making sure strptr is set to NULL on the next ++done and freeing
*strptr when strptr != NULL.  With that fix, the semantics is quite clear
- when *scanf returns EOF, the caller doesn't need to free any memory,
when it returns X != EOF, all arguments up to Xth which used %as, %aS, %a[
need to be freed (so for X 0 nothing, for sscanf ("%as %as", &p, &q) == 1
p needs to be freed, etc.).  If the initial malloc failed, *scanf
will return the number of already assigned arguments.  If realloc failure
happens, *scanf will return the number of already assigned arguments plus
1 (i.e. as if it successfully converted even the current %as/%aS/%a[ string)
and truncates the string to whatever was read at that point and sets
errno to ENOMEM.

http://www.opengroup.org/austin/aardvark/latest/xshbug2.txt
shows POSIX is planning to standardize m modifier instead (which is highly
desirable, because a modifier conflicts with ISO C99 standardized
conversion specifier for hex floats), but there are a few issues in that
text:
1) it first documents m modifier to be applicable only to s, S and [, but
then in c and C description talks about %mc and %mC.  For c/C the caller
always knows the size of the buffer before the call, so %mc/%mC isn't strictly
needed.
2) for ENOMEM case from both internal malloc and realloc, the spec says
a conversion error is supposed to happen (I haven't found anywhere specified
what is meant by conversion error), errno is set to ENOMEM and all already
malloced arguments must be freed.
I can understand why the %as behavior of truncating the string on realloc
can be seen as inappropriate, but the freeing of all other already assigned
malloced string sounds very weird to me, unless "conversion error" means
returning EOF (in which case the caller really has no way to know how many
strings were converted).  I have implemented that below to make it easier
to discuss this.  Alternatively, if "conversion error" is meant what
glibc's conv_error () does, i.e. just let errno be set and let it return
the number of already successfully assigned arguments, then requiring
all malloced strings to be freed rather than the one currently being
malloced/realloced sounds very wrong to me.  The caller needs to handle
the freeing anyway for cases like input_error etc. - in all cases it
knows all successfully assigned arguments need to be freed, all not
assigned ones are effectively unitialized.  This would certainly simplify
the changes a lot - only the if (strptr != NULL) { free (*strptr); *strptr = NULL; }
case would need to be handled, not all the keeping of pointers to all
arguments.  if (flags & POSIX_MALLOC) would just do a conv_error ();
and let the current string be freed automatically (do nothing in the
initial malloc case).  Inspection of all return places from *scanf
show that EOF (i.e. the case where nothing must be left allocated)
is returned only from code before the format string starts to be parsed
(e.g. ORIENT or ARGCHECK; nothing was malloced at that point, nothing
needs freeing) or for input_error () iff done == 0 (either nothing
has been malloced, or we still have it in *strptr).

Once this is clarified, I'd like to add new *scanf entry points
for -std=c99 or -D_XOPEN_SOURCE{,=500,=600} strict modes
which would disable handling of %as, %aS, %a[ by using similar trick as
*printf_chk routines do for %n in writable strings.

--- libc/stdio-common/vfscanf.c.jj	2007-07-19 19:46:48.000000000 +0200
+++ libc/stdio-common/vfscanf.c	2007-07-28 11:46:33.000000000 +0200
@@ -60,12 +60,13 @@
 #define NOSKIP		0x0020	/* do not skip blanks */
 #define NUMBER_SIGNED	0x0040	/* signed integer */
 #define GROUP		0x0080	/* ': group numbers */
-#define MALLOC		0x0100	/* a: malloc strings */
+#define GNU_MALLOC	0x0100	/* a: malloc strings */
 #define CHAR		0x0200	/* hh: char */
 #define I18N		0x0400	/* I: use locale's digits */
 #define HEXA_FLOAT	0x0800	/* hexadecimal float */
 #define READ_POINTER	0x1000	/* this is a pointer value */
-
+#define POSIX_MALLOC	0x2000	/* m: malloc strings */
+#define MALLOC		(GNU_MALLOC | POSIX_MALLOC)
 
 #include <locale/localeinfo.h>
 #include <libioP.h>
@@ -146,6 +147,21 @@
 			  if (done == 0) done = EOF;			      \
 			  goto errout;					      \
 			} while (0)
+#define add_ptr_to_free(ptr)						      \
+  do									      \
+    {									      \
+      if (ptrs_to_free == NULL						      \
+	  || ptrs_to_free->count == (sizeof (ptrs_to_free->ptrs)	      \
+				     / sizeof (ptrs_to_free->ptrs[0])))	      \
+	{								      \
+	  struct ptrs_to_free *new_ptrs = alloca (sizeof (*ptrs_to_free));    \
+	  new_ptrs->count = 0;						      \
+	  new_ptrs->next = ptrs_to_free;				      \
+	  ptrs_to_free = new_ptrs;					      \
+	}								      \
+      ptrs_to_free->ptrs[ptrs_to_free->count++] = (ptr);		      \
+    }									      \
+  while (0)
 #define ARGCHECK(s, format)						      \
   do									      \
     {									      \
@@ -169,6 +185,12 @@
   _IO_funlockfile (S);							      \
   __libc_cleanup_region_end (0)
 
+struct ptrs_to_free
+{
+  size_t count;
+  struct ptrs_to_free *next;
+  char **ptrs[32];
+};
 
 /* Read formatted input from S according to the format string
    FORMAT, using the argument list in ARG.
@@ -218,6 +240,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 #else
   const char *thousands;
 #endif
+  struct ptrs_to_free *ptrs_to_free = NULL;
   /* State for the conversions.  */
   mbstate_t state;
   /* Integral holding variables.  */
@@ -493,8 +516,11 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 	    }
 	  /* String conversions (%s, %[) take a `char **'
 	     arg and fill it in with a malloc'd pointer.  */
-	  flags |= MALLOC;
+	  flags |= GNU_MALLOC;
 	  break;
+        case L_('m'):
+          flags |= POSIX_MALLOC;
+          break;
 	case L_('z'):
 	  if (need_longlong && sizeof (size_t) > sizeof (unsigned long int))
 	    flags |= LONGDBL;
@@ -741,6 +767,10 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		  if (flags & MALLOC)					      \
 		    {							      \
 		      /* The string is to be stored in a malloc'd buffer.  */ \
+		      /* For %mS using char ** is actually wrong, but	      \
+			 shouldn't make a difference on any arch glibc	      \
+			 supports and would unnecessarily complicate	      \
+			 things. */					      \
 		      strptr = ARG (char **);				      \
 		      if (strptr == NULL)				      \
 			conv_error ();					      \
@@ -748,6 +778,12 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		      strsize = 100;					      \
 		      *strptr = (char *) malloc (strsize * sizeof (Type));    \
 		      Str = (Type *) *strptr;				      \
+		      if (Str != NULL)					      \
+			add_ptr_to_free (strptr);			      \
+		      else if (flags & POSIX_MALLOC)			      \
+			/* XXX http://www.opengroup.org/austin/aardvark/latest/xshbug2.txt \
+			   is unclear here what exactly should happen.  */    \
+			goto reteof;					      \
 		    }							      \
 		  else							      \
 		    Str = ARG (Type *);					      \
@@ -796,10 +832,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 						       strleng + MB_CUR_MAX);
 			    if (newstr == NULL)
 			      {
+				if (flags & POSIX_MALLOC)
+				  /* XXX */
+				  goto reteof;
 				/* We lose.  Oh well.  Terminate the
 				   string and stop converting,
 				   so at least we don't skip any input.  */
 				((char *) (*strptr))[strleng] = '\0';
+				strptr = NULL;
 				++done;
 				conv_error ();
 			      }
@@ -843,10 +883,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 			      str = (char *) realloc (*strptr, strsize + 1);
 			      if (str == NULL)
 				{
+				  if (flags & POSIX_MALLOC)
+				    /* XXX */
+				    goto reteof;
 				  /* We lose.  Oh well.  Terminate the
 				     string and stop converting,
 				     so at least we don't skip any input.  */
 				  ((char *) (*strptr))[strsize - 1] = '\0';
+				  strptr = NULL;
 				  ++done;
 				  conv_error ();
 				}
@@ -886,10 +930,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		      newstr = (char *) realloc (*strptr, strleng + n + 1);
 		      if (newstr == NULL)
 			{
+			  if (flags & POSIX_MALLOC)
+			    /* XXX */
+			    goto reteof;
 			  /* We lose.  Oh well.  Terminate the string
 			     and stop converting, so at least we don't
 			     skip any input.  */
 			  ((char *) (*strptr))[strleng] = '\0';
+			  strptr = NULL;
 			  ++done;
 			  conv_error ();
 			}
@@ -911,6 +959,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		      if (cp != NULL)
 			*strptr = cp;
 		    }
+		  strptr = NULL;
 
 		  ++done;
 		}
@@ -964,10 +1013,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 							* sizeof (wchar_t));
 			    if (wstr == NULL)
 			      {
+				if (flags & POSIX_MALLOC)
+				  /* XXX */
+				  goto reteof;
 				/* We lose.  Oh well.  Terminate the string
 				   and stop converting, so at least we don't
 				   skip any input.  */
 				((wchar_t *) (*strptr))[strsize - 1] = L'\0';
+				strptr = NULL;
 				++done;
 				conv_error ();
 			      }
@@ -1033,10 +1086,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 						       * sizeof (wchar_t)));
 			  if (wstr == NULL)
 			    {
+			      if (flags & POSIX_MALLOC)
+				/* XXX */
+				goto reteof;
 			      /* We lose.  Oh well.  Terminate the
 				 string and stop converting, so at
 				 least we don't skip any input.  */
 			      ((wchar_t *) (*strptr))[strsize - 1] = L'\0';
+			      strptr = NULL;
 			      ++done;
 			      conv_error ();
 			    }
@@ -1072,6 +1129,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		    if (cp != NULL)
 		      *strptr = (char *) cp;
 		  }
+		strptr = NULL;
 
 		++done;
 	      }
@@ -2219,10 +2277,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 						  * sizeof (wchar_t));
 			      if (wstr == NULL)
 				{
+				  if (flags & POSIX_MALLOC)
+				    /* XXX */
+				    goto reteof;
 				  /* We lose.  Oh well.  Terminate the string
 				     and stop converting, so at least we don't
 				     skip any input.  */
 				  ((wchar_t *) (*strptr))[strsize - 1] = L'\0';
+				  strptr = NULL;
 				  ++done;
 				  conv_error ();
 				}
@@ -2298,10 +2360,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 						   * sizeof (wchar_t)));
 			      if (wstr == NULL)
 				{
+				  if (flags & POSIX_MALLOC)
+				    /* XXX */
+				    goto reteof;
 				  /* We lose.  Oh well.  Terminate the
 				     string and stop converting,
 				     so at least we don't skip any input.  */
 				  ((wchar_t *) (*strptr))[strsize - 1] = L'\0';
+				  strptr = NULL;
 				  ++done;
 				  conv_error ();
 				}
@@ -2349,6 +2415,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		      if (cp != NULL)
 			*strptr = (char *) cp;
 		    }
+		  strptr = NULL;
 
 		  ++done;
 		}
@@ -2435,10 +2502,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 							 strleng + MB_CUR_MAX);
 			      if (newstr == NULL)
 				{
+				  if (flags & POSIX_MALLOC)
+				    /* XXX */
+				    goto reteof;
 				  /* We lose.  Oh well.  Terminate the string
 				     and stop converting, so at least we don't
 				     skip any input.  */
 				  ((char *) (*strptr))[strleng] = '\0';
+				  strptr = NULL;
 				  ++done;
 				  conv_error ();
 				}
@@ -2497,10 +2568,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 				  newsize = strsize + 1;
 				  goto allocagain;
 				}
+			      if (flags & POSIX_MALLOC)
+				/* XXX */
+				goto reteof;
 			      /* We lose.  Oh well.  Terminate the
 				 string and stop converting,
 				 so at least we don't skip any input.  */
 			      ((char *) (*strptr))[strsize - 1] = '\0';
+			      strptr = NULL;
 			      ++done;
 			      conv_error ();
 			    }
@@ -2537,10 +2612,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		      newstr = (char *) realloc (*strptr, strleng + n + 1);
 		      if (newstr == NULL)
 			{
+			  if (flags & POSIX_MALLOC)
+			    /* XXX */
+			    goto reteof;
 			  /* We lose.  Oh well.  Terminate the string
 			     and stop converting, so at least we don't
 			     skip any input.  */
 			  ((char *) (*strptr))[strleng] = '\0';
+			  strptr = NULL;
 			  ++done;
 			  conv_error ();
 			}
@@ -2562,6 +2641,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const
 		      if (cp != NULL)
 			*strptr = cp;
 		    }
+		  strptr = NULL;
 
 		  ++done;
 		}
@@ -2600,6 +2680,31 @@ _IO_vfscanf_internal (_IO_FILE *s, const
   if (errp != NULL)
     *errp |= errval;
 
+  if (done == EOF)
+    {
+  reteof:
+      if (__builtin_expect (ptrs_to_free != NULL, 0))
+	{
+	  struct ptrs_to_free *p = ptrs_to_free;
+	  while (p != NULL)
+	    {
+	      for (size_t cnt = 0; cnt < p->count; ++cnt)
+		{
+		  free (*p->ptrs[cnt]);
+		  *p->ptrs[cnt] = NULL;
+		}
+	      p = p->next;
+	      free (ptrs_to_free);
+	      ptrs_to_free = p;
+	    }
+	}
+      return EOF;
+    }
+  else if (__builtin_expect (strptr != NULL, 0))
+    {
+      free (*strptr);
+      *strptr = NULL;
+    }
   return done;
 }
 

	Jakub

                 reply	other threads:[~2007-07-28 11:10 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070728111430.GT4603@sunsite.mff.cuni.cz \
    --to=jakub@redhat.com \
    --cc=drepper@redhat.com \
    --cc=libc-hacker@sources.redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).