From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 650 invoked by alias); 28 Jul 2007 11:10:12 -0000 Received: (qmail 634 invoked by uid 22791); 28 Jul 2007 11:10:12 -0000 X-Spam-Check-By: sourceware.org Received: from sunsite.ms.mff.cuni.cz (HELO sunsite.mff.cuni.cz) (195.113.15.26) by sourceware.org (qpsmtpd/0.31) with ESMTP; Sat, 28 Jul 2007 11:10:08 +0000 Received: from sunsite.mff.cuni.cz (localhost.localdomain [127.0.0.1]) by sunsite.mff.cuni.cz (8.13.8/8.13.8) with ESMTP id l6SBEVX3003881; Sat, 28 Jul 2007 13:14:31 +0200 Received: (from jakub@localhost) by sunsite.mff.cuni.cz (8.13.8/8.13.8/Submit) id l6SBEVd9003880; Sat, 28 Jul 2007 13:14:31 +0200 Date: Sat, 28 Jul 2007 11:10:00 -0000 From: Jakub Jelinek To: Ulrich Drepper Cc: Glibc hackers Subject: [RFC] *scanf and %as, %aS, %a[ and %ms, %mS, %m[ Message-ID: <20070728111430.GT4603@sunsite.mff.cuni.cz> Reply-To: Jakub Jelinek Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.2i Mailing-List: contact libc-hacker-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-hacker-owner@sourceware.org X-SW-Source: 2007-07/txt/msg00049.txt.bz2 Hi! We leak memory in our %as, %aS and %a[ implementation: when we successfully malloc a buffer for %as, %aS, %a[, but some kind of error (e.g. input_error () or conv_error ()) happens before the corresponding ++done, the memory is not freed and the caller has no way to find out it should do that (if that input_error () happens on the very first %as argument, *scanf will return EOF resp. otherwise will return number of before that argument assigned arguments). This is solvable just by making sure strptr is set to NULL on the next ++done and freeing *strptr when strptr != NULL. With that fix, the semantics is quite clear - when *scanf returns EOF, the caller doesn't need to free any memory, when it returns X != EOF, all arguments up to Xth which used %as, %aS, %a[ need to be freed (so for X 0 nothing, for sscanf ("%as %as", &p, &q) == 1 p needs to be freed, etc.). If the initial malloc failed, *scanf will return the number of already assigned arguments. If realloc failure happens, *scanf will return the number of already assigned arguments plus 1 (i.e. as if it successfully converted even the current %as/%aS/%a[ string) and truncates the string to whatever was read at that point and sets errno to ENOMEM. http://www.opengroup.org/austin/aardvark/latest/xshbug2.txt shows POSIX is planning to standardize m modifier instead (which is highly desirable, because a modifier conflicts with ISO C99 standardized conversion specifier for hex floats), but there are a few issues in that text: 1) it first documents m modifier to be applicable only to s, S and [, but then in c and C description talks about %mc and %mC. For c/C the caller always knows the size of the buffer before the call, so %mc/%mC isn't strictly needed. 2) for ENOMEM case from both internal malloc and realloc, the spec says a conversion error is supposed to happen (I haven't found anywhere specified what is meant by conversion error), errno is set to ENOMEM and all already malloced arguments must be freed. I can understand why the %as behavior of truncating the string on realloc can be seen as inappropriate, but the freeing of all other already assigned malloced string sounds very weird to me, unless "conversion error" means returning EOF (in which case the caller really has no way to know how many strings were converted). I have implemented that below to make it easier to discuss this. Alternatively, if "conversion error" is meant what glibc's conv_error () does, i.e. just let errno be set and let it return the number of already successfully assigned arguments, then requiring all malloced strings to be freed rather than the one currently being malloced/realloced sounds very wrong to me. The caller needs to handle the freeing anyway for cases like input_error etc. - in all cases it knows all successfully assigned arguments need to be freed, all not assigned ones are effectively unitialized. This would certainly simplify the changes a lot - only the if (strptr != NULL) { free (*strptr); *strptr = NULL; } case would need to be handled, not all the keeping of pointers to all arguments. if (flags & POSIX_MALLOC) would just do a conv_error (); and let the current string be freed automatically (do nothing in the initial malloc case). Inspection of all return places from *scanf show that EOF (i.e. the case where nothing must be left allocated) is returned only from code before the format string starts to be parsed (e.g. ORIENT or ARGCHECK; nothing was malloced at that point, nothing needs freeing) or for input_error () iff done == 0 (either nothing has been malloced, or we still have it in *strptr). Once this is clarified, I'd like to add new *scanf entry points for -std=c99 or -D_XOPEN_SOURCE{,=500,=600} strict modes which would disable handling of %as, %aS, %a[ by using similar trick as *printf_chk routines do for %n in writable strings. --- libc/stdio-common/vfscanf.c.jj 2007-07-19 19:46:48.000000000 +0200 +++ libc/stdio-common/vfscanf.c 2007-07-28 11:46:33.000000000 +0200 @@ -60,12 +60,13 @@ #define NOSKIP 0x0020 /* do not skip blanks */ #define NUMBER_SIGNED 0x0040 /* signed integer */ #define GROUP 0x0080 /* ': group numbers */ -#define MALLOC 0x0100 /* a: malloc strings */ +#define GNU_MALLOC 0x0100 /* a: malloc strings */ #define CHAR 0x0200 /* hh: char */ #define I18N 0x0400 /* I: use locale's digits */ #define HEXA_FLOAT 0x0800 /* hexadecimal float */ #define READ_POINTER 0x1000 /* this is a pointer value */ - +#define POSIX_MALLOC 0x2000 /* m: malloc strings */ +#define MALLOC (GNU_MALLOC | POSIX_MALLOC) #include #include @@ -146,6 +147,21 @@ if (done == 0) done = EOF; \ goto errout; \ } while (0) +#define add_ptr_to_free(ptr) \ + do \ + { \ + if (ptrs_to_free == NULL \ + || ptrs_to_free->count == (sizeof (ptrs_to_free->ptrs) \ + / sizeof (ptrs_to_free->ptrs[0]))) \ + { \ + struct ptrs_to_free *new_ptrs = alloca (sizeof (*ptrs_to_free)); \ + new_ptrs->count = 0; \ + new_ptrs->next = ptrs_to_free; \ + ptrs_to_free = new_ptrs; \ + } \ + ptrs_to_free->ptrs[ptrs_to_free->count++] = (ptr); \ + } \ + while (0) #define ARGCHECK(s, format) \ do \ { \ @@ -169,6 +185,12 @@ _IO_funlockfile (S); \ __libc_cleanup_region_end (0) +struct ptrs_to_free +{ + size_t count; + struct ptrs_to_free *next; + char **ptrs[32]; +}; /* Read formatted input from S according to the format string FORMAT, using the argument list in ARG. @@ -218,6 +240,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const #else const char *thousands; #endif + struct ptrs_to_free *ptrs_to_free = NULL; /* State for the conversions. */ mbstate_t state; /* Integral holding variables. */ @@ -493,8 +516,11 @@ _IO_vfscanf_internal (_IO_FILE *s, const } /* String conversions (%s, %[) take a `char **' arg and fill it in with a malloc'd pointer. */ - flags |= MALLOC; + flags |= GNU_MALLOC; break; + case L_('m'): + flags |= POSIX_MALLOC; + break; case L_('z'): if (need_longlong && sizeof (size_t) > sizeof (unsigned long int)) flags |= LONGDBL; @@ -741,6 +767,10 @@ _IO_vfscanf_internal (_IO_FILE *s, const if (flags & MALLOC) \ { \ /* The string is to be stored in a malloc'd buffer. */ \ + /* For %mS using char ** is actually wrong, but \ + shouldn't make a difference on any arch glibc \ + supports and would unnecessarily complicate \ + things. */ \ strptr = ARG (char **); \ if (strptr == NULL) \ conv_error (); \ @@ -748,6 +778,12 @@ _IO_vfscanf_internal (_IO_FILE *s, const strsize = 100; \ *strptr = (char *) malloc (strsize * sizeof (Type)); \ Str = (Type *) *strptr; \ + if (Str != NULL) \ + add_ptr_to_free (strptr); \ + else if (flags & POSIX_MALLOC) \ + /* XXX http://www.opengroup.org/austin/aardvark/latest/xshbug2.txt \ + is unclear here what exactly should happen. */ \ + goto reteof; \ } \ else \ Str = ARG (Type *); \ @@ -796,10 +832,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const strleng + MB_CUR_MAX); if (newstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((char *) (*strptr))[strleng] = '\0'; + strptr = NULL; ++done; conv_error (); } @@ -843,10 +883,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const str = (char *) realloc (*strptr, strsize + 1); if (str == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((char *) (*strptr))[strsize - 1] = '\0'; + strptr = NULL; ++done; conv_error (); } @@ -886,10 +930,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const newstr = (char *) realloc (*strptr, strleng + n + 1); if (newstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((char *) (*strptr))[strleng] = '\0'; + strptr = NULL; ++done; conv_error (); } @@ -911,6 +959,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const if (cp != NULL) *strptr = cp; } + strptr = NULL; ++done; } @@ -964,10 +1013,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const * sizeof (wchar_t)); if (wstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((wchar_t *) (*strptr))[strsize - 1] = L'\0'; + strptr = NULL; ++done; conv_error (); } @@ -1033,10 +1086,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const * sizeof (wchar_t))); if (wstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((wchar_t *) (*strptr))[strsize - 1] = L'\0'; + strptr = NULL; ++done; conv_error (); } @@ -1072,6 +1129,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const if (cp != NULL) *strptr = (char *) cp; } + strptr = NULL; ++done; } @@ -2219,10 +2277,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const * sizeof (wchar_t)); if (wstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((wchar_t *) (*strptr))[strsize - 1] = L'\0'; + strptr = NULL; ++done; conv_error (); } @@ -2298,10 +2360,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const * sizeof (wchar_t))); if (wstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((wchar_t *) (*strptr))[strsize - 1] = L'\0'; + strptr = NULL; ++done; conv_error (); } @@ -2349,6 +2415,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const if (cp != NULL) *strptr = (char *) cp; } + strptr = NULL; ++done; } @@ -2435,10 +2502,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const strleng + MB_CUR_MAX); if (newstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((char *) (*strptr))[strleng] = '\0'; + strptr = NULL; ++done; conv_error (); } @@ -2497,10 +2568,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const newsize = strsize + 1; goto allocagain; } + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((char *) (*strptr))[strsize - 1] = '\0'; + strptr = NULL; ++done; conv_error (); } @@ -2537,10 +2612,14 @@ _IO_vfscanf_internal (_IO_FILE *s, const newstr = (char *) realloc (*strptr, strleng + n + 1); if (newstr == NULL) { + if (flags & POSIX_MALLOC) + /* XXX */ + goto reteof; /* We lose. Oh well. Terminate the string and stop converting, so at least we don't skip any input. */ ((char *) (*strptr))[strleng] = '\0'; + strptr = NULL; ++done; conv_error (); } @@ -2562,6 +2641,7 @@ _IO_vfscanf_internal (_IO_FILE *s, const if (cp != NULL) *strptr = cp; } + strptr = NULL; ++done; } @@ -2600,6 +2680,31 @@ _IO_vfscanf_internal (_IO_FILE *s, const if (errp != NULL) *errp |= errval; + if (done == EOF) + { + reteof: + if (__builtin_expect (ptrs_to_free != NULL, 0)) + { + struct ptrs_to_free *p = ptrs_to_free; + while (p != NULL) + { + for (size_t cnt = 0; cnt < p->count; ++cnt) + { + free (*p->ptrs[cnt]); + *p->ptrs[cnt] = NULL; + } + p = p->next; + free (ptrs_to_free); + ptrs_to_free = p; + } + } + return EOF; + } + else if (__builtin_expect (strptr != NULL, 0)) + { + free (*strptr); + *strptr = NULL; + } return done; } Jakub