public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] manual: Document optind zero set behaviour (BZ#23157)
@ 2018-05-21 19:39 Adhemerval Zanella
  2018-05-23 17:52 ` Carlos O'Donell
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella @ 2018-05-21 19:39 UTC (permalink / raw)
  To: libc-alpha

POSIX [1] does not explicit state the expected way to rescans the same
vector more than once.  FreeBSD [2], for instance, exports a non-standard
variable 'optreset' which must be set to '1' prior the second and each
additional call to 'getopt'.  GLIBC in turn requires the program to
reset 'optind' to 0 instead (and POSIX states the behavior is unspecified).

Unfortunately this is not documented on the manual, only on man-pages [3]
(on NOTES).  This patch adds an explanation of this behavior on manual.

	* manual/getopt.texi: Document optind zero set behaviour.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/
[2] https://www.freebsd.org/cgi/man.cgi?getopt(3)
[3] http://man7.org/linux/man-pages/man3/getopt.3.html
---
 ChangeLog          | 4 ++++
 manual/getopt.texi | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/manual/getopt.texi b/manual/getopt.texi
index 5485fc4..a4f6366 100644
--- a/manual/getopt.texi
+++ b/manual/getopt.texi
@@ -45,6 +45,12 @@ of the @var{argv} array to be processed.  Once @code{getopt} has found
 all of the option arguments, you can use this variable to determine
 where the remaining non-option arguments begin.  The initial value of
 this variable is @code{1}.
+
+Resetting the variable value to @code{0} forces the invocation of an
+internal initialization routine and it is used mainly when a program
+wants to rescan the same vector more than once.  It also should be used
+to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed
+between scans.
 @end deftypevar
 
 @deftypevar {char *} optarg
-- 
2.7.4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] manual: Document optind zero set behaviour (BZ#23157)
  2018-05-21 19:39 [PATCH] manual: Document optind zero set behaviour (BZ#23157) Adhemerval Zanella
@ 2018-05-23 17:52 ` Carlos O'Donell
  2018-05-29 18:45   ` Adhemerval Zanella
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos O'Donell @ 2018-05-23 17:52 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 05/21/2018 03:38 PM, Adhemerval Zanella wrote:
> POSIX [1] does not explicit state the expected way to rescans the same
> vector more than once.  FreeBSD [2], for instance, exports a non-standard
> variable 'optreset' which must be set to '1' prior the second and each
> additional call to 'getopt'.  GLIBC in turn requires the program to
> reset 'optind' to 0 instead (and POSIX states the behavior is unspecified).

I see 5 getopt test cases that use optind = 1 to reparse the options.

Is it optind = 1 or optind = 0?

Can you verify the exact behaviour by adding a specific test case that
tests for *just* this particular behaviour?

> Unfortunately this is not documented on the manual, only on man-pages [3]
> (on NOTES).  This patch adds an explanation of this behavior on manual.
> 
> 	* manual/getopt.texi: Document optind zero set behaviour.
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/
> [2] https://www.freebsd.org/cgi/man.cgi?getopt(3)
> [3] http://man7.org/linux/man-pages/man3/getopt.3.html
> ---
>  ChangeLog          | 4 ++++
>  manual/getopt.texi | 6 ++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/manual/getopt.texi b/manual/getopt.texi
> index 5485fc4..a4f6366 100644
> --- a/manual/getopt.texi
> +++ b/manual/getopt.texi
> @@ -45,6 +45,12 @@ of the @var{argv} array to be processed.  Once @code{getopt} has found
>  all of the option arguments, you can use this variable to determine
>  where the remaining non-option arguments begin.  The initial value of
>  this variable is @code{1}.
> +
> +Resetting the variable value to @code{0} forces the invocation of an
> +internal initialization routine and it is used mainly when a program
> +wants to rescan the same vector more than once.  It also should be used
> +to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed
> +between scans.
>  @end deftypevar

Suggest:
Resetting the variable's value to @code{0} forces the invocation of an
internal initialization routine and, on subsequent calls to getopt, causes
the program to rescan the same vector more than once.  This behaviour may
also be used to scan multiple argument vectors or if @code{POSIXLY_CORRECT}
is changed between scans.
  
>  @deftypevar {char *} optarg
> 

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] manual: Document optind zero set behaviour (BZ#23157)
  2018-05-23 17:52 ` Carlos O'Donell
@ 2018-05-29 18:45   ` Adhemerval Zanella
  2018-05-29 20:02     ` Carlos O'Donell
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella @ 2018-05-29 18:45 UTC (permalink / raw)
  To: Carlos O'Donell, libc-alpha



On 23/05/2018 14:52, Carlos O'Donell wrote:
> On 05/21/2018 03:38 PM, Adhemerval Zanella wrote:
>> POSIX [1] does not explicit state the expected way to rescans the same
>> vector more than once.  FreeBSD [2], for instance, exports a non-standard
>> variable 'optreset' which must be set to '1' prior the second and each
>> additional call to 'getopt'.  GLIBC in turn requires the program to
>> reset 'optind' to 0 instead (and POSIX states the behavior is unspecified).
> 
> I see 5 getopt test cases that use optind = 1 to reparse the options.
> 
> Is it optind = 1 or optind = 0?
> 
> Can you verify the exact behaviour by adding a specific test case that
> tests for *just* this particular behaviour?

The issue is getopt uses information tied to argv internal contents to
determine whether to advance to next argv element.  On the testcase from
BZ#23157 it does:

---
void test_getopt(int argc, char **argv, const char *optstring, int expected) {
    int oc = getopt(argc, argv, optstring);
    if (oc == expected) {
        printf("PASS getopt = %c (%d)\n", oc, oc);
    } else {
        printf("FAIL getopt = %c (%d), expected %c (%d)\n",
               oc, oc, expected, expected);
    }
}

int main(int ac, char **av) {
    int argc;
    char *argv[16];
    char test[16];

    argv[0] = "ignored-1";
    strcpy(test, "-a");
    argv[1] = test;
    argv[2] = "non-option-1";
    argv[3] = NULL;
    argc = 3;

    /* As expected */
    test_getopt(argc, argv, "ab", 'a');
    /* As expected */
    test_getopt(argc, argv, "ab", -1);

    argv[0] = "ignored-2";
    argv[1] = "non-option-2";
    argv[2] = NULL;
    argc = 2;

    optind = 1;
    strcpy(test, "-ab");

    /* Fails, as __nextchar is still pointing into 'test' */
    test_getopt(argc, argv, "ab", -1);

    return 0;
}
---

After second getopt called by test_getopt, __nextchar will point to &test[2] which
'\0'.  Without changing 'test', the behaviour would be to indicate the next argv
should be used as:

posix/getopt.c

492   if (d->__nextchar == NULL || *d->__nextchar == '\0')                               
493     {                                                                                
494       /* Advance to the next ARGV-element.  */                                       
495                                                                                      
496       /* Give FIRST_NONOPT & LAST_NONOPT rational values if OPTIND has been          
497          moved back by the user (who may also have changed the arguments).  */       
498       if (d->__last_nonopt > d->optind)                                          

However, since program explicit changed 'test' value, the __nextchar on third
getopt invocation will yield 'b' instead and thus will basically invalidate
the algorithm.  Without changing test contents, setting optind to 1 works 
unless the man-pages noted cases.

Now I am not sure if program is abusing of getopt semantic, or glibc is tying
with information it should (the input argument), or if it just undefined
behaviour.

> 
>> Unfortunately this is not documented on the manual, only on man-pages [3]
>> (on NOTES).  This patch adds an explanation of this behavior on manual.
>>
>> 	* manual/getopt.texi: Document optind zero set behaviour.
>>
>> [1] http://pubs.opengroup.org/onlinepubs/9699919799/
>> [2] https://www.freebsd.org/cgi/man.cgi?getopt(3)
>> [3] http://man7.org/linux/man-pages/man3/getopt.3.html
>> ---
>>  ChangeLog          | 4 ++++
>>  manual/getopt.texi | 6 ++++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/manual/getopt.texi b/manual/getopt.texi
>> index 5485fc4..a4f6366 100644
>> --- a/manual/getopt.texi
>> +++ b/manual/getopt.texi
>> @@ -45,6 +45,12 @@ of the @var{argv} array to be processed.  Once @code{getopt} has found
>>  all of the option arguments, you can use this variable to determine
>>  where the remaining non-option arguments begin.  The initial value of
>>  this variable is @code{1}.
>> +
>> +Resetting the variable value to @code{0} forces the invocation of an
>> +internal initialization routine and it is used mainly when a program
>> +wants to rescan the same vector more than once.  It also should be used
>> +to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed
>> +between scans.
>>  @end deftypevar
> 
> Suggest:
> Resetting the variable's value to @code{0} forces the invocation of an
> internal initialization routine and, on subsequent calls to getopt, causes
> the program to rescan the same vector more than once.  This behaviour may
> also be used to scan multiple argument vectors or if @code{POSIXLY_CORRECT}
> is changed between scans.

I would add it is required as well if argv contents is changed over the calls.

>   
>>  @deftypevar {char *} optarg
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] manual: Document optind zero set behaviour (BZ#23157)
  2018-05-29 18:45   ` Adhemerval Zanella
@ 2018-05-29 20:02     ` Carlos O'Donell
  2018-05-29 21:52       ` Adhemerval Zanella
  2018-05-29 22:47       ` Paul Eggert
  0 siblings, 2 replies; 7+ messages in thread
From: Carlos O'Donell @ 2018-05-29 20:02 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 05/29/2018 02:44 PM, Adhemerval Zanella wrote:
> Now I am not sure if program is abusing of getopt semantic, or glibc is tying
> with information it should (the input argument), or if it just undefined
> behaviour.

We have no test cases that use optind setting to zero. That isn't proof conclusive
that the feature is not supported.

We have only a manual page with a note about resetting optind to 0 to allow the
reset to happen. That's not canonical.

We have a BSD implementation that does something similar but with a distinct
variable.

My suggestion:

* Review other implementations of GNU getopt and see what they support for
  optind setting to 0, e.g. https://github.com/agriffis/pure-getopt
  If they don't support setting optind to 0, then that's better proof that
  we should not support this.

* Consider copying BSD's non-standard optreset. I like the idea of a distinct
  variable for use in the reset.

* Consider sending a patch to the linux man pages project to remove the
  note regarding glibc's undocumented method for rescanning.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] manual: Document optind zero set behaviour (BZ#23157)
  2018-05-29 20:02     ` Carlos O'Donell
@ 2018-05-29 21:52       ` Adhemerval Zanella
  2018-05-29 22:47       ` Paul Eggert
  1 sibling, 0 replies; 7+ messages in thread
From: Adhemerval Zanella @ 2018-05-29 21:52 UTC (permalink / raw)
  To: Carlos O'Donell, libc-alpha



On 29/05/2018 17:02, Carlos O'Donell wrote:
> On 05/29/2018 02:44 PM, Adhemerval Zanella wrote:
>> Now I am not sure if program is abusing of getopt semantic, or glibc is tying
>> with information it should (the input argument), or if it just undefined
>> behaviour.
> 
> We have no test cases that use optind setting to zero. That isn't proof conclusive
> that the feature is not supported.

We do:

  * posix/bug-getopt2.c explicit sets optind first to 0 and then to 1.
  * posix/tst-getopt-cancel.c also sets optind to 0.

Both are due to explicit reset internal state to handle the optstring starting
with '+'. 

> 
> We have only a manual page with a note about resetting optind to 0 to allow the
> reset to happen. That's not canonical.
> 
> We have a BSD implementation that does something similar but with a distinct
> variable.
> 
> My suggestion:
> 
> * Review other implementations of GNU getopt and see what they support for
>   optind setting to 0, e.g. https://github.com/agriffis/pure-getopt
>   If they don't support setting optind to 0, then that's better proof that
>   we should not support this.

I don't think this example applies here (optind is really tied to C semantic,
this bash project have a different to access the results). GLIBC getopt is
pretty much in sync with gnulib and I am not aware of any other gnu getopt
replacement.

> 
> * Consider copying BSD's non-standard optreset. I like the idea of a distinct
>   variable for use in the reset.

I am not sure because we will have different reset mechanism if we want to
provide the current semantic and we will need to define optreset interaction
with GNU extensions (which is essentially which adds optind complexity here).
We can also rewrite how optind interacts with GNU extensions and define optreset
would be canonical way, but it will also adds more code complexity and probably
requires a compat symbol.

> 
> * Consider sending a patch to the linux man pages project to remove the
>   note regarding glibc's undocumented method for rescanning.

I still prefer if we document more properly what are GLIBC behaviour regarding
non specified POSIX semantic.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] manual: Document optind zero set behaviour (BZ#23157)
  2018-05-29 20:02     ` Carlos O'Donell
  2018-05-29 21:52       ` Adhemerval Zanella
@ 2018-05-29 22:47       ` Paul Eggert
  2018-05-30 16:18         ` Carlos O'Donell
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Eggert @ 2018-05-29 22:47 UTC (permalink / raw)
  To: Carlos O'Donell, Adhemerval Zanella, libc-alpha

It's not just test cases. Some GNU programs set optind=0 and expect this 
to cause getopt_long to rescan. The Coreutils programs env, nice, and 
stty all do this. This suggests that getopt_long should keep this 
behavior and its documentation, and that for consistency's sake getopt 
should also keep doing it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] manual: Document optind zero set behaviour (BZ#23157)
  2018-05-29 22:47       ` Paul Eggert
@ 2018-05-30 16:18         ` Carlos O'Donell
  0 siblings, 0 replies; 7+ messages in thread
From: Carlos O'Donell @ 2018-05-30 16:18 UTC (permalink / raw)
  To: Paul Eggert, Adhemerval Zanella, libc-alpha

On 05/29/2018 06:45 PM, Paul Eggert wrote:
> It's not just test cases. Some GNU programs set optind=0 and expect
> this to cause getopt_long to rescan. The Coreutils programs env,
> nice, and stty all do this. This suggests that getopt_long should
> keep this behavior and its documentation, and that for consistency's
> sake getopt should also keep doing it.

Excellent, this is all proof conclusive that we need:

* Document optind=0 setting.

* Add a testcase to cover this specifically and call it out.

Adhemerval can you post a v2 with a test case and I'll review that?

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-30 16:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-21 19:39 [PATCH] manual: Document optind zero set behaviour (BZ#23157) Adhemerval Zanella
2018-05-23 17:52 ` Carlos O'Donell
2018-05-29 18:45   ` Adhemerval Zanella
2018-05-29 20:02     ` Carlos O'Donell
2018-05-29 21:52       ` Adhemerval Zanella
2018-05-29 22:47       ` Paul Eggert
2018-05-30 16:18         ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).