public inbox for cygwin@cygwin.com
* Fwd: gawk core dumped on too many input values
       [not found] <e0fa509d-890a-2db7-c91f-2b1d904a9a1e@comcast.net>
@ 2023-08-27 18:24 ` Ed Morton
  2023-08-27 23:07   ` Jeremy Hetzler
  2023-08-28  8:38   ` Fwd: " Takashi Yano
  0 siblings, 2 replies; 6+ messages in thread
From: Ed Morton @ 2023-08-27 18:24 UTC
  To: cygwin

This (original email below) turned out to be a general cygwin issue, not 
a gawk issue:

$ LC_ALL=C sed 's/x/y/' $(seq 1000000)
Segmentation fault (core dumped)

$ LC_ALL=C grep 'foo' $(seq 1000000)
Segmentation fault (core dumped)

Regards,

     Ed.

-------- Forwarded Message --------
Subject: 	gawk core dumped on too many input values
Date: 	Sun, 27 Aug 2023 08:09:54 -0500
From: 	Ed Morton <mortoneccc@comcast.net>
To: 	bug-gawk@gnu.org <bug-gawk@gnu.org>



Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security 
-Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong 
--param=ssp-buffer-size=4 
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.2.2-1.x86_64/build=/usr/src/debug/gawk-5.2.2-1 
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.2.2-1.x86_64/src/gawk-5.2.2=/usr/src/debug/gawk-5.2.2-1 
-DNDEBUG
uname output: CYGWIN_NT-10.0-22621 TournaMart_2023 3.4.8-1.x86_64 
2023-08-17 17:02 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin

Gawk Version: 5.2.2

Attestation 1:
         I have read 
https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
         Yes

Attestation 2:
         I have not modified the sources before building gawk.
         True

Description:
         I was trying to test something related to ARG_MAX when I ran the
         awk script below and it core dumped instead of reporting an error
         and exiting gracefully. In case it's useful getconf ARG_MAX outputs
         32000.

Repeat-By:
         $ LC_ALL=C awk 'BEGIN{print ARGC}' $(seq 1000000)
         Segmentation fault (core dumped)

* Re: gawk core dumped on too many input values
  2023-08-27 18:24 ` Fwd: gawk core dumped on too many input values Ed Morton
@ 2023-08-27 23:07   ` Jeremy Hetzler
  2023-08-28 11:47     ` Joshuah Hurst
  2023-08-28  8:38   ` Fwd: " Takashi Yano
  1 sibling, 1 reply; 6+ messages in thread
From: Jeremy Hetzler @ 2023-08-27 23:07 UTC
  To: cygwin; +Cc: Ed Morton

On Sun, Aug 27, 2023 at 2:25 PM Ed Morton via Cygwin <cygwin@cygwin.com>
wrote:
>
> This (original email below) turned out to be a general cygwin issue, not
> a gawk issue:
>
> $ LC_ALL=C sed 's/x/y/' $(seq 1000000)
> Segmentation fault (core dumped)
>
> $ LC_ALL=C grep 'foo' $(seq 1000000)
> Segmentation fault (core dumped)
>

Seems that all commands linked with cygwin1.dll will fault if you pass them
a long enough arglist.

For me, /bin/true faults on {1..258231} but not {1..258230}.

> $ /bin/true {1..258230}
>


> $ /bin/true {1..258231}
> Segmentation fault (core dumped)
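
A boundary like the one above could be located by bisection rather than by hand. The following is only a sketch in bash: find_max_ok and true_survives are hypothetical names, and it assumes the predicate succeeds at the lower bound, fails at the upper bound, and changes from success to failure exactly once in between.

```shell
#!/usr/bin/env bash

# find_max_ok LOW HIGH PRED
# Print the largest n in [LOW, HIGH) for which `PRED n` succeeds.
# Assumes PRED succeeds at LOW and fails at HIGH (hypothetical helper).
find_max_ok() {
  local lo=$1 hi=$2 pred=$3 mid
  while (( hi - lo > 1 )); do
    mid=$(( lo + (hi - lo) / 2 ))
    if "$pred" "$mid"; then
      lo=$mid      # still fine with mid arguments
    else
      hi=$mid      # fails with mid arguments
    fi
  done
  printf '%s\n' "$lo"
}

# Hypothetical predicate: does /bin/true survive being given n arguments?
# The subshell keeps a crash report from reaching the terminal.
true_survives() {
  ( /bin/true $(seq "$1") ) >/dev/null 2>&1
}

# Example call (not run here; it spawns about 20 processes):
#   find_max_ok 1 1000000 true_survives
```

The predicate is passed by name so the same search can be pointed at /bin/echo or any other command.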


strace, which is not linked with cygwin1.dll, exits cleanly.

> $ /bin/strace {1..300000}
> -bash: /bin/strace: Argument list too long


See this page [1] on maximum argument lengths.

It would be nice to document this limit, whatever it is.

It would also be nice to return an error to the shell on too-long arglist.

> $ cat true.exe.stackdump
> Exception: STATUS_STACK_OVERFLOW at rip=7FFD41BD4646
> rax=0000000000051F10 rbx=0000000800009991 rcx=00000007FFE03C50
> rdx=00007FFD41BF58A0 rsi=0000000000000000 rdi=0000000000000000
> r8 =FEFEFEFEFEFEFF42 r9 =00007FFD41BF5820 r10=FEFEFEFEFEFEFEFF
> r11=FEFEFEFEFEFEFEFF r12=000000080000998D r13=00000007FFFFCDF0
> r14=0000000000000000 r15=0000000000000000
> rbp=00000007FFFFCD30 rsp=00000007FFFFCC38
> program=C:\cygwin64\bin\true.exe, pid 44496, thread
> cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B
> Stack trace:
> Frame         Function      Args
> 0007FFFFCD30  7FFD41BD4646 (7FFD41A08035, 7FFD41A06F80, 000000000000, 000000000000) cygwin1.dll+0x1D4646
> 0007FFFFCD30  000000249F10 (7FFD41A06F80, 000000000000, 000000000000, 000000000000)
> 0007FFFFCD30  7FFD41BF5800 (000000000000, 000000000000, 000000000000, 000000000000) cygwin1.dll+0x1F5800
> 0007FFFFCD30  7FFD41A08035 (000000000000, 000000000000, 000000000000, 000000000000) cygwin1.dll+0x8035
> 0007FFFFFFF0  7FFD41A05C86 (000000000000, 000000000000, 000000000000, 000000000000) cygwin1.dll+0x5C86
> 0007FFFFFFF0  7FFD41A05D34 (000000000000, 000000000000, 000000000000, 000000000000) cygwin1.dll+0x5D34
> End of stack trace
> Loaded modules:
> 000100400000 true.exe
> 7FFD52F30000 ntdll.dll
> 7FFD52250000 KERNEL32.DLL
> 7FFD50940000 KERNELBASE.dll
> 0005EE2D0000 cygintl-8.dll
> 7FFD41A00000 cygwin1.dll
> 0003F9F70000 cygiconv-2.dll
> 7FFD51C70000 advapi32.dll
> 7FFD51FD0000 msvcrt.dll
> 7FFD525A0000 sechost.dll
> 7FFD52650000 RPCRT4.dll
> 7FFD4F3A0000 CRYPTBASE.DLL
> 7FFD50380000 bcryptPrimitives.dll



> $ uname -a
> CYGWIN_NT-10.0-22621 nzxt 3.4.8-1.x86_64 2023-08-17 17:02 UTC x86_64 Cygwin


Thanks,
Jeremy Hetzler

[1] https://www.in-ulm.de/~mascheck/various/argmax/

* Re: Fwd: gawk core dumped on too many input values
  2023-08-27 18:24 ` Fwd: gawk core dumped on too many input values Ed Morton
  2023-08-27 23:07   ` Jeremy Hetzler
@ 2023-08-28  8:38   ` Takashi Yano
  1 sibling, 0 replies; 6+ messages in thread
From: Takashi Yano @ 2023-08-28  8:38 UTC
  To: cygwin; +Cc: Ed Morton

On Sun, 27 Aug 2023 13:24:55 -0500
Ed Morton wrote:
> This (original email below) turned out to be a general cygwin issue, not 
> a gawk issue:
> 
> $ LC_ALL=C sed 's/x/y/' $(seq 1000000)
> Segmentation fault (core dumped)
> 
> $ LC_ALL=C grep 'foo' $(seq 1000000)
> Segmentation fault (core dumped)
> 
> [...]

Thanks for the report.
I will submit a patch for this issue shortly.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

* Re: gawk core dumped on too many input values
  2023-08-27 23:07   ` Jeremy Hetzler
@ 2023-08-28 11:47     ` Joshuah Hurst
  2023-08-28 18:20       ` Brian Inglis
  0 siblings, 1 reply; 6+ messages in thread
From: Joshuah Hurst @ 2023-08-28 11:47 UTC
  To: cygwin

On Mon, Aug 28, 2023 at 1:08 AM Jeremy Hetzler via Cygwin
<cygwin@cygwin.com> wrote:
>
> On Sun, Aug 27, 2023 at 2:25 PM Ed Morton via Cygwin <cygwin@cygwin.com>
> wrote:
> >
> > This (original email below) turned out to be a general cygwin issue, not
> > a gawk issue:
> >
> > $ LC_ALL=C sed 's/x/y/' $(seq 1000000)
> > Segmentation fault (core dumped)
> >
> > $ LC_ALL=C grep 'foo' $(seq 1000000)
> > Segmentation fault (core dumped)
> >
>
> Seems that all commands linked with cygwin1.dll will fault if you pass them
> a long enough arglist.
>
> For me, /bin/true faults on {1..258231} but not {1..258230}.
>
> > $ /bin/true {1..258230}
> >
>
>
> > $ /bin/true {1..258231}
> > Segmentation fault (core dumped)
>
>
> strace, which is not linked with cygwin1.dll, exits cleanly.
>
> > $ /bin/strace {1..300000}
> > -bash: /bin/strace: Argument list too long
>
>
> See this page [1] on maximum argument lengths.
>
> It would be nice to document this limit, whatever it is.

Is this the limit?

$ getconf -a | grep -E 'ARG_MAX'
_POSIX_ARG_MAX                      4096
ARG_MAX                             32000

>
> It would also be nice to return an error to the shell on too-long arglist.

+1
-- 
Josh

* Re: gawk core dumped on too many input values
  2023-08-28 11:47     ` Joshuah Hurst
@ 2023-08-28 18:20       ` Brian Inglis
  2023-08-29 13:03         ` Corinna Vinschen
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Inglis @ 2023-08-28 18:20 UTC
  To: cygwin; +Cc: Joshuah Hurst

On 2023-08-28 05:47, Joshuah Hurst via Cygwin wrote:
> On Mon, Aug 28, 2023 at 1:08 AM Jeremy Hetzler via Cygwin
> <cygwin@cygwin.com> wrote:
>>
>> On Sun, Aug 27, 2023 at 2:25 PM Ed Morton via Cygwin <cygwin@cygwin.com>
>> wrote:
>>>
>>> This (original email below) turned out to be a general cygwin issue, not
>>> a gawk issue:
>>>
>>> $ LC_ALL=C sed 's/x/y/' $(seq 1000000)
>>> Segmentation fault (core dumped)
>>>
>>> $ LC_ALL=C grep 'foo' $(seq 1000000)
>>> Segmentation fault (core dumped)
>>>
>>
>> Seems that all commands linked with cygwin1.dll will fault if you pass them
>> a long enough arglist.
>>
>> For me, /bin/true faults on {1..258231} but not {1..258230}.
>>
>>> $ /bin/true {1..258230}
>>>
>>
>>
>>> $ /bin/true {1..258231}
>>> Segmentation fault (core dumped)
>>
>>
>> strace, which is not linked with cygwin1.dll, exits cleanly.
>>
>>> $ /bin/strace {1..300000}
>>> -bash: /bin/strace: Argument list too long
>>
>>
>> See this page [1] on maximum argument lengths.
>>
>> It would be nice to document this limit, whatever it is.
> 
> Is this the limit?
> 
> $ getconf -a | grep -E 'ARG_MAX'
> _POSIX_ARG_MAX                      4096
> ARG_MAX                             32000

On my system, /bin/true and /bin/echo get to between 250K and 256K arguments 
before seg faulting, requiring between digits + separator + ptr 
(7+8)*(250-256)KB == 3750-3840KB;

with current Cygwin, the `bash` builtins `:` and `echo` get to between 128M and 
256M arguments before the `bash` fork fails and Windows gets unhappy, with memory 
full, high paging, and page thrashing on both paging devices: it calmed down 
eventually after a few interrupts and then `exec bash`;
that requires between digits + separator + ptr (10+8)*(128-256)MB == 2.25-4.5GB;
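
For anyone wanting to redo these estimates, here is a rough bash sketch of the arithmetic. The function name arglist_size is made up for illustration, and it assumes every argument has the maximum digit count (6 digits for counts up to 256K, 9 digits for counts up to 256M) plus one separator byte and one 8-byte pointer, so the results are upper bounds:

```shell
#!/usr/bin/env bash

# arglist_size DIGITS COUNT
# Upper-bound estimate of the space for COUNT numeric arguments:
# DIGITS characters + 1 separator byte + 8-byte argv pointer each.
# The result is in the same unit as COUNT (K in, KB out; M in, MB out).
# Hypothetical helper; the 8-byte pointer assumes a 64-bit process.
arglist_size() {
  local digits=$1 count=$2
  echo $(( count * (digits + 1 + 8) ))
}

arglist_size 6 250   # /bin/true case, 250K args: prints 3750 (KB)
arglist_size 6 256   # /bin/true case, 256K args: prints 3840 (KB)
arglist_size 9 128   # bash builtin case, 128M args: prints 2304 (MB)
arglist_size 9 256   # bash builtin case, 256M args: prints 4608 (MB)
```

2304 MB and 4608 MB are the 2.25 GB and 4.5 GB figures above.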

This seems to be extremely conservative:

$ xargs -r --show-limits <<< ' '
Your environment variables take up 9246 bytes
POSIX upper limit on argument length (this system): 20706
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 11460
Size of command buffer we are actually using: 20706
Maximum parallelism (--max-procs must be no greater): 2147483647
$ seq $((256*1024)) | xargs | wc -lwcL
     102  262144 1723903   18094

Once a patch is in a test version of Cygwin, I can rebuild and release a test 
version of findutils, which includes xargs, with updated results in the test 
release announcement.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

* Re: gawk core dumped on too many input values
  2023-08-28 18:20       ` Brian Inglis
@ 2023-08-29 13:03         ` Corinna Vinschen
  0 siblings, 0 replies; 6+ messages in thread
From: Corinna Vinschen @ 2023-08-29 13:03 UTC
  To: cygwin; +Cc: Ed Morton

On Aug 28 12:20, Brian Inglis via Cygwin wrote:
> On 2023-08-28 05:47, Joshuah Hurst via Cygwin wrote:
> > On Mon, Aug 28, 2023 at 1:08 AM Jeremy Hetzler via Cygwin
> > <cygwin@cygwin.com> wrote:
> > > 
> > > On Sun, Aug 27, 2023 at 2:25 PM Ed Morton via Cygwin <cygwin@cygwin.com>
> > > wrote:
> > > > 
> > > > This (original email below) turned out to be a general cygwin issue, not
> > > > a gawk issue:
> > > > 
> > > > $ LC_ALL=C sed 's/x/y/' $(seq 1000000)
> > > > Segmentation fault (core dumped)
> > > > 
> > > > $ LC_ALL=C grep 'foo' $(seq 1000000)
> > > > Segmentation fault (core dumped)

This is fixed in current git and can be tested with the next test
release cygwin-3.5.0-0.404.gca2a4ec24362, which is being built right
now and will be uploaded in a few minutes.

> > [...]
> > Is this the limit?
> > 
> > $ getconf -a | grep -E 'ARG_MAX'
> > _POSIX_ARG_MAX                      4096
> > ARG_MAX                             32000

This isn't the real limit.

ARG_MAX was set to 32000 at one point because that's a safe size for
the Windows command line length.  It is therefore a hard limit when you
start non-Cygwin executables.  Cygwin executables don't have this limit.
For them, the limit is defined only by the amount of memory the parent
process has available when creating the argv and environment lists for
the child.

We fixed that in git.  As a result, sysconf(_SC_ARG_MAX) will now return
-1, i.e., ARG_MAX has an indeterminate limit:

  $ getconf -a | grep -E 'ARG_MAX'
  _POSIX_ARG_MAX                      4096
  ARG_MAX
  $ getconf ARG_MAX
  undefined

However! limits.h still defines ARG_MAX as 32000, and we'll stick to
this, on account of it being a safe value.
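
Scripts that read `getconf ARG_MAX` and expect a number may want to guard against the `undefined` answer. A minimal bash sketch follows; effective_arg_max is a hypothetical helper, and falling back to 32000 (the limits.h value) is just one possible policy:

```shell
#!/usr/bin/env bash

# effective_arg_max RAW [FALLBACK]
# Print RAW if it is a positive decimal integer, otherwise FALLBACK.
# FALLBACK defaults to 32000, the value limits.h keeps on Cygwin.
# Hypothetical helper, not part of Cygwin or coreutils.
effective_arg_max() {
  local raw=$1 fallback=${2:-32000}
  # 10# forces base 10 so a leading zero is not read as octal.
  if [[ $raw =~ ^[0-9]+$ ]] && (( 10#$raw > 0 )); then
    printf '%s\n' "$raw"
  else
    printf '%s\n' "$fallback"
  fi
}

# Usage: pass whatever getconf printed, including "undefined" or nothing.
limit=$(effective_arg_max "$(getconf ARG_MAX 2>/dev/null)")
printf 'using an argument budget of %s bytes\n' "$limit"
```

Taking the raw text as an argument, rather than calling getconf inside the function, keeps the check easy to try with sample values.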

This has a precedent on Linux, where getconf returns something big, but
ARG_MAX is still 131072:

  $ grep ARG_MAX /usr/include/linux/limits.h
  #define ARG_MAX       131072	/* # bytes of args + environ for exec() */
  $ getconf ARG_MAX
  2097152

The limits.h limit of 131072 is historical (32 pages for argv and envp).
The getconf value is a quarter of the stack limit, which is the share
reserved for argv and envp.
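
Both Linux numbers can be reproduced with a little shell arithmetic. This is only a sketch: the function names are made up, and the 4 KiB page size and 8 MiB (8192 KiB) stack limit are assumed typical defaults, not values stated in this thread.

```shell
#!/usr/bin/env bash

# pages_to_bytes PAGES PAGE_SIZE: size in bytes of PAGES pages.
pages_to_bytes() { echo $(( $1 * $2 )); }

# stack_quarter_bytes STACK_KIB: a quarter of a stack limit given in KiB
# (the unit `ulimit -s` reports), expressed in bytes.
stack_quarter_bytes() { echo $(( $1 * 1024 / 4 )); }

pages_to_bytes 32 4096      # assumed 4 KiB pages: prints 131072
stack_quarter_bytes 8192    # assumed 8 MiB stack limit: prints 2097152

# On a live system one could feed in "$(ulimit -s)" instead of 8192,
# but that prints "unlimited" when no stack limit is set, which this
# sketch does not handle.
```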

I hope that explains things sufficiently.

The patches will be backported to 3.4.9.


Thanks,
Corinna
