public inbox for cygwin@cygwin.com
* [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
@ 2015-07-05 21:48 Corinna Vinschen
  2015-07-06  2:15 ` Ken Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-05 21:48 UTC
  To: cygwin

Hi Cygwin friends and users,


I released another TEST version of Cygwin.  The version number is
2.1.0-0.4.

This test release needs some good testing!

While the changes are still mostly interesting for developers, the
under-the-hood changes will potentially impact existing applications.

I'd like to release 2.1.0-1 in about two weeks, if possible.

==================================== tl;dr ==================================

What's new:
-----------

- Handle pthread stack sizes as glibc does: default to the RLIMIT_STACK
  resource.  Allow setting RLIMIT_STACK via setrlimit.  Default
  RLIMIT_STACK to the value from the executable header as described on
  https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774.aspx
  Default the stack size to 2 Megs in case RLIMIT_STACK is set to
  RLIM_INFINITY.

- First cut of an implementation allowing signal handlers to run on an
  alternate signal stack.
  
- New API: sigaltstack, plus definitions for SA_ONSTACK, SS_ONSTACK,
  SS_DISABLE, MINSIGSTKSZ, SIGSTKSZ.

- New API: sethostname.


What changed:
-------------


Bug Fixes
---------

- Enable non-SA_RESTART behaviour on threads other than the main thread.
  Addresses: https://cygwin.com/ml/cygwin/2015-06/msg00260.html

- Try to handle a concurrent close on a socket more gracefully.
  Addresses: https://cygwin.com/ml/cygwin/2015-06/msg00235.html

- Fix fork failing after the parent recovered from a stack overflow.
  Addresses: https://cygwin.com/ml/cygwin/2015-06/msg00384.html

- Fix a crash on 64 bit XP/2003 when opening /proc/$PID/maps.

============================================================================

Changes compared to the previous test release:

- getrlimit/setrlimit RLIMIT_STACK handling has been improved considerably.
  The old implementation returned wrong values and was generally useless.

  After some discussion with colleagues working on glibc and the Linux
  kernel, I have now implemented RLIMIT_STACK to follow the behaviour
  on Linux/glibc more closely:

  - By default, getrlimit(RLIMIT_STACK) returns in rlim_cur the default
    stacksize taken from the Windows executable file header, reduced
    slightly to account for guard pages.

  - setrlimit(RLIMIT_STACK) now works and stores the new values for later
    use.

  - The rlim_cur value is now used as default stacksize when creating
    pthreads.  If rlim_cur is RLIM_INFINITY, the fallback stacksize is
    set to 2 Megs.

- Until now, if the application set the guardpage size via
  pthread_attr_setguardsize(), the thread stack was set up fully
  committed with a NOACCESS guardpage at the stack bottom.  This approach
  wasted physical memory, and it failed to trigger normal stack overflow
  exceptions.

  Cygwin now sets up a Windows-typical stack with only a few committed
  pages and movable guardpages.  However, this requires OS support if
  the guardpage area requested by pthread_attr_setguardsize() differs
  from the default OS guardpage size.
  
  This OS support is only available starting with Windows 2003 and 64
  bit XP.  32 bit XP will still use the fully committed stack setup in
  this case.  Another nail in XP's coffin...

- If the pthread stack is not provided by the application (providing
  one is unusual anyway), the newly created stack will use a Windows
  compatible guardpage setup reflecting the setting from
  pthread_attr_setguardsize(), or the default OS-specific guardpage size.

- When running a signal handler on the alternate signal stack, the
  handler is now called via a wrapper function.  This wrapper function
  checks whether the SEGV was triggered by a STATUS_STACK_OVERFLOW.
  
  If so, it restores the last set of guard pages on the primary thread
  stack.  If we don't do that, and if the handler longjmps, the stack
  stays broken and another stack overflow exits the process immediately
  with no chance to recover.

  If the handler simply returns, the wrapper restores the "broken"
  stack state to allow accessing the stack, then the exception handling
  triggers a SIG_DFL action for SIGSEGV:  Create a stackdump and exit.
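An application that wants explicit control can mirror the RLIMIT_STACK
rules from the first item above in its own code.  A sketch, assuming the
behaviour described there (rlim_cur as the default pthread stack size,
2 Megs when it is RLIM_INFINITY); pick_thread_stacksize,
init_attr_from_rlimit and FALLBACK_STACKSIZE are hypothetical names, not
Cygwin internals.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Fallback used when rlim_cur is RLIM_INFINITY (2 Megs, per the notes
   above).  The macro name is hypothetical. */
#define FALLBACK_STACKSIZE ((size_t) 2 * 1024 * 1024)

/* Mirror the described rule: the soft limit is the default pthread
   stack size, unless it is RLIM_INFINITY. */
size_t
pick_thread_stacksize (const struct rlimit *rl)
{
  if (rl->rlim_cur == RLIM_INFINITY)
    return FALLBACK_STACKSIZE;
  return (size_t) rl->rlim_cur;
}

/* Initialize ATTR with a stack size derived from the current
   RLIMIT_STACK.  Returns 0 on success, -1 if getrlimit fails, or the
   error number from the pthread_attr_* call that failed. */
int
init_attr_from_rlimit (pthread_attr_t *attr)
{
  struct rlimit rl;
  int rc;

  if (getrlimit (RLIMIT_STACK, &rl) != 0)
    return -1;
  rc = pthread_attr_init (attr);
  if (rc != 0)
    return rc;
  rc = pthread_attr_setstacksize (attr, pick_thread_stacksize (&rl));
  if (rc != 0)
    pthread_attr_destroy (attr);
  return rc;
}
```

With the new implementation, calling pthread_create without an attribute
should give the same stack size; the explicit version only makes the
rule visible.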

Implementation details:

- The alternate signal stack installed via sigaltstack is only valid for
  the current thread.  Each thread must call its own sigaltstack.  On
  pthread_create, the alternate signal stack setting of the calling
  thread is *not* propagated to the newly created thread.  This follows
  current Linux semantics.

- The alternate signal stack is a minimal stack.  Certain data structures
  used by Cygwin (_cygtls area) and Windows (on 32 bit: exception
  records) are not copied over to the alternate signal stack.  The stack
  settings in the Thread Environment Block (TEB) do not reflect the
  current alternate stack while running the signal handler; the TEB
  still points to the original thread stack.  This seems to work
  nicely in my testing, but there may be Windows functions which stop
  working in this scenario.

- The volatile registers and the original stack registers are stored at
  the base of the alternate stack.  If you screw this up while running
  the signal handler, your thread is doomed on return to the caller.

I'd be grateful if curious developers would give this new sigaltstack
implementation and the changed RLIMIT_STACK handling a whirl and report
back if it's working for them as desired/expected.  And if not, simple
reproducers in plain C are most welcome in this case.  Discussing
aspects of this implementation may be best handled on the
cygwin-developers mailing list or the #cygwin-developers IRC channel on
Freenode.


Have fun,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-05 21:48 [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4 Corinna Vinschen
@ 2015-07-06  2:15 ` Ken Brown
  2015-07-06 10:02   ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Ken Brown @ 2015-07-06  2:15 UTC
  To: cygwin

On 7/5/2015 5:34 PM, Corinna Vinschen wrote:
> This test release needs some good testing!

I repeated the emacs experiment discussed in the "[ANNOUNCEMENT] TEST RELEASE: 
Cygwin 2.1.0-0.1" thread.  In the 32-bit case, the results were more-or-less the 
same as before: I forced a stack overflow, emacs recovered, I tried to continue 
working, there was a second SIGSEGV, and handle_sigsegv bailed out because 
garbage collection was in progress.  This time I was unable to prevent the 
second SIGSEGV by resetting max-specpdl-size and max-lisp-eval-depth.  I'm not 
sure what caused the second SIGSEGV, but it might have nothing to do with Cygwin.

In the 64-bit case, however, the recovery from stack overflow never happened 
(i.e., the program never reached the siglongjmp).  Here's a gdb session:

$ gdb ./emacs.exe
[...]
(gdb) b handle_sigsegv
Breakpoint 3 at 0x1005657b3: file ../../master/src/sysdep.c, line 1643.
(gdb) r -Q
Starting program: /home/kbrown/src/emacs/64build/src/emacs.exe -Q
[At this point I force stack overflow.]

Program received signal SIGSEGV, Segmentation fault.
0x000000010053b08b in builtin_lisp_symbol (index=290)
     at ../../master/src/lisp.h:1069
1069      return make_lisp_symbol (lispsym + index);
(gdb) c
Continuing.

Breakpoint 3, handle_sigsegv (sig=11,
     siginfo=0x100a3e190 <sigsegv_stack+65232>, arg=0x82de50)
     at ../../master/src/sysdep.c:1643
1643      if (!gc_in_progress)
(gdb) l
1638    static void
1639    handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
1640    {
1641      /* Hard GC error may lead to stack overflow caused by
1642         too nested calls to mark_object.  No way to survive.  */
1643      if (!gc_in_progress)
1644        {
1645          struct rlimit rlim;
1646
1647          if (!getrlimit (RLIMIT_STACK, &rlim))
(gdb)
1648            {
1649    #ifdef CYGWIN
1650              enum { STACK_DANGER_ZONE = 32 * 1024 };
1651    #else
1652              enum { STACK_DANGER_ZONE = 16 * 1024 };
1653    #endif
1654              char *beg, *end, *addr;
1655
1656              beg = stack_bottom;
1657              end = stack_bottom + stack_direction * rlim.rlim_cur;
(gdb)
1658              if (beg > end)
1659                addr = beg, beg = end, end = addr;
1660              addr = (char *) siginfo->si_addr;
1661              /* If we're somewhere on stack and too close to
1662                 one of its boundaries, most likely this is it.  */
1663              if (beg < addr && addr < end
1664                  && (addr - beg < STACK_DANGER_ZONE
1665                      || end - addr < STACK_DANGER_ZONE))
1666                siglongjmp (return_to_command_loop, 1);
1667            }
(gdb) n
1647          if (!getrlimit (RLIMIT_STACK, &rlim))
(gdb)
1656              beg = stack_bottom;
(gdb)
1657              end = stack_bottom + stack_direction * rlim.rlim_cur;
(gdb)
1658              if (beg > end)
(gdb)
1660              addr = (char *) siginfo->si_addr;
(gdb)
1663              if (beg < addr && addr < end
(gdb) p beg
$1 = 0x82ca27 ""
(gdb) p addr
$2 = 0x33ff8 ""

Note that addr < beg, so we never reach the siglongjmp.

Ken


* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06  2:15 ` Ken Brown
@ 2015-07-06 10:02   ` Corinna Vinschen
  2015-07-06 13:15     ` Ken Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-06 10:02 UTC
  To: cygwin


[-- Attachment #1.1: Type: text/plain, Size: 6454 bytes --]

Hi Ken,


thanks for further testing this.


On Jul  5 22:15, Ken Brown wrote:
> On 7/5/2015 5:34 PM, Corinna Vinschen wrote:
> >This test release needs some good testing!
> 
> I repeated the emacs experiment discussed in the "[ANNOUNCEMENT] TEST
> RELEASE: Cygwin 2.1.0-0.1" thread.  In the 32-bit case, the results were
> more-or-less the same as before: I forced a stack overflow, emacs recovered,
> I tried to continue working, there was a second SIGSEGV, and handle_sigsegv
> bailed out because garbage collection was in progress.  This time I was
> unable to prevent the second SIGSEGV by resetting max-specpdl-size and
> max-lisp-eval-depth.  I'm not sure what caused the second SIGSEGV, but it
> might have nothing to do with Cygwin.
> 
> In the 64-bit case, however, the recovery from stack overflow never happened
> (i.e., the program never reached the siglongjmp).  Here's a gdb session:
> [...]
> 1647          if (!getrlimit (RLIMIT_STACK, &rlim))
> (gdb)
> 1656              beg = stack_bottom;
> (gdb)
> 1657              end = stack_bottom + stack_direction * rlim.rlim_cur;
> (gdb)
> 1658              if (beg > end)
> (gdb)
> 1660              addr = (char *) siginfo->si_addr;
> (gdb)
> 1663              if (beg < addr && addr < end
> (gdb) p beg
> $1 = 0x82ca27 ""
> (gdb) p addr
> $2 = 0x33ff8 ""

I can't reproduce this.  It works fine for me.  For reference, I attached
my simplified testcase again.  It's basically the emacs SIGSEGV setup:
main triggers the stack overflow, the handler tries to write a file to
test whether that works from the handler, and then it siglongjmps.  main
then tests whether it can still fork, and repeats the whole action to
test whether signal handling is back to normal.

If it works (and it does for me) the output looks like this:

  $ ./sigalt
  command loop 1 before crash
  command loop 1 after crash
  In child
  In parent
  command loop 2 before crash
  command loop 2 after crash
  In child
  In parent

On W8.1 64 bit, for a standard GCC build of this testcase, I get:

  (gdb) p beg
  $1 = 0x40ac3 <error: Cannot access memory at address 0x40ac3>
  (gdb) p addr
  $2 = 0x43848 <error: Cannot access memory at address 0x43848>
  (gdb) p end
  $3 = 0x23cac3 ""
  (gdb) p/x rlim.rlim_cur
  $5 = 0x1fc000

Check default stacksize:

  $ peflags -x ./sigalt
  ./sigalt: stack reserve size      : 2097152 (0x200000) bytes

  0x200000 - dead zone 4K - default W8.1 64 bit guardpagesize 3 * 4K ==
  0x1fc000, the value rlim.rlim_cur returns.  Looks good to me.

On W8.1 32 bit under WOW:

  (gdb) p beg
  $1 = 0x8fc33 ""
  (gdb) p addr
  $2 = 0x92d5c <error: Cannot access memory at address 0x92d5c>
  (gdb) p end
  $3 = 0x28cc33 ""
  (gdb) p/x rlim.rlim_cur
  $4 = 0x1fd000

  $ peflags -x ./sigalt
  ./sigalt: stack reserve size      : 2097152 (0x200000) bytes

  0x200000 - dead zone 4K - default W8.1 32 bit guardpagesize 2 * 4K ==
  0x1fd000.

On W7 32 bit native:

  (gdb) p beg
  $1 = 0x2ec43 "\376\356..."
  (gdb) p addr
  $2 = 0x32d6c ""
  (gdb) p end
  $3 = 0x22cc43 ""
  (gdb) p rlim.rlim_cur
  $4 = 2088960
  (gdb) p/x rlim.rlim_cur
  $5 = 0x1fe000

  $ peflags -x ./sigalt
  ./sigalt: stack reserve size      : 2097152 (0x200000) bytes

  0x200000 - dead zone 4K - default W7 32 bit guardpagesize 1 * 4K ==
  0x1fe000.
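All three results follow one formula: rlim_cur equals the stack reserve
size reported by peflags, minus the 4K dead zone, minus the OS default
guard area.  A sketch reproducing the arithmetic; the 4K page size and
the helper name are assumptions inferred from these examples, not taken
from Cygwin's source.

```c
#include <stdint.h>

#define PAGE_4K 0x1000u   /* page size assumed to be 4K, as above */

/* rlim_cur as inferred from the examples above: PE stack reserve size
   minus the 4K dead zone minus the OS default guard pages.
   Hypothetical helper, not Cygwin source. */
uint64_t
expected_rlim_cur (uint64_t reserve, unsigned guard_pages)
{
  return reserve - PAGE_4K - (uint64_t) guard_pages * PAGE_4K;
}
```

For example, expected_rlim_cur (0x200000, 3) yields 0x1fc000, the W8.1
64 bit value shown above.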

> Note that addr < beg, so we never reach the siglongjmp.

I have no explanation for this.  What OS?  What does rlim_cur contain?
What does peflags -x print for this executable?

And last but not least, what is emacs doing there?  The stack should be
pretty much in good shape when it's back to the main loop.  The stack
is fully committed and has the default number of guardpages at the bottom,
as it is just short of the stack overflow.

For debugging purposes I also added a global variable called "tib" and a
memory info struct called "m" to the testcase which are initialized
right at the start of main.  tib points to the start of the TEB (Thread
Environment Block, a Windows per-thread bookkeeping structure) of the
main thread.  If you expand it right after it's fetched, you get
something along these lines:

  (gdb) p *tib
  $2 = {ExceptionList = 0x22cd78, StackBase = 0x230000, StackLimit = 0x20c000,
    SubSystemTib = 0x0, {FiberData = 0x1e00, Version = 7680},
    ArbitraryUserPointer = 0x0, Self = 0x7ffdf000}

Note the values of StackBase and StackLimit and compare them with your beg
and end values.  StackBase is the upper limit of the stack; the stack grows
downward from there.  StackLimit is the lowest address committed so far.
Not much is committed yet, as you can see: 0x230000-0x20c000 == 0x24000 ==
144K.  Since Cygwin executables have a default stack of 2 Megs, the
allocation base of the stack is probably at 0x230000-0x200000 == 0x30000.
This can be checked by looking at m:

  (gdb) p m
  $1 = {BaseAddress = 0x22c000, AllocationBase = 0x30000, AllocationProtect = 4,
    RegionSize = 16384, State = 4096, Protect = 4, Type = 131072}

See the value of AllocationBase.

When you hit the breakpoint in handle_sigsegv, the output of tib should
look like this:

  (gdb) p *tib
  $2 = {ExceptionList = 0x22cd78, StackBase = 0x230000, StackLimit = 0x32000,
    SubSystemTib = 0x0, {FiberData = 0x1e00, Version = 7680},
    ArbitraryUserPointer = 0x0, Self = 0x7ffdf000}

Observe the value of StackLimit.  For this output I ran the testcase on
W7 32 bit, which has a default guardpage size of 4K.  The new wrapper I
wrote in Cygwin restored the stack to its state right before the stack
overflow occurred:

  - At 0x30000 we have the 4K dead zone, which is always only reserved,
    never committed.

  - At 0x31000 the 4K guard page starts.

  - Thus the StackLimit (the start of the committed region of the stack)
    is at 0x32000.
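The three addresses above follow directly from the allocation base, the
page size and the number of guard pages.  A sketch of that arithmetic,
assuming a one-page dead zone as described; struct stack_layout and
restored_layout are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the restored layout of a downward-growing
   stack, listed from its allocation base upward. */
struct stack_layout
{
  uintptr_t dead_zone;    /* reserved-only page, never committed */
  uintptr_t guard_start;  /* first guard page */
  uintptr_t stack_limit;  /* lowest committed address (TEB StackLimit) */
};

struct stack_layout
restored_layout (uintptr_t allocation_base, uintptr_t page,
                 unsigned guard_pages)
{
  struct stack_layout l;

  l.dead_zone = allocation_base;
  l.guard_start = allocation_base + page;             /* after 1 dead page */
  l.stack_limit = l.guard_start + guard_pages * page; /* after the guards */
  return l;
}
```

With the W7 32 bit values, restored_layout (0x30000, 0x1000, 1) gives a
StackLimit of 0x32000, matching the tib output above.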

You can utilize tib and m for testing in emacs as well.  Just do this:

  #include <windows.h>

  NT_TIB *tib;
  MEMORY_BASIC_INFORMATION m;

  [...]

  in main:

  /* Record (approximately) where the stack begins.  */
  stack_bottom = &stack_bottom_variable;
  tib = (NT_TIB *) NtCurrentTeb ();  /* works on 32 and 64 bit */
  VirtualQuery (stack_bottom, &m, sizeof m);

It would be nice to find out why this happens to your emacs...


Thanks,
Corinna


[-- Attachment #1.2: sigalt.c --]
[-- Type: text/plain, Size: 2815 bytes --]

#include <alloca.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <windows.h>

int stack_direction;
char *stack_bottom;

sigjmp_buf return_to_command_loop;

NT_TIB *tib;
MEMORY_BASIC_INFORMATION m;

/* Attempt to recover from SIGSEGV caused by C stack overflow.  */
static void
handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
{
  struct rlimit rlim;

  int fp = open ("sigalt.out", O_CREAT | O_TRUNC | O_WRONLY, 0644);
  if (fp < 0)
    perror ("open");
  else
    {
      write (fp, "ping\n", 5);
      close (fp);
    }
  if (!getrlimit (RLIMIT_STACK, &rlim))
    {
      enum { STACK_DANGER_ZONE = 32 * 1024 };
      char *beg, *end, *addr;

      beg = stack_bottom;
      end = stack_bottom + stack_direction * rlim.rlim_cur;
      if (beg > end)
	addr = beg, beg = end, end = addr;
      addr = (char *) siginfo->si_addr;
      /* If we're somewhere on stack and too close to
	 one of its boundaries, most likely this is it.  */
      if (beg < addr && addr < end
	  && (addr - beg < STACK_DANGER_ZONE
	      || end - addr < STACK_DANGER_ZONE))
	siglongjmp (return_to_command_loop, 1);
    }
  /* Otherwise we can't do anything with this.  */
  //abort ();
}

static int
init_sigsegv (void)
{
  struct sigaction sa;
  stack_t ss;

  stack_direction = ((char *) &ss < stack_bottom) ? -1 : 1;
  ss.ss_sp = malloc (SIGSTKSZ);
  ss.ss_size = SIGSTKSZ;
  ss.ss_flags = 0;
  if (sigaltstack (&ss, NULL) < 0)
    return 0;
  sigfillset (&sa.sa_mask);
  sa.sa_sigaction = handle_sigsegv;
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
  return sigaction (SIGSEGV, &sa, NULL) < 0 ? 0 : 1;
}

void foo ()
{
  int buf[512];
  foo ();
}

int
main ()
{
  int status;
  char stack_bottom_variable;
  /* Record (approximately) where the stack begins.  */
  stack_bottom = &stack_bottom_variable;

  tib = (NT_TIB *) NtCurrentTeb ();  /* portable to 32 and 64 bit */
  VirtualQuery (stack_bottom, &m, sizeof m);

  init_sigsegv ();
  if (!sigsetjmp (return_to_command_loop, 1))
    {
      printf ("command loop 1 before crash\n");
      foo ();
    }
  else
    {
      printf ("command loop 1 after crash\n");
      switch (fork ())
      	{
	case -1:
	  perror ("fork");
	  break;
	case 0:
	  printf ("In child\n");
	  exit (0);
	default:
	  wait (&status);
	  printf ("In parent\n");
	  break;
	}
    }
  if (!sigsetjmp (return_to_command_loop, 1))
    {
      printf ("command loop 2 before crash\n");
      foo ();
    }
  else
    {
      printf ("command loop 2 after crash\n");
      switch (fork ())
      	{
	case -1:
	  perror ("fork");
	  break;
	case 0:
	  printf ("In child\n");
	  exit (0);
	default:
	  wait (&status);
	  printf ("In parent\n");
	  break;
	}
    }
  return 0;
}


* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 10:02   ` Corinna Vinschen
@ 2015-07-06 13:15     ` Ken Brown
  2015-07-06 13:33       ` Ken Brown
  2015-07-06 14:45       ` Corinna Vinschen
  0 siblings, 2 replies; 14+ messages in thread
From: Ken Brown @ 2015-07-06 13:15 UTC
  To: cygwin

Hi Corinna,

On 7/6/2015 6:01 AM, Corinna Vinschen wrote:
> [...]
>> Note that addr < beg, so we never reach the siglongjmp.
>
> I have no explanation for this.  What OS?  What does rlim_cur contain?
> What does peflags -x print for this executable?

I'm on W7 64-bit.  The problem seems to be that rlim_cur is too big.

$ peflags -x ./emacs
./emacs: stack reserve size      : 8388608 (0x800000) bytes

(gdb) p beg
$3 = 0x82ca27 ""
(gdb) p/x rlim.rlim_cur
$2 = 0x850e80

So end wraps around when it is computed, because rlim_cur is larger than beg:

(gdb) p end
$4 = 0xfffffffffffdbba7 <error: Cannot access memory at address 0xfffffffffffdbba7>

This doesn't happen when I run your testcase with the same 8MB stack size:

$ peflags -x0x800000 ./sigalt.exe
./sigalt.exe: stack reserve size      : 8388608 (0x800000) bytes

(gdb) p beg
$1 = 0x82cabb ""
(gdb) p/x rlim.rlim_cur
$2 = 0x7fd000
(gdb) p end
$3 = 0x2fabb
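The numbers above can be replayed outside emacs.  With stack_direction
== -1, end is beg - rlim_cur; because rlim_cur (0x850e80) exceeds beg
(0x82ca27), the subtraction wraps to the top of the address space, beg
stays below end so no swap happens, and addr (0x33ff8) fails the
beg < addr test.  The sketch below is a hypothetical transcription of
the emacs check (lines 1656-1666 of sysdep.c in the gdb listing above)
into uint64_t arithmetic, so the wraparound is well defined; it is not
code from emacs or Cygwin, and stack_end and would_longjmp are made-up
names.

```c
#include <stdint.h>

/* end = stack_bottom + stack_direction * rlim_cur, computed on a 64-bit
   address space with unsigned (wrapping) arithmetic. */
uint64_t
stack_end (uint64_t bottom, int direction, uint64_t rlim_cur)
{
  return direction < 0 ? bottom - rlim_cur : bottom + rlim_cur;
}

/* Same range test as the emacs handle_sigsegv.  Returns 1 if it
   would siglongjmp, 0 otherwise. */
int
would_longjmp (uint64_t bottom, int direction, uint64_t rlim_cur,
               uint64_t addr, uint64_t danger_zone)
{
  uint64_t beg = bottom;
  uint64_t end = stack_end (bottom, direction, rlim_cur);

  if (beg > end)
    {
      uint64_t tmp = beg;
      beg = end;
      end = tmp;
    }
  return beg < addr && addr < end
         && (addr - beg < danger_zone || end - addr < danger_zone);
}
```

stack_end (0x82ca27, -1, 0x850e80) reproduces the 0xfffffffffffdbba7
seen in gdb, and would_longjmp returns 0 for addr 0x33ff8.  With an
rlim_cur like the sigalt run's 0x7fd000, the same check returns 1 for
that addr, which is consistent with rlim_cur being too big.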

> [...]
>
> You can utilize tib and m for testing in emacs as well.  Just do this:
>
>    #include <windows.h>
>
>    NT_TIB *tib;
>    MEMORY_BASIC_INFORMATION m;
>
>    [...]
>
>    in main:
>
>    /* Record (approximately) where the stack begins.  */
>    stack_bottom = &stack_bottom_variable;
>    tib = (NT_TIB *) __readfsdword(PcTeb);
>    VirtualQuery (stack_bottom, &m, sizeof m);

I'll try this next and report back.

Ken


* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 13:15     ` Ken Brown
@ 2015-07-06 13:33       ` Ken Brown
  2015-07-06 14:52         ` Corinna Vinschen
  2015-07-06 14:45       ` Corinna Vinschen
  1 sibling, 1 reply; 14+ messages in thread
From: Ken Brown @ 2015-07-06 13:33 UTC
  To: cygwin

On 7/6/2015 9:15 AM, Ken Brown wrote:
> Hi Corinna,
>
> On 7/6/2015 6:01 AM, Corinna Vinschen wrote:
>> [...]
>>> Note that addr < beg, so we never reach the siglongjmp.
>>
>> I have no explanation for this.  What OS?  What does rlim_cur contain?
>> What does peflags -x print for this executable?
>
> I'm on W7 64-bit.  The problem seems to be that rlim_cur is too big.
>
> $ peflags -x ./emacs
> ./emacs: stack reserve size      : 8388608 (0x800000) bytes
>
> (gdb) p beg
> $3 = 0x82ca27 ""
> (gdb) p/x rlim.rlim_cur
> $2 = 0x850e80
>
> So there's overflow when end is computed:
>
> (gdb) p end
> $4 = 0xfffffffffffdbba7 <error: Cannot access memory at address 0xfffffffffffdbba7>
>
> This doesn't happen when I run your testcase with the same 8MB stack size:
>
> $ peflags -x0x800000 ./sigalt.exe
> ./sigalt.exe: stack reserve size      : 8388608 (0x800000) bytes
>
> (gdb) p beg
> $1 = 0x82cabb ""
> (gdb) p/x rlim.rlim_cur
> $2 = 0x7fd000
> (gdb) p end
> $3 = 0x2fabb
>
>> And last but not least, what is emacs doing there?  The stack should be
>> pretty much in a good shape when it's back to the main loop.  The stack
>> is fully committed and has the default number of guardpages at the bottom,
>> as it is just short of the stack overflow.
>>
>> For debugging purposes I also added a global variable called "tib" and a
>> memory info struct called "m" to the testcase which are initialized
>> right at the start of main.  tib points to the start of the TEB (Thread
>> Environment Block, a Windows per-thread bookkeeping structure) of the
>> main thread.  If you expand it right after it's fetched, you get
>> something along these lines:
>>
>>    (gdb) p *tib
>>    $2 = {ExceptionList = 0x22cd78, StackBase = 0x230000, StackLimit = 0x20c000,
>>      SubSystemTib = 0x0, {FiberData = 0x1e00, Version = 7680},
>>      ArbitraryUserPointer = 0x0, Self = 0x7ffdf000}
>>
>> Note the values of StackBase and StackLimit and compare with your beg and
>> end values.  StackBase is the upper limit of the stack.  It grows downward
>> from there.  StackLimit is the lowest address as yet committed.  It's not much
>> yet as you can see, 0x230000-0x20c000 == 0x24000 == 144K.  Since Cygwin
>> executables have a default stack of 2 Megs, the allocation base of the stack
>> is probably at 0x30000.  This can be checked by looking at m:
>>
>>    (gdb) p m
>>    $1 = {BaseAddress = 0x22c000, AllocationBase = 0x30000, AllocationProtect = 4,
>>      RegionSize = 16384, State = 4096, Protect = 4, Type = 131072}
>>
>> See the value of AllocationBase.
>>
>> When you hit the breakpoint in handle_sigsegv, the output of tib should
>> look like this:
>>
>>    (gdb) p *tib
>>    $2 = {ExceptionList = 0x22cd78, StackBase = 0x230000, StackLimit = 0x32000,
>>      SubSystemTib = 0x0, {FiberData = 0x1e00, Version = 7680},
>>      ArbitraryUserPointer = 0x0, Self = 0x7ffdf000}
>>
>> Observe the value of StackLimit.  For this output I ran the testcase on
>> W7 32 bit.  It has a default guardpage of 4K.  The new wrapper I wrote
>> in Cygwin restored the stack to its state right before the stack overflow
>> occurred:
>>
>>    - At 0x30000 we have the 4K dead zone, which is always only reserved,
>>      never committed.
>>
>>    - At 0x31000 the 4K guard page starts.
>>
>>    - Thus the StackLimit (the start of the committed region of the stack)
>>      starts at 0x32000.
>>
>> You can utilize tib and m for testing in emacs as well.  Just do this:
>>
>>    #include <windows.h>
>>
>>    NT_TIB *tib;
>>    MEMORY_BASIC_INFORMATION m;
>>
>>    [...]
>>
>>    in main:
>>
>>    /* Record (approximately) where the stack begins.  */
>>    stack_bottom = &stack_bottom_variable;
>>    tib = (NT_TIB *) __readfsdword(PcTeb);
>>    VirtualQuery (stack_bottom, &m, sizeof m);
>
> I'll try this next and report back.

PcTeb seems to be defined only on x86.  What should I do on x86_64?

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 13:15     ` Ken Brown
  2015-07-06 13:33       ` Ken Brown
@ 2015-07-06 14:45       ` Corinna Vinschen
  2015-07-06 15:55         ` Ken Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-06 14:45 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2578 bytes --]

Hi Ken,

On Jul  6 09:15, Ken Brown wrote:
> Hi Corinna,
> 
> On 7/6/2015 6:01 AM, Corinna Vinschen wrote:
> >On Jul  5 22:15, Ken Brown wrote:
> >
> >I have no explanation for this.  What OS?  What does rlim_cur contain?
> >What does peflags -x print for this executable?
> 
> I'm on W7 64-bit.  The problem seems to be that rlim_cur is too big.
> 
> $ peflags -x ./emacs
> ./emacs: stack reserve size      : 8388608 (0x800000) bytes
> 
> (gdb) p beg
> $3 = 0x82ca27 ""
> (gdb) p/x rlim.rlim_cur
> $2 = 0x850e80

Does emacs call setrlimit by any chance?  Otherwise, rlim_cur should be
set to

  0x800000 - 0x1000 (4K dead zone) - 0x2000 (8K guard page on W7 64)
  == 0x7fd000.
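
The default is plain arithmetic: the stack reserve size from the PE
header, minus the 4K dead zone, minus the OS guard pages.  A minimal
sketch reproducing the numbers from this thread; the function and macro
names are hypothetical, the guard page counts are the ones reported here
(1 page on W7 32 bit, 2 on W8.1 32 bit under WOW64 and on W7 64 bit, 3 on
W8.1 64 bit), and this is not Cygwin's actual code:

```c
#include <stdint.h>

#define STK_PAGE      0x1000u   /* 4K page */
#define STK_DEAD_ZONE STK_PAGE  /* always reserved, never committed */

/* Hypothetical helper: default RLIMIT_STACK soft limit derived from the
   PE stack reserve size and the number of 4K guard pages the OS uses.  */
uint64_t
default_stack_rlim_cur (uint64_t reserve, unsigned guard_pages)
{
  return reserve - STK_DEAD_ZONE - (uint64_t) guard_pages * STK_PAGE;
}
```

For instance, default_stack_rlim_cur (0x800000, 2) yields 0x7fd000, and
default_stack_rlim_cur (0x200000, 3) yields the 0x1fc000 seen on W8.1 64 bit.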

> So there's overflow when end is computed:
> 
> (gdb) p end
> $4 = 0xfffffffffffdbba7 <error: Cannot access memory at address 0xfffffffffffdbba7>
> 
> This doesn't happen when I run your testcase with the same 8MB stack size:
> 
> $ peflags -x0x800000 ./sigalt.exe
> ./sigalt.exe: stack reserve size      : 8388608 (0x800000) bytes
> 
> (gdb) p beg
> $1 = 0x82cabb ""
> (gdb) p/x rlim.rlim_cur
> $2 = 0x7fd000

... like this.
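
The bogus end value in the quoted gdb output is unsigned wraparound: the
stack grows downward, so end is beg minus the limit, and 0x850e80 is
larger than beg (0x82ca27).  A sketch of that computation with an
explicit check; the helper name is hypothetical and this is not the
emacs code:

```c
#include <stdint.h>

/* Hypothetical helper: lowest valid address of a downward-growing stack
   that starts at beg and has `limit' bytes available.  Returns 0 and
   leaves *end untouched if beg - limit would wrap around.  */
int
stack_end_checked (uint64_t beg, uint64_t limit, uint64_t *end)
{
  if (limit > beg)
    return 0;           /* the subtraction would wrap below address 0 */
  *end = beg - limit;
  return 1;
}
```

Unchecked, 0x82ca27 - 0x850e80 is 0xfffffffffffdbba7 in 64 bit unsigned
arithmetic, exactly what gdb printed, while the sigalt values give
0x82cabb - 0x7fd000 == 0x2fabb.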

Btw., *if* emacs calls setrlimit and then expects getrlimit to return
the *actual* size of the stack, rather than expecting that rlim_cur is
just a default value when setting up stacks, it's really doing something
borderline.

There's simply *no* guarantee that a stack can be extended to this size.
Any mmap() call could disallow growing the stack beyond its initial
size.  Worse, on Linux you can even mmap so that the stack doesn't
grow to the supposed initial maximum size at all.  The reason is that
Linux doesn't know the concept of "reserved" virtual memory, but the
stack is initially not committed in full either.

If you want to know how big your current stack *actually* is, you can
utilize pthread_getattr_np on Linux and Cygwin, like this:

#include <pthread.h>

  static void
  handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
  {
    pthread_attr_t attr;
    size_t stacksize;

    if (!pthread_getattr_np (pthread_self (), &attr)
	&& !pthread_attr_getstacksize (&attr, &stacksize))
      {
	beg = stack_bottom;
	end = stack_bottom + stack_direction * stacksize;

	[...]

Unfortunately this is non-portable as well, as the trailing _np denotes,
but at least there *is* a reliable method on Linux and Cygwin...
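
A self-contained sketch of the same query, assuming glibc or Cygwin with
_GNU_SOURCE defined.  It uses pthread_attr_getstack, which returns the
lowest address and the size together, and releases the attribute object
with pthread_attr_destroy, as the Linux pthread_getattr_np man page asks
callers to do.  The helper name is hypothetical.  Note that
pthread_getattr_np is not documented as async-signal-safe, so calling it
from a signal handler relies on the implementation.

```c
#define _GNU_SOURCE             /* pthread_getattr_np is a GNU extension */
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: report the lowest address and the size of the
   calling thread's stack.  Returns 0 or an error number.  */
int
query_current_stack (void **lowest, size_t *size)
{
  pthread_attr_t attr;
  int err = pthread_getattr_np (pthread_self (), &attr);

  if (err)
    return err;
  /* Lowest address plus size; no stack_direction arithmetic needed.  */
  err = pthread_attr_getstack (&attr, lowest, size);
  /* The attribute object may hold allocated resources.  */
  pthread_attr_destroy (&attr);
  return err;
}
```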


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 13:33       ` Ken Brown
@ 2015-07-06 14:52         ` Corinna Vinschen
  0 siblings, 0 replies; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-06 14:52 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

On Jul  6 09:32, Ken Brown wrote:
> On 7/6/2015 9:15 AM, Ken Brown wrote:
> >On 7/6/2015 6:01 AM, Corinna Vinschen wrote:
> >>You can utilize tib and m for testing in emacs as well.  Just do this:
> >>
> >>   #include <windows.h>
> >>
> >>   NT_TIB *tib;
> >>   MEMORY_BASIC_INFORMATION m;
> >>
> >>   [...]
> >>
> >>   in main:
> >>
> >>   /* Record (approximately) where the stack begins.  */
> >>   stack_bottom = &stack_bottom_variable;
> >>   tib = (NT_TIB *) __readfsdword(PcTeb);
> >>   VirtualQuery (stack_bottom, &m, sizeof m);
> >
> >I'll try this next and report back.
> 
> PcTeb seems to be defined only on x86.  What should I do on x86_64?

Oh, sorry, I forgot.  In theory you should call

  tib = (NT_TIB *) NtCurrentTeb ();

but there's a problem in the way this inline function is defined
which makes it unusable when not optimizing the code, at least
on 32 bit.  The above NtCurrentTeb works fine on x86_64, afaics.

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 14:45       ` Corinna Vinschen
@ 2015-07-06 15:55         ` Ken Brown
  2015-07-06 16:34           ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Ken Brown @ 2015-07-06 15:55 UTC
  To: cygwin

On 7/6/2015 10:45 AM, Corinna Vinschen wrote:
> Does emacs call setrlimit by any chance?

Yes, that's the problem.  The initialization code contains essentially the 
following:

if (!getrlimit (RLIMIT_STACK, &rlim))
     {
       long newlim;
       /* Approximate the amount regex.c needs per unit of re_max_failures.  */
       int ratio = 20 * sizeof (char *);
       /* Then add 33% to cover the size of the smaller stacks that regex.c
	 successively allocates and discards, on its way to the maximum.  */
       ratio += ratio / 3;
       /* Add in some extra to cover
	 what we're likely to use for other reasons.  */
       newlim = re_max_failures * ratio + 200000;
       if (newlim > rlim.rlim_max)
	{
	  newlim = rlim.rlim_max;
	  /* Don't let regex.c overflow the stack we have.  */
	  re_max_failures = (newlim - 200000) / ratio;
	}
       if (rlim.rlim_cur < newlim)
	rlim.rlim_cur = newlim;

       setrlimit (RLIMIT_STACK, &rlim);
     }
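
Plugging numbers into that computation explains the rlim_cur of 0x850e80
seen in handle_sigsegv.  Assuming a 64 bit build (sizeof (char *) == 8)
and re_max_failures == 40000, an assumption chosen because it reproduces
the observed value exactly, the limit emacs sets exceeds the 0x7fd000
bytes actually usable with an 8 MB reserve on W7 64 bit.  The function
below is a hypothetical restatement of the quoted code, with the pointer
size passed in so the 64 bit case can be checked on any host:

```c
/* Restates the quoted emacs computation; pointer_size stands in for
   sizeof (char *).  */
long
emacs_stack_newlim (long re_max_failures, int pointer_size)
{
  int ratio = 20 * pointer_size;   /* regex.c bytes per failure point */
  ratio += ratio / 3;              /* +33% for the smaller interim stacks */
  return re_max_failures * ratio + 200000;   /* slack for other uses */
}
```

With 8 byte pointers, ratio is 160 + 53 == 213, so
40000 * 213 + 200000 == 8720000 == 0x850e80.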

If I disable that code, the problem goes away: rlim_cur is set to the expected 
0x7fd000 in handle_sigsegv, and emacs recovers from the stack overflow.

I think I probably should disable that code on Cygwin anyway, because there's 
simply no need for it.  Some time ago I discovered that the default 2MB stack 
size was not big enough for emacs on Cygwin, and I made emacs use 8MB instead. 
So there's no need to enlarge it further.

> Btw., *if* emacs calls setrlimit and then expects getrlimit to return
> the *actual* size of the stack, rather than expecting that rlim_cur is
> just a default value when setting up stacks, it's really doing something
> borderline.
>
> There's simply *no* guarantee that a stack can be extended to this size.
> Any mmap() call could disallow growing the stack beyond its initial
> size.  Worse, on Linux you can even mmap so that the stack doesn't
> grow to the supposed initial maximum size at all.  The reason is that
> Linux doesn't know the concept of "reserved" virtual memory, but the
> stack is initially not committed in full either.
>
> If you want to know how big your current stack *actually* is, you can
> utilize pthread_getattr_np on Linux and Cygwin, like this:
>
> #include <pthread.h>
>
>    static void
>    handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
>    {
>      pthread_attr_t attr;
>      size_t stacksize;
>
>      if (!pthread_getattr_np (pthread_self (), &attr)
> 	&& !pthread_attr_getstacksize (&attr, &stacksize))
>        {
> 	beg = stack_bottom;
> 	end = stack_bottom + stack_direction * stacksize;
>
> 	[...]
>
> Unfortunately this is non-portable as well, as the trailing _np denotes,
> but at least there *is* a reliable method on Linux and Cygwin...

Thanks.  That fixes the problem too, even with the call to setrlimit left in. 
I'll report this to the emacs developers.

Ken

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 15:55         ` Ken Brown
@ 2015-07-06 16:34           ` Corinna Vinschen
  2015-07-07 15:49             ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-06 16:34 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 3417 bytes --]

On Jul  6 11:54, Ken Brown wrote:
> On 7/6/2015 10:45 AM, Corinna Vinschen wrote:
> >Does emacs call setrlimit by any chance?
> 
> Yes, that's the problem.  The initialization code contains essentially the
> following:
> 
> if (!getrlimit (RLIMIT_STACK, &rlim))
>     {
>       long newlim;
>       /* Approximate the amount regex.c needs per unit of re_max_failures.  */
>       int ratio = 20 * sizeof (char *);
>       /* Then add 33% to cover the size of the smaller stacks that regex.c
> 	 successively allocates and discards, on its way to the maximum.  */
>       ratio += ratio / 3;
>       /* Add in some extra to cover
> 	 what we're likely to use for other reasons.  */
>       newlim = re_max_failures * ratio + 200000;
>       if (newlim > rlim.rlim_max)
> 	{
> 	  newlim = rlim.rlim_max;
> 	  /* Don't let regex.c overflow the stack we have.  */
> 	  re_max_failures = (newlim - 200000) / ratio;
> 	}
>       if (rlim.rlim_cur < newlim)
> 	rlim.rlim_cur = newlim;
> 
>       setrlimit (RLIMIT_STACK, &rlim);
>     }

Ok.

> If I disable that code, the problem goes away: rlim_cur is set to the
> expected 0x7fd000 in handle_sigsegv, and emacs recovers from the stack
> overflow.

:)))

> I think I probably should disable that code on Cygwin anyway, because
> there's simply no need for it.  Some time ago I discovered that the default
> 2MB stack size was not big enough for emacs on Cygwin, and I made emacs use
> 8MB instead. So there's no need to enlarge it further.

Yes, that probably makes sense.  The computed expression above has
another problem on Windows:  The stacksize is always a multiple of 64K
due to the dreaded allocation granularity.
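
To illustrate the granularity issue, rounding a byte count up to the
next multiple of 64K looks like this.  The 64K value is assumed here; on
Windows the real value is the dwAllocationGranularity member filled in
by GetSystemInfo.  The names are hypothetical:

```c
#include <stdint.h>

#define ALLOC_GRANULARITY 0x10000u   /* 64K, assumed */

/* Round size up to the next multiple of ALLOC_GRANULARITY, which must
   be a power of two for the mask trick to work.  */
uint64_t
round_to_granularity (uint64_t size)
{
  uint64_t mask = (uint64_t) ALLOC_GRANULARITY - 1;
  return (size + mask) & ~mask;
}
```

For instance, the 0x850e80 emacs computes would become 0x860000, while
an 8 MB value such as 0x800000 is already aligned.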

> >Btw., *if* emacs calls setrlimit and then expects getrlimit to return
> >the *actual* size of the stack, rather than expecting that rlim_cur is
> >just a default value when setting up stacks, it's really doing something
> >borderline.
> >
> >There's simply *no* guarantee that a stack can be extended to this size.
> >Any mmap() call could disallow growing the stack beyond its initial
> >size.  Worse, on Linux you can even mmap so that the stack doesn't
> >grow to the supposed initial maximum size at all.  The reason is that
> >Linux doesn't know the concept of "reserved" virtual memory, but the
> >stack is initially not committed in full either.
> >
> >If you want to know how big your current stack *actually* is, you can
> >utilize pthread_getattr_np on Linux and Cygwin, like this:
> >
> >#include <pthread.h>
> >
> >   static void
> >   handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
> >   {
> >     pthread_attr_t attr;
> >     size_t stacksize;
> >
> >     if (!pthread_getattr_np (pthread_self (), &attr)
> >	&& !pthread_attr_getstacksize (&attr, &stacksize))
> >       {
> >	beg = stack_bottom;
> >	end = stack_bottom + stack_direction * stacksize;
> >
> >	[...]
> >
> >Unfortunately this is non-portable as well, as the trailing _np denotes,
> >but at least there *is* a reliable method on Linux and Cygwin...
> 
> Thanks.  That fixes the problem too, even with the call to setrlimit left
> in. I'll report this to the emacs developers.

Excellent, thanks for testing this!


Corinna

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-06 16:34           ` Corinna Vinschen
@ 2015-07-07 15:49             ` Corinna Vinschen
  2015-07-07 18:05               ` Ken Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-07 15:49 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 3171 bytes --]

On Jul  6 18:34, Corinna Vinschen wrote:
> On Jul  6 11:54, Ken Brown wrote:
> > On 7/6/2015 10:45 AM, Corinna Vinschen wrote:
> > >If you want to know how big your current stack *actually* is, you can
> > >utilize pthread_getattr_np on Linux and Cygwin, like this:
> > >
> > >#include <pthread.h>
> > >
> > >   static void
> > >   handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
> > >   {
> > >     pthread_attr_t attr;
> > >     size_t stacksize;
> > >
> > >     if (!pthread_getattr_np (pthread_self (), &attr)
> > >	&& !pthread_attr_getstacksize (&attr, &stacksize))
> > >       {
> > >	beg = stack_bottom;
> > >	end = stack_bottom + stack_direction * stacksize;
> > >
> > >	[...]
> > >
> > >Unfortunately this is non-portable as well, as the trailing _np denotes,
> > >but at least there *is* a reliable method on Linux and Cygwin...
> > 
> > Thanks.  That fixes the problem too, even with the call to setrlimit left
> > in. I'll report this to the emacs developers.
> 
> Excellent, thanks for testing this!

Uh oh.  We have a problem there.  This only worked accidentally, at least
on x86_64.  What happens is that pthread_getattr_np checks the validity
of the "attr" parameter and while doing so it may (validly) raise a SEGV.

Usually this SEGV is caught by a special SEH handler in Cygwin, which
is used to implement __try/__except blocks in Cygwin.  The validity
check returns the matching information "object uninitialized" to the
caller.

Not so here.  Since we're still in exception handling while running the
signal handler, another nested SEGV makes the OS kill the process without
calling any SEH exception handler on the way.

The problem is, there doesn't seem to be an elegant way around that on
x86_64.  From the application perspective you can just initialize the
pthread_attr_t to 0, as in

  pthread_attr_t attr = { 0 };

but that's ... unusual.  It's so unusual that nobody will ever think of
it.  The other way to "fix" this in the application itself is to call
pthread_getattr_np in the main() function, which works because we're not
running in the context of the exception handler.
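
The second workaround can be sketched as follows, assuming glibc or
Cygwin with _GNU_SOURCE defined: query the stack bounds once from main(),
before any signal can arrive, and let the SIGSEGV handler only read the
cached values.  The globals and helper names are hypothetical, not part
of emacs or Cygwin:

```c
#define _GNU_SOURCE             /* pthread_getattr_np is a GNU extension */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache of the main thread's stack bounds.  */
static uintptr_t main_stack_lo;   /* lowest address */
static uintptr_t main_stack_hi;   /* one past the highest address */

/* Call once from main(), outside any exception or signal context.  */
int
record_main_stack_bounds (void)
{
  pthread_attr_t attr;
  void *lo;
  size_t size;
  int err = pthread_getattr_np (pthread_self (), &attr);

  if (err)
    return err;
  err = pthread_attr_getstack (&attr, &lo, &size);
  pthread_attr_destroy (&attr);
  if (err)
    return err;
  main_stack_lo = (uintptr_t) lo;
  main_stack_hi = main_stack_lo + size;
  return 0;
}

/* Signal-handler side: only reads the cached bounds.  */
int
on_main_stack (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  return a >= main_stack_lo && a < main_stack_hi;
}
```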

The only solution inside Cygwin I found so far is this:

  Every myfault setup will have to capture the current thread context
  and set up a vectored continuation handler.  This handler will be
  called if no other exception handler feels responsible for an
  exception.  Fortunately it's called even while another exception is
  still handled.  The vectored handler then restores the thread context,
  just with tweaked instruction pointer.

What bugs me with this solution is not only that it looks rather
hackish, but also that it comes with a performance hit.  The fact
that every __try/__except block has to call RtlCaptureContext is
not exactly free of charge...

As you might have noticed, this has nothing to do with the alternate
stack.  It's just YA problem which cropped up during this test phase.


Oh well,
Corinna

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-07 15:49             ` Corinna Vinschen
@ 2015-07-07 18:05               ` Ken Brown
  2015-07-07 18:49                 ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Ken Brown @ 2015-07-07 18:05 UTC
  To: cygwin

On 7/7/2015 11:49 AM, Corinna Vinschen wrote:
> On Jul  6 18:34, Corinna Vinschen wrote:
>> On Jul  6 11:54, Ken Brown wrote:
>>> On 7/6/2015 10:45 AM, Corinna Vinschen wrote:
>>>> If you want to know how big your current stack *actually* is, you can
>>>> utilize pthread_getattr_np on Linux and Cygwin, like this:
>>>>
>>>> #include <pthread.h>
>>>>
>>>>    static void
>>>>    handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
>>>>    {
>>>>      pthread_attr_t attr;
>>>>      size_t stacksize;
>>>>
>>>>      if (!pthread_getattr_np (pthread_self (), &attr)
>>>> 	&& !pthread_attr_getstacksize (&attr, &stacksize))
>>>>        {
>>>> 	beg = stack_bottom;
>>>> 	end = stack_bottom + stack_direction * stacksize;
>>>>
>>>> 	[...]
>>>>
>>>> Unfortunately this is non-portable as well, as the trailing _np denotes,
>>>> but at least there *is* a reliable method on Linux and Cygwin...
>>>
>>> Thanks.  That fixes the problem too, even with the call to setrlimit left
>>> in. I'll report this to the emacs developers.
>>
>> Excellent, thanks for testing this!
>
> Uh oh.  We have a problem there.  This only worked accidentally, at least
> on x86_64.  What happens is that pthread_getattr_np checks the validity
> of the "attr" parameter and while doing so it may (validly) raise a SEGV.

Yes, I discovered that too.  I was just about to send off an emacs bug 
report and patch, but then I decided to test it once more and got the SEGV.

> Usually this SEGV is caught by a special SEH handler in Cygwin, which
> is used to implement __try/__except blocks in Cygwin.  The validity
> check returns the matching information "object uninitialized" to the
> caller.
>
> Not so here.  Since we're still in exception handling while running the
> signal handler, another nested SEGV makes the OS kill the process without
> calling any SEH exception handler on the way.
>
> The problem is, there doesn't seem to be an elegant way around that on
> x86_64.  From the application perspective you can just initialize the
> pthread_attr_t to 0, as in
>
>    pthread_attr_t attr = { 0 };
>
> but that's ... unusual.  It's so unusual that nobody will ever think of
> it.  The other way to "fix" this in the application itself is to call
> pthread_getattr_np in the main() function, which works because we're not
> running in the context of the exception handler.
>
> The only solution inside Cygwin I found so far is this:
>
>    Every myfault setup will have to capture the current thread context
>    and set up a vectored continuation handler.  This handler will be
>    called if no other exception handler feels responsible for an
>    exception.  Fortunately it's called even while another exception is
>    still handled.  The vectored handler then restores the thread context,
>    just with tweaked instruction pointer.
>
> What bugs me with this solution is not only that it looks rather
> hackish, but also that it comes with a performance hit.  The fact
> that every __try/__except block has to call RtlCaptureContext is
> not exactly free of charge...
>
> As you might have noticed, this has nothing to do with the alternate
> stack.  It's just YA problem which cropped up during this test phase.

Yep.  But the good news is that the alternate stack is working.

Ken

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-07 18:05               ` Ken Brown
@ 2015-07-07 18:49                 ` Corinna Vinschen
  2015-07-07 19:37                   ` Ken Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-07 18:49 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2543 bytes --]

On Jul  7 14:05, Ken Brown wrote:
> On 7/7/2015 11:49 AM, Corinna Vinschen wrote:
> >On Jul  6 18:34, Corinna Vinschen wrote:
> >>On Jul  6 11:54, Ken Brown wrote:
> >>>On 7/6/2015 10:45 AM, Corinna Vinschen wrote:
> >>>>If you want to know how big your current stack *actually* is, you can
> >>>>utilize pthread_getattr_np on Linux and Cygwin, like this:
> >>>>
> >>>>#include <pthread.h>
> >>>>
> >>>>   static void
> >>>>   handle_sigsegv (int sig, siginfo_t *siginfo, void *arg)
> >>>>   {
> >>>>     pthread_attr_t attr;
> >>>>     size_t stacksize;
> >>>>
> >>>>     if (!pthread_getattr_np (pthread_self (), &attr)
> >>>>	&& !pthread_attr_getstacksize (&attr, &stacksize))
> >>>>       {
> >>>>	beg = stack_bottom;
> >>>>	end = stack_bottom + stack_direction * stacksize;
> >>>>
> >>>>	[...]
> >>>>
> >>>>Unfortunately this is non-portable as well, as the trailing _np denotes,
> >>>>but at least there *is* a reliable method on Linux and Cygwin...
> >>>
> >>>Thanks.  That fixes the problem too, even with the call to setrlimit left
> >>>in. I'll report this to the emacs developers.
> >>
> >>Excellent, thanks for testing this!
> >
> >Uh oh.  We have a problem there.  This only worked accidentally, at least
> >on x86_64.  What happens is that pthread_getattr_np checks the validity
> >of the "attr" parameter and while doing so it may (validly) raise a SEGV.
> 
> Yes, I discovered that too.  I was just about to send off an emacs bug
> report and patch, but then I decided to test it once more and got the SEGV.
> [...]
> >As you might have noticed, this has nothing to do with the alternate
> >stack.  It's just YA problem which cropped up during this test phase.
> 
> Yep.  But the good news is that the alternate stack is working.

I spoke too soon.  This *is* a result of the alternate stack handling.
When the exception occurs while running on the alternate stack, the OS
exception handler checks if the stack pointer is valid, and since it's
not in the stack area as stored in the TEB, it treats the stack as
corrupted.  That's why it stops calling the SEHs.

In the meantime I found a workaround for this problem with only a very
marginal performance hit.  I applied it to the repo and I'm just in the
process of creating new snapshots.  If the snapshots work for you I'll
create another test release.


Thanks,
Corinna

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-07 18:49                 ` Corinna Vinschen
@ 2015-07-07 19:37                   ` Ken Brown
  2015-07-08  8:16                     ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Ken Brown @ 2015-07-07 19:37 UTC
  To: cygwin

On 7/7/2015 2:49 PM, Corinna Vinschen wrote:
> I spoke too soon.  This *is* a result of the alternate stack handling.
> When the exception occurs while running on the alternate stack, the OS
> exception handler checks if the stack pointer is valid, and since it's
> not in the stack area as stored in the TEB, it treats the stack as
> corrupted.  That's why it stops calling the SEHs.
>
> In the meantime I found a workaround for this problem with only a very
> marginal performance hit.  I applied it to the repo and I'm just in the
> process of creating new snapshots.  If the snapshots work for you I'll
> create another test release.

They work for me.  I guess I can go ahead and file that emacs bug 
report.  Thanks.

Ken


* Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4
  2015-07-07 19:37                   ` Ken Brown
@ 2015-07-08  8:16                     ` Corinna Vinschen
  0 siblings, 0 replies; 14+ messages in thread
From: Corinna Vinschen @ 2015-07-08  8:16 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1342 bytes --]

On Jul  7 15:37, Ken Brown wrote:
> On 7/7/2015 2:49 PM, Corinna Vinschen wrote:
> >I spoke too soon.  This *is* a result of the alternate stack handling.
> >When the exception occurs while running on the alternate stack, the OS
> >exception handler checks if the stack pointer is valid, and since it's
> >not in the stack area as stored in the TEB, it treats the stack as
> >corrupted.  That's why it stops calling the SEHs.
> >
> >In the meantime I found a workaround for this problem with only a very
> >marginal performance hit.  I applied it to the repo and I'm just in the
> >process of creating new snapshots.  If the snapshots work for you I'll
> >create another test release.
> 
> They work for me.  I guess I can go ahead and file that emacs bug report.
> Thanks.

Thank you.  Btw., if you want to be really sure that pthread_getattr_np
recovers from an uninitialized or invalid pthread_attr_t argument, just
initialize it like this:

  pthread_attr_t attr = { 42 };

This gives a compile time warning, but it makes sure that the internal
validity check SEGVs and the exception handling kicks in.

I'll release a new test version today.


Thanks,
Corinna

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

end of thread, other threads:[~2015-07-08  8:16 UTC | newest]

Thread overview: 14+ messages
2015-07-05 21:48 [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.4 Corinna Vinschen
2015-07-06  2:15 ` Ken Brown
2015-07-06 10:02   ` Corinna Vinschen
2015-07-06 13:15     ` Ken Brown
2015-07-06 13:33       ` Ken Brown
2015-07-06 14:52         ` Corinna Vinschen
2015-07-06 14:45       ` Corinna Vinschen
2015-07-06 15:55         ` Ken Brown
2015-07-06 16:34           ` Corinna Vinschen
2015-07-07 15:49             ` Corinna Vinschen
2015-07-07 18:05               ` Ken Brown
2015-07-07 18:49                 ` Corinna Vinschen
2015-07-07 19:37                   ` Ken Brown
2015-07-08  8:16                     ` Corinna Vinschen
