public inbox for cygwin@cygwin.com
* Debugging malloc crash in gdb
@ 2022-10-18 10:35 David Allsopp
  2022-10-18 19:08 ` Jon Turney
  0 siblings, 1 reply; 8+ messages in thread
From: David Allsopp @ 2022-10-18 10:35 UTC
  To: cygwin

I'm wondering if I may be able to have some pointers for debugging what
seems to be an unexpected interaction between mmap/mprotect/munmap and
malloc with the OCaml runtime.

At the moment, I know that we crash in malloc, so my main question is how to
go further in gdb. I installed the cygwin-debuginfo package, but all I'm
getting is:

 
/cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550:
internal-error: void resume_1(gdb_signal): Assertion
`pc_in_thread_step_range (pc, tp)' failed.

The reproduction case is below (it's the OCaml runtime, so it's not exactly
minimal, but it seems to be very repeatable to get gdb to the position of
the crash).

In terms of memory, what OCaml is doing:

- At startup, 256M of address space is reserved (with mmap) for garbage
collected minor heaps ("minor arena")
- The first 2M of this is "committed" with mprotect for use by the program's
main thread
- The program then instructs the runtime to double the size of the minor
arena
- The 2M portion is "decommitted" with mprotect
- The 256M mmap'd region is munmap'd
- A new 512M region of address space is reserved
- The first 4M of this is "committed" with mprotect for use by the program's
main thread
- The program performs some assertion checks
- Book-keeping at the end of this causes malloc to be called, which
segfaults.

The crashing call to malloc is the first call to malloc since the 256M ->
512M munmap/map dance.

If the call to caml_mem_unmap at the end of unreserve_minor_heaps in
runtime/domain.c is omitted, then this program succeeds - i.e. malloc does
not appear to crash if the 256M region is left mapped. Obviously, I realise
this may well be unrelated to what's going wrong.

Any assistance to debug this further hugely appreciated!

Thanks,


David


---
Full repro instructions

Cygwin packages required: gcc-core, make, flexdll

Build:
  git clone https://github.com/dra27/ocaml -b restore-cygwin-break --depth 1
  cd ocaml
  ./configure --disable-native-compiler --disable-debugger \
    --disable-ocamldoc && make -j
  runtime/ocamlrun.exe ./ocamlc.exe -nostdlib -I stdlib \
    testsuite/tests/regression/pr9326/gc_set.ml -o gc_set.byte.exe

Crash:
  runtime/ocamlrun.exe ./gc_set.byte.exe

Debug:
  OCAMLRUNPARAM=v=0x1FFF gdb runtime/ocamlrun.exe
  break caml_gc_get
  run ./gc_set.byte.exe
  continue
  break alloc_generic_table
  continue
  break caml_stat_alloc_noexc
  continue
  step
  step
  step
  *boom*



* Re: Debugging malloc crash in gdb
  2022-10-18 10:35 Debugging malloc crash in gdb David Allsopp
@ 2022-10-18 19:08 ` Jon Turney
  2022-10-19  6:20   ` Ariel Burbaickij
  2022-10-20  8:22   ` David Allsopp
  0 siblings, 2 replies; 8+ messages in thread
From: Jon Turney @ 2022-10-18 19:08 UTC
  To: David Allsopp, The Cygwin Mailing List

On 18/10/2022 11:35, David Allsopp wrote:
> I'm wondering if I may be able to have some pointers for debugging what
> seems to be an unexpected interaction between mmap/mprotect/munmap and
> malloc with the OCaml runtime.
> 
> At the moment, I know that we crash in malloc, so my main question is how to
> go further in gdb. I installed the cygwin-debuginfo package, but all I'm
> getting is:

Firstly, if the crash is inside the cygwin DLL, you must follow the 
advice in [1], and use 'set cygwin-exceptions on' to tell gdb to stop on 
an exception inside cygwin itself.

[1] https://cygwin.com/faq.html#faq.programming.debugging-cygwin


> /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550
> : internal-error: void resume_1(gdb_signal): Assertion
> `pc_in_thread_step_range (pc, tp)' failed.

This looks similar to the gdb crash reported [2], which I just don't 
have any time to look into.

[2] https://cygwin.com/pipermail/cygwin/2022-June/251714.html

I'd suggest reporting this as directed in 
https://www.sourceware.org/gdb/bugs/

(Note that self-service account creation is disabled on the sourceware 
bugzilla, due to spam problems, so you need to mail overseers as 
directed there, to request a Sourceware Bugzilla account.)

> The reproduction case is below (it's the OCaml runtime, so it's not exactly
> minimal, but it seems to be very repeatable to get gdb to the position of
> the crash).
> 
[...]
> 
> Any assistance to debug this further hugely appreciated!

It might be worth exploring if this gdb crash is seen in older versions 
of gdb, or with older gcc...



* Re: Debugging malloc crash in gdb
  2022-10-18 19:08 ` Jon Turney
@ 2022-10-19  6:20   ` Ariel Burbaickij
  2022-11-02 12:38     ` Jon Turney
  2022-10-20  8:22   ` David Allsopp
  1 sibling, 1 reply; 8+ messages in thread
From: Ariel Burbaickij @ 2022-10-19  6:20 UTC
  To: Jon Turney; +Cc: David Allsopp, The Cygwin Mailing List


Hello all,
I already reported it, of course, as it happened to me too, but alas there
has been no reaction so far.

Kind Regards
Ariel Burbaickij



* Re: Debugging malloc crash in gdb
  2022-10-18 19:08 ` Jon Turney
  2022-10-19  6:20   ` Ariel Burbaickij
@ 2022-10-20  8:22   ` David Allsopp
  2022-10-20  9:38     ` Ariel Burbaickij
  2022-11-02 12:38     ` Jon Turney
  1 sibling, 2 replies; 8+ messages in thread
From: David Allsopp @ 2022-10-20  8:22 UTC
  To: Jon Turney; +Cc: The Cygwin Mailing List

On Tue, 18 Oct 2022 at 20:09, Jon Turney wrote:
>
> On 18/10/2022 11:35, David Allsopp wrote:
> > I'm wondering if I may be able to have some pointers for debugging what
> > seems to be an unexpected interaction between mmap/mprotect/munmap and
> > malloc with the OCaml runtime.
> >
> > At the moment, I know that we crash in malloc, so my main question is how to
> > go further in gdb. I installed the cygwin-debuginfo package, but all I'm
> > getting is:
>
> Firstly, if the crash is inside the cygwin DLL, you must follow the
> advice in [1], and use 'set cygwin-exceptions on' to tell gdb to stop on
> an exception inside cygwin itself.
>
> [1] https://cygwin.com/faq.html#faq.programming.debugging-cygwin
>
>
> > /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550
> > : internal-error: void resume_1(gdb_signal): Assertion
> > `pc_in_thread_step_range (pc, tp)' failed.

I'm not sure now which combination of stepping directly into the
malloc call, adding 'set cygwin-exceptions on', or switching to gdb 12.1
made the difference, but either way I was able to get to an invalid
memory access in mmap_alloc in malloc.cc. At this point, p was a pointer
to the start of the 256M block which had been passed to munmap.

What I then noticed from that is a bug in our code - the mmap'd region
was actually 256M+64K but the size passed to munmap was 256M... so the
munmap call was not releasing the entire block. Fixing that on the
OCaml side fixes the error completely - I don't know whether what we
were seeing before counts as a bug in Cygwin's allocator?

Many thanks!


David


* Re: Debugging malloc crash in gdb
  2022-10-20  8:22   ` David Allsopp
@ 2022-10-20  9:38     ` Ariel Burbaickij
  2022-11-02 12:38     ` Jon Turney
  1 sibling, 0 replies; 8+ messages in thread
From: Ariel Burbaickij @ 2022-10-20  9:38 UTC
  To: David Allsopp; +Cc: Jon Turney, The Cygwin Mailing List


Hello David,
congrats on the bug fix, but gdb is quite explicit that it considers this
its own bug, hit while running its "inferior", somewhere here:

  if (tp->control.may_range_step)
    {
      /* If we're resuming a thread with the PC out of the step
         range, then we're doing some nested/finer run control
         operation, like stepping the thread out of the dynamic
         linker or the displaced stepping scratch pad. We
         shouldn't have allowed a range step then. */
      gdb_assert (pc_in_thread_step_range (pc, tp));
    }

Whatever the logic behind setting may_range_step might be, it is (or should
be) fully decoupled from any bugs in allocators of whatever flavour.

So, as I see it, this should certainly be investigated by the gdb
maintainers too.

Kind Regards
Ariel Burbaickij




* Re: Debugging malloc crash in gdb
  2022-10-20  8:22   ` David Allsopp
  2022-10-20  9:38     ` Ariel Burbaickij
@ 2022-11-02 12:38     ` Jon Turney
  1 sibling, 0 replies; 8+ messages in thread
From: Jon Turney @ 2022-11-02 12:38 UTC
  To: David Allsopp, The Cygwin Mailing List

On 20/10/2022 09:22, David Allsopp wrote:
> On Tue, 18 Oct 2022 at 20:09, Jon Turney wrote:
>> On 18/10/2022 11:35, David Allsopp wrote:
>>> I'm wondering if I may be able to have some pointers for debugging what
>>> seems to be an unexpected interaction between mmap/mprotect/munmap and
>>> malloc with the OCaml runtime.
[...]
>>> /cygdrive/d/a/scallywag/gdb/gdb-11.2-1.x86_64/src/gdb-11.2/gdb/infrun.c:2550
>>> : internal-error: void resume_1(gdb_signal): Assertion
>>> `pc_in_thread_step_range (pc, tp)' failed.
> 
> I'm not sure now which combination of stepping directly into the
> malloc call, adding set cygwin-exceptions on or switching to gdb 12.1,
> but either way I was able to get to an invalid memory access in
> mmap_alloc in malloc.cc. At this point, p was a pointer to the start
> of the 256M block which had been passed to munmap.
> 
> What I then noticed from that is a bug in our code - the mmap'd region
> was actually 256M+64K but the size passed to munmap was 256M... so the
> munmap call was not releasing the entire block. Fixing that on the
> OCaml side fixes the error completely - I don't know whether what we
> were seeing before counts as a bug in Cygwin's allocator?

That depends.

Is the OCaml code relying on undefined behaviour, which just happens to 
work elsewhere, but fails on Cygwin? Or is it defined behaviour, which 
Cygwin doesn't implement correctly?

(It's not unreasonable that Cygwin's memory allocator is more sensitive 
to some classes of errors than other implementations.)



* Re: Debugging malloc crash in gdb
  2022-10-19  6:20   ` Ariel Burbaickij
@ 2022-11-02 12:38     ` Jon Turney
  2022-11-02 13:24       ` Ariel Burbaickij
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Turney @ 2022-11-02 12:38 UTC
  To: Ariel Burbaickij, The Cygwin Mailing List

On 19/10/2022 07:20, Ariel Burbaickij wrote:
> Hello all,
> I reported it already, of course as it happened to me but alas no reaction
> so far.
> 

Thanks for doing that.

Can you share a link to the gdb bug report?



* Re: Debugging malloc crash in gdb
  2022-11-02 12:38     ` Jon Turney
@ 2022-11-02 13:24       ` Ariel Burbaickij
  0 siblings, 0 replies; 8+ messages in thread
From: Ariel Burbaickij @ 2022-11-02 13:24 UTC
  To: Jon Turney; +Cc: The Cygwin Mailing List


>Can you share a link to the gdb bug report?
https://sourceware.org/bugzilla/show_bug.cgi?id=29513

So, open source products' support reaction time is usually great, but not
always ;-). Unassigned for two months+ and counting.

Kind Regards
Ariel Burbaickij




Thread overview: 8+ messages
2022-10-18 10:35 Debugging malloc crash in gdb David Allsopp
2022-10-18 19:08 ` Jon Turney
2022-10-19  6:20   ` Ariel Burbaickij
2022-11-02 12:38     ` Jon Turney
2022-11-02 13:24       ` Ariel Burbaickij
2022-10-20  8:22   ` David Allsopp
2022-10-20  9:38     ` Ariel Burbaickij
2022-11-02 12:38     ` Jon Turney
