public inbox for gcc-bugs@sourceware.org
* [Bug d/101692] New: Program crushes at unpredictable moment of time
@ 2021-07-30 10:56 zed at lab127 dot karelia.ru
  2021-07-31  7:18 ` [Bug d/101692] " zed at lab127 dot karelia.ru
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: zed at lab127 dot karelia.ru @ 2021-07-30 10:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

            Bug ID: 101692
           Summary: Program crushes at unpredictable moment of time
           Product: gcc
           Version: 8.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: d
          Assignee: ibuclaw at gdcproject dot org
          Reporter: zed at lab127 dot karelia.ru
  Target Milestone: ---

I am not sure this is really a gdc issue,
but the program behaves very strangely:
it works normally for some time and
then receives SIGSEGV at an unpredictable moment
(but always at the same place in the code).

Here is the code snippet where the crash occurs:

    struct EpollEvent
    {
        align(1):
        uint event_mask;
        EventSource es;     // EventSource is a class
        /* just do not want to use that union, epoll_data_t */
    }

    bool wait()
    {
        EpollEvent event;

        if (done)
            return false;

        int n = epoll_wait(id, &event, 1, -1);
        if (-1 == n)
            return false;

        writefln("%s, n = %d", __FUNCTION__, n);
        writefln("%s", event); // <<< crashes here

        EventSource s = event.es;
        ulong ecode = s.eventCode(event.event_mask);
        mq.putMsg(null, s.owner, ecode, s);

        return true;
    }

The crash occurs when the program accesses the 'event' variable
after returning from epoll_wait().

And it does not depend on the type of this variable:
I tried a dynamic array EpollEvent[] and a static array EpollEvent[MAX];
either way, after some period of normal operation
the program gets SIGSEGV at the indicated line.
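
For reference, here is a small sketch of what `align(1):` does to the layout of the struct above. It assumes x86_64, where a class reference is 8 bytes; `AlignedEvent` is a hypothetical variant added only for comparison. The D documentation on garbage collection advises against misaligned pointers into the GC heap, so a copy of such a struct may not be enough on its own to keep the object alive. Whether this contributes to the crash reported here is not established in this thread.

```d
import std.stdio : writefln;

class EventSource {}

// Same field layout as the struct in the report
struct EpollEvent
{
    align(1):
    uint event_mask;
    EventSource es;
}

// Hypothetical variant with default alignment, for comparison only
struct AlignedEvent
{
    uint event_mask;
    EventSource es;
}

void main()
{
    writefln("EpollEvent:   size %s, es at offset %s",
             EpollEvent.sizeof, EpollEvent.es.offsetof);
    writefln("AlignedEvent: size %s, es at offset %s",
             AlignedEvent.sizeof, AlignedEvent.es.offsetof);

    version (X86_64)
    {
        // With align(1) the reference is not on a pointer-sized boundary
        static assert(EpollEvent.es.offsetof == 4);
        // With default alignment it is padded up to offset 8
        static assert(AlignedEvent.es.offsetof == 8);
    }
}
```

The packed layout is what epoll_wait() expects on x86_64, so the struct cannot simply be re-aligned; the point is only that the reference inside it is stored at an unaligned address.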

gdb says the following:

Core was generated by `./echod'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f312ed79380 in ?? ()
(gdb) bt
#0  0x00007f312ed79380 in ?? ()
#1  0x0000555f33107071 in
std.format.formatObject!(std.stdio.File.LockingTextWriter, esrc.EventSource,
char).formatObject(ref std.stdio.File.LockingTextWriter, ref esrc.EventSource,
ref const(std.format.FormatSpec!(char).FormatSpec)) (w=...,
val=@0x7ffec19ab610: 0x7f312edb6300, f=...)
    at /usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/format.d:3353
#2  0x0000555f33106fc9 in
std.format.formatValue!(std.stdio.File.LockingTextWriter, esrc.EventSource,
char).formatValue(ref std.stdio.File.LockingTextWriter, esrc.EventSource, ref
const(std.format.FormatSpec!(char).FormatSpec)) (w=..., val=0x7f312edb6300,
f=...)
    at /usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/format.d:3450
#3  0x0000555f33106f39 in
std.format.formatElement!(std.stdio.File.LockingTextWriter, esrc.EventSource,
char).formatElement(ref std.stdio.File.LockingTextWriter, ref esrc.EventSource,
ref const(std.format.FormatSpec!(char).FormatSpec)) (w=...,
val=@0x7ffec19ab6b0: 0x7f312edb6300, f=...)
    at /usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/format.d:3180
#4  0x0000555f3310683a in
std.format.formatValue!(std.stdio.File.LockingTextWriter, ecap.EpollEvent,
char).formatValue(ref std.stdio.File.LockingTextWriter, ref ecap.EpollEvent,
ref const(std.format.FormatSpec!(char).FormatSpec)) (w=..., val=..., f=...)
    at /usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/format.d:3702
#5  0x0000555f3310634b in
std.format.formattedWrite!(std.stdio.File.LockingTextWriter, char,
ecap.EpollEvent).formattedWrite(ref std.stdio.File.LockingTextWriter,
const(char[]), ecap.EpollEvent) (w=..., fmt=..., _param_2=...) at
/usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/format.d:568
#6  0x0000555f33105dc5 in std.stdio.File.writefln!(char,
ecap.EpollEvent).writefln(const(char[]), ecap.EpollEvent) (this=..., fmt=...,
_param_1=...)
    at /usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/stdio.d:1496
#7  0x0000555f330f49ad in std.stdio.writefln!(char,
ecap.EpollEvent).writefln(const(char[]), ecap.EpollEvent) (fmt=...,
_param_1=...)
    at /usr/lib/gcc/x86_64-linux-gnu/8/include/d/std/stdio.d:3797
#8  0x0000555f330eca63 in ecap.EventQueue.wait() (this=0x7f312ed76000) at
engine/ecap.d:113
#9  0x0000555f330f472a in D main (args=...) at echod.d:46

engine/ecap.d:113 is that 'writefln("%s", event);'

It also seemed to me that the crash is more likely when compiling with -Os.
I also tried dmd instead of gdc: crashes occur there too, but much less
frequently.

gdc --version
gdc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0
(Linux Mint 19.1 actually)

uname -a
Linux HP-Laptop 4.15.0-151-generic #157-Ubuntu SMP Fri Jul 9 23:07:57 UTC 2021
x86_64 x86_64 x86_64 GNU/Linux

* [Bug d/101692] Program crushes at unpredictable moment of time
From: zed at lab127 dot karelia.ru @ 2021-07-31  7:18 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #1 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---
Created attachment 51227
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51227&action=edit
references to objects in associative arrays

* [Bug d/101692] Program crushes at unpredictable moment of time
From: zed at lab127 dot karelia.ru @ 2021-07-31  7:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #2 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---

I've attached the full source, just in case.
And I think I am beginning to understand what is wrong with it.

In brief: it looks like the GC deallocates objects
which it should not deallocate, because
it does not take into account references
that are stored in associative arrays.

* [Bug d/101692] Program crushes at unpredictable moment of time
From: zed at lab127 dot karelia.ru @ 2021-07-31 11:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #3 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---
I have reorganized my data: references to the EventSource class
and its derivatives (timers, sockets, signals) are now stored
in explicitly named data members of other classes.

That did not help; it still crashes.

Then I added a destructor to the EventSource class to see
whether it is called or not. And it is...

Sample output of the program:
-------------------------------------
'RX-0 @ IDLE' got 'M1' from 'CLIENT-2'
rx.RxSm.rxIdleM1 : fd = 12
'RX-0' registered 12 (esrc.Io)

!!! esrc.EventSource.~this() : esrc.ClientSocket // call of the destructor
!!! esrc.EventSource.~this() : esrc.Timer        // call of the destructor

'RX-0 @ IDLE' got 'M0' from 'SELF'
'RX-0' enabled 12 (esrc.Io, oneshot)
ecap.EventQueue.wait, n = 1, event @ 7FFE506B486C
ecap.EventQueue.wait, n = 1, event @ 7FFE506B486C, event source @ 7FB726991F40
Segmentation fault (core dumped)
------------------------------------

So now I understand why the crash is happening:
an object reference obtained from epoll_wait()
is invalid, because the object was already removed by the GC.

But I do not understand why the GC destroys these objects...
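
One thing worth noting here: a reference that has been handed to the kernel (for example through epoll_ctl) is stored outside any memory the D garbage collector scans, so it does not keep the object alive by itself. The sketch below shows one way to pin an object with `GC.addRoot` from `core.memory` before its address leaves the program. The helper names `pinForKernel` and `unpin` and the simplified `EventSource` class are hypothetical and not taken from the attached source, and it is an assumption that this is what affects the program in this report.

```d
import core.memory : GC;

// Simplified stand-in for the class in the report
class EventSource
{
    int fd;
    this(int fd) { this.fd = fd; }
}

/// Pin `es` before its address is stored somewhere the GC cannot see.
/// Returns the raw pointer that would be passed on (e.g. as epoll user data).
void* pinForKernel(EventSource es)
{
    auto p = cast(void*) es;
    GC.addRoot(p);                      // treat the object as reachable
    GC.setAttr(p, GC.BlkAttr.NO_MOVE);  // and keep it at a fixed address
    return p;
}

/// Undo pinForKernel once the address is no longer registered anywhere.
void unpin(void* p)
{
    GC.removeRoot(p);
    GC.clrAttr(p, GC.BlkAttr.NO_MOVE);
}

void main()
{
    // No D variable keeps the object itself; only the raw pointer remains
    void* token = pinForKernel(new EventSource(12));

    GC.collect();

    // The pointer handed back later (as epoll_wait would do) is still valid
    auto es = cast(EventSource) token;
    assert(es.fd == 12);

    unpin(token);
}
```

In an event loop this would mean pinning on registration and unpinning on deregistration, so that the lifetime of the object matches the time its address is known to the kernel.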

* [Bug d/101692] Program crushes at unpredictable moment of time
From: zed at lab127 dot karelia.ru @ 2021-07-31 14:54 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #4 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---

investigation continues...

----------------------------------
'RX-98' deregistered 107 (esrc.Io)
   !!!  esrc.EventSource.~this() : esrc.Io (owner RX-97)
   this @ 0x7f845328b680    <<< note addr
   !!!  esrc.EventSource.~this() : esrc.Timer (owner RX-97)
   this @ 0x7f845328b6c0
___!!!___edsm.StageMachine.~this(): RX-97 destroyed...
....
ecap.EventQueue.wait, n = 1, event @ 7FFDBCE6712C,
   event source @ 0x7f845328b680   <<<
  // this address is invalid, object already destroyed by GC
Segmentation fault
----------------------------------

'RX-97' is a state machine instance.
RX machines are kept in a pool.
Slist is used as a stack/pool.
RX machines put themselves into the pool upon entering some state.

And it seems that, since the insertFront() method of Slist is not an
implicit assignment operator, D's GC does not count
a reference in a list as a... reference. Oops.

Does it mean it's impossible to keep references only in lists?
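
A quick way to check this question in isolation is sketched below. It assumes that "Slist" refers to `SList` from `std.container.slist`; the `Machine` class and the `pool` variable are hypothetical stand-ins, not the code from the attachment. The list is kept in a module-level variable so that the list itself is reachable when the collection runs.

```d
import core.memory : GC;
import std.container.slist : SList;

// Hypothetical stand-in for a pooled state machine
class Machine
{
    static int destroyed;   // counts destructor calls
    int id;
    this(int id) { this.id = id; }
    ~this() { ++destroyed; }
}

// Module-level (thread-local) variable: part of the memory the GC scans
SList!Machine pool;

void fill(int n)
{
    foreach (k; 0 .. n)
        pool.insertFront(new Machine(k));  // the list holds the only reference
}

void main()
{
    fill(100);

    GC.collect();

    // Objects reachable through the list must not have been finalized
    assert(Machine.destroyed == 0);

    int count;
    foreach (m; pool[])
        ++count;
    assert(count == 100);
}
```

If the assertions hold, references stored in the list are seen by the collector as long as the list itself is reachable, and the cause of the premature destruction would have to be looked for elsewhere.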

* [Bug d/101692] Program crushes at unpredictable moment of time
From: zed at lab127 dot karelia.ru @ 2021-07-31 16:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #5 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---

Eventually I used the following workaround:

    RxSm[] rxMachines;
    auto rxPool = new RestRoom();
    for (int k = 0; k < nConnections; k++) {
        auto sm = new RxSm(rxPool);
        rxMachines ~= sm;
        sm.run();
    }

This rxMachines array exists only to hold references,
since references in a linked list are not counted at all by the GC.

Now everything works fine, no mystical crashes any more :)
(...and I am beginning to dislike GC even more than before)

But the question remains.
Not counting references in Slist: is that by design?

* [Bug d/101692] Referenses in Slist are not counted by GC
From: zed at lab127 dot karelia.ru @ 2021-07-31 19:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #6 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---
... and there is another "interesting" observation...

In one of the programs there is a StageMachine instance
that catches SIGTERM and SIGINT, and there is
an 'honest' reference to that instance in main(), but...

* if compiled with the -Os optimization flag,
something absolutely unimaginable happens:
this instance is destroyed by the GC and, what
is interesting, this happens every time
I run the program, and always
about 6 seconds after the start

* if compiled without -Os, everything is OK.

Wonders will never cease; -Os is doing something very 'special'.

* [Bug d/101692] Referenses in Slist are not counted by GC
From: zed at lab127 dot karelia.ru @ 2021-08-01  7:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #7 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---
(In reply to Eugene Zhiganov from comment #5)
> This rxMachines array is just for the purpose of holding references,
> since references in linked list are not counted at all by GC.

Maybe I am not right here...

Things are a little bit more complicated.
Initially the objects were created like this:

    auto rxPool = new RestRoom();
    for (int k = 0; k < maxClients; k++) {
        auto sm = new RxSm(rxPool);
        sm.run();
    }

After the loop, 'sm' is out of scope
and all these newly created instances
are marked for deletion, I guess.

RxSm has 3 states: INIT, IDLE and WORK.
The run() method calls the enter function for the INIT state.
In this function the RX machine does some initialization
and sends a message to itself (the message means
'now go to IDLE state').

A little bit later, when the program enters the event loop
and starts processing messages, that message
is processed and the machine enters the IDLE state,
where it puts itself into a pool like this:

    void rxIdleEnter() {
        rR.put(this);
        // adds 'this' pointer to Slist
    }

But at this moment the instance is already marked
for deletion... and hence the reference in the list
is not counted, either because

* the object was marked for deletion

or because

* references in Slist are not counted at all (???)

* [Bug d/101692] Referenses in Slist are not counted by GC
From: zed at lab127 dot karelia.ru @ 2021-08-03 15:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

--- Comment #8 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---
Created attachment 51250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51250&action=edit
eventDriivenStateMachines in D,  working version

attached the file, maybe someone will be interested

* [Bug d/101692] Referenses in Slist are not counted by GC
From: zed at lab127 dot karelia.ru @ 2021-08-03 15:21 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101692

Eugene Zhiganov <zed at lab127 dot karelia.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #9 from Eugene Zhiganov <zed at lab127 dot karelia.ru> ---
something like that, I guess )
