From: "andrew.burgess at embecosm dot com"
To: gdb-prs@sourceware.org
Subject: [Bug gdb/26819] RISC-V: internal-error: int finish_step_over(execution_control_state*): Assertion
Date: Thu, 03 Jun 2021 20:41:30 +0000
X-Bugzilla-Product: gdb
X-Bugzilla-Component: gdb
X-Bugzilla-Version: HEAD
X-Bugzilla-Severity: normal
X-Bugzilla-Status: REOPENED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: andrew.burgess at embecosm dot com
X-Bugzilla-Target-Milestone: 11.1

https://sourceware.org/bugzilla/show_bug.cgi?id=26819

--- Comment #38 from Andrew Burgess ---

I took a look at this issue and I can reproduce the failure.

First, an interesting aside: I originally tried to reproduce this issue on a
GDB built with --enable-targets=all, and couldn't reproduce the failure. But,
when I built a GDB with --target=riscv-elf I was able to reproduce the issue
just fine.

The problem turns out to be that by default the all-targets GDB would default
to osabi GNU/Linux, while the riscv-elf GDB does not include Linux support, so
defaults to osabi 'none'.
If I use the all-targets GDB and explicitly do 'set osabi none' then I can
reproduce the failure with the all-targets GDB.

The difference is that GNU/Linux RISC-V doesn't support single stepping, so
when using that osabi we make use of software single-step. The bare-metal
'none' osabi assumes that single step support is available.

So, why is this test failing when using single stepping?

It is my belief that the problem here is a bug in the multi-core support of
openocd.

My first clue is the following sequence of vCont and stop replies sent between
GDB and openocd:

  Sending packet: $vCont?#49
  Packet received: vCont;c;C;s;S
  Sending packet: $vCont;c#a8
  Packet received: T05thread:2;
  Sending packet: $vCont;s:2#24
  Packet received: T05
  Sending packet: $vCont;s:2;c#c2
  Packet received: T05
  Sending packet: $vCont;s:1;c#c1
  Packet received: T05thread:2;

Notice that after the first 'vCont;c' we get back 'T05thread:2', clearly
indicating which thread stopped.

Next GDB sends 'vCont;s:2', so it steps only thread 2, and now we get back
'T05'. This is annoying (no thread-id), but the original patches for this
issue addressed this: as only one thread was set running, GDB correctly
"guesses" the thread and carries on.

After that GDB sends 'vCont;s:2;c', so now we're single stepping thread 2, but
allowing thread 1 to continue. Again the reply comes back 'T05'. This time
GDB guesses thread 1 as the stopped thread, and things start to go off the
rails.

Notice however, that in the next packet GDB sends 'vCont;s:1;c', which is kind
of the reverse (step thread 1, continue thread 2), but now we get a reply
'T05thread:2' which includes a thread-id. Weird!

The summary of the above then is that sometimes openocd does not return a
thread-id even when multiple threads are running.

I looked a little into openocd, specifically at the function gdb_signal_reply
in server/gdb_server.c. This function is passed a 'struct target *target'.
Whether we send a thread-id back or not depends on whether the 'target->rtos'
field is set.

Now, if I debug this function I see two different target pointers passed in at
different times. One target represents "riscv.cpu0", and this target has its
rtos field set. The other target represents "riscv.cpu1", and this target
does not have its rtos field set.

Now, I don't know the openocd internals, but what seems to happen is that
sometimes the target stops and gdb_signal_reply is called with the
"riscv.cpu0" target. Because target->rtos is set, the stop is reported
against target->rtos->current_thread, which can be thread 2.

Then, sometimes, the target stops with "riscv.cpu1". In this case openocd
just makes no attempt to add a thread-id, as target->rtos is NULL.

So, where does this leave us?

The above explains why GDB starts getting confused, but doesn't fully explain
why we eventually hit the assert. I'm still looking into the details of that,
but wanted to record what I knew so far.

GDB _could_ possibly be slightly smarter when it guesses, as in if we send
'vCont;s:2;c', then maybe we should guess that thread-2 is the likely thread
to have stopped, as most of the time thread-2 single stepping will complete
before thread-1 hits anything interesting.... most of the time.

But this really feels like working around a broken target.
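
To make the "smarter guess" idea concrete, here is a rough Python sketch of it.
This is not GDB's code and not a proposed patch; the function names
(parse_vcont, stop_reply_thread, resumed_threads, guess_stopped_thread) are
made up for illustration, and it assumes plain hex thread-ids only (no
multiprocess 'pPID.TID' form, no '-1'). It trusts a thread-id when the stop
reply carries one, falls back to the only resumed thread when there is just
one, and otherwise prefers the single thread that was being stepped.

```python
import re


def parse_vcont(packet):
    """Split a vCont packet body such as 'vCont;s:2;c' into (action, tid) pairs.

    tid is None when the action has no thread-id, i.e. it is the default
    action for every thread not matched by an earlier action.
    """
    prefix = "vCont;"
    if not packet.startswith(prefix):
        raise ValueError("not a vCont resume packet: %r" % packet)
    actions = []
    for part in packet[len(prefix):].split(";"):
        action, _, tid = part.partition(":")
        actions.append((action, int(tid, 16) if tid else None))
    return actions


def stop_reply_thread(reply):
    """Return the thread-id carried by a 'T' stop reply, or None if absent."""
    match = re.search(r"thread:([0-9a-fA-F]+);", reply)
    return int(match.group(1), 16) if match else None


def resumed_threads(vcont_packet, all_threads):
    """Map each resumed thread to its action; the leftmost matching action wins."""
    resumed = {}
    for action, tid in parse_vcont(vcont_packet):
        targets = all_threads if tid is None else [tid]
        for thread in targets:
            resumed.setdefault(thread, action)
    return resumed


def guess_stopped_thread(vcont_packet, reply, all_threads):
    """Guess which thread a stop reply refers to (illustrative heuristic only)."""
    reported = stop_reply_thread(reply)
    if reported is not None:
        return reported  # the target told us, no guessing needed
    resumed = resumed_threads(vcont_packet, all_threads)
    if len(resumed) == 1:
        return next(iter(resumed))  # only one thread could have stopped
    stepping = [t for t, a in resumed.items() if a[:1] in ("s", "S")]
    if len(stepping) == 1:
        return stepping[0]  # assumption: the step usually finishes first
    return None  # genuinely ambiguous


# The four resume/stop exchanges from the log above, with threads 1 and 2.
exchanges = [
    ("vCont;c", "T05thread:2;"),
    ("vCont;s:2", "T05"),
    ("vCont;s:2;c", "T05"),
    ("vCont;s:1;c", "T05thread:2;"),
]
guesses = [guess_stopped_thread(p, r, [1, 2]) for p, r in exchanges]
print(guesses)
```

On the log above this picks thread 2 for all four exchanges, including the
third one, which is the case where GDB guessed thread 1. It is still only a
guess: if thread 1 really did hit something first, the heuristic is wrong.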