From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by sourceware.org (Postfix) with ESMTPS id 467D33851C2F for ; Sat, 6 Jun 2020 09:25:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 467D33851C2F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=andrew.burgess@embecosm.com Received: by mail-wm1-x341.google.com with SMTP id r9so10537570wmh.2 for ; Sat, 06 Jun 2020 02:25:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=do/Fz9kyQic61pRT/UxmkZUJUlNEf2Iq59yRXmbvW1A=; b=DJBK/qreJ1x3k5kbgFWjiI+0s9QihMsErX8QtNVt5pnaJt/hC8V0JCBpgHXZQJ14zC CprYSMtbroWa2KExuhhhl6iqobX+/ZV7FQOZJ+x1IiMb5SRqC+aq9/BzPvgbSv+46ej0 w4fsCaTfpT+yvJK+HciSCKgBamYrxxjwSPb66V+5ezryZtEpglyJl0smpUR/OtX2E2ar MLZefpUJcsksM9yIkSRcDyyyzSf8izANFfGTq1bPhT5gyC5M4vv+3vmwaI21nKq7iual +N+Hbu/4IzziH+3coK4OVEdbNc2kA4q4QWBhqomPdqaiWbHytYa5Cy+pHt2eXkE05es6 9O2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=do/Fz9kyQic61pRT/UxmkZUJUlNEf2Iq59yRXmbvW1A=; b=MDLtF/B9C9OAFTM4sYSd241GHTk4YFzVsLgic21rug1DwnxOwpSRIn+/w9inFyfpjw PHSSvpr0ximsnMxQxalQgTsUWIG7SKK9eYfbyPVpbZMTmL9qdacoOzoCIY02TKn4dYuE QcOPRb64YeWRqlNdZ3568kl4jSBDLtV9IWzDkKwd3cMiQXES+kY1TrxZvCtVlp24cJXG VpJR0boKlMK52Vma4Nwx64FFuFdz3ENlCIz9SrdieRGtw8u2WXz1cTTJMQeZ/PPswrVg Y+0x2wL7hYjRbc2jv/DdCC06Pjz4ZqN8AUAFhHYlQAnDnXI3pruMzMmv3yFB6rtwXicC 6yGw== X-Gm-Message-State: AOAM533jvnvcVFz726Zgnf6j6RcZm6OwqrREH1zhupfA0qzQu1BuR3tz lOlsiTjNq1HWeATGkLeXFio4Pw== X-Google-Smtp-Source: ABdhPJz9iIOuIntn2bnRL3Ekm1si88qsgcN3+E2XH4KCULyJCHfKSz30vjmmSH33R7cKVj8aY7BkcA== X-Received: by 2002:a1c:2044:: with SMTP id g65mr1126226wmg.16.1591435526114; Sat, 06 Jun 2020 02:25:26 -0700 (PDT) Received: from localhost (host86-128-12-16.range86-128.btcentralplus.com. [86.128.12.16]) by smtp.gmail.com with ESMTPSA id e5sm15873078wrw.19.2020.06.06.02.25.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 06 Jun 2020 02:25:25 -0700 (PDT) Date: Sat, 6 Jun 2020 10:25:24 +0100 From: Andrew Burgess To: Tom de Vries Cc: gdb-patches@sourceware.org Subject: Re: [PATCH][gdb/symtab] Fix line-table end-of-sequence sorting Message-ID: <20200606092524.GG3522@embecosm.com> References: <51b43dba-2213-a428-d4dd-10154f8d1f52@suse.de> <446082ca-c3d4-1a90-2a35-2669c8407095@suse.de> <18b1ee90-2ece-a5b4-787b-2507b081da81@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <18b1ee90-2ece-a5b4-787b-2507b081da81@suse.de> X-Operating-System: Linux/5.5.17-200.fc31.x86_64 (x86_64) X-Uptime: 09:58:19 up 47 days, 33 min, X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ] X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Jun 2020 09:25:29 -0000 * Tom de Vries [2020-06-06 01:44:42 +0200]: > [ was: Re: [PATCH 2/3] gdb: Don't reorder line table entries too much > when sorting. ] >=20 > On 05-06-2020 18:00, Tom de Vries wrote: > > On 05-06-2020 16:49, Tom de Vries wrote: > >> On 23-12-2019 02:51, Andrew Burgess wrote: > >>> I had to make a small adjustment in find_pc_sect_line in order to > >>> correctly find the previous line in the line table. In some line > >>> tables I was seeing an actual line entry and an end of sequence marker > >>> at the same address, before this commit these would reorder to move > >>> the end of sequence marker before the line entry (end of sequence has > >>> line number 0). Now the end of sequence marker remains in its correct > >>> location, and in order to find a previous line we should step backward > >>> over any end of sequence markers. > >>> > >>> As an example, the binary: > >>> gdb/testsuite/outputs/gdb.dwarf2/dw2-ranges-func/dw2-ranges-func-lo= -cold > >>> > >>> Has this line table before the patch: > >>> > >>> INDEX LINE ADDRESS > >>> 0 48 0x0000000000400487 > >>> 1 END 0x000000000040048e > >>> 2 52 0x000000000040048e > >>> 3 54 0x0000000000400492 > >>> 4 56 0x0000000000400497 > >>> 5 END 0x000000000040049a > >>> 6 62 0x000000000040049a > >>> 7 END 0x00000000004004a1 > >>> 8 66 0x00000000004004a1 > >>> 9 68 0x00000000004004a5 > >>> 10 70 0x00000000004004aa > >>> 11 72 0x00000000004004b9 > >>> 12 END 0x00000000004004bc > >>> 13 76 0x00000000004004bc > >>> 14 78 0x00000000004004c0 > >>> 15 80 0x00000000004004c5 > >>> 16 END 0x00000000004004cc > >>> > >>> And after this patch: > >>> > >>> INDEX LINE ADDRESS > >>> 0 48 0x0000000000400487 > >>> 1 52 0x000000000040048e > >>> 2 END 0x000000000040048e > >>> 3 54 0x0000000000400492 > >>> 4 56 0x0000000000400497 > >>> 5 END 0x000000000040049a > >>> 6 62 0x000000000040049a > >>> 7 66 0x00000000004004a1 > >>> 8 END 0x00000000004004a1 > >>> 9 68 0x00000000004004a5 > >>> 10 70 0x00000000004004aa > >>> 11 72 0x00000000004004b9 > >>> 12 END 0x00000000004004bc > >>> 13 76 0x00000000004004bc > >>> 14 78 0x00000000004004c0 > >>> 15 80 0x00000000004004c5 > >>> 16 END 0x00000000004004cc > >>> > >>> When calling find_pc_sect_line with the address 0x000000000040048e, in > >>> both cases we find entry #3, we then try to find the previous entry, > >>> which originally found this entry '2 52 0x000000000040048e', > >>> after the patch it finds '2 END 0x000000000040048e', which > >>> cases the lookup to fail. > >>> > >>> By skipping the END marker after this patch we get back to the correct > >>> entry, which is now #1: '1 52 0x000000000040048e', and > >>> everything works again. > >> > >> I start to suspect that you have been working around an incorrect line > >> table. > >> > >> Consider this bit: > >> ... > >> 0 48 0x0000000000400487 > >> 1 52 0x000000000040048e > >> 2 END 0x000000000040048e > >> ... > >> > >> The end marker marks the address one past the end of the sequence. > >> Therefore, it makes no sense to have an entry in the sequence with the > >> same address as the end marker. > >> > >> [ dwarf doc: > >> > >> end_sequence: > >> > >> A boolean indicating that the current address is that of the first byte > >> after the end of a sequence of target machine instructions. end_sequen= ce > >> terminates a sequence of lines; therefore other information in the same > >> row is not meaningful. > >> > >> DW_LNE_end_sequence: > >> > >> The DW_LNE_end_sequence opcode takes no operands. It sets the > >> end_sequence register of the state machine to =E2=80=9Ctrue=E2=80=9D a= nd appends a row > >> to the matrix using the current values of the state-machine registers. > >> Then it resets the registers to the initial values specified above (see > >> Section 6.2.2). Every line number program sequence must end with a > >> DW_LNE_end_sequence instruction which creates a row whose address is > >> that of the byte after the last target machine instruction of the sequ= ence. > >> > >> ] > >> > >> The incorrect entry is generated by this dwarf assembler sequence: > >> ... > >> {DW_LNS_copy} > >> {DW_LNE_end_sequence} > >> ... > >> > >> I think we should probably fix the dwarf assembly test-cases. > >> > >> If we want to handle this in gdb, the thing that seems most logical to > >> me is to ignore this kind of entries. > >=20 > > Hmm, that seems to be done already, in buildsym_compunit::record_line. > >=20 > > Anyway, I was looking at the line table for > > gdb.dwarf2/dw2-ranges-base.exp, and got a line table with subsequent end > > markers: > > ... > > INDEX LINE ADDRESS IS-STMT > > 0 31 0x00000000004004a7 Y > > 1 21 0x00000000004004ae Y > > 2 END 0x00000000004004ae Y > > 3 11 0x00000000004004ba Y > > 4 END 0x00000000004004ba Y > > 5 END 0x00000000004004c6 Y > > ... > >=20 > > By using this patch: > > ... > > diff --git a/gdb/buildsym.c b/gdb/buildsym.c > > index 33bf6523e9..76f0b54ff6 100644 > > --- a/gdb/buildsym.c > > +++ b/gdb/buildsym.c > > @@ -943,6 +943,10 @@ buildsym_compunit::end_symtab_with_blockvector > > (struct block *static_block, > > =3D [] (const linetable_entry &ln1, > > const linetable_entry &ln2) -> bool > > { > > + if (ln1.pc =3D=3D ln2.pc > > + && ((ln1.line =3D=3D 0) !=3D (ln2.line =3D=3D 0))) > > + return ln1.line =3D=3D 0 ? true : false; > > + > > return (ln1.pc < ln2.pc); > > }; > >=20 > > ... > > I get the expected: > > ... > > INDEX LINE ADDRESS IS-STMT > > 0 31 0x00000000004004a7 Y > > 1 END 0x00000000004004ae Y > > 2 21 0x00000000004004ae Y > > 3 END 0x00000000004004ba Y > > 4 11 0x00000000004004ba Y > > 5 END 0x00000000004004c6 Y > > ... >=20 > Any comments on patch below? Yes! Thank you for working on this. This should go in, however, I'd like to tweak the commit message a bit please (see below). Also, do you plan to include the revert of find_pc_sect_line from 3d92a3e313 in this patch - I think you should. Thanks, Andrew >=20 > Thanks, > - Tom >=20 > [gdb/symtab] Fix line-table end-of-sequence sorting >=20 > Consider test-case gdb.dwarf2/dw2-ranges-base.exp. It has a line-table f= or > dw2-ranges-base.c like this: > ... > Line Number Statements: > [0x0000014e] Extended opcode 2: set Address to 0x4004ba > [0x00000159] Advance Line by 10 to 11 > [0x0000015b] Copy > [0x0000015c] Advance PC by 12 to 0x4004c6 > [0x0000015e] Advance Line by 19 to 30 > [0x00000160] Copy > [0x00000161] Extended opcode 1: End of Sequence >=20 > [0x00000164] Extended opcode 2: set Address to 0x4004ae > [0x0000016f] Advance Line by 20 to 21 > [0x00000171] Copy > [0x00000172] Advance PC by 12 to 0x4004ba > [0x00000174] Advance Line by 29 to 50 > [0x00000176] Copy > [0x00000177] Extended opcode 1: End of Sequence >=20 > [0x0000017a] Extended opcode 2: set Address to 0x4004a7 > [0x00000185] Advance Line by 30 to 31 > [0x00000187] Copy > [0x00000188] Advance PC by 7 to 0x4004ae > [0x0000018a] Advance Line by 39 to 70 > [0x0000018c] Copy > [0x0000018d] Extended opcode 1: End of Sequence > ... >=20 > The Copy followed by End-of-Sequence is as specified in the dwarf assembl= y, > but incorrect. F.i., consider: > ... > [0x0000015c] Advance PC by 12 to 0x4004c6 > [0x0000015e] Advance Line by 19 to 30 > [0x00000160] Copy > [0x00000161] Extended opcode 1: End of Sequence > ... >=20 > Both the Copy and the End-of-Sequence append a row to the matrix using the > same addres: 0x4004c6. The Copy declares a target instruction at that > address. The End-of-Sequence declares that the sequence ends before that > address. It's a contradiction that the target instruction is both part o= f the > sequence (according to Copy) and not part of the sequence (according to > End-of-Sequence). I don't want to argue about this specific test, but about the idea of an empty line occurring at the same address as an end of sequence marker. This can happen due to compiler optimisation, though it is perfectly reasonable to suggest that in this case the compiler should be removing the line marker, this doesn't always happen. I guess what I'm saying is that I think the case for this being obviously wrong is not quite as clear cut as you seem to imply, but also, I don't think this is really relevant to this bug - so maybe we could just drop this part? > The offending Copy is skipped though by buildsym_compunit::record_line for > unrelated reasons. Not really unrelated - specifically to catch this case (assuming we're thinking about the same code). > So, if we disable the sorting in > buildsym_compunit::end_symtab_with_blockvector, we have: > ... > INDEX LINE ADDRESS IS-STMT > 0 11 0x00000000004004ba Y > 1 END 0x00000000004004c6 Y > 2 21 0x00000000004004ae Y > 3 END 0x00000000004004ba Y > 4 31 0x00000000004004a7 Y > 5 END 0x00000000004004ae Y > ... > but if we re-enable the sorting, we have: > ... > INDEX LINE ADDRESS IS-STMT > 0 31 0x00000000004004a7 Y > 1 21 0x00000000004004ae Y > 2 END 0x00000000004004ae Y > 3 11 0x00000000004004ba Y > 4 END 0x00000000004004ba Y > 5 END 0x00000000004004c6 Y > ... > which has both: > - the contradictory order for the same-address pairs 1/2 and 3/4, as well= as > - a non-sensical pair of ENDs, > while we'd like: > ... > INDEX LINE ADDRESS IS-STMT > 0 31 0x00000000004004a7 Y > 1 END 0x00000000004004ae Y > 2 21 0x00000000004004ae Y > 3 END 0x00000000004004ba Y > 4 11 0x00000000004004ba Y > 5 END 0x00000000004004c6 Y > ... You're description here is absolutely correct, but I don't feel it doesn't draw enough attention to what the real mistake inside GDB is. Consider this line table, it shows two sequences, each delimited with an END marker. The table is extracted from the DWARF with the entries in this exact order: | Address | Line | |---------+------| | 0x100 | 10 | | 0x102 | 11 | | 0x104 | END | |---------+------| | 0x98 | 100 | | 0x9a | 101 | | 0x9c | 102 | | 0x9e | 103 | | 0x100 | END | |---------+------| What we want is to reorder the two sequences relative to each other. The current sorting does this by sorting on 'Address', while leaving entries at the same address in their original order, this gives: | Address | Line | |---------+------| | 0x98 | 100 | | 0x9a | 101 | | 0x9c | 102 | | 0x9e | 103 | | 0x100 | 10 | | 0x100 | END | |---------+------| | 0x102 | 11 | | 0x104 | END | |---------+------| Here we see that line 10 has jumped from being the start of one sequence to appear at the end of the other sequence's END marker, this is maintained the table order, but is clearly incorrect. A better sort would order by 'Address', but always move END markers to be the first entry at a given 'Address', this would give us: | Address | Line | |---------+------| | 0x98 | 100 | | 0x9a | 101 | | 0x9c | 102 | | 0x9e | 103 | | 0x100 | END | |---------+------| | 0x100 | 10 | | 0x102 | 11 | | 0x104 | END | |---------+------| >=20 > This is a regression since commit 3d92a3e313 "gdb: Don't reorder line tab= le > entries too much when sorting", that introduced sorting on address while > keeping entries with the same address in pre-sort order, which leads to > incorrect results if one of the entries is an End-Of-Sequence. Before commit 3d92a3e313, entries at the same address were sorted by line number. The correctly pushes END markers (line number 0) to be the first entry at that address, but also corrupts the ordering of non END markers. This commit corrects the mistake of 3d92a3e313, by finding a middle ground; keep entries at the same address in order _except_ for end markers, which are always sorted to be earlier in the table. >=20 > Fix this by handling End-Of-Sequence entries in the sorting function. >=20 > Tested on x86_64-linux. >=20 > gdb/ChangeLog: >=20 > 2020-06-06 Tom de Vries >=20 > * buildsym.c (buildsym_compunit::end_symtab_with_blockvector): Handle > End-Of-Sequence in lte_is_less_than. >=20 > gdb/testsuite/ChangeLog: >=20 > 2020-06-06 Tom de Vries >=20 > * gdb.dwarf2/dw2-ranges-base.exp: Test line-table order. >=20 > --- > gdb/buildsym.c | 4 ++++ > gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp | 14 ++++++++++++++ > 2 files changed, 18 insertions(+) >=20 > diff --git a/gdb/buildsym.c b/gdb/buildsym.c > index 33bf6523e9..76f0b54ff6 100644 > --- a/gdb/buildsym.c > +++ b/gdb/buildsym.c > @@ -943,6 +943,10 @@ buildsym_compunit::end_symtab_with_blockvector (stru= ct block *static_block, > =3D [] (const linetable_entry &ln1, > const linetable_entry &ln2) -> bool > { > + if (ln1.pc =3D=3D ln2.pc > + && ((ln1.line =3D=3D 0) !=3D (ln2.line =3D=3D 0))) > + return ln1.line =3D=3D 0 ? true : false; > + > return (ln1.pc < ln2.pc); > }; > =20 > diff --git a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp b/gdb/testsuite= /gdb.dwarf2/dw2-ranges-base.exp > index 92f8f6cecb..39281a8857 100644 > --- a/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp > +++ b/gdb/testsuite/gdb.dwarf2/dw2-ranges-base.exp > @@ -144,12 +144,26 @@ gdb_test "info line frame3" \ > =20 > # Ensure that the line table correctly tracks the end of sequence marker= s. > set end_seq_count 0 > +set prev -1 > +set seq_count 0 > gdb_test_multiple "maint info line-table gdb.dwarf2/dw2-ranges-base.c" \ > "count END markers in line table" { > -re "^$decimal\[ \t\]+$decimal\[ \t\]+$hex\(\[ \t\]+Y\)? *\r\n" { > + if { $prev !=3D -1 } { > + gdb_assert "$prev =3D=3D 1" \ > + "prev of normal entry at $seq_count is end marker" > + } > + set prev 0 > + incr seq_count > exp_continue > } > -re "^$decimal\[ \t\]+END\[ \t\]+$hex\(\[ \t\]+Y\)? *\r\n" { > + if { $prev !=3D -1 } { > + gdb_assert "$prev =3D=3D 0" \ > + "prev of end marker at $seq_count is normal entry" > + } > + set prev 1 > + incr seq_count > incr end_seq_count > exp_continue > }