From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tdevries@suse.de>
Received: from mx2.suse.de (mx2.suse.de [195.135.220.15])
 by sourceware.org (Postfix) with ESMTPS id 9B7C7386181D
 for <dwz@sourceware.org>; Mon, 22 Feb 2021 08:24:43 +0000 (GMT)
Received: from relay2.suse.de (unknown [195.135.221.27])
 by mx2.suse.de (Postfix) with ESMTP id B24B9AF0B;
 Mon, 22 Feb 2021 08:24:42 +0000 (UTC)
Date: Mon, 22 Feb 2021 09:24:41 +0100
From: Tom de Vries <tdevries@suse.de>
To: dwz@sourceware.org, jakub@redhat.com, mark@klomp.org
Subject: [FTR] Fix DW_AT_decl_file for odr
Message-ID: <20210222082439.GA14387@delia>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)
X-BeenThere: dwz@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Dwz mailing list <dwz.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/dwz>,
 <mailto:dwz-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/dwz/>
List-Help: <mailto:dwz-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/dwz>,
 <mailto:dwz-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Feb 2021 08:24:45 -0000

Hi,

[ For now, this is a for-the-record posting.  I'm working on a different fix
for this. ]

Consider the odr-struct test case.  It has two definitions of struct aaa (from
different CUs), each with members of types bbb and ccc, but in one CU bbb is
only a decl, while in the other CU ccc is only a decl.

With --odr, we end up with a single struct aaa and no decls:
...
$ dwz --odr odr-struct
$ readelf -wi odr-struct \
    | egrep -A2 "DW_TAG_structure" \
    | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><19>: Abbrev Number: 25 (DW_TAG_structure_type)
    <1a>   DW_AT_name        : ccc
 <1><2f>: Abbrev Number: 25 (DW_TAG_structure_type)
    <30>   DW_AT_name        : aaa
 <1><4b>: Abbrev Number: 25 (DW_TAG_structure_type)
    <4c>   DW_AT_name        : bbb
...

Now consider using the same file for multifile optimization, combined with
odr:
...
$ cp odr-struct 1; cp 1 2; dwz -m 3 1 2 --odr
...
The desired outcome is that the structs aaa are unified (same as above) in
both 1 and 2, and then moved to multifile 3.

The multifile looks good:
...
$ readelf -wi 3 \
    | egrep -A2 "DW_TAG_structure" \
    | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><14>: Abbrev Number: 15 (DW_TAG_structure_type)
    <15>   DW_AT_name        : ccc
 <1><2a>: Abbrev Number: 15 (DW_TAG_structure_type)
    <2b>   DW_AT_name        : aaa
 <1><46>: Abbrev Number: 15 (DW_TAG_structure_type)
    <47>   DW_AT_name        : bbb
...
but some struct types are left in 1:
...
$ readelf -wi 1 \
    | egrep -A2 "DW_TAG_structure" \
    | egrep "DW_TAG|DW_AT_name|DW_AT_decl"
 <1><1e>: Abbrev Number: 12 (DW_TAG_structure_type)
    <1f>   DW_AT_name        : aaa
 <1><3d>: Abbrev Number: 12 (DW_TAG_structure_type)
    <3e>   DW_AT_name        : bbb
...

The problem is that DW_AT_decl_file differs for struct bbb between 1 and 3,
which causes the struct types to linger in 1.

The problem can already be shown without multifile mode using:
...
$ dwz odr-struct --odr
$ llvm-dwarfdump odr-struct \
    | grep -A3 struct \
    | egrep -v "^--|DW_AT_byte_size" \
    | sed 's%/.*/%%'
0x00000019:   DW_TAG_structure_type
                DW_AT_name      ("ccc")
                DW_AT_decl_file ("odr.cc")
0x0000002f:   DW_TAG_structure_type
                DW_AT_name      ("aaa")
                DW_AT_decl_file ("odr.h")
0x0000004b:   DW_TAG_structure_type
                DW_AT_name      ("bbb")
                DW_AT_decl_file ("odr.cc")
...
The DW_AT_decl_file for struct bbb should be odr-2.cc.
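A DW_AT_decl_file value is only an index into the file table of the unit it is
read against, so the same number can name different files in different CUs.
The hedged C sketch below models that; it is not dwz code, and the tables
cu1_files/cu2_files, their contents and indices, and the helpers
resolve_decl_file/same_file_p are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical file tables for the two CUs; index 0 is left unused,
   matching the pre-DWARF 5 convention that 0 means "no file".  */
const char *const cu1_files[] = { NULL, "odr.h", "odr.cc" };
const char *const cu2_files[] = { NULL, "odr.h", "odr-2.cc" };

/* Resolve a DW_AT_decl_file index against FILES (NFILES entries).
   Returns NULL for index 0 or an out-of-range index.  */
const char *
resolve_decl_file (const char *const *files, size_t nfiles, unsigned idx)
{
  assert (files != NULL);
  if (idx == 0 || idx >= nfiles)
    return NULL;
  return files[idx];
}

/* Nonzero if index IDX names the same file in tables A and B.  */
int
same_file_p (const char *const *a, size_t na,
             const char *const *b, size_t nb, unsigned idx)
{
  const char *fa = resolve_decl_file (a, na, idx);
  const char *fb = resolve_decl_file (b, nb, idx);
  return fa != NULL && fb != NULL && strcmp (fa, fb) == 0;
}
```

In this model, reading index 2 from a DIE of the second CU against cu1_files
yields odr.cc instead of odr-2.cc, the same kind of wrong answer as in the
llvm-dwarfdump output above.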

The problem is caused by odr: odr allows defs and decls to be part of the
same duplicate chain, which breaks the invariant that DIEs in a duplicate
chain are isomorphic.  During write_die, the first DIE in the chain is written
out as the representative copy, so having a decl first in the chain is
counterproductive.  We have reorder_dups to fix this: it detects whether a
duplicate chain starts with a decl and, if so, moves the first def before it.
However, this breaks another invariant: that for each partition, all
representative DIEs are from the same CU.  Consequently, the file table of the
partition may not match the DW_AT_decl_file number.

In other words, the reordered duplicate chain is in the wrong partition.
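A simplified model of a duplicate chain may make the reordering and its
detection easier to follow.  This is a sketch under stated assumptions, not
dwz's data structures: struct die, move_first_def_to_front and
chain_reordered_p are hypothetical.  It presumes chains start out in ascending
offset order, which is what the patch's check
d1->die_offset > d2->die_offset relies on; the patch applies that check to
td->die_nextdup and its successor, whereas the model takes the first chain
member directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DIE in a duplicate chain;
   dwz's real DIE structure has many more fields.  */
struct die
{
  unsigned offset;      /* .debug_info offset; chains start sorted by it.  */
  bool decl;            /* DW_AT_declaration present.  */
  struct die *nextdup;  /* Next DIE in the duplicate chain.  */
};

/* If the chain starts with a decl, move the first def to the front,
   in the spirit of reorder_dups.  Returns the new chain head.  */
struct die *
move_first_def_to_front (struct die *head)
{
  assert (head != NULL);
  if (!head->decl)
    return head;
  struct die *prev = head;
  for (struct die *d = head->nextdup; d != NULL; prev = d, d = d->nextdup)
    if (!d->decl)
      {
        prev->nextdup = d->nextdup;  /* Unlink the def...  */
        d->nextdup = head;           /* ...and put it in front.  */
        return d;
      }
  return head;  /* Only decls: nothing to reorder.  */
}

/* Detect a reordered chain the way the patch does: the offsets of the
   first two chain members are no longer ascending.  */
bool
chain_reordered_p (const struct die *head)
{
  const struct die *next = head != NULL ? head->nextdup : NULL;
  return head != NULL && next != NULL && head->offset > next->offset;
}
```

For a chain decl(10) -> decl(20) -> def(30) -> def(40), the model produces
def(30) -> decl(10) -> decl(20) -> def(40), and the first two offsets are now
descending, which is exactly the condition the patch tests for.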

Fix this by detecting reordered duplicate chains and falling back to the
"don't know" value 0 for DW_AT_decl_file (and likewise DW_AT_call_file).
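The fallback amounts to choosing 0 instead of the remapped file index before
the value is encoded in the attribute's form.  A hedged C sketch of those two
steps follows; decl_file_value and write_file_attr are hypothetical names, the
remap array stands in for line_htab_lookup, and values are written
little-endian for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* DWARF form codes (values as defined in the DWARF standard).  */
enum
{
  FORM_data2 = 0x05,
  FORM_data4 = 0x06,
  FORM_data1 = 0x0b,
  FORM_udata = 0x0f
};

/* Choose the value to emit for DW_AT_decl_file / DW_AT_call_file.
   REMAP is a hypothetical stand-in for line_htab_lookup: it maps an
   index in the source CU's file table to one in the output table.  */
unsigned
decl_file_value (const unsigned *remap, size_t nremap, unsigned value,
                 bool file_zero_p)
{
  if (file_zero_p)
    return 0;  /* Fallback: "don't know".  */
  if (value >= nremap)
    return 0;  /* Assumption of this sketch, not taken from dwz.  */
  return remap[value];
}

/* Write VALUE in FORM at PTR (little-endian); return the number of
   bytes written, or 0 for a form this sketch does not handle.  */
size_t
write_file_attr (unsigned char *ptr, int form, unsigned value)
{
  size_t n = 0;
  assert (ptr != NULL);
  switch (form)
    {
    case FORM_data1:
      ptr[n++] = value & 0xff;
      break;
    case FORM_data2:
      ptr[n++] = value & 0xff;
      ptr[n++] = (value >> 8) & 0xff;
      break;
    case FORM_data4:
      for (int i = 0; i < 4; i++)
        ptr[n++] = (value >> (8 * i)) & 0xff;
      break;
    case FORM_udata:  /* ULEB128.  */
      do
        {
          unsigned char byte = value & 0x7f;
          value >>= 7;
          if (value != 0)
            byte |= 0x80;  /* More bytes follow.  */
          ptr[n++] = byte;
        }
      while (value != 0);
      break;
    default:
      return 0;
    }
  return n;
}
```

Because 0 fits in every form, the fallback never changes the size of the
attribute, so the rest of the DIE layout is unaffected.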

Having done that in single-file mode, we need to do the same during
write-multifile, otherwise the corresponding DIEs in 1 and 3 won't match.
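Putting the two modes together, the decision the patch adds to write_die can be
restated as a single predicate.  This is a hedged restatement, not dwz code:
force_file_zero_p is a hypothetical name, and the dwz globals odr,
multifile_mode and wr_multifile are passed in as plain arguments.

```c
#include <stdbool.h>

/* Attribute codes from the DWARF standard.  */
enum
{
  AT_decl_file = 0x3a,
  AT_call_file = 0x58
};

/* Hypothetical summary of when the patch forces a file attribute to 0.
   CHAIN_REORDERED is the outcome of the die_offset check on the
   duplicate chain of the enclosing top-level DIE.  */
bool
force_file_zero_p (bool odr, unsigned multifile_mode, bool wr_multifile,
                   unsigned attr, bool chain_reordered)
{
  /* Applies in plain single-file mode and while writing the multifile,
     so that both copies of a DIE end up with the same value.  */
  bool mode_ok = odr && (multifile_mode == 0 || wr_multifile);
  bool file_attr = attr == AT_decl_file || attr == AT_call_file;
  return mode_ok && file_attr && chain_reordered;
}
```

In the patch, wr_multifile and op_multifile also enter the file-attribute
rewriting branch independently of this predicate; the predicate only decides
whether the value written there is 0 or the result of line_htab_lookup.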

Thanks,
- Tom

Fix DW_AT_decl_file for odr

2021-02-19  Tom de Vries  <tdevries@suse.de>

	PR dwz/27438
	* dwz.c (write_die): Handle DW_AT_decl_file when writing out DIE
	belonging to top-level DIE that is part of a reordered duplicate
	chain.
	* testsuite/dwz.tests/odr-struct-multifile.sh: New test.

---
 dwz.c                                       | 31 +++++++++++++++--
 testsuite/dwz.tests/odr-struct-multifile.sh | 53 +++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/dwz.c b/dwz.c
index 076f39c..5e6064c 100644
--- a/dwz.c
+++ b/dwz.c
@@ -12204,7 +12204,31 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 	  while (form == DW_FORM_indirect)
 	    form = read_uleb128 (inptr);
 
-	  if (unlikely (wr_multifile || op_multifile)
+	  bool file_zero_p = false;
+	  if (unlikely (odr && (multifile_mode == 0 || wr_multifile))
+	      && (reft->attr[i].attr == DW_AT_decl_file
+		  || reft->attr[i].attr == DW_AT_call_file))
+	    {
+	      dw_die_ref td = die;
+	      while (td->die_toplevel == 0)
+		td = td->die_parent;
+
+	      dw_die_ref d1 = td->die_nextdup;
+	      dw_die_ref d2 = d1 ? d1->die_nextdup : NULL;
+	      if (d1 && d2 && d1->die_offset > d2->die_offset)
+		/* This DIE belongs to a top-level DIE that's part of a
+		   duplicate chain that was reordered.  Consequently,
+		   when doing single-file optimization, the value of this
+		   attribute refers to a different file table than is being
+		   used, so we fall back to the "don't know" case (though
+		   it might accidentally be correct once in a while).
+		   Having done that, we need to do the same for wr_multifile,
+		   otherwise the DIE in the optimized single-file won't match
+		   with the multifile.  */
+		file_zero_p = true;
+	    }
+
+	  if (unlikely (wr_multifile || op_multifile || file_zero_p)
 	      && (reft->attr[i].attr == DW_AT_decl_file
 		  || reft->attr[i].attr == DW_AT_call_file))
 	    {
@@ -12232,7 +12256,10 @@ write_die (unsigned char *ptr, dw_cu_ref cu, dw_die_ref die,
 		  continue;
 		default: abort ();
 		}
-	      value = line_htab_lookup (refcu, value);
+	      if (file_zero_p)
+		value = 0;
+	      else
+		value = line_htab_lookup (refcu, value);
 	      switch (t->attr[j].form)
 		{
 		  case DW_FORM_data1: write_8 (ptr, value); break;
diff --git a/testsuite/dwz.tests/odr-struct-multifile.sh b/testsuite/dwz.tests/odr-struct-multifile.sh
new file mode 100644
index 0000000..cc462c9
--- /dev/null
+++ b/testsuite/dwz.tests/odr-struct-multifile.sh
@@ -0,0 +1,53 @@
+if ! $execs/dwz-for-test --odr -v 2>/dev/null; then
+    exit 77
+fi
+
+cp $execs/odr-struct 1
+cp 1 2
+
+for name in aaa bbb ccc; do
+    cnt=$(readelf -wi 1 | grep -c "DW_AT_name.*:.*$name" || true)
+    [ $cnt -eq 2 ]
+done
+
+for name in member_one member_two member_three member_four; do
+    cnt=$(readelf -wi 1 | grep -c "DW_AT_name.*:.*$name" || true)
+    case $name in
+	member_one|member_two)
+	    [ $cnt -eq 2 ]
+	    ;;
+	member_three|member_four)
+	    [ $cnt -eq 1 ]
+	    ;;
+	esac
+done
+
+decl_cnt=$(readelf -wi 1 | grep -c "DW_AT_declaration" || true)
+
+$execs/dwz-for-test --odr 1 2 -m 3
+
+verify-dwarf.sh 1
+verify-dwarf.sh 3
+
+for name in aaa bbb ccc; do
+    cnt=$(readelf -wi 3 | grep -c "DW_AT_name.*:.*$name" || true)
+    [ $cnt -eq 1 ]
+done
+
+for name in member_one member_two member_three member_four; do
+    cnt=$(readelf -wi 3 | grep -c "DW_AT_name.*:.*$name" || true)
+    [ $cnt -eq 1 ]
+done
+
+
+for name in aaa bbb ccc; do
+    cnt=$(readelf -wi 1 | grep -c "DW_AT_name.*:.*$name" || true)
+    [ $cnt -eq 0 ]
+done
+
+for name in member_one member_two member_three member_four; do
+    cnt=$(readelf -wi 1 | grep -c "DW_AT_name.*:.*$name" || true)
+    [ $cnt -eq 0 ]
+done
+
+rm -f 1 2 3