From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vries@sourceware.org>
Received: by sourceware.org (Postfix, from userid 2205)
 id C6F36385829F; Tue, 12 Jul 2022 10:20:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C6F36385829F
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
From: Tom de Vries <vries@sourceware.org>
To: gdb-cvs@sourceware.org
Subject: [binutils-gdb] Fix -fsanitize=thread for per_cu fields
X-Act-Checkin: binutils-gdb
X-Git-Author: Tom de Vries <tdevries@suse.de>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 68c0faca76a9b179af1a098946d6020091f43e06
X-Git-Newrev: ac3972d81f1065a156daa0da97320262845c2469
Message-Id: <20220712102022.C6F36385829F@sourceware.org>
Date: Tue, 12 Jul 2022 10:20:22 +0000 (GMT)
X-BeenThere: gdb-cvs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-cvs mailing list <gdb-cvs.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-cvs>,
 <mailto:gdb-cvs-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-cvs/>
List-Help: <mailto:gdb-cvs-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-cvs>,
 <mailto:gdb-cvs-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Jul 2022 10:20:22 -0000

https://sourceware.org/git/gitweb.cgi?p=3Dbinutils-gdb.git;h=3Dac3972d81f10=
65a156daa0da97320262845c2469

commit ac3972d81f1065a156daa0da97320262845c2469
Author: Tom de Vries <tdevries@suse.de>
Date:   Tue Jul 12 12:20:13 2022 +0200

    Fix -fsanitize=3Dthread for per_cu fields
   =20
    When building gdb with -fsanitize=3Dthread and gcc 12, and running test=
-case
    gdb.dwarf2/dwz.exp, we run into a data race between:
    ...
      Read of size 1 at 0x7b200000300d by thread T2:^M
        #0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile=
*, \
        abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6=
164 \
        (gdb+0x82ec95)^M
    ...
    and:
    ...
      Previous write of size 1 at 0x7b200000300d by main thread:^M
        #0 prepare_one_comp_unit gdb/dwarf2/read.c:23588 (gdb+0x86f973)^M
    ...
   =20
    In other words, between:
    ...
      if (this_cu->reading_dwo_directly)
    ...
    and:
    ...
        cu->per_cu->lang =3D pretend_language;
    ...
   =20
    Likewise, we run into a data race between:
    ...
      Write of size 1 at 0x7b200000300e by thread T4:
        #0 process_psymtab_comp_unit gdb/dwarf2/read.c:6789 (gdb+0x830720)
    ...
    and:
    ...
      Previous read of size 1 at 0x7b200000300e by main thread:
        #0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile=
*, \
        abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6=
164 \
        (gdb+0x82edab)
    ...
   =20
    In other words, between:
    ...
          this_cu->unit_type =3D DW_UT_partial;
    ...
    and:
    ...
      if (this_cu->reading_dwo_directly)
    ...
   =20
    Likewise for the write to addresses_seen in cooked_indexer::check_bound=
s and a
    read from is_dwz in dwarf2_find_containing_comp_unit for test-case
    gdb.dwarf2/dw2-dir-file-name.exp and target board cc-with-dwz-m.
   =20
    The problem is that the written fields are part of the same memory loca=
tion as
    the read fields, so executing a read and write in different threads is
    undefined behavour.
   =20
    Making the written fields separate memory locations, using the new
    struct packed template fixes this.
   =20
    The set of fields has been established experimentally to be the
    minimal set to get rid of this type of -fsanitize=3Dthread errors, but
    more fields might require the same treatment.
   =20
    Looking at the properties of the lang field, unlike dwarf_version it's
    not available in the unit header, so it will be set the first time
    during the parallel cooked index reading.  The same holds for
    unit_type, and likewise for addresses_seen.
   =20
    dwarf2_per_cu_data::addresses_seen is moved so that the bitfields that
    currently follow it can be merged in the same memory location as the
    bitfields that currently precede it, for better packing.
   =20
    Tested on x86_64-linux.
   =20
    Co-Authored-By: Pedro Alves <pedro@palves.net>
   =20
    Change-Id: Ifa94f0a2cebfae5e8f6ddc73265f05e7fd9e1532

Diff:
---
 gdb/defs.h        |  3 +++
 gdb/dwarf2/read.h | 20 +++++++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gdb/defs.h b/gdb/defs.h
index 99bfdd526ff..19f379d6588 100644
--- a/gdb/defs.h
+++ b/gdb/defs.h
@@ -232,6 +232,9 @@ enum language
 #define LANGUAGE_BITS 5
 gdb_static_assert (nr_languages <=3D (1 << LANGUAGE_BITS));
=20
+/* The number of bytes needed to represent all languages.  */
+#define LANGUAGE_BYTES ((LANGUAGE_BITS + HOST_CHAR_BIT - 1) / HOST_CHAR_BI=
T)
+
 enum precision_type
   {
     single_precision,
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index 5e0ce5e91a2..707dca9ee57 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -33,6 +33,7 @@
 #include "gdbsupport/gdb_obstack.h"
 #include "gdbsupport/hash_enum.h"
 #include "gdbsupport/function-view.h"
+#include "gdbsupport/packed.h"
=20
 /* Hold 'maintenance (set|show) dwarf' commands.  */
 extern struct cmd_list_element *set_dwarf_cmdlist;
@@ -105,11 +106,8 @@ struct dwarf2_per_cu_data
       reading_dwo_directly (false),
       tu_read (false),
       m_header_read_in (false),
-      addresses_seen (false),
       mark (false),
       files_read (false),
-      m_unit_type {},
-      m_lang (language_unknown),
       scanned (false)
   {
   }
@@ -162,10 +160,6 @@ public:
      it private at the moment.  */
   mutable bool m_header_read_in : 1;
=20
-  /* If addresses have been read for this CU (usually from
-     .debug_aranges), then this flag is set.  */
-  bool addresses_seen : 1;
-
   /* A temporary mark bit used when iterating over all CUs in
      expand_symtabs_matching.  */
   unsigned int mark : 1;
@@ -174,12 +168,20 @@ public:
      point in trying to read it again next time.  */
   bool files_read : 1;
=20
+  /* Wrap the following in struct packed instead of bitfields to avoid
+     data races when the bitfields end up on the same memory location
+     (per C++ memory model).  */
+
+  /* If addresses have been read for this CU (usually from
+     .debug_aranges), then this flag is set.  */
+  packed<bool, 1> addresses_seen =3D false;
+
 private:
   /* The unit type of this CU.  */
-  ENUM_BITFIELD (dwarf_unit_type) m_unit_type : 8;
+  packed<dwarf_unit_type, 1> m_unit_type =3D (dwarf_unit_type) 0;
=20
   /* The language of this CU.  */
-  ENUM_BITFIELD (language) m_lang : LANGUAGE_BITS;
+  packed<language, LANGUAGE_BYTES> m_lang =3D language_unknown;
=20
 public:
   /* True if this CU has been scanned by the indexer; false if