Re: [PATCH] Introduce struct packed template, fix -fsanitize=thread for per_cu fields


From: Tom de Vries <tdevries@suse.de>
To: Pedro Alves <pedro@palves.net>, Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Introduce struct packed template, fix -fsanitize=thread for per_cu fields
Date: Thu, 7 Jul 2022 12:18:14 +0200
Message-ID: <4af2061e-9583-93fe-f33b-dcf6828ccee3@suse.de>
In-Reply-To: <bbf37493-9bc4-c88f-bddc-4b22bd1d1382@palves.net>

[-- Attachment #1: Type: text/plain, Size: 6434 bytes --]

On 7/6/22 21:20, Pedro Alves wrote:
> On 2022-07-04 8:45 p.m., Tom de Vries via Gdb-patches wrote:
>> On 7/4/22 20:32, Tom Tromey wrote:
>>>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>>>
>>> Tom>  /* The number of bits needed to represent all languages, with enough
>>> Tom>     padding to allow for reasonable growth.  */
>>> Tom> -#define LANGUAGE_BITS 5
>>> Tom> +#define LANGUAGE_BITS 8
>>>
>>> This will negatively affect the size of symbols and so I think it should
>>> be avoided.
>>>
>>
>> Ack, Pedro suggested a way to avoid this:
>> ...
>> +  struct {
>> +    /* The language of this CU.  */
>> +    ENUM_BITFIELD (language) m_lang : LANGUAGE_BITS;
>> +  };
>> ...
>>
> 
> It actually doesn't avoid it in this case,

We were merely discussing the use of LANGUAGE_BITS for 
general_symbol_info::m_language, and there the "struct { ... };" 
approach does avoid changing LANGUAGE_BITS, and thus avoids a penalty on 
symbol size (symbols being far more numerous than CUs).

Still, of course it's also good to keep the dwarf2_per_cu_data struct as 
small as possible, so thanks for looking into this.

> as the following field will end up
> moved to the next byte, so if LANGUAGE_BITS is 5, we'll end up with 3 bits gap.
> 
> Actually, it's worse than that -- it will align m_lang to ENUM_BITFIELD(language)'s
> natural alignment, so it can introduce byte padding before and after too.  :-/  :-(
> 
> We can see it with "ptype /o" after applying your patch using the struct{}
> trick.  Note the "3-byte padding" below:
> 
>   ...
>   /*     13: 5   |       1 */    bool m_header_read_in : 1;
>   /* XXX  2-bit hole       */
>   /*     14      |       1 */    struct {
>   /*     14: 0   |       1 */        bool addresses_seen : 1;
>   /* XXX  7-bit padding    */
>   
>                                      /* total size (bytes):    1 */
>                                  };
>   /*     15: 0   |       4 */    unsigned int mark : 1;
>   /*     15: 1   |       1 */    bool files_read : 1;
>   /* XXX  6-bit hole       */
>   /*     16      |       4 */    struct {
>   /*     16: 0   |       4 */        dwarf_unit_type unit_type : 8;
>   /* XXX  3-byte padding   */                                               <<<<<< 3 bytes
>   
>                                      /* total size (bytes):    4 */
>                                  };
>   /*     20      |       4 */    struct {
>   /*     20: 0   |       4 */        language lang : 5;
>   /* XXX  3-bit padding    */
>   /* XXX  3-byte padding   */                                               <<<<<< 3 bytes
> 
>                                      /* total size (bytes):    4 */
>                                  };
>   ...
> 
> 
> 
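
To make the padding effect described above concrete, here is a minimal
sketch, not GDB code.  All names (lang_sketch, flat_sketch,
wrapped_sketch, wrapper_cost) are hypothetical, and the enum merely
stands in for "enum language".  It compares adjacent bit-fields with the
same fields where the language bit-field is isolated in its own member
struct:

```cpp
#include <cstddef>

// Hypothetical stand-in for GDB's "enum language".
enum lang_sketch { lang_unknown, lang_c, lang_ada };

// All bit-fields are adjacent, so the compiler may pack them together.
struct flat_sketch
{
  bool header_read_in : 1;
  bool files_read : 1;
  lang_sketch lang : 5;
};

// Same fields, but the language bit-field sits in its own member struct.
// That struct takes the size and alignment of the bit-field's declared
// type, so it cannot share a byte with its neighbours.
struct wrapped_sketch
{
  bool header_read_in : 1;
  bool files_read : 1;
  struct
  {
    lang_sketch lang : 5;
  } l;
};

// Number of extra bytes the wrapper struct costs on the current ABI.
constexpr std::size_t
wrapper_cost ()
{
  return sizeof (wrapped_sketch) - sizeof (flat_sketch);
}
```

The exact sizes are ABI-dependent.  On targets that lay out bit-fields
the way the "ptype /o" output above suggests, wrapper_cost () should be
non-zero, which is the penalty being discussed.
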
> So, maybe we really want something else...  How about this alternative patch below?
> 
> I wrote the new struct packed template using the array for storage before I
> wrote the alternative to use __attribute__((packed)), so I left the array
> version in there, as a pure standards-conforming implementation.  We can just
> remove that array implementation completely, and use the ATTRIBUTE_PACKED macro
> if you don't think it's worth it.  All compilers worth their salt support attribute
> packed, or something like it, I believe.  The resulting gdbsupport/pack.h would
> be quite smaller.
> 
> I tested this on x86-64 Ubuntu 20.04, both the attribute packed and the no-attribute
> versions, and saw no regressions.  Also smoke-tested with Clang (which uses the
> attribute packed implementation too, as it defines __GNUC__).
> 
> I have not actually tested this with -fsanitize=thread, though.  Would you
> be up for testing that, Tom, if this approach looks reasonable?
> 

Yes, of course.

I've applied the patch and then started with my latest approach, which 
avoids locks and uses atomics:
...
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index f98d8b27649..bc1af0ec2d3 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -108,6 +108,7 @@ struct dwarf2_per_cu_data
        m_header_read_in (false),
        mark (false),
        files_read (false),
+      m_lang (language_unknown),
        scanned (false)
    {
    }
@@ -180,7 +181,7 @@ struct dwarf2_per_cu_data
    packed<dwarf_unit_type, 1> m_unit_type = (dwarf_unit_type) 0;

    /* The language of this CU.  */
-  packed<language, LANGUAGE_BYTES> m_lang = language_unknown;
+  std::atomic<language> m_lang __attribute__((packed));

  public:
    /* True if this CU has been scanned by the indexer; false if
@@ -332,11 +333,13 @@ struct dwarf2_per_cu_data

    void set_lang (enum language lang)
    {
-    /* We'd like to be more strict here, similar to what is done in
-       set_unit_type,  but currently a partial unit can go from unknown to
-       minimal to ada to c.  */
-    if (m_lang != lang)
-      m_lang = lang;
+    enum language nope = language_unknown;
+    if (m_lang.compare_exchange_strong (nope, lang))
+      return;
+    nope = lang;
+    if (m_lang.compare_exchange_strong (nope, lang))
+      return;
+    gdb_assert_not_reached ();
    }

    /* Free any cached file names.  */
...
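
For readers who want to try the compare-and-exchange logic outside of
GDB, here is a self-contained sketch of the same idea.  The names
(lang_sketch, lang_container_sketch, per_cu_sketch) are made up for
illustration, and the sketch returns false at the point where the diff
above calls gdb_assert_not_reached:

```cpp
#include <atomic>

// Hypothetical stand-ins; these are not GDB's real types.
enum lang_sketch { lang_unknown = 0, lang_c, lang_ada };
using lang_container_sketch = unsigned char;

struct per_cu_sketch
{
  std::atomic<lang_container_sketch> m_lang
    {static_cast<lang_container_sketch> (lang_unknown)};

  // Returns true if the stored language equals LANG after the call.
  // Returns false where the patch would assert.
  bool set_lang (lang_sketch lang)
  {
    lang_container_sketch desired
      = static_cast<lang_container_sketch> (lang);
    lang_container_sketch expected
      = static_cast<lang_container_sketch> (lang_unknown);

    // First writer: unknown -> LANG.
    if (m_lang.compare_exchange_strong (expected, desired))
      return true;

    // Later writers must agree with the value already stored.
    expected = desired;
    return m_lang.compare_exchange_strong (expected, desired);
  }

  lang_sketch lang () const
  {
    return static_cast<lang_sketch> (m_lang.load ());
  }
};
```

With this sketch, setting the same language twice succeeds both times,
while a later attempt to set a different language is rejected and leaves
the stored value untouched.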

I've tried both:
...
   packed<std::atomic<language>, LANGUAGE_BYTES> m_lang
     = language_unknown;
...
and:
...
   std::atomic<packed<language, LANGUAGE_BYTES>> m_lang
     = language_unknown;
...
and both give compilation errors.  The std::atomic<packed<...>> variant gives:
...
src/gdb/dwarf2/read.h:184:58: error: could not convert 
‘language_unknown’ from ‘language’ to ‘std::atomic<packed<language, 1> >’
    std::atomic<packed<language, LANGUAGE_BYTES>> m_lang = language_unknown;
                                                           ^~~~~~~~~~~~~~~~
...
and the packed<std::atomic<...>> variant gives:
...
src/gdb/../gdbsupport/packed.h:84:47: error: bit-field 
‘std::atomic<language> packed<std::atomic<language>, 1>::m_val’ with 
non-integral type
...

Maybe one of the two should work and the packed template needs further 
changes; I'm not sure.
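
A possible reading of the two errors, as a sketch under assumptions
rather than a statement about the real gdbsupport template: a bit-field
must have integral or enumeration type, so wrapping std::atomic inside a
bit-field-based packed cannot work.  The other nesting most likely fails
only because "= language_unknown" is copy-initialization, which would
need two user-defined conversions (language to packed, then packed to
std::atomic).  The hypothetical packed_sketch below uses byte-array
storage and direct-initialization instead:

```cpp
#include <atomic>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for GDB's "enum language".
enum lang_sketch { lang_unknown = 0, lang_c, lang_ada };

// Hypothetical byte-array-backed packed type; not GDB's implementation.
// Only meant for enums and non-negative integers.
template<typename T, std::size_t Bytes>
struct packed_sketch
{
  static_assert (Bytes <= sizeof (T), "storage must not exceed T");

  packed_sketch () noexcept = default;

  // Store the low Bytes bytes of VAL, least significant byte first.
  packed_sketch (T val) noexcept
  {
    unsigned long long bits = static_cast<unsigned long long> (val);
    for (std::size_t i = 0; i < Bytes; ++i)
      m_bytes[i] = static_cast<unsigned char> (bits >> (8 * i));
  }

  // Reassemble the value from the stored bytes.
  operator T () const noexcept
  {
    unsigned long long bits = 0;
    for (std::size_t i = 0; i < Bytes; ++i)
      bits |= static_cast<unsigned long long> (m_bytes[i]) << (8 * i);
    return static_cast<T> (bits);
  }

private:
  unsigned char m_bytes[Bytes];
};

using packed_lang_sketch = packed_sketch<lang_sketch, 1>;

// std::atomic<T> requires a trivially copyable T.
static_assert (std::is_trivially_copyable<packed_lang_sketch>::value,
	       "packed_sketch must be trivially copyable");

struct per_cu_sketch
{
  // Direct-list-initialization: only lang_sketch -> packed_lang_sketch
  // is a user-defined conversion here.
  std::atomic<packed_lang_sketch> m_lang {lang_unknown};

  void set_lang (lang_sketch l) { m_lang.store (l); }
  lang_sketch lang () const { return m_lang.load (); }
};
```

Whether the real packed template can be made to work this way is a
separate question; the sketch only shows that the nesting itself is not
ruled out by std::atomic.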

Note btw that the attribute packed works here:
...
+  std::atomic<language> m_lang __attribute__((packed));
...
in the sense that it's got alignment 1:
...
         struct atomic<language>    m_lang \
           __attribute__((__aligned__(1))); /*    16     4 */
...
but since nothing limits the field to LANGUAGE_BITS/BYTES, we're back to 
size 4 for the m_lang field, and size 128 overall.

So for now I've settled for:
...
+  std::atomic<LANGUAGE_CONTAINER> m_lang;
...
which does get me back to size 120.
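
As a rough illustration of why the narrower container helps, here is a
standalone sketch.  The names are hypothetical, the enum is assumed to be
4 bytes wide like the one in the "ptype" output quoted earlier, and
bytes_for_bits mirrors the round-up arithmetic of the LANGUAGE_BYTES
macro:

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for GDB's "enum language", assumed 4 bytes wide.
enum lang_sketch : unsigned int { lang_unknown = 0, lang_c, lang_ada };

// Round BITS up to whole bytes, as LANGUAGE_BYTES does for LANGUAGE_BITS.
constexpr std::size_t
bytes_for_bits (std::size_t bits, std::size_t char_bit = 8)
{
  return (bits + char_bit - 1) / char_bit;
}

static_assert (bytes_for_bits (5) == 1, "5 bits fit in one byte");
static_assert (bytes_for_bits (9) == 2, "9 bits need two bytes");

// Hypothetical counterpart of LANGUAGE_CONTAINER.
using container_sketch = unsigned char;
static_assert (sizeof (container_sketch) >= bytes_for_bits (5),
	       "container too small for the language bits");

// An atomic of the enum itself takes the enum's full size and alignment.
struct wide_cu_sketch
{
  bool mark;
  std::atomic<lang_sketch> m_lang;
  bool scanned;
};

// An atomic of the one-byte container packs next to its neighbours.
struct narrow_cu_sketch
{
  bool mark;
  std::atomic<container_sketch> m_lang;
  bool scanned;
};
```

The sizes of these two toy structs are not the 120 and 128 mentioned
above; they only show the direction of the effect.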

WIP patch attached.

Thanks,
- Tom


[-- Attachment #2: 0002-fix.patch --]
[-- Type: text/x-patch, Size: 2202 bytes --]

fix

---
 gdb/defs.h        |  3 +++
 gdb/dwarf2/read.h | 22 ++++++++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/gdb/defs.h b/gdb/defs.h
index 19f379d6588..c129bf633a1 100644
--- a/gdb/defs.h
+++ b/gdb/defs.h
@@ -235,6 +235,9 @@ gdb_static_assert (nr_languages <= (1 << LANGUAGE_BITS));
 /* The number of bytes needed to represent all languages.  */
 #define LANGUAGE_BYTES ((LANGUAGE_BITS + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT)
 
+#define LANGUAGE_CONTAINER unsigned char
+gdb_static_assert (sizeof (LANGUAGE_CONTAINER) >= LANGUAGE_BYTES);
+
 enum precision_type
   {
     single_precision,
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index f98d8b27649..f4362fe3ede 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -108,6 +108,7 @@ struct dwarf2_per_cu_data
       m_header_read_in (false),
       mark (false),
       files_read (false),
+      m_lang (language_unknown),
       scanned (false)
   {
   }
@@ -180,7 +181,7 @@ struct dwarf2_per_cu_data
   packed<dwarf_unit_type, 1> m_unit_type = (dwarf_unit_type) 0;
 
   /* The language of this CU.  */
-  packed<language, LANGUAGE_BYTES> m_lang = language_unknown;
+  std::atomic<LANGUAGE_CONTAINER> m_lang;
 
 public:
   /* True if this CU has been scanned by the indexer; false if
@@ -326,17 +327,22 @@ struct dwarf2_per_cu_data
 
   enum language lang () const
   {
-    gdb_assert (m_lang != language_unknown);
-    return m_lang;
+    LANGUAGE_CONTAINER lc = m_lang;
+    enum language l = (enum language)lc;
+    gdb_assert (l != language_unknown);
+    return l;
   }
 
   void set_lang (enum language lang)
   {
-    /* We'd like to be more strict here, similar to what is done in
-       set_unit_type,  but currently a partial unit can go from unknown to
-       minimal to ada to c.  */
-    if (m_lang != lang)
-      m_lang = lang;
+    LANGUAGE_CONTAINER lc = (LANGUAGE_CONTAINER)lang;
+    LANGUAGE_CONTAINER nope = (LANGUAGE_CONTAINER)language_unknown;
+    if (m_lang.compare_exchange_strong (nope, lc))
+      return;
+    nope = lang;
+    if (m_lang.compare_exchange_strong (nope, lc))
+      return;
+    gdb_assert_not_reached ();
   }
 
   /* Free any cached file names.  */

