On 2022-07-07 11:18 a.m., Tom de Vries wrote: > On 7/6/22 21:20, Pedro Alves wrote: >> On 2022-07-04 8:45 p.m., Tom de Vries via Gdb-patches wrote: >>> On 7/4/22 20:32, Tom Tromey wrote: >>>>>>>>> "Tom" == Tom de Vries writes: >>>> >>>> Tom>  /* The number of bits needed to represent all languages, with enough >>>> Tom>     padding to allow for reasonable growth.  */ >>>> Tom> -#define LANGUAGE_BITS 5 >>>> Tom> +#define LANGUAGE_BITS 8 >>>> >>>> This will negatively affect the size of symbols and so I think it should >>>> be avoided. >>>> >>> >>> Ack, Pedro suggested a way to avoid this: >>> ... >>> +  struct { >>> +    /* The language of this CU.  */ >>> +    ENUM_BITFIELD (language) m_lang : LANGUAGE_BITS; >>> +  }; >>> ... >>> >> >> It actually doesn't avoid it in this case, > > We were merely discussing the usage of LANGUAGE_BITS for general_symbol_info::m_language, and indeed using the "struct { ... };" approach avoids changing the LANGUAGE_BITS and introducing a penalty on symbol size (which is a more numerous entity than CUs). > Yeah, sorry, I realized it after sending and decided I'd deserve the incoming cluebat. :-) > Still, of course it's also good to keep the dwarf2_per_cu_data struct as small as possible, so thanks for looking into this. It's that, but also the desire to settle on some infrastructure or approach that we can reuse going forward. >> I have not actually tested this with -fsanitize=thread, though.  Would you >> be up for testing that, Tom, if this approach looks reasonable? >> > > Yes, of course. > > I've applied the patch and then started with my latest approach which avoid locks and uses atomics: Thanks. > ... > diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h > index f98d8b27649..bc1af0ec2d3 100644 > --- a/gdb/dwarf2/read.h > +++ b/gdb/dwarf2/read.h > @@ -108,6 +108,7 @@ struct dwarf2_per_cu_data >        m_header_read_in (false), >        mark (false), >        files_read (false), > +      m_lang (language_unknown), >        scanned (false) >    { >    } > @@ -180,7 +181,7 @@ struct dwarf2_per_cu_data >    packed m_unit_type = (dwarf_unit_type) 0; > >    /* The language of this CU.  */ > -  packed m_lang = language_unknown; > +  std::atomic m_lang __attribute__((packed)); > >  public: >    /* True if this CU has been scanned by the indexer; false if > @@ -332,11 +333,13 @@ struct dwarf2_per_cu_data > >    void set_lang (enum language lang) >    { > -    /* We'd like to be more strict here, similar to what is done in > -       set_unit_type,  but currently a partial unit can go from unknown to > -       minimal to ada to c.  */ > -    if (m_lang != lang) > -      m_lang = lang; > +    enum language nope = language_unknown; > +    if (m_lang.compare_exchange_strong (nope, lang)) > +      return; > +    nope = lang; > +    if (m_lang.compare_exchange_strong (nope, lang)) > +      return; > +    gdb_assert_not_reached (); >    } > >    /* Free any cached file names.  */ > ... > > I've tried both: > ... >   packed, LANGUAGE_BYTES> m_lang >     = language_unknown; > ... > and: > ... >   std::atomic> m_lang >     = language_unknown; > ... > and both give compilation errors: > ... > src/gdb/dwarf2/read.h:184:58: error: could not convert ‘language_unknown’ from ‘language’ to ‘std::atomic >’ >    std::atomic> m_lang = language_unknown; >                                                           ^~~~~~~~~~~~~~~~ > ... > and: > ... > src/gdb/../gdbsupport/packed.h:84:47: error: bit-field ‘std::atomic packed, 1>::m_val’ with non-integral type > ... > > Maybe one of the two should work and the pack template needs further changes, I'm not sure. Yes, I think std::atomic> should work. We need to write the initialized using {}, like this: std::atomic> m_lang {language_unknown}; and then we run into errors when comparing m_lang with enum language. That is because the preexisting operator==/operator!= would require converting from enum language to packed, and then from packed to std::atomic>. That is two implicit conversions, but C++ only does one automatically. We can fix that by adding some operator==/operator!= implementations. I've done that in patch #1 attached. I've also ditched the non-attribute-packed implementation. > > Note btw that the attribute packed works here: > ... > +  std::atomic m_lang __attribute__((packed)); > ... > in the sense that it's got alignment 1: > ... >         struct atomic    m_lang \ >           __attribute__((__aligned__(1))); /*    16     4 */ > ... > but given that there's no LANGUAGE_BITS/BYTES, we're back to size 4 for the m_lang field, and size 128 overall. > > So for now I've settled for: > ... > +  std::atomic m_lang; > ... > which does get me back to size 120. > > WIP patch attached. Please find attached 3 patches: #1 - Introduce struct packed template #2 - your original patch, but using struct packed, split to a separate patch. commit log updated. #3 - a version of your std::atomic WIP patch that uses std::atomic Patches #1 and #2 pass the testsuite cleanly for me. Patch #3 compiles, but runs into a couple regressions due to the gdb_assert_not_reached in set_lang being reached. I am not surprised since that set_lang code in your patch looked WIP and I just blindly converted to the new approach to show the code compiles.