On 7/6/22 21:20, Pedro Alves wrote: > On 2022-07-04 8:45 p.m., Tom de Vries via Gdb-patches wrote: >> On 7/4/22 20:32, Tom Tromey wrote: >>>>>>>> "Tom" == Tom de Vries writes: >>> >>> Tom>  /* The number of bits needed to represent all languages, with enough >>> Tom>     padding to allow for reasonable growth.  */ >>> Tom> -#define LANGUAGE_BITS 5 >>> Tom> +#define LANGUAGE_BITS 8 >>> >>> This will negatively affect the size of symbols and so I think it should >>> be avoided. >>> >> >> Ack, Pedro suggested a way to avoid this: >> ... >> +  struct { >> +    /* The language of this CU.  */ >> +    ENUM_BITFIELD (language) m_lang : LANGUAGE_BITS; >> +  }; >> ... >> > > It actually doesn't avoid it in this case, We were merely discussing the usage of LANGUAGE_BITS for general_symbol_info::m_language, and indeed using the "struct { ... };" approach avoids changing the LANGUAGE_BITS and introducing a penalty on symbol size (which is a more numerous entity than CUs). Still, of course it's also good to keep the dwarf2_per_cu_data struct as small as possible, so thanks for looking into this. > as the following field will end up > moved to the next byte, so if LANGUAGE_BITS is 5, we'll end up with 3 bits gap. > > Actually, it's worse than that -- it will align m_lang to ENUM_BITFIELD(language)'s > natural alignment, so it can introduce byte padding before and after too. :-/ :-( > > We can see it with "ptype /o" after applying your patch using the struct{} > trick. Note the "3-byte padding" below: > > ... > /* 13: 5 | 1 */ bool m_header_read_in : 1; > /* XXX 2-bit hole */ > /* 14 | 1 */ struct { > /* 14: 0 | 1 */ bool addresses_seen : 1; > /* XXX 7-bit padding */ > > /* total size (bytes): 1 */ > }; > /* 15: 0 | 4 */ unsigned int mark : 1; > /* 15: 1 | 1 */ bool files_read : 1; > /* XXX 6-bit hole */ > /* 16 | 4 */ struct { > /* 16: 0 | 4 */ dwarf_unit_type unit_type : 8; > /* XXX 3-byte padding */ <<<<<< 3 bytes > > /* total size (bytes): 4 */ > }; > /* 20 | 4 */ struct { > /* 20: 0 | 4 */ language lang : 5; > /* XXX 3-bit padding */ > /* XXX 3-byte padding */ <<<<<< 3 bytes > > /* total size (bytes): 4 */ > }; > ... > > > > So, maybe we really want something else... How about this alternative patch below? > > I wrote the new struct packed template using the array for storage before I > wrote the alternative to use __attribute__((packed)), so I left the array > version in there, as a pure standards-conforming implementation. We can just > remove that array implementation completely, and use the ATTRIBUTE_PACKED macro > if you don't think it's worth it. All compilers worth their salt support attribute > packed, or something like it, I believe. The resulting gdbsupport/pack.h would > be quite smaller. > > I tested this on x86-64 Ubuntu 20.04, both attribute packed no attribute > versions, saw no regressions. Also smoke-tested with Clang (which uses the > attribute packed implementation too, as it defines __GNUC__). > > I have not actually tested this with -fsanitize=thread, though. Would you > be up for testing that, Tom, if this approach looks reasonable? > Yes, of course. I've applied the patch and then started with my latest approach which avoid locks and uses atomics: ... diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h index f98d8b27649..bc1af0ec2d3 100644 --- a/gdb/dwarf2/read.h +++ b/gdb/dwarf2/read.h @@ -108,6 +108,7 @@ struct dwarf2_per_cu_data m_header_read_in (false), mark (false), files_read (false), + m_lang (language_unknown), scanned (false) { } @@ -180,7 +181,7 @@ struct dwarf2_per_cu_data packed m_unit_type = (dwarf_unit_type) 0; /* The language of this CU. */ - packed m_lang = language_unknown; + std::atomic m_lang __attribute__((packed)); public: /* True if this CU has been scanned by the indexer; false if @@ -332,11 +333,13 @@ struct dwarf2_per_cu_data void set_lang (enum language lang) { - /* We'd like to be more strict here, similar to what is done in - set_unit_type, but currently a partial unit can go from unknown to - minimal to ada to c. */ - if (m_lang != lang) - m_lang = lang; + enum language nope = language_unknown; + if (m_lang.compare_exchange_strong (nope, lang)) + return; + nope = lang; + if (m_lang.compare_exchange_strong (nope, lang)) + return; + gdb_assert_not_reached (); } /* Free any cached file names. */ ... I've tried both: ... packed, LANGUAGE_BYTES> m_lang = language_unknown; ... and: ... std::atomic> m_lang = language_unknown; ... and both give compilation errors: ... src/gdb/dwarf2/read.h:184:58: error: could not convert ‘language_unknown’ from ‘language’ to ‘std::atomic >’ std::atomic> m_lang = language_unknown; ^~~~~~~~~~~~~~~~ ... and: ... src/gdb/../gdbsupport/packed.h:84:47: error: bit-field ‘std::atomic packed, 1>::m_val’ with non-integral type ... Maybe one of the two should work and the pack template needs further changes, I'm not sure. Note btw that the attribute packed works here: ... + std::atomic m_lang __attribute__((packed)); ... in the sense that it's got alignment 1: ... struct atomic m_lang \ __attribute__((__aligned__(1))); /* 16 4 */ ... but given that there's no LANGUAGE_BITS/BYTES, we're back to size 4 for the m_lang field, and size 128 overall. So for now I've settled for: ... + std::atomic m_lang; ... which does get me back to size 120. WIP patch attached. Thanks, - Tom