From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tdevries@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
 by sourceware.org (Postfix) with ESMTPS id BAAA33857361
 for <gdb-patches@sourceware.org>; Tue, 12 Jul 2022 10:22:47 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BAAA33857361
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by smtp-out1.suse.de (Postfix) with ESMTPS id ED75E2242F;
 Tue, 12 Jul 2022 10:22:46 +0000 (UTC)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D35CE13638;
 Tue, 12 Jul 2022 10:22:46 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
 by imap2.suse-dmz.suse.de with ESMTPSA id 9UWBMnZLzWLMaQAAMHmgww
 (envelope-from <tdevries@suse.de>); Tue, 12 Jul 2022 10:22:46 +0000
Message-ID: <ad4f8692-48ba-2c88-06b0-8f187cf4ec73@suse.de>
Date: Tue, 12 Jul 2022 12:22:46 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.9.0
Subject: Re: [PATCH] Introduce struct packed template, fix -fsanitize=thread
 for per_cu fields
Content-Language: en-US
From: Tom de Vries <tdevries@suse.de>
To: Pedro Alves <pedro@palves.net>, Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org
References: <20220629152914.13149-1-tdevries@suse.de>
 <20220629152914.13149-3-tdevries@suse.de> <8735fgvie4.fsf@tromey.com>
 <8c6ba70f-305d-c3ed-591d-36f1f38d52a6@suse.de>
 <bbf37493-9bc4-c88f-bddc-4b22bd1d1382@palves.net>
 <4af2061e-9583-93fe-f33b-dcf6828ccee3@suse.de>
 <40ba7002-69a0-7a8c-018f-f82c5698bfbb@palves.net>
 <5ef32bf1-067c-c43d-b786-ab4077d8dedd@suse.de>
In-Reply-To: <5ef32bf1-067c-c43d-b786-ab4077d8dedd@suse.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Jul 2022 10:22:50 -0000

On 7/8/22 16:54, Tom de Vries wrote:
> On 7/7/22 17:26, Pedro Alves wrote:
>> On 2022-07-07 11:18 a.m., Tom de Vries wrote:
>>> On 7/6/22 21:20, Pedro Alves wrote:
>>>> On 2022-07-04 8:45 p.m., Tom de Vries via Gdb-patches wrote:
>>>>> On 7/4/22 20:32, Tom Tromey wrote:
>>>>>>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>>>>>>
>>>>>> Tom>  /* The number of bits needed to represent all languages, 
>>>>>> with enough
>>>>>> Tom>     padding to allow for reasonable growth.  */
>>>>>> Tom> -#define LANGUAGE_BITS 5
>>>>>> Tom> +#define LANGUAGE_BITS 8
>>>>>>
>>>>>> This will negatively affect the size of symbols and so I think it 
>>>>>> should
>>>>>> be avoided.
>>>>>>
>>>>>
>>>>> Ack, Pedro suggested a way to avoid this:
>>>>> ...
>>>>> +  struct {
>>>>> +    /* The language of this CU.  */
>>>>> +    ENUM_BITFIELD (language) m_lang : LANGUAGE_BITS;
>>>>> +  };
>>>>> ...
>>>>>
>>>>
>>>> It actually doesn't avoid it in this case,
>>>
>>> We were merely discussing the usage of LANGUAGE_BITS for 
>>> general_symbol_info::m_language, and indeed using the "struct { ... 
>>> };" approach avoids changing the LANGUAGE_BITS and introducing a 
>>> penalty on symbol size (which is a more numerous entity than CUs).
>>>
>>
>> Yeah, sorry, I realized it after sending and decided I'd deserve the 
>> incoming cluebat.  :-)
>>
> 
> Heh :)
> 
>>> Still, of course it's also good to keep the dwarf2_per_cu_data struct 
>>> as small as possible, so thanks for looking into this.
>>
>> It's that, but also the desire to settle on some infrastructure or 
>> approach that we can reuse
>> going forward.
>>
> 
> Sure.
> 
>>
>>>> I have not actually tested this with -fsanitize=thread, though.  
>>>> Would you
>>>> be up for testing that, Tom, if this approach looks reasonable?
>>>>
>>>
>>> Yes, of course.
>>>
>>> I've applied the patch and then started with my latest approach which 
>>> avoid locks and uses atomics:
>>
>> Thanks.
>>
>>> ...
>>> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
>>> index f98d8b27649..bc1af0ec2d3 100644
>>> --- a/gdb/dwarf2/read.h
>>> +++ b/gdb/dwarf2/read.h
>>> @@ -108,6 +108,7 @@ struct dwarf2_per_cu_data
>>>         m_header_read_in (false),
>>>         mark (false),
>>>         files_read (false),
>>> +      m_lang (language_unknown),
>>>         scanned (false)
>>>     {
>>>     }
>>> @@ -180,7 +181,7 @@ struct dwarf2_per_cu_data
>>>     packed<dwarf_unit_type, 1> m_unit_type = (dwarf_unit_type) 0;
>>>
>>>     /* The language of this CU.  */
>>> -  packed<language, LANGUAGE_BYTES> m_lang = language_unknown;
>>> +  std::atomic<language> m_lang __attribute__((packed));
>>>
>>>   public:
>>>     /* True if this CU has been scanned by the indexer; false if
>>> @@ -332,11 +333,13 @@ struct dwarf2_per_cu_data
>>>
>>>     void set_lang (enum language lang)
>>>     {
>>> -    /* We'd like to be more strict here, similar to what is done in
>>> -       set_unit_type,  but currently a partial unit can go from 
>>> unknown to
>>> -       minimal to ada to c.  */
>>> -    if (m_lang != lang)
>>> -      m_lang = lang;
>>> +    enum language nope = language_unknown;
>>> +    if (m_lang.compare_exchange_strong (nope, lang))
>>> +      return;
>>> +    nope = lang;
>>> +    if (m_lang.compare_exchange_strong (nope, lang))
>>> +      return;
>>> +    gdb_assert_not_reached ();
>>>     }
>>>
>>>     /* Free any cached file names.  */
>>> ...
>>>
>>> I've tried both:
>>> ...
>>>    packed<std::atomic<language>, LANGUAGE_BYTES> m_lang
>>>      = language_unknown;
>>> ...
>>> and:
>>> ...
>>>    std::atomic<packed<language, LANGUAGE_BYTES>> m_lang
>>>      = language_unknown;
>>> ...
>>> and both give compilation errors:
>>> ...
>>> src/gdb/dwarf2/read.h:184:58: error: could not convert 
>>> ‘language_unknown’ from ‘language’ to ‘std::atomic<packed<language, 
>>> 1> >’
>>>     std::atomic<packed<language, LANGUAGE_BYTES>> m_lang = 
>>> language_unknown;
>>>                                                            
>>> ^~~~~~~~~~~~~~~~
>>> ...
>>> and:
>>> ...
>>> src/gdb/../gdbsupport/packed.h:84:47: error: bit-field 
>>> ‘std::atomic<language> packed<std::atomic<language>, 1>::m_val’ with 
>>> non-integral type
>>> ...
>>>
>>> Maybe one of the two should work and the pack template needs further 
>>> changes, I'm not sure.
>>
>> Yes, I think std::atomic<packed<language, LANGUAGE_BYTES>> should 
>> work.  We need to write
>> the initialized using {}, like this:
>>
>>    std::atomic<packed<language, LANGUAGE_BYTES>> m_lang 
>> {language_unknown};
>>
>> and then we run into errors when comparing m_lang with enum language.  
>> That is because
>> the preexisting operator==/operator!= would require converting from 
>> enum language to
>> packed<language, LANGUAGE_BYTES>, and then from packed<language, 
>> LANGUAGE_BYTES> to
>> std::atomic<packed<language, LANGUAGE_BYTES>>.  That is two implicit 
>> conversions, but
>> C++ only does one automatically.  We can fix that by adding some 
>> operator==/operator!=
>> implementations.
>>
>> I've done that in patch #1 attached.  I've also ditched the 
>> non-attribute-packed implementation.
>>
> 
> Ack.
> 
>>>
>>> Note btw that the attribute packed works here:
>>> ...
>>> +  std::atomic<language> m_lang __attribute__((packed));
>>> ...
>>> in the sense that it's got alignment 1:
>>> ...
>>>          struct atomic<language>    m_lang \
>>>            __attribute__((__aligned__(1))); /*    16     4 */
>>> ...
>>> but given that there's no LANGUAGE_BITS/BYTES, we're back to size 4 
>>> for the m_lang field, and size 128 overall.
>>>
>>> So for now I've settled for:
>>> ...
>>> +  std::atomic<LANGUAGE_CONTAINER> m_lang;
>>> ...
>>> which does get me back to size 120.
>>>
>>> WIP patch attached.
>>
>> Please find attached 3 patches:
>>
>> #1 - Introduce struct packed template
>> #2 - your original patch, but using struct packed, split to a separate 
>> patch. commit log updated.
>> #3 - a version of your std::atomic WIP patch that uses 
>> std::atomic<packed>
>>
>> Patches #1 and #2 pass the testsuite cleanly for me.  Patch #3 
>> compiles, but
>> runs into a couple regressions due to the gdb_assert_not_reached in 
>> set_lang
>> being reached.  I am not surprised since that set_lang code in your patch
>> looked WIP and I just blindly converted to the new approach to show 
>> the code
>> compiles.
>>
> 
> I've rebased #1 and #2 on master, and then applied my current WIP set, 
> copying the style that I found in #3.
> 
> I'm currently testing it, I've pushed to 
> https://github.com/vries/gdb/commits/sanitize-thread-7 if you're 
> interested.
> 
> #1 and #2 LGTM.
> 

So, pushed.

Thanks,
- Tom

> Thanks a lot for all the help :)
> 
> - Tom