From: Tom de Vries <tdevries@suse.de>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Cc: Tom Tromey <tom@tromey.com>
Subject: Re: [PATCH 3/5] [gdb/symtab] Work around fsanitize=address false positive for per_cu->lang
Date: Mon, 4 Jul 2022 09:04:02 +0200
Message-ID: <138377b8-eb0f-e212-7235-4ca483c67169@suse.de>
In-Reply-To: <62bc9947-ac55-a976-d225-5c828207f558@palves.net>
[-- Attachment #1: Type: text/plain, Size: 4964 bytes --]
On 6/29/22 20:28, Pedro Alves wrote:
> On 2022-06-29 19:25, Pedro Alves wrote:
>> On 2022-06-29 18:38, Pedro Alves wrote:
>>> On 2022-06-29 16:29, Tom de Vries via Gdb-patches wrote:
>>>> When building gdb with -fsanitize=thread and gcc 12, and running test-case
>>>> gdb.dwarf2/dwz.exp, we run into a data race between:
>>>> ...
>>>> Read of size 1 at 0x7b200000300d by thread T2:
>>>> #0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile*, \
>>>> abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6164 \
>>>> (gdb+0x82ec95)
>>>> ...
>>>> and:
>>>> ...
>>>> Previous write of size 1 at 0x7b200000300d by main thread:
>>>> #0 prepare_one_comp_unit gdb/dwarf2/read.c:23588 (gdb+0x86f973)
>>>> ...
>>>>
>>>> In other words, between:
>>>> ...
>>>> if (this_cu->reading_dwo_directly)
>>>> ...
>>>> and:
>>>> ...
>>>> cu->per_cu->lang = pretend_language;
>>>> ...
>>>>
>>>> Both fields are part of the same bitfield, and writing to one field while
>>>> reading from another is not a problem, so this is a false positive.
>>>
>>> I don't understand this sentence. Particularly "same bitfield", or
>>> really "Both fields are part of the same bitfield,". How can two bitfields
>>> be part of the same bitfield?
>>>
>>> Anyhow, both bitfields are part of a sequence of contiguous bitfields, here
>>> stripped of comments:
>>>
>>> unsigned int queued : 1;
>>> unsigned int is_debug_types : 1;
>>> unsigned int is_dwz : 1;
>>> unsigned int reading_dwo_directly : 1;
>>> unsigned int tu_read : 1;
>>> mutable bool m_header_read_in : 1;
>>> bool addresses_seen : 1;
>>> unsigned int mark : 1;
>>> bool files_read : 1;
>>> ENUM_BITFIELD (dwarf_unit_type) unit_type : 8;
>>> ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
>>>
>>> Per C++11, they're all part of the same _memory location_. From N3253 (C++11), intro.memory:
>>>
>>> "A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having
>>> non-zero width. (...) Two threads of execution (1.10) can update and access separate memory locations
>>> without interfering with each other.
>>> (...)
>>> [ Note: Thus a bit-field and an adjacent non-bit-field are in separate memory locations, and therefore can be
>>> concurrently updated by two threads of execution without interference. The same applies to two bit-fields,
>>> if one is declared inside a nested struct declaration and the other is not, or if the two are separated by
>>> a zero-length bit-field declaration, or if they are separated by a non-bit-field declaration. It is not safe to
>>> concurrently update two bit-fields in the same struct if all fields between them are also bit-fields of non-zero
>>> width. — end note ]"
>>>
>>> And while it is true that writing to one bit-field from one thread and reading from another,
>>> if they reside in the same memory location, is OK in practice, it is still undefined behavior.
Ack, thanks for pointing that out, I was not aware of this.
I've reformulated things in terms of "memory location".
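To make the quoted note concrete: a bit-field store is typically compiled to a load, modify and store of the whole containing storage unit. The sketch below is a single-threaded replay, by hand, of one unlucky interleaving of two threads updating two bit-fields that share a memory location. The byte layout, masks and names are hypothetical, not the real dwarf2_per_cu_data layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of two bit-fields sharing one byte (one memory
// location): bit 0 holds "queued", bits 4..7 hold "lang".
constexpr std::uint8_t queued_mask = 0x01;
constexpr unsigned lang_shift = 4;
constexpr std::uint8_t lang_mask = 0xf0;

struct fields
{
  unsigned queued;
  unsigned lang;
};

// Replay, step by step, one interleaving of two threads that each perform
// the load/modify/store a compiler typically emits for a bit-field write.
fields
simulate_interleaved_bitfield_writes ()
{
  std::uint8_t storage = 0;   // The shared memory location.

  std::uint8_t a = storage;   // Thread A loads 0x00 (wants queued = 1).
  std::uint8_t b = storage;   // Thread B loads 0x00 (wants lang = 5).
  a = static_cast<std::uint8_t> (a | queued_mask);
  b = static_cast<std::uint8_t> ((b & ~lang_mask) | (5u << lang_shift));
  storage = a;                // A stores 0x01.
  storage = b;                // B stores 0x50, overwriting A's bit.

  fields result;
  result.queued = storage & queued_mask;
  result.lang = (storage & lang_mask) >> lang_shift;
  assert (result.lang == 5);  // B's update survived...
  return result;              // ...but result.queued is 0 again.
}
```

The replay ends with lang == 5 and queued == 0, so thread A's update is lost. Whether real hardware and compilers produce this exact sequence varies; the standard simply leaves such concurrent accesses undefined.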
>>>
>>> Note the escape hatch mentioned above:
>>>
>>> "if the two are separated by a zero-length bit-field declaration"
>>>
>>> Thus, a change like this:
>>>
>>> unsigned int queued : 1;
>>> unsigned int is_debug_types : 1;
>>> unsigned int is_dwz : 1;
>>> unsigned int reading_dwo_directly : 1;
>>> unsigned int tu_read : 1;
>>> mutable bool m_header_read_in : 1;
>>> bool addresses_seen : 1;
>>> unsigned int mark : 1;
>>> bool files_read : 1;
>>> ENUM_BITFIELD (dwarf_unit_type) unit_type : 8;
>>> +
>>> + /* Ensure lang is a separate memory location, so we can update
>>> + it concurrently with other bitfields. */
>>> + char :0;
>>> +
>>> ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
>>>
>>>
>>> ... should work.
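A minimal sketch of the zero-width bit-field escape hatch, assuming a trimmed-down, hypothetical per_cu_sketch struct rather than the real dwarf2_per_cu_data (plain unsigned int stands in for ENUM_BITFIELD, and the widths are illustrative). One thread reads reading_dwo_directly while another writes lang. Because `char : 0;` separates them, they are distinct memory locations and this is not a data race.

```cpp
#include <cassert>
#include <thread>

// Hypothetical, trimmed-down stand-in for dwarf2_per_cu_data.
struct per_cu_sketch
{
  unsigned int reading_dwo_directly : 1;
  unsigned int unit_type : 8;

  // Unnamed zero-width bit-field: ends the current sequence of adjacent
  // bit-fields, so LANG below starts a new memory location.
  char : 0;

  unsigned int lang : 5;
};

// Read reading_dwo_directly in one thread while writing lang in another,
// and return the flag value the reader observed.
unsigned int
read_flag_while_writing_lang (per_cu_sketch &cu, unsigned int new_lang)
{
  unsigned int seen = 0;
  std::thread reader ([&] { seen = cu.reading_dwo_directly; });
  std::thread writer ([&] { cu.lang = new_lang; });
  reader.join ();
  writer.join ();
  assert (cu.lang == new_lang);
  return seen;
}
```

Without the `char : 0;` line, lang would share a memory location with reading_dwo_directly, and the same program would contain a data race, which is the kind of access -fsanitize=thread reports.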
>>
>> The "if one is declared inside a nested struct declaration and the other
>> is not" escape hatch may be interesting too, as in, we'd write:
>>
>> struct {
>> ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
>> };
>>
>> ... and since the struct is anonymous, nothing else needs to change.
>>
>> In patch #4, you'd just do this too:
>>
>> struct {
>> ENUM_BITFIELD (dwarf_unit_type) unit_type : 8;
>> };
>>
>> The "wrapping" syntax seems to read a bit better, particularly since this
>> way you don't have to worry about putting a :0 bitfield before and
>> another after.
>
Done.
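For reference, a minimal sketch of the wrapping approach, again with a hypothetical per_cu_wrapped struct rather than the real one. Note that anonymous structs are not standard C++ (only anonymous unions are); GCC and Clang accept them as an extension and warn under -Wpedantic. Here two threads update two different wrapped bit-fields at the same time, the case the quoted note calls unsafe for plain adjacent bit-fields, while member access syntax stays unchanged.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for dwarf2_per_cu_data using anonymous structs
// (a GNU extension in C++).
struct per_cu_wrapped
{
  unsigned int queued : 1;
  unsigned int reading_dwo_directly : 1;

  // Each wrapper is a separate member object, so the bit-field inside it
  // is a separate memory location from the flags above and from each other.
  struct {
    unsigned int unit_type : 8;
  };
  struct {
    unsigned int lang : 5;
  };
};

// Two threads write two wrapped bit-fields concurrently; access is still
// spelled cu.unit_type and cu.lang.
void
update_concurrently (per_cu_wrapped &cu, unsigned int unit_type,
		     unsigned int lang)
{
  std::thread t1 ([&] { cu.unit_type = unit_type; });
  std::thread t2 ([&] { cu.lang = lang; });
  t1.join ();
  t2.join ();
  assert (cu.unit_type == unit_type && cu.lang == lang);
}
```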
> I keep coming back, sorry... :-P
>
> Another thought is that in both patches #3 and #4, it's reading_dwo_directly
> that is racing with two other bitfields. So I wonder whether it's _that_ one
> that should be moved to a separate memory location.
I've also tried that, but I got similar errors back, with the same
writes but different reads.
Also added field addresses_seen, which I found by testing with target board
cc-with-dwz-m.
Thanks,
- Tom
[-- Attachment #2: 0001-gdb-symtab-Fix-fsanitize-address-errors-for-per_cu-fields.patch --]
[-- Type: text/x-patch, Size: 3637 bytes --]
[gdb/symtab] Fix fsanitize=thread errors for per_cu fields
When building gdb with -fsanitize=thread and gcc 12, and running test-case
gdb.dwarf2/dwz.exp, we run into a data race between:
...
Read of size 1 at 0x7b200000300d by thread T2:
#0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile*, \
abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6164 \
(gdb+0x82ec95)
...
and:
...
Previous write of size 1 at 0x7b200000300d by main thread:
#0 prepare_one_comp_unit gdb/dwarf2/read.c:23588 (gdb+0x86f973)
...
In other words, between:
...
if (this_cu->reading_dwo_directly)
...
and:
...
cu->per_cu->lang = pretend_language;
...
Likewise, we run into a data race between:
...
Write of size 1 at 0x7b200000300e by thread T4:
#0 process_psymtab_comp_unit gdb/dwarf2/read.c:6789 (gdb+0x830720)
...
and:
...
Previous read of size 1 at 0x7b200000300e by main thread:
#0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile*, \
abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6164 \
(gdb+0x82edab)
...
In other words, between:
...
this_cu->unit_type = DW_UT_partial;
...
and:
...
if (this_cu->reading_dwo_directly)
...
Likewise for the write to addresses_seen in cooked_indexer::check_bounds and a
read from is_dwz in dwarf2_find_containing_comp_unit for test-case
gdb.dwarf2/dw2-dir-file-name.exp and target board cc-with-dwz-m.
The problem is that the written fields are part of the same memory location as
the read fields, so concurrently reading and writing them from different
threads is undefined behaviour.
Making the written fields separate memory locations fixes it:
...
struct {
ENUM_BITFIELD (dwarf_unit_type) unit_type : 8;
};
struct {
ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
};
struct {
bool addresses_seen : 1;
};
...
This increases the size of struct dwarf2_per_cu_data from 120 to 128 bytes (for -m64).
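The size growth follows from each anonymous struct being a separate member object with its own size and alignment, so its bits can no longer be packed into the storage unit shared with the neighbouring flags. A small sketch with hypothetical toy structs; exact sizes are ABI-dependent, so it only reports the difference.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy struct: three bit-fields packed together...
struct packed_fields
{
  unsigned int flag : 1;
  unsigned int unit_type : 8;
  unsigned int lang : 5;
};

// ...versus the same bit-fields with two of them wrapped in anonymous
// structs (a GNU extension in C++), as in this patch.
struct wrapped_fields
{
  unsigned int flag : 1;
  struct {
    unsigned int unit_type : 8;
  };
  struct {
    unsigned int lang : 5;
  };
};

// Return how many bytes the wrapping costs for this toy layout.
std::size_t
wrapping_overhead ()
{
  // Guard the unsigned subtraction; true on common ABIs such as
  // x86-64 SysV, though not guaranteed by the standard.
  assert (sizeof (wrapped_fields) > sizeof (packed_fields));
  return sizeof (wrapped_fields) - sizeof (packed_fields);
}
```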
The set of fields has been established experimentally to be the minimal set
needed to get rid of this type of -fsanitize=thread error, but more fields
might require the same treatment.
Looking at the properties of the lang field: unlike dwarf_version, it is not
available in the unit header, so it is first set during the parallel cooked
index reading. The same holds for unit_type and addresses_seen.
Tested on x86_64-linux.
---
gdb/dwarf2/read.h | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index b7a03933aa5..c4b007d064d 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -163,7 +163,9 @@ struct dwarf2_per_cu_data
/* If addresses have been read for this CU (usually from
.debug_aranges), then this flag is set. */
- bool addresses_seen : 1;
+ struct {
+ bool addresses_seen : 1;
+ };
/* A temporary mark bit used when iterating over all CUs in
expand_symtabs_matching. */
@@ -173,11 +175,16 @@ struct dwarf2_per_cu_data
point in trying to read it again next time. */
bool files_read : 1;
- /* The unit type of this CU. */
- ENUM_BITFIELD (dwarf_unit_type) unit_type : 8;
+ /* Put this in a struct to ensure a separate memory location. */
+ struct {
+ /* The unit type of this CU. */
+ ENUM_BITFIELD (dwarf_unit_type) unit_type : 8;
+ };
- /* The language of this CU. */
- ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
+ struct {
+ /* The language of this CU. */
+ ENUM_BITFIELD (language) lang : LANGUAGE_BITS;
+ };
/* True if this CU has been scanned by the indexer; false if
not. */
Thread overview: 25+ messages
2022-06-29 15:29 [PATCH 1/5] [COVER-LETTER, RFC] Fix some fsanitize=thread issues in gdb's cooked index Tom de Vries
2022-06-29 15:29 ` [PATCH 2/5] [gdb/symtab] Fix data race on per_cu->dwarf_version Tom de Vries
2022-07-01 11:16 ` Tom de Vries
2022-07-02 11:07 ` Tom de Vries
2022-07-04 18:51 ` Tom Tromey
2022-07-04 19:43 ` Tom de Vries
2022-07-04 19:53 ` Tom Tromey
2022-06-29 15:29 ` [PATCH 3/5] [gdb/symtab] Work around fsanitize=address false positive for per_cu->lang Tom de Vries
2022-06-29 17:38 ` Pedro Alves
2022-06-29 18:25 ` Pedro Alves
2022-06-29 18:28 ` Pedro Alves
2022-07-04 7:04 ` Tom de Vries [this message]
2022-07-04 18:32 ` Tom Tromey
2022-07-04 19:45 ` Tom de Vries
2022-07-06 19:20 ` [PATCH] Introduce struct packed template, fix -fsanitize=thread for per_cu fields Pedro Alves
2022-07-07 10:18 ` Tom de Vries
2022-07-07 15:26 ` Pedro Alves
2022-07-08 14:54 ` Tom de Vries
2022-07-12 10:22 ` Tom de Vries
2022-06-29 15:29 ` [PATCH 4/5] [gdb/symtab] Work around fsanitize=address false positive for per_cu->unit_type Tom de Vries
2022-06-29 15:29 ` [PATCH 5/5] [gdb/symtab] Fix data race on per_cu->lang Tom de Vries
2022-07-04 18:30 ` Tom Tromey
2022-07-05 8:17 ` Tom de Vries
2022-07-05 15:19 ` Tom de Vries
2022-07-06 15:42 ` Tom de Vries