From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <simark@simark.ca>
Received: from simark.ca (simark.ca [158.69.221.121])
	by sourceware.org (Postfix) with ESMTPS id B01093885504
	for <gdb@sourceware.org>; Mon, 14 Nov 2022 00:26:11 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B01093885504
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=simark.ca
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca
Received: from [10.0.0.11] (unknown [217.28.27.60])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by simark.ca (Postfix) with ESMTPSA id E7E981E0CB;
	Sun, 13 Nov 2022 19:26:10 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail;
	t=1668385571; bh=bPCLyo+nAX9EjOiI1oosUOWbFDo008vRctfXNEaXxvs=;
	h=Date:Subject:To:References:From:In-Reply-To:From;
	b=OL18Lp5tF3j8FuBAiNTefv143MlCe6EdfvY8BxUtYaz2JLygP/9iM9Ia62wYDcsNA
	 xihUozWtOJulLIDcyCgNSxWVFjt59eU5XHe12cEPwwnQK+enwnX/0xDIjpvE58hhzi
	 my8i3Ca/7Eo+9DfLe/XuOv8aKsze2m+VBg2V0jZ8=
Message-ID: <b3f75aa3-dbd1-b704-c884-4083c9246967@simark.ca>
Date: Sun, 13 Nov 2022 19:26:10 -0500
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: memory increased rapidly when adding a break
To: DeJiang Zhu <doujiang24@gmail.com>, gdb@sourceware.org
References: <CAEZxTmnx+QmDubVbbCETaLnXR5GiVVk3E=VPmJviUaPZHqJFUA@mail.gmail.com>
Content-Language: en-US
From: Simon Marchi <simark@simark.ca>
In-Reply-To: <CAEZxTmnx+QmDubVbbCETaLnXR5GiVVk3E=VPmJviUaPZHqJFUA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb.sourceware.org>



On 11/13/22 05:01, DeJiang Zhu via Gdb wrote:
> Hi,
> 
> I compiled envoy (a big C++ project) with gcc-12.2.0 and am debugging it
> with gdb 12.1.
> 
> But memory increases rapidly (to over 40 GB, until OOM) when adding a
> breakpoint.
> 
> I got this backtrace after attaching to the gdb process while its memory
> was increasing.
> 
> ```
> (gdb) bt
> #0  0x00007fa6893dc935 in _int_malloc () from /lib64/libc.so.6
> #1  0x00007fa6893df6fc in malloc () from /lib64/libc.so.6
> #2  0x0000000000468278 in xmalloc (size=4064) at alloc.c:60
> #3  0x00000000008ecd95 in call_chunkfun (size=<optimized out>,
> h=0x17a246ed0) at ./obstack.c:94
> #4  _obstack_begin_worker (h=0x17a246ed0, size=<optimized out>,
> alignment=<optimized out>) at ./obstack.c:141
> #5  0x000000000052d0d3 in demangle_parse_info::demangle_parse_info
> (this=0x17a246ec0) at cp-name-parser.y:1973
> #6  cp_demangled_name_to_comp (demangled_name=demangled_name@entry=0x8d12c8c0
> "std::stack<unsigned int, std::deque<unsigned int, std::allocator<unsigned
> int> > >::size_type", errmsg=errmsg@entry=0x0) at cp-name-parser.y:2040
> #7  0x000000000052ff5e in cp_canonicalize_string
> (string=string@entry=0x8d12c8c0
> "std::stack<unsigned int, std::deque<unsigned int, std::allocator<unsigned
> int> > >::size_type") at cp-support.c:635
> #8  0x0000000000570b98 in dwarf2_canonicalize_name (name=0x8d12c8c0
> "std::stack<unsigned int, std::deque<unsigned int, std::allocator<unsigned
> int> > >::size_type", cu=<optimized out>, objfile=0x2c3af10) at
> dwarf2/read.c:22908
> #9  0x0000000000590265 in dwarf2_compute_name (name=0x7fa55773c524
> "size_type", die=0x172590eb0, cu=0xe2aeefd0, physname=0) at
> dwarf2/read.c:10095
> #10 0x000000000058bf39 in dwarf2_full_name (cu=0xe2aeefd0, die=0x172590eb0,
> name=0x0) at dwarf2/read.c:10123
> #11 read_typedef (cu=0xe2aeefd0, die=0x172590eb0) at dwarf2/read.c:17687
> #12 read_type_die_1 (cu=0xe2aeefd0, die=0x172590eb0) at dwarf2/read.c:22531
> #13 read_type_die (die=0x172590eb0, cu=0xe2aeefd0) at dwarf2/read.c:22473
> #14 0x000000000059acda in dwarf2_add_type_defn (cu=0xe2aeefd0,
> die=0x172590eb0, fip=0x7ffd8a1be3e0) at dwarf2/read.c:14740
> #15 handle_struct_member_die (child_die=0x172590eb0, type=0x17a6becd0,
> fi=0x7ffd8a1be3e0, template_args=<optimized out>, cu=0xe2aeefd0) at
> dwarf2/read.c:15867
> #16 0x0000000000597044 in process_structure_scope (cu=0xe2aeefd0,
> die=0x172590920) at dwarf2/read.c:15908
> #17 process_die (die=0x172590920, cu=0xe2aeefd0) at dwarf2/read.c:9698
> #18 0x000000000059646d in read_namespace (cu=0xe2aeefd0, die=0x16802e140)
> at dwarf2/read.c:17068
> #19 process_die (die=0x16802e140, cu=0xe2aeefd0) at dwarf2/read.c:9737
> #20 0x0000000000598df9 in read_file_scope (die=0x1594e8360, cu=0xe2aeefd0)
> at dwarf2/read.c:10648
> #21 0x0000000000595f32 in process_die (die=0x1594e8360, cu=0xe2aeefd0) at
> dwarf2/read.c:9669
> #22 0x000000000059c0c8 in process_full_comp_unit
> (pretend_language=<optimized out>, cu=0xe2aeefd0) at dwarf2/read.c:9439
> #23 process_queue (per_objfile=0x9d546c0) at dwarf2/read.c:8652
> #24 dw2_do_instantiate_symtab (per_cu=<optimized out>,
> per_objfile=0x9d546c0, skip_partial=<optimized out>) at dwarf2/read.c:2311
> #25 0x000000000059c5f0 in dw2_instantiate_symtab (per_cu=0x9c886f0,
> per_objfile=0x9d546c0, skip_partial=<optimized out>) at
> dwarf2/read.c:2335
> #26 0x000000000059c78a in
> dw2_expand_symtabs_matching_one(dwarf2_per_cu_data *, dwarf2_per_objfile *,
> gdb::function_view<bool(char const*, bool)>,
> gdb::function_view<bool(compunit_symtab*)>) (per_cu=<optimized out>,
> per_objfile=<optimized out>, file_matcher=..., expansion_notify=...) at
> dwarf2/read.c:4204
> #27 0x000000000059c94b in
> dwarf2_gdb_index::expand_symtabs_matching(objfile*, gdb::function_view<bool
> (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool
> (char const*)>, gdb::function_view<bool (compunit_symtab*)>,
> enum_flags<block_search_flag_values>, domain_enum_tag, search_domain)
> (this=<optimized out>, objfile=<optimized out>, file_matcher=...,
> lookup_name=<optimized out>, symbol_matcher=
> ..., expansion_notify=..., search_flags=..., domain=UNDEF_DOMAIN,
> kind=<optimized out>) at dwarf2/read.c:4421
> #28 0x0000000000730feb in objfile::map_symtabs_matching_filename(char
> const*, char const*, gdb::function_view<bool (symtab*)>) (this=0x2c3af10,
> name=<optimized out>, name@entry=0x586f26f0 "utility.h",
> real_path=<optimized out>, real_path@entry=0x0, callback=...) at
> symfile-debug.c:207
> #29 0x0000000000741abd in iterate_over_symtabs(char const*,
> gdb::function_view<bool (symtab*)>) (name=name@entry=0x586f26f0
> "utility.h", callback=...) at symtab.c:624
> #30 0x00000000006311d7 in collect_symtabs_from_filename (file=0x586f26f0
> "utility.h", search_pspace=<optimized out>) at linespec.c:3716
> #31 0x0000000000631212 in symtabs_from_filename (filename=0x586f26f0
> "utility.h", search_pspace=<optimized out>) at linespec.c:3736
> #32 0x0000000000635e9f in parse_linespec (parser=0x7ffd8a1bf1b0,
> arg=<optimized out>, match_type=<optimized out>) at linespec.c:2557
> #33 0x0000000000636cac in event_location_to_sals (parser=0x7ffd8a1bf1b0,
> location=0x51ed4da0) at linespec.c:3082
> #34 0x0000000000636f73 in decode_line_full (location=location@entry=0x51ed4da0,
> flags=flags@entry=1, search_pspace=search_pspace@entry=0x0,
> default_symtab=<optimized out>, default_line=<optimized out>,
> canonical=0x7ffd8a1bf4e0, select_mode=0x0, filter=<optimized out>) at
> linespec.c:3161
> #35 0x00000000004b1683 in parse_breakpoint_sals (location=0x51ed4da0,
> canonical=0x7ffd8a1bf4e0) at breakpoint.c:8730
> #36 0x00000000004b5d03 in create_breakpoint (gdbarch=0xeca5dc0,
> location=location@entry=0x51ed4da0, cond_string=cond_string@entry=0x0,
> thread=<optimized out>, thread@entry=-1, extra_string=0x0,
> extra_string@entry=0x7ffd8a1bf650 "",
> force_condition=force_condition@entry=false,
> parse_extra=0, tempflag=0, type_wanted=bp_breakpoint, ignore_count=0,
> pending_break_support=AUTO_BOOLEAN_TRUE, ops=0xc23c00
> <bkpt_breakpoint_ops>, from_tty=0, enabled=1, internal=0, flags=0) at
> breakpoint.c:9009
> #37 0x0000000000674ba8 in mi_cmd_break_insert_1 (dprintf=0, argv=<optimized
> out>, argc=<optimized out>, command=<optimized out>) at
> mi/mi-cmd-break.c:361
> ```
> 
> Also, I found that it loops in `dwarf2_gdb_index::expand_symtabs_matching`.
> I added a breakpoint on `dw2_expand_symtabs_matching_one`, and it is hit
> repeatedly.
> 
> ```
>   if (lookup_name == nullptr)
>     {
>       for (dwarf2_per_cu_data *per_cu
>         : all_comp_units_range (per_objfile->per_bfd))
>       {
>          QUIT;
>          if (!dw2_expand_symtabs_matching_one (per_cu, per_objfile,
> file_matcher, expansion_notify))
>            return false;
>       }
>       return true;
>     }
> ```
> 
> Seems, `per_bfd->all_comp_units.size()` is `28776`.
> I'm not sure if this is a reasonable value.

I think that's possible, if it's a big project.  For instance, my gdb
binary has about 660 compile units, and gdb is not really big.

> 
> ```
> (gdb) p per_objfile->per_bfd->all_comp_units
> $423 = {<std::_Vector_base<std::unique_ptr<dwarf2_per_cu_data,
> dwarf2_per_cu_data_deleter>,
> std::allocator<std::unique_ptr<dwarf2_per_cu_data,
> dwarf2_per_cu_data_deleter> > >> = {_M_impl =
> {<std::allocator<std::unique_ptr<dwarf2_per_cu_data,
> dwarf2_per_cu_data_deleter> >> =
> {<__gnu_cxx::new_allocator<std::unique_ptr<dwarf2_per_cu_data,
> dwarf2_per_cu_data_deleter> >> = {<No data fields>}, <No data fields>},
> <std::_Vector_base<std::unique_ptr<dwarf2_per_cu_data,
> dwarf2_per_cu_data_deleter>,
> std::allocator<std::unique_ptr<dwarf2_per_cu_data,
> dwarf2_per_cu_data_deleter> > >::_Vector_impl_data> = {_M_start =
> 0x3b6f980, _M_finish = 0x3b769e8, _M_end_of_storage = 0x3b769e8}, <No data
> fields>}}, <No data fields>}
> (gdb) p 0x3b769e8-0x3b6f980
> $424 = 28776
> ```
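
A side note on that number: the subtraction above is done on two plain
integers, so 28776 is the distance in bytes between _M_start and
_M_finish, not the number of elements.  A rough sketch of the
conversion, assuming a 64-bit host where each unique_ptr element is one
8-byte pointer (that size is my assumption, it is not shown in the
output above):

```python
# Addresses copied from the quoted `p per_objfile->per_bfd->all_comp_units` output.
start = 0x3b6f980    # _M_start
finish = 0x3b769e8   # _M_finish

# Assumption: each element is a unique_ptr holding a single 8-byte pointer.
element_size = 8

byte_span = finish - start
element_count = byte_span // element_size

print(byte_span)      # byte distance, the value gdb printed
print(element_count)  # estimated number of compile units under the assumption
```

Under that assumption the vector would hold about 3597 compile units,
which is still plausible for a large project.
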
> 
> I can see the memory increasing rapidly in the for loop.
> I'm new to the gdb internal implementation.
> I'm not sure where could be the problem, gcc or gdb, or just a wrong use.
> 
> Could you help point me in the right direction? I have the files to
> reproduce it reliably.

GDB reads compile units in two steps.  From your stack trace, it
looks like you are using an index (the .gdb_index kind).  When GDB first
loads your binary, it reads an index present in the binary (or in the
index cache) that lists all the entity names present in each compile
unit of the program.  When you set a breakpoint using a name, GDB
"expands" all the compile units containing something that matches what
you asked for.  "Expand" means that GDB reads the full debug information
from the DWARF for that compile unit, creating some internal data
structures to represent it.
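
To make the two steps concrete, here is a toy model.  It is only a
sketch of the idea, not GDB's actual data structures: the CompileUnit
class, build_index and expand_matching are all made-up names for
illustration.

```python
from dataclasses import dataclass


@dataclass
class CompileUnit:
    name: str
    symbols: list           # names the index records for this CU
    expanded: bool = False  # True once the "full debug info" has been read

    def expand(self):
        # Stand-in for reading the CU's DWARF and building internal structures.
        self.expanded = True


def build_index(cus):
    """Step 1: map each name to the CUs that contain it (cheap to load)."""
    index = {}
    for cu in cus:
        for sym in cu.symbols:
            index.setdefault(sym, []).append(cu)
    return index


def expand_matching(index, name):
    """Step 2: expand only the CUs that the index lists for `name`."""
    hits = index.get(name, [])
    for cu in hits:
        if not cu.expanded:
            cu.expand()
    return [cu.name for cu in hits]


cus = [
    CompileUnit("a.cc", ["main", "helper"]),
    CompileUnit("b.cc", ["helper"]),
    CompileUnit("c.cc", ["other"]),
]
index = build_index(cus)
expanded = expand_matching(index, "helper")
print(expanded)                                     # CUs that got expanded
print([cu.name for cu in cus if not cu.expanded])   # CUs left untouched
```

The point is that the cost of a lookup depends on how many compile
units match, not on how many exist: here c.cc is never expanded.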

It sounds like the breakpoint spec string you passed matches a lot of
compile units, and a lot of them get expanded.  That creates a lot of
in-memory objects, eventually exhausting the available memory.
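
As an illustration of how a short file name can match many compile
units, here is a simplified sketch.  The cu_files table and the
basename-only comparison are assumptions made for the example; GDB's
real file matching is more involved than this.

```python
import os

# Hypothetical table: each compile unit and the source files it was built
# from, including the headers it pulled in.
cu_files = {
    "server.cc": ["src/server.cc", "include/common/utility.h"],
    "client.cc": ["src/client.cc", "include/common/utility.h"],
    "http.cc": ["src/http.cc", "include/http/utility.h"],
    "main.cc": ["src/main.cc"],
}


def cus_matching_file(cu_files, wanted):
    """Return the CUs with at least one file whose basename equals `wanted`."""
    return [
        cu
        for cu, files in cu_files.items()
        if any(os.path.basename(f) == wanted for f in files)
    ]


to_expand = cus_matching_file(cu_files, "utility.h")
print(to_expand)
```

In this toy table three of the four compile units would be expanded,
including one that uses a different utility.h.  With a commonly included
header in a big project, the matching set could be a large fraction of
all compile units.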

Out of curiosity, what is the string you used to create your breakpoint?
Judging by your stack trace, it looks like it's "utility.h:LINE".

Expanding that many CUs could be legitimate, if there's really something
matching in all these CUs, or it could be a bug where GDB expands
unrelated CUs.  There is an open bug related to a problem like this:

https://sourceware.org/bugzilla/show_bug.cgi?id=29105

Although I'm not sure this is what you see.

Is the project you are building open source, so that other people could
build it and try?

Simon