From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Jan 2017 15:56:00 -0000
From: Peter Griffin
To: Philipp Rudo
Cc: gdb-patches@sourceware.org, palves@redhat.com, brobecker@adacore.com,
 kevinb@redhat.com, cagney@gnu.org, dje@google.com, drow@false.org,
 kettenis@gnu.org, yao.qi@arm.com, stanshebs@google.com,
 Ulrich.Weigand@de.ibm.com, elena.zannoni@oracle.com, eliz@gnu.org,
 arnez@linux.vnet.ibm.com
Subject: Re: [PATCH] Linux kernel thread runtime support
Message-ID: <20170110155556.GA15774@griffinp-ThinkPad-X1-Carbon-2nd>
References: <1482427864-4317-1-git-send-email-peter.griffin@linaro.org>
 <20170103135514.0dad741e@ThinkPad>
In-Reply-To: <20170103135514.0dad741e@ThinkPad>
X-SW-Source: 2017-01/txt/msg00167.txt.bz2

Hi Philipp,

Many thanks for your review feedback, and happy new year.

On Tue, 03 Jan 2017, Philipp Rudo wrote:
> Hi Peter
> Hi everybody else
>
> well it had to come this way. I am also working on a kernel debugging
> target quite similar to yours and planned to send my code next week or
> the one after...

Always the way :) Good to know others are interested in having a kernel
thread layer in GDB as well.

> On the bright side, till now I worked on core dumps on S390 and also
> have some features (especially module support with kernel virtual
> memory handling) you are still missing.

I actually removed all module and VM support from this initial submission
to keep things as simple as possible to begin with, although I would be
interested to see how you are handling the VM side in your version, as it
was/is an area of concern.

As you probably realised, I'm coming at this from an interactive kernel
debug PoV (e.g.
jtag or qemu), but there is no reason this and core dump support
shouldn't share a common code base; all that is really 'extra' in my
use case is keeping the thread list synchronized with the target over
time.

> So not all the work is in vain we just need to find a common basis.
> Please give me some days so I can finalize my code and send it to the
> list. Then we can discuss it in detail.

Yes, sounds like a plan.

> Nevertheless I found some issues in your basic thread handling logic
> (from a kernel point of view) you should have a look at. See comments
> below.

Thanks for the review feedback, see my answers inline.

> Philipp
>
> From linux-kthread.c:
>
> [...]
>
> > /* asm/generic/percpu.h
> >  * per_cpu_offset() is the offset that has to be added to a
> >  * percpu variable to get to the instance for a certain processor.
> >  * Most arches use the __per_cpu_offset array for those offsets but
> >  * some arches have their own ways of determining the offset (x86_64,
> >  * s390). */
>
> Yes, S390 is special in this case. You need to add an arch specific
> hook. I made the hook optional, i.e. if not set the default method via
> the __per_cpu_offset array is used.

Will define an arch hook for this in v2.

> > DECLARE_ADDR (__per_cpu_offset);
> > DECLARE_ADDR (per_cpu__process_counts);
> > DECLARE_ADDR (process_counts);
> > DECLARE_ADDR (per_cpu__runqueues);
> > DECLARE_ADDR (runqueues);
> >
> > #define CORE_INVAL (-1)
>
> CORE_INVAL is defined a second time in linux-kthread.h.

Will fix in v2.

> [...]
> > void
> > lkthread_get_per_cpu_offsets (int numcores)
> > {
> >   enum bfd_endian byte_order = gdbarch_byte_order (target_gdbarch ());
> >   int length = TYPE_LENGTH (builtin_type (target_gdbarch ())->builtin_data_ptr);
> >   CORE_ADDR curr_addr = ADDR (__per_cpu_offset);
> >   int core;
> >
> >   if (!HAS_ADDR (__per_cpu_offset))
> >     {
> >       if (debug_linuxkthread_threads)
> >         fprintf_unfiltered (gdb_stdlog, "Assuming non-SMP kernel.\n");
> >
> >       return;
> >     }
> >
> >   for (core = 0; core < numcores; core++)
>
> This iteration does not work. As CPUs can be added/removed, the logical
> CPU ids needn't be continuous, e.g. with two CPUs the ids can be 0 and
> 2. You have to walk the cpu_online_mask to see which CPUs are used at
> the moment. The iteration appears several times in the code.

Good point, that is something I had not considered! Will fix this in v2,
although the code makes this assumption in quite a few places :(

To get this working nicely, one option is for GDB to grow a
delete_thread () API which also decrements the thread number: in the
physical CPU added/removed debug use case we will need to add/remove a
varying number of swapper threads, and we don't want the GDB thread
numbering to be 'ever increasing' throughout the debug session.

Alternatively (maybe better) I could create dummy swapper entries for
every possible CPU of the target; if some CPUs are offline, these remain
dummy swapper threads. This has the benefit that the numbering of all
the Linux threads added after the swappers stays the same in GDB
regardless of which CPUs are brought online/offline.
> >     {
> >       if (!lkthread_h->per_cpu_offset[core])
> >         lkthread_h->per_cpu_offset[core] =
> >           read_memory_unsigned_integer (curr_addr, length, byte_order);
> >
> >       curr_addr += (CORE_ADDR) length;
> >
> >       if (!lkthread_h->per_cpu_offset[core])
> >         {
> >           warning ("Suspicious null per-cpu offsets,"
> >                    " or wrong number of detected cores:\n"
> >                    "ADDR (__per_cpu_offset) = %s\nmax_cores = %d",
> >                    phex (ADDR (__per_cpu_offset), 4), max_cores);
> >
> >           break;
> >         }
> >     }
> >
> >   if (debug_linuxkthread_threads)
> >     fprintf_unfiltered (gdb_stdlog, "SMP kernel. %d cores detected\n",
> >                         numcores);
> > }
>
> [...]
>
> > static void
> > lkthread_init (void)
> > {
> >   struct thread_info *th = NULL;
> >   struct cleanup *cleanup;
> >   int size =
> >     TYPE_LENGTH (builtin_type (target_gdbarch ())->builtin_unsigned_long);
> >
> >   /* Ensure thread list from beneath target is up to date. */
> >   cleanup = make_cleanup_restore_integer (&print_thread_events);
> >   print_thread_events = 0;
> >   update_thread_list ();
> >   do_cleanups (cleanup);
> >
> >   /* Count the h/w threads. */
> >   max_cores = thread_count ();
>
> The number of CPUs does not have to be constant. Especially when you
> are running in a VM you can add/remove CPUs. Thus max_cores has to be
> determined dynamically or set to the max number of CPUs, e.g. by taking
> the size (in bits) of cpu_online_mask.

Good point, will fix in v2, most likely by taking the max possible CPUs
from the size of cpu_online_mask and adding dummy swappers for each (as
suggested above).

Although I'm wondering what different GDB remote stubs do in this
situation when a CPU has been taken offline. In QEMU at least the CPU is
still reported by the GDB remote stub, and we can still get a backtrace
from it even when offline. I suspect with an OpenOCD setup, when a CPU
is powered off OpenOCD will lose access to the debug unit and the
SWD/JTAG connection will drop (certainly my experience debugging with
OpenOCD in the past with cpuidle enabled in the kernel).
> >   gdb_assert (max_cores);
> >
> >   if (debug_linuxkthread_threads)
> >     {
> >       fprintf_unfiltered (gdb_stdlog, "lkthread_init() cores(%d) GDB"
> >                           "HW threads\n", max_cores);
> >       iterate_over_threads (thread_print_info, NULL);
> >     }
> >
> >   /* Allocate per cpu data. */
> >   lkthread_alloc_percpu_data (max_cores);
>
> Together with the comment above, allocating the percpu_data to a fixed
> size can cause problems when the number of CPUs is increased.

Will fix in v2.

> >   lkthread_get_per_cpu_offsets (max_cores);
> >
> >   if (!lkthread_get_runqueues_addr () && (max_cores > 1))
> >     fprintf_unfiltered (gdb_stdlog, "Could not find the address of CPU"
> >                         " runqueues current context information maybe "
> >                         "less precise\n.");
> >
> >   /* Invalidate the linux-kthread cached list. */
> >   lkthread_invalidate_threadlist ();
> > }
>
> [...]
>
> > static linux_kthread_info_t **
> > lkthread_get_threadlist_helper (linux_kthread_info_t **ps)
> > {
> >   struct linux_kthread_arch_ops *arch_ops =
> >     gdbarch_linux_kthread_ops (target_gdbarch ());
> >   CORE_ADDR rq_idle_taskstruct;
> >   CORE_ADDR g, t, init_task_addr;
> >   int core = 0, i;
> >
> >   if (debug_linuxkthread_threads)
> >     fprintf_unfiltered (gdb_stdlog, "lkthread_get_threadlist_helper\n");
> >
> >   init_task_addr = ADDR (init_task);
> >   g = init_task_addr;
> >
> >   do
> >     {
> >       t = g;
> >       do
> >         {
> >           if (!arch_ops->is_kernel_address (t))
> >             {
> >               warning ("parsing of task list stopped because of invalid address"
> >                        "%s", phex (t, 4));
> >               break;
> >             }
> >
> >           lkthread_get_task_info (t, ps, core /*zero-based */ );
> >           core = CORE_INVAL;
> >
> >           if (ptid_get_tid (PTID_OF (*ps)) == 0)
> >             {
> >               /* This is init_task, let's insert the other cores
> >                  swapper now. */
> >               for (i = 1; i < max_cores; i++)
> >                 {
> >                   ps = &((*ps)->next);
> >                   rq_idle_taskstruct = lkthread_get_rq_idle (i);
> >                   lkthread_get_task_info (rq_idle_taskstruct, ps, i);
> >                 }
> >             }
>
> How does the setting to a core work?
> For me it looks like only for
> init_task and the swapper tasks core != CORE_INVAL. Is that the
> intended behavior?

Not sure if I have understood your question correctly or not, but it
does look like there is a bug there, as these two lines

> >           lkthread_get_task_info (t, ps, core /*zero-based */ );
> >           core = CORE_INVAL;

should be swapped: rather than adding all threads with core=0, they
should be added with core=CORE_INVAL.

The user is informed which thread is running on which physical CPU core
by the linux_kthread_extra_thread_info() callback. This calls
lkthread_is_curr_task() for each GDB thread, which calls
lkthread_get_running(), which updates 'current->core'. If a thread is
currently executing on a CPU, then linux_kthread_extra_thread_info()
adds a 'C' suffix to the thread name, so it is clear to the user when
issuing an 'info threads' command, e.g.

^C[New getty]

Thread 1 received signal SIGINT, Interrupt.
cpu_v7_do_idle () at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
75              ret     lr
(gdb) info threads
  Id   Target Id         Frame
* 1    [swapper/0] (pid: 0 tgid: 0 ) cpu_v7_do_idle ()
       ^^^^
       at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
  2    init (pid: 1 tgid: 1) context_switch (cookie=..., next=, prev=, rq=)
       at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
  3    [kthreadd] (pid: 2 tgid: 2) context_switch (cookie=..., next=, prev=, rq=)
       at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902

When the target halted, CPU0 was executing in swapper.

Did that answer your question?

> >           if (debug_linuxkthread_threads)
> >             fprintf_unfiltered (gdb_stdlog, "Got task info for %s (%li)\n",
> >                                 (*ps)->comm, ptid_get_lwp (PTID_OF (*ps)));
> >
> >           ps = &((*ps)->next);
> >
> >           /* Mark end of chain and remove those threads that
> >              disappeared from the thread_list to avoid
> >              any_thread_of_process() to select a ghost.
> >            */
> >           if (*ps)
> >             (*ps)->valid = 0;
> >
> >           t = _next_thread (t);
> >         } while (t && (t != g));
> >
> >       g = _next_task (g);
> >     } while (g && (g != init_task_addr));
>
> Although pretty unlikely, the do {} while can end in an endless loop
> when you have a memory corruption compromising one of the list_heads.
> Thus there should be more sanity checks than just the one for NULL. I
> use list_head->next->prev == list_head.

Ok, thanks for the suggestion, will fix in v2.

regards,

Peter.