From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Jan 2017 15:56:00 -0000
From: Peter Griffin
To: Philipp Rudo
Cc: gdb-patches@sourceware.org, palves@redhat.com, brobecker@adacore.com,
 kevinb@redhat.com, cagney@gnu.org, dje@google.com, drow@false.org,
 kettenis@gnu.org, yao.qi@arm.com, stanshebs@google.com,
 Ulrich.Weigand@de.ibm.com, elena.zannoni@oracle.com, eliz@gnu.org,
 arnez@linux.vnet.ibm.com
Subject: Re: [PATCH] Linux kernel thread runtime support
Message-ID: <20170110155556.GA15774@griffinp-ThinkPad-X1-Carbon-2nd>
References: <1482427864-4317-1-git-send-email-peter.griffin@linaro.org>
 <20170103135514.0dad741e@ThinkPad>
In-Reply-To: <20170103135514.0dad741e@ThinkPad>
X-SW-Source: 2017-01/txt/msg00167.txt.bz2

Hi Philipp,

Many thanks for your review feedback, and happy new year.

On Tue, 03 Jan 2017, Philipp Rudo wrote:
> Hi Peter
> Hi everybody else
>
> well it had to come this way. I am also working on a kernel debugging
> target quite similar to yours and planned to send my code next week or
> the one after...

Always the way :) Good to know others are interested in having a kernel
thread layer in GDB as well.

> On the bright side, till now I worked on core dumps on S390 and also
> have some features (especially module support with kernel virtual
> memory handling) you are still missing.

I actually removed all module and VM support from this initial submission
to keep things as simple as possible to begin with, although I would be
interested to see how you are handling the VM side in your version, as it
was/is an area of concern.

As you probably realised, I'm coming at this from an interactive kernel
debug PoV (e.g.
jtag or qemu), but there is no reason this and core dump support
shouldn't share a common code base; all that is really 'extra' in my
use case is keeping the thread list synchronized with the target over
time.

> So not all the work is in vain we just need to find a common basis.
> Please give me some days so I can finalize my code and send it to the
> list. Then we can discuss it in detail.

Yes, sounds like a plan.

> Nevertheless I found some issues in your basic thread handling logic
> (from a kernel point of view) you should have a look at. See comments
> below.

Thanks for the review feedback, see my answers inline.

> Philipp
>
> From linux-kthread.c:
>
> [...]
>
> > /* asm/generic/percpu.h
> >  * per_cpu_offset() is the offset that has to be added to a
> >  * percpu variable to get to the instance for a certain processor.
> >  * Most arches use the __per_cpu_offset array for those offsets but
> >  * some arches have their own ways of determining the offset (x86_64,
> >  * s390). */
>
> Yes, S390 is special in this case. You need to add an arch specific
> hook. I made the hook optional, i.e. if not set the default method via
> the __per_cpu_offset array is used.

Will define an arch hook for this in v2.

> > DECLARE_ADDR (__per_cpu_offset);
> > DECLARE_ADDR (per_cpu__process_counts);
> > DECLARE_ADDR (process_counts);
> > DECLARE_ADDR (per_cpu__runqueues);
> > DECLARE_ADDR (runqueues);
> >
> > #define CORE_INVAL (-1)
>
> CORE_INVAL is defined a second time in linux-kthread.h.

Will fix in v2.

> [...]
> > void
> > lkthread_get_per_cpu_offsets (int numcores)
> > {
> >   enum bfd_endian byte_order = gdbarch_byte_order (target_gdbarch ());
> >   int length = TYPE_LENGTH (builtin_type (target_gdbarch ())->builtin_data_ptr);
> >   CORE_ADDR curr_addr = ADDR (__per_cpu_offset);
> >   int core;
> >
> >   if (!HAS_ADDR (__per_cpu_offset))
> >     {
> >       if (debug_linuxkthread_threads)
> >         fprintf_unfiltered (gdb_stdlog, "Assuming non-SMP kernel.\n");
> >
> >       return;
> >     }
> >
> >   for (core = 0; core < numcores; core++)
>
> This iteration does not work. As CPUs can be added/removed, the logical
> CPU ids needn't be continuous, e.g. with two CPUs the ids can be 0 and
> 2. You have to walk the cpu_online_mask to see which CPUs are used at
> the moment. The iteration appears several times in the code.

Good point, that is something I had not considered! Will fix this in v2,
although the code makes this assumption in quite a few places :(

To get this working nicely, one option is for GDB to grow a
delete_thread () API which also decrements the thread number: in the
physical CPU added/removed debug use case we will need to add/remove a
varying number of swapper threads, and we don't want the GDB thread
numbering to be 'ever increasing' throughout the debug session.

Alternatively (maybe better) I could create dummy swapper entries for
every possible CPU of the target; if some CPUs are offline, these remain
dummy swapper threads. This has the benefit that the numbering of all
the Linux threads added after the swappers stays the same in GDB
regardless of which CPUs are brought online/offline.
> >     {
> >       if (!lkthread_h->per_cpu_offset[core])
> >         lkthread_h->per_cpu_offset[core] =
> >           read_memory_unsigned_integer (curr_addr, length, byte_order);
> >
> >       curr_addr += (CORE_ADDR) length;
> >
> >       if (!lkthread_h->per_cpu_offset[core])
> >         {
> >           warning ("Suspicious null per-cpu offsets,"
> >                    " or wrong number of detected cores:\n"
> >                    "ADDR (__per_cpu_offset) = %s\nmax_cores = %d",
> >                    phex (ADDR (__per_cpu_offset), 4), max_cores);
> >
> >           break;
> >         }
> >     }
> >
> >   if (debug_linuxkthread_threads)
> >     fprintf_unfiltered (gdb_stdlog, "SMP kernel. %d cores detected\n",
> >                         numcores);
> > }
>
> [...]
>
> > static void
> > lkthread_init (void)
> > {
> >   struct thread_info *th = NULL;
> >   struct cleanup *cleanup;
> >   int size =
> >     TYPE_LENGTH (builtin_type (target_gdbarch ())->builtin_unsigned_long);
> >
> >   /* Ensure thread list from beneath target is up to date. */
> >   cleanup = make_cleanup_restore_integer (&print_thread_events);
> >   print_thread_events = 0;
> >   update_thread_list ();
> >   do_cleanups (cleanup);
> >
> >   /* Count the h/w threads. */
> >   max_cores = thread_count ();
>
> The number of CPUs does not have to be constant. Especially when you
> are running in a VM you can add/remove CPUs. Thus max_cores has to be
> determined dynamically or set to the max number of CPUs, e.g. by taking
> the size (in bits) of cpu_online_mask.

Good point, will fix in v2, most likely by taking the max possible CPUs
from the size of cpu_online_mask and adding dummy swappers for each (as
suggested above).

Although I'm wondering what different GDB remote stubs do in this
situation when a CPU has been taken offline. In QEMU at least the CPU is
still reported by the GDB remote stub, and we can still get a backtrace
from it even when offline. I suspect with an OpenOCD setup, when a CPU
is powered off OpenOCD will lose access to the debug unit and the
SWD/JTAG connection will drop (certainly my experience debugging with
OpenOCD in the past with cpuidle enabled in the kernel).
> >   gdb_assert (max_cores);
> >
> >   if (debug_linuxkthread_threads)
> >     {
> >       fprintf_unfiltered (gdb_stdlog, "lkthread_init() cores(%d) GDB"
> >                           "HW threads\n", max_cores);
> >       iterate_over_threads (thread_print_info, NULL);
> >     }
> >
> >   /* Allocate per cpu data. */
> >   lkthread_alloc_percpu_data (max_cores);
>
> Together with the comment above, allocating the percpu_data to a fixed
> size can cause problems when the number of CPUs is increased.

Will fix in v2.

> >   lkthread_get_per_cpu_offsets (max_cores);
> >
> >   if (!lkthread_get_runqueues_addr () && (max_cores > 1))
> >     fprintf_unfiltered (gdb_stdlog, "Could not find the address of CPU"
> >                         " runqueues current context information maybe "
> >                         "less precise\n.");
> >
> >   /* Invalidate the linux-kthread cached list. */
> >   lkthread_invalidate_threadlist ();
> > }
>
> [...]
>
> > static linux_kthread_info_t **
> > lkthread_get_threadlist_helper (linux_kthread_info_t **ps)
> > {
> >   struct linux_kthread_arch_ops *arch_ops =
> >     gdbarch_linux_kthread_ops (target_gdbarch ());
> >   CORE_ADDR rq_idle_taskstruct;
> >   CORE_ADDR g, t, init_task_addr;
> >   int core = 0, i;
> >
> >   if (debug_linuxkthread_threads)
> >     fprintf_unfiltered (gdb_stdlog, "lkthread_get_threadlist_helper\n");
> >
> >   init_task_addr = ADDR (init_task);
> >   g = init_task_addr;
> >
> >   do
> >     {
> >       t = g;
> >       do
> >         {
> >           if (!arch_ops->is_kernel_address (t))
> >             {
> >               warning ("parsing of task list stopped because of invalid address"
> >                        "%s", phex (t, 4));
> >               break;
> >             }
> >
> >           lkthread_get_task_info (t, ps, core /*zero-based */ );
> >           core = CORE_INVAL;
> >
> >           if (ptid_get_tid (PTID_OF (*ps)) == 0)
> >             {
> >               /* This is init_task, let's insert the other cores
> >                  swapper now. */
> >               for (i = 1; i < max_cores; i++)
> >                 {
> >                   ps = &((*ps)->next);
> >                   rq_idle_taskstruct = lkthread_get_rq_idle (i);
> >                   lkthread_get_task_info (rq_idle_taskstruct, ps, i);
> >                 }
> >             }
>
> How does the setting to a core work?
> For me it looks like only for
> init_task and the swapper tasks core != CORE_INVAL. Is that the
> intended behavior?

Not sure if I have understood your question correctly or not, but it
does look like there is a bug there, as these two lines

> >           lkthread_get_task_info (t, ps, core /*zero-based */ );
> >           core = CORE_INVAL;

should be swapped: rather than adding all threads with core=0, they
should be added with core=CORE_INVAL.

The user is informed which thread is running on which physical CPU core
by the linux_kthread_extra_thread_info() callback. This calls
lkthread_is_curr_task() for each GDB thread, which calls
lkthread_get_running(), which updates 'current->core'. If a thread is
currently executing on a CPU, then linux_kthread_extra_thread_info()
adds a 'C' suffix to the thread name, so it is clear to the user when
issuing an 'info threads' command, e.g.

^C[New getty]

Thread 1 received signal SIGINT, Interrupt.
cpu_v7_do_idle () at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
75              ret     lr
(gdb) info threads
  Id   Target Id         Frame
* 1    [swapper/0] (pid: 0 tgid: 0 ) cpu_v7_do_idle ()
       ^^^^
       at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/arch/arm/mm/proc-v7.S:75
  2    init (pid: 1 tgid: 1) context_switch (cookie=..., next=, prev=, rq=)
       at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902
  3    [kthreadd] (pid: 2 tgid: 2) context_switch (cookie=..., next=, prev=, rq=)
       at /home/griffinp/Software/gdb/lkd/lkd/sources/linux/kernel/sched/core.c:2902

When the target halted, CPU0 was executing in swapper.

Did that answer your question?

> >           if (debug_linuxkthread_threads)
> >             fprintf_unfiltered (gdb_stdlog, "Got task info for %s (%li)\n",
> >                                 (*ps)->comm, ptid_get_lwp (PTID_OF (*ps)));
> >
> >           ps = &((*ps)->next);
> >
> >           /* Mark end of chain and remove those threads that
> >              disappeared from the thread_list to avoid
> >              any_thread_of_process() to select a ghost.
> >            */
> >           if (*ps)
> >             (*ps)->valid = 0;
> >
> >           t = _next_thread (t);
> >         } while (t && (t != g));
> >
> >       g = _next_task (g);
> >     } while (g && (g != init_task_addr));
>
> Although pretty unlikely, the do {} while can end in an endless loop
> when you have a memory corruption compromising one of the list_heads.
> Thus there should be more sanity checks than just the one for NULL. I
> use list_head->next->prev == list_head.

Ok, thanks for the suggestion, will fix in v2.

regards,

Peter.