public inbox for gdb@sourceware.org
* Linux kernel debugging and other features
@ 2019-05-13 20:20 Thomas Caputi
  2019-05-14  8:34 ` Florian Weimer
  2019-05-14 13:49 ` Philipp Rudo
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Caputi @ 2019-05-13 20:20 UTC
  To: gdb

Hello gdb,

My name is Tom Caputi and I am a developer for ZFS on Linux. Recently,
I had the opportunity to work with some members of Delphix (another
major ZFS on Linux contributor) to build some debugging tools.

When we started working on this project we were surprised to see how
close gdb was to supporting this kernel debugging natively. For live
systems, we were able to use the kernel's vmlinux file from the dbgsym
package (after mucking around for a bit with KASLR offsets) along with
/proc/kcore as a core file to inspect just about any non-local
variable on the system.
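
A minimal sketch of the KASLR adjustment mentioned above, not the exact
procedure used for this work. It assumes the randomization offset can be
taken as the difference between a symbol's runtime address in
/proc/kallsyms and its link-time address in vmlinux. The sample text, the
choice of _stext as the reference symbol, and the link-time address are
all made up for illustration. On a real system, /proc/kallsyms usually has
to be read as root to see non-zero addresses.

```python
def parse_kallsyms(text):
    """Map symbol name -> runtime address from /proc/kallsyms-style text."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr, _type, name = fields[:3]
        # Keep the first occurrence if a name appears more than once.
        symbols.setdefault(name, int(addr, 16))
    return symbols


def kaslr_offset(kallsyms_text, static_addr, symbol="_stext"):
    """Runtime address minus link-time address of one reference symbol."""
    runtime_addr = parse_kallsyms(kallsyms_text)[symbol]
    return runtime_addr - static_addr


# Hypothetical sample data in the "address type name" layout of kallsyms.
sample = """\
ffffffff9c000000 T _stext
ffffffff9c000000 T _text
ffffffff9c0010a0 T do_one_initcall
"""

# 0xffffffff81000000 is a made-up link-time address for _stext.
offset = kaslr_offset(sample, 0xFFFFFFFF81000000)
print(hex(offset))  # prints 0x1b000000
```

The resulting offset is what would have to be applied when loading the
vmlinux symbols so that they line up with the running kernel.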

For inspecting post-mortem kdumps we found that Jeff Mahoney had
already been working on this
(https://github.com/jeffmahoney/crash-python). Kdump files are
compressed and use a different on-disk format from regular core files,
but he was able to create a new "kdump" target type to support that.
His work also included code that allowed us to load the symbols for
kernel modules with their correct offsets.

Jeff also has a python script that was able to parse out Linux's list
of task_struct structures (which represent all threads on the system)
and hand them to gdb. This allowed us to switch threads and
view stack traces with function arguments just as we could when using
gdb to debug a userspace program.
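
To illustrate the idea of walking the task list, here is a simplified,
self-contained model. It is not Jeff's script and does not use the gdb
Python API. It only shows the usual pattern for a kernel-style circular
linked list: follow the embedded list node's next pointer and subtract the
member offset to get back to the containing structure. The structure
layout, the offsets, and the fake memory image are hypothetical, and the
real kernel layout is more involved (for example, threads within a process
are linked through further lists).

```python
import struct

# Hypothetical layout of a tiny "task" structure in the fake memory image.
PID_OFFSET = 8      # made-up offset of the pid field (stored as a u64 here)
TASKS_OFFSET = 16   # made-up offset of the embedded list node (next, prev)


def read_u64(mem, addr):
    """Read a little-endian 64-bit value from the fake memory image."""
    return struct.unpack_from("<Q", mem, addr)[0]


def walk_tasks(mem, init_task):
    """Yield the address of every structure on the circular list."""
    head = init_task + TASKS_OFFSET
    node = head
    while True:
        yield node - TASKS_OFFSET    # container_of(node, struct, tasks)
        node = read_u64(mem, node)   # "next" is the first field of the node
        if node == head:
            break


def make_memory(structs):
    """Build a fake memory image holding (address, pid) structures in a ring."""
    mem = bytearray(256)
    addrs = [addr for addr, _ in structs]
    for i, (addr, pid) in enumerate(structs):
        nxt = addrs[(i + 1) % len(addrs)] + TASKS_OFFSET
        prv = addrs[(i - 1) % len(addrs)] + TASKS_OFFSET
        struct.pack_into("<Q", mem, addr + PID_OFFSET, pid)
        struct.pack_into("<QQ", mem, addr + TASKS_OFFSET, nxt, prv)
    return mem


mem = make_memory([(32, 0), (96, 1), (160, 2)])
pids = [read_u64(mem, task + PID_OFFSET) for task in walk_tasks(mem, 32)]
print(pids)  # prints [0, 1, 2]
```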

On top of all of this, members of the Delphix team were able to put
together some code to allow some custom gdb sub-commands (written in
python) to be piped together comparable to the way commands can be
piped together in bash. By doing this we were able to put together a
few relatively simple commands to get some really powerful debugging
output.
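
As a rough sketch of what piping sub-commands can look like (this is not
the Delphix code, which is not shown in this thread), each command below
is a Python generator that consumes the previous command's output. The
command names "numbers", "even", and "count" and the run_pipeline helper
are invented for illustration. Real commands would yield kernel objects
rather than integers.

```python
def cmd_numbers(_inp, *args):
    """Source command: ignore input and emit the integers 0..n-1."""
    yield from range(int(args[0]))


def cmd_even(inp, *args):
    """Filter command: pass through only even values."""
    return (x for x in inp if x % 2 == 0)


def cmd_count(inp, *args):
    """Sink-style command: emit a single value, the number of inputs."""
    yield sum(1 for _ in inp)


# Hypothetical command registry; the names are made up for this example.
COMMANDS = {"numbers": cmd_numbers, "even": cmd_even, "count": cmd_count}


def run_pipeline(line):
    """Split a line on '|' and chain the named commands lazily."""
    stream = iter(())
    for stage in line.split("|"):
        name, *args = stage.split()
        stream = COMMANDS[name](stream, *args)
    return list(stream)


print(run_pipeline("numbers 10 | even | count"))  # prints [5]
```

Because every stage is lazy, values flow through the pipeline one at a
time, much like a shell pipe, and nothing is materialized until the final
list is built.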

Currently, all of this is still in the proof-of-concept stage, but I
think both Datto (my company) and Delphix would like to look to the
next steps to get these improvements integrated upstream and
stabilized. We think these could be a huge improvement to the current
situation of debugging any code in the Linux kernel. However, there
are some sticky bits that we would like to discuss if the gdb
community is interested in these changes:

1) The kdumpfile support currently requires a few custom patches added
to gdb that allow a user to create a custom target in python. The
kdumpfile target is then implemented as a python module that calls out
to libkdumpfile (written in C). I'm not sure if this is the desired
implementation of this feature. If it is not, could we get some
pointers for how we could add this support to gdb?

2) The /proc/kcore file *looks* like a core file, but it is constantly
changing underneath us as the live system changes. When debugging code
we had issues where values that should be changing were cached and
appeared to remain static. We were able to reduce the gdb cache size
to 2 bytes (I think) by running 'set stack-cache off; set code-cache
off; set dcache size 1; set dcache line-size 2', but this still
results in (at least) the last variable you inspected being cached
until you look at something else. Is there a way we can completely
disable the dcache?

3) We aren't 100% sure where all of the new code belongs. The
ZFS-specific debugging commands we can definitely keep in the ZFS
repository, but the sub-command piping infrastructure could be useful
to anyone using gdb. We're also not really sure where the scripts that
parse out kernel structures (for things like threads and per-cpu
variables) should end up.
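
For reference, the cache settings from item 2 can be applied at startup
rather than typed interactively. The sketch below only builds a gdb
command line. It does not answer whether the dcache can be disabled
completely. It assumes gdb's standard -ex option and passes each setting
as its own command, since as far as I know gdb reads one command per line
and the semicolons in item 2 are shorthand. The "vmlinux" path is a
placeholder.

```python
import shlex

# The four settings quoted in item 2, one gdb command each.
CACHE_SETTINGS = (
    "set stack-cache off",
    "set code-cache off",
    "set dcache size 1",
    "set dcache line-size 2",
)


def gdb_argv(vmlinux, core="/proc/kcore", settings=CACHE_SETTINGS):
    """Build an argv list that starts gdb with each setting passed via -ex."""
    argv = ["gdb"]
    for command in settings:
        argv += ["-ex", command]
    argv += [vmlinux, core]
    return argv


argv = gdb_argv("vmlinux")  # "vmlinux" is a placeholder path
print(shlex.join(argv))     # shell-quoted form, suitable for copy and paste
```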

Please let us know if you are interested in any of these changes and
let us know what some good next steps would be.

Thanks,
Tom


* Re: Linux kernel debugging and other features
  2019-05-13 20:20 Linux kernel debugging and other features Thomas Caputi
@ 2019-05-14  8:34 ` Florian Weimer
  2019-05-14 13:49 ` Philipp Rudo
  1 sibling, 0 replies; 5+ messages in thread
From: Florian Weimer @ 2019-05-14  8:34 UTC
  To: Thomas Caputi; +Cc: gdb

* Thomas Caputi:

> Please let us know if you are interested in any of these changes and
> let us know what some good next steps would be.

The crash utility seems to collect a lot of things in this area:

  <https://github.com/crash-utility/crash>
  <https://www.redhat.com/mailman/listinfo/crash-utility>

Perhaps that's the right place for your tools as well?

Thanks,
Florian


* Re: Linux kernel debugging and other features
  2019-05-13 20:20 Linux kernel debugging and other features Thomas Caputi
  2019-05-14  8:34 ` Florian Weimer
@ 2019-05-14 13:49 ` Philipp Rudo
  2019-05-15 13:28   ` Thomas Caputi
  1 sibling, 1 reply; 5+ messages in thread
From: Philipp Rudo @ 2019-05-14 13:49 UTC
  To: Thomas Caputi; +Cc: gdb, Florian Weimer, Dave Anderson

Hi Tom,

that really sounds like a great piece of code you have. A while back I
worked on a similar project with a focus on dump debugging on s390 [1].
Unfortunately the project never made it beyond the prototype...

My approach back then was to include everything directly in GDB without
using any python scripts whatsoever. All in all it worked quite well and
allowed walking the task_structs and converting them to gdb threads,
creating backtraces of kernel threads, and loading kernel modules as 'shared
libraries'.

However, there were also some drawbacks. As it was directly included in gdb,
only ELF dumps could be inspected. The virtual address translation was a
little bit shaky, as addresses in gdb are represented by simple ulongs without
any information about the address space. Furthermore, I concentrated on
post-mortem dumps, exactly because of the problems you describe below...

Anyway, I'm thrilled to have a look at your code and see how you are solving
those problems.

I'm not sure if Florian's suggestion to add the code to crash is the proper
way. Crash is based on gdb 7.6, so it's already quite old and probably
doesn't have all the functionality you need. Nevertheless I CCed Dave
Anderson, the crash maintainer. I guess he's interested in this discussion as
well.

Back when I was working on my project I didn't have the impression that there
were any fundamental objections to adding the feature to gdb. It basically
depends on how your implementation solves the problem. So for the next step, I'd
suggest you simply post your patches to the gdb-patches list and we take the
discussion from there.

Thanks
Philipp

[1] https://sourceware.org/ml/gdb-patches/2018-03/msg00243.html 

On Mon, 13 May 2019 16:19:49 -0400
Thomas Caputi <tcaputi@datto.com> wrote:

> Hello gdb,
> 
> My name is Tom Caputi and I am a developer for ZFS on Linux. Recently,
> I had the opportunity to work with some members of Delphix (another
> major ZFS on Linux contributor) to build some debugging tools.
> 
> When we started working on this project we were surprised to see how
> close gdb was to supporting this kernel debugging natively. For live
> systems, we were able to use the kernel's vmlinux file from the dbgsym
> package (after mucking around for a bit with KASLR offsets) along with
> /proc/kcore as a core file to inspect just about any non-local
> variable on the system.
> 
> For inspecting post-mortem kdumps we found that Jeff Mahoney had
> already been working on this
> (https://github.com/jeffmahoney/crash-python). Kdump files are
> compressed and use a different on-disk format from regular core files,
> but he was able to create a new "kdump" target type to support that.
> His work also included code that allowed us to load the symbols for
> kernel modules with their correct offsets.
> 
> Jeff also has a python script that was able to parse out Linux's list
> of task_struct structures (which represent all threads on the system)
> and hand them to gdb. This allowed us to switch threads and
> view stack traces with function arguments just as we could when using
> gdb to debug a userspace program.
> 
> On top of all of this, members of the Delphix team were able to put
> together some code to allow some custom gdb sub-commands (written in
> python) to be piped together comparable to the way commands can be
> piped together in bash. By doing this we were able to put together a
> few relatively simple commands to get some really powerful debugging
> output.
> 
> Currently, all of this is still in the proof-of-concept stage, but I
> think both Datto (my company) and Delphix would like to look to the
> next steps to get these improvements integrated upstream and
> stabilized. We think these could be a huge improvement to the current
> situation of debugging any code in the Linux kernel. However, there
> are some sticky bits that we would like to discuss if the gdb
> community is interested in these changes:
> 
> 1) The kdumpfile support currently requires a few custom patches added
> to gdb that allow a user to create a custom target in python. The
> kdumpfile target is then implemented as a python module that calls out
> to libkdumpfile (written in C). I'm not sure if this is the desired
> implementation of this feature. If it is not, could we get some
> pointers for how we could add this support to gdb?
> 
> 2) The /proc/kcore file *looks* like a core file, but it is constantly
> changing underneath us as the live system changes. When debugging code
> we had issues where values that should be changing were cached and
> appeared to remain static. We were able to reduce the gdb cache size
> to 2 bytes (I think) by running 'set stack-cache off; set code-cache
> off; set dcache size 1; set dcache line-size 2', but this still
> results in (at least) the last variable you inspected being cached
> until you look at something else. Is there a way we can completely
> disable the dcache?
> 
> 3) We aren't 100% sure where all of the new code belongs. The
> ZFS-specific debugging commands we can definitely keep in the ZFS
> repository, but the sub-command piping infrastructure could be useful
> to anyone using gdb. We're also not really sure where the scripts that
> parse out kernel structures (for things like threads and per-cpu
> variables) should end up.
> 
> Please let us know if you are interested in any of these changes and
> let us know what some good next steps would be.


* Re: Linux kernel debugging and other features
  2019-05-14 13:49 ` Philipp Rudo
@ 2019-05-15 13:28   ` Thomas Caputi
  2019-05-15 14:28     ` Philipp Rudo
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Caputi @ 2019-05-15 13:28 UTC
  To: Philipp Rudo; +Cc: gdb, Florian Weimer, Dave Anderson

I apologize, but after speaking with some of the people at Delphix I
may have made things a little more confusing than they should have
been and spoken a bit sooner than I should have. To simplify the
current status, we should probably divide this conversation into 2
topics.

First is the set of lower-level changes to gdb that allow it to read
and understand the linux kernel and kdumps, both of which were largely
done by Jeff, although we adapted some of his code to be able to work
with live systems via /proc/kcore. In this sub-category, there are 2
main bodies of work. First is the bits of python code that inform gdb
how to interpret kernel memory and gather things like threads and
per-cpu variables so that it can present them normally. This code
works for both live and post-mortem systems. Second is the "kdump"
target which consists of (1) some patches to gdb to allow targets to
be written in python, (2) libkdumpfile (a separate library for parsing
kdump files), and (3) the kdump target itself which creates a python
target using libkdumpfile. Since this code is mostly Jeff's work (as
far as I am aware) we are going to work with him to see what he would
like to do / what his plans are.

Second is the set of gdb extensions written in python that allow
custom sub-commands to be piped together. These were written
completely by the folks at Delphix and should work with vanilla gdb
with or without Jeff's work. As for this work I have been informed
that they would like a little more time to polish the code, so perhaps
we can table this discussion for a bit until the code is ready.

So, in short, I think I jumped the gun a little bit in bringing these
changes to this mailing list. I think we still have a little more work
and polishing to do before we are ready to share everything. At this
point I guess we will come back to this discussion when things are
more ready.

Thanks,
Tom


* Re: Linux kernel debugging and other features
  2019-05-15 13:28   ` Thomas Caputi
@ 2019-05-15 14:28     ` Philipp Rudo
  0 siblings, 0 replies; 5+ messages in thread
From: Philipp Rudo @ 2019-05-15 14:28 UTC
  To: Thomas Caputi; +Cc: gdb, Florian Weimer, Dave Anderson

Hi Tom,

On Tue, 14 May 2019 18:04:25 -0400
Thomas Caputi <tcaputi@datto.com> wrote:

> I apologize, but after speaking with some of the people at Delphix I
> may have made things a little more confusing than they should have
> been and spoken a bit sooner than I should have. To simplify the
> current status, we should probably divide this conversation into 2
> topics.

No problem. Maybe I got a little overexcited as well :)

> First is the set of lower-level changes to gdb that allow it to read
> and understand the linux kernel and kdumps, both of which were largely
> done by Jeff, although we adapted some of his code to be able to work
> with live systems via /proc/kcore. In this sub-category, there are 2
> main bodies of work. First is the bits of python code that inform gdb
> how to interpret kernel memory and gather things like threads and
> per-cpu variables so that it can present them normally. This code
> works for both live and post-mortem systems. Second is the "kdump"
> target which consists of (1) some patches to gdb to allow targets to
> be written in python, (2) libkdumpfile (a separate library for parsing
> kdump files), and (3) the kdump target itself which creates a python
> target using libkdumpfile. Since this code is mostly Jeff's work (as
> far as I am aware) we are going to work with him to see what he would
> like to do / what his plans are.

Sure, you should discuss that with Jeff. Nevertheless, when you decide you want
to include the code in gdb, I think it is best to simply post the patches to
the mailing list. In my opinion a topic is best discussed when everyone can
have a look at the code.

The only thing I'm not so sure about is the current policy on including
external libraries, i.e. whether it is preferred to include libkdumpfile or to
teach gdb/bfd to read kdump files directly. In the end that is something the
maintainers have to decide. But again, it's best to discuss that with a patch.

> Second is the set of gdb extensions written in python that allow
> custom sub-commands to be piped together. These were written
> completely by the folks at Delphix and should work with vanilla gdb
> with or without Jeff's work. As for this work I have been informed
> that they would like a little more time to polish the code, so perhaps
> we can table this discussion for a bit until the code is ready.

Recently Philippe Waroquiers posted some patches [1] to implement a pipe
command. You should take a look at them, so you don't implement the same
functionality twice. 

[1] https://sourceware.org/ml/gdb-patches/2019-05/msg00133.html

> So, in short, I think I jumped the gun a little bit in bringing these
> changes to this mailing list. I think we still have a little more work
> and polishing to do before we are ready to share everything. At this
> point I guess we will come back to this discussion when things are
> more ready.

Sure, take your time. 

Looking forward to seeing your patches on the mailing list
Philipp

