Using systemtap on MPI applications

public inbox for systemtap@sourceware.org

* Using systemtap on MPI applications
@ 2016-03-14 14:24 Olausson, Bjoern
  2016-03-15 19:09 ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Olausson, Bjoern @ 2016-03-14 14:24 UTC (permalink / raw)
  To: systemtap

Hello SystemTap users,

I am curious if there is a smart way to trace (basically IO) for MPI
applications running on multiple nodes?

I guess it would be possible to either run stap globally or run
"mpirun <options> stap script.stp -c mpi-application"

Is there any "good practice" to profile MPI-applications with
SystemTap if at all possible?

Kind regards,
Bjoern

* Re: Using systemtap on MPI applications
  2016-03-14 14:24 Using systemtap on MPI applications Olausson, Bjoern
@ 2016-03-15 19:09 ` Frank Ch. Eigler
  2016-03-16  8:13   ` Olausson, Bjoern
  0 siblings, 1 reply; 8+ messages in thread
From: Frank Ch. Eigler @ 2016-03-15 19:09 UTC (permalink / raw)
  To: Olausson, Bjoern; +Cc: systemtap

Hi -

"Olausson, Bjoern"  wrote:

> [...]
> I am curious if there is a smart way to trace (basically IO) for MPI
> applications running on multiple nodes?
>
> I guess it would be possible to either run stap globally or run
> "mpirun <options> stap script.stp -c mpi-application"
> [...]

That would be the brute-force method.  It would require installing the
compiler etc. on all the hosts, unless you run a central stap-server
instance to do the compilation part of the work (passes 1-4).

Another possibility now is to use "stap --remote HOST1 --remote HOST2
..."  from a central box, which internally does "ssh HOST1 stapsh" to
maintain a two-way link, and perform remote execution (pass 5).

It would be nice if stap --remote learned about mpi (openmpi?), so as
to use mpirun or similar to manage remote startup of stapsh and
multiplex stdin/stdout/stderr communications with all the hosts in your
hostfile:
% stap --remote mpi:/path/to/hostfile

Or even
% stap --remote mpirun:HOST1 --remote mpirun:HOST2
may be worth doing, using individual "mpirun -H" jobs per host.

- FChE
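
A minimal sketch of the "stap --remote HOST1 --remote HOST2" approach described above, driven from an MPI hostfile. It assumes an Open MPI style hostfile (one host per line, optionally followed by "slots=N", with "#" starting a comment). The helper name remote_args, the host names, and io_trace.stp are placeholders that do not come from this thread. It only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Sketch: turn an MPI hostfile into "--remote HOST" arguments for stap.
# Assumed hostfile format: "HOST [slots=N]" per line, "#" starts a comment.
# remote_args is a hypothetical helper, not part of SystemTap.
remote_args() {
    # $1: path to the hostfile
    while read -r host _rest || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        printf '%s %s ' --remote "$host"
    done < "$1"
}

# Example hostfile with placeholder host names.
hostfile=$(mktemp)
printf '%s\n' '# compute nodes' 'node01 slots=4' '' 'node02 slots=4' > "$hostfile"

# Dry run: print the command instead of executing it.
echo "stap $(remote_args "$hostfile")io_trace.stp"
rm -f "$hostfile"
```

Each listed host must be reachable over ssh and have stapsh available, since that is how --remote maintains its link.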

* Re: Using systemtap on MPI applications
  2016-03-15 19:09 ` Frank Ch. Eigler
@ 2016-03-16  8:13   ` Olausson, Bjoern
  2016-03-16 14:07     ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Olausson, Bjoern @ 2016-03-16  8:13 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Tue, Mar 15, 2016 at 8:09 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> Hi -
>
> "Olausson, Bjoern"  wrote:
>
>> [...]
>> I am curious if there is a smart way to trace (basically IO) for MPI
>> applications running on multiple nodes?
>>
>> I guess it would be possible to either run stap globally or run
>> "mpirun <options> stap script.stp -c mpi-application"
>> [...]
>
> That would be the brute-force method.  It would require installing the
> compiler etc. on all the hosts, unless you run a central stap-server
> instance to do the compilation part of the work (passes 1-4).
>
> Another possibility now is to use "stap --remote HOST1 --remote HOST2
> ..."  from a central box, which internally does "ssh HOST1 stapsh" to
> maintain a two-way link, and perform remote execution (pass 5).
>
>
> It would be nice if stap --remote learned about mpi (openmpi?), so as
> to use mpirun or similar to manage remote startup of stapsh and
> multiplex stdin/stdout/stderr communications with all the hosts in your
> hostfile:
> % stap --remote mpi:/path/to/hostfile
>
> Or even
> % stap --remote mpirun:HOST1 --remote mpirun:HOST2
> may be worth doing, using individual "mpirun -H" jobs per host.
>
>
> - FChE

The --remote switch is a great start; I hadn't come across that one yet.
Thanks a lot!

Indeed it would be great if stap would be MPI aware in some way.

Still there is the issue of how to filter what stap is tracing. How
would I tell stap to focus on only one particular executable or PID
when using the --remote switch, so that target() can be used?
Any ideas on that?

Greetings,
Bjoern

* Re: Using systemtap on MPI applications
  2016-03-16  8:13   ` Olausson, Bjoern
@ 2016-03-16 14:07     ` Frank Ch. Eigler
  2016-03-17 12:10       ` Olausson, Bjoern
       [not found]       ` <CAE7O3Tcg7317VY5eH_ipqP7wqUR9CnwmCsdU4z+=wVbv3y14SQ@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2016-03-16 14:07 UTC (permalink / raw)
  To: Olausson, Bjoern; +Cc: systemtap

Hi -

> > "Olausson, Bjoern"  wrote:

> Indeed it would be great if stap would be MPI aware in some way.

"patches welcome" :-)  we'd be glad to advise, but aren't planning
to undertake the work ourselves very soon.


> Still there is the issue of how to filter what stap is tracing. How
> would I tell stap to focus on only one particular executable or PID
> when using the --remote switch, so that target() can be used?
> Any ideas on that?

I believe the "stap -c CMD" and "stap -x PID" options both travel
through "stap --remote ..." ssh, though of course the former makes a
lot more sense.  The "-c CMD" may be good enough for MPI purposes.
And there's always filtering from first principles:

  if (execname() =~ ".*foo.*" && uid() == 44) { }

- FChE
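
A hedged sketch of the "filtering from first principles" idea above, placed inside a complete probe. The file name filter_example.stp, the "foo" pattern, uid 44, and the choice of vfs.read as the probe point are placeholders for illustration. The invocations in the trailing comments are not executed, and whether -c and -x behave as hoped across --remote is, as noted above, a belief rather than a tested fact.

```shell
#!/bin/sh
# Sketch: write a SystemTap script that filters on executable name and uid.
# File name, pattern and uid are placeholders.
cat > filter_example.stp <<'EOF'
probe vfs.read {
    # Only report reads issued by matching processes of one user.
    if (execname() =~ ".*foo.*" && uid() == 44) {
        printf("%s[%d] read request of %d bytes\n",
               execname(), pid(), bytes_to_read)
    }
}
EOF

# Possible invocations (not executed here; HOST1, HOST2 and PID are placeholders):
#   stap --remote HOST1 --remote HOST2 filter_example.stp
#   stap --remote HOST1 -x PID filter_example.stp
```

Filtering inside the probe body keeps the script independent of how the MPI ranks were launched, at the cost of the probe firing for every process on the host.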

* Re: Using systemtap on MPI applications
  2016-03-16 14:07     ` Frank Ch. Eigler
@ 2016-03-17 12:10       ` Olausson, Bjoern
       [not found]       ` <CAE7O3Tcg7317VY5eH_ipqP7wqUR9CnwmCsdU4z+=wVbv3y14SQ@mail.gmail.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Olausson, Bjoern @ 2016-03-17 12:10 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Wed, Mar 16, 2016 at 3:07 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Hi -
>
>> > "Olausson, Bjoern"  wrote:
>
>> Indeed it would be great if stap would be MPI aware in some way.
>
> "patches welcome" :-)  we'd be glad to advise, but aren't planning
> to undertake the work ourselves very soon.
>

Understood :-)

>
>> Still there is the issue of how to filter what stap is tracing. How
>> would I tell stap to focus on only one particular executable or PID
>> when using the --remote switch, so that target() can be used?
>> Any ideas on that?
>
> I believe the "stap -c CMD" and "stap -x PID" options both travel
> through "stap --remote ..." ssh, though of course the former makes a
> lot more sense.  The "-c CMD" may be good enough for MPI purposes.
> And there's always filtering from first principles:
>
>   if (execname() =~ ".*foo.*" && uid() == 44) { }
>
> - FChE


Excuse that stupid question, but I thought using "-c CMD" will always
instantly execute the CMD, so that would not play well with MPI
executed applications :)

But is there a way to pass a string to stap on which I can apply e.g.
your above filter instead of hardcoding the exec name I want to filter
on?

Cheers and thanks a lot for all your quick answers,
Bjoern

* Re: Using systemtap on MPI applications
       [not found]         ` <20160317124917.GC29879@redhat.com>
@ 2016-03-17 12:51           ` Olausson, Bjoern
  2016-05-02 15:40             ` Olausson, Bjoern
  0 siblings, 1 reply; 8+ messages in thread
From: Olausson, Bjoern @ 2016-03-17 12:51 UTC (permalink / raw)
  To: Frank Ch. Eigler, systemtap

On Thu, Mar 17, 2016 at 1:49 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Hi -
>
>> Excuse that stupid question, but I thought using "-c CMD" will always
>> instantly execute the CMD, so that would not play well with MPI
>> executed applications :)
>
> Not exactly - "stap -c CMD" should run CMD at each --remote site.
> The extent to which an mpi CMD would run correctly (and find
> its MPI peers etc.) is unknown though.
>
>> But is there a way to pass a string to stap on which I can apply e.g.
>> your above filter instead of hardcoding the exec name I want to filter
>> on?
>
> Certainly; command line options or global variables are two of the ways.
> The latter performs better because it permits caching.
>
> stap -e 'probe something { if (execname() == @1) { ...} }'  bar
> stap -e 'global foo;  probe something { if (execname() == foo) { ...} }' -Gfoo=bar
>
> - FChE

Thanks a lot, that makes things way easier!

Cheers,
Bjoern
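
A small sketch of the global-variable variant quoted above, written out as a script file. The names binfilter.stp and mpi-application are placeholders, and vfs.read.return is only one possible probe point. The run commands are left as comments and not executed.

```shell
#!/bin/sh
# Sketch: pass the executable name in at run time instead of hardcoding it.
# binfilter.stp and mpi-application are placeholder names.
cat > binfilter.stp <<'EOF'
global binfilter   # set at run time with -G binfilter=NAME

probe vfs.read.return {
    # Report only reads done by the selected executable.
    if (execname() == binfilter) {
        printf("%s;%d;%d\n", execname(), pid(), bytes_read)
    }
}
EOF

# Possible invocations (not executed here):
#   stap -G binfilter=mpi-application binfilter.stp
# The command-line-argument variant would compare against @1 instead:
#   stap -e 'probe vfs.read.return { if (execname() == @1) { ... } }' mpi-application
```

With -G the script text stays identical between runs, which is what allows the caching mentioned above.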

* Re: Using systemtap on MPI applications
  2016-05-02 15:40             ` Olausson, Bjoern
@ 2016-05-02 15:25               ` Frank Ch. Eigler
  0 siblings, 0 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2016-05-02 15:25 UTC (permalink / raw)
  To: Olausson, Bjoern; +Cc: systemtap

Hi -

> [...]  Does anyone have a suggestion on how to trace MPIIO calls?
> [...]

If I am guessing correctly, how about probing:

probe process("/usr/lib*/openmpi/lib/openmpi/mca_io_ompio.so")
     .function("*ompio_file_*") { }


- FChE
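
An untested sketch that fills in the empty probe body above with a per-call latency report, in the same semicolon-separated style as the vfs.read.return script in this thread. The library path is the guess from the message above and will differ between installations. The file name mpiio_latency.stp is a placeholder, and ppfunc() is assumed to be available in the installed SystemTap version.

```shell
#!/bin/sh
# Sketch: time each ompio file function on return.
# The library path is a guess and must be adapted to the local Open MPI install.
cat > mpiio_latency.stp <<'EOF'
probe process("/usr/lib*/openmpi/lib/openmpi/mca_io_ompio.so")
      .function("*ompio_file_*").return {
    # Time spent in the call, in microseconds.
    latency = gettimeofday_us() - @entry(gettimeofday_us())
    printf("%s;%d;%d\n", ppfunc(), pid(), latency)
}
EOF

# Possible checks and invocations (not executed here):
#   stap -L 'process("/usr/lib*/openmpi/lib/openmpi/mca_io_ompio.so").function("*ompio_file_*")'
#   stap mpiio_latency.stp
```

Listing the probe points with "stap -L" first shows whether the wildcard matches anything in the library, before any tracing is attempted.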

* Re: Using systemtap on MPI applications
  2016-03-17 12:51           ` Olausson, Bjoern
@ 2016-05-02 15:40             ` Olausson, Bjoern
  2016-05-02 15:25               ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Olausson, Bjoern @ 2016-05-02 15:40 UTC (permalink / raw)
  To: Frank Ch. Eigler, systemtap

So using SystemTap with a filter on the binary works great when
running on multiple nodes - thanks for that, Frank :-)

As long as I am using POSIX IO everything works as expected. Now that
I use MPIIO, of course the vfs.read.return probe and its write
counterpart no longer work.

Does anyone have a suggestion on how to trace MPIIO calls?

Currently I am probing like this:
probe vfs.read.return {
       if (execname() == binfilter) {
               time_stamp = timestamp()
               latency = gettimeofday_us() - @entry(gettimeofday_us())
               offset = $file->f_pos
               filename = __file_filename(file)
               bytes = $return
               printf("%s;%s;%d;%d;%d;%d;%d;%d;%d\n", filename, name,
                      time_stamp, latency, offset, bytes_to_read, bytes_read,
                      bytes, pid())
       }
}

Greetings,
Bjoern

