public inbox for systemtap@sourceware.org
* Integration of the shared memory based transport in the SystemTap
@ 2017-07-11  6:59 Arkady
  2017-07-12 12:35 ` David Smith
  0 siblings, 1 reply; 3+ messages in thread
From: Arkady @ 2017-07-11  6:59 UTC (permalink / raw)
  To: systemtap

Hi,

I have an implementation of a shared memory transport which is,
hopefully, fairly close to production grade. The idea is that a probe
allocates a small chunk from a FIFO, fills the chunk with data, and
then "commits" the chunk. The FIFO can be lockless if there is one
FIFO per core.
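
A minimal user-space sketch of the allocate/fill/commit idea described
above, written in C11. It is not the code from the gist: the names
(fifo_alloc, fifo_commit, fifo_read), the fixed-size slots, and the
NUM_CPUS, SLOTS and SLOT_BYTES constants are all assumptions made for
illustration. It also assumes exactly one producer per FIFO (the owning
core, with no nested probes) and one consumer, which is what lets it
work without locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NUM_CPUS   4u   /* assumption: fixed core count for the sketch */
#define SLOTS      8u   /* slots per FIFO, must be a power of two */
#define SLOT_BYTES 64u  /* payload capacity of one slot */

struct slot {
    atomic_uint committed;           /* 0 = free or being filled, 1 = ready */
    uint32_t len;                    /* number of valid payload bytes */
    unsigned char data[SLOT_BYTES];
};

struct fifo {
    struct slot slots[SLOTS];
    unsigned head;                   /* next slot to allocate; producer only */
    atomic_uint tail;                /* next slot to read; written by consumer */
};

static struct fifo fifos[NUM_CPUS];  /* one FIFO per core, zero-initialized */

/* Producer: reserve a slot on this core's FIFO, or NULL if it cannot. */
static struct slot *fifo_alloc(unsigned cpu, uint32_t len)
{
    assert(cpu < NUM_CPUS);
    struct fifo *f = &fifos[cpu];
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);

    if (len > SLOT_BYTES || f->head - tail == SLOTS)
        return NULL;                 /* too large or FIFO full: drop the event */

    struct slot *s = &f->slots[f->head % SLOTS];
    f->head++;
    s->len = len;
    return s;                        /* caller fills s->data, then commits */
}

/* Producer: publish a filled slot to the consumer. */
static void fifo_commit(struct slot *s)
{
    atomic_store_explicit(&s->committed, 1, memory_order_release);
}

/* Consumer: copy out the oldest committed chunk.
 * Returns its length, or -1 if nothing is ready or `cap` is too small. */
static int fifo_read(unsigned cpu, void *out, uint32_t cap)
{
    assert(cpu < NUM_CPUS);
    struct fifo *f = &fifos[cpu];
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    struct slot *s = &f->slots[tail % SLOTS];

    if (!atomic_load_explicit(&s->committed, memory_order_acquire))
        return -1;                   /* empty, or oldest chunk not committed */

    uint32_t len = s->len;
    if (len > cap)
        return -1;
    memcpy(out, s->data, len);

    /* Mark the slot free before handing it back to the producer. */
    atomic_store_explicit(&s->committed, 0, memory_order_relaxed);
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return (int)len;
}
```

A probe running on core `cpu` would call fifo_alloc(cpu, len), copy its
data into the returned slot, and call fifo_commit(); the reader polls
fifo_read() for each core. A kernel version would presumably use the
kernel's own barriers and per-CPU variables rather than <stdatomic.h>.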

This is the kernel-space API:
https://gist.github.com/larytet/4977626fd87817414c7a88dd63e7855d

In the user space the shared memory provides write()/read()/mmap() interfaces.

I am going to patch SystemTap by adding the API to its C code.

I wonder whether there is a chance of making shared memory a first
class citizen in SystemTap, and what it would take to merge the API
into the mainline.

Thank you, Arkady.


* Re: Integration of the shared memory based transport in the SystemTap
  2017-07-11  6:59 Integration of the shared memory based transport in the SystemTap Arkady
@ 2017-07-12 12:35 ` David Smith
  2017-07-12 13:03   ` Arkady
  0 siblings, 1 reply; 3+ messages in thread
From: David Smith @ 2017-07-12 12:35 UTC (permalink / raw)
  To: Arkady; +Cc: systemtap

I wonder if you wouldn't take a look at the existing ring buffer code
in the transport. We had it working at one point, but then never quite
pushed it over the finish line. I'm sure it needs some updates for
kernel changes. The advantage here would be that we're using an
existing kernel interface instead of rolling our own. Plus, a good bit
of the work has been done to integrate it with systemtap already. Look
in runtime/transport/ring_buffer.c.




-- 
David Smith
Principal Software Engineer
Red Hat


* Re: Integration of the shared memory based transport in the SystemTap
  2017-07-12 12:35 ` David Smith
@ 2017-07-12 13:03   ` Arkady
  0 siblings, 0 replies; 3+ messages in thread
From: Arkady @ 2017-07-12 13:03 UTC (permalink / raw)
  To: David Smith; +Cc: systemtap

On Wed, Jul 12, 2017 at 3:35 PM, David Smith <dsmith@redhat.com> wrote:
> I wonder if you wouldn't take a look at the existing ring buffer code
> in the transport. We had it working at one point, but then never quite
> pushed it over the finish line. I'm sure it needs some updates for
> kernel changes. The advantage here would be that we're using an
> existing kernel interface instead of rolling our own. Plus, a good bit
> of the work has been done to integrate it with systemtap already. Look
> in runtime/transport/ring_buffer.c.


I am trying to leverage the hardware as much as possible and to avoid
introducing layers of abstraction. See, for example, the "allocation"
function in
https://gist.github.com/larytet/4977626fd87817414c7a88dd63e7855d#file-shared_memory-h-L33

The whole procedure is 120 lines, and most of the time only about 40
of them are executed. This includes debug counters in all branches.
My implementation is data cache friendly and remains space efficient
while storing data structures of different sizes. In my application
the average event size is 150 bytes and the maximum event size is ~4K.
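
One way to keep a FIFO space efficient with records of different sizes
is to pack length-prefixed records back to back in a byte ring and to
pad out the tail of the buffer when a record would straddle the end.
The sketch below illustrates only that layout idea; it is not taken
from the gist. RING_BYTES, WRAP_MARK and the ring_* functions are
hypothetical names, and all synchronisation between producer and
consumer is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_BYTES 4096u        /* assumption: ring size, a multiple of 8 */
#define WRAP_MARK  0xFFFFFFFFu  /* header value meaning "skip to offset 0" */

struct ring {
    uint64_t head;              /* total bytes ever allocated */
    uint64_t tail;              /* total bytes ever released */
    unsigned char buf[RING_BYTES];
};

static size_t round_up8(size_t n) { return (n + 7u) & ~(size_t)7u; }

/* Reserve a record of `len` payload bytes; returns the payload or NULL. */
static void *ring_alloc(struct ring *r, uint32_t len)
{
    if (len > RING_BYTES - sizeof(uint32_t))
        return NULL;            /* can never fit */

    size_t need = round_up8(sizeof(uint32_t) + len);
    size_t off = (size_t)(r->head % RING_BYTES);
    size_t contiguous = RING_BYTES - off;
    size_t used = (size_t)(r->head - r->tail);
    size_t skip = (need > contiguous) ? contiguous : 0;

    if (used + skip + need > RING_BYTES)
        return NULL;            /* not enough free space right now */

    if (skip) {
        /* The record would straddle the end: mark the rest as padding. */
        uint32_t mark = WRAP_MARK;
        memcpy(r->buf + off, &mark, sizeof mark);
        r->head += skip;
        off = 0;
    }
    memcpy(r->buf + off, &len, sizeof len);   /* length header */
    r->head += need;
    return r->buf + off + sizeof(uint32_t);   /* payload, 4-byte aligned */
}

/* Look at the oldest record without consuming it; NULL if the ring is empty. */
static const void *ring_peek(struct ring *r, uint32_t *len)
{
    if (r->tail == r->head)
        return NULL;

    size_t off = (size_t)(r->tail % RING_BYTES);
    uint32_t hdr;
    memcpy(&hdr, r->buf + off, sizeof hdr);
    if (hdr == WRAP_MARK) {     /* padding: the record starts at offset 0 */
        r->tail += RING_BYTES - off;
        off = 0;
        memcpy(&hdr, r->buf, sizeof hdr);
    }
    *len = hdr;
    return r->buf + off + sizeof(uint32_t);
}

/* Consume the record most recently returned by ring_peek(). */
static void ring_release(struct ring *r, uint32_t len)
{
    r->tail += round_up8(sizeof(uint32_t) + len);
}
```

With 150-byte events each record occupies 160 bytes in this layout
(4-byte header plus payload, rounded up to 8), so the per-record
overhead stays small compared with fixed 4K slots sized for the
largest event.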

In the performance tests, writing to the FIFO adds less than 5% to the
probe overhead (likely well below 20 ns per allocation).

The downside is that my approach is not very generic and will not fit
every application.


