public inbox for systemtap@sourceware.org
* [Bug translator/26296] New: delay script-global locking until required
@ 2020-07-23 15:40 fche at redhat dot com
  2020-07-24  5:22 ` Craig Ringer
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: fche at redhat dot com @ 2020-07-23 15:40 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=26296

            Bug ID: 26296
           Summary: delay script-global locking until required
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: translator
          Assignee: systemtap at sourceware dot org
          Reporter: fche at redhat dot com
  Target Milestone: ---

The scripting language promises atomic execution of handlers that read/write
global variables.  This is implemented by taking read/write locks as
appropriate, early in the probe handler prologue.  It has been repeatedly
observed that this causes perhaps unnecessary overheads (e.g. bug #7033).

We can imagine a change that could maintain the atomic semantics, but handle
the common pattern:

  global bar
  probe foo {
    if(condition) next;
    bar = $var
  }

where a pure filtering predicate that does not read global variables is
expected to frequently skip execution of the critical sections entirely.

Instead of emitting:

prologue:
   lock_all()
body:
   if(condition) goto epilogue;
   bar=$var
epilogue:
   unlock_all()

we could emit:

prologue:
   locked_p = false
body:
   if(condition) goto epilogue;
   if (!locked_p) { lock_all(); locked_p = true; }
   bar=$var
epilogue:
   if (locked_p) unlock_all()

IOW: defer locking to the first moment when any global is actually
read/written, tracking locked-ness in a new context local.  This would involve
only a small change to the translator, involving only context-free logic.  That
could later be optimized to remove repeated checks/etc. over multiple global
vars in a control-flow / context aware way.
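
A minimal sketch of the deferred-locking shape described above, written as
plain user-space C rather than actual translator output. The names
lock_all(), unlock_all(), struct context, ensure_locked() and probe_foo() are
illustrative assumptions: the lock helpers only count calls, and "bar" and
"condition" stand in for the script global and the filtering predicate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lock helpers; they only record calls. */
static int lock_calls, unlock_calls;
static bool held;

static void lock_all(void)   { held = true; lock_calls++; }
static void unlock_all(void) { assert(held); held = false; unlock_calls++; }

static long bar;                    /* models the script global "bar" */

struct context { bool locked_p; };  /* models the new context local */

/* Take the locks only on the first access to any global. */
static void ensure_locked(struct context *c)
{
    if (!c->locked_p) {
        lock_all();
        c->locked_p = true;
    }
}

/* Models: probe foo { if (condition) next; bar = $var } */
static void probe_foo(bool condition, long var)
{
    struct context c = { .locked_p = false };  /* prologue */

    if (condition)
        goto epilogue;          /* filtered out: no lock was ever taken */
    ensure_locked(&c);          /* first global access */
    bar = var;
epilogue:
    if (c.locked_p)
        unlock_all();
}
```

When the predicate filters the event out, probe_foo() returns without touching
the lock at all, which is the saving the proposal is after.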

-- 
You are receiving this mail because:
You are the assignee for the bug.


* Re: [Bug translator/26296] New: delay script-global locking until required
  2020-07-23 15:40 [Bug translator/26296] New: delay script-global locking until required fche at redhat dot com
@ 2020-07-24  5:22 ` Craig Ringer
  2020-07-24 14:12   ` Arkady
  2020-07-24  5:22 ` [Bug translator/26296] " craig at 2ndquadrant dot com
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Craig Ringer @ 2020-07-24  5:22 UTC (permalink / raw)
  To: fche at redhat dot com; +Cc: systemtap

>
> IOW: defer locking to the first moment when any global is actually
> read/written, tracking locked-ness in a new context local.  This would
> involve only a small change to the translator, involving only
> context-free logic.  That could later be optimized to remove repeated
> checks/etc. over multiple global vars in a control-flow / context aware
> way.
>
>
Even an explicit construct that scopes locking would be handy. Borrow from
Java's "synchronized" perhaps.

The fact that whole probes get locked is a serious limitation for one of my
systemtap use cases, where I inject delays and faults into the target
application. The probe flow is supposed to be something like:

global targets_map;

probe process("foo").mark("some_probe_point") {
  if (pid() in targets_map) {
      kdelay(100000);
  }
}

where kdelay is a simple embedded C wrapper around the kernel function of
the same name. But due to the locking on the global "targets_map", every
hit on "some_probe_point" will block on the lock held by the sleeping
probe. So probes can't inject sleeps or delays to try to trigger race
conditions.

So yes, the ability to take a lock over a narrower scope than the whole
probe would be very desirable.

I've wondered about the feasibility of doing this in embedded C, but
haven't had a chance to explore it properly yet.

This reminds me - is it ever safe to sleep in a systemtap probe, e.g. to
call ksleep() rather than busy-loop?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


* Re: [Bug translator/26296] New: delay script-global locking until required
  2020-07-24  5:22 ` Craig Ringer
@ 2020-07-24 14:12   ` Arkady
  0 siblings, 0 replies; 9+ messages in thread
From: Arkady @ 2020-07-24 14:12 UTC (permalink / raw)
  To: Craig Ringer; +Cc: fche at redhat dot com, systemtap

On Fri, Jul 24, 2020 at 8:23 AM Craig Ringer <craig@2ndquadrant.com> wrote:
>
> >
> > IOW: defer locking to the first moment when any global is actually
> > read/written, tracking locked-ness in a new context local.  This would
> > involve only a small change to the translator, involving only
> > context-free logic.  That could later be optimized to remove repeated
> > checks/etc. over multiple global vars in a control-flow / context aware
> > way.
> >
> >
> Even an explicit construct that scopes locking would be handy. Borrow from
> Java's "synchronized" perhaps.
>
> The fact that whole probes get locked is a serious limitation for one of my
> systemtap use cases, where I inject delays and faults into the target
> application. The probe flow is supposed to be something like:
>
> global targets_map;
>
> probe process("foo").mark("some_probe_point") {
>   if (pid() in targets_map) {
>       kdelay(100000);
>   }
> }

Usually there is a lock because of the use of maps/associative
arrays. Use your own C implementation (check the code base I sent you)
... or we can implement inline C support.

You cannot sleep in many probes. Such code does not crash
immediately, but eventually it will.
In some probes it is safe to sleep.

>
> where kdelay is a simple embedded C wrapper around the kernel function of
> the same name. But due to the locking on the global "targets_map", every
> hit on "some_probe_point" will block on the lock held by the sleeping
> probe. So probes can't inject sleeps or delays to try to trigger race
> conditions.
>
> So yes, the ability to take a lock over a narrower scope than the whole
> probe would be very desirable.
>
> I've wondered about the feasibility of doing this in embedded C, but
> haven't had a chance to explore it properly yet.

That's the route you have.

>
> This reminds me - is it ever safe to sleep in a systemtap probe, e.g. to
> call ksleep()  rather than busy-loop?
>
> --

Spinlocks are OK. Calls to sleep() are generally not safe.


* [Bug translator/26296] delay script-global locking until required
  2020-07-23 15:40 [Bug translator/26296] New: delay script-global locking until required fche at redhat dot com
                   ` (2 preceding siblings ...)
  2020-07-24 14:12 ` arkady.miasnikov at gmail dot com
@ 2020-08-04 19:59 ` fche at redhat dot com
  2020-08-10  6:32   ` Craig Ringer
  2020-08-10  6:33 ` craig at 2ndquadrant dot com
  2020-08-18 19:06 ` fche at redhat dot com
  5 siblings, 1 reply; 9+ messages in thread
From: fche at redhat dot com @ 2020-08-04 19:59 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=26296

--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
> Even an explicit construct that scopes locking would be handy. Borrow from
> Java's "synchronized" perhaps.

If one can come up with easy-to-explain, implementable, safe
semantics, yeah perhaps!

> global targets_map;
> 
> probe process("foo").mark("some_probe_point") {
>   if (pid() in targets_map) {
>       kdelay(100000);
>   }
> }

In this example, you need the dual of the subject feature: 
release of locks as early as possible.
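
The early-release shape for that example might look like the following
user-space C sketch. It is not translator output: lock_targets(),
unlock_targets(), pid_in_targets() and delay_stub() are hypothetical
stand-ins, the "map" is reduced to a single hard-coded pid, and the stub only
records whether it ran while the lock was still held instead of actually
delaying.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; nothing here is a real systemtap or kernel API. */
static bool lock_held;
static bool delayed_under_lock;
static int delays;

static void lock_targets(void)   { lock_held = true; }
static void unlock_targets(void) { lock_held = false; }

/* Models "pid() in targets_map"; the global may only be read under the lock. */
static bool pid_in_targets(int pid)
{
    assert(lock_held);
    return pid == 42;
}

/* Models the delay call; notes whether the lock was still held. */
static void delay_stub(void)
{
    delays++;
    if (lock_held)
        delayed_under_lock = true;
}

/* Models: probe ... { if (pid() in targets_map) { kdelay(...) } } */
static void probe_early_unlock(int pid)
{
    lock_targets();
    bool hit = pid_in_targets(pid);  /* last access to the global */
    unlock_targets();                /* release before the slow part */

    if (hit)
        delay_stub();
}
```

Because the lock is dropped right after the last global access, other hits on
the same probe point no longer queue behind the delayed one.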



* Re: [Bug translator/26296] delay script-global locking until required
  2020-08-04 19:59 ` fche at redhat dot com
@ 2020-08-10  6:32   ` Craig Ringer
  0 siblings, 0 replies; 9+ messages in thread
From: Craig Ringer @ 2020-08-10  6:32 UTC (permalink / raw)
  To: fche at redhat dot com; +Cc: systemtap

On Wed, 5 Aug 2020 at 03:59, fche at redhat dot com via Systemtap <
systemtap@sourceware.org> wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=26296
>
> --- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
> > Even an explicit construct that scopes locking would be handy. Borrow
> > from Java's "synchronized" perhaps.
>
> If one can come up with easy-to-explain, implementable, safe
> semantics, yeah perhaps!
>

I'm thinking something like this:

* Explicit locking is scoped to a block
* Locks are acquired against a named global variable
* Within a scope that uses explicit locking, an attempt to access global
variables for which locks have not been explicitly acquired is a semantic
error
* Any exit from a block - "next", "return", throwing an exception, etc -
releases the lock at escape from the block.
* A warning will be raised during compilation if any given global is
accessed under explicit locking in one part of a script or tapset, but via
implicit probe level locking in another part.
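
The "any exit releases the lock" rule in the list above can be sketched in
plain C with a single cleanup label. This is only an illustration of the
intended behaviour, not proposed syntax and not existing SystemTap code:
lock_global(), unlock_global(), locked_block() and counter_global are
hypothetical names, and the lock is modelled as a simple depth counter.

```c
#include <assert.h>
#include <stdbool.h>

static int lock_depth;        /* hypothetical lock: > 0 means held */
static long counter_global;   /* models the named script global */

static void lock_global(void)   { lock_depth++; }
static void unlock_global(void) { assert(lock_depth > 0); lock_depth--; }

/*
 * Models a block that explicitly locks one global.  Every way out of the
 * block, including the early exit that stands in for "next", passes through
 * the same label, so the lock is always released.
 * Returns true if the block ran to completion.
 */
static bool locked_block(bool take_early_exit, long value)
{
    bool completed = false;

    lock_global();
    if (take_early_exit)
        goto leave;           /* early exit still unlocks */
    counter_global = value;   /* access is covered by the explicit lock */
    completed = true;
leave:
    unlock_global();
    return completed;
}
```

Routing every exit through one label is the usual C idiom for this; a real
implementation in the translator would need the equivalent for "next",
"return" and error paths.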

Deadlock protection is a bit interesting. I haven't looked at how systemtap
takes care of that at the moment. If it can detect deadlock and fail
gracefully that's probably sufficient.

Of course it's all handwaving unless I have time to write it, since I don't
get to ask others to. And I'm a bit stuck in C++ error message spam in the
relatively simple patch I wrote for @enum already...


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


* [Bug translator/26296] delay script-global locking until required
  2020-07-23 15:40 [Bug translator/26296] New: delay script-global locking until required fche at redhat dot com
                   ` (4 preceding siblings ...)
  2020-08-10  6:33 ` craig at 2ndquadrant dot com
@ 2020-08-18 19:06 ` fche at redhat dot com
  5 siblings, 0 replies; 9+ messages in thread
From: fche at redhat dot com @ 2020-08-18 19:06 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=26296

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Frank Ch. Eigler <fche at redhat dot com> ---
commit 25012d82 attempts an algorithmic optimization to the
locking problem.  It should handle both Craig's "early unlock"
and multiple folks' "late lock" needs, without new syntax or
semantics (!).

