public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* array sorting checked in
@ 2005-09-23  8:36 Martin Hunt
  2005-09-23 17:57 ` Frank Ch. Eigler
  2005-09-24  6:25 ` James Dickens
  0 siblings, 2 replies; 8+ messages in thread
From: Martin Hunt @ 2005-09-23  8:36 UTC (permalink / raw)
  To: systemtap

I checked in several new functions to the runtime.

_stp_map_sort(MAP map, int keynum, int dir)
_stp_map_sortn(MAP map, int n, int keynum, int dir)
_stp_map_printn(MAP map, int n, const char *fmt)

keynum = 0 for value, or > 0 for the key number to sort on.
n = number of elements to print or sort. 0 means entire array.
dir = sorting direction. -1 or 1 

_stp_map_sortn() will scan an array and put the top (or bottom) 'n'
elements at the start of the array.  This is useful when you don't care
about the whole array, just the top values.  

Here's a working  example. (the current language can't pass array
references, therefore the awkward embedded C)

--------------------

global foo, moo

function print_foo (num:long) %{
	_stp_map_printn(global_foo, THIS->num, "%1s is %s");
%}

function print_moo (num:long) %{
	_stp_map_printn(global_moo, THIS->num, "%1d,%2d,%3d,%4d,%5d = %d");
%}

function sort_foo (num:long, key:long, dir:long) %{
	_stp_map_sortn(global_foo, THIS->num, THIS->key, THIS->dir);
%}

function sort_moo (num:long, key:long, dir:long) %{
	_stp_map_sortn(global_moo, THIS->num, THIS->key, THIS->dir);
%}

probe begin
{
	foo["seattle"] = "sleepless"
	foo["new orleans"] = "wet"
	foo["florida"] = "hot"
	foo["texas"] = "red"
	foo["san francisco"] = "wired"
	foo["galveston"] = "scared"
	foo["miami"] = "viced"
	foo["maine"] = "iced"

		
	moo[1,2,3,4,5] = 1000
	moo[1,12,3,1,6] = 1001
	moo[2,2,3,4,5] = 1000
	moo[10,20,30,40,50] = 10000
	moo[8,2,3,4,7] = -1000
	moo[-1,2,4,1,6] = 2001
	moo[0,2,4,0,6] = 1002

	log("sorted by value");
	sort_foo(0,0,-1)
	print_foo(0)

	log("sorted by value (reverse)");
	sort_foo(0,0,1)
	print_foo(0)

	log("sorted by key 1");
	sort_foo(0,1,-1)
	print_foo(0)

	/* reverse the list again */
	sort_foo(0,1,1)

	log("Just the top three. Sorted by key 1");
	sort_foo(3,1,-1)
	print_foo(3)

	print_moo(0)
	log("sorted by value");
	sort_moo(0,0,-1)
	print_moo(0)

	log("sorted by value (reverse)");
	sort_moo(0,0,1)
	print_moo(0)

	log("sorted by key 1");
	sort_moo(0,1,-1)
	print_moo(0)

	exit()
}




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-23  8:36 array sorting checked in Martin Hunt
@ 2005-09-23 17:57 ` Frank Ch. Eigler
  2005-09-23 19:44   ` Martin Hunt
  2005-09-24  6:25 ` James Dickens
  1 sibling, 1 reply; 8+ messages in thread
From: Frank Ch. Eigler @ 2005-09-23 17:57 UTC (permalink / raw)
  To: systemtap


hunt wrote:

> [...] I checked in several new functions to the runtime.
> _stp_map_sort(MAP map, int keynum, int dir) [...]

OK.  It seems to me that the most straightforward way to expose this
in the language would be as a modifier for the "foreach" statement,
something like:

        foreach ([x,y] in sort(array)) {
          do_something_with (array[x,y])
          if (i++ > 10) break
        }

A related modifier could serve as an iteration count limiter, which
would presumably save sorting time.  Given the implementation's likely
performance, it would be wise to penalize sorting by an extra
actioncount, to represent its cost.

Adding a plain "sort <array>" operation would be of only limited
applicability, considering the possibility of concurrent/subsequent
modifications, invalidating the sorted property.  foreach should holds
a lock (but see bug #1275) on the array during iteration, which if
finagled properly, would protect the sorted property for the duration
of iteration.


- FChE

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-23 17:57 ` Frank Ch. Eigler
@ 2005-09-23 19:44   ` Martin Hunt
  2005-09-23 22:20     ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Hunt @ 2005-09-23 19:44 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, 2005-09-23 at 13:57 -0400, Frank Ch. Eigler wrote:
> hunt wrote:
> 
> > [...] I checked in several new functions to the runtime.
> > _stp_map_sort(MAP map, int keynum, int dir) [...]
> 
> OK.  It seems to me that the most straightforward way to expose this
> in the language would be as a modifier for the "foreach" statement,
> something like:
> 
>         foreach ([x,y] in sort(array)) {
>           do_something_with (array[x,y])
>           if (i++ > 10) break
>         }
>
>
> A related modifier could serve as an iteration count limiter, which
> would presumably save sorting time.  Given the implementation's likely
> performance, it would be wise to penalize sorting by an extra
> actioncount, to represent its cost.

That seems to add unnecessary complexity to the language. I understand
you are focusing on the locking issues but there must be a better
solution.

By far the most likely uses of array sorting are for printing (at probe
exit) and printing at timed intervals (like top). In the second case
locks are required.

> Adding a plain "sort <array>" operation would be of only limited
> applicability, considering the possibility of concurrent/subsequent
> modifications, invalidating the sorted property.  

> foreach should holds
> a lock (but see bug #1275) on the array during iteration, which if
> finagled properly, would protect the sorted property for the duration
> of iteration.

This seems a poor excuse. Lets keep our language as simple as possible.

There are several other possibilities.  Why not provide a way to pass
array references, allowing us to extend the language without hardcoding
the translator?  Then we simply do something like

function print_sorted_mapn (foo:array, num:long, key:long, dir:long, fmt:string) %{
	LOCK(THIS->foo);
	_stp_map_sortn(THIS->foo, THIS->num, THIS->key, THIS->dir);
        _stp_map_printn(THIS->foo, THIS->num, THIS->fmt);
	UNLOCK(THIS->foo);
%}

or maybe make locks explicit in the systemtap language:
	lock(foo) {
		sort(foo)
		print(foo)
	}



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-23 19:44   ` Martin Hunt
@ 2005-09-23 22:20     ` Frank Ch. Eigler
  0 siblings, 0 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2005-09-23 22:20 UTC (permalink / raw)
  To: systemtap

Hi -

hunt wrote:
> > OK.  It seems to me that the most straightforward way to expose this
> > in the language would be as a modifier for the "foreach" statement,
> > [...]
> > A related modifier could serve as an iteration count limiter [...]
> 
> That seems to add unnecessary complexity to the language. I understand
> you are focusing on the locking issues but there must be a better
> solution.

It's not just locking (i.e., correctness in the face of concurrency).
An array sorting operation has no script-visible effects other than
the order in which foreach() index tuples are processed.  It thus
makes sense to consider one as a preprocessing stage for the other.
(The runtime's provision of combined iterate/print functions is
independent of this point.)


> [...]
> > Adding a plain "sort <array>" operation would be of only limited
> > applicability, considering the possibility of concurrent/subsequent
> > modifications, invalidating the sorted property.  
> 
> > foreach should holds
> > a lock (but see bug #1275) on the array during iteration, which if
> > finagled properly, would protect the sorted property for the duration
> > of iteration.
> 
> This seems a poor excuse. Lets keep our language as simple as possible.
> [...]

It is an explanation, not an excuse.  Keeping the language simple is
not well served by your proposed introduction of two new constructs,
whose implications are somewhat beyond the small amount of extra
mechanism that foreach-sorted entails.


- FChE

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-23  8:36 array sorting checked in Martin Hunt
  2005-09-23 17:57 ` Frank Ch. Eigler
@ 2005-09-24  6:25 ` James Dickens
  2005-09-24 12:29   ` Frank Ch. Eigler
  1 sibling, 1 reply; 8+ messages in thread
From: James Dickens @ 2005-09-24  6:25 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

On 9/23/05, Martin Hunt <hunt@redhat.com> wrote:
> I checked in several new functions to the runtime.
>
> _stp_map_sort(MAP map, int keynum, int dir)
> _stp_map_sortn(MAP map, int n, int keynum, int dir)
> _stp_map_printn(MAP map, int n, const char *fmt)
>
am I missing something?

I thought the idea of systemtap was to be fast, and unobtrusive?

yet you want to allow sorrting an array  of  unknown size using bubble
sort? that can be as bad as  O(n log n) quite possibly with a lock
held? if the array has 10,000 or 100,000 elements if you sort it even
once a second it will take its toll on the system and performance. The
programmer can' t know in advance the size of the array, if he knew
that information he wouldn't be running systemtap.

if you want to maintain stability and accuracy, you either have to
limit the size of the array and quite possibilty store it in order,
which is  O(1) operation for each addition.  and do it with minimal or
no locking, if you wish the system to scale in an SMP environment.

James Dickens
uadmin.blogspot.com


[snipped]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-24  6:25 ` James Dickens
@ 2005-09-24 12:29   ` Frank Ch. Eigler
  2005-09-24 15:29     ` James Dickens
  2005-09-25  6:21     ` Martin Hunt
  0 siblings, 2 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2005-09-24 12:29 UTC (permalink / raw)
  To: systemtap


jamesd.wi wrote:

> [...]
> am I missing something?

Yes.

> yet you want to allow sorrting an array of unknown size using bubble
> sort? that can be as bad as O(n log n)

Actually, a bubble sort is even worse at O(n**2).  The main sort
function is described as a "merge sort", but I can't quite recognize
it as such in the code.  I assume all this was chosen for ease of
prototyping rather than longevity.

> [...] The programmer can't know in advance the size of the array, if
> he knew that information he wouldn't be running systemtap.

Array sizes are limited - by design we perform no dynamic memory
allocation during a systemtap session run.  If those limits are
exceeded, the user will be told and will be able to retry with a
larger limits.

Regardless, even if the array size is limited, it can still be large
enough that sorting would be a problem.

> if you want to maintain stability and accuracy, you either have to
> limit the size of the array and quite possibilty store it in order,
> which is O(1) operation for each addition.  [...]

That is incorrect.  If you think about it, you'll see that maintaining
a sorted array (i.e., at each insert/delete operation) is just about
as costly as sorting it once at the end.  I have no idea what you mean
by "accuracy" being a factor either way.


- FChE

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-24 12:29   ` Frank Ch. Eigler
@ 2005-09-24 15:29     ` James Dickens
  2005-09-25  6:21     ` Martin Hunt
  1 sibling, 0 replies; 8+ messages in thread
From: James Dickens @ 2005-09-24 15:29 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

[snip]
>
> That is incorrect.  If you think about it, you'll see that maintaining
> a sorted array (i.e., at each insert/delete operation) is just about
> as costly as sorting it once at the end.  I have no idea what you mean
> by "accuracy" being a factor either way.
>
example you are monitoring a kprobe that fires repeatedly while the
array is being sorted, if you are holding a lock on the array sure you
might block out the first firing of the probe, what happens if it
happens again? or is the system going spin waiting for the lock to be
released or are you going to drop the data from that probe that just
fired?

James



>

> - FChE
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: array sorting checked in
  2005-09-24 12:29   ` Frank Ch. Eigler
  2005-09-24 15:29     ` James Dickens
@ 2005-09-25  6:21     ` Martin Hunt
  1 sibling, 0 replies; 8+ messages in thread
From: Martin Hunt @ 2005-09-25  6:21 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Sat, 2005-09-24 at 08:29 -0400, Frank Ch. Eigler wrote:

> > yet you want to allow sorrting an array of unknown size using bubble
> > sort? that can be as bad as O(n log n)
> 
> Actually, a bubble sort is even worse at O(n**2).  The main sort
> function is described as a "merge sort", but I can't quite recognize
> it as such in the code.  

That's probably because it is taught as a recursive algorithm, but in
the real world it is always (in my experience) implemented iteratively.

> I assume all this was chosen for ease of
> prototyping rather than longevity.

I wrote it in a hurry, but sorting isn't rocket science. Assuming we
leave sorting in as a feature, what would you like changed?

> > if you want to maintain stability and accuracy, you either have to
> > limit the size of the array and quite possibilty store it in order,
> > which is O(1) operation for each addition.  [...]
> 
> That is incorrect.  If you think about it, you'll see that maintaining
> a sorted array (i.e., at each insert/delete operation) is just about
> as costly as sorting it once at the end.  

No, it would normally be much, much more costly, even ignoring
scalability problems. Imagine an array updated 100,000 times that
contains only 32 elements. Inserting the elements requires a mimimum of
100,000 comparisions. Sorting it at the end requires a worst case of 100
comparisions.

Even if each element is inserted only once and never updated, you
basically have an insertion sort which is O(n**2), unlike merge sort
which is O(n log n).

I don't imagine _stp_map_sort() being used except at probe exit. If you
are still worried about performance, I tried sorting maps containing
200,000 strings. It took about 0.3 seconds.

Martin



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2005-09-25  6:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-23  8:36 array sorting checked in Martin Hunt
2005-09-23 17:57 ` Frank Ch. Eigler
2005-09-23 19:44   ` Martin Hunt
2005-09-23 22:20     ` Frank Ch. Eigler
2005-09-24  6:25 ` James Dickens
2005-09-24 12:29   ` Frank Ch. Eigler
2005-09-24 15:29     ` James Dickens
2005-09-25  6:21     ` Martin Hunt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).