RE: sort a foreach on a stat value?

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* RE: sort a foreach on a stat value?
@ 2006-01-12  0:09 Stone, Joshua I
  0 siblings, 0 replies; 6+ messages in thread
From: Stone, Joshua I @ 2006-01-12  0:09 UTC (permalink / raw)
  To: systemtap

Bump.

Nobody has an opinion about this?


Stone, Joshua I wrote:
> Do we have a syntax to sort a foreach based on a stat value? 
> Obviously this doesn't make sense for a histogram, but for the
> others, like @count, it seems that this would very often be what a
> user would want to do.
> 
> For example - here's an unscalable way to do simple thread profiling:
> 
> global stat
> probe my.event { ++stat[tid()] }
> probe end {
>   foreach(tid in stat-) // sort by value -> most active
>     printf("%d: %d\n", tid, stat[tid])
> }
> 
> The scalable, per-cpu way to do this is:
> 
> global stat
> probe my.event { stat[tid()]<<<1 }
> probe end {
>   foreach(tid in stat-) // sort by value -> ???
>     printf("%d: %d\n", tid, @count(stat[tid]))
> }
> 
> But there doesn't seem to be a way to achieve the same sorting.
> 
> Along the same lines, it would be extremely useful to be able to do
> "cascading" sort - i.e. sort by more than one field.
> 
> 
> Josh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: sort a foreach on a stat value?
@ 2006-01-19 23:02 Stone, Joshua I
  0 siblings, 0 replies; 6+ messages in thread
From: Stone, Joshua I @ 2006-01-19 23:02 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Frank Ch. Eigler wrote:
>> 	foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> 
> That sort of thing has some promise at abbreviating that excessive
> duplication hunt made an example of in bug #2115.
> 
> While this does not address sorting, another related syntactical
> possibility is to infer a "[idx1, idx2]" suffix on undecorated
> occurrences of the indexed array within the body of a foreach:
> 
>    foreach ([x,y] in thingie)
>      total += thingie # implied [x,y]
> 
>    foreach ([x,y,z] in mystats)
>      printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
> 
> The latter could be abbreviated further to "@count, @sum, @min", to
> infer the innermost-looped array itself, plus its index tuple.
>
> A later independent optimization could make sure that the translator
> does not emit duplicate array-lookup operations within loops.

Inferring like this would go a long way toward simplifying the language
(from a user perspective).  And as you mention, it leaves more
opportunity for optimization.

This can be taken even further if you allow special format characters
(as in _stp_stat_print) - Then your second example could be just:

   foreach ([x,y,z] in mystats)
     printf("%C %S %m", mystats)

At this point we're not too far from a printa, but manually doing a
foreach lets you sort it.

> Unless I'm mistaken, the current runtime aggregates the whole pmap for
> loops/sorting, even if you want just the top 20.  This cost will be
> fully reflected in activity count (bug #1885) at some point.  It is
> unlikely to cost much less than the explicit copying loop above.

Yes, it does fully aggregate when you do the foreach.  As for cost, I
suspect that will depend on the keys - if you're indexing by one or more
strings I would expect the savings to be more significant.

> I wonder if this behavior makes sorting on statistical values
> sufficiently inefficient that special syntax is not sufficiently
> justified at this point, given that open-coding is possible.

Perhaps, but if we can find a "nice" way to express the sorting it may
still be worth pursuing...

>>>> Along the same lines, it would be extremely useful to be able to do
>>>> "cascading" sort - i.e. sort by more than one field.
>>> 
>>> [...]
>>>   foreach ([x1+, x2--, y2+++] in array----) { ... }
>> 
>> That's not a bad suggestion, though I think it's not obvious in which
>> order the cascading happens.  [...]
> 
> I guess we'd pick and document one of the two interpretations.

Another possible syntax choice could be:

  foreach([x, y, z] in array sorted [@count-, x+, z-])

Thus the sorting order is explicit.  Ascending (+) could be implied.

Josh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sort a foreach on a stat value?
  2006-01-13  8:14 Stone, Joshua I
@ 2006-01-16 20:43 ` Frank Ch. Eigler
  0 siblings, 0 replies; 6+ messages in thread
From: Frank Ch. Eigler @ 2006-01-16 20:43 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

Hi -

joshua.i.stone wrote:

> [...]  One thing I've noticed is that our foreach syntax has
> different semantics than other languages [...]

Indeed, just like in awk, we iterate over indexes rather than values.

> [...]
> 	foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> [...]

That sort of thing has some promise at abbreviating that excessive
duplication hunt made an example of in bug #2115.  

While this does not address sorting, another related syntactical
possibility is to infer a "[idx1, idx2]" suffix on undecorated
occurrences of the indexed array within the body of a foreach:

   foreach ([x,y] in thingie)
     total += thingie # implied [x,y]

   foreach ([x,y,z] in mystats)
     printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))

The latter could be abbreviated further to "@count, @sum, @min", to
infer the innermost-looped array itself, plus its index tuple.

A later independent optimization could make sure that the translator
does not emit duplicate array-lookup operations within loops.

> [...]
> >    foreach (tid in stat) // sort by value -> ???
> >      stat_counts[tid] = @count(stat[tid])
> >    foreach (tid in stat_counts-)
> >      printf("%d: %d\n", tid, stat_counts[tid]) # and/or
> >  @avg(stat[tid])) etc. }
> [...]
> This is a passable workaround, yes.  The downside is that if stat were
> very large, I would have to fudge with the maxaction counter.  If I was
> only interested in maybe the top 20, then with a single loop construct
> it's easy to break out after 20 and not hit the MAXACTION boundary.

Unless I'm mistaken, the current runtime aggregates the whole pmap for
loops/sorting, even if you want just the top 20.  This cost will be
fully reflected in activity count (bug #1885) at some point.  It is
unlikely to cost much less than the explicit copying loop above.

I wonder if this behavior makes sorting on statistical values
sufficiently inefficient that special syntax is not sufficiently
justified at this point, given that open-coding is possible.

> >> Along the same lines, it would be extremely useful to be able to do
> >> "cascading" sort - i.e. sort by more than one field.
> > 
> > [...]
> >   foreach ([x1+, x2--, y2+++] in array----) { ... }
> 
> That's not a bad suggestion, though I think it's not obvious in which
> order the cascading happens.  [...]

I guess we'd pick and document one of the two interpretations.

- FChE

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: sort a foreach on a stat value?
@ 2006-01-13  8:14 Stone, Joshua I
  2006-01-16 20:43 ` Frank Ch. Eigler
  0 siblings, 1 reply; 6+ messages in thread
From: Stone, Joshua I @ 2006-01-13  8:14 UTC (permalink / raw)
  To: fche; +Cc: systemtap

fche@redhat.com wrote:
>> Do we have a syntax to sort a foreach based on a stat value?  [...]
>
> Not at this time.

One thing I've noticed is that our foreach syntax has different
semantics than other languages, and this might be hampering us a bit.
In at least perl and bash, the only iterator variable is the value, but
our iterator variables are keys.  This prevents us from doing something
like:

	foreach (cnt- in @count(stat)) {...}

But there is value in knowing the keys as well.  Perhaps what we need is
to allow specifying a value iterator as well, so we can have:

	foreach ([key1, key2, value+] in mymap) {...}

... which also prevents the cost of re-indexing the map.  And perhaps
with stats, it could be something like:

	foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
{...}

For more semantic clarity, perhaps a semicolon would divide the key
iterators from the value/stat iterators.  And of course, sorting by a
histogram should not be allowed.

Just some ideas, comments encouraged...

> Since the reporting phase does not need to be as "scalable" (fast) as
> the accumulation phase, perhaps one could do this thusly today:
> 
>  global stat, stat_counts
>  probe my.event { stat[tid()]<<<1 }
>  probe end {
>    foreach (tid in stat) // sort by value -> ???
>      stat_counts[tid] = @count(stat[tid])
>    foreach (tid in stat_counts-)
>      printf("%d: %d\n", tid, stat_counts[tid]) # and/or
>  @avg(stat[tid])) etc. }

This is a passable workaround, yes.  The downside is that if stat were
very large, I would have to fudge with the maxaction counter.  If I was
only interested in maybe the top 20, then with a single loop construct
it's easy to break out after 20 and not hit the MAXACTION boundary.
With two loops, I have to iterate the entire map the first time.

>> Along the same lines, it would be extremely useful to be able to do
>> "cascading" sort - i.e. sort by more than one field.
> 
> If the runtime provided such an facility, one might imagine the
> script syntax exposing it thusly:
>   foreach ([x1+, x2--, y2+++] in array----) { ... }
> encoding the cascading order in the length of those +/- suffixes.

That's not a bad suggestion, though I think it's not obvious in which
order the cascading happens.  Does + sort before ++?  Or does more
length imply higher sorting precedence?  I think you mean the former...

Josh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sort a foreach on a stat value?
  2005-12-24 14:27 Stone, Joshua I
@ 2006-01-12  4:14 ` Frank Ch. Eigler
  0 siblings, 0 replies; 6+ messages in thread
From: Frank Ch. Eigler @ 2006-01-12  4:14 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap


joshua.i.stone wrote:

> Do we have a syntax to sort a foreach based on a stat value?  [...]

Not at this time.


> [...]
> The scalable, per-cpu way to do this is:
> 
> global stat
> probe my.event { stat[tid()]<<<1 }
> probe end {
>   foreach(tid in stat-) // sort by value -> ???
>     printf("%d: %d\n", tid, @count(stat[tid]))
> }

Since the reporting phase does not need to be as "scalable" (fast) as
the accumulation phase, perhaps one could do this thusly today:

 global stat, stat_counts
 probe my.event { stat[tid()]<<<1 }
 probe end {
   foreach (tid in stat) // sort by value -> ???
     stat_counts[tid] = @count(stat[tid])
   foreach (tid in stat_counts-)
     printf("%d: %d\n", tid, stat_counts[tid]) # and/or @avg(stat[tid])) etc.
 }


> Along the same lines, it would be extremely useful to be able to do
> "cascading" sort - i.e. sort by more than one field.

If the runtime provided such an facility, one might imagine the script syntax
exposing it thusly:
  foreach ([x1+, x2--, y2+++] in array----) { ... } 
encoding the cascading order in the length of those +/- suffixes.

- FChE

^ permalink raw reply	[flat|nested] 6+ messages in thread

* sort a foreach on a stat value?
@ 2005-12-24 14:27 Stone, Joshua I
  2006-01-12  4:14 ` Frank Ch. Eigler
  0 siblings, 1 reply; 6+ messages in thread
From: Stone, Joshua I @ 2005-12-24 14:27 UTC (permalink / raw)
  To: systemtap

Do we have a syntax to sort a foreach based on a stat value?  Obviously
this doesn't make sense for a histogram, but for the others, like
@count, it seems that this would very often be what a user would want to
do.

For example - here's an unscalable way to do simple thread profiling:

global stat
probe my.event { ++stat[tid()] }
probe end {
  foreach(tid in stat-) // sort by value -> most active
    printf("%d: %d\n", tid, stat[tid])
}

The scalable, per-cpu way to do this is:

global stat
probe my.event { stat[tid()]<<<1 }
probe end {
  foreach(tid in stat-) // sort by value -> ???
    printf("%d: %d\n", tid, @count(stat[tid]))
}

But there doesn't seem to be a way to achieve the same sorting.

Along the same lines, it would be extremely useful to be able to do
"cascading" sort - i.e. sort by more than one field.

Josh

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-01-19 23:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-12  0:09 sort a foreach on a stat value? Stone, Joshua I
  -- strict thread matches above, loose matches on Subject: below --
2006-01-19 23:02 Stone, Joshua I
2006-01-13  8:14 Stone, Joshua I
2006-01-16 20:43 ` Frank Ch. Eigler
2005-12-24 14:27 Stone, Joshua I
2006-01-12  4:14 ` Frank Ch. Eigler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).