* RE: sort a foreach on a stat value?
@ 2006-01-12 0:09 Stone, Joshua I
0 siblings, 0 replies; 6+ messages in thread
From: Stone, Joshua I @ 2006-01-12 0:09 UTC (permalink / raw)
To: systemtap
Bump.
Nobody has an opinion about this?
Stone, Joshua I wrote:
> Do we have a syntax to sort a foreach based on a stat value?
> Obviously this doesn't make sense for a histogram, but for the
> others, like @count, it seems that this would very often be what a
> user would want to do.
>
> For example - here's an unscalable way to do simple thread profiling:
>
> global stat
> probe my.event { ++stat[tid()] }
> probe end {
> foreach(tid in stat-) // sort by value -> most active
> printf("%d: %d\n", tid, stat[tid])
> }
>
> The scalable, per-cpu way to do this is:
>
> global stat
> probe my.event { stat[tid()]<<<1 }
> probe end {
> foreach(tid in stat-) // sort by value -> ???
> printf("%d: %d\n", tid, @count(stat[tid]))
> }
>
> But there doesn't seem to be a way to achieve the same sorting.
>
> Along the same lines, it would be extremely useful to be able to do
> "cascading" sort - i.e. sort by more than one field.
>
>
> Josh
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: sort a foreach on a stat value?
@ 2006-01-19 23:02 Stone, Joshua I
0 siblings, 0 replies; 6+ messages in thread
From: Stone, Joshua I @ 2006-01-19 23:02 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: systemtap
Frank Ch. Eigler wrote:
>> foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
>
> That sort of thing has some promise at abbreviating that excessive
> duplication hunt made an example of in bug #2115.
>
> While this does not address sorting, another related syntactical
> possibility is to infer a "[idx1, idx2]" suffix on undecorated
> occurrences of the indexed array within the body of a foreach:
>
> foreach ([x,y] in thingie)
> total += thingie # implied [x,y]
>
> foreach ([x,y,z] in mystats)
> printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
>
> The latter could be abbreviated further to "@count, @sum, @min", to
> infer the innermost-looped array itself, plus its index tuple.
>
> A later independent optimization could make sure that the translator
> does not emit duplicate array-lookup operations within loops.
Inferring like this would go a long way toward simplifying the language
(from a user perspective). And as you mention, it leaves more
opportunity for optimization.
This can be taken even further if you allow special format characters
(as in _stp_stat_print) - Then your second example could be just:
foreach ([x,y,z] in mystats)
printf("%C %S %m", mystats)
At this point we're not too far from a printa, but manually doing a
foreach lets you sort it.
> Unless I'm mistaken, the current runtime aggregates the whole pmap for
> loops/sorting, even if you want just the top 20. This cost will be
> fully reflected in activity count (bug #1885) at some point. It is
> unlikely to cost much less than the explicit copying loop above.
Yes, it does fully aggregate when you do the foreach. As for cost, I
suspect that will depend on the keys - if you're indexing by one or more
strings I would expect the savings to be more significant.
> I wonder if this behavior makes sorting on statistical values
> sufficiently inefficient that special syntax is not sufficiently
> justified at this point, given that open-coding is possible.
Perhaps, but if we can find a "nice" way to express the sorting it may
still be worth pursuing...
>>>> Along the same lines, it would be extremely useful to be able to do
>>>> "cascading" sort - i.e. sort by more than one field.
>>>
>>> [...]
>>> foreach ([x1+, x2--, y2+++] in array----) { ... }
>>
>> That's not a bad suggestion, though I think it's not obvious in which
>> order the cascading happens. [...]
>
> I guess we'd pick and document one of the two interpretations.
Another possible syntax choice could be:
foreach([x, y, z] in array sorted [@count-, x+, z-])
Thus the sorting order is explicit. Ascending (+) could be implied.
Josh
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: sort a foreach on a stat value?
2006-01-13 8:14 Stone, Joshua I
@ 2006-01-16 20:43 ` Frank Ch. Eigler
0 siblings, 0 replies; 6+ messages in thread
From: Frank Ch. Eigler @ 2006-01-16 20:43 UTC (permalink / raw)
To: Stone, Joshua I; +Cc: systemtap
Hi -
joshua.i.stone wrote:
> [...] One thing I've noticed is that our foreach syntax has
> different semantics than other languages [...]
Indeed, just like in awk, we iterate over indexes rather than values.
> [...]
> foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> [...]
That sort of thing has some promise at abbreviating that excessive
duplication hunt made an example of in bug #2115.
While this does not address sorting, another related syntactical
possibility is to infer a "[idx1, idx2]" suffix on undecorated
occurrences of the indexed array within the body of a foreach:
foreach ([x,y] in thingie)
total += thingie # implied [x,y]
foreach ([x,y,z] in mystats)
printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
The latter could be abbreviated further to "@count, @sum, @min", to
infer the innermost-looped array itself, plus its index tuple.
A later independent optimization could make sure that the translator
does not emit duplicate array-lookup operations within loops.
> [...]
> > foreach (tid in stat) // sort by value -> ???
> > stat_counts[tid] = @count(stat[tid])
> > foreach (tid in stat_counts-)
> > printf("%d: %d\n", tid, stat_counts[tid]) # and/or
> > @avg(stat[tid])) etc. }
> [...]
> This is a passable workaround, yes. The downside is that if stat were
> very large, I would have to fudge with the maxaction counter. If I was
> only interested in maybe the top 20, then with a single loop construct
> it's easy to break out after 20 and not hit the MAXACTION boundary.
Unless I'm mistaken, the current runtime aggregates the whole pmap for
loops/sorting, even if you want just the top 20. This cost will be
fully reflected in activity count (bug #1885) at some point. It is
unlikely to cost much less than the explicit copying loop above.
I wonder if this behavior makes sorting on statistical values
sufficiently inefficient that special syntax is not sufficiently
justified at this point, given that open-coding is possible.
> >> Along the same lines, it would be extremely useful to be able to do
> >> "cascading" sort - i.e. sort by more than one field.
> >
> > [...]
> > foreach ([x1+, x2--, y2+++] in array----) { ... }
>
> That's not a bad suggestion, though I think it's not obvious in which
> order the cascading happens. [...]
I guess we'd pick and document one of the two interpretations.
- FChE
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: sort a foreach on a stat value?
@ 2006-01-13 8:14 Stone, Joshua I
2006-01-16 20:43 ` Frank Ch. Eigler
0 siblings, 1 reply; 6+ messages in thread
From: Stone, Joshua I @ 2006-01-13 8:14 UTC (permalink / raw)
To: fche; +Cc: systemtap
fche@redhat.com wrote:
>> Do we have a syntax to sort a foreach based on a stat value? [...]
>
> Not at this time.
One thing I've noticed is that our foreach syntax has different
semantics than other languages, and this might be hampering us a bit.
In at least perl and bash, the only iterator variable is the value, but
our iterator variables are keys. This prevents us from doing something
like:
foreach (cnt- in @count(stat)) {...}
But there is value in knowing the keys as well. Perhaps what we need is
to allow specifying a value iterator as well, so we can have:
foreach ([key1, key2, value+] in mymap) {...}
... which also prevents the cost of re-indexing the map. And perhaps
with stats, it could be something like:
foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
{...}
For more semantic clarity, perhaps a semicolon would divide the key
iterators from the value/stat iterators. And of course, sorting by a
histogram should not be allowed.
Just some ideas, comments encouraged...
> Since the reporting phase does not need to be as "scalable" (fast) as
> the accumulation phase, perhaps one could do this thusly today:
>
> global stat, stat_counts
> probe my.event { stat[tid()]<<<1 }
> probe end {
> foreach (tid in stat) // sort by value -> ???
> stat_counts[tid] = @count(stat[tid])
> foreach (tid in stat_counts-)
> printf("%d: %d\n", tid, stat_counts[tid]) # and/or
> @avg(stat[tid])) etc. }
This is a passable workaround, yes. The downside is that if stat were
very large, I would have to fudge with the maxaction counter. If I was
only interested in maybe the top 20, then with a single loop construct
it's easy to break out after 20 and not hit the MAXACTION boundary.
With two loops, I have to iterate the entire map the first time.
>> Along the same lines, it would be extremely useful to be able to do
>> "cascading" sort - i.e. sort by more than one field.
>
> If the runtime provided such an facility, one might imagine the
> script syntax exposing it thusly:
> foreach ([x1+, x2--, y2+++] in array----) { ... }
> encoding the cascading order in the length of those +/- suffixes.
That's not a bad suggestion, though I think it's not obvious in which
order the cascading happens. Does + sort before ++? Or does more
length imply higher sorting precedence? I think you mean the former...
Josh
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: sort a foreach on a stat value?
2005-12-24 14:27 Stone, Joshua I
@ 2006-01-12 4:14 ` Frank Ch. Eigler
0 siblings, 0 replies; 6+ messages in thread
From: Frank Ch. Eigler @ 2006-01-12 4:14 UTC (permalink / raw)
To: Stone, Joshua I; +Cc: systemtap
joshua.i.stone wrote:
> Do we have a syntax to sort a foreach based on a stat value? [...]
Not at this time.
> [...]
> The scalable, per-cpu way to do this is:
>
> global stat
> probe my.event { stat[tid()]<<<1 }
> probe end {
> foreach(tid in stat-) // sort by value -> ???
> printf("%d: %d\n", tid, @count(stat[tid]))
> }
Since the reporting phase does not need to be as "scalable" (fast) as
the accumulation phase, perhaps one could do this thusly today:
global stat, stat_counts
probe my.event { stat[tid()]<<<1 }
probe end {
foreach (tid in stat) // sort by value -> ???
stat_counts[tid] = @count(stat[tid])
foreach (tid in stat_counts-)
printf("%d: %d\n", tid, stat_counts[tid]) # and/or @avg(stat[tid])) etc.
}
> Along the same lines, it would be extremely useful to be able to do
> "cascading" sort - i.e. sort by more than one field.
If the runtime provided such an facility, one might imagine the script syntax
exposing it thusly:
foreach ([x1+, x2--, y2+++] in array----) { ... }
encoding the cascading order in the length of those +/- suffixes.
- FChE
^ permalink raw reply [flat|nested] 6+ messages in thread
* sort a foreach on a stat value?
@ 2005-12-24 14:27 Stone, Joshua I
2006-01-12 4:14 ` Frank Ch. Eigler
0 siblings, 1 reply; 6+ messages in thread
From: Stone, Joshua I @ 2005-12-24 14:27 UTC (permalink / raw)
To: systemtap
Do we have a syntax to sort a foreach based on a stat value? Obviously
this doesn't make sense for a histogram, but for the others, like
@count, it seems that this would very often be what a user would want to
do.
For example - here's an unscalable way to do simple thread profiling:
global stat
probe my.event { ++stat[tid()] }
probe end {
foreach(tid in stat-) // sort by value -> most active
printf("%d: %d\n", tid, stat[tid])
}
The scalable, per-cpu way to do this is:
global stat
probe my.event { stat[tid()]<<<1 }
probe end {
foreach(tid in stat-) // sort by value -> ???
printf("%d: %d\n", tid, @count(stat[tid]))
}
But there doesn't seem to be a way to achieve the same sorting.
Along the same lines, it would be extremely useful to be able to do
"cascading" sort - i.e. sort by more than one field.
Josh
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-01-19 23:02 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-12 0:09 sort a foreach on a stat value? Stone, Joshua I
-- strict thread matches above, loose matches on Subject: below --
2006-01-19 23:02 Stone, Joshua I
2006-01-13 8:14 Stone, Joshua I
2006-01-16 20:43 ` Frank Ch. Eigler
2005-12-24 14:27 Stone, Joshua I
2006-01-12 4:14 ` Frank Ch. Eigler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).