From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 25044 invoked by alias); 30 Sep 2009 02:57:13 -0000 Received: (qmail 25029 invoked by uid 22791); 30 Sep 2009 02:57:09 -0000 X-SWARE-Spam-Status: No, hits=0.3 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS,SPF_PASS,TBC X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 30 Sep 2009 02:57:01 +0000 Received: from int-mx03.intmail.prod.int.phx2.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.16]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n8U2uxjH028236; Tue, 29 Sep 2009 22:56:59 -0400 Received: from [10.14.54.191] (dhcp-191.sjc.redhat.com [10.14.54.191]) by int-mx03.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id n8U2ux0n012621; Tue, 29 Sep 2009 22:56:59 -0400 Message-ID: <4AC2C8FB.6010701@redhat.com> Date: Wed, 30 Sep 2009 02:57:00 -0000 From: Josh Stone User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20090922 Fedora/3.0-2.7.b4.fc11 Lightning/1.0pre Thunderbird/3.0b4 MIME-Version: 1.0 To: "David J. Wilder" CC: systemtap@sourceware.org Subject: Re: [PATCH] trace tcp connection parameters References: <1253848025.21411.2.camel@wilder.ibm.com> In-Reply-To: <1253848025.21411.2.camel@wilder.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: systemtap-owner@sourceware.org X-SW-Source: 2009-q3/txt/msg00897.txt.bz2 On 09/24/2009 08:07 PM, David J. Wilder wrote: > This script (tcp_trace) can be used to trace tcp connection parameters > and state changes. This work was original inspired by Stephen Hemminger's > TCP cwnd snooper (net/ipv4/tcp_probe.c). Tcp_trace is a helpful tool for > troubleshooting connection performance issues. I would like to included > tcp_trace in the systemtap examples directory. Thanks for this script David. I don't know enough about networking to offer much insight here, but I do have some suggestions that I think will simplify the implementation. > > Signed-off-by: Varun Chandramohan > Signed-off-by: David Wilder > ------------------------------------------------------ > .../systemtap.examples/network/tcp_trace.meta | 14 + > testsuite/systemtap.examples/network/tcp_trace.stp | 822 ++++++++++++++++++++ > testsuite/systemtap.examples/network/tcp_trace.txt | 24 + > 3 files changed, 860 insertions(+), 0 deletions(-) > > diff --git a/testsuite/systemtap.examples/network/tcp_trace.meta b/testsuite/systemtap.examples/network/tcp_trace.meta > new file mode 100644 > index 0000000..2783c89 > --- /dev/null > +++ b/testsuite/systemtap.examples/network/tcp_trace.meta > @@ -0,0 +1,14 @@ > +title: Tcp connection tracing utility. > +name: tcp_trace.stp > +version: 1.0 > +author: varuncha@in.ibm.com wilder@us.ibm.com > +keywords: network trace > +subsystem: network > +status: production > +exit: user-controlled > +output: timed > +scope: per-socket > +arg_[0-9]+: tcp_trace.stp filter=all|state|txq|rxq|srtt|snd_cwnd|snd_wnd|rexmit|pmtu|ssthresh|timer|rcvwnd [timeout=] [update=change|all] Rule > +description: This scripts traces a given tcp connection based on the filter parameters given by the user. The indexing is done by the 4 tuples local address, remote address, local port, remote port. > +test_check: stap -p4 tcp_trace.stp > +test_installcheck: stap tcp_trace.stp 127.0.0.1:*-127.0.0.1:* timeout=1 > diff --git a/testsuite/systemtap.examples/network/tcp_trace.stp b/testsuite/systemtap.examples/network/tcp_trace.stp > new file mode 100755 > index 0000000..04c2a63 > --- /dev/null > +++ b/testsuite/systemtap.examples/network/tcp_trace.stp > @@ -0,0 +1,822 @@ > +#! /usr/bin/env stap > +/* > + * Copyright (C) 2009 IBM Corp. > + * This file is part of systemtap, and is free software. You can > + * redistribute it and/or modify it under the terms of the GNU General > + * Public License (GPL); either version 2, or (at your option) any > + * later version. > + * > + * Version 0.1 wilder@us.ibm.com 2009-05-13 > + * Version 0.3 varuncha@in.ibm.com 2009-05-20 > + * Version 1.0 wilder@us.ibm.com 2009-09-24 > + * > + * Tcp connection tracing utility. > + * > + * Description: > + * This scripts traces a given tcp connection based on the filter > + * parameters given by the user. The indexing is done by the 4 tuples > + * local address, remote address, local port, remote port. > + * > + * Synopsis: > + * tcp_trace.stp [filter=all|state|txq|rxq|srtt|snd_cwnd|snd_wnd|rexmit\ > + * |pmtu|ssthresh|timer|rcvwnd] [timeout=]\ > + * [update=change|all]\ > + * Rule format: > + * :-: > + * > + * filter tcp_trace.stp collects connection information from various > + * probe points. This gives various tcp related parameters. > + * However it is not all parameters are needed when debuggin +g > + * a tcp problem. Specifying only required parameters reduces > + * analysis time. Multiple parameters can be specified by using > + * ','. > + * > + * timeout (optional) When a timeout value (in seconds) is specified > + * tcpstat will automatically terminate it's run at the end of > + * the specified time and produce a report. When timeout is > + * omitted the script will run until the user terminates it by > + * typing a ^c. I don't see anything that produces a report, although some kind of summary might be interesting. > + * > + * update (optional) By default the script only displays data if there > + * is a change in the parameters specified in the "filter". This > + * can be changed by specifying "all", which will ouput parameters > + * of every packet that hits the probe. s/ouput/output/ > + * > + * Rule Format This is used to specify the particular connection being > + * traced. Only one connection can be specified at a time. > + * However, if needed the local and remote port can be a > + * wild-card (*). But the local & remote ip has to be a host ip. > + * > + * The Rule Format is: > + * :-: > + * > + * -Address are specified as ipv4 dot notation address. > + * -Ports are specified as decimal numbers. You implementation also supports (*) in the ip address, which should be documented here. Or for bonus points, it would be neat to support explicit subnet/mask specs, as I think that's closer to what people usually would deal with. > + * > + * Examples: > + * Here are some examples of using tcp_trace: > + * > + * Trace TCP connection from 172.16.15.160 to 172.16.15.1 on port 22 with > + * state,txq,rxq,pmtu filter > + * tcp_trace.stp filter=state,txq,rxq,pmtu timeout=100\ > + * 172.16.15.160:*-172.16.15.1:22 > + * > + * Trace TCP connection from 172.16.15.1 to 172.16.15.160 on port 22 with > + * all parameters in filter > + * tcp_trace.stp filter=all 172.16.15.160:22-172.16.15.1:* > + */ > + > +global tcp_state; > +global timer_info; > +global filter; > +global key_list; > +global lastkey; > +global tcp_param; > +global timeout; > +global update; > +global count; > +global length; > +global tx_current_timer = 0; > +global rx_current_timer = 0; > +global find_timer = 0; > +global argms; > +global filter_flds = 0; > +global snd_cwnd_flg = 0; > +global snd_wnd_flg = 0; > +global srtt_flg = 0; > +global state_flg = 0; > +global txq_flg = 0; > +global rxq_flg = 0; > +global rexmit_flg = 0; > +global pmtu_flg = 0; > +global ssthresh_flg = 0; > +global timer_flg = 0; > +global rcvwnd_flg = 0; Just stylistic - it's not required to initialize globals to 0 or "". The only real reason to do so is to force the type to long or string, but we should be able to figure that out automatically. I think some of these globals don't need to exist, but I'll try to mention that in the places they are used. > + > +probe begin { > + timeout = -1; > + number_of_filters = process_cmdline() > + > + if(number_of_filters < 1) { > + usage("No Connection Parameters Specified") > + } else if (number_of_filters > 1) { > + usage("Too many Connection Parameters Specified") > + } > + > + for(i =0; i < count; i++) { > + if(tcp_param[i] == "snd_cwnd") { > + snd_cwnd_flg = 1 > + } else if(tcp_param[i] == "snd_wnd") { > + snd_wnd_flg = 1 > + } else if(tcp_param[i] == "srtt") { > + srtt_flg = 1; > + } else if(tcp_param[i] == "state") { > + state_flg = 1; > + } else if(tcp_param[i] == "txq") { > + txq_flg = 1; > + } else if(tcp_param[i] == "rxq") { > + rxq_flg = 1; > + } else if(tcp_param[i] == "rexmit") { > + rexmit_flg = 1; > + } else if(tcp_param[i] == "pmtu") { > + pmtu_flg = 1; > + } else if(tcp_param[i] == "ssthresh") { > + ssthresh_flg = 1; > + } else if(tcp_param[i] == "timer") { > + timer_flg = 1; > + } else if(tcp_param[i] == "rcvwnd") { > + rcvwnd_flg = 1; > + } else if(tcp_param[i] == "all") { > + filter_flds = 1; > + } else { > + printf("Discard Unknown Filter = %s \n", tcp_param[i]); > + } > + > + } Why not just make this part of process_cmdline()? Then you don't need a global for tcp_param[] or count. > + > + if(!(snd_cwnd_flg | snd_wnd_flg | srtt_flg | state_flg | txq_flg | > + rxq_flg | rexmit_flg | pmtu_flg | ssthresh_flg | timer_flg | > + rcvwnd_flg)) > + filter_flds = 1 I don't think you need the filter_flds global -- just set all of the flags in this case. That way every "if (foo_flg || filter_flds)" can be just "if (foo_flg)". > + > + if(state_flg || filter_flds) { > + tcp_state[1] = "ESTABLISHED" > + tcp_state[2] = "SYN_SENT" > + tcp_state[3] = "SYN_RECV" > + tcp_state[4] = "FIN_WAIT1" > + tcp_state[5] = "FIN_WAIT2" > + tcp_state[6] = "TIME_WAIT" > + tcp_state[7] = "CLOSE" > + tcp_state[8] = "CLOSE_WAIT" > + tcp_state[9] = "LAST_ACK" > + tcp_state[10] = "LISTEN" > + tcp_state[11] = "CLOSING" > + } > + > + if(timer_flg || filter_flds) { > + timer_info[1] = "Rxmit" > + timer_info[2] = "Delack" > + timer_info[3] = "Probe" > + timer_info[4] = "Keepalv" > + } > + > + printf("Start TCP Probing.....\n"); > + printf("\n"); > + printf("\nSource Address Dest Address State Len Tx-Q Rx-Q PMTU SndCwnd SndWnd A.RWnd SRTT Ssthreshold Rexmit Timer\n"); > +} > + > +probe kernel.function("tcp_rcv_established"), > + kernel.function("tcp_rcv_state_process") I see that you're not using any tapset probepoints in this script. If these are useful places to probe, would you consider adding them to one of the tapsets? Also, each of the probe bodies are nearly the same, except for a few variations in variable sources (sk, length, r/w flag). It would be nice to factor that common chunk into a single function. > +{ > + sk = $sk > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + length = $skb->len > + > + if ( filter_key(laddr, raddr, lport, rport, 0) == 0 ) > + next; > + > + if (is_packet_updated(laddr, raddr, lport, rport, 0, sk) == 0) > + next; > + > + print_packet_info(laddr, raddr, lport, rport, 0) > +} > + > +probe kernel.function("tcp_transmit_skb") > +{ > + sk = $sk > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + > + length = $skb->len > + > + if ( filter_key(laddr, raddr, lport, rport, 1) == 0 ) > + next; > + > + if (is_packet_updated(laddr, raddr, lport, rport, 1, sk) == 0) > + next; > + > + print_packet_info(laddr, raddr, lport, rport, 1) > +} > + > +probe kernel.function("tcp_keepalive_timer") > +{ > + sk = $data; > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + > + length = 0 > + tx_current_timer = 4; > + > + if ( filter_key(laddr, raddr, lport, rport, 1) == 0 ) > + next; > + > + if (is_packet_updated(laddr, raddr, lport, rport, 1, sk) == 0) > + next; > + > + print_packet_info(laddr, raddr, lport, rport, 1) > +} > + > +probe kernel.function("tcp_delack_timer") > +{ > + sk = $data; > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + > + length = 0 > + tx_current_timer = 2; > + > + if ( filter_key(laddr, raddr, lport, rport, 1) == 0 ) > + next; > + > + if (is_packet_updated(laddr, raddr, lport, rport, 1, sk) == 0) > + next; > + > + print_packet_info(laddr, raddr, lport, rport, 1) > +} > + > +probe kernel.function("tcp_send_probe0") > +{ > + sk = $sk > + > + if(find_timer == 3) { > + tx_current_timer = 3 > + next; > + } > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + > + length = 0 > + > + if ( filter_key(laddr, raddr, lport, rport, 1) == 0 ) > + next; > + > + if (is_packet_updated(laddr, raddr, lport, rport, 1, sk) == 0) > + next; > + > + print_packet_info(laddr, raddr, lport, rport, 1) > +} > + > +probe kernel.function("tcp_write_timer") > +{ > + sk = $data > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + > + if ( filter_key(laddr, raddr, lport, rport, 1) == 0 ) > + next; > + > + find_timer = @cast(sk, "inet_connection_sock")->icsk_pending > +} This use of find_timer and rx/tx_current_timer seems racy to me -- what if there are multiple connections matching the filter and changing timers at the same time? > + > +probe kernel.function("tcp_retransmit_skb") > +{ > + sk = $sk; > + > + if(find_timer == 1) { > + tx_current_timer = 1 > + next; > + } > + > + laddr = tcpmib_local_addr(sk); > + raddr = tcpmib_remote_addr(sk); > + lport = tcpmib_local_port(sk); > + rport = tcpmib_remote_port(sk); > + > + length = $skb->len > + > + > + if ( filter_key(laddr, raddr, lport, rport, 1) == 0 ) > + next; > + > + if (is_packet_updated(laddr, raddr, lport, rport, 1, sk) == 0) > + next; > + > + print_packet_info(laddr, raddr, lport, rport, 1) > +} > + > +function print_packet_info:long (laddr:long, raddr:long, lport:long, > + rport:long, flag:long ) > +{ > + temp_addr = ip_ntop(htonl(laddr)) > + local_addr = sprintf("%s:%d",temp_addr,lport) > + temp_addr = ip_ntop(htonl(raddr)) > + remote_addr = sprintf("%s:%d",temp_addr,rport) > + argms[1] = 5 > + argms[3] = length The argms array is quite strange to me. It is only used within this function, so it doesn't really need to be global. The indexes are magic -- they don't appear to have real meaning. Some of the indexes appear to be field widths, and some are the field values. Please just use separate local variables with descriptive names for each of these. > + > + if(state_flg || filter_flds) { > + state_val = key_list[laddr, raddr, lport, rport, flag, 1] > + if(state_val < 1 || state_val > 11) { > + prnt_state = "UNKNOWN" > + } else { > + prnt_state = tcp_state[state_val] > + } > + } else { > + prnt_state = "-" > + } > + > + if(txq_flg || filter_flds) { > + argms[4] = key_list[laddr, raddr, lport, rport, flag, 2] > + } else { > + argms[4] = -1 > + } It's even more confusing when the magic indexes of key_list doesn't match the the indexes used in argms... > + > + if(rxq_flg || filter_flds) { > + argms[5] = key_list[laddr, raddr, lport, rport, flag, 3] > + } else { > + argms[5] = -1 > + } > + > + if(rexmit_flg || filter_flds) { > + argms[12] = key_list[laddr, raddr, lport, rport, flag, 4] > + } else { > + argms[12] = -1 > + } > + > + if(pmtu_flg || filter_flds) { > + argms[6] = key_list[laddr, raddr, lport, rport, flag, 5] > + } else { > + argms[6] = -1 > + } > + > + if(snd_cwnd_flg || filter_flds) { > + argms[7] = key_list[laddr, raddr, lport, rport, flag, 6] > + } else { > + argms[7] = -1 > + } > + > + if(snd_wnd_flg || filter_flds) { > + argms[8] = key_list[laddr, raddr, lport, rport, flag, 7] > + } else { > + argms[8] = -1 > + } > + > + if(srtt_flg || filter_flds) { > + argms[10] = key_list[laddr, raddr, lport, rport, flag, 8] > + } else { > + argms[10] = -1 > + } > + > + if(ssthresh_flg || filter_flds) { > + argms[11] = key_list[laddr, raddr, lport, rport, flag, 10] > + } else { > + argms[11] = -1 > + } > + > + if(timer_flg || filter_flds) { > + timer_val = key_list[laddr, raddr, lport, rport, flag, 11] > + if(timer_val < 1 || timer_val > 4) { > + prnt_timer = "N/A" > + } else { > + prnt_timer = timer_info[timer_val] > + } > + } else { > + prnt_timer = "-" > + } > + > + if(rcvwnd_flg || filter_flds) { > + argms[9] = key_list[laddr, raddr, lport, rport, flag, 12] > + } else { > + argms[9] = -1 > + } > + > + if(flag) { > + argms[0] = netmax(22,strlen(local_addr)) > + argms[2] = netmax(22,strlen(remote_addr)) > + buff = sprintf("%-*s %-*s %-*s %-*d %-*d %-*d %-*d %-*d %-*d %-*d %-*d %-*d", > + argms[0], local_addr, argms[2],remote_addr, 11, prnt_state, 6, > + argms[3], 5, argms[4], 5, argms[5], 5, argms[6], 7, argms[7], > + 7, argms[8], 6, argms[9], 4, argms[10], 11, argms[11]); > + buff1 = sprintf("%-*d %-*s",6, argms[12], argms[1], prnt_timer); > + > + > + } else { > + argms[0] = netmax(22,strlen(remote_addr)) > + argms[2] = netmax(22,strlen(local_addr)) > + buff = sprintf("%-*s %-*s %-*s %-*d %-*d %-*d %-*d %-*d %-*d %-*d %-*d %-*d", > + argms[0], remote_addr, argms[2],local_addr, 11, prnt_state, 6, > + argms[3], 5, argms[4], 5, argms[5], 5, argms[6], 7, argms[7], > + 7, argms[8], 6, argms[9], 4, argms[10], 11, argms[11]); > + buff1 = sprintf("%-*d %-*s",6, argms[12], argms[1], prnt_timer); > + } The only difference I can see in this if/else is the order of the local/remote addr, so it would probably be simpler to save which is the source and dest, and then share the same sprintfs. The dynamic field width using netmax is unnecessary. A "ipv4addr:port" string can't be more than 21 characters, so the 22 width is fine. And even if it could, the field width doesn't ever cause truncation, so the whole thing will alway be printed anyway. Actually, I don't understand why any of this is using dynamic field widths, when you're just passing literals anyway. The desired widths are already designed by the header you print in the beginning, right? I would much prefer to see "%-22s ...". Also, I think that the default right-alignment is easier to read for numbers, because it's easier to compare and judge magnitudes at a glance when the digit orders are lined up. > + > + buff_final = str_replace(buff, "-1", " -") > + buff_final1 = str_replace(buff1, "-1", " -") > + printf("%s %s\n",buff_final,buff_final1) > +} I wouldn't copy these strings into local variables like this very often, as our translator isn't too smart about copying strings around. We generate a fair amount less code if you avoid those buff_final temporaries and pass the str_replaces directly to the printf. > + > +function netmax:long (val1:long, val2:long) > +{ > + if(val1 > val2) > + return val1 > + else > + return val2 > +} > + > +function is_packet_updated:long ( laddr:long, raddr:long, lport:long, > + rport:long, flag:long, sk:long ) > +{ > + packet_updated = 0 > + > + if(state_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 1] = > + @cast(sk, "sock_common")->skc_state > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 1] > + state = @cast(sk, "sock_common")->skc_state > + if(temp != state) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 1] = state > + } > + } > + if(txq_flg || filter_flds) { > + if(update) { > + if(@cast(sk, "sock_common")->skc_state == 10) > + key_list[laddr, raddr, lport, rport, flag, 2] = > + @cast(sk, "sock")->sk_max_ack_backlog > + else > + key_list[laddr, raddr, lport, rport, flag, 2] = > + (@cast(sk, "tcp_sock")->write_seq - > + @cast(sk, "tcp_sock")->snd_una) > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 2] > + if(@cast(sk, "sock_common")->skc_state == 10) > + tx_queue = > + @cast(sk, "sock")->sk_max_ack_backlog > + else > + tx_queue = (@cast(sk, "tcp_sock")->write_seq - > + @cast(sk, "tcp_sock")->snd_una) > + if(temp != tx_queue) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 2] = tx_queue > + } > + } > + if(rxq_flg || filter_flds) { > + if(update) { > + if(@cast(sk, "sock_common")->skc_state == 10) > + key_list[laddr, raddr, lport, rport, flag, 3] = > + @cast(sk, "sock")->sk_ack_backlog > + else > + key_list[laddr, raddr, lport, rport, flag, 3] = > + (@cast(sk, "tcp_sock")->rcv_nxt - @cast(sk, "tcp_sock")->copied_seq) > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 3] > + if(@cast(sk, "sock_common")->skc_state == 10) > + rx_queue = @cast(sk, "sock")->sk_max_ack_backlog > + else > + rx_queue = (@cast(sk, "tcp_sock")->rcv_nxt - > + @cast(sk, "tcp_sock")->copied_seq) > + if(temp != rx_queue) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 3] = rx_queue > + } > + } > + if(rexmit_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 4] = > + @cast(sk, "inet_connection_sock")->icsk_retransmits > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 4] > + retxmit = > + @cast(sk, "inet_connection_sock")->icsk_retransmits > + if(temp != retxmit) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 4] = retxmit > + } > + } > + if(pmtu_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 5] = > + @cast(sk, "inet_connection_sock")->icsk_pmtu_cookie > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 5] > + pmtu = > + @cast(sk, "inet_connection_sock")->icsk_pmtu_cookie > + if(temp != pmtu) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 5] = pmtu > + } > + } > + if(snd_cwnd_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 6] = > + @cast(sk, "tcp_sock")->snd_cwnd > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 6] > + snd_cwnd = @cast(sk, "tcp_sock")->snd_cwnd > + if(temp != snd_cwnd) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 6] = snd_cwnd > + } > + } > + if(snd_wnd_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 7] = > + @cast(sk, "tcp_sock")->snd_wnd > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 7] > + snd_wnd = @cast(sk, "tcp_sock")->snd_wnd > + if(temp != snd_wnd) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 7] = snd_wnd > + } > + } > + if(srtt_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 8] = > + (@cast(sk,"tcp_sock")->srtt) >> 3 > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 8] > + srtt = (@cast(sk,"tcp_sock")->srtt) >> 3 > + if(temp != srtt) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 8] = srtt > + } > + } > + > + key_list[laddr, raddr, lport, rport, flag, 9] = length > + > + if(ssthresh_flg || filter_flds) { > + if(update) { > + if ((1 << @cast(sk, "inet_connection_sock")->icsk_ca_state) & ((1 << 2) | (1 << 3))) > + key_list[laddr, raddr, lport, rport, flag, 10] = > + @cast(sk, "tcp_sock")->snd_ssthresh > + else > + key_list[laddr, raddr, lport, rport, flag, 10] = > + netmax(@cast(sk, "tcp_sock")->snd_ssthresh, > + ((@cast(sk, "tcp_sock")->snd_cwnd >>1)+ > + (@cast(sk, "tcp_sock")->snd_cwnd >> 2))) > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 10] > + if ((1 << @cast(sk, "inet_connection_sock")->icsk_ca_state) & ((1 << 2) | (1 << 3))) > + ssthresh = @cast(sk, "tcp_sock")->snd_ssthresh > + else > + ssthresh = > + netmax(@cast(sk, "tcp_sock")->snd_ssthresh, > + ((@cast(sk, "tcp_sock")->snd_cwnd >> 1) + > + (@cast(sk, "tcp_sock")->snd_cwnd >> 2))) > + if(temp != ssthresh) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 10] = > + ssthresh > + } > + } > + > + if(timer_flg || filter_flds) { > + if(update) { > + if(flag) > + key_list[laddr, raddr, lport, rport, flag, 11] = > + tx_current_timer > + else > + key_list[laddr, raddr, lport, rport, flag, 11] = > + rx_current_timer > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 11] > + if(flag) { > + if(temp != tx_current_timer) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 11] = > + tx_current_timer > + } else { > + if(temp != rx_current_timer) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 11] = > + rx_current_timer > + } > + } > + } > + > + if(rcvwnd_flg || filter_flds) { > + if(update) { > + key_list[laddr, raddr, lport, rport, flag, 12] = > + @cast(sk, "tcp_sock")->rcv_wnd > + packet_updated = 1 > + } else { > + temp = key_list[laddr, raddr, lport, rport, flag, 12] > + rcvwnd = @cast(sk, "tcp_sock")->rcv_wnd > + if(temp != rcvwnd) > + packet_updated = 1 > + key_list[laddr, raddr, lport, rport, flag, 12] = rcvwnd > + } > + } > + return packet_updated > +} There's a lot of duplication between each "if (update) / else" branch. I would just use the else branch on all of them, and then the return value is "update || packet_updated". > + > +/* All command line arguments other than the filters are processed > + * first and must be placed on the command line prior to any filters. > + */ > +function process_cmdline:long () > +{ > + filter_number = 0 > + count = 0 > + for (i=1; i <= argc; i++) { > + argument = tokenize(argv[i], "=") > + > + if (argument == "filter") { > + argv[i]="" > + while(strlen(byte = tokenize(argv[i], ",")) != 0) { > + argv[i] = "" > + tcp_param[count] = byte > + count++ > + } > + continue; > + } > + > + if ( argument == "timeout" ){ > + argv[i]="" > + timeout=strtol(tokenize(argv[i], "="),10) > + continue; > + } > + > + if ( argument == "update") { > + argv[i]="" > + update_disp = tokenize(argv[i], "=") > + continue; > + } > + > + /* Anything on the command line after this point must > + * be a filter. > + */ > + source = tokenize(argv[i], "-") > + argv[i] = "" > + dest = tokenize(argv[i], "-") > + > + source_addr = tokenize(source, ":") > + source="" > + source_port = tokenize(source, ":") > + > + dest_addr = tokenize(dest, ":") > + dest="" > + dest_port = tokenize(dest, ":") > + > + /* stap bug */ > + if ( dest_port == "fobar") i=i; > + > + ++filter_number; > + j=filter_number*6; > + filter[j] = ipv4_pton(source_addr,0) // Source address > + filter[j+1] = ipv4_pton(source_addr,1) // Source address mask > + filter[j+2] = ipv4_portton(source_port) // Source port > + filter[j+3] = ipv4_pton(dest_addr,0) // Dest address > + filter[j+4] = ipv4_pton(dest_addr,1) // Dest address mask > + filter[j+5] = ipv4_portton(dest_port) // Dest port This is another case of magic indexes, which doesn't really need to be an array at all. These are distinct components, so why not just name them filter_src_addr, filter_src_port, etc.? > + > + if (filter[j]< -1 || > + filter[j+1] < -1 || > + filter[j+2] < -1 || > + filter[j+3] < -1 || > + filter[j+4] < -1 || > + filter[j+5] < -1 ) return -1; > + > + } > + if (update_disp == "all") { > + update = 1; > + } else if (update_disp == "change") { > + update = 0; > + } else { > + /*We default it as "change"*/ > + update = 0; > + } The way this flag is used, I would call it "always_update". > + return filter_number; > +} > + > +/* > + * Convert an ascii integer values between 0 and 65534 to a u16 port number. > + * "*" are treated as wildcards and will be converted to 0xffff. > + */ > +function ipv4_portton:long (port:string) > +{ > + if ( port == "*" ) port="65535"; > + pport=strtol(port,10); > + if ( pport > 0xffff ){ > + printf("Bad port number %s\n",port) > + return -22; > + } > + return pport > +} Since 65535 could be a valid port number, why not use something like -1 to represent the wildcard case? > + > +/* > + * Convert ipv4 dot notation address into longs. > + * Supports "*" in any field treating it as a wildcard by making the byte=0. > + * If make_mask is set it creates a mask based on the "*" fields. All non='*' > + * bytes are set to 0xff all * fields are set to 0x0;. > + */ > +function ipv4_pton:long (addr:string, make_mask:long) > +{ > + i=32; > + ip=0; > + ips=addr; > + while(strlen(byte = tokenize(ips, ".")) != 0) { > + i-=8; > + ips=""; > + > + if ( byte == "*" ){ > + byte = "0" > + } else { > + if ( make_mask ) byte = "255"; > + } > + > + j=strtol(byte,10); > + if ( j > 255 ){ > + printf("bad address %s\n",addr) > + return -22; > + } > + ip=ip+(j< + } > + if ( i != 0 ){ > + printf("bad address %s\n",addr) > + return -22; > + } > + return ip; > +} > + > +/* > + * Returns a unique value (stored in the global key_list) based on the socket > + * address tuple and the global collection index value. A new value is created > + * if one does not already exist. > + */ > +function build_key:long (laddr:long, raddr:long, lport:long, rport:long, flag:long) > +{ > + __index__ = 0 > + > + if ( key_list[laddr, raddr, lport, rport, flag, __index__] ) { > + return key_list[laddr, raddr, lport, rport, flag, __index__] > + } else { > + key_list[laddr, raddr, lport, rport, flag, __index__]=++lastkey > + key = key_list[laddr, raddr, lport, rport, flag, __index__] > + __index__ ++ > + while(__index__ != 12) { > + key_list[laddr, raddr, lport, rport, flag, __index__]=0 > + __index__ ++ > + } > + return key > + } > +} > + > + > +function filter_key:long (laddr:long, raddr:long, lport:long, rport:long, flag:long) > +{ > + j = 6; > + local_filter = remote_filter = 0 > + > + //Local Filter > + if ( (laddr&filter[j+1]) == filter[j] ) { > + if ( (filter[j+2] == 0xffff) || (lport == filter[j+2])) > + local_filter = 1; > + } > + // Remote filter > + if ( (raddr&filter[j+4]) == filter[j+3] ) { > + if ( (filter[j+5] == 0xffff) || (rport == filter[j+5])) > + remote_filter = 1; > + } > + > + if(local_filter && remote_filter) { > + return build_key(laddr, raddr, lport, rport, flag); > + } else { > + return 0; > + } > +} The result of filter_key is only used as a boolean, so build_key's value is wasted. (It does still have the side effect of clearing the other key_list indexes though.) However, I think you could use this key to simplify a lot of other indexing. The tuple (laddr,raddr,lport,rport,flag) only needs to be checked once here, and then the resulting key could be passed around for your index everywhere else. > + > +/* Terminates the run in timeout seconds, using global timeout value */ > +probe timer.s(1) { > + if ( timeout == -1 ) next > + if ( !timeout-- ) exit() > +} There's an off-by-one error here with the post-decrement. (e.g. timeout=1 will wait for 2 seconds.) > + > +function usage (msg:string) > +{ > + printf("\nUsage:\n"); > + printf("\ttcp_trace.stp filter=all|state|txq|rxq|srtt|snd_cwnd|snd_wnd|rexmit|pmtu|ssthresh|timer|rcvwnd [timeout=] [update=change|all] Rule\n"); > + printf("\tRule format: :-:\n\n"); > + error(msg); > +} Note that printf() goes to stdout, and error() goes to stderr, and there's no strict ordering between the two. > diff --git a/testsuite/systemtap.examples/network/tcp_trace.txt b/testsuite/systemtap.examples/network/tcp_trace.txt > new file mode 100644 > index 0000000..46fe53f > --- /dev/null > +++ b/testsuite/systemtap.examples/network/tcp_trace.txt > @@ -0,0 +1,24 @@ > +tcp_trace.stp filter=all 9.47.66.34:*-9.47.67.199:* > +Start TCP Probing..... > + > +Source Address Dest Address State Len Tx-Q Rx-Q PMTU SndCwnd SndWnd A.RWnd SRTT Ssthreshold Rexmit Timer > +9.47.67.199:55587 9.47.66.34:22 SYN_RECV 20 0 0 1500 2 6144 5840 0 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 21 21 0 1500 3 6144 5840 0 2147483647 0 N/A > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 20 21 0 1500 3 6144 5888 0 2147483647 0 N/A > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 41 0 0 1500 4 6144 5888 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 0 0 21 1500 4 6144 5888 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 0 0 792 1500 4 6144 5888 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 784 784 792 1500 4 6144 7424 1 2147483647 0 N/A > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 44 784 0 1500 4 6144 7424 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 152 152 0 1500 5 7680 7424 1 2147483647 0 N/A > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 164 152 0 1500 5 7680 7424 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 0 0 0 1500 5 9216 7424 1 2147483647 0 Delack > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 720 720 0 1500 5 9216 9088 1 2147483647 0 Delack > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 36 720 0 1500 5 9216 9088 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 0 0 16 1500 5 10752 9088 1 2147483647 0 Delack > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 68 0 0 1500 5 10752 9088 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 0 0 0 1500 5 10752 9088 1 2147483647 0 Delack > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 48 48 0 1500 5 10752 9088 1 2147483647 0 Delack > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 84 48 0 1500 5 10752 9088 1 2147483647 0 N/A > +9.47.66.34:22 9.47.67.199:55587 ESTABLISHED 64 64 0 1500 5 10752 9088 1 2147483647 0 Delack > +9.47.67.199:55587 9.47.66.34:22 ESTABLISHED 116 64 0 1500 5 10752 9088 1 2147483647 0 N/A > > I usually try to trim my reviews a bit more, but with the size of this file it was easier for me to keep all of the context. I hope it's still legible for you. Thanks, Josh