Subject: Re: Making the transport layer more robust - Power Management - follow-up
From: Mark Wielaard
To: "Turgis, Frederic"
Cc: "systemtap@sourceware.org"
Date: Thu, 15 Dec 2011 15:31:00 -0000

On Thu, 2011-12-08 at 21:22 +0000, Turgis, Frederic wrote:
> - in the same thread, Mark made the remark that
>   STP_RELAY_TIMER_INTERVAL and STP_CTL_TIMER_INTERVAL (kernel
>   pollings) are in fact tunables, so there is no need to modify the
>   code:
>   * to match our work-arounds, we used -D STP_RELAY_TIMER_INTERVAL=128
>     -D STP_CTL_TIMER_INTERVAL=256
>   * regular wake-ups are clearly occurring less often, with no tracing
>     issues. But our trace bandwidth is generally hundreds of KB/s at
>     most, so we don't really need much robustness.

I noticed these aren't documented anywhere. I propose to document them
as follows:

STP_RELAY_TIMER_INTERVAL
  How often the relay or ring buffers are checked to see whether
  readers need to be woken up to deliver new trace data. The timer
  interval is given in jiffies. Defaults to ((HZ + 99) / 100), which
  is about every 10ms.

STP_CTL_TIMER_INTERVAL
  How often control messages (system, warn, exit, etc.) are checked
  to see whether control channel readers need to be woken up to
  notify them. The timer interval is given in jiffies. Defaults to
  ((HZ + 49) / 50), which is about every 20ms.

Where should we add this documentation?
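To make the jiffies arithmetic concrete, here is a small worked
example (illustrative only; the two formulas are the defaults quoted
above, everything else is just the arithmetic for a few common kernel
HZ values):

#include <stdio.h>

/* Worked example (illustrative, not runtime code): how the default
 * timer interval formulas expand for a few common kernel HZ values. */
static void show(int hz)
{
        int relay = (hz + 99) / 100;  /* STP_RELAY_TIMER_INTERVAL default */
        int ctl   = (hz + 49) / 50;   /* STP_CTL_TIMER_INTERVAL default */
        printf("HZ=%-4d relay=%2d jiffies (%dms)  ctl=%2d jiffies (%dms)\n",
               hz, relay, relay * 1000 / hz, ctl, ctl * 1000 / hz);
}

int main(void)
{
        show(100);   /* relay 10ms, ctl 20ms */
        show(250);   /* relay 12ms, ctl 20ms */
        show(1000);  /* relay 10ms, ctl 20ms */
        return 0;
}

With the overrides above, 128 and 256 jiffies come out at 1.28s and
2.56s on an HZ=100 kernel (exactly 1s and 2s if the kernel runs with
HZ=128), which matches the much less frequent wake-ups you are seeing.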
> Some more recent findings:
> - while testing fixes for an ARM backtrace issue with Mark, I got
>   the message "ctl_write_msg type=2 len=61 ENOMEM" several times at
>   the beginning of the test (not root-caused yet). That means a lack
>   of trace buffers for msg type=2, which is OOB_DATA (error and
>   warning messages). Test and trace data looked fine. The messages
>   do not appear if I compile without -D STP_CTL_TIMER_INTERVAL=256.

Yes, that is kind of expected. The control messages really want to be
delivered, and if you wait too long new control messages will not
have room to be added to the buffers. Would it help you if we made
the pool of reserved message buffers tunable as well? Currently
STP_DEFAULT_BUFFERS is defined statically in either
runtime/transport/debugfs.c (40) or runtime/transport/procfs.c (256),
depending on which backend we use for the control channel.
Documentation would be something like:

STP_DEFAULT_BUFFERS
  Defines the number of buffers allocated for control messages the
  module can store before they have to be read by stapio. Defaults to
  40 (8 pre-allocated one-time messages plus 32 dynamic
  err/warning/system messages).

> - the last non-tunable wake-up is the timeout of the userspace data
>   channel ppoll() call in reader_thread(). Without changes, we wake
>   up every 200ms:
>   * we currently set it to 5s. No issues so far.
>   * Mark (or someone else) suggested using bulkmode. Here are some
>     findings:
>     + bulkmode sets the timeout to NULL (or 10s if NEED_PPOLL is
>       set), which solves the wake-up issue. I am just wondering why
>       we have NULL in bulkmode and 200ms otherwise.

That is probably because not all trace data backends really support
poll/select. The ring_buffer one seems to, but the relay one doesn't.
So we would need some way to detect whether the backend really
supports select/poll before we can drop the timeout entirely. If
there isn't a bug report about this, there probably should be. Will's
recent periodic.stp example showed that stap and the stap runtime are
responsible for a noticeable number of wakeups.

>     + OMAP hotplugs cores, so core 1 is generally off at the
>       beginning of the test. Therefore I don't get a trace for
>       core 1 even if core 1 is used later. That makes bulkmode less
>       usable than I thought (at least I still need to test with
>       core 1 "on" at the beginning of the test to see the further
>       behaviour).

Could you file a bug report about the systemtap runtime not noticing
new cores coming online in bulk mode?

> That makes the possibility of tuning the ppoll timeout value in
> non-bulkmode still interesting. I don't even really know what the
> consequences of directly setting it to 1s or more would be, but a
> tunable would be a good trade-off that does not break the current
> behaviour.
>
> Well, I think I gave myself a few actions to perform!

Thanks for the feedback. Please let us know how tuning things
differently makes your life easier.

Cheers,

Mark
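P.S. For completeness, here is a minimal sketch of the reader-side
ppoll() we have been discussing (illustrative only, not the actual
stapio code; the fd setup and the data handling are elided):

#define _GNU_SOURCE
#include <poll.h>
#include <time.h>

/* Illustrative sketch, not the stapio source: how the ppoll() timeout
 * turns into periodic wakeups.  A NULL timeout blocks until trace data
 * arrives; a 200ms timespec (the current non-bulkmode default) wakes
 * the thread five times per second even when there is nothing to read. */
static void reader_loop(int trace_fd, const struct timespec *timeout)
{
        struct pollfd pfd = { .fd = trace_fd, .events = POLLIN };

        for (;;) {
                int rc = ppoll(&pfd, 1, timeout, NULL);
                if (rc < 0)
                        break;      /* error or signal */
                if (rc == 0)
                        continue;   /* timeout expired, nothing to read */
                /* read and deliver new trace data here */
        }
}

Making that timeout argument a tunable (200ms, 5s, or NULL once we can
detect that the backend supports poll properly) is exactly the
trade-off discussed above.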