From: Serhei Makarov
To: systemtap@sourceware.org
Subject: Re: stap/eBPF language features brainstorm
Date: Thu, 12 Jul 2018 15:47:00 -0000
Message-Id: <1531410416.3144333.1438660296.1A0E8839@webmail.messagingengine.com>
References: <1531340017.1855006.1437712520.41D1C30F@webmail.messagingengine.com>
In-Reply-To: <1531340017.1855006.1437712520.41D1C30F@webmail.messagingengine.com>

On Wed, Jul 11, 2018, at 4:13 PM, Serhei Makarov wrote:
> (1) A BPF_MAP_TYPE_PERCPU would be a contiguously indexed, preallocated
> array of aggregates, so a BPF_MAP_TYPE_HASH would be needed to map from
> sparse keys to indices into the BPF_MAP_TYPE_PERCPU. However, without
> synchronization, there is no way to allocate slots in the
> BPF_MAP_TYPE_PERCPU.

Note that eBPF does have an atomic increment operation in the form of
BPF_XADD. If it returned the value at the memory location (either before
or after the increment) like a compare-and-swap operation, then it could
be used to allocate array slots in a thread-safe fashion.

Alas, I can't find any indication in the docs that a value is returned.
Rather, the in-kernel testsuite
(https://github.com/torvalds/linux/blob/4e33d7d47943aaa84a5904472cf2f9c6d6b0a6ca/lib/test_bpf.c#L4306)
specifies that there should be no side-effects (as far as I can decipher
the testcases).

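
To make the slot-allocation idea concrete, here is a minimal userspace
sketch in C11. It is an analogy only, not eBPF code: it assumes an atomic
add that hands back the previous value, which is exactly the property
BPF_XADD appears to lack. The names NUM_SLOTS, NO_SLOT, next_slot and
alloc_slot are hypothetical and chosen just for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SLOTS 4          /* hypothetical size of the preallocated array */
#define NO_SLOT   UINT32_MAX /* hypothetical "array is full" marker */

static_assert(NUM_SLOTS < NO_SLOT, "marker must lie outside the index range");

/* Index of the next unclaimed slot; static storage starts at zero. */
static atomic_uint next_slot;

/* Claim the next free index, or return NO_SLOT once the array is used up.
 * atomic_fetch_add returns the value held *before* the addition, so two
 * concurrent callers can never be handed the same index. (The counter
 * keeps growing after exhaustion; a real allocator would have to guard
 * against eventual wrap-around.) */
static uint32_t alloc_slot(void)
{
    unsigned int idx = atomic_fetch_add(&next_slot, 1);
    return idx < NUM_SLOTS ? (uint32_t)idx : NO_SLOT;
}
```

Each caller would store the returned index in the hash map under its
sparse key. With an add that returns nothing, there is no way for a
caller to learn which index it was given.
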
The purpose of BPF_XADD seems to be to make sure that two increment
operations don't 'cancel each other out' by racing to read the same
memory location. For example, see the sample eBPF program in
http://www.man7.org/linux/man-pages/man2/bpf.2.html -- which contains
the following atomic increment of a counter:

    BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *) r0 += r1 */

> (2) /usr/include/linux/bpf.h mentions BPF_MAP_TYPE_HASH_OF_MAPS, but
> it's currently undocumented. Still need to read the code and investigate
> if it works for our purposes.

This might still be an option.

> # Global variable locking semantics
>
> Apparently not possible due to upstream eBPF limitations. I could not
> find any compare-and-swap-type operation for map elements. (bcc's
> lookup_or_init() compiles to code with potential to data-race. The only
> map modification helpers are lookup_elem, update_elem, and delete_elem,
> and any compare-and-swap code constructed from them will be subject to
> data races.)

As suggested by Frank, we could implement probe exclusion with a
sequence such as the following:

    BPF_XADD(&lock_counter, 1);
    value = read(lock_counter);
    if (value <= 1) {
        ... execute probe ...
    } else
        skip probe
    BPF_XADD(&lock_counter, -1);

In this case, it is possible for two probes to *both* cancel each
other's execution, but it should not be possible for two probes to
execute simultaneously. Will need to test how well this works in
practice (i.e. how likely are two probes to mutually cancel?)
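
The mutual-cancellation case can be walked through with a small
userspace model in C11. This is a sketch under assumptions, not eBPF
code: atomic_fetch_add and atomic_load stand in for BPF_XADD and the
read, and the two "probes" are interleaved by hand in a single thread so
the outcome is deterministic. All function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_counter; /* number of probes currently in flight */

/* Step 1: announce entry, as BPF_XADD(&lock_counter, 1) would. */
static void probe_enter(void) { atomic_fetch_add(&lock_counter, 1); }

/* Step 2: read the counter back; only a sole entrant may run its body. */
static bool probe_may_run(void) { return atomic_load(&lock_counter) <= 1; }

/* Step 3: announce exit, whether or not the body ran. */
static void probe_exit(void) { atomic_fetch_sub(&lock_counter, 1); }

/* Probes A and B overlap: both increment before either reads.
 * Returns how many probe bodies were allowed to run. */
static int run_overlapping_pair(void)
{
    int ran = 0;
    probe_enter();                /* A: counter 0 -> 1 */
    probe_enter();                /* B: counter 1 -> 2 */
    if (probe_may_run()) ran++;   /* A reads 2 and skips */
    if (probe_may_run()) ran++;   /* B reads 2 and skips */
    probe_exit();                 /* A: counter 2 -> 1 */
    probe_exit();                 /* B: counter 1 -> 0 */
    assert(atomic_load(&lock_counter) == 0);
    return ran;
}

/* Probes A and B do not overlap: each completes before the next starts. */
static int run_sequential_pair(void)
{
    int ran = 0;
    for (int i = 0; i < 2; i++) {
        probe_enter();              /* counter 0 -> 1 */
        if (probe_may_run()) ran++; /* reads 1 and runs */
        probe_exit();               /* counter 1 -> 0 */
    }
    assert(atomic_load(&lock_counter) == 0);
    return ran;
}
```

In the overlapping interleaving neither body runs, which is the mutual
cancellation described above. A body only runs when its probe reads a
count of at most 1 after its own increment, so no other probe has an
increment outstanding at that moment, and any probe that enters later
will read 2 or more and skip.
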