From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 106493 invoked by alias); 15 Jul 2017 05:41:31 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 106083 invoked by uid 89); 15 Jul 2017 05:41:05 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_SORBS_SPAM,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2 spammy=H*i:sk:4C_nqPT, H*f:sk:P9pjQU8, H*f:sk:4C_nqPT, H*i:sk:P9pjQU8 X-HELO: mail-oi0-f52.google.com Received: from mail-oi0-f52.google.com (HELO mail-oi0-f52.google.com) (209.85.218.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 15 Jul 2017 05:41:02 +0000 Received: by mail-oi0-f52.google.com with SMTP id x187so86081199oig.3 for ; Fri, 14 Jul 2017 22:41:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=na2Ow9cP65Bs2CZqhXZ6lwoFlX/v6tiq5x8Aiv4vBCM=; b=kcLXQpKviyzlU+cZXWGp1W/KRN+esAdve94xFmjqKPQXy2t+20R6tPJBR9tpxWRubj UYvdWtvbQB/tzHbLVmi4C6Ma0covxlg+F+kXIo9yPYBVu36hWF5/4zNr3Uokr0BRD1TZ Y8PJ2T4d165g6xKdZyn8FQb0a4FYpEqi3MPlrwt7Kvb2xiKdBqNLtMGlpCATxvaV5pVW wetGXPBkjDO2Xe3yb471W44ZtFHXE1JMY6/KZhUSD63zCzqsdYT8egPa0RbRkK8al6QH j7jhSwZ6fOxEPZQBuE5d6nfr2Kf8/XUYwx8MWLRx7NnzCIGE3+CzjiPgCVm2x4PHm5NR lRag== X-Gm-Message-State: AIVw113921G+V0v6gbPhqPQ3jFvwS+zCBEbx63p5h3aCpnGnyiLbUV48 YSphgtSLbKn3LkGGvLHmKolCCjFOV5B2 X-Received: by 10.202.81.138 with SMTP id f132mr8503335oib.94.1500097260279; Fri, 14 Jul 2017 22:41:00 -0700 (PDT) MIME-Version: 1.0 Received: by 10.74.64.65 with HTTP; Fri, 14 Jul 2017 22:40:39 -0700 (PDT) In-Reply-To: References: <234840fd-a06a-4dfd-a1c5-254e26144754.weixi.wwx@antfin.com> From: "Dmitry Vyukov via gcc-patches" Reply-To: Dmitry Vyukov Date: Sat, 15 Jul 2017 05:41:00 -0000 Message-ID: Subject: Re: Add support to trace comparison instructions and switch statements To: Kostya Serebryany Cc: Wish Wu , gcc , gcc-patches , weixi.wwx@antfin.com, Alexander Potapenko , andreyknvl , Victor Chibotaru , Yuri Gribov Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2017-07/txt/msg00875.txt.bz2 On Fri, Jul 14, 2017 at 11:17 PM, Kostya Serebryany wrote: >>>> > Hi >>>> > >>>> > I wrote a test for "-fsanitize-coverage=trace-cmp" . >>>> > >>>> > Is there anybody tells me if these codes could be merged into gcc ? >>>> >>>> >>>> Nice! >>>> >>>> We are currently working on Linux kernel fuzzing that use the >>>> comparison tracing. We use clang at the moment, but having this >>>> support in gcc would be great for kernel land. >>>> >>>> One concern I have: do we want to do some final refinements to the API >>>> before we implement this in both compilers? >>>> >>>> 2 things we considered from our perspective: >>>> - communicating to the runtime which operands are constants >>>> - communicating to the runtime which comparisons are counting loop checks >>>> >>>> First is useful if you do "find one operand in input and replace with >>>> the other one" thing. Second is useful because counting loop checks >>>> are usually not useful (at least all but one). >>>> In the original Go implementation I also conveyed signedness of >>>> operands, exact comparison operation (<, >, etc): >>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13 >>>> But I did not find any use for that. >>>> I also gave all comparisons unique IDs: >>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24 >>>> That turned out to be useful. And there are chances we will want this >>>> for C/C++ as well. >>>> >>>> Kostya, did anything like this pop up in your work on libfuzzer? >>>> Can we still change the clang API? At least add an additional argument >>>> to the callbacks? >>> >>> >>> I'd prefer not to change the API, but extend it (new compiler flag, new >>> callbacks), if absolutely needed. >>> Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra >>> parameter that has the ID). >>> I don't like the approach with compiler-generated constant IDs. >> >> Yes, if we do it for C/C++, we need to create globals and pass pointer >> to a global to the callbacks. IDs do not work for C/C++. >> >>> Yes, it's a bit more efficient, but much less flexible in presence of >>> multiple modules, DSOs, dlopen, etc. >>> >>> I was also looking at completely inlining this instrumentation because it's >>> pretty expensive even in it's current form >>> (adding more parameters will make things worse). >>> This is going to be much less flexible, of course, so I'll attack it only >>> once I settle on the algorithm to handle CMPs in libFuzzer. >> >> This will require a new, completely different API for >> compiler<->runtime anyway, so we can put this aside for now. >> >> >>>> At the very least I would suggest that we add an additional arg that >>>> contains some flags (1/2 arg is a const, this is counting loop check, >>>> etc). If we do that we can also have just 1 callback that accepts >>>> uint64's for args because we can pass operand size in the flags: >>> >>> >>> How many flag combinations do we need and do we *really* need them? >>> >>> If the number of flag combinations is small, I'd prefer to have separate >>> callbacks (__sanitizer_cov_trace_cmp_loop_bound ?) >>> >>> Do we really need to know that one arg is a const? >>> It could well be a constant in fact, but compiler won't see it. >>> >>> int foo(int n) { ... if (i < n) ... } >>> ... >>> foo(42); // not inlined. >>> >>> We need to handle both cases the same way. >> >> >> Well, following this line of reasoning we would need only >> __asan_load/storeN callbacks for asan and remove >> __asan_load/store1/2/4/8, because compiler might not know the size at >> compile time. Constant-ness is only an optimization. If compiler does >> not know that something is a const, fine. Based on my experience with >> go-fuzz and our early experience with kernel, we badly need const >> hint. >> Otherwise fuzzer generates gazillions of candidates based on >> comparison arguments. Note that kernel is several order of magnitude >> larger than what people usually fuzz in user-space, inputs are more >> complex and at the same time execution speed is several order of >> magnitude lower. We can't rely on raw speed. >> >> Thinking of this more, I don't thing that globals will be useful in >> the kernel context (the main problem is that we have multiple >> transient isolated kernels). If we track per-comparison site >> information, we will probably use PCs. So I am ready to give up on >> this. >> >> Both of you expressed concerns about performance. Kostya says we >> should not break existing clang API. >> If we limit this to only constant-ness, then I think we can make this >> both forward and backward compatible, which means we don't need to >> handle it now. E.g. we can: >> - if both operands are const (if it's possible at all), don't emit any callback >> - if only one is const, emit __sanitizer_cov_trace_cmp_const1 and >> pass the const in a known position (i.e. always first/second arg) >> - if none are const, emit callback __sanitizer_cov_trace_cmp_dyn1 >> Then compiler emits weak aliases form >> __sanitizer_cov_trace_cmp_const/dyn1 to the old >> __sanitizer_cov_trace_cmp1, which makes it backwards compatible. >> New runtimes that implement __sanitizer_cov_trace_cmp_const/dyn1, also >> need to provide the old __sanitizer_cov_trace_cmp1 for old compilers. > > SGTM++ > > Although, maybe we can actually break the API this way to keep things > simpler (no weak aliases) > * leave __sanitizer_cov_trace_cmp[1248] for general case > * add __sanitizer_cov_trace_cmp[1248]_const for constant in the 2-nd parameter > * add __sanitizer_cov_trace_cmp[1248]_loop for loop bound in the 2nd parameter This would work for us. > * do we need __sanitizer_cov_trace_cmp[1248]_const_loop ? We don't know yet. Potentially yes, because const is reading/writing to a static buffer, while non-const is reading/writing to a buffer with user-provided size.