Message-ID: <7a010cff4272202a15bd90b92863f54946f41b9e.camel@redhat.com>
Subject: Re: [PATCH 2/2] analyzer: out-of-bounds checker [PR106000]
From: David Malcolm
To: Tim Lange, gcc-patches@gcc.gnu.org
Date: Tue, 09 Aug 2022 18:44:23 -0400
In-Reply-To: <20220809211943.82098-2-mail@tim-lange.me>

On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds an experimental out-of-bounds checker to the
> analyzer.
>
> The checker was tested on coreutils, curl, httpd and openssh. It is
> mostly accurate but does produce false-positives on yacc-generated
> files and sometimes when the analyzer misses an invariant. These
> cases will be documented in bugzilla.
> (Regrtests still running with the latest changes, will report back
> later.)
Hi Tim,

Thanks for the patch, and for all the testing you've done on it.

We've already had several rounds of review of this off-list, and this
patch looks very close to ready. Some nits below...

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 5021376b6fb..8e73af60ceb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -158,6 +158,10 @@ Wanalyzer-tainted-size
>  Common Var(warn_analyzer_tainted_size) Init(1) Warning
>  Warn about code paths in which an unsanitized value is used as a size.
>
> +Wanalyzer-out-of-bounds
> +Common Var(warn_analyzer_out_of_bounds) Init(1) Warning
> +Warn about code paths in which a write or read to a buffer is out-of-bounds.
> +

Please keep the list alphabetized; I think this needs to be between
  Wanalyzer-mismatching-deallocation
and
  Wanalyzer-possible-null-argument

>  Wanalyzer-use-after-free
>  Common Var(warn_analyzer_use_after_free) Init(1) Warning
>  Warn about code paths in which a freed value is used.
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index f7df2fca245..2f9382ed96c 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1268,6 +1268,402 @@ region_model::on_stmt_pre (const gimple *stmt,
>      }
>  }
>
> +/* Abstract base class for all out-of-bounds warnings.  */
> +
> +class out_of_bounds : public pending_diagnostic_subclass
> +{
> +public:
> +  out_of_bounds (const region *reg, tree diag_arg, byte_range range)
> +  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
> +  {}
> +
> +  const char *get_kind () const final override
> +  {
> +    return "out_of_bounds_diagnostic";
> +  }
> +
> +  bool operator== (const out_of_bounds &other) const
> +  {
> +    return m_reg == other.m_reg
> +          && m_range == other.m_range
> +          && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_out_of_bounds;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    interest->add_region_creation (m_reg);
> +  }
> +
> +protected:
> +  const region *m_reg;
> +  tree m_diag_arg;
> +  byte_range m_range;

Please add a comment clarifying what the meaning of m_range is here.
Is it (a) the range of all bytes that are accessed, (b) the range of
bytes that are accessed out-of-bounds, (c) etc?

From my reading of the patch I think it's (b).

> +};
> +
> +/* Abstract subclass to complaing about out-of-bounds
> +   past the end of the buffer.  */
> +
> +class past_the_end : public out_of_bounds
> +{
> +public:
> +  past_the_end (const region *reg, tree diag_arg, byte_range range,
> +               tree byte_bound)
> +  : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
> +  {}
> +
> +  bool operator== (const past_the_end &other) const
> +  {
> +    return m_reg == other.m_reg
> +          && m_range == other.m_range
> +          && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg)

Is it possible to call out_of_bounds::operator== for the first three
fields, rather than a copy-and-paste of the logic?
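For illustration, here is a minimal standalone sketch of that delegation. The *_sketch classes and their int fields are hypothetical stand-ins for the analyzer's region, tree and byte_range types; the real code would compare trees with pending_diagnostic::same_tree_p. Only the shape of the operator== calls is meant to carry over:

```cpp
#include <cassert>

// Hypothetical stand-in for byte_range (field names chosen for illustration).
struct byte_range_sketch
{
  long m_start_byte_offset;
  long m_size_in_bytes;

  bool operator== (const byte_range_sketch &other) const
  {
    return (m_start_byte_offset == other.m_start_byte_offset
            && m_size_in_bytes == other.m_size_in_bytes);
  }
};

// Hypothetical stand-in for out_of_bounds; ints replace region/tree.
class out_of_bounds_sketch
{
public:
  out_of_bounds_sketch (int reg, int diag_arg, byte_range_sketch range)
  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
  {
    // Assumption: an out-of-bounds range covers at least one byte.
    assert (range.m_size_in_bytes > 0);
  }

  // Compares the three fields shared by every subclass.
  bool operator== (const out_of_bounds_sketch &other) const
  {
    return (m_reg == other.m_reg
            && m_diag_arg == other.m_diag_arg
            && m_range == other.m_range);
  }

protected:
  int m_reg;
  int m_diag_arg;
  byte_range_sketch m_range;
};

// Hypothetical stand-in for past_the_end.
class past_the_end_sketch : public out_of_bounds_sketch
{
public:
  past_the_end_sketch (int reg, int diag_arg, byte_range_sketch range,
                       int byte_bound)
  : out_of_bounds_sketch (reg, diag_arg, range), m_byte_bound (byte_bound)
  {}

  bool operator== (const past_the_end_sketch &other) const
  {
    // Reuse the base-class comparison for the shared fields...
    return (out_of_bounds_sketch::operator== (other)
            // ...and compare only the field this subclass adds.
            && m_byte_bound == other.m_byte_bound);
  }

protected:
  int m_byte_bound;
};
```

The explicit out_of_bounds_sketch::operator== call binds `other` as a const reference to the base class. A later change to the shared fields would then only need to be made in one place.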
> +          && pending_diagnostic::same_tree_p (m_byte_bound,
> +                                              other.m_byte_bound);
> +  }
> +
> +  label_text
> +  describe_region_creation_event (const evdesc::region_creation &ev) final
> +  override
> +  {
> +    if (m_byte_bound && TREE_CODE (m_byte_bound) == INTEGER_CST)
> +      return ev.formatted_print ("capacity is %E bytes", m_byte_bound);
> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  tree m_byte_bound;
> +};

[...snip the concrete subclasses...]

We went through several rounds of review off-list, and I have lots of
ideas for wording tweaks to the patch, but rather than me be a
"backseat driver" (or bikeshedding), I think that that aspect of the
patch is good enough as-is, and I'll make the wording changes myself
once the patch is in trunk.

[...snip...]

> +
> +    if (warned)
> +      {
> +       char num_bytes_past_buf[WIDE_INT_PRINT_BUFFER_SIZE];
> +       print_dec (m_range.m_size_in_bytes, num_bytes_past_buf, UNSIGNED);

I think we can use %wu for this, but I can fix this up in a followup.

[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index fa23fbeaaaa..5ab834af780 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -459,6 +459,7 @@ Objective-C and Objective-C++ Dialects}.
>  -Wno-analyzer-null-dereference @gol
>  -Wno-analyzer-possible-null-argument @gol
>  -Wno-analyzer-possible-null-dereference @gol
> +-Wno-analyzer-out-of-bounds @gol

Please move between
  -Wno-analyzer-null-dereference @gol
and
  -Wno-analyzer-possible-null-argument @gol
for alphabetization.

>  -Wno-analyzer-shift-count-negative @gol
>  -Wno-analyzer-shift-count-overflow @gol
>  -Wno-analyzer-stale-setjmp-buffer @gol
> @@ -9991,6 +9992,17 @@ This warning requires @option{-fanalyzer}, which enables it; use
>  This diagnostic warns for paths through the code in which a
>  value known to be NULL is dereferenced.
>
> +@item -Wno-analyzer-out-of-bounds
> +@opindex Wanalyzer-out-of-bounds
> +@opindex Wno-analyzer-out-of-bounds
> +This warning requires @option{-fanalyzer} to enable it; use
> +@option{-Wno-analyzer-out-of-bounds} to disable it.
> +
> +This diagnostic warns for path through the code in which a buffer is
> +accessed or written out-of-bounds.

It would be good to clarify the limitations; as I understand it:

"The diagnostic only applies to cases where the analyzer is able to
determine a constant size for the buffer. It warns when any part of a
read or write is definitely before the start of the buffer, or
definitely after the end."

...or some such wording.

> +
> +See @url{https://cwe.mitre.org/data/definitions/119.html, CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer}.

Also, please move the new entry into position to keep things
alphabetized.

> +
>  @item -Wno-analyzer-shift-count-negative
>  @opindex Wanalyzer-shift-count-negative
>  @opindex Wno-analyzer-shift-count-negative

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
> new file mode 100644
> index 00000000000..715c8b7460f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
> @@ -0,0 +1,119 @@
> +#include
> +#include
> +#include
> +#include
> +
> +/* Wanalyzer-out-of-bounds tests for buffer overflows.  */
> +
> +/* Avoid folding of memcpy.  */
> +typedef void * (*memcpy_t) (void *dst, const void *src, size_t n);
> +
> +static memcpy_t __attribute__((noinline))
> +get_memcpy (void)
> +{
> +  return memcpy;
> +}
> +
> +
> +/* Taken from CWE-787.  */
> +void test1 (void)
> +{
> +  int id_sequence[3];
> +
> +  id_sequence[0] = 123;
> +  id_sequence[1] = 234;
> +  id_sequence[2] = 345;
> +  id_sequence[3] = 456; /* { dg-line test1 } */
> +
> +  /* { dg-warning "overflow" "warning" { target *-*-* } test1 } */
> +  /* { dg-message "" "note" { target *-*-* } test1 } */

I see that you've left the regexes mostly blank in the various DejaGnu
directives in these new tests. Normally I'd want these to be less
vague, but given that I plan to change the wordings in a followup
anyway, this is OK.

[...snip lots of great testcases...]

With the above nits fixed, the patch is OK for trunk (assuming that
your testing doesn't show any problems).

Thanks again for the patch; this feels like a major new feature.

Dave
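To make the proposed limitation wording and reading (b) of m_range concrete, here is a minimal sketch of the rule. It assumes the buffer has a known constant capacity in bytes. The access_sketch type and the underflow_range/overflow_range helpers are hypothetical illustrations, not the analyzer's actual API. Each helper returns only the part of an access that lies definitely before the start or definitely after the end of the buffer:

```cpp
#include <cassert>

// Hypothetical model of a byte access relative to the start of a buffer
// (offsets may be negative); not the analyzer's byte_range class.
struct access_sketch
{
  long long start;  // offset of the first byte accessed
  long long size;   // number of bytes accessed; assumed > 0

  // Offset one past the last byte accessed.
  long long next () const { return start + size; }
};

// If any part of ACCESS lies before offset 0, store just that part
// in *OUT and return true; otherwise return false.
bool
underflow_range (const access_sketch &access, access_sketch *out)
{
  assert (access.size > 0);
  if (access.start >= 0)
    return false;
  // Clip the end of the access at the start of the buffer.
  long long end = access.next () < 0 ? access.next () : 0;
  *out = {access.start, end - access.start};
  return true;
}

// If any part of ACCESS lies at or beyond CAPACITY bytes, store just
// that part in *OUT and return true; otherwise return false.
bool
overflow_range (const access_sketch &access, long long capacity,
                access_sketch *out)
{
  assert (access.size > 0);
  if (access.next () <= capacity)
    return false;
  // Clip the start of the access at the end of the buffer.
  long long start = access.start > capacity ? access.start : capacity;
  *out = {start, access.next () - start};
  return true;
}
```

Assuming a 4-byte int, the `id_sequence[3] = 456` store in test1 is an access {12, 4} against a 12-byte capacity. overflow_range therefore reports bytes [12, 16), i.e. 4 bytes past the end. That is the kind of size the patch appears to print via num_bytes_past_buf. A partially overlapping access such as {10, 4} would report only the 2 bytes past the end.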