From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 795A83858C2F for ; Tue, 6 Sep 2022 02:22:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 795A83858C2F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662430948; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=byMLZFXnkOZJOg7zQwPKVFpeQklFbL0BXn75f20cf9o=; b=WarsOHUhhATBD/OhsfpgUky5tqoZE3BFT0Z7aJLjVl6cAGFfNw9cL44Y7nkotVIcVQ7Ipj I6sqdKDgucD704RPtnbCSV0QK2/nPukiBvsV3e/mJsv6sT91KqZu+Pcyjl35gZixux+4vu CHv2Z+RG6SzdKRQUQlb01/8mOcaoFPc= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-518-7_qllRRiOaq7POVeYfzUJA-1; Mon, 05 Sep 2022 22:22:25 -0400 X-MC-Unique: 7_qllRRiOaq7POVeYfzUJA-1 Received: by mail-qk1-f200.google.com with SMTP id bs43-20020a05620a472b00b006bb56276c94so8012899qkb.10 for ; Mon, 05 Sep 2022 19:22:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from :to:cc:subject:date; bh=byMLZFXnkOZJOg7zQwPKVFpeQklFbL0BXn75f20cf9o=; b=QJbnj3b9fGnXHApLwxGV8OY2HGo0wT7AGDgD+k0tp9m62b5cSrauEdV5wRxOyqhIvn rvr9aKwSqc9k1TZgzj/E0HprkVRp20SS6F7DhCb7Ej0gI8jdGj2ss/hLSCbC11lAZu/6 LfZlGx+9WS3V4ZBPUICszr9pWEQ+XvSze3Zd34q7eRIsIBsrug7d5d4Q/5sDw9DVKB6J oMyhovthbGYWz/fLWjadR1xOCziVZHm9pvqHQ2NeqxMlZWOzgEvtGQVvAZ7BrbKNA6PK RA/suaYDE8l23Iah+C4ukNDnkJe22lXVkHQu+WVmFPtiYjZioBaySgU8ezYF9zuqMdDt ekJg== X-Gm-Message-State: ACgBeo0iy5eV3ivN2C/mwVbxxcuLpqJRvidZYqGH99tJ0yq/I7SUUy/E imya2CJdAr5tLHjvy4+IDTB2l/8h/izjW7CB0w4+u8PiFJgnPDMmw/CCQRgtW2rtxx/U+1KupTZ 0qIuNzSaKWOHXMbX5iQ== X-Received: by 2002:ac8:5a12:0:b0:343:6d1b:eb3d with SMTP id n18-20020ac85a12000000b003436d1beb3dmr41796080qta.364.1662430944140; Mon, 05 Sep 2022 19:22:24 -0700 (PDT) X-Google-Smtp-Source: AA6agR5pJZuyRzwhkiLQA+o0DkNy4v7RbkafrKhsao+zWISYRR8BScoMW88ZrDGfdtdewGq41ycQQA== X-Received: by 2002:ac8:5a12:0:b0:343:6d1b:eb3d with SMTP id n18-20020ac85a12000000b003436d1beb3dmr41796074qta.364.1662430943874; Mon, 05 Sep 2022 19:22:23 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id x6-20020ac84d46000000b00342f8d4d0basm8036979qtv.43.2022.09.05.19.22.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Sep 2022 19:22:23 -0700 (PDT) Message-ID: <294db6f8bc85a6d15ec267eb4bbd0d354210e186.camel@redhat.com> Subject: Re: [PATCH][WIP?] analyzer: support for symbolic values in the out-of-bounds checker [PR106625] From: David Malcolm To: Tim Lange , gcc-patches@gcc.gnu.org Date: Mon, 05 Sep 2022 22:22:21 -0400 In-Reply-To: <20220905211624.269455-1-mail@tim-lange.me> References: <20220905211624.269455-1-mail@tim-lange.me> User-Agent: Evolution 3.44.4 (3.44.4-1.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 2022-09-05 at 23:16 +0200, Tim Lange wrote: > Hi, >=20 > below is my patch, adding support for reasoning about buffer > overflows and > overreads with symbolic offsets and capacities. Thanks for the updated patch. Various comments inline below... >=20 > I've already had one off-list feedback from Dave after sending him my > preliminary work. Below, I'll be also answering some of the questions > that > came up during the first round. >=20 > To reason about out-of-bounds with symbolic values, I have decided to > do > some simplifications: > * Only reason in bytes, i.e. loosing some bits on bit-field accesses > and > =C2=A0 bit ranges in the conversion. > * Not trying to handle commutativeness [0]. > * Require similar structure of the offset and the capacity svalue to > =C2=A0 complain about symbolic out-of-bounds. > * A symbolic leaf operand [1] is positive if its type is unsigned. I > do > =C2=A0 ignore that the operand could be zero. It wouldn't make much sense > =C2=A0 to have an offset that is always zero, but is not inferable > statically > =C2=A0 such that the approxmiation here would yield a false-positive. In > order > =C2=A0 to fully prevent the false-positive, we would have to give up many > true > =C2=A0 positives. Dave also thinks that is a reasonable approximation. (nods) >=20 > > Whats the slowdown of this patch? > I found another optimization that tries to skip the structural > equality > check by trying referential equality (aka comparing pointers) first. > That > seems to do the trick and at least in my single run of compiling > coreutils, curl, httpd and openssh with the current master and my > patch > enabled, I've found little to no overhead, at most ~5s CPU time [2]. >=20 > > Why was the realloc implementation changed? > With the patch, the analyzer can reason about simple inequalities of > svalues. The previous way of getting the smaller of the current > buffer > size and the new buffer size was less accurate and lead to a false > positive because it chose the wrong svalue. The change fixes that by > using the same comparison function as the out-of-bounds checker. > Also, I > changed it to return the OLD_SIZE_SVAL by default, because I think I > had > a thinking error in my previous patch: Realloc implementations > probably > only move the buffer if the buffer grows. That means the old buffer > is > copied to the new one, only touching as many bytes as the old buffer > had. > The rest of the bytes is left uninitialized. >=20 > I added [WIP?], because the regrtests are not yet finished but a > similar > version did pass them, so I assume thats okay to post it now for > review > and hand in the regrtests later [3]. >=20 > - Tim >=20 > [0] I have tried that earlier but it turned out to be too slow. > [1] leaf =3D=3D conjured, inital or constant svalue. > [2] Note that I didn't run multiple tests and the compile farm is not > =C2=A0=C2=A0=C2=A0 isolated and I haven't done anything about caching etc= . But at > least > =C2=A0=C2=A0=C2=A0 the results show that there is no heavy slowdown. That's fine. > [3] The analyzer and analyzer-torture tests pass on my machine for > C/C++. >=20 > This patch adds support for reasoning about the inequality of two > symbolic > values in the special case specifically suited for reasoning about > out-of-bounds past the end of the buffer. With this patch, the > analyzer > catches off-by-one errors and more even when the offset and capacity > is > symbolic. >=20 > Tested on coreutils, curl, httpd and openssh. [...snip...] > +=C2=A0 label_text > +=C2=A0 describe_final_event (const evdesc::final_event &ev) final > override > +=C2=A0 { > +=C2=A0=C2=A0=C2=A0 const char *byte_str; > +=C2=A0=C2=A0=C2=A0 if (pending_diagnostic::same_tree_p (m_num_bytes, > integer_one_node)) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 byte_str =3D "byte"; > +=C2=A0=C2=A0=C2=A0 else > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 byte_str =3D "bytes"; > + As noted in the offlist review, I want to do a rewording of all of these diagnostics, but that can be a followup (for PR 106626), either by me or you, so I'm not going to be fussy about this part of the patch (e.g. the i18n issues). [...snip...] > + > +/* Check whether an access is past the end of the BASE_REG.=C2=A0 */ > + > +void region_model::check_symbolic_bounds (const region *base_reg, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 const svalue > *sym_byte_offset, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 const svalue > *num_bytes_sval, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 const svalue *capacity, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 enum access_direction dir, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 region_model_context *ctxt) > const > +{ > +=C2=A0 gcc_assert (ctxt); > + > +=C2=A0 const svalue *next_byte > +=C2=A0=C2=A0=C2=A0 =3D m_mgr->get_or_create_binop (num_bytes_sval->get_t= ype (), > PLUS_EXPR, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sym_byte_offset, num_bytes_sv= al); > + > +=C2=A0 if (eval_condition_without_cm (next_byte, GT_EXPR, > capacity).is_true ()) It occurs to me that we could use region_model::eval_condition here, rather than region_model::eval_condition_without_cm, which would allow the comparison to also take account of known constraints - this perhaps could catch cases where the user got the sense of a comparison wrong, e.g. if they write if (idx > size) arr[idx] =3D val; when they meant: if (idx < size) arr[idx] =3D val; But that can be saved for a (possible) followup, given that it might be slower, and that all of your testing has been with the code as written, and I think the above mistake is relatively unlikely compared to the kinds of off-by-one mistakes that the patch tests for. [...snip...] > +/* Return true if A is definitely larger than B. > + > +=C2=A0=C2=A0 Limitation: does not account for integer overflows and does= not > try to > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 return false, so it can not be used negated.=C2=A0 */ > + > +tristate > +region_model::symbolic_greater_than (const binop_svalue *bin_a, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 const svalu= e *b) const > +{ > +=C2=A0 /* Eliminate the right-hand side of both svalues.=C2=A0 */ > +=C2=A0 if (const binop_svalue *bin_b =3D dyn_cast > (b)) > +=C2=A0=C2=A0=C2=A0 if (bin_a->get_op () =3D=3D bin_b->get_op () > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0&& eval_condition_without_cm (= bin_a->get_arg1 (), > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GT_EX= PR, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bin_b= ->get_arg1 ()).is_true () > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0&& eval_condition_without_cm (= bin_a->get_arg0 (), > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GE_EX= PR, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bin_b= ->get_arg0 ()).is_true ()) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return tristate (tristate::TS_TRUE); > + > +=C2=A0 /* Otherwise, try to remove a positive offset or factor only from > BIN_A.=C2=A0 */ > +=C2=A0 if ((bin_a->get_op () =3D=3D PLUS_EXPR || bin_a->get_op () =3D=3D > MULT_EXPR) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 && is_positive_svalue (bin_a->get_arg1 ()= )) > +=C2=A0=C2=A0=C2=A0 if (eval_condition_without_cm (bin_a->get_arg0 (),GE_= EXPR, > b).is_true ()) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return tristate (tristate::TS_TRUE); > + > +=C2=A0 return tristate::unknown (); > +} Again, I wonder if this should be using region_model::eval_condition rather than region_model::eval_condition_without_cm - but investigating that can be saved for a followup. [...snip...] The posted patch is OK to push to trunk (assuming that your testing passes). Thanks - this looks like a nice improvement to the existing warning. Dave