From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dmalcolm@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id 795A83858C2F
	for <gcc-patches@gcc.gnu.org>; Tue,  6 Sep 2022 02:22:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 795A83858C2F
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1662430948;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=byMLZFXnkOZJOg7zQwPKVFpeQklFbL0BXn75f20cf9o=;
	b=WarsOHUhhATBD/OhsfpgUky5tqoZE3BFT0Z7aJLjVl6cAGFfNw9cL44Y7nkotVIcVQ7Ipj
	I6sqdKDgucD704RPtnbCSV0QK2/nPukiBvsV3e/mJsv6sT91KqZu+Pcyjl35gZixux+4vu
	CHv2Z+RG6SzdKRQUQlb01/8mOcaoFPc=
Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com
 [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-518-7_qllRRiOaq7POVeYfzUJA-1; Mon, 05 Sep 2022 22:22:25 -0400
X-MC-Unique: 7_qllRRiOaq7POVeYfzUJA-1
Received: by mail-qk1-f200.google.com with SMTP id bs43-20020a05620a472b00b006bb56276c94so8012899qkb.10
        for <gcc-patches@gcc.gnu.org>; Mon, 05 Sep 2022 19:22:24 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=mime-version:user-agent:content-transfer-encoding:references
         :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from
         :to:cc:subject:date;
        bh=byMLZFXnkOZJOg7zQwPKVFpeQklFbL0BXn75f20cf9o=;
        b=QJbnj3b9fGnXHApLwxGV8OY2HGo0wT7AGDgD+k0tp9m62b5cSrauEdV5wRxOyqhIvn
         rvr9aKwSqc9k1TZgzj/E0HprkVRp20SS6F7DhCb7Ej0gI8jdGj2ss/hLSCbC11lAZu/6
         LfZlGx+9WS3V4ZBPUICszr9pWEQ+XvSze3Zd34q7eRIsIBsrug7d5d4Q/5sDw9DVKB6J
         oMyhovthbGYWz/fLWjadR1xOCziVZHm9pvqHQ2NeqxMlZWOzgEvtGQVvAZ7BrbKNA6PK
         RA/suaYDE8l23Iah+C4ukNDnkJe22lXVkHQu+WVmFPtiYjZioBaySgU8ezYF9zuqMdDt
         ekJg==
X-Gm-Message-State: ACgBeo0iy5eV3ivN2C/mwVbxxcuLpqJRvidZYqGH99tJ0yq/I7SUUy/E
	imya2CJdAr5tLHjvy4+IDTB2l/8h/izjW7CB0w4+u8PiFJgnPDMmw/CCQRgtW2rtxx/U+1KupTZ
	0qIuNzSaKWOHXMbX5iQ==
X-Received: by 2002:ac8:5a12:0:b0:343:6d1b:eb3d with SMTP id n18-20020ac85a12000000b003436d1beb3dmr41796080qta.364.1662430944140;
        Mon, 05 Sep 2022 19:22:24 -0700 (PDT)
X-Google-Smtp-Source: AA6agR5pJZuyRzwhkiLQA+o0DkNy4v7RbkafrKhsao+zWISYRR8BScoMW88ZrDGfdtdewGq41ycQQA==
X-Received: by 2002:ac8:5a12:0:b0:343:6d1b:eb3d with SMTP id n18-20020ac85a12000000b003436d1beb3dmr41796074qta.364.1662430943874;
        Mon, 05 Sep 2022 19:22:23 -0700 (PDT)
Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193])
        by smtp.gmail.com with ESMTPSA id x6-20020ac84d46000000b00342f8d4d0basm8036979qtv.43.2022.09.05.19.22.22
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 05 Sep 2022 19:22:23 -0700 (PDT)
Message-ID: <294db6f8bc85a6d15ec267eb4bbd0d354210e186.camel@redhat.com>
Subject: Re: [PATCH][WIP?] analyzer: support for symbolic values in the
 out-of-bounds checker [PR106625]
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>, gcc-patches@gcc.gnu.org
Date: Mon, 05 Sep 2022 22:22:21 -0400
In-Reply-To: <20220905211624.269455-1-mail@tim-lange.me>
References: <20220905211624.269455-1-mail@tim-lange.me>
User-Agent: Evolution 3.44.4 (3.44.4-1.fc36)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, 2022-09-05 at 23:16 +0200, Tim Lange wrote:
> Hi,
>=20
> below is my patch, adding support for reasoning about buffer
> overflows and
> overreads with symbolic offsets and capacities.

Thanks for the updated patch.

Various comments inline below...

>=20
> I've already had one off-list feedback from Dave after sending him my
> preliminary work. Below, I'll be also answering some of the questions
> that
> came up during the first round.
>=20
> To reason about out-of-bounds with symbolic values, I have decided to
> do
> some simplifications:
> * Only reason in bytes, i.e. loosing some bits on bit-field accesses
> and
> =C2=A0 bit ranges in the conversion.
> * Not trying to handle commutativeness [0].
> * Require similar structure of the offset and the capacity svalue to
> =C2=A0 complain about symbolic out-of-bounds.
> * A symbolic leaf operand [1] is positive if its type is unsigned. I
> do
> =C2=A0 ignore that the operand could be zero. It wouldn't make much sense
> =C2=A0 to have an offset that is always zero, but is not inferable
> statically
> =C2=A0 such that the approxmiation here would yield a false-positive. In
> order
> =C2=A0 to fully prevent the false-positive, we would have to give up many
> true
> =C2=A0 positives. Dave also thinks that is a reasonable approximation.

(nods)

>=20
> > Whats the slowdown of this patch?
> I found another optimization that tries to skip the structural
> equality
> check by trying referential equality (aka comparing pointers) first.
> That
> seems to do the trick and at least in my single run of compiling
> coreutils, curl, httpd and openssh with the current master and my
> patch
> enabled, I've found little to no overhead, at most ~5s CPU time [2].
>=20
> > Why was the realloc implementation changed?
> With the patch, the analyzer can reason about simple inequalities of
> svalues. The previous way of getting the smaller of the current
> buffer
> size and the new buffer size was less accurate and lead to a false
> positive because it chose the wrong svalue. The change fixes that by
> using the same comparison function as the out-of-bounds checker.
> Also, I
> changed it to return the OLD_SIZE_SVAL by default, because I think I
> had
> a thinking error in my previous patch: Realloc implementations
> probably
> only move the buffer if the buffer grows. That means the old buffer
> is
> copied to the new one, only touching as many bytes as the old buffer
> had.
> The rest of the bytes is left uninitialized.
>=20
> I added [WIP?], because the regrtests are not yet finished but a
> similar
> version did pass them, so I assume thats okay to post it now for
> review
> and hand in the regrtests later [3].
>=20
> - Tim
>=20
> [0] I have tried that earlier but it turned out to be too slow.
> [1] leaf =3D=3D conjured, inital or constant svalue.
> [2] Note that I didn't run multiple tests and the compile farm is not
> =C2=A0=C2=A0=C2=A0 isolated and I haven't done anything about caching etc=
. But at
> least
> =C2=A0=C2=A0=C2=A0 the results show that there is no heavy slowdown.

That's fine.

> [3] The analyzer and analyzer-torture tests pass on my machine for
> C/C++.
>=20
> This patch adds support for reasoning about the inequality of two
> symbolic
> values in the special case specifically suited for reasoning about
> out-of-bounds past the end of the buffer. With this patch, the
> analyzer
> catches off-by-one errors and more even when the offset and capacity
> is
> symbolic.
>=20
> Tested on coreutils, curl, httpd and openssh.

[...snip...]

> +=C2=A0 label_text
> +=C2=A0 describe_final_event (const evdesc::final_event &ev) final
> override
> +=C2=A0 {
> +=C2=A0=C2=A0=C2=A0 const char *byte_str;
> +=C2=A0=C2=A0=C2=A0 if (pending_diagnostic::same_tree_p (m_num_bytes,
> integer_one_node))
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 byte_str =3D "byte";
> +=C2=A0=C2=A0=C2=A0 else
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 byte_str =3D "bytes";
> +

As noted in the offlist review, I want to do a rewording of all of
these diagnostics, but that can be a followup (for PR 106626), either
by me or you, so I'm not going to be fussy about this part of the patch
(e.g. the i18n issues).

[...snip...]

> +
> +/* Check whether an access is past the end of the BASE_REG.=C2=A0 */
> +
> +void region_model::check_symbolic_bounds (const region *base_reg,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 const svalue
> *sym_byte_offset,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 const svalue
> *num_bytes_sval,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 const svalue *capacity,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 enum access_direction dir,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 region_model_context *ctxt)
> const
> +{
> +=C2=A0 gcc_assert (ctxt);
> +
> +=C2=A0 const svalue *next_byte
> +=C2=A0=C2=A0=C2=A0 =3D m_mgr->get_or_create_binop (num_bytes_sval->get_t=
ype (),
> PLUS_EXPR,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sym_byte_offset, num_bytes_sv=
al);
> +
> +=C2=A0 if (eval_condition_without_cm (next_byte, GT_EXPR,
> capacity).is_true ())

It occurs to me that we could use region_model::eval_condition here,
rather than region_model::eval_condition_without_cm, which would allow
the comparison to also take account of known constraints - this perhaps
could catch cases where the user got the sense of a comparison wrong,
e.g. if they write

  if (idx > size)
    arr[idx] =3D val;

when they meant:

  if (idx < size)
    arr[idx] =3D val;

But that can be saved for a (possible) followup, given that it might be
slower, and that all of your testing has been with the code as written,
and I think the above mistake is relatively unlikely compared to the
kinds of off-by-one mistakes that the patch tests for.

[...snip...]


> +/* Return true if A is definitely larger than B.
> +
> +=C2=A0=C2=A0 Limitation: does not account for integer overflows and does=
 not
> try to
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0 return false, so it can not be used negated.=C2=A0 */
> +
> +tristate
> +region_model::symbolic_greater_than (const binop_svalue *bin_a,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 const svalu=
e *b) const
> +{
> +=C2=A0 /* Eliminate the right-hand side of both svalues.=C2=A0 */
> +=C2=A0 if (const binop_svalue *bin_b =3D dyn_cast <const binop_svalue *>
> (b))
> +=C2=A0=C2=A0=C2=A0 if (bin_a->get_op () =3D=3D bin_b->get_op ()
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0&& eval_condition_without_cm (=
bin_a->get_arg1 (),
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GT_EX=
PR,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bin_b=
->get_arg1 ()).is_true ()
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0&& eval_condition_without_cm (=
bin_a->get_arg0 (),
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GE_EX=
PR,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bin_b=
->get_arg0 ()).is_true ())
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return tristate (tristate::TS_TRUE);
> +
> +=C2=A0 /* Otherwise, try to remove a positive offset or factor only from
> BIN_A.=C2=A0 */
> +=C2=A0 if ((bin_a->get_op () =3D=3D PLUS_EXPR || bin_a->get_op () =3D=3D
> MULT_EXPR)
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 && is_positive_svalue (bin_a->get_arg1 ()=
))
> +=C2=A0=C2=A0=C2=A0 if (eval_condition_without_cm (bin_a->get_arg0 (),GE_=
EXPR,
> b).is_true ())
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return tristate (tristate::TS_TRUE);
> +
> +=C2=A0 return tristate::unknown ();
> +}

Again, I wonder if this should be using region_model::eval_condition
rather than region_model::eval_condition_without_cm - but investigating
that can be saved for a followup.

[...snip...]


The posted patch is OK to push to trunk (assuming that your testing
passes).

Thanks - this looks like a nice improvement to the existing warning.
Dave