Re: [PATCH][WIP?] analyzer: support for symbolic values in the out-of-bounds checker [PR106625]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][WIP?] analyzer: support for symbolic values in the out-of-bounds checker [PR106625]
Date: Mon, 05 Sep 2022 22:22:21 -0400	[thread overview]
Message-ID: <294db6f8bc85a6d15ec267eb4bbd0d354210e186.camel@redhat.com> (raw)
In-Reply-To: <20220905211624.269455-1-mail@tim-lange.me>

On Mon, 2022-09-05 at 23:16 +0200, Tim Lange wrote:
> Hi,
> 
> below is my patch, adding support for reasoning about buffer
> overflows and
> overreads with symbolic offsets and capacities.

Thanks for the updated patch.

Various comments inline below...

> 
> I've already had one off-list feedback from Dave after sending him my
> preliminary work. Below, I'll be also answering some of the questions
> that
> came up during the first round.
> 
> To reason about out-of-bounds with symbolic values, I have decided to
> do
> some simplifications:
> * Only reason in bytes, i.e. loosing some bits on bit-field accesses
> and
>   bit ranges in the conversion.
> * Not trying to handle commutativeness [0].
> * Require similar structure of the offset and the capacity svalue to
>   complain about symbolic out-of-bounds.
> * A symbolic leaf operand [1] is positive if its type is unsigned. I
> do
>   ignore that the operand could be zero. It wouldn't make much sense
>   to have an offset that is always zero, but is not inferable
> statically
>   such that the approxmiation here would yield a false-positive. In
> order
>   to fully prevent the false-positive, we would have to give up many
> true
>   positives. Dave also thinks that is a reasonable approximation.

(nods)

> 
> > Whats the slowdown of this patch?
> I found another optimization that tries to skip the structural
> equality
> check by trying referential equality (aka comparing pointers) first.
> That
> seems to do the trick and at least in my single run of compiling
> coreutils, curl, httpd and openssh with the current master and my
> patch
> enabled, I've found little to no overhead, at most ~5s CPU time [2].
> 
> > Why was the realloc implementation changed?
> With the patch, the analyzer can reason about simple inequalities of
> svalues. The previous way of getting the smaller of the current
> buffer
> size and the new buffer size was less accurate and lead to a false
> positive because it chose the wrong svalue. The change fixes that by
> using the same comparison function as the out-of-bounds checker.
> Also, I
> changed it to return the OLD_SIZE_SVAL by default, because I think I
> had
> a thinking error in my previous patch: Realloc implementations
> probably
> only move the buffer if the buffer grows. That means the old buffer
> is
> copied to the new one, only touching as many bytes as the old buffer
> had.
> The rest of the bytes is left uninitialized.
> 
> I added [WIP?], because the regrtests are not yet finished but a
> similar
> version did pass them, so I assume thats okay to post it now for
> review
> and hand in the regrtests later [3].
> 
> - Tim
> 
> [0] I have tried that earlier but it turned out to be too slow.
> [1] leaf == conjured, inital or constant svalue.
> [2] Note that I didn't run multiple tests and the compile farm is not
>     isolated and I haven't done anything about caching etc. But at
> least
>     the results show that there is no heavy slowdown.

That's fine.

> [3] The analyzer and analyzer-torture tests pass on my machine for
> C/C++.
> 
> This patch adds support for reasoning about the inequality of two
> symbolic
> values in the special case specifically suited for reasoning about
> out-of-bounds past the end of the buffer. With this patch, the
> analyzer
> catches off-by-one errors and more even when the offset and capacity
> is
> symbolic.
> 
> Tested on coreutils, curl, httpd and openssh.

[...snip...]

> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final
> override
> +  {
> +    const char *byte_str;
> +    if (pending_diagnostic::same_tree_p (m_num_bytes,
> integer_one_node))
> +      byte_str = "byte";
> +    else
> +      byte_str = "bytes";
> +

As noted in the offlist review, I want to do a rewording of all of
these diagnostics, but that can be a followup (for PR 106626), either
by me or you, so I'm not going to be fussy about this part of the patch
(e.g. the i18n issues).

[...snip...]

> +
> +/* Check whether an access is past the end of the BASE_REG.  */
> +
> +void region_model::check_symbolic_bounds (const region *base_reg,
> +                                         const svalue
> *sym_byte_offset,
> +                                         const svalue
> *num_bytes_sval,
> +                                         const svalue *capacity,
> +                                         enum access_direction dir,
> +                                         region_model_context *ctxt)
> const
> +{
> +  gcc_assert (ctxt);
> +
> +  const svalue *next_byte
> +    = m_mgr->get_or_create_binop (num_bytes_sval->get_type (),
> PLUS_EXPR,
> +                                 sym_byte_offset, num_bytes_sval);
> +
> +  if (eval_condition_without_cm (next_byte, GT_EXPR,
> capacity).is_true ())

It occurs to me that we could use region_model::eval_condition here,
rather than region_model::eval_condition_without_cm, which would allow
the comparison to also take account of known constraints - this perhaps
could catch cases where the user got the sense of a comparison wrong,
e.g. if they write

  if (idx > size)
    arr[idx] = val;

when they meant:

  if (idx < size)
    arr[idx] = val;

But that can be saved for a (possible) followup, given that it might be
slower, and that all of your testing has been with the code as written,
and I think the above mistake is relatively unlikely compared to the
kinds of off-by-one mistakes that the patch tests for.

[...snip...]


> +/* Return true if A is definitely larger than B.
> +
> +   Limitation: does not account for integer overflows and does not
> try to
> +              return false, so it can not be used negated.  */
> +
> +tristate
> +region_model::symbolic_greater_than (const binop_svalue *bin_a,
> +                                    const svalue *b) const
> +{
> +  /* Eliminate the right-hand side of both svalues.  */
> +  if (const binop_svalue *bin_b = dyn_cast <const binop_svalue *>
> (b))
> +    if (bin_a->get_op () == bin_b->get_op ()
> +       && eval_condition_without_cm (bin_a->get_arg1 (),
> +                                     GT_EXPR,
> +                                     bin_b->get_arg1 ()).is_true ()
> +       && eval_condition_without_cm (bin_a->get_arg0 (),
> +                                     GE_EXPR,
> +                                     bin_b->get_arg0 ()).is_true ())
> +      return tristate (tristate::TS_TRUE);
> +
> +  /* Otherwise, try to remove a positive offset or factor only from
> BIN_A.  */
> +  if ((bin_a->get_op () == PLUS_EXPR || bin_a->get_op () ==
> MULT_EXPR)
> +      && is_positive_svalue (bin_a->get_arg1 ()))
> +    if (eval_condition_without_cm (bin_a->get_arg0 (),GE_EXPR,
> b).is_true ())
> +      return tristate (tristate::TS_TRUE);
> +
> +  return tristate::unknown ();
> +}

Again, I wonder if this should be using region_model::eval_condition
rather than region_model::eval_condition_without_cm - but investigating
that can be saved for a followup.

[...snip...]


The posted patch is OK to push to trunk (assuming that your testing
passes).

Thanks - this looks like a nice improvement to the existing warning.
Dave

next prev parent reply	other threads:[~2022-09-06  2:22 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-05 21:16 Tim Lange
2022-09-06  2:22 ` David Malcolm [this message]
2022-09-07 15:51   ` [PATCH v2] " Tim Lange
2022-09-07 21:13     ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=294db6f8bc85a6d15ec267eb4bbd0d354210e186.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=mail@tim-lange.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).