public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: priour.be@gmail.com, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] analyzer: Move gcc.dg/analyzer tests to c-c++-common (1).
Date: Thu, 24 Aug 2023 20:12:05 -0400	[thread overview]
Message-ID: <b43b8d3d48ed0ef7fdd75cdc7cbabdf01415cf86.camel@redhat.com> (raw)
In-Reply-To: <20230824183919.2485913-1-vultkayn@gcc.gnu.org>

> From: benjamin priour <priour.be@gmail.com>
> 
> Hi,
> 
> Below the first batch of a serie of patches to transition
> the analyzer testsuite from gcc.dg/analyzer to c-c++-common/analyzer.
> I do not know how long this serie will be, thus the patch was not
> numbered.
> 
> For the grand majority of the tests, the transition only required some
> adjustement over the syntax and casts to be C++-friendly, or to adjust
> the warnings regexes to fit the C++ FE.
> 
> The most noteworthy change is in the handling of known_functions,
> as described in the below patch.

Hi Benjamin.

Many thanks for putting this together, it looks like it was a lot of
work.

> Successfully regstrapped on x86_64-linux-gnu off trunk
> 18befd6f050e70f11ecca1dd58624f0ee3c68cc7.

Did you compare the before/after results from DejaGnu somehow?

Note that I've pushed 9 patches to the analyzer since
18befd6f050e70f11ecca1dd58624f0ee3c68cc7 and some of those touch the
files below, so it's worth rebasing and double-checking the results.

> Is it OK for trunk ?

It's *almost* ready; various comments inline below, throughout...

> 
> Thanks,
> Benjamin.
> 
> Patch below.
> ---
> 
> First batch of moving tests from under gcc.dg/analyzer into
> c-c++-common/analyzer.
> 
> C builtins are not recognized as such by C++, therefore
> this patch no longer uses tree.h:fndecl_built_in_p to recognize
> a builtin function, but rather the function names.
> 
> Thus functions named as C builtins - such as calloc, sprintf ... -
> are recognized as such both in C and C++ sources by the analyzer.
> 
> For user-declared functions named after builtins, the latters' function_decl
> tree are now preferred over the function_decl the user declared, even
> when the FE consider their declaration to mismatch
> (Wbuiltin-declaration-mismatch emitted). This mainly comes into account
> in the handling of these function attributes : the analyzer uses
> the builtin's attributes defined in gcc/builtins.def.
> 
> Signed-off-by: benjamin priour <vultkayn@gcc.gnu.org>
> 
> gcc/analyzer/ChangeLog:

Please add
 	PR analyzer/96395
to the ChangeLog entries, and [PR96395] to the end of the Subject of
the commit, so that these get tracked within that bug as they get
pushed.

> 
> 	* analyzer.h (class known_function): Add virtual casts to
> 	builtin_known_function.
> 	(class builtin_known_function): New subclass of known_function
> 	for builtins.
> 	* kf.cc (class kf_alloca): Now derived from
> 	builtin_known_function
> 	(class kf_calloc): Likewise.
> 	(class kf_free): Likewise.
> 	(class kf_malloc): Likewise.
> 	(class kf_memcpy_memmove): Likewise.
> 	(class kf_memset): Likewise.
> 	(class kf_realloc): Likewise.
> 	(class kf_strchr): Likewise.
> 	(class kf_sprintf): Likewise.
> 	(class kf_strcpy): Likewise.
> 	(class kf_strdup): Likewise.
> 	(class kf_strlen): Likewise.
> 	(class kf_strndup): Likewise.
> 	(register_known_functions): Builtins are now registered as
> 	known_functions by name rather than by their BUILTIN_CODE.
> 	* known-function-manager.cc (get_normal_builtin): New overload.
> 	* known-function-manager.h: New overload declaration.
> 	* region-model.cc (region_model::get_builtin_kf): New function.
> 	* region-model.h (class region_model): Add declaration of
> 	get_builtin_kf.
> 	* sm-fd.cc: For called recognized as builtins, use the attributes
> 	of that builtin as defined in gcc/builtins.def rather than the user's.
> 	* sm-malloc.cc (malloc_state_machine::on_stmt): Likewise.
> 
> gcc/testsuite/ChangeLog:

Add
 	PR analyzer/96395
here, as well, please.

> 
> 	* gcc.dg/analyzer/aliasing-3.c: Moved to...
> 	* c-c++-common/analyzer/aliasing-3.c: ...here.

[...snip...]

> diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
> index 93a28b4b5cf..63a220c9b6d 100644
> --- a/gcc/analyzer/analyzer.h
> +++ b/gcc/analyzer/analyzer.h
> @@ -128,6 +128,10 @@ struct interesting_t;
>  
>  class feasible_node;
>  
> +class known_function;
> +  class builtin_known_function;
> +  class internal_known_function;
> +
>  /* Forward decls of functions.  */
>  
>  extern void dump_tree (pretty_printer *pp, tree t);
> @@ -279,6 +283,28 @@ public:
>    {
>      return;
>    }
> +
> +  virtual const builtin_known_function *
> +  dyn_cast_builtin_kf () const { return NULL; }
> +  virtual builtin_known_function *
> +  dyn_cast_builtin_kf () { return NULL; }

I don't think we ever work with non-const known_function pointers, so
we don't need this non-const version of the vfunc.

> +};
> +
> +/* Subclass of known_function for builtin functions.  */
> +
> +class builtin_known_function : public known_function
> +{
> +public:
> +  virtual enum built_in_function builtin_code () const = 0;
> +  tree builtin_decl () const {
> +    gcc_assert (builtin_code () < END_BUILTINS);
> +    return builtin_info[builtin_code ()].decl;
> +  }
> +
> +  virtual const builtin_known_function *
> +  dyn_cast_builtin_kf () const { return this; }

This doesn't need a "virtual", but should have a "final override".


> +  virtual builtin_known_function *
> +  dyn_cast_builtin_kf () { return this; }

Drop the non-const overload, as noted above.

>  };
>  
>  /* Subclass of known_function for IFN_* functions.  */
> diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
> index 59f46bab581..6c58a1b20e5 100644
> --- a/gcc/analyzer/kf.cc
> +++ b/gcc/analyzer/kf.cc

[...snip...]

> @@ -1406,41 +1491,69 @@ register_known_functions (known_function_manager &kfm)
>      kfm.add (IFN_UBSAN_BOUNDS, make_unique<kf_ubsan_bounds> ());
>    }
>  
> -  /* Built-ins the analyzer has known_functions for.  */
> +  /* GCC built-ins that does not correspond to a function
> +     in the standard library.  */

"does not" -> "do not"

[...snip...]

> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 99817aee3a9..3a8f07525ba 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1514,6 +1514,34 @@ region_model::get_known_function (enum internal_fn ifn) const
>    return known_fn_mgr->get_internal_fn (ifn);
>  }
>  
> +/* Get any builtin_known_function for CALL and emit any warning to CTXT
> +   if not NULL.
> +
> +   The call must match all assumptions made by the known_function (such as
> +   e.g. "argument 1's type must be a pointer type").
> +
> +   Return NULL if no builtin_known_function is found, or it does
> +   not match the assumption(s).
> +
> +   Internally calls get_known_function to find a known_function and cast it
> +   to a builtin_known_function.  */

I confess I'm still a little hazy as to the whole builtin_kf logic, but
I trust you that this is needed.

Please can you add a paragraph to this comment to explain the
motivation here (perhaps giving examples?)

> +
> +const builtin_known_function *
> +region_model::get_builtin_kf (const gcall *call,
> +			       region_model_context *ctxt /* = NULL */) const
> +{
> +  region_model *mut_this = const_cast <region_model *> (this);
> +  tree callee_fndecl = mut_this->get_fndecl_for_call (call, ctxt);
> +  if (! callee_fndecl)
> +    return NULL;
> +
> +  call_details cd (call, mut_this, ctxt);
> +  if (const known_function *kf = get_known_function (callee_fndecl, cd))
> +    return kf->dyn_cast_builtin_kf ();
> +
> +  return NULL;
> +}
> +

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c b/gcc/testsuite/c-c++-common/analyzer/aliasing-3.c
> similarity index 85%
> rename from gcc/testsuite/gcc.dg/analyzer/aliasing-3.c
> rename to gcc/testsuite/c-c++-common/analyzer/aliasing-3.c
> index 003077ad5c1..f78fea64fbe 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c
> +++ b/gcc/testsuite/c-c++-common/analyzer/aliasing-3.c
> @@ -1,6 +1,14 @@
>  #include "analyzer-decls.h"
>  
> -#define NULL ((void *)0)
> +#ifdef __cplusplus
> +  #if __cplusplus >= 201103L
> +  #define NULL nullptr
> +  #else
> +  #define NULL 0
> +  #endif
> +#else
> +  #define NULL ((void *)0)
> +#endif

The above fragment of code gets used a lot; can it be moved into
analyzer-decls.h, perhaps wrapped in a "#ifndef NULL"?

That could be done as a followup patch if it's easier.

[...snip...]

> diff --git a/gcc/testsuite/c-c++-common/analyzer/analyzer-decls.h b/gcc/testsuite/c-c++-common/analyzer/analyzer-decls.h
> new file mode 100644
> index 00000000000..d9a32ed9370
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/analyzer/analyzer-decls.h
> @@ -0,0 +1,56 @@
> +#ifndef ANALYZER_DECLS_H
> +#define ANALYZER_DECLS_H
> +
[...snip...]
> +
> +/* Obtain an "unknown" void *.  */
> +extern void *__analyzer_get_unknown_ptr (void);
> +
> +#endif /* #ifndef ANALYZER_DECLS_H.  */

We don't want to have a copy of the file here (the "DRY principle"). 
For example, I added a __analyzer_get_strlen in
325f9e88802daaca0a4793ca079bb504f7d76c54 which doesn't seem to be in
this copy.

Can we simply have something like

  #include "../../gcc.dg/analyzer/analyzer-decls.h"

here so that we're getting the "real" analyzer-decls.h by reference.

The relative path in the above #include might need tweaking as I'm not
sure exactly what "-I" directives are passed in when running the c-c++-
common tests.


[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/memcpy-2.c b/gcc/testsuite/c-c++-common/analyzer/memcpy-2.c
> similarity index 79%
> rename from gcc/testsuite/gcc.dg/analyzer/memcpy-2.c
> rename to gcc/testsuite/c-c++-common/analyzer/memcpy-2.c
> index 51e4a694951..25b0a5e4ff4 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/memcpy-2.c
> +++ b/gcc/testsuite/c-c++-common/analyzer/memcpy-2.c
> @@ -1,8 +1,9 @@
>  /* { dg-additional-options "-Wno-stringop-overflow -Wno-analyzer-out-of-bounds" } */
>  
> -void
> -main (int c, void *v)
> +int
> +test (int c, void *v)
>  {
>    static char a[] = "";
>    __builtin_memcpy (v, a, -1);
> +  return 0;
>  }

FWIW, the analyzer special-cases the handling of functions named
"main", but I don't think it matters for this testcase, so the renaming
is OK.

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96841.c b/gcc/testsuite/c-c++-common/analyzer/pr96841.c
> similarity index 77%
> rename from gcc/testsuite/gcc.dg/analyzer/pr96841.c
> rename to gcc/testsuite/c-c++-common/analyzer/pr96841.c
> index 14f3f7a86a3..dd61176b73b 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/pr96841.c
> +++ b/gcc/testsuite/c-c++-common/analyzer/pr96841.c
> @@ -12,7 +12,7 @@ th (int *);
>  void
>  bv (__SIZE_TYPE__ ny, int ***mf)
>  {
> -  while (l8 ())
> +  while (l8 ()) /* { dg-warning "terminating analysis for this program point" } */
>      {
>        *mf = 0;
>        (*mf)[ny] = (int *) malloc (sizeof (int));

Does this warning happen for both C and C++?

It's probably better to simply add -Wno-analyzer-too-complex to this
case, as we don't want to be fussy about precisely which line the
message is emitted at (or, indeed, that the warning is emitted at all).

[...snip...]

> diff --git a/gcc/testsuite/c-c++-common/analyzer/test-setjmp.h b/gcc/testsuite/c-c++-common/analyzer/test-setjmp.h
> new file mode 100644
> index 00000000000..52c57d02b70
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/analyzer/test-setjmp.h
> @@ -0,0 +1,27 @@
> +/* Various integration tests for setjmp-handling expect a precise
> +   multiline output.
> +   
> +   The outputs from -fdiagnostics-path-format=inline-events for such
> +   setjmp tests are dependent on whether setjmp is a macro or a function
> +   (and whether that macro is defined in a system header).
> +
> +   setjmp is a function on some systems and a macro on others.
> +   This header provides a SETJMP macro in a (fake) system header,
> +   along with precanned decls of setjmp, for consistency of output across
> +   different systems.  */
> +
> +#pragma GCC system_header
> +
> +struct __jmp_buf_tag {
> +  char buf[1];
> +};
> +typedef struct __jmp_buf_tag jmp_buf[1];
> +typedef struct __jmp_buf_tag sigjmp_buf[1];
> +
> +extern int setjmp(jmp_buf env);
> +extern int sigsetjmp(sigjmp_buf env, int savesigs);
> +
> +extern void longjmp(jmp_buf env, int val);
> +extern void siglongjmp(sigjmp_buf env, int val);
> +
> +#define SETJMP(E) setjmp(E)

Is this a copy of gcc.dg/analyzer/test-setjmp.h ?
If so, can the content of that file be #included from this one, rather
than keeping a copy.

> diff --git a/gcc/testsuite/c-c++-common/analyzer/test-uaccess.h
b/gcc/testsuite/c-c++-common/analyzer/test-uaccess.h
> new file mode 100644
> index 00000000000..70c9d6309ef
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/analyzer/test-uaccess.h
> @@ -0,0 +1,15 @@
> +/* Shared header for testcases for copy_from_user/copy_to_user.  */
> +
> +/* Adapted from include/linux/compiler.h  */
> +
> +#define __user
> +
> +/* Adapted from include/asm-generic/uaccess.h  */
> +
> +extern int copy_from_user(void *to, const void __user *from, long n)
> +  __attribute__((access (write_only, 1, 3),
> +		 access (read_only, 2, 3)));
> +
> +extern long copy_to_user(void __user *to, const void *from, unsigned long n)
> +  __attribute__((access (write_only, 1, 3),
> +		 access (read_only, 2, 3)));

Similarly here, can this be a #include of the other file, rather than a
copy?

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-1.c b/gcc/testsuite/c-c++-common/analyzer/write-to-string-literal-1bis.c
> similarity index 86%
> rename from gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-1.c
> rename to gcc/testsuite/c-c++-common/analyzer/write-to-string-literal-1bis.c
> index 092500e066f..7d06febba77 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-1.c
> +++ b/gcc/testsuite/c-c++-common/analyzer/write-to-string-literal-1bis.c
> @@ -1,20 +1,21 @@
>  #include <string.h>
>  
> +#ifdef __cplusplus
> +#define CONST_CAST(type) const_cast<type>
> +#else
> +#define CONST_CAST(type)
> +#endif
> +
>  /* PR analyzer/95007.  */
>  
>  void test_1 (void)
>  {
> -  char *s = "foo";
> +  char *s = CONST_CAST(char *)("foo");
>    s[0] = 'g'; /* { dg-warning "write to string literal" } */
>  }
>  
>  /* PR c/83347.  */
>  
> -void test_2 (void)
> -{
> -  memcpy ("abc", "def", 3); /* { dg-warning "write to string literal" } */
> -}
> -
>  static char * __attribute__((noinline))
>  called_by_test_3 (void)
>  {

Why the rename from -1.c to to -1bis.c ?

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c b/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
> index f8dc806d619..e94c0561665 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
> @@ -1,53 +1,14 @@
>  /* See e.g. https://en.cppreference.com/w/c/io/fprintf
>     and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */
>  
> +/* { dg-skip-if "C++ fpermissive already throws an error" { c++ } } */

Given that this is in the gcc.dg directory, this directive presumably
never skips.

Is the intent here to document that
(a) this set of tests is just for C, and
(b) you've checked that this test has been examined, and isn't on the
"TODO" list to be migrated?

If so, could it just be a regular comment for humans?

> +
>  extern int
>  sprintf(char* dst, const char* fmt, ...)
>    __attribute__((__nothrow__));
>  
> -#define NULL ((void *)0)
> -
> -int
> -test_passthrough (char* dst, const char* fmt)
> -{
> -  /* This assumes that fmt doesn't have any arguments.  */
> -  return sprintf (dst, fmt);
> -}
> -
> -void
> -test_known (void)
> -{
> -  char buf[10];
> -  int res = sprintf (buf, "foo");
> -  /* TODO: ideally we would know the value of "res" is 3,
> -     and known the content and strlen of "buf" after the call */
> -}
> -
> -int
> -test_null_dst (void)
> -{
> -  return sprintf (NULL, "hello world"); /* { dg-warning "use of NULL where non-null expected" } */
> -}
>  
> -int
> -test_null_fmt (char *dst)
> -{
> -  return sprintf (dst, NULL);  /* { dg-warning "use of NULL where non-null expected" } */
> -}
> -
> -int
> -test_uninit_dst (void)
> -{
> -  char *dst;
> -  return sprintf (dst, "hello world"); /* { dg-warning "use of uninitialized value 'dst'" } */
> -}
> -
> -int
> -test_uninit_fmt_ptr (char *dst)
> -{
> -  const char *fmt;
> -  return sprintf (dst, fmt); /* { dg-warning "use of uninitialized value 'fmt'" } */
> -}
> +#define NULL ((void *)0)
>  
>  int
>  test_uninit_fmt_buf (char *dst)

Looks like you split this out into sprintf-2.c, but note that I
recently touched sprintf-1.c in
3b691e0190c6e7291f8a52e1e14d8293a28ff4ce and in
5ef89c5c2f52a2c47fd26845d1f73e20b9081fc9.  I think those changes affect
the rest of the file below this hunk, but does the result still work
after rebasing?  Should any of those changes have been moved to c-c++-
common?

[...snip...]

Thanks again for the patch.

This will be OK for trunk with the above issues fixed.

Dave


  reply	other threads:[~2023-08-25  0:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-24 18:39 priour.be
2023-08-25  0:12 ` David Malcolm [this message]
2023-08-25 12:48   ` Benjamin Priour
2023-08-25 20:10     ` David Malcolm
2023-08-26 12:22       ` [PATCH v2] analyzer: Move gcc.dg/analyzer tests to c-c++-common (1) [PR96395] priour.be
2023-08-26 22:44         ` David Malcolm
2023-08-29  6:47         ` Prathamesh Kulkarni
2023-08-29 13:22           ` Benjamin Priour

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b43b8d3d48ed0ef7fdd75cdc7cbabdf01415cf86.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=priour.be@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).