public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC stage 1] Proposed new warning: -Wmisleading-indentation
@ 2015-04-16 15:08 David Malcolm
  2015-04-16 17:27 ` Mike Stump
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: David Malcolm @ 2015-04-16 15:08 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4508 bytes --]

Attached is a work-in-progress patch for a new
  -Wmisleading-indentation
warning I've been experimenting with, for GCC 6.

It kind-of-works, but there are some issues, so I wanted to get feedback
on it here.

The idea is to issue a warning when the C/C++ compiler's view of the
block structure doesn't match that of a naive human reader of the code
via indentation.

For example, CVE-2014-1266 (aka "goto fail") had:

  hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
  hashOut.length = SSL_SHA1_DIGEST_LEN;
  if ((err = SSLFreeBuffer(&hashCtx)) != 0)
      goto fail;
  if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      goto fail;
      goto fail;  /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
  if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
      goto fail;

  err = sslRawVerify(...);

where the second "goto fail;" here:

  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      goto fail;
      goto fail;  /* MISTAKE! THIS LINE SHOULD NOT BE HERE */

is indented as if it's guarded by the conditional.  It isn't, making the
real meaning of the code hard to discern, and masking a security
vulnerability (where the stray goto led to the  certificate-checking
code later in the function being dead code, skipping it with err == 0,
for success).

The patch successfully issues a warning on a simplified version of this:

	if ((err = foo (b)) != 0)
		goto fail;
		goto fail;

./Wmisleading-indentation-6.c: In function ‘foo’:
./Wmisleading-indentation-6.c:15:3: warning: statement is indented as if
it were guarded by... [-Wmisleading-indentation]
   goto fail;
   ^
./Wmisleading-indentation-6.c:13:2: note: ...this clause, but it is not
  if ((err = foo (b)) != 0)
  ^

and on various other testcases.

That said, there are several major issues with the patch as it stands:

(A) works with the C FE, but not with C++ yet.  The implementation
assumes a single lexer consuming the token stream, so it doesn't yet
work with the C++ FE's ideas of tentatively-consumed tokens
(save/commit/rollback), and with its lexer stack.  I *think* that can be
fixed by relying on the fact that the integer locations of the token
stream are normally monotonic, and using that to capture and handle
those two C++ FE impl details.

(B) GTY/PCH/etc, and performance;  I know that it leaks visual_block
instances, and there's probably a way to do this with much less dynamic
memory allocation, but I wanted to get it right before making it fast.

(C) no ChangeLog yet, though it's heavily commented.

(D) tabs vs spaces.  This is probably the biggest can of worms.

Consider:

{
  if (flagA)
    if (flagB)
      if (flagC)
        foo ();
        bar (); /* would like to warn about this */
}

If this is indented using spaces&tabs with 8-space tabs, we have:

<no indent>{
<2 spaces>if (flagA)
<4 spaces>if (flagB)
<6 spaces>if (flagC)
<1 tab>foo ();
<1 tab>bar ();
<no indent>}

What does this look like in an editor?  Consider emacs, which uses uses
0-based offsets to what I call "visual columns".  The column numbers for
the start of the first token for each line are reported by emacs as:

<0>{
<2>if (flagA)
<4>if (flagB)
<6>if (flagC)
<8>foo ();
<8>bar ();
<0>}

However within libcpp and gcc, in linemap's expanded_location and in
diagnostic messages, the "column" numbers are actually 1-based counts of
*characters*, so the "column" numbers emitted in diagnostics for the
start of the first token in each line are actually:

<1>{
<3>if (flagA)
<5>if (flagB)
<7>if (flagC)
<2>foo ();
<2>bar ();
<1>}

Note the transition at tab offsets, where we go from increasing visual
column numbers:

<6>if (flagC)
<8>foo ();

to a decrease in the character count:

<7>if (flagC)
<2>foo ();

Hence to handle mixed spaces+tabs, we need some way to model "visual
columns".  I think this is doable (at the cost of adding a field to
cpp_token, and to c_token/cp_token), provided we can assume, per-buffer,
either
  (i) a consistent value for tabs in terms of spaces, or
  (ii) that we don't have mixed tabs/spaces in this buffer

An issue here is how to determine (i), or if it's OK to default to 8 and
have a command-line option (param?) to override it? (though what about,
say, each header file?)

Thoughts on this, and on the patch?

Thanks

Dave


[-- Attachment #2: misleading-indentation-v1.patch --]
[-- Type: text/x-patch, Size: 35383 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 30a717e..f71a33d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1462,6 +1462,7 @@ OBJS = \
 	var-tracking.o \
 	varasm.o \
 	varpool.o \
+	visual-parser.o \
 	vmsdbgout.o \
 	vtable-verify.o \
 	web.o \
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7d0a2cd..3795087 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -494,6 +494,10 @@ Wmain
 LangEnabledBy(C ObjC C++ ObjC++,Wpedantic, 2, 0)
 ;
 
+Wmisleading-indentation
+C C++ Common Var(warn_misleading_indentation) Warning
+Warn when the indentation of the code does not reflect the block structure
+
 Wmissing-braces
 C ObjC C++ ObjC++ Var(warn_missing_braces) Warning LangEnabledBy(C ObjC,Wall)
 Warn about possibly missing braces around initializers
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 2c41bf2..c4858ba 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"
 #include "c-family/c-ada-spec.h"
 #include "cilk.h"
+#include "visual-parser.h"
 
 /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
 enum decl_context
@@ -560,6 +561,10 @@ add_stmt (tree t)
      recorded during statement expressions.  */
   if (!building_stmt_list_p ())
     push_stmt_list ();
+
+  if (vis_parser)
+    vis_parser->check_stmt (t);
+
   append_to_statement_list_force (t, &cur_stmt_list);
 
   return t;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d0d35c5..9304e31 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "plugin.h"
 #include "omp-low.h"
+#include "visual-parser.h"
 
 \f
 /* Initialization routine for this file.  */
@@ -216,7 +217,6 @@ typedef struct GTY(()) c_parser {
   vec <c_token, va_gc> *cilk_simd_fn_tokens;
 } c_parser;
 
-
 /* The actual parser and external interface.  ??? Does this need to be
    garbage-collected?  */
 
@@ -236,6 +236,9 @@ c_lex_one_token (c_parser *parser, c_token *token)
   token->keyword = RID_MAX;
   token->pragma_kind = PRAGMA_NONE;
 
+  if (vis_parser)
+    vis_parser->on_token (token->location);
+
   switch (token->type)
     {
     case CPP_NAME:
@@ -5042,11 +5045,13 @@ c_parser_paren_condition (c_parser *parser)
 /* Parse a statement which is a block in C99.  */
 
 static tree
-c_parser_c99_block_statement (c_parser *parser)
+c_parser_c99_block_statement (c_parser *parser, location_t guard_loc)
 {
   tree block = c_begin_compound_stmt (flag_isoc99);
   location_t loc = c_parser_peek_token (parser)->location;
   c_parser_statement (parser);
+  if (vis_parser)
+    vis_parser->on_solo_stmt (block, guard_loc);
   return c_end_compound_stmt (loc, block, flag_isoc99);
 }
 
@@ -5059,7 +5064,7 @@ c_parser_c99_block_statement (c_parser *parser)
    parser->in_if_block.  */
 
 static tree
-c_parser_if_body (c_parser *parser, bool *if_p)
+c_parser_if_body (c_parser *parser, bool *if_p, location_t if_loc)
 {
   tree block = c_begin_compound_stmt (flag_isoc99);
   location_t body_loc = c_parser_peek_token (parser)->location;
@@ -5081,7 +5086,11 @@ c_parser_if_body (c_parser *parser, bool *if_p)
   else if (c_parser_next_token_is (parser, CPP_OPEN_BRACE))
     add_stmt (c_parser_compound_statement (parser));
   else
-    c_parser_statement_after_labels (parser);
+    {
+      c_parser_statement_after_labels (parser);
+      if (vis_parser)
+	vis_parser->on_solo_stmt (block, if_loc);
+    }
   return c_end_compound_stmt (body_loc, block, flag_isoc99);
 }
 
@@ -5090,7 +5099,7 @@ c_parser_if_body (c_parser *parser, bool *if_p)
    specially for the sake of -Wempty-body warnings.  */
 
 static tree
-c_parser_else_body (c_parser *parser)
+c_parser_else_body (c_parser *parser, location_t else_tok_loc)
 {
   location_t else_loc = c_parser_peek_token (parser)->location;
   tree block = c_begin_compound_stmt (flag_isoc99);
@@ -5109,7 +5118,12 @@ c_parser_else_body (c_parser *parser)
       c_parser_consume_token (parser);
     }
   else
-    c_parser_statement_after_labels (parser);
+    {
+      c_parser_statement_after_labels (parser);
+      if (vis_parser)
+	vis_parser->on_solo_stmt (block, else_tok_loc);
+    }
+
   return c_end_compound_stmt (else_loc, block, flag_isoc99);
 }
 
@@ -5124,7 +5138,7 @@ static void
 c_parser_if_statement (c_parser *parser)
 {
   tree block;
-  location_t loc;
+  location_t if_loc, cond_loc;
   tree cond;
   bool first_if = false;
   tree first_body, second_body;
@@ -5132,23 +5146,25 @@ c_parser_if_statement (c_parser *parser)
   tree if_stmt;
 
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_IF));
+  if_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
   block = c_begin_compound_stmt (flag_isoc99);
-  loc = c_parser_peek_token (parser)->location;
+  cond_loc = c_parser_peek_token (parser)->location;
   cond = c_parser_paren_condition (parser);
   in_if_block = parser->in_if_block;
   parser->in_if_block = true;
-  first_body = c_parser_if_body (parser, &first_if);
+  first_body = c_parser_if_body (parser, &first_if, if_loc);
   parser->in_if_block = in_if_block;
   if (c_parser_next_token_is_keyword (parser, RID_ELSE))
     {
+      location_t else_tok_loc = c_parser_peek_token (parser)->location;
       c_parser_consume_token (parser);
-      second_body = c_parser_else_body (parser);
+      second_body = c_parser_else_body (parser, else_tok_loc);
     }
   else
     second_body = NULL_TREE;
-  c_finish_if_stmt (loc, cond, first_body, second_body, first_if);
-  if_stmt = c_end_compound_stmt (loc, block, flag_isoc99);
+  c_finish_if_stmt (cond_loc, cond, first_body, second_body, first_if);
+  if_stmt = c_end_compound_stmt (cond_loc, block, flag_isoc99);
 
   /* If the if statement contains array notations, then we expand them.  */
   if (flag_cilkplus && contains_array_notation_expr (if_stmt))
@@ -5195,7 +5211,7 @@ c_parser_switch_statement (c_parser *parser)
   c_start_case (switch_loc, switch_cond_loc, expr);
   save_break = c_break_label;
   c_break_label = NULL_TREE;
-  body = c_parser_c99_block_statement (parser);
+  body = c_parser_c99_block_statement (parser, switch_loc);
   c_finish_case (body);
   if (c_break_label)
     {
@@ -5220,6 +5236,7 @@ c_parser_while_statement (c_parser *parser, bool ivdep)
   tree block, cond, body, save_break, save_cont;
   location_t loc;
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE));
+  location_t while_tok_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
   block = c_begin_compound_stmt (flag_isoc99);
   loc = c_parser_peek_token (parser)->location;
@@ -5239,7 +5256,7 @@ c_parser_while_statement (c_parser *parser, bool ivdep)
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
   c_cont_label = NULL_TREE;
-  body = c_parser_c99_block_statement (parser);
+  body = c_parser_c99_block_statement (parser, while_tok_loc);
   c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
   c_break_label = save_break;
@@ -5258,6 +5275,7 @@ c_parser_do_statement (c_parser *parser, bool ivdep)
   tree block, cond, body, save_break, save_cont, new_break, new_cont;
   location_t loc;
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_DO));
+  location_t do_tok_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
   if (c_parser_next_token_is (parser, CPP_SEMICOLON))
     warning_at (c_parser_peek_token (parser)->location,
@@ -5269,7 +5287,7 @@ c_parser_do_statement (c_parser *parser, bool ivdep)
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
   c_cont_label = NULL_TREE;
-  body = c_parser_c99_block_statement (parser);
+  body = c_parser_c99_block_statement (parser, do_tok_loc);
   c_parser_require_keyword (parser, RID_WHILE, "expected %<while%>");
   new_break = c_break_label;
   c_break_label = save_break;
@@ -5519,7 +5537,7 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
   c_cont_label = NULL_TREE;
-  body = c_parser_c99_block_statement (parser);
+  body = c_parser_c99_block_statement (parser, for_loc);
   if (is_foreach_statement)
     objc_finish_foreach_loop (loc, object_expression, collection_expression, body, c_break_label, c_cont_label);
   else
@@ -11785,7 +11803,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
       add_stmt (c_end_compound_stmt (here, stmt, true));
     }
   else
-    add_stmt (c_parser_c99_block_statement (parser));
+    add_stmt (c_parser_c99_block_statement (parser, loc));
   if (c_cont_label)
     {
       tree t = build1 (LABEL_EXPR, void_type_node, c_cont_label);
@@ -14011,6 +14029,9 @@ c_parse_file (void)
   tparser.tokens = &tparser.tokens_buf[0];
   the_parser = &tparser;
 
+  if (warn_misleading_indentation)
+    vis_parser = new visual_parser ();
+
   if (c_parser_peek_token (&tparser)->pragma_kind == PRAGMA_GCC_PCH_PREPROCESS)
     c_parser_pragma_pch_preprocess (&tparser);
 
@@ -14024,6 +14045,10 @@ c_parse_file (void)
     using_eh_for_cleanups ();
 
   c_parser_translation_unit (the_parser);
+
+  delete vis_parser;
+  vis_parser = NULL;
+
   the_parser = NULL;
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 46e2453..ffbc04b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "parser.h"
 #include "type-utils.h"
 #include "omp-low.h"
+#include "visual-parser.h"
 
 \f
 /* The lexer.  */
@@ -632,6 +633,11 @@ cp_lexer_new_main (void)
 
   lexer = cp_lexer_alloc ();
 
+  if (warn_misleading_indentation)
+    vis_parser = new visual_parser ();
+
+  /* FIXME: delete this.  */
+
   /* Put the first token in the buffer.  */
   lexer->buffer->quick_push (token);
 
@@ -1030,6 +1036,9 @@ cp_lexer_consume_token (cp_lexer* lexer)
   gcc_assert (token != &eof_token);
   gcc_assert (!lexer->in_pragma || token->type != CPP_PRAGMA_EOL);
 
+  if (vis_parser)
+    vis_parser->on_token (lexer->next_token->location);
+
   do
     {
       lexer->next_token++;
@@ -2009,9 +2018,9 @@ static void cp_parser_declaration_statement
   (cp_parser *);
 
 static tree cp_parser_implicitly_scoped_statement
-  (cp_parser *, bool *);
+  (cp_parser *, bool *, location_t);
 static void cp_parser_already_scoped_statement
-  (cp_parser *);
+  (cp_parser *, location_t);
 
 /* Declarations [gram.dcl.dcl] */
 
@@ -9843,7 +9852,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
 		nested_if = false;
 	      }
 	    else
-	      cp_parser_implicitly_scoped_statement (parser, &nested_if);
+	      cp_parser_implicitly_scoped_statement (parser, &nested_if,
+						     token->location);
 	    parser->in_statement = in_statement;
 
 	    finish_then_clause (statement);
@@ -9853,7 +9863,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
 						RID_ELSE))
 	      {
 		/* Consume the `else' keyword.  */
-		cp_lexer_consume_token (parser->lexer);
+		location_t else_tok_loc
+		  = cp_lexer_consume_token (parser->lexer)->location;
 		begin_else_clause (statement);
 		/* Parse the else-clause.  */
 	        if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
@@ -9867,7 +9878,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
 		    cp_lexer_consume_token (parser->lexer);
 		  }
 		else
-		  cp_parser_implicitly_scoped_statement (parser, NULL);
+		  cp_parser_implicitly_scoped_statement (parser, NULL,
+							 else_tok_loc);
 
 		finish_else_clause (statement);
 
@@ -9907,7 +9919,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p)
 	    in_statement = parser->in_statement;
 	    parser->in_switch_statement_p = true;
 	    parser->in_statement |= IN_SWITCH_STMT;
-	    cp_parser_implicitly_scoped_statement (parser, NULL);
+	    cp_parser_implicitly_scoped_statement (parser, NULL,
+						   0);
 	    parser->in_switch_statement_p = in_switch_statement_p;
 	    parser->in_statement = in_statement;
 
@@ -10451,6 +10464,7 @@ static tree
 cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
 {
   cp_token *token;
+  location_t tok_loc;
   enum rid keyword;
   tree statement;
   unsigned char in_statement;
@@ -10460,6 +10474,8 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
   if (!token)
     return error_mark_node;
 
+  tok_loc = token->location;
+
   /* Remember whether or not we are already within an iteration
      statement.  */
   in_statement = parser->in_statement;
@@ -10483,7 +10499,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
 	cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
 	/* Parse the dependent statement.  */
 	parser->in_statement = IN_ITERATION_STMT;
-	cp_parser_already_scoped_statement (parser);
+	cp_parser_already_scoped_statement (parser, tok_loc);
 	parser->in_statement = in_statement;
 	/* We're done with the while-statement.  */
 	finish_while_stmt (statement);
@@ -10498,7 +10514,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
 	statement = begin_do_stmt ();
 	/* Parse the body of the do-statement.  */
 	parser->in_statement = IN_ITERATION_STMT;
-	cp_parser_implicitly_scoped_statement (parser, NULL);
+	cp_parser_implicitly_scoped_statement (parser, NULL, 0);
 	parser->in_statement = in_statement;
 	finish_do_body (statement);
 	/* Look for the `while' keyword.  */
@@ -10528,7 +10544,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
 
 	/* Parse the body of the for-statement.  */
 	parser->in_statement = IN_ITERATION_STMT;
-	cp_parser_already_scoped_statement (parser);
+	cp_parser_already_scoped_statement (parser, tok_loc);
 	parser->in_statement = in_statement;
 
 	/* We're done with the for-statement.  */
@@ -10773,7 +10789,8 @@ cp_parser_declaration_statement (cp_parser* parser)
    Returns the new statement.  */
 
 static tree
-cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p)
+cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p,
+				       location_t guard_loc)
 {
   tree statement;
 
@@ -10799,6 +10816,9 @@ cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p)
       cp_parser_statement (parser, NULL_TREE, false, if_p);
       /* Finish the dummy compound-statement.  */
       finish_compound_stmt (statement);
+      /* FIXME.  */
+      if (vis_parser)
+	vis_parser->on_solo_stmt (cur_stmt_list, guard_loc);
     }
 
   /* Return the statement.  */
@@ -10811,11 +10831,15 @@ cp_parser_implicitly_scoped_statement (cp_parser* parser, bool *if_p)
    scope.  */
 
 static void
-cp_parser_already_scoped_statement (cp_parser* parser)
+cp_parser_already_scoped_statement (cp_parser* parser, location_t guard_loc)
 {
   /* If the token is a `{', then we must take special action.  */
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_BRACE))
-    cp_parser_statement (parser, NULL_TREE, false, NULL);
+    {
+      cp_parser_statement (parser, NULL_TREE, false, NULL);
+      if (vis_parser)
+	vis_parser->on_solo_stmt (cur_stmt_list, guard_loc);
+    }
   else
     {
       /* Avoid calling cp_parser_compound_statement, so that we
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 886fbb8..0d05674 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "bitmap.h"
 #include "omp-low.h"
+#include "visual-parser.h"
 
 static bool verify_constant (tree, bool, bool *, bool *);
 #define VERIFY_CONSTANT(X)						\
@@ -386,6 +387,9 @@ add_stmt (tree t)
       STMT_IS_FULL_EXPR_P (t) = stmts_are_full_exprs_p ();
     }
 
+  if (vis_parser)
+    vis_parser->check_stmt (t);
+
   /* Add T to the statement-tree.  Non-side-effect statements need to be
      recorded during statement expressions.  */
   gcc_checking_assert (!stmt_list_stack->is_empty ());
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-1.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-1.c
new file mode 100644
index 0000000..630ebfe
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-1.c
@@ -0,0 +1,12 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+
+int
+foo (int flag)
+{
+  int x = 4, y = 5;
+  if (flag) /* { dg-message "3: ...this clause, but it is not" } */
+    x = 3;
+    y = 2; /* { dg-warning "statement is indented as if it were guarded by..." } */
+  return x * y;
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-10.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-10.c
new file mode 100644
index 0000000..1531662
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-10.c
@@ -0,0 +1,15 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+extern void bar (int);
+
+void foo (int flag)
+{
+  if (flag) /* { dg-message "3: ...this clause, but it is not" } */
+    if (flag / 2)
+      {
+        bar (0);
+        bar (1);
+      }
+    bar (2); /* { dg-warning "statement is indented as if it were guarded by..." } */
+  bar (3);
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-2.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-2.c
new file mode 100644
index 0000000..a22cd11
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-2.c
@@ -0,0 +1,11 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+
+int
+foo (int flag, int x, int y)
+{
+  if (flag) /* { dg-message "3: ...this clause, but it is not" } */
+    x++; y++; /* { dg-warning "statement is indented as if it were guarded by..." } */
+
+  return x * y;
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c
new file mode 100644
index 0000000..7048225
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c
@@ -0,0 +1,14 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+
+int
+foo (int flag)
+{
+  int x = 4, y = 5;
+  if (flag)
+    x = 3;
+  else /* { dg-message "3: ...this clause, but it is not" } */
+    x = 2;
+    y = 2; /* { dg-warning "statement is indented as if it were guarded by..." } */
+  return x * y;
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-4.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-4.c
new file mode 100644
index 0000000..7abf379
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-4.c
@@ -0,0 +1,11 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+
+void
+foo (double *a, double *b, double *c)
+{
+  int i = 0;
+  while (i < 10) /* { dg-message "3: ...this clause, but it is not" } */
+    a[i] = b[i] * c[i];
+    i++; /* { dg-warning "statement is indented as if it were guarded by..." } */
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-5.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-5.c
new file mode 100644
index 0000000..cb6d025
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-5.c
@@ -0,0 +1,11 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+
+void
+foo (double *a, double *b, double *sum, double *prod)
+{
+  int i = 0;
+  for (i = 0; i < 10; i++) /* { dg-output "3: ...this one, but it is not" } */
+    sum[i] = a[i] * b[i];
+    prod[i] = a[i] * b[i]; /* { dg-warning "statement is indented as if it were guarded by..." } */
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-6.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-6.c
new file mode 100644
index 0000000..3133d15
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-6.c
@@ -0,0 +1,22 @@
+/* Based on CVE-2014-1266 aka "goto fail" */
+/* { dg-options "-Wmisleading-indentation" } */
+extern int foo (int);
+
+static int
+goto_fail(int a, int b, int c)
+{
+	int err;
+
+	/* ... */
+	if ((err = foo (a)) != 0)
+		goto fail;
+	if ((err = foo (b)) != 0) /* { dg-message "2: ...this clause, but it is not" } */
+		goto fail;
+		goto fail; /* { dg-warning "statement is indented as if it were guarded by..." } */
+	if ((err = foo (c)) != 0)
+		goto fail;
+	/* ... */
+
+fail:
+	return err;
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-7.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-7.c
new file mode 100644
index 0000000..8d1ff00
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-7.c
@@ -0,0 +1,15 @@
+/* { dg-options "-Wmisleading-indentation" } */
+
+extern int bar (int, int);
+
+int foo (int p, int q, int r, int s, int t)
+{
+  if (bar (p, q))
+    {
+      if (p) /* { dg-message "7: ...this clause, but it is not" } */
+        q++; r++; /* { dg-warning "statement is indented as if it were guarded by..." } */
+        s++; /* { dg-warning "statement is indented as if it were guarded by..." } */
+      t++;
+    }
+  return p + q + r + s + t;
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-8.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-8.c
new file mode 100644
index 0000000..82c1e6b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-8.c
@@ -0,0 +1,6 @@
+/* { dg-options "-Wmisleading-indentation" } */
+int foo (int a, int b, int c)
+{
+  /* This should *not* be flagged as misleading indentation.  */
+  if (a) return b; else return c;
+}
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-9.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-9.c
new file mode 100644
index 0000000..e490dec
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-9.c
@@ -0,0 +1,10 @@
+/* { dg-options "-Wmisleading-indentation" } */
+/* { dg-do compile } */
+extern void bar (int);
+
+void foo (int flag)
+{
+  if (flag) /* { dg-message "3: ...this clause, but it is not" } */
+    bar (0);
+    bar (1); /* { dg-warning "statement is indented as if it were guarded by..." } */
+}
diff --git a/gcc/visual-parser.c b/gcc/visual-parser.c
new file mode 100644
index 0000000..6181e1e
--- /dev/null
+++ b/gcc/visual-parser.c
@@ -0,0 +1,255 @@
+/* "Visual parser" for detecting misleading indentation.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by David Malcolm <dmalcolm@redhat.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"			/* For rtl.h: needs enum reg_class.  */
+#include "tree.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "stor-layout.h"
+#include "varasm.h"
+#include "trans-mem.h"
+#include "langhooks.h"
+#include "input.h"
+#include "cpplib.h"
+#include "timevar.h"
+#include "c-family/c-pragma.h"
+#include "flags.h"
+#include "ggc.h"
+#include "vec.h"
+#include "target.h"
+#include "cgraph.h"
+#include "plugin.h"
+#include "omp-low.h"
+#include "visual-parser.h"
+#include "diagnostic-core.h"
+#include "tree-iterator.h"
+
+/* Ptr to the singleton instance of visual_parser.  */
+visual_parser *vis_parser;
+
+/* The ctor for visual_parser.  */
+visual_parser::visual_parser ()
+{
+  m_debug = false;
+  m_last_xloc.file = NULL;
+  m_last_xloc.line = 0;
+  m_last_xloc.column = 0;
+}
+
+/* Token-handling.  This takes a stream of locations, examining their
+   spelling locations, and calling on_indent, on_line, on_outdent
+   accordingly.  */
+
+void
+visual_parser::on_token (location_t loc)
+{
+  if (m_debug)
+    inform (loc, "on_token");
+
+  expanded_location xloc = expand_location (loc);
+  if (xloc.line != m_last_xloc.line)
+    {
+      //inform (loc, "first token on new line");
+      visual_block *curblock = get_stack_top ();
+      if (curblock)
+	{
+	  /* FIXME: what about entirely empty lines???
+	     Presumably they simply don't get tokens.  */
+	  int last_indent = curblock->get_column ();
+	  if (xloc.column > last_indent)
+	    {
+	      /* This line starts more indented than any current
+		 indentation level; begin a new "visual block". */
+	      visual_block *block = new visual_block (loc, xloc.column);
+	      m_block_stack.safe_push (block);
+	      on_indent (loc);
+	    }
+	  else if (xloc.column == last_indent)
+	    {
+	      /* This line is at the same indentation level as before,
+		 within the current "visual block".  */
+	      on_line (loc);
+	    }
+	  else
+	    {
+	      /* We have some amount of outdenting; how much?  */
+	      while (m_block_stack.length ())
+		{
+		  m_block_stack.pop ();
+		  on_outdent (loc);
+		  if (get_stack_top ()->get_column () >= xloc.column)
+		    break;
+		}
+	    }
+	}
+      else
+	{
+	  /* No current indentation level; start one. */
+	  visual_block *block = new visual_block (loc, xloc.column);
+	  m_block_stack.safe_push (block);
+	  on_indent (loc);
+	}
+    }
+  m_last_xloc = xloc;
+}
+
+/* Called when we have a token that's on a new line that's more indented
+   than the token that began the last line.  */
+void
+visual_parser::on_indent (location_t loc)
+{
+  if (m_debug)
+    inform (loc, "on_indent");
+}
+
+/* Called when we have a token that's on a new line that's less indented
+   than the token that began the last line.  */
+
+void
+visual_parser::on_outdent (location_t loc)
+{
+  if (m_debug)
+    inform (loc, "on_outdent");
+}
+
+/* Called for the first token on a new line that's at the same indentation
+   level as the previous line.  */
+void
+visual_parser::on_line (location_t loc)
+{
+  if (m_debug)
+    inform (loc, "on_line");
+  visual_block *curblock = get_stack_top ();
+  curblock->on_line (loc);
+}
+
+/* Called by the C/C++ FE when we have a guarding statement at GUARD_LOC
+   containing BLOCK, where the block wasn't written using braces, like
+   this:
+
+     guard-loc
+     |
+     V
+     if (flag)
+       foo (); <--BLOCK
+
+   so that we can detect followup statements that are within
+   the same "visual block" as the guarded statement, but which
+   aren't logically grouped within the guarding statement, such
+   as:
+
+     if (flag)
+       foo ();
+       bar ();
+
+   In the above, "bar ();" isn't guarded by the "if", but
+   misleading is in the same visual block as "foo ();".  */
+
+void
+visual_parser::on_solo_stmt (tree block, location_t guard_loc)
+{
+  /* Mark the most-indented visual block as containing
+     a solo-stmt.  */
+  get_stack_top ()->on_solo_stmt (block, guard_loc);
+}
+
+/* See comment above for visual_parser::on_solo_stmt.
+   Mark the most-indented visual block as (supposedly) only
+   containing a solo statement.  */
+
+void
+visual_block::on_solo_stmt (tree block, location_t guard_loc)
+{
+  tree_stmt_iterator tsi = tsi_start (block);
+  tree stmt = tsi_stmt (tsi);
+
+  m_loc_solo_stmt = EXPR_LOCATION (stmt);
+  m_loc_guard = guard_loc;
+
+  m_exploc_solo_stmt
+    = expand_location_to_spelling_point (m_loc_solo_stmt);
+  m_exploc_guard
+    = expand_location_to_spelling_point (guard_loc);
+
+  /* If the solo-stmt is on a new line and more indented than
+     the guard location, mark the current visual block (which
+     presumably contains the solo stmt) for checking.
+     Doing this rejects cases such as
+       if (foo) bar (); baz ();
+     where it's not clear whether or not we ought to warn about
+     "baz ();" and hence we don't.  */
+  if (m_exploc_solo_stmt.file == m_exploc_guard.file)
+    if (m_exploc_solo_stmt.line > m_exploc_guard.line)
+      if (m_exploc_solo_stmt.column > m_exploc_guard.column)
+	{
+	  //inform (m_loc_stmt, "solo statement here");
+	  //inform (m_loc_guard, "visually guarded here");
+	  m_has_solo_stmt = true;
+	}
+}
+
+/* Check NEW_STMT for misleading indentation.
+   Called when adding NEW_STMT to a statement list.
+
+   If we have a solo statement, and we're in the same
+   visual block, and NEW_STMT is visually after the
+   solo statement, then NEW_STMT is misleadingly indented as
+   if were guarded by the guard, but it isn't.
+   Issue a warning for such a statement.  */
+
+void
+visual_parser::check_stmt (tree new_stmt)
+{
+  if (m_debug)
+    inform (EXPR_LOCATION (new_stmt), "check_stmt");
+  get_stack_top ()->check_stmt (new_stmt);
+}
+
+/* See comment above for visual_parser::check_stmt.  */
+
+void
+visual_block::check_stmt (tree new_stmt)
+{
+  if (!m_has_solo_stmt)
+    return;
+
+  location_t loc_new_stmt = EXPR_LOCATION (new_stmt);
+  //inform (loc_new_stmt, "checking stmt here");
+
+  expanded_location exploc_new_stmt
+    = expand_location_to_spelling_point (loc_new_stmt);
+
+  if (exploc_new_stmt.file == m_exploc_guard.file)
+    {
+      if (/* Statement is visually after the guarded stmt.  */
+	  (exploc_new_stmt.line == m_exploc_solo_stmt.line
+	  && exploc_new_stmt.column > m_exploc_solo_stmt.column)
+	  || (exploc_new_stmt.line > m_exploc_solo_stmt.line))
+	if (warning_at (loc_new_stmt,
+			OPT_Wmisleading_indentation,
+			"statement is indented as if it"
+			" were guarded by..."))
+	  inform (m_loc_guard,
+		  "...this clause, but it is not");
+    }
+}
diff --git a/gcc/visual-parser.h b/gcc/visual-parser.h
new file mode 100644
index 0000000..295f457
--- /dev/null
+++ b/gcc/visual-parser.h
@@ -0,0 +1,167 @@
+/* "Visual parser" for detecting misleading indentation.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by David Malcolm <dmalcolm@redhat.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Forward declaration.  */
+class visual_block;
+
+/* A "visual parser" for detecting misleading indentation.
+
+   This is fed three things by the frontend:
+
+   (A) a series of location_t by the frontend's tokenizer,
+   corresponding to the locations of the token stream.  It uses
+   this to model a stack of "visual blocks", corresponding to
+   indentation levels within the source code.
+   For example:
+
+   Source       VBlock #  VBlock stack
+   ------       --------  ------------
+   void         0         [b0]
+   foo (int i)  0         [b0]
+   {            0         [b0]
+    int j;      .1        [b0, b1]
+    if (i)      .1        [b0, b1]
+     {          ..2       [b0, b1, b2]
+       foo (0); ...3      [b0, b1, b2, b3]
+       bar (0); ...3      [b0, b1, b2, b3]
+     }          ..2       [b0, b1, b2]
+    else        .1        [b0, b1]
+     foo (1);   ..4       [b0, b1, b4]
+     bar (1);   ..4       [b0, b1, b4] <-- WARNING!
+   }            0         [b0]
+
+   Note how the call to "bar (1);" called out with "WARNING!" is
+   indented as if it's in the same block as the call to "foo (1);",
+   guarded by the "else" (both are in visual block 4), but they are
+   *not* in the same actual block as far as the real frontend
+   (and language standards) see it.
+   The purpose of this class is to issue a warning about this
+   misleading indentation.
+
+   (2) The frontend notifies the class about "solo statements", that
+   is, non-compound statements guarded by control-flow statements,
+   such as "foo (1);", a non-compound statement guarded by the else
+   clause.  Misleading indentation can occur in the statement
+   immediately following such a non-compound statement, if the
+   successor statement is indented in the same way
+   (i.e. it is within the same visual block).
+
+   (3) The frontend notifiees the class about statements being added
+   to a statement list.  If we have a guarded non-compound statement,
+   the new statements can be checked for misleading indentation.
+
+   Note that we can't simply use statement locations; for example, in:
+     if (flag)
+       x = 1;
+   the "if (flag)"'s location is at the open-paren, and that of the
+   assignment "x = 1;" is at the equals-sign, so any attempt to use
+   statement locations can be fooled by varying the spacing:
+
+        V
+     if (flag)
+       x = 1;
+         ^ apparent indentation relative to conditional
+
+         V
+     if  (flag)
+       x = 1;
+         ^ same column as conditional
+
+           V
+     if    (flag)
+       x = 1;
+         ^ apparent "outdent" relative to conditional
+
+   Hence we have to use token locations.  */
+
+class GTY(()) visual_parser {
+ public:
+  visual_parser ();
+  void on_token (location_t loc);
+  void on_solo_stmt (tree block, location_t guard_loc);
+  void check_stmt (tree stmt);
+
+ private:
+  void on_newline (location_t loc);
+
+  void on_indent (location_t loc);
+  void on_outdent (location_t loc);
+  void on_line (location_t loc);
+
+  visual_block * get_stack_top () const
+  {
+    if (m_block_stack.length ())
+      return m_block_stack[m_block_stack.length () - 1];
+    else
+      return NULL;
+  }
+
+ private:
+  bool m_debug;
+  expanded_location m_last_xloc;
+  /* A stack of indentation levels.  */
+  auto_vec<visual_block *> m_block_stack;
+};
+
+/* A visual block: a run of lines with the same initial indentation.  */
+
+class visual_block
+{
+ public:
+  visual_block (location_t loc, int column_start)
+    : m_loc (loc),
+      m_loc_last_line (loc),
+      m_column_start (column_start),
+      m_has_solo_stmt (false)
+  {}
+
+  location_t get_location () const { return m_loc; }
+  location_t get_last_location () const { return m_loc_last_line; }
+  int get_column () const { return m_column_start; }
+
+  void on_line (location_t loc) { m_loc_last_line = loc; }
+
+  void on_solo_stmt (tree block, location_t guard_loc);
+  void check_stmt (tree stmt);
+
+ private:
+  location_t m_loc;
+  location_t m_loc_last_line;
+  int m_column_start;
+
+  /* Detection of misleading indentation.
+     If m_has_solo_stmt is true, then this visual
+     block contains a "solo statement" i.e. one within a block
+     created without braces, such as:
+       if (flag) <- guard
+         foo (); <- solo stmt in this visblock
+     Any followup statements that are in the same visual block as
+     "foo ();" are therefore misleadingly indented.  */
+  bool m_has_solo_stmt;
+  location_t m_loc_solo_stmt;
+  location_t m_loc_guard;
+  expanded_location m_exploc_solo_stmt;
+  expanded_location m_exploc_guard;
+
+};
+
+/* The singleton instance of the visual_parser, created by the
+   C/C++ frontend if -Wmisleading-indentation is enabled.  */
+extern visual_parser *vis_parser;

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-08-18 14:13 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-16 15:08 [RFC stage 1] Proposed new warning: -Wmisleading-indentation David Malcolm
2015-04-16 17:27 ` Mike Stump
2015-04-21 16:13   ` David Malcolm
2015-04-21 16:40     ` Trevor Saunders
2015-04-21 16:43     ` Manuel López-Ibáñez
2015-04-21 17:05     ` Mike Stump
2015-04-21 18:14     ` Manuel López-Ibáñez
2015-04-21 23:35       ` David Malcolm
2015-04-28 23:50       ` [PATCH 1/3] Implement -Wmisleading-indentation (v4) David Malcolm
2015-04-28 23:52         ` [PATCH 2/3] Fix spurious semicolons David Malcolm
2015-04-29  0:10           ` Jeff Law
2015-04-28 23:53         ` [PATCH 3/3] Fix indentation issues seen by -Wmisleading-indentation David Malcolm
2015-04-29  0:25           ` Jeff Law
2015-04-29 12:35           ` Mikael Morin
2015-05-05 19:44             ` David Malcolm
2015-05-06 11:38               ` Fix logic error in Fortran OpenACC parsing (was: [PATCH 3/3] Fix indentation issues seen by -Wmisleading-indentation) Thomas Schwinge
2015-05-08 11:24                 ` Fix logic error in Fortran OpenACC parsing Ilmir Usmanov
2015-07-27 14:37                   ` Thomas Schwinge
2015-05-11 21:23         ` [PATCH 1/3] Implement -Wmisleading-indentation (v4) Jeff Law
2015-05-12 21:28           ` David Malcolm
2015-05-12 21:47             ` David Malcolm
2015-08-18 14:13               ` PR67192 (Was: Re: [PATCH 1/3] Implement -Wmisleading-indentation (v4)) Andreas Arnez
2015-04-16 18:29 ` [RFC stage 1] Proposed new warning: -Wmisleading-indentation Manuel López-Ibáñez
2015-04-17 14:42 ` Tom Tromey
2015-04-17 16:12   ` Manuel López-Ibáñez
2015-04-28 23:13     ` Joseph Myers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).