Attached is a work-in-progress patch for a new -Wmisleading-indentation warning I've been experimenting with, for GCC 6. It kind-of-works, but there are some issues, so I wanted to get feedback on it here. The idea is to issue a warning when the C/C++ compiler's view of the block structure doesn't match that of a naive human reader of the code via indentation. For example, CVE-2014-1266 (aka "goto fail") had: hashOut.data = hashes + SSL_MD5_DIGEST_LEN; hashOut.length = SSL_SHA1_DIGEST_LEN; if ((err = SSLFreeBuffer(&hashCtx)) != 0) goto fail; if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail; goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */ if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail; err = sslRawVerify(...); where the second "goto fail;" here: if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail; goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */ is indented as if it's guarded by the conditional. It isn't, making the real meaning of the code hard to discern, and masking a security vulnerability (where the stray goto led to the certificate-checking code later in the function being dead code, skipping it with err == 0, for success). The patch successfully issues a warning on a simplified version of this: if ((err = foo (b)) != 0) goto fail; goto fail; ./Wmisleading-indentation-6.c: In function ‘foo’: ./Wmisleading-indentation-6.c:15:3: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation] goto fail; ^ ./Wmisleading-indentation-6.c:13:2: note: ...this clause, but it is not if ((err = foo (b)) != 0) ^ and on various other testcases. That said, there are several major issues with the patch as it stands: (A) works with the C FE, but not with C++ yet. The implementation assumes a single lexer consuming the token stream, so it doesn't yet work with the C++ FE's ideas of tentatively-consumed tokens (save/commit/rollback), and with its lexer stack. I *think* that can be fixed by relying on the fact that the integer locations of the token stream are normally monotonic, and using that to capture and handle those two C++ FE impl details. (B) GTY/PCH/etc, and performance; I know that it leaks visual_block instances, and there's probably a way to do this with much less dynamic memory allocation, but I wanted to get it right before making it fast. (C) no ChangeLog yet, though it's heavily commented. (D) tabs vs spaces. This is probably the biggest can of worms. Consider: { if (flagA) if (flagB) if (flagC) foo (); bar (); /* would like to warn about this */ } If this is indented using spaces&tabs with 8-space tabs, we have: { <2 spaces>if (flagA) <4 spaces>if (flagB) <6 spaces>if (flagC) <1 tab>foo (); <1 tab>bar (); } What does this look like in an editor? Consider emacs, which uses uses 0-based offsets to what I call "visual columns". The column numbers for the start of the first token for each line are reported by emacs as: <0>{ <2>if (flagA) <4>if (flagB) <6>if (flagC) <8>foo (); <8>bar (); <0>} However within libcpp and gcc, in linemap's expanded_location and in diagnostic messages, the "column" numbers are actually 1-based counts of *characters*, so the "column" numbers emitted in diagnostics for the start of the first token in each line are actually: <1>{ <3>if (flagA) <5>if (flagB) <7>if (flagC) <2>foo (); <2>bar (); <1>} Note the transition at tab offsets, where we go from increasing visual column numbers: <6>if (flagC) <8>foo (); to a decrease in the character count: <7>if (flagC) <2>foo (); Hence to handle mixed spaces+tabs, we need some way to model "visual columns". I think this is doable (at the cost of adding a field to cpp_token, and to c_token/cp_token), provided we can assume, per-buffer, either (i) a consistent value for tabs in terms of spaces, or (ii) that we don't have mixed tabs/spaces in this buffer An issue here is how to determine (i), or if it's OK to default to 8 and have a command-line option (param?) to override it? (though what about, say, each header file?) Thoughts on this, and on the patch? Thanks Dave