public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/94247] New: Wrong char-subscripts warning for limited-range index
@ 2020-03-21 10:42 roland.illig at gmx dot de
  2020-03-21 10:48 ` [Bug c/94247] " pinskia at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: roland.illig at gmx dot de @ 2020-03-21 10:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

            Bug ID: 94247
           Summary: Wrong char-subscripts warning for limited-range index
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: roland.illig at gmx dot de
  Target Milestone: ---

#define MAXID 20
static const char shft[MAXID] = {1,2,3,4,5,6,7,1,2,3,4,5,6,7,1,2,3,4,5,6};

int hashstr(const char *s) {
  char c;
  char k = 0;
  unsigned int sum = 0;

  while( (c = *s++) != '\0' && k < MAXID-1 ) {
    sum += c + (c<<shft[k++]);
  }
  return (int)(sum >> 1);
}

The above program is a slight variation of the OpenJDK code in
src/hotspot/share/libadt/dict.cpp. It uses a char for indexing an array, which
triggers this warning:

dict.cpp: In function ‘int hashstr(const char*)’:
dict.cpp:10:28: warning: array subscript has type ‘char’ [-Wchar-subscripts]
     sum += c + (c<<shft[k++]);

At optimization levels 1 and beyond, the possible range for k is determined
correctly, and the compiler generates the same code, no matter if k has type
char or unsigned int or any other integer type.

The warning is a false positive, and the compiler already knows this. Therefore
the warning should not be generated.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
@ 2020-03-21 10:48 ` pinskia at gcc dot gnu.org
  2020-03-21 11:09 ` roland.illig at gmx dot de
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-03-21 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The warning is not context sensitive.  What I mean is it does not take into
account the range of k.

Also you can avoid the warning by using either unsigned char or signed char.

char is a special type.  Unlike other plain types, it can default either to
signed or unsigned and for GCC it depends on the ABI.

>and the compiler already knows this
Not when the warning is generated from the front-end.  It does not know the
range of the char variable there.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
  2020-03-21 10:48 ` [Bug c/94247] " pinskia at gcc dot gnu.org
@ 2020-03-21 11:09 ` roland.illig at gmx dot de
  2020-03-23  7:37 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: roland.illig at gmx dot de @ 2020-03-21 11:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

--- Comment #2 from Roland Illig <roland.illig at gmx dot de> ---
(In reply to Andrew Pinski from comment #1)
> >and the compiler already knows this
> Not when the warning is generated from the front-end.  It does not know the
> range of the char variable there.

Ah, that fine distinction. From my point of view there only was "the compiler".
If the range of the variable is not known at that time, it would probably be
too much work to implement this extra rule.

So you suggest that the code should be fixed instead? Then I'll tell the
OpenJDK people.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
  2020-03-21 10:48 ` [Bug c/94247] " pinskia at gcc dot gnu.org
  2020-03-21 11:09 ` roland.illig at gmx dot de
@ 2020-03-23  7:37 ` rguenth at gcc dot gnu.org
  2020-03-23  7:37 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-03-23  7:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, it's bad programming practice to use 'char' for any arithmetic.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
                   ` (2 preceding siblings ...)
  2020-03-23  7:37 ` rguenth at gcc dot gnu.org
@ 2020-03-23  7:37 ` rguenth at gcc dot gnu.org
  2020-03-23 17:32 ` msebor at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-03-23  7:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
                   ` (3 preceding siblings ...)
  2020-03-23  7:37 ` rguenth at gcc dot gnu.org
@ 2020-03-23 17:32 ` msebor at gcc dot gnu.org
  2020-03-23 18:41 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: msebor at gcc dot gnu.org @ 2020-03-23 17:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msebor at gcc dot gnu.org

--- Comment #5 from Martin Sebor <msebor at gcc dot gnu.org> ---
-Wchar-subscripts is far too primitive of an instrument to rely on to detect
real bugs with any fidelity.  Besides triggering regardless of whether char is
a signed or unsigned type (i.e., every instance of it on with -fno-signed-char
is a false positive), and besides not tracking values or ranges (leading to
another class of false positives evident in test cases like in comment #0), it
doesn't diagnose uses where the char subscript is a negative literal such as in

  int f (char *s)
  {
    return s['\xff'];   // missing warning
  }

or where it has been promoted from char, such as in:

  int f (char *s)
  {
    return s[*s + 1];   // missing warning
  }

Converting the char to signed char or any other type also suppresses the
warning without resolving the underlying problem (and, on -fno-signed-char
targets, could even introduce a bug into correct code).

Making the warning more discerning and detecting more real problems would
require running it later, after some basic optimizations.  But there already is
a flow-sensitive warning with the same ultimate goal of detecting out-of-bounds
array indices as -Wchar-subscripts: -Warray-bounds.  Unfortunately, because GCC
aggressively and rather indiscriminately folds references to static constant
arrays of types other than char very early on, negative subscripts in those are
not detected by it, and so erroneous expressions like in the snippet below are
not diagnosed:

  static const int a[] = { 1, 2, 3 };

  int f (void)
  {
    return a[-1];   // missing warning
  }

The out-of-bounds index above is diagnosed in accesses to non-constant arrays,
but only at -O2, because the -Warray-bounds warning runs only at that level. 
The only reason why it isn't detected at lower levels is because no effort has
been invested into it yet.  Patches are welcome :)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
                   ` (4 preceding siblings ...)
  2020-03-23 17:32 ` msebor at gcc dot gnu.org
@ 2020-03-23 18:41 ` jakub at gcc dot gnu.org
  2020-03-23 23:52 ` msebor at gcc dot gnu.org
  2020-03-24  0:03 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-03-23 18:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Martin Sebor from comment #5)
> -Wchar-subscripts is far too primitive of an instrument to rely on to detect
> real bugs with any fidelity.

No, it diagnoses the main bug, that one uses char as index type, which isn't
very portable.  If one uses explicit signed char or unsigned char, it will
behave the same way on all targets, but char will not (unless it has values
from 0 to SCHAR_MAX only).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
                   ` (5 preceding siblings ...)
  2020-03-23 18:41 ` jakub at gcc dot gnu.org
@ 2020-03-23 23:52 ` msebor at gcc dot gnu.org
  2020-03-24  0:03 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: msebor at gcc dot gnu.org @ 2020-03-23 23:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

--- Comment #7 from Martin Sebor <msebor at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #6)
> No, it diagnoses the main bug

Nope, it does not.  -Wchar-subscripts is designed and documented to diagnose a
common cause of a bug.  The actual bug itself (which, as noted in pr94182, the
manual neglects to describe) is in inadvertently using a negative index as a
result of sign extension when a positive index is intended.  When that cannot
happen there is obviously no bug to diagnose.

There's no doubt that there is room for improvement in both warnings.  Some of
the false negatives might be avoidable by enhancing -Wchar-subscripts in the
front-end (e.g., fixing the remainder of pr29455), but the better ROI is in
continuing to improve -Warray-bounds (pr56456).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c/94247] Wrong char-subscripts warning for limited-range index
  2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
                   ` (6 preceding siblings ...)
  2020-03-23 23:52 ` msebor at gcc dot gnu.org
@ 2020-03-24  0:03 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-03-24  0:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94247

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Martin Sebor from comment #7)
> (In reply to Jakub Jelinek from comment #6)
> > No, it diagnoses the main bug
> 
> Nope, it does not.  -Wchar-subscripts is designed and documented to diagnose
> a common cause of a bug.  The actual bug itself (which, as noted in pr94182,
> the manual neglects to describe) is in inadvertently using a negative index
> as a result of sign extension when a positive index is intended.  When that
> cannot happen there is obviously no bug to diagnose.

Yes and no.  Let's look at it a different way.  negative index is not the
issue.  But rather knowing if char is signed or unsigned is the issue.  it is a
portability issue.  -Wchar-subscripts is designed to diagnostic that you use
char in a subscript as you might not know if it is signed or unsigned because
the ABI differences.  Look at PowerPC or ARM ABIs, you will notice that char is
unsigned while other ABIs, char is signed.  YES with the code in comment #0,
there would be no difference but having a false negative is fine.  Not all
warnings are going to be 100% because of the heurstics.  Since the false
negative is easy workarounded, just use signed char or unsigned char or int as
the loop variable.  char is a special type as I keep on mentioning.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-24  0:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-21 10:42 [Bug c/94247] New: Wrong char-subscripts warning for limited-range index roland.illig at gmx dot de
2020-03-21 10:48 ` [Bug c/94247] " pinskia at gcc dot gnu.org
2020-03-21 11:09 ` roland.illig at gmx dot de
2020-03-23  7:37 ` rguenth at gcc dot gnu.org
2020-03-23  7:37 ` rguenth at gcc dot gnu.org
2020-03-23 17:32 ` msebor at gcc dot gnu.org
2020-03-23 18:41 ` jakub at gcc dot gnu.org
2020-03-23 23:52 ` msebor at gcc dot gnu.org
2020-03-24  0:03 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).