On 09/16/2016 12:19 PM, Jason Merrill wrote:
> On 09/14/2016 01:03 PM, Martin Sebor wrote:
>> Attached is an updated patch that does away with the "overlapping"
>> warning and instead issues the same warnings as the C front end
>> (though with more context).
>>
>> In struct flexmems_t I've retained the NEXT array even though only
>> the first element is used to diagnose problems.  The second element
>> is used by the find_flexarrays function to differentiate structs
>> with flexible array members in unions (which are valid) from those
>> in structs (which are not).
>>
>> FWIW, I think this C restriction on structs is unnecessary and will
>> propose to remove it so that's valid to define a struct that contains
>> another struct (possibly recursively) with a flexible array member.
>> I.e., I think this should be made valid in C (and accepted without
>> the pedantic warning that GCC, and with this patch also G++, issues):
>
> Agreed.

This is now N2083:
   http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2083.htm

>
>> +      /* Is FLD a typedef for an anonymous struct?  */
>> +      bool anon_type_p
>> +       = (TREE_CODE (fld) == TYPE_DECL
>> +          && DECL_IMPLICIT_TYPEDEF_P (fld)
>> +          && anon_aggrname_p (DECL_NAME (fld)));
>
> We talked earlier about handling typedefs in
> cp_parser_{simple,single}_declaration, so that we catch
>
> typedef struct { int a[]; } B;
>
> or at least adding a FIXME comment here explaining that looking at
> typedefs is a temporary hack.

I've added a FIXME comment.

>
>> +      /* Type of the member.  */
>> +      tree fldtype = TREE_CODE (fld) == FIELD_DECL ? TREE_TYPE (fld)
>> : fld;
>
> Why set "fldtype" to be a TYPE_DECL rather than its type?

I'm not sure I understand the question but (IIRC) the purpose of
this code is to detect invalid uses of flexible array members in
typedefs such as this:

    struct S { typedef struct { int i, a[]; } X2 [2]; };

Sadly, it doesn't do anything for

    struct X { int i, a[]; };
    struct S { typedef struct X X2 [2]; };

>> +      /* Determine the type of the array element or object referenced
>> +        by the member so that it can be checked for flexible array
>> +        members if it hasn't been yet.  */
>> +      tree eltype = TREE_CODE (fld) == FIELD_DECL ? fldtype :
>> TREE_TYPE (fld);
>
> Given the above, this seems equivalent to
>
> tree eltype = TREE_TYPE (fld);

Yes.

>
>> +      if (RECORD_OR_UNION_TYPE_P (eltype))
>> +       {
> ...
>> +         if (fmem->array && !fmem->after[bool (pun)])
>> +           {
>> +             /* Once the member after the flexible array has been found
>> +                we're done.  */
>> +             fmem->after[bool (pun)] = fld;
>> +             break;
>> +           }
> ...
>>       if (field_nonempty_p (fld))
>>         {
> ...
>>           /* Remember the first non-static data member after the flexible
>>              array member, if one has been found, or the zero-length
>> array
>>              if it has been found.  */
>>           if (fmem->array && !fmem->after[bool (pun)])
>>             fmem->after[bool (pun)] = fld;
>>         }
>
> Can we do this in one place rather than two?

You mean merge the two assignments to fmem->after[bool (pun)] into
one?  I'm not sure.  There's quite a bit of conditional logic in
between them, including a recursive call that might set fmem.  What
would be gained by rewriting it all to merge the two assignments?
If it could be done I worry that I'd just end up duplicating the
conditions instead.

>
>> +         if (eltype == fldtype || TYPE_UNNAMED_P (eltype))
>
> This is excluding arrays, so we don't diagnose, say,
>
>> struct A
>> {
>>   int i;
>>   int ar[];
>> };
>>
>> struct B {
>>   A a[2];
>> };
>
> Should we complain elsewhere about an array of a class with a flexible
> array member?

Yes, there are outstanding gaps/bugs that remain to be fixed.  I tried
to limit this fix to not much more than the reported regression.  I did
end up tightening other things up as I found more gaps during testing
(though I probably should have resisted).  Sometimes I find it hard to
know where to stop but I feel like I need to draw the line somewhere.
I of course agree that the outstanding bugs should be fixed as well
but I'd prefer to do it in a separate change.

>> +             /* Does the field represent an anonymous struct?  */
>> +             bool anon_p = !DECL_NAME (fld) && ANON_AGGR_TYPE_P
>> (eltype);
>
> You don't need to check DECL_NAME; ANON_AGGR_TYPE_P by itself indicates
> that we're dealing with an anonymous struct/union.

Thanks, I've removed the variable.

>
>>    Similarly, PSTR is set to the outermost struct of which T is a member
>>    if one such struct exists, otherwise to NULL.  */
> ...
>>               find_flexarrays (eltype, fmem, false, pun,
>>                                !pstr && TREE_CODE (t) == RECORD_TYPE ?
>> fld : pstr);
> ...
>> +diagnose_invalid_flexarray (const flexmems_t *fmem)
>> +{
>> +  if (fmem->array && fmem->enclosing
>> +      && pedwarn (location_of (fmem->enclosing), OPT_Wpedantic,
>> +                 TYPE_DOMAIN (TREE_TYPE (fmem->array))
>> +                 ? G_("invalid use of %q#T with a zero-size array "
>> +                      "in %q#D")
>> +                 : G_("invalid use of %q#T with a flexible array
>> member "
>> +                      "in %q#T"),
>> +                 DECL_CONTEXT (fmem->array),
>> +                 DECL_CONTEXT (fmem->enclosing)))
>
> PSTR is documented to be the outermost struct,

PSTR is set to a data member of the outermost struct of which
the flexible array is a member.  I've updated the comment to make
it clearer.

> but it (and thus
> fmem->enclosing) end up being a data member of that outermost struct,
> which is why you need to take its DECL_CONTEXT.  What's the advantage of
> doing that over passing t itself to the recursive call?

I need the data member because I want to point the diagnostic at
it, not at the struct of which it's a member.  For example, in

   struct S {
     struct X {
       int i, a[];
     }
     *p,
     x;
   };

I want to point at the x because it is the problem, like this:

   warning: invalid use of âstruct S::Xâ with a flexible array member in 
struct Sâ [-Wpedantic]
       x;
       ^

not at struct X, which is not incorrect in and of itself, and neither
is p.  The diagnostic makes use of x's DECL_CONTEXT to print the name
of x's type because that's where the flexible array is defined (or in
which it's nested).

>
>> +  /* The context of the flexible array member.  Either the struct
>> +     in which it's declared or, for anonymous structs and unions,
>> +     the struct/union of which the array is effectively a member.  */
>> +  tree fmemctx = anon_context (fmem->array);
>> +  bool anon_p = fmemctx != DECL_CONTEXT (fmem->array);
>> +  if (!anon_p)
>> +    fmemctx = t;
>
> Why do you need to do something different here based on anon_p?  I don't
> see why that would affect whether we want to look through intermediate
> non-anonymous classes.

In code like this:

    struct X { int n, a[]; };
    struct Y { int n, b[]; };

    struct D: X, Y { };

The test above make the diagnostic point to the context of the invalid
flexible array member, or Y::b, rather than that of X. Without it, we
end up with:

    error: flexible array member âX::aâ not at end of âstruct Xâ

rather than with

    error: flexible array member âX::aâ not at end of âstruct Dâ

>
>> +      check_flexarrays (basetype, fmem, true);
>
> Please decorate boolean literal arguments like this with the name of the
> parameter, e.g. /*base_p*/true.

Sure.  I should mention that Jeff recently advised against it in
another review, so it would be nice to adopt the same convention
throughout and document it (I'm happy to add it to the Wiki once
it has been agreed on):

    https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00354.html

FWIW, I haven't found mentioning the formal argument in a comment
a useful or consistent convention.  Other arguments' names aren't
mentioned, and the bool argument's name cannot be mentioned when
it has a default value.

On the other hand, a convention I do find useful (though not one
that seems to be used in GCC) is indicating in a comment in the
definition of a function the value of the default argument in
functions declared to take one.

>
>> +  bool maybe_anon_p = anon_aggrname_p (TYPE_IDENTIFIER (t));
>
> Now we can use TYPE_UNNAMED_P.

Okay, thanks.

Attached is a revision of the patch with the changes discussed
above.

Martin