From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kevin Buettner <kevinb@cygnus.com>
To: Andrew Cagney <ac131313@cygnus.com>
Cc: gdb@sourceware.cygnus.com
Subject: Re: Proposal: convert function definitions to prototyped form
Date: Sat, 03 Jun 2000 00:13:00 -0000
Message-id: <1000603071258.ZM32408@ocotillo.lan>
References: <1000602075018.ZM29997@ocotillo.lan> <3937816C.E66B9AE0@cygnus.com> <ac131313@cygnus.com>
X-SW-Source: 2000-06/msg00028.html

On Jun 2,  7:42pm, Andrew Cagney wrote:

> The only concern I have is, given the slightly more complex nature of
> the script (compared to PARAMS) there is a possibility that the
> conversion re-orders or re-types the argument list.  With that in mind,
> should a pre-cursor to this be to least have prototypes for all
> (global?) functions (-Wmissing-prototypes?) or only do the conversion
> when there is a prototype visible?

I've given this matter a lot of thought.

I agree that it would be desirable to have prototypes for all
functions.  Unfortunately, while it is easy to generate prototypes,
it's not so easy to know where to stick them.  Also, even if we had
prototypes in place, there's no guarantee that we'd catch the errors
after a few builds because I think there's some code in gdb (though I
don't know how much) that never gets built!  (Due to ifdefs and
near obsolete ports.)

What we really need is a method for vetting all of the changes
immediately after running the script.  I.e, we need to make sure that
the conversion does no reordering or retyping of any argument list.
Also, we need to make sure that the rewritten function declaration
is syntactically correct.

While examining the diffs (made with the -u switch) an idea occurred
to me.  Consider the following example:

    diff -ur ../../orig/gdb-sourceware/wrapper.c ./wrapper.c
    --- ../../orig/gdb-sourceware/wrapper.c	Sat May 27 17:10:27 2000
    +++ ./wrapper.c	Thu Jun  1 23:33:16 2000
    @@ -57,11 +57,8 @@
     static int wrap_parse_and_eval_type (char *);
     
     int
    -gdb_parse_exp_1 (stringptr, block, comma, expression)
    -     char **stringptr;
    -     struct block *block;
    -     int comma;
    -     struct expression **expression;
    +gdb_parse_exp_1 (char **stringptr, struct block *block, int comma,
    +		 struct expression **expression)
     {
       struct gdb_wrapper_arguments args;
       args.args[0].pointer = stringptr;

In the above diff, the lines prepended with `-' represent the original
K&R definition.  And the lines prepended with `+' represent the
transformed code.  Moreover, the diff is extremely regular in this
respect.  So...

If you take the lines which begin with `+', prepend the type on the
line before the `-' lines and tack a semicolon onto the end, you end
up with a prototype declaration.  And, if you take the lines beginning
with `-', again tack the type onto the front and put a function body
underneath it, you have a K&R style (traditional) function definition.

Now if you put these into a file with the prototype first and the K&R
definition later on, you can run "gcc -Wall" on it to see if any
warnings are produced.  Obviously, if we get warnings, we need to look
closer to see if something went wrong with the fix-decls conversion.

Of course, there are other details to consider, like making sure that
all of the types, structs, unions, and enums are declared.  Also,
in a source tree as big as gdb, we'll likely wind up with a number
of functions with the same name, so some method of disambiguating
these will be necessary.  And then of course, there's the matter
of no declared return type and other oddments.

I've written a script called ``check-decls'' which performs these
transformations on the diff output.  When I run it on the above diff,
it produces the following output (indented by four spaces by me for
readability)

    struct block { int f0; };
    struct expression { int f1; };
    #define INLINE
    #define private
    #define CONST const
    #define NORETURN
    void init___ (void *);
    int gdb_parse_exp_1 (char **stringptr, struct block *block, int comma,
		     struct expression **expression);

    int
    gdb_parse_exp_1 (stringptr, block, comma, expression)
	 char **stringptr;
	 struct block *block;
	 int comma;
	 struct expression **expression;
    {
      int ret;
      init___ (&ret);
      return ret;
    }

    void
    use___ (void)
    {
    }

The use___ () function isn't interesting in this example, but it would
be if there had been a static declaration.

Here's what happens when I run it on *all* of the diffs:

ocotillo:ptests$ ./check-decls <declsdiff >prog.c
ocotillo:ptests$ wc prog.c
  50235  112228  960827 prog.c
ocotillo:ptests$ gcc -c -Wall prog.c
prog.c: In function `exit':
prog.c:39303: warning: function declared `noreturn' has a `return' statement
prog.c: At top level:
prog.c:45903: parse error before `arg_type'
prog.c: In function `value_primitive_field':
prog.c:45907: declaration for parameter `arg_type' but no such parameter
prog.c:45906: declaration for parameter `fieldno' but no such parameter
prog.c:45905: declaration for parameter `offset' but no such parameter
prog.c:45904: declaration for parameter `arg1' but no such parameter
prog.c:45908: argument `arg_type' doesn't match prototype
prog.c:5886: prototype declaration
prog.c:45908: argument `arg1' doesn't match prototype
prog.c:5886: prototype declaration

The `exit' warning is due to the fact that there's a declaration and
definition of exit() from standalone.c.  It is of no concern.

The error following this warning looks more serious.  Here's the declaration
and the definition of the function involved:

    value_ptr value_primitive_field (register value_ptr arg1, int offset,
			   register int fieldno, register struct type *arg_type);

    value_ptr
    value_primitive_field (arg1, offset, fieldno, arg_type)
	 register value_ptr arg1;
	 int offset;
	 register int fieldno;
	 register struct type *arg_type;
    {
      value_ptr ret;
      init___ (&ret);
      return ret;
    }

I looked at this for a long, long time and didn't see anything wrong.
Finally, I realized that arg_type was a type from a different file.
(Which is one of the problems with throwing everything into one big
pot.)  Anyway, here's the type that the script declared:

    typedef struct t44 { int f44; } arg_type;

And here's the (transformed) definition which caused it to be defined:

    bool_t
    xdr_arg_type(xdrs, objp)
	    XDR *xdrs;
	    arg_type *objp;
    {
      bool_t ret;
      init___ (&ret);
      return ret;
    }

So it turns out that it's nothing to worry about.

And that's it.  There are no other warnings or errors.  Which means
that the transformation was successful and didn't mess up any of
the parameter types.

The check-decls script is below.  One might argue that it is about as
complex as the fix-decls script.  This is true, but the code which
actually extracts the `-' and `+' lines is fairly simple.  Also, after
being extracted, there are no transformations made to these lines
aside from appending ___<num> to the function name if the script
detects that the function name has already been seen.  Most
importantly, the parameter lists are not rewritten in any way.

Most of the complexity is in the analysis and generation of the
type, struct, enum, and union declarations.  But uniqueness of
these is easy to verify.  Plus, if something is screwed up, the
compiler complains.

--- check-decls ---
#!/usr/bin/perl -w

# Feed this script a unidiff after running fix-decls and it generates
# (on stdout) a program which may be used to test the validity of the
# conversion.  Just run the result through gcc -Wall and if it
# generates any warnings, there's a problem...

undef $/;		# slurp mode
my $diff = <>;		# read entire diff in $diff;

my $decls = '';
my $defns = '';

my %userstructs = ();
my %userenums = ();
my %usertypes = ();
my %funcnames = ();
my $funcname_gensym = 0;		# for names that clash
my @needuse;

while ($diff =~
	/ (
	    ^ 				# beginning of line
	    [^\n]+			# everything til the end of line
	  )
	  \n				# newline
	  (
	    (?:
	      ^				#   beginning of line
	      -				#   minus sign
	      (?: \n			#   either just a newline
		|			#     -- or -- 
	          [^-\n]		#   any character but minus and newline
	          [^\n]*		#   the rest of the line
	          \n    		#   including the newline
	      )
	    )+				# one or more of the above
	  )
	  (
	    (?:
	      ^				#   beginning of line
	      \+			#   plus sign
	      [^+]			#   any character but plus
	      [^\n]*			#   the rest of the line
	      \n			#   including the newline
	    )+				# one or more of the above
	  )
	                                                           /mgx) {
    my ($rettype, $traddecl, $isodecl) = ($1, $2, $3);

    # Remove leading diff character from the lines extracted
    foreach ($rettype, $traddecl, $isodecl) {
	s/^.//mg;
    }

    # Find type names in parameter list
    my $parmdecls = $traddecl;
    $parmdecls =~ s/^\w+\s*\([^)]*\)//;
    foreach my $parm (split /\s*;\s*/, $parmdecls) {
	$parm =~ s/\s*\**\w+(,|$).*$//;
	analyze_type($parm);
    }

    # Resolve collisions between function name (either due to statics
    # or due to the names being in different branches of an ifdef)
    my ($funcname) = $traddecl =~ /^(\w+)/;
    if (defined $funcnames{$funcname}) {
	foreach ($traddecl, $isodecl) {
	    s/\b$funcname\b/${funcname}___$funcname_gensym/;
	}
	$funcname .= "___$funcname_gensym";
	$funcname_gensym++;
    }
    $funcnames{$funcname} = $funcname;

    # Nuke comments in the return type
    $rettype =~ s#/\*.*?\*/##g;

    # Nuke partial comment in return type
    $rettype =~ s#^.*?\*/##;

    # Eliminate ``CALLBACK'' from return type
    $rettype =~ s/\bCALLBACK\b//;

    # Eliminate ``extern'' from return type
    $rettype =~ s/\bextern\b//;

    # Eliminate leading and trailing spaces from return type
    $rettype =~ s/^\s*//;
    $rettype =~ s/\s*$//;

    if (($rettype =~ /^#/) || ($rettype eq '')) {
	# preprocessor line or empty string
	$rettype = 'int';
    } elsif ($rettype eq "static") {
	$rettype = 'static int';
    } elsif ($rettype eq "private") {
	$rettype = 'static int';
    } else {
	analyze_type($rettype);
    }

    $isodecl =~ s/\n\Z/;\n/;

    $decls .= "$rettype $isodecl";
    if ($rettype =~ /\bvoid$/) {
	$defns .= "$rettype\n$traddecl\{\n}\n\n";
    } else {
	$defns .= "$rettype\n$traddecl\{\n  $rettype ret;\n"
	       .  "  init___ (&ret);\n  return ret;\n}\n\n";
    }

    if ($rettype =~/\bstatic\b/) {
	push @needuse, $funcname;
    }
}


my $typeidx = 0;

foreach $key (sort keys %usertypes) {
    print "typedef struct t$typeidx { int f$typeidx; } $key;\n";
    $typeidx++;
}

foreach $key (sort keys %userstructs) {
    print "$key { int f$typeidx; };\n";
    $typeidx++;
}

foreach $key (sort keys %userenums) {
    print "$key { e$typeidx };\n";
    $typeidx++;
}

print "#define INLINE\n";
print "#define private\n";
print "#define CONST const\n";
print "#define NORETURN\n";
print "void init___ (void *);\n";

print $decls;
print "\n";
print $defns;

print "void\nuse___ (void)\n{\n";
foreach (@needuse) {
    print "  init___ ($_);\n";
}
print "}\n";

sub analyze_type {
    my ($parm) = @_;
    $parm =~ s/\s*\**\s*$//;
    my $type;
    if ($parm =~ /\b(struct|union)\b/) {
	$parm =~ s/\A.*\b(struct|union)\b/$1/s;
	$parm =~ s/\s*\**\s*\Z//s;
	$userstructs{$parm} = $parm;
    } elsif ($parm =~ /\b(enum)\b/) {
	$parm =~ s/\A.*\b(enum)\b/$1/s;
	$parm =~ s/\s*\**\s*\Z//s;
	$userenums{$parm} = $parm;
    } elsif ((($type) = $parm =~ /(\w+)$/)
	&& ($type !~ /^(int|char|long|short|unsigned|double
			   |register|void|const|static)$/x)) {
	$usertypes{$type} = $type;
    }
}
--- end check-decls ---