From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kevin Buettner To: Andrew Cagney Cc: gdb@sourceware.cygnus.com Subject: Re: Proposal: convert function definitions to prototyped form Date: Sat, 03 Jun 2000 00:13:00 -0000 Message-id: <1000603071258.ZM32408@ocotillo.lan> References: <1000602075018.ZM29997@ocotillo.lan> <3937816C.E66B9AE0@cygnus.com> X-SW-Source: 2000-06/msg00028.html On Jun 2, 7:42pm, Andrew Cagney wrote: > The only concern I have is, given the slightly more complex nature of > the script (compared to PARAMS) there is a possibility that the > conversion re-orders or re-types the argument list. With that in mind, > should a pre-cursor to this be to least have prototypes for all > (global?) functions (-Wmissing-prototypes?) or only do the conversion > when there is a prototype visible? I've given this matter a lot of thought. I agree that it would be desirable to have prototypes for all functions. Unfortunately, while it is easy to generate prototypes, it's not so easy to know where to stick them. Also, even if we had prototypes in place, there's no guarantee that we'd catch the errors after a few builds because I think there's some code in gdb (though I don't know how much) that never gets built! (Due to ifdefs and near obsolete ports.) What we really need is a method for vetting all of the changes immediately after running the script. I.e, we need to make sure that the conversion does no reordering or retyping of any argument list. Also, we need to make sure that the rewritten function declaration is syntactically correct. While examining the diffs (made with the -u switch) an idea occurred to me. Consider the following example: diff -ur ../../orig/gdb-sourceware/wrapper.c ./wrapper.c --- ../../orig/gdb-sourceware/wrapper.c Sat May 27 17:10:27 2000 +++ ./wrapper.c Thu Jun 1 23:33:16 2000 @@ -57,11 +57,8 @@ static int wrap_parse_and_eval_type (char *); int -gdb_parse_exp_1 (stringptr, block, comma, expression) - char **stringptr; - struct block *block; - int comma; - struct expression **expression; +gdb_parse_exp_1 (char **stringptr, struct block *block, int comma, + struct expression **expression) { struct gdb_wrapper_arguments args; args.args[0].pointer = stringptr; In the above diff, the lines prepended with `-' represent the original K&R definition. And the lines prepended with `+' represent the transformed code. Moreover, the diff is extremely regular in this respect. So... If you take the lines which begin with `+', prepend the type on the line before the `-' lines and tack a semicolon onto the end, you end up with a prototype declaration. And, if you take the lines beginning with `-', again tack the type onto the front and put a function body underneath it, you have a K&R style (traditional) function definition. Now if you put these into a file with the prototype first and the K&R definition later on, you can run "gcc -Wall" on it to see if any warnings are produced. Obviously, if we get warnings, we need to look closer to see if something went wrong with the fix-decls conversion. Of course, there are other details to consider, like making sure that all of the types, structs, unions, and enums are declared. Also, in a source tree as big as gdb, we'll likely wind up with a number of functions with the same name, so some method of disambiguating these will be necessary. And then of course, there's the matter of no declared return type and other oddments. I've written a script called ``check-decls'' which performs these transformations on the diff output. When I run it on the above diff, it produces the following output (indented by four spaces by me for readability) struct block { int f0; }; struct expression { int f1; }; #define INLINE #define private #define CONST const #define NORETURN void init___ (void *); int gdb_parse_exp_1 (char **stringptr, struct block *block, int comma, struct expression **expression); int gdb_parse_exp_1 (stringptr, block, comma, expression) char **stringptr; struct block *block; int comma; struct expression **expression; { int ret; init___ (&ret); return ret; } void use___ (void) { } The use___ () function isn't interesting in this example, but it would be if there had been a static declaration. Here's what happens when I run it on *all* of the diffs: ocotillo:ptests$ ./check-decls prog.c ocotillo:ptests$ wc prog.c 50235 112228 960827 prog.c ocotillo:ptests$ gcc -c -Wall prog.c prog.c: In function `exit': prog.c:39303: warning: function declared `noreturn' has a `return' statement prog.c: At top level: prog.c:45903: parse error before `arg_type' prog.c: In function `value_primitive_field': prog.c:45907: declaration for parameter `arg_type' but no such parameter prog.c:45906: declaration for parameter `fieldno' but no such parameter prog.c:45905: declaration for parameter `offset' but no such parameter prog.c:45904: declaration for parameter `arg1' but no such parameter prog.c:45908: argument `arg_type' doesn't match prototype prog.c:5886: prototype declaration prog.c:45908: argument `arg1' doesn't match prototype prog.c:5886: prototype declaration The `exit' warning is due to the fact that there's a declaration and definition of exit() from standalone.c. It is of no concern. The error following this warning looks more serious. Here's the declaration and the definition of the function involved: value_ptr value_primitive_field (register value_ptr arg1, int offset, register int fieldno, register struct type *arg_type); value_ptr value_primitive_field (arg1, offset, fieldno, arg_type) register value_ptr arg1; int offset; register int fieldno; register struct type *arg_type; { value_ptr ret; init___ (&ret); return ret; } I looked at this for a long, long time and didn't see anything wrong. Finally, I realized that arg_type was a type from a different file. (Which is one of the problems with throwing everything into one big pot.) Anyway, here's the type that the script declared: typedef struct t44 { int f44; } arg_type; And here's the (transformed) definition which caused it to be defined: bool_t xdr_arg_type(xdrs, objp) XDR *xdrs; arg_type *objp; { bool_t ret; init___ (&ret); return ret; } So it turns out that it's nothing to worry about. And that's it. There are no other warnings or errors. Which means that the transformation was successful and didn't mess up any of the parameter types. The check-decls script is below. One might argue that it is about as complex as the fix-decls script. This is true, but the code which actually extracts the `-' and `+' lines is fairly simple. Also, after being extracted, there are no transformations made to these lines aside from appending ___ to the function name if the script detects that the function name has already been seen. Most importantly, the parameter lists are not rewritten in any way. Most of the complexity is in the analysis and generation of the type, struct, enum, and union declarations. But uniqueness of these is easy to verify. Plus, if something is screwed up, the compiler complains. --- check-decls --- #!/usr/bin/perl -w # Feed this script a unidiff after running fix-decls and it generates # (on stdout) a program which may be used to test the validity of the # conversion. Just run the result through gcc -Wall and if it # generates any warnings, there's a problem... undef $/; # slurp mode my $diff = <>; # read entire diff in $diff; my $decls = ''; my $defns = ''; my %userstructs = (); my %userenums = (); my %usertypes = (); my %funcnames = (); my $funcname_gensym = 0; # for names that clash my @needuse; while ($diff =~ / ( ^ # beginning of line [^\n]+ # everything til the end of line ) \n # newline ( (?: ^ # beginning of line - # minus sign (?: \n # either just a newline | # -- or -- [^-\n] # any character but minus and newline [^\n]* # the rest of the line \n # including the newline ) )+ # one or more of the above ) ( (?: ^ # beginning of line \+ # plus sign [^+] # any character but plus [^\n]* # the rest of the line \n # including the newline )+ # one or more of the above ) /mgx) { my ($rettype, $traddecl, $isodecl) = ($1, $2, $3); # Remove leading diff character from the lines extracted foreach ($rettype, $traddecl, $isodecl) { s/^.//mg; } # Find type names in parameter list my $parmdecls = $traddecl; $parmdecls =~ s/^\w+\s*\([^)]*\)//; foreach my $parm (split /\s*;\s*/, $parmdecls) { $parm =~ s/\s*\**\w+(,|$).*$//; analyze_type($parm); } # Resolve collisions between function name (either due to statics # or due to the names being in different branches of an ifdef) my ($funcname) = $traddecl =~ /^(\w+)/; if (defined $funcnames{$funcname}) { foreach ($traddecl, $isodecl) { s/\b$funcname\b/${funcname}___$funcname_gensym/; } $funcname .= "___$funcname_gensym"; $funcname_gensym++; } $funcnames{$funcname} = $funcname; # Nuke comments in the return type $rettype =~ s#/\*.*?\*/##g; # Nuke partial comment in return type $rettype =~ s#^.*?\*/##; # Eliminate ``CALLBACK'' from return type $rettype =~ s/\bCALLBACK\b//; # Eliminate ``extern'' from return type $rettype =~ s/\bextern\b//; # Eliminate leading and trailing spaces from return type $rettype =~ s/^\s*//; $rettype =~ s/\s*$//; if (($rettype =~ /^#/) || ($rettype eq '')) { # preprocessor line or empty string $rettype = 'int'; } elsif ($rettype eq "static") { $rettype = 'static int'; } elsif ($rettype eq "private") { $rettype = 'static int'; } else { analyze_type($rettype); } $isodecl =~ s/\n\Z/;\n/; $decls .= "$rettype $isodecl"; if ($rettype =~ /\bvoid$/) { $defns .= "$rettype\n$traddecl\{\n}\n\n"; } else { $defns .= "$rettype\n$traddecl\{\n $rettype ret;\n" . " init___ (&ret);\n return ret;\n}\n\n"; } if ($rettype =~/\bstatic\b/) { push @needuse, $funcname; } } my $typeidx = 0; foreach $key (sort keys %usertypes) { print "typedef struct t$typeidx { int f$typeidx; } $key;\n"; $typeidx++; } foreach $key (sort keys %userstructs) { print "$key { int f$typeidx; };\n"; $typeidx++; } foreach $key (sort keys %userenums) { print "$key { e$typeidx };\n"; $typeidx++; } print "#define INLINE\n"; print "#define private\n"; print "#define CONST const\n"; print "#define NORETURN\n"; print "void init___ (void *);\n"; print $decls; print "\n"; print $defns; print "void\nuse___ (void)\n{\n"; foreach (@needuse) { print " init___ ($_);\n"; } print "}\n"; sub analyze_type { my ($parm) = @_; $parm =~ s/\s*\**\s*$//; my $type; if ($parm =~ /\b(struct|union)\b/) { $parm =~ s/\A.*\b(struct|union)\b/$1/s; $parm =~ s/\s*\**\s*\Z//s; $userstructs{$parm} = $parm; } elsif ($parm =~ /\b(enum)\b/) { $parm =~ s/\A.*\b(enum)\b/$1/s; $parm =~ s/\s*\**\s*\Z//s; $userenums{$parm} = $parm; } elsif ((($type) = $parm =~ /(\w+)$/) && ($type !~ /^(int|char|long|short|unsigned|double |register|void|const|static)$/x)) { $usertypes{$type} = $type; } } --- end check-decls ---