From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2623 invoked by alias); 3 May 2005 20:17:02 -0000 Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-owner@sources.redhat.com Received: (qmail 2055 invoked from network); 3 May 2005 20:16:24 -0000 Received: from unknown (HELO sibelius.xs4all.nl) (82.92.89.47) by sourceware.org with SMTP; 3 May 2005 20:16:24 -0000 Received: from elgar.sibelius.xs4all.nl (root@elgar.sibelius.xs4all.nl [192.168.0.2]) by sibelius.xs4all.nl (8.13.0/8.13.0) with ESMTP id j43KD2OT007778; Tue, 3 May 2005 22:13:02 +0200 (CEST) Received: from elgar.sibelius.xs4all.nl (kettenis@localhost.sibelius.xs4all.nl [127.0.0.1]) by elgar.sibelius.xs4all.nl (8.13.4/8.13.3) with ESMTP id j43KD1LU001822; Tue, 3 May 2005 22:13:01 +0200 (CEST) Received: (from kettenis@localhost) by elgar.sibelius.xs4all.nl (8.13.4/8.13.4/Submit) id j43KD1dD005239; Tue, 3 May 2005 22:13:01 +0200 (CEST) Date: Tue, 03 May 2005 20:17:00 -0000 Message-Id: <200505032013.j43KD1dD005239@elgar.sibelius.xs4all.nl> From: Mark Kettenis To: gdb@sourceware.org CC: cagney@gnu.org, eliz@gnu.org In-reply-to: <42778DE6.1080106@gnu.org> (message from Andrew Cagney on Tue, 03 May 2005 10:42:46 -0400) Subject: A case for `void *' for pointers to arbitrary (byte) buffers References: <42710E90.3030300@gnu.org> <200504281919.j3SJJKF1011501@elgar.sibelius.xs4all.nl> <42715EE8.5070704@gnu.org> <01c54c8a$Blat.v2.4$ffbe8140@zahav.net.il> <42753958.70109@gnu.org> <01c54e92$Blat.v2.4$5cf24460@zahav.net.il> <42755FD4.8000009@gnu.org> <01c54f4a$Blat.v2.4$a9fc8500@zahav.net.il> <42778DE6.1080106@gnu.org> X-SW-Source: 2005-05/txt/msg00036.txt.bz2 Date: Tue, 03 May 2005 10:42:46 -0400 From: Andrew Cagney ... and such debate belongs on gdb@. Presumably Mark will respond to your proposal. OK, here it is. The Problem ----------- In many places GDB needs to pass along and pick apart blobs of memory. In many cases, GDB does this using `char *' pointers. Small blobs of memory are often allocated on the stack as `char[]' or `unsigned char[]'. Unfortunately ISO C doesn't specify whether `char' is signed or unsigned. On most systems, char is signed, but on some systems (most notably powerpc) it is unsigned. This leads to many platform-dependent type conversion bugs in code. Since we have been sloppy about the (un)signedness of char in the GDB codebase, there probably are quite a few of these bugs in GDB. Fortunately recent versions of GCC have started to warn about suspicous type conversions. But because of our sloppyness, the current GDB doesn't compiler with -Werror on systems with this new version of GCC. Compiling without -Werror will hide many bugs in our codebase, so we should fix this in GDB. The proper pointer type for arbitrary buffers --------------------------------------------- Looking at ISO C and in particular the ISO C standard library, we notice that there are two families of functions that deal with blobs of memory: * Functions starting with mem, such as memcpy(), memset(), etc. These functions all use the `void *' type to refer to the blobs of memory they operate on. * Functions starting with str, such as strcpy(), strlen(), etc. These functions don't really operate on arbitrary blobs of memory, but rather on NUL-terminated character strings. They all use the `char *' type to refer to these strings. So from a language standpoint of view we should use `char *' for NUL-terminated sequences of bytes and `void *' for arbitrary sequences of bytes. A nice thing about `void *' is that you can implicitly cast it to every other pointer type in the world. Why doesn't GDB use `void *'? ----------------------------- Well, it does, in some places. But the vast majority of our codebase predates ISO C, or at least GDB was supposed to be compiled on compilers that predate ISO C. We couldn't count on `void *' being part of the standard. And even after we made an ISO C compiler a requirement for compiling GDB new code still followed the bad examples from the K&R era. And of course there is also some laziness involved. One of the good things about `void *' pointers is that you can't do any pointer arithmetic with them. So if you want to do pointer arithmetic (and GDB has some genuine need to do that) you'll have to explicitly cast to the right pointer type. This is a good thing. This forces you to think about what you're doing. In the cases where it matters. An example ---------- An example of a good GDB interface where the above applies is extract_unsigned_integer(). Its full prototype is: LONGEST extract_unsigned_integer (const void *addr, int len); Its implementation converts the `void *' pointer into an `unsigned char *' pointer before doing pointer arithmetic. An example of a bad interface is read_memory(). Its full prototype is: void read_memory (CORE_ADDR memaddr, char *myaddr, int len) This function reads LEN bytes from MEMADDR and stores them in MYADDR. But MYADDR is not necessarily a NUL-terminated string. The second argument should be `void *'. Recently a few `struct value'-related functions were converted from using `char *' to `bfd_byte *'. If we take for example unpack_long(): LONGEST unpack_long(struct type *, const bfd_byte *valaddr); Its interface is very similar to extract_unsigned_integer(), in fact it is implemented using extract_unsigned_integer(). Therefore it should use the same pointer type as extract_unsigned_integer(): `void *'. Do we need a `bfd_byte *' or `gdb_byte *'? ------------------------------------------ We might. Many implementations will need to cast to some sort of byte type to do arithmetic or look at individual bytes. As we have seen, using can be `char *' is problematic. In most (problematic) cases we really mean `unsigned char', only a few will warrant `signed char'. And `bfd_byte *' is a nice abbreviation. Personally I'm perfectly happy with writing `unsigned char'. IMHO it has the benefit that it is perfectly clear that this type is unsigned, even to someone who has never been exposed to the GDB codebase. Why not use `xxx_byte *' instead of `void *'? --------------------------------------------- * It's nonstandard. Why do we need a nonstandard type if a perfectly god standard type is available? * It doesn't really solve the issue. It only propagates the problem one level up or down. Since `xxx_byte *' is nothing but a typedef for `unsigned char *', someone calling a functions with `xxx_byte *' as one of its arguments with a `char *' argument will suffer from the warning that raised this entire issue; `void *' breaks the chain immediately. Conclusion ---------- It's clear that we should work actively towards reserving the unqualified `char *' for strings, and probably never ever use an unqualified `char' as a data type (the C type of a single character is `int'). Since GCC 4.0 has been released and will probably be used by many in a few months time, we should solve this issue now, and preferably in our complete codebase. Hopefully we can get consesus soon, and start taking this issue into account when we review patches. Mark