Re: Big-endian Gcc on Intel IA32

public inbox for gcc@gcc.gnu.org

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-23  7:26 dewar
  0 siblings, 0 replies; 30+ messages in thread
From: dewar @ 2001-12-23  7:26 UTC (permalink / raw)
  To: dewar, fw; +Cc: gcc, torvalds

<<Yes, but the interesting case (at least if you have to match a given
external representation) is not addressed:
>>

Well if you are somehow reading 8 bit values through an interface to
a 36-bit machine, then you have to define that separately. That has nothing
whatsoever to do with this discussion.

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-23  7:06 dewar
  2001-12-23  7:08 ` Florian Weimer
  0 siblings, 1 reply; 30+ messages in thread
From: dewar @ 2001-12-23  7:06 UTC (permalink / raw)
  To: dewar, fw; +Cc: gcc, torvalds

<<I wouldn't tackle this problem at the struct level, but at the
discrete type level, IOW introduce additional integer types with
different representation.  This would already greatly help in many cases.
>>

Yes, I already suggested this, and noted that this was what we did
in Realia COBOL (see archives)

<<If you've got signed-magnitude representation, you've got plenty of
positions in which you can place the sign bit.
>>

There are no S&M machines, so this is bogus. There are 1's complement machines,
but the issue is not affected by 1's or 2's complement. Even for S&M, the
sign bit was always the most significant, so you are inventing a nonexistent
problem here.

<<If the machine is word-addressed, all we do in this regard won't help
much to increase portability because a lot of data structures with a
given external representation assume you can access individual octets,
and the mapping to a useful machine implementation is certainly not
straightforward.  For example, how does an IP header look on a 36 bit
machine?
>>

The point is that it is quite straightforward to address the problem WITHIN
an address unit. Ada already does this. Have a look at what GNAT implements
here with the Bit_Order attribute (and also see the discussion of why it
is not easy to do more). This is in the GNAT RM.

* Re: Big-endian Gcc on Intel IA32
  2001-12-23  7:06 dewar
@ 2001-12-23  7:08 ` Florian Weimer
  0 siblings, 0 replies; 30+ messages in thread
From: Florian Weimer @ 2001-12-23  7:08 UTC (permalink / raw)
  To: dewar; +Cc: gcc, torvalds

dewar@gnat.com writes:

> The point is that it is quite straightforward to address the problem WITHIN
> an address unit. Ada already does this. Have a look at what GNAT implements
> here with the Bit_Order attribute (and also see the discussion of why it
> is not easy to do more). This is in the GNAT RM.

Yes, but the interesting case (at least if you have to match a given
external representation) is not addressed:

| If byte flipping is required for interoperability between big- and
| little-endian machines, this must be explicitly programmed. This
| capability is not provided by @code{Bit_Order}.

* RE: Big-endian Gcc on Intel IA32
@ 2001-12-20  5:36 Etienne Lorrain
  0 siblings, 0 replies; 30+ messages in thread
From: Etienne Lorrain @ 2001-12-20  5:36 UTC (permalink / raw)
  To: gcc

  Just a (maybe meaningless) comment on attributes:

 When you begin to describe your variables with __attribute__(()),
 for instance to optimise code size or speed, you also feel a need
 to test if the variable has such-and-such attributes, for instance in
 inline functions/macro. Let's take the example of align/alignof:

  struct longstruct mydata __attribute__((aligned(16)));

  static const struct longstruct emptydata = {};

  void init_mydata (struct longstruct *mydataptr)
  {
    if (alignof (emptydata) > 4)
      fast_memcpy(...);
    else
      slow_memcopy(...);
  }

  Maybe a more generic way to define and test attributes, at least
 for structure types, would be something like:

  typedef struct {
      unsigned field1, field2;
      char field3;
      const unsigned __aligned__ = 16; /* not counted in sizeof() */
      } an_aligned_16_type;

  an_aligned_16_type data;

  { if (data.__aligned__ == 16) {} else {} }

  In the same spirit:
  typedef struct {
      const unsigned __packed__ = 1;
      unsigned short d1;
      unsigned long d2;
      }

  typedef struct {
      const unsigned __segment__ = __segment_gs;
      unsigned char red;
      unsigned char green;
      unsigned char blue;
      } what_I_need_for_another_project;

  void fct (void)
  {
  const unsigned __aligned__ = 64;

  ....
  }

  The programmer could also define his own attributes.
  Would also not break too much "indent" and "lint" kind of software.

  No, I do not have the time to implement, sorry.
  Etienne.

* RE: Big-endian Gcc on Intel IA32
@ 2001-12-19 11:47 Bernard Dautrevaux
  2001-12-19 13:09 ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Bernard Dautrevaux @ 2001-12-19 11:47 UTC (permalink / raw)
  To: 'Linus Torvalds', Morten Welinder; +Cc: gcc

> -----Original Message-----
> From: Linus Torvalds [mailto:torvalds@transmeta.com]
> Sent: Tuesday, December 18, 2001 11:28 PM
> To: Morten Welinder
> Cc: gcc@gcc.gnu.org
> Subject: Re: Big-endian Gcc on Intel IA32
> 

	<skipped>

> 
> (To the person suggesting how to do it in C++ - you _can_ get 
> a subset of
> this in C by the above "embed in a structure" trick).
> 

Just can't resist comment on this. In fact in C++ you can do a lot more; you
can also define all the needed and meaningful operators on this type, so
that you may, for example, add or subtract an int from it, but not add two
of these or multiply them.

Using this C++ "trick" (if you want to call it that way; in fact it's
standard C++ coding practice) you can quite simply define all the little
endian scalar types, even specifying they are not aligned like the host
processor may expect, and use them as if they were native types.

Then you only have to define the variable or struct field as little-endian
and you will always swap bytes when reading/writing without having to add
htonl() around each operation. Moreover on some processors, like PowerPC,
you in fact have "load and reverse" and "store and reverse" instruction that
you can use in the inline asm instruction implementing the "convert to
native type" and "assign native type" operations, so that the performance
penalty is in fact very small.
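
A minimal sketch of what one such little-endian scalar class might look
like, assuming a 32-bit unsigned field. The name `le_uint32` and the
`device_regs` layout are hypothetical (this is not the SoftKernel code),
and portable shifts stand in for the PowerPC byte-reversing load and
store instructions mentioned above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical little-endian 32-bit scalar. The value is kept as four
// bytes, least significant first, whatever the host byte order is.
// Storing bytes also means the type has no alignment requirement.
class le_uint32 {
public:
    le_uint32() : bytes_{0, 0, 0, 0} {}
    le_uint32(std::uint32_t v) { *this = v; }

    // "Assign native type": split the host value into little-endian bytes.
    le_uint32& operator=(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            bytes_[i] = static_cast<unsigned char>(v >> (8 * i));
        return *this;
    }

    // "Convert to native type": reassemble the host value from the bytes.
    operator std::uint32_t() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(bytes_[i]) << (8 * i);
        return v;
    }

    // Raw view of the stored bytes, for inspection only.
    const unsigned char* raw() const { return bytes_; }

private:
    unsigned char bytes_[4];
};

// Hypothetical register block of a little-endian device.
struct device_regs {
    le_uint32 control;
    le_uint32 status;
};

// Usage: fields read and write like native integers on any host.
inline void usage_example() {
    device_regs regs;
    regs.control = 0x11223344u;
    assert(regs.control.raw()[0] == 0x44);   // least significant byte first
    std::uint32_t v = regs.control;          // converted back on load
    assert(v == 0x11223344u);
}
```

Because the conversions are implicit, existing expressions that read or
write such a field compile unchanged; only the declaration of the field
has to say "little-endian".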

I personally have defined such a set of classes for our Object Oriented
Real Time Kernel (SoftKernel) and use it on the PowerPC or 68xxx processors
to access PCI bridges, x86-family devices or USB data structures and
devices, in a way that allows the exact same source code to also be compiled
for an x86 processor and work identically in big-endian and little-endian
environments.

The only thing this does not address is bit-fields. We decided in fact to
handle them manually to be sure of the result, and always choose the
underlying scalar type to ensure that we never have "split-fields".

Just my (obviously C++ biased) .02euros

	Bernard

--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel:	+33 (0) 1 47 68 80 80
Fax:	+33 (0) 1 47 88 97 85
e-mail:	dautrevaux@microprocess.com
		b.dautrevaux@usa.net
-------------------------------------------- 

* RE: Big-endian Gcc on Intel IA32
  2001-12-19 11:47 Bernard Dautrevaux
@ 2001-12-19 13:09 ` Linus Torvalds
  0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2001-12-19 13:09 UTC (permalink / raw)
  To: Bernard Dautrevaux; +Cc: Morten Welinder, gcc

On Wed, 19 Dec 2001, Bernard Dautrevaux wrote:
> >
> > (To the person suggesting how to do it in C++ - you _can_ get a subset of
> > this in C by the above "embed in a structure" trick).
>
> Just can't resist comment on this. In fact in C++ you can do a lot more; you
> can also define all the needed and meaningful operators on this type, so
> that you may, for example, add or subtract an int from it, but not add two
> of these or multiply them.

Agreed. And to do the same in C you need to start actually adding
syntactic elements, ie you can't just overload the "+" operator, you have
to manually add "my_add(a,b)".

The basic point I wanted to make was, however, the ease of retrofitting
and tracking (and, potentially fixing) existing code that depends on some
special "attribute". It gets even worse if you have multiple independent
attributes that can potentially be mixed (ie same type of element, but
different restrictions on the element).

It doesn't have to be in the compiler, of course. There are off-line tools
(both free and commercial) to track semantics, eg code viewers etc that
are able to track the flow of data through the system. And maybe that is
fundamentally the right approach.

Having a way of tagging data structures and getting the compiler to check
them statically at compile-time is historically useful, though. That is,
after all, exactly what function prototypes are - a "tag" on the function
that specifies what kinds of arguments it accepts (and one that you cannot
remove without quite explicit casting).

		Linus

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-18 11:41 Morten Welinder
  2001-12-18 11:42 ` Phil Edwards
  2001-12-18 14:48 ` Linus Torvalds
  0 siblings, 2 replies; 30+ messages in thread
From: Morten Welinder @ 2001-12-18 11:41 UTC (permalink / raw)
  To: gcc; +Cc: torvalds

Linus Torvalds <torvalds at transmeta dot com> wrote...

> (Inside the kernel, I'd love to be able to taint pointers and data that
> came from user space, for example, to make sure that the compiler will
> refuse to even _compile_ code that uses such data without the proper
> safety checks. This is not all that different from keeping track of what
> byte-order a specific datum has).

I would guess that you can do this with C++.

Now I realize that you are not about to rewrite the kernel in C++
(unless you have sampled a bit too much Glogg recently, :-)  What
I am saying is that you could probably make minor changes to the
current source code such that...

1. Its C interpretation does not change.
2. Its C++ interpretation would have a user_data* type and do the
   check you ask for.

I.e., you could use g++ as the San Jose checker, ;-)  The .o files
would not be any good, of course.

I seem to remember that once upon a time you said that you wanted
type int when you deal with ints (as opposed to having some typedef
name like off_t).  If that is still true, I guess you will not like
this kind of approach.

Morten

* Re: Big-endian Gcc on Intel IA32
  2001-12-18 11:41 Morten Welinder
@ 2001-12-18 11:42 ` Phil Edwards
  2001-12-18 14:48 ` Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Phil Edwards @ 2001-12-18 11:42 UTC (permalink / raw)
  To: Morten Welinder; +Cc: gcc, torvalds

On Tue, Dec 18, 2001 at 07:26:58PM -0000, Morten Welinder wrote:
> I.e., you could use g++ as the San Jose checker, ;-)  The .o files
> would not be any good, of course.

With -fsyntax-only there wouldn't even be any .o files.


Phil

-- 
If ye love wealth greater than liberty, the tranquility of servitude greater
than the animating contest for freedom, go home and leave us in peace.  We seek
not your counsel, nor your arms.  Crouch down and lick the hand that feeds you;
and may posterity forget that ye were our countrymen.            - Samuel Adams

* Re: Big-endian Gcc on Intel IA32
  2001-12-18 11:41 Morten Welinder
  2001-12-18 11:42 ` Phil Edwards
@ 2001-12-18 14:48 ` Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2001-12-18 14:48 UTC (permalink / raw)
  To: Morten Welinder; +Cc: gcc

On 18 Dec 2001, Morten Welinder wrote:
> Now I realize that you are not about to rewrite the kernel in C++
> (unless you have sampled a bit too much Glogg recently, :-)  What
> I am saying is that you could probably make minor changes to the
> current source code such that...
>
> 1. Its C interpretation does not change.
> 2. Its C++ interpretation would have a user_data* type and do the
>    check you ask for.

Well, there is actually a project (the "Stanford checker") which goes even
further than this, and does an instrumented gcc back-end, where you can
add a lot of almost arbitrary rules on what constitutes tainting.

The problem with it is that it's not automatic, and it doesn't give the
kind of "immediate feedback" as a direct compiler warning or error does.

Your suggestion of using C++ as a separate checker is not really much more
than a very cut-down version of the (quite interesting) Stanford project.
It might make it slightly easier for people to check, but it's not quite
there..

> I seem to remember that once upon a time you said that you wanted
> type int when you deal with ints (as opposed to having some typedef
> name like off_t).  If that is still true, I guess you will not like
> this kind of approach.

I don't like abstraction for abstraction's sake - a lot of people seem to
want to abstract things just because they _can_, not because it makes any
real sense. For example, POSIX wanted to abstract the "length of a socket
name", and created "socklen_t", which simply _has_ to be the same as "int"
if you are going to be compatible with historical uses (and has to support
all the same operations etc, so it's not an opaque type in any case). That
is a useless abstraction - you're not actually adding information, you're
only adding chaos.

However, in other cases it can be quite useful to specify an "immutable"
type. There are many cases where you have basically integer types, but
they are integers that have implicit meaning, and doing arithmetic on them
is a nonsensical operation.

Linux actually ends up embedding some of these as unique structures, just
because that's the only way to strengthen the C type set. It does
sometimes impact code quality (gcc seems to be better at returning
integers than returning integers wrapped in a structure etc), but not by
much, and the abstraction you get in these cases is definitely worth it.

However, creating a new structure type for each thing is actually quite a
lot of effort, and gets tedious. So just a set of "taint bits" would be
syntactically easier.

(To the person suggesting how to do it in C++ - you _can_ get a subset of
this in C by the above "embed in a structure" trick).
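
For what it's worth, the struct-embedding trick looks roughly like this.
A minimal sketch, not actual Linux code: `block_nr_t`, `inode_nr_t` and
the helper names are made up for illustration, and the code sticks to the
subset that compiles as both C and C++:

```cpp
#include <assert.h>

/* Two hypothetical kinds of number that must never be mixed. Each gets
   its own one-member struct, so the compiler sees two distinct types. */
typedef struct { unsigned long val; } block_nr_t;
typedef struct { unsigned long val; } inode_nr_t;

/* Explicit constructors and accessors are the only way in and out. */
static inline block_nr_t make_block_nr(unsigned long v)
{
    block_nr_t b = { v };
    return b;
}

static inline inode_nr_t make_inode_nr(unsigned long v)
{
    inode_nr_t i = { v };
    return i;
}

static inline unsigned long block_nr_val(block_nr_t b) { return b.val; }
static inline unsigned long inode_nr_val(inode_nr_t i) { return i.val; }

/* A function that accepts block numbers and nothing else. */
static inline block_nr_t next_block(block_nr_t b)
{
    return make_block_nr(block_nr_val(b) + 1);
}

/* Usage. Both commented-out lines are compile-time errors:
     next_block(make_inode_nr(7));          wrong struct type
     make_block_nr(1) + make_block_nr(2);   no arithmetic on structs */
static inline void usage_example(void)
{
    block_nr_t b = next_block(make_block_nr(41));
    assert(block_nr_val(b) == 42);
}
```

The tedium is visible even here: every new kind of value needs its own
typedef, constructor and accessor.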

		Linus

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-18  3:49 dewar
  2001-12-23  6:59 ` Florian Weimer
  0 siblings, 1 reply; 30+ messages in thread
From: dewar @ 2001-12-18  3:49 UTC (permalink / raw)
  To: dewar, fw; +Cc: gcc, torvalds

<<Yes, in the general case, this is of course right.  But for two's
complement, octet-addressed machines, endian representation clauses for
discrete types could be implemented without major problems, I think,
at least if you restrict yourself to the little/big endian case and
ignore PDP endian.
>>

I will just give one example of a problem. What do you do with a type
which is a union, one branch of which is a four byte integer, the other
branch is two two-byte integers. Here is another problem, do you want
to allow non-contiguous fields in the case of bit field specifications.
Again, I refer people to the Norm Cohen paper on this subject. Yes, it
is possible that you can find a restricted set of cases you can deal
with at the struct level, but Florian's suggestion that the only
restriction necessary is 2-s complement (what's that got to do with
the problem???) and octet-addressed machines (what's that got to do with
the problem--it's easier to deal with this problem on word addressed machines)
is flawed.
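
To make the union example concrete, here is a minimal sketch. The helper
names are hypothetical, and raw byte arrays stand in for the union's
storage. Which conversion is right depends on which branch is live, and
a byte-order attribute attached to the declaration cannot know that:

```cpp
#include <cassert>
#include <utility>

// Correct conversion if the four bytes hold ONE four-byte integer:
// reverse all four bytes.
inline void swap_as_one_word(unsigned char b[4]) {
    std::swap(b[0], b[3]);
    std::swap(b[1], b[2]);
}

// Correct conversion if the same four bytes hold TWO two-byte integers:
// reverse each pair separately.
inline void swap_as_two_halves(unsigned char b[4]) {
    std::swap(b[0], b[1]);
    std::swap(b[2], b[3]);
}

// Usage: the same storage ends up different under the two conversions.
inline void usage_example() {
    unsigned char w[4] = {1, 2, 3, 4};
    unsigned char h[4] = {1, 2, 3, 4};
    swap_as_one_word(w);     // w is now 4 3 2 1
    swap_as_two_halves(h);   // h is now 2 1 4 3
    assert(w[0] == 4 && h[0] == 2);
}
```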

* Re: Big-endian Gcc on Intel IA32
  2001-12-18  3:49 dewar
@ 2001-12-23  6:59 ` Florian Weimer
  0 siblings, 0 replies; 30+ messages in thread
From: Florian Weimer @ 2001-12-23  6:59 UTC (permalink / raw)
  To: dewar; +Cc: gcc, torvalds

dewar@gnat.com writes:

> I will just give one example of a problem. What do you do with a type
> which is a union, one branch of which is a four byte integer, the other
> branch is two two-byte integers.

This probably indicates a bug in the program, so the compiler should
reject it.

> Here is another problem, do you want to allow non-contiguous fields
> in the case of bit field specifications.

Hmm, I assume that this is part of the solution of the general
case. ;-)

> Yes, it is possible that you can find a restricted set of cases you
> can deal with at the struct level,

I wouldn't tackle this problem at the struct level, but at the
discrete type level, IOW introduce additional integer types with
different representation.  This would already greatly help in many cases.

> but Florian's suggestion that the only restriction necessary is 2-s
> complement (what's that got to do with the problem???)

If you've got signed-magnitude representation, you've got plenty of
positions in which you can place the sign bit.

> and octet-addressed machines (what's that got to do with the
> problem--it's easier to deal with this problem on word addressed
> machines)

If the machine is word-addressed, all we do in this regard won't help
much to increase portability because a lot of data structures with a
given external representation assume you can access individual octets,
and the mapping to a useful machine implementation is certainly not
straightforward.  For example, how does an IP header look on a 36 bit
machine?

> is flawed.

Maybe. ;-)

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-17 18:39 dewar
  2001-12-17 18:59 ` Per Bothner
  0 siblings, 1 reply; 30+ messages in thread
From: dewar @ 2001-12-17 18:39 UTC (permalink / raw)
  To: guerby, torvalds; +Cc: dewar, gcc

> Too bad about the algol syntax and all the overkill features (hey, I think
> C++ is complex, Ada is so far off the scale that it's not even funny).

Well this is not the place for language wars, but in practice those who know
Ada well and C++ well typically find them to be languages of similar
complexity (with perhaps a little bias showing one way or the other), but
a judgment that Ada is "so far off the scale" is one that in my experience
only comes from those who do not know Ada well, or even at all (to be fair
there are plenty of Ada folks who say dubious things about C++ without knowing
the language -- it seems to be a regrettable tendency in the programming
language area for people to have strong opinions about languages they don't
know. As a COBOL expert, I encounter that phenomenon all the time :-) :-)

Anyway, as I say, I don't think that this is the place for language wars,
and flame bait statements like the one quoted here :-)

What I *do* think we can usefully do is to learn from useful features in
the various languages, especially when we are discussing adding new features
to an existing language. In this particular case, I would really like to
see GNU C (and g++) implement some of the Ada features in the data
representation area for two reasons:

  1. These are really useful in controlling layout of data, and dealing with
  such issues as bit order etc.

  2. If this handling was in the back end, it would be more efficient, and
  would allow us to rip a lot of junk out of the Ada front end.

A candidate is indeed some kind of assistance for handling endianness. What
we did in the Realia COBOL compiler, which certainly is helpful and does
not raise awkward language issues, is to simply define the equivalent of
an attribute that applies to an integral type which says what endianness
it has (in Realia COBOL, COMP-4 was IBM compatible big endian, and COMP-5
was Intel little-endian). That's quite easy to implement as a gcc attribute
and would be very useful. This would avoid getting into the more complex
record layout issues.

Another candidate would be bit packed arrays. In practice this is a very
useful feature. Currently the circuitry for bit packing is in the front end
of the Ada compiler, but it would be nice to move it into the back end so
that other GNU languages could take advantage of it.

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 18:39 dewar
@ 2001-12-17 18:59 ` Per Bothner
  0 siblings, 0 replies; 30+ messages in thread
From: Per Bothner @ 2001-12-17 18:59 UTC (permalink / raw)
  To: dewar; +Cc: gcc

dewar@gnat.com wrote:

> Another candidate would be bit packed arrays. In practice this is a very
> useful feature. Currently the circuitry for bit packing is in the front end
> of the Ada compiler, but it would be nice to move it into the back end so
> that other GNU languages could take advantage of it. 

Chill also has/had 'bitstrings' and 'powersets' (i.e. what Pascal calls 
'sets'), both of which are basically bit-packed arrays of booleans.
The latter uses the SET_TYPE tree code, and we added some support in
store_constructor and a couple of other places, but mostly we just
generated function calls.  Real back-end support would have made it
easier to generate better code.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-17 13:14 dewar
  2001-12-17 13:42 ` guerby
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: dewar @ 2001-12-17 13:14 UTC (permalink / raw)
  To: dewar, gcc, torvalds

<<One thing that might be helpful for portability issues like this, where
the user _is_ willing and able to recompile the application, but maybe
not able to find all subtle uses of byte-order dependencies would be to
allow the notion of "byte order attributes" on data structures.
>>

This is a much trickier language feature to design than you would imagine.
We have been struggling with this in Ada for a while.

<<Imagine being able to just tag the data structures with "this data
structure is big-endian", and have the compiler automatically do the
conversion when a value is loaded from such a data structure.
>>

It's nice to imagine, but hard to work out the details.
I am definitely not opposed to this, and indeed support an effort
to try to design such a feature. A good starting point for reading
is Norm Cohen's tutorial on the Bit_Order attribute in Ada (not sure
what the reference is, perhaps someone else can supply it).

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:14 dewar
@ 2001-12-17 13:42 ` guerby
  2001-12-17 13:43 ` Linus Torvalds
  2001-12-18  1:28 ` Florian Weimer
  2 siblings, 0 replies; 30+ messages in thread
From: guerby @ 2001-12-17 13:42 UTC (permalink / raw)
  To: dewar; +Cc: dewar, gcc, torvalds

> A good starting point for reading is Norm Cohen's tutorial on the
> Bit_Order attribute in Ada (not sure what the reference is, perhaps
> someone else can supply it).

Google is our friend:

<http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00133.TXT>

<http://pebbles.ocsystems.com/~acats/ai-files/grab_bag/bitorder.pdf>

-- 
Laurent Guerby <guerby@acm.org>

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:14 dewar
  2001-12-17 13:42 ` guerby
@ 2001-12-17 13:43 ` Linus Torvalds
  2001-12-17 14:22   ` guerby
                     ` (2 more replies)
  2001-12-18  1:28 ` Florian Weimer
  2 siblings, 3 replies; 30+ messages in thread
From: Linus Torvalds @ 2001-12-17 13:43 UTC (permalink / raw)
  To: dewar; +Cc: gcc

On Mon, 17 Dec 2001 dewar@gnat.com wrote:
>
> This is a much trickier language feature to design than you would imagine.
> We have been struggling with this in Ada for a while.

Hmm.. It sounds like one of those "obvious in principle" things, but I can
imagine that it falls afoul of a lot of the gcc optimizations (ie x86.md
has a pattern for doing "load + and $255" with a "movzbl" instruction,
which is legal only on little-endian data: on big-endian you can still do
it, but you have to modify the address).

That's just the _really_ obvious kind of problem I can imagine off-hand. I
assume you've seen many many more..
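
For concreteness, a minimal sketch of that address adjustment. The helper
names are made up, and byte arrays with an explicit byte order stand in
for memory so the result does not depend on the host. Masking a 32-bit
value with 255 selects its least significant byte, which sits at offset 0
in little-endian storage and at offset 3 in big-endian storage:

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit value least significant byte first.
inline void store_le32(unsigned char* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        p[i] = static_cast<unsigned char>(v >> (8 * i));
}

// Store a 32-bit value most significant byte first.
inline void store_be32(unsigned char* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        p[i] = static_cast<unsigned char>(v >> (8 * (3 - i)));
}

// "load + and $255" folded into a single byte load: the byte to fetch
// is at the base address for little-endian data ...
inline unsigned low_byte_le(const unsigned char* p) { return p[0]; }

// ... and three bytes further on for big-endian data.
inline unsigned low_byte_be(const unsigned char* p) { return p[3]; }

// Usage: both single-byte loads agree with the full load-and-mask.
inline void usage_example() {
    unsigned char le[4], be[4];
    store_le32(le, 0xAABBCCDDu);
    store_be32(be, 0xAABBCCDDu);
    assert(low_byte_le(le) == (0xAABBCCDDu & 255u));
    assert(low_byte_be(be) == (0xAABBCCDDu & 255u));
}
```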

However, I think that the most _fundamental_ problem is completely
independent of whether a simple and good implementation for gcc is even
feasible: it's not even clear that a byte-order attribute necessarily
helps porting of legacy applications all that much.

The problem is pointers to data - you must _never_ lose the byte-order
attribute by mistake, and you must never mix them. And a compiler (and
particularly a C compiler) has a really hard time asserting that people
don't mis-use pointers, with "void *" often being used as a "whatever".

So I realize that a lot of code is byte-order dependent exactly because
the code itself uses the same pointer in different ways (ie what happens
when you pass a byte-order-aware pointer to something like "memcpy()"?
It's ok if _both_ pointers are of the same byte order and the same type,
but not in general. And that's the _easy_ case, with a standard function
that the compiler could check for).

So it may be that the feature itself is simply not very helpful, simply
because it's so hard to retrofit existing programs even if you had some
compiler support for the notion.

So the actual _implementation_ on a gcc level might be the least of your
troubles.

That said, it still sounds like one of those dangerously "simple and
clever" ideas.

On a tangential issue:

I actually think that it might be equally powerful to just have a way of
"tainting" certain pointers, and disallowing their use at compile-time
unless the recipient claims to accept the specific form of "tainting".
This is, in fact, more-or-less what the "const" qualifier does, but it
might be useful to allow user-defined "taints".

The reason this is tangential is that byte-order would be one such
potential use of "tainting" - not so much for compiler-assisted code
generation, but simply for compiler-assisted type-checking: allowing the
person who gets stuck with the job of fixing byte-order problems to
"taint" the pointers with byte-order information, and make the compiler
warn about it when a pointer is ever passed into any function that doesn't
expect that byte-order.

So the byte-order-attribute thing doesn't actually have to affect code
generation to be potentially useful.

(Inside the kernel, I'd love to be able to taint pointers and data that
came from user space, for example, to make sure that the compiler will
refuse to even _compile_ code that uses such data without the proper
safety checks. This is not all that different from keeping track of what
byte-order a specific datum has).

Ehh?

		Linus

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:43 ` Linus Torvalds
@ 2001-12-17 14:22   ` guerby
  2001-12-17 14:52     ` Linus Torvalds
  2001-12-17 15:01   ` Richard Henderson
  2001-12-17 16:43   ` Ross Smith
  2 siblings, 1 reply; 30+ messages in thread
From: guerby @ 2001-12-17 14:22 UTC (permalink / raw)
  To: torvalds; +Cc: dewar, gcc

Linus, your "tainting" looks like static typing to me :).

$ gnatgcc -c -gnatl tmp/p.adb

GNAT 3.13p  (20000509) Copyright 1992-2000 Free Software Foundation, Inc.

Compiling: tmp/p.adb (source file time stamp: 2001-12-17 21:42:24)

     1. procedure P is
     2. 
     3.    type User_Ptr is access all Character;
     4.    type Kernel_Ptr is access all Character;
     5. 
     6.    X : User_Ptr;
     7.    Y : Kernel_Ptr;
     8. 
     9. begin
    10.    X := Y;
                |
        >>> expected type "User_Ptr" defined at line 3
        >>> found type "Kernel_Ptr" defined at line 4

    11. end P;

 11 lines: 2 errors

As for low level representation and conversion issues, Ada already
handles array Packing (packed, not packed), array Convention (Fortran
and others), and elaborate representation clauses for
records.

procedure P2 is

   type R1 is record
      A, B : Character;
   end record;

   type R1_AB is new R1;
   for R1_AB use record
      A at 0 range 0 .. 7;
      B at 0 range 8 .. 15;
   end record;
   for R1_AB'Size use 16;

   type R1_BA is new R1;
   for R1_BA use record
      A at 0 range 8 .. 15;
      B at 0 range 0 .. 7;
   end record;
   for R1_BA'Size use 16;

   X_AB : R1_AB;
   X_BA : R1_BA;

begin
   X_AB := ('A', 'B');
   X_BA := R1_BA (X_AB); -- compiler does the work
   -- X_BA := X_AB; -- Illegal
end P2;

Never looked at the quality of the generated code though :).

The document mentioned by Robert looks at the issues of adding bit
ordering in the existing Ada bag of tricks in the area.

PS: You can always escape fascist compiler typing with
Unchecked_Conversion in Ada.

-- 
Laurent Guerby <guerby@acm.org>

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 14:22   ` guerby
@ 2001-12-17 14:52     ` Linus Torvalds
  0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2001-12-17 14:52 UTC (permalink / raw)
  To: guerby; +Cc: dewar, gcc

On Mon, 17 Dec 2001 guerby@acm.org wrote:
>
> Linus, your "tainting" looks like static typing to me :).

Hey, considering that Ada has every single language feature ever imagined,
and probably some that nobody reasonably _should_ have imagined, I'm not
surprised.

Too bad about the algol syntax and all the overkill features (hey, I think
C++ is complex, Ada is so far off the scale that it's not even funny). I'm
hoping C will steal the _good_ ideas from other languages (slowly, because
that's the only way to make sure they are good).

		Linus

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:43 ` Linus Torvalds
  2001-12-17 14:22   ` guerby
@ 2001-12-17 15:01   ` Richard Henderson
  2001-12-17 15:12     ` Linus Torvalds
  2001-12-17 16:43   ` Ross Smith
  2 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2001-12-17 15:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dewar, gcc

On Mon, Dec 17, 2001 at 01:40:04PM -0800, Linus Torvalds wrote:
> Hmm.. It sounds like one of those "obvious in principle" things, but I can
> imagine that it falls afoul of a lot of the gcc optimizations (ie x86.md
> has a pattern for doing "load + and $255" with a "movzbl" instruction,
> which is legal only on little-endian data: on big-endian you can still do
> it, but you have to modify the address).

Except that if gcc had generated a big-endian load, you'd
have "load + swap + and" in the instruction stream, which
wouldn't use movzbl.

I don't think there's anything conceptually complex about
adding this extension, just tedious.

> I actually think that it might be equally powerful to just have a way of
> "tainting" certain pointers, and disallowing their use at compile-time
> unless the recipient claims to accept the specific form of "tainting".
> This is, in fact, more-or-less what the "const" qualifier does, but it
> might be useful to allow user-defined "taints".

This runs afoul of a long-standing misfeature that a pointer to
Thing forgets about the attributes that Thing carried.  Fixing
this is desirable, but no one has stepped forward to do it.

As for the user-defined taint, implementing that should be as
simple as defining a type attribute with some string/identifier 
argument that does nothing.  Since the attribute modifies the
type, a declaration with and without should be incompatible.

Dunno how far you'd be able to go with this, given the rabid
casting to unsigned long that tends to happen in the kernel...

r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 15:01   ` Richard Henderson
@ 2001-12-17 15:12     ` Linus Torvalds
  2001-12-17 15:54       ` Richard Henderson
  2001-12-18 11:55       ` Jason Riedy
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2001-12-17 15:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: dewar, gcc

On Mon, 17 Dec 2001, Richard Henderson wrote:
>
> Dunno how far you'd be able to go with this, given the rabid
> casting to unsigned long that tends to happen in the kernel...

To be really effective, you'd really say that you _cannot_ cast a tainted
pointer (or even access through it), except with a specific "untaint me"
operation. The only thing you could do is pass it along to somebody else
who accepted a tainted pointer.
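
A rough library-level approximation of this rule, written as a C++
sketch rather than a compiler feature. The names tainted_ptr, untaint
and pass_along are hypothetical. The wrapper offers no dereference and
no conversion, so the only things a caller can do are hand it to a
function that accepts a tainted_ptr or call untaint() explicitly.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper: holds a pointer but exposes no operator*,
// no operator-> and no conversion operators.
template <typename T> class tainted_ptr {
  public:
    explicit tainted_ptr(T *p) : p_(p) {}
    // The single sanctioned way to get the raw pointer back.
    template <typename U> friend U *untaint(tainted_ptr<U> t);
  private:
    T *p_;
};

// The explicit "untaint me" operation.
template <typename U> U *untaint(tainted_ptr<U> t) { return t.p_; }

// A recipient that declares it accepts tainted pointers.
inline tainted_ptr<int> pass_along(tainted_ptr<int> t) { return t; }

// No implicit conversion to a raw pointer or to an integer exists.
static_assert(!std::is_convertible<tainted_ptr<int>, int *>::value,
              "tainted_ptr must not decay to a raw pointer");
static_assert(!std::is_convertible<tainted_ptr<int>, unsigned long>::value,
              "tainted_ptr must not convert to unsigned long");

// Usage: the value is only reachable through untaint().
inline void example() {
    int x = 7;
    tainted_ptr<int> t(&x);
    assert(*untaint(pass_along(t)) == 7);
}
```

This is weaker than what is being asked for here: taking the address of
the wrapper and using reinterpret_cast still gets around it, which a
check inside the compiler could refuse.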

So I wasn't talking about a "const" like bit (even though it shares some
of the notions with "const"), but more of an "immutable" bit.

And I suspect some people might want to use it not as a "taint" bit, but
simply as a way to have a mechanism to strengthen the C type system.

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 15:12     ` Linus Torvalds
@ 2001-12-17 15:54       ` Richard Henderson
  2001-12-17 17:43         ` Linus Torvalds
  2001-12-18 11:55       ` Jason Riedy
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2001-12-17 15:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dewar, gcc

On Mon, Dec 17, 2001 at 03:08:58PM -0800, Linus Torvalds wrote:
> So I wasn't talking about a "const" like bit (even though it shares some
> of the notions with "const"), but more of an "immutable" bit.

That's not going to be possible without a lot of invasion
into the type system.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 15:54       ` Richard Henderson
@ 2001-12-17 17:43         ` Linus Torvalds
  2001-12-17 18:12           ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2001-12-17 17:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: dewar, gcc

On Mon, 17 Dec 2001, Richard Henderson wrote:
> On Mon, Dec 17, 2001 at 03:08:58PM -0800, Linus Torvalds wrote:
> > So I wasn't talking about a "const" like bit (even though it shares some
> > of the notions with "const"), but more of an "immutable" bit.
>
> That's not going to be possible without a lot of invasion
> into the type system.

Hmm? Not even just add a warning for _all_ casts of such pointers? This is
a purely static thing, after all.

Or is it that the information just isn't carried around in a convenient
format? gcc does seem to warn about _some_ casts ("discards qualifiers
from pointer target type", "makes integer from pointer without cast").
Does it lose the information early for explicit casts or something?

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 17:43         ` Linus Torvalds
@ 2001-12-17 18:12           ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2001-12-17 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dewar, gcc

On Mon, Dec 17, 2001 at 05:10:01PM -0800, Linus Torvalds wrote:
> Or is it that the information just isn't carried around in a convenient
> format? gcc does seem to warn about _some_ casts ("discards qualifiers
> from pointer target type", "makes integer from pointer without cast").
> Does it lose the information early for explicit casts or something?

Err, no...  Actually, I don't know what I was thinking.  Of course
we can statically notice attribute differences.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 15:12     ` Linus Torvalds
  2001-12-17 15:54       ` Richard Henderson
@ 2001-12-18 11:55       ` Jason Riedy
  1 sibling, 0 replies; 30+ messages in thread
From: Jason Riedy @ 2001-12-18 11:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

And Linus Torvalds writes:
 - 
 - And I suspect some people might want to use it not as a "taint" bit, but
 - simply as a way to have a mechanism to strengthen the C type system.

There's some interesting UCB research in type qualifiers in 
general.  One specific target has been checking tainting,
and another has been locking in Linux.  The tool works on
pre-processed code, and it can take quite a lot of memory
when analyzing across multiple files.

  tool: http://www.cs.berkeley.edu/Research/Aiken/cqual/
	(older version; Jeff may provide a newer, experimental
	 one on request if he has time)
  tainting: http://www.cs.berkeley.edu/~jfoster/papers/usenix01.ps.gz
  locking: http://www.cs.berkeley.edu/~jfoster/papers/pldi02-flow.pdf

The most relevant observation:  Doing this properly (i.e. 
few to no false positives) requires polymorphic qualifiers.  
The "taint" has to pass through functions silently.  Treating
it just like "const" _does_ yield too many false positives.
So for this to be useful, you'll need more than what gcc
currently provides.
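
A small C++ sketch of what "polymorphic" means here, using templates to
stand in for qualifier polymorphism. This is not how cqual works
internally, and the names (qstr, tainted_tag, untainted_tag,
skip_spaces, trusted_len) are invented for the example. The helper
skip_spaces is written once and returns whatever qualifier it was
given, so it does not force a false positive on untainted callers,
while the sink trusted_len still rejects tainted input at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct untainted_tag {};
struct tainted_tag {};

// A string pointer carrying a compile-time qualifier Q.
template <typename Q> struct qstr { const char *s; };

// Polymorphic in Q: the result has the same qualifier as the argument.
template <typename Q> qstr<Q> skip_spaces(qstr<Q> in) {
    while (*in.s == ' ') ++in.s;
    return in;
}

// A sink that only accepts untainted data.
inline std::size_t trusted_len(qstr<untainted_tag> in) {
    return std::strlen(in.s);
}

// The taint passes through the helper silently...
static_assert(std::is_same<decltype(skip_spaces(qstr<tainted_tag>())),
                           qstr<tainted_tag> >::value,
              "taint is preserved across the call");
// ...and cannot be dropped implicitly on the way into the sink.
static_assert(!std::is_convertible<qstr<tainted_tag>,
                                   qstr<untainted_tag> >::value,
              "tainted data must not reach an untainted sink");

// Usage: untainted data flows through the same helper to the sink.
inline void example() {
    qstr<untainted_tag> u = {"  ok"};
    assert(trusted_len(skip_spaces(u)) == 2);
}
```

With a "const"-like monomorphic qualifier, skip_spaces would need either
two copies or a signature that taints every result.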

Providing polymorphic machinery for general gcc front-ends 
would take a good amount of work.  IIRC, Perl's data tainting 
is essentially a dynamic type tag.  Polymorphism is a static 
type system's way of avoiding the dynamic tag.  (One view of 
polymorphism, that is.)

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:43 ` Linus Torvalds
  2001-12-17 14:22   ` guerby
  2001-12-17 15:01   ` Richard Henderson
@ 2001-12-17 16:43   ` Ross Smith
  2 siblings, 0 replies; 30+ messages in thread
From: Ross Smith @ 2001-12-17 16:43 UTC (permalink / raw)
  To: gcc

Linus Torvalds wrote:
> 
> I actually think that it might be equally powerful to just have a way of
> "tainting" certain pointers, and disallowing their use at compile-time
> unless the recipient claims to accept the specific form of "tainting".
> This is, in fact, more-or-less what the "const" qualifier does, but it
> might be useful to allow user-defined "taints".

You can do this already in C++. (Of course I realise this isn't much
help for a large C application like the kernel.)

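  // Andrei Alexandrescu's static assertion template
  // (this is in everybody's library by now; named static_check here
  // because static_assert is a keyword in C++11 and later)
  template <bool Pred> struct static_check;
  template <> struct static_check<true> {};

  // This is valid only if there are no bits in Src that aren't in Dst
  template <unsigned Src, unsigned Dst> struct convert_check:
    static_check<((Src | Dst) == Dst)> {};

  // Template for qualified types
  // Each bit in Mask represents a type qualifier
  template <typename T, unsigned Mask> class qualified {
    public:
      qualified(T t = T()):
        value_(t) {}
      // Conversion to the plain type is allowed only with no qualifiers
      operator T() const {
        static_check<(Mask == 0)>();
        return value_;
      }
      // Implicit conversion may add qualifiers but never drop them
      template <typename T2, unsigned Mask2>
        qualified(qualified<T2, Mask2> src):
          value_(src.value_) {
            convert_check<Mask2, Mask>();
          }
    private:
      // Other instantiations and qualified_cast need access to value_
      template <typename T2, unsigned Mask2> friend class qualified;
      template <unsigned Mask2, typename T2, unsigned Mask1>
        friend qualified<T2, Mask2> qualified_cast(qualified<T2, Mask1> src);
      T value_;
  };

  // Explicit cast; unlike implicit conversion it may drop qualifiers.
  // Declared at namespace scope so that qualified_cast<...>(x) is found
  // by ordinary lookup.
  template <unsigned Mask2, typename T, unsigned Mask>
    qualified<T, Mask2> qualified_cast(qualified<T, Mask> src) {
      return qualified<T, Mask2>(src.value_);
    }

  // Define a couple of type qualifiers
  const unsigned magic(1);
  const unsigned tainted(2);

  // Examples of use
  int main() {
    int i(42);                             // Plain int
    qualified<int, magic> m;               // Magic int
    qualified<int, (magic | tainted)> mt;  // Magic, tainted int
    mt = i;                                // OK, can add qualifiers
    // m = mt;                             // This won't compile
    m = qualified_cast<magic>(mt);         // But this will
    return 0;
  }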


-- 
Ross Smith ...................................... Auckland, New Zealand
r-smith@ihug.co.nz ......................... http://storm.net.nz/~ross/
  "We need a new cosmology. New gods. New sacraments. Another drink."
                                                       -- Patti Smith

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:14 dewar
  2001-12-17 13:42 ` guerby
  2001-12-17 13:43 ` Linus Torvalds
@ 2001-12-18  1:28 ` Florian Weimer
  2 siblings, 0 replies; 30+ messages in thread
From: Florian Weimer @ 2001-12-18  1:28 UTC (permalink / raw)
  To: dewar; +Cc: gcc, torvalds

dewar@gnat.com writes:

> This is a much trickier language feature to design than you would
> imagine.  We have been struggling with this in Ada for a while.

Yes, in the general case, this is of course right.  But for two's
complement, octet-addressed machines, endian representation clauses for
discrete types could be implemented without major problems, I think,
at least if you restrict yourself to the little/big endian case and
ignore PDP endian.
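
As an illustration of the restricted case only (two's complement,
octet-addressed, big-endian versus little-endian), here is a C++ sketch
of an integer type whose in-memory representation is always big-endian.
It is a library emulation, not a language-level representation clause,
and the name be_uint32 and its bytes() accessor are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit unsigned integer stored as four octets, most significant first,
// regardless of the host byte order.
class be_uint32 {
  public:
    be_uint32(std::uint32_t v = 0) {
        b_[0] = static_cast<unsigned char>(v >> 24);
        b_[1] = static_cast<unsigned char>(v >> 16);
        b_[2] = static_cast<unsigned char>(v >> 8);
        b_[3] = static_cast<unsigned char>(v);
    }
    // Reading converts back to the host representation.
    operator std::uint32_t() const {
        return (std::uint32_t(b_[0]) << 24) | (std::uint32_t(b_[1]) << 16) |
               (std::uint32_t(b_[2]) << 8)  |  std::uint32_t(b_[3]);
    }
    // Raw storage, for inspection only.
    const unsigned char *bytes() const { return b_; }
  private:
    unsigned char b_[4];
};

static_assert(sizeof(be_uint32) == 4, "no padding expected");

// Usage: the value round-trips, and the stored octets are big-endian.
inline void example() {
    be_uint32 x(0x11223344u);
    assert(std::uint32_t(x) == 0x11223344u);
    assert(x.bytes()[0] == 0x11);
}
```

The sketch relies on 8-bit bytes; the 18-bit storage unit case mentioned
in this message is exactly where it stops being meaningful.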

The general case (which has to include processors with, say, 18 bit
storage units) is much harder, of course.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
@ 2001-12-17 12:08 dewar
  2001-12-17 13:10 ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: dewar @ 2001-12-17 12:08 UTC (permalink / raw)
  To: bose.ghanta, gcc

<<My question to you all is:  Is there a big-endian GCC available on IA32?
                                       If available, who is the source of
contact and what is the effort involved here?
>>

The code quality would be significantly decreased, and you would have 
compatibility problems with everything in sight. To me this sounds like
a bad alley to walk down.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 12:08 dewar
@ 2001-12-17 13:10 ` Linus Torvalds
  2001-12-17 14:00   ` Alan Lehotsky
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2001-12-17 13:10 UTC (permalink / raw)
  To: dewar, gcc

In article <20011217200410.D28DAF28BE@nile.gnat.com> you write:
><<My question to you all is:  Is there a big-endian GCC available on IA32?
>                                       If available, who is the source of
>contact and what is the effort involved here?
>>>
>
>The code quality would be significantly decreased, and you would have 
>compatibility problems with everything in sight. To me this sounds like
>a bad alley to walk down.

One thing that might be helpful for portability issues like this, where
the user _is_ willing and able to recompile the application, but maybe
not able to find all the subtle uses of byte-order dependencies, would be to
allow the notion of "byte order attributes" on data structures. 

This can be especially useful for those architectures that actually have
at least limited support for either byte-order (ie I think sparc64 has a
"load as little-endian"). 

Imagine being able to just tag the data structures with "this data
structure is big-endian", and have the compiler automatically do the
conversion when a value is loaded from such a data structure.

Example:

	unsigned long x[10] __attribute__(("bigendian"));

	unsigned long i = x[0];

would generate a load + bswap on x86, and would generate just a load on
sparc and other bigendian architectures, while the corresponding
little-endian version would generate a "load as le" on sparc and a plain
load on x86. 
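
For comparison, a hand-written C++ sketch of roughly what the compiler
would have to emit for such a tagged access. The "bigendian" attribute
above is only a proposal; this sketch instead assumes GCC or Clang
(__builtin_bswap32 and the __BYTE_ORDER__ macros are their extensions)
and uses a fixed 32-bit width rather than unsigned long. The function
names load_bigendian32 and store_bigendian32 are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Load a 32-bit big-endian value: plain load on a big-endian host,
// load + byte swap on a little-endian host.
inline std::uint32_t load_bigendian32(const void *p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);       // load in host byte order
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);           // the extra bswap on x86
#endif
    return v;
}

// Store a 32-bit value in big-endian order.
inline void store_bigendian32(void *p, std::uint32_t v) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    std::memcpy(p, &v, sizeof v);
}

// Usage: the octets 00 00 00 01 read back as the number 1 on any host.
inline void example() {
    const unsigned char buf[4] = {0x00, 0x00, 0x00, 0x01};
    assert(load_bigendian32(buf) == 1u);
}
```

The attribute would remove the need to call such helpers at every
access, which is the whole attraction of the proposal.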

Imagine using this together with reading/writing all data structures to
disk as structures - and letting the compiler automatically handle the
conversions between a common-endian disk file and various different
endiannesses of different architectures.

Is it worth it? Most people will probably argue (and I really cannot
disagree) that it's not really harder to just add the conversions by
hand, and doesn't require any special compilers etc.  I remember wishing
for something like the above when Linux first got ported to other
architectures, but in the end we're probably better off having had to
think about the issues instead of just letting the compiler do much of
the work. 

		Linus

[ Asbestos suit: ON ] And hey, it's not inconceivable that big-endian
will be only a historical remnant in another ten years. [ Evil grin ]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Big-endian Gcc on Intel IA32
  2001-12-17 13:10 ` Linus Torvalds
@ 2001-12-17 14:00   ` Alan Lehotsky
  0 siblings, 0 replies; 30+ messages in thread
From: Alan Lehotsky @ 2001-12-17 14:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dewar, gcc

At 1:04 PM -0800 12/17/01, Linus Torvalds wrote:

>In article <20011217200410.D28DAF28BE@nile.gnat.com> you write:
>><<My question to you all is:  Is there a big-endian GCC available on IA32?
>>                                       If available, who is the source of
>>contact and what is the effort involved here?
> >>>
>......
>One thing that might be helpful for portability issues like this, where
>the user _is_ willing and able to recompile the application, but maybe
>not able to find all the subtle uses of byte-order dependencies, would be to
>allow the notion of "byte order attributes" on data structures.
>
>This can be especially useful for those architectures that actually have
>at least limited support for either byte-order (ie I think sparc64 has a
>"load as little-endian").


	It ought to be possible to add a peephole to the MD file that matches the inline expansion of
	htonl() or ntohl() and results in the appropriate load-endian instruction....

	I don't think that any data-structure tagging would be necessary (although I admit that there
	might be some utility to making things work without the programmer needing to carefully
	wrap all the accesses and assignments with appropriate XtoY's)
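
	A sketch of the kind of hand-wrapped access meant here, in C++ on a
	POSIX system (htonl() and ntohl() come from <arpa/inet.h>). The struct
	wire_header and the accessor names are hypothetical; the peephole idea
	is that the expansion of these calls next to a load or store could be
	matched and replaced by a load-endian instruction where one exists.

```cpp
#include <arpa/inet.h>   // POSIX: htonl(), ntohl()
#include <cassert>
#include <cstdint>

// Hypothetical structure whose field is kept in network (big-endian) order.
struct wire_header {
    std::uint32_t length;
};

// Every read is wrapped in ntohl()...
inline std::uint32_t get_length(const wire_header &h) {
    return ntohl(h.length);
}

// ...and every write in htonl().
inline void set_length(wire_header &h, std::uint32_t n) {
    h.length = htonl(n);
}

// Usage: values round-trip through the wrapped accessors.
inline void example() {
    wire_header h;
    set_length(h, 1500u);
    assert(get_length(h) == 1500u);
}
```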

-- Al
-- 
------------------------------------------------------------------------

		    Quality Software Management
		http://home.earthlink.net/~qsmgmt
			apl@alum.mit.edu
			(978)287-0435 Voice
			(978)808-6836 Cell
			(978)287-0436 Fax

	Software Process Improvement and Management Consulting
	     Language Design and Compiler Implementation

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Big-endian Gcc on Intel IA32
@ 2001-12-17 12:00 Ghanta, Bose
  0 siblings, 0 replies; 30+ messages in thread
From: Ghanta, Bose @ 2001-12-17 12:00 UTC (permalink / raw)
  To: 'gcc@gcc.gnu.org'; +Cc: Ghanta, Bose

Dear GCC members,

  Today we use GCC on PA-RISC as a big-endian compiler.  We like it a lot,
and we will continue to use it in current and future products at our company
(Stratus Computer, Inc.).

We are now considering a platform migration to Intel IA32.  As you all
know, IA32 is a little-endian processor family, and GCC and all other
products run in little-endian format on it.  I need to address an
interoperability issue for our customers, and a big-endian GCC would solve
part of this problem.

My question to you all is:  Is there a big-endian GCC available on IA32?
If available, who is the source of contact and what is the effort involved
here?

Our current product is:

	GCC as a native (big-endian) compiler hosted on and targeted to a
	proprietary OS (VOS) on a commodity (PA-RISC) processor.  The object
	file format is ELF.  The OS is proprietary (VOS) and mostly follows
	the standard ABI.

Our future plan is:

	Create GCC as a native (big-endian) compiler hosted on and targeted
	to a proprietary OS (VOS) on a commodity (IA-32) processor.  The
	object file format is ELF.  The OS is proprietary (VOS) and mostly
	follows the standard ABI.

I would appreciate any help from you here.

My phone number is: (978) 461 7617
      email is:	        Bose_Ghanta@stratus.com

Thank you,
Bose

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2001-12-23 15:08 UTC | newest]

Thread overview: 30+ messages
2001-12-23  7:26 Big-endian Gcc on Intel IA32 dewar
  -- strict thread matches above, loose matches on Subject: below --
2001-12-23  7:06 dewar
2001-12-23  7:08 ` Florian Weimer
2001-12-20  5:36 Etienne Lorrain
2001-12-19 11:47 Bernard Dautrevaux
2001-12-19 13:09 ` Linus Torvalds
2001-12-18 11:41 Morten Welinder
2001-12-18 11:42 ` Phil Edwards
2001-12-18 14:48 ` Linus Torvalds
2001-12-18  3:49 dewar
2001-12-23  6:59 ` Florian Weimer
2001-12-17 18:39 dewar
2001-12-17 18:59 ` Per Bothner
2001-12-17 13:14 dewar
2001-12-17 13:42 ` guerby
2001-12-17 13:43 ` Linus Torvalds
2001-12-17 14:22   ` guerby
2001-12-17 14:52     ` Linus Torvalds
2001-12-17 15:01   ` Richard Henderson
2001-12-17 15:12     ` Linus Torvalds
2001-12-17 15:54       ` Richard Henderson
2001-12-17 17:43         ` Linus Torvalds
2001-12-17 18:12           ` Richard Henderson
2001-12-18 11:55       ` Jason Riedy
2001-12-17 16:43   ` Ross Smith
2001-12-18  1:28 ` Florian Weimer
2001-12-17 12:08 dewar
2001-12-17 13:10 ` Linus Torvalds
2001-12-17 14:00   ` Alan Lehotsky
2001-12-17 12:00 Ghanta, Bose
