From mboxrd@z Thu Jan 1 00:00:00 1970
From: Craig Burley
To: law@cygnus.com
Cc: martynas@nm3.ktu.lt, alan@spri.levels.unisa.edu.au, gcc2@cygnus.com, egcs@cygnus.com, gas2@cygnus.com
Subject: Re: ASM_COMMENT_START and gas divide operator
Date: Thu, 14 May 1998 15:07:00 -0000
Message-id: <199805142206.SAA22413@melange.gnu.org>
References: <7019.895176289@hurl.cygnus.com>
X-SW-Source: 1998/msg00186.html

> > (In HPPA assembler's case, one glaring stupidity was the
> > nullified branches, which have completely different semantics
> > depending on whether the label being branched to happens to assemble
> > to a negative or positive offset!

>I've never found this to be strange or stupid at all.  The behavior
>clearly matches the hardware.

Matching the hardware directly is *not* a wise way to design a
language, including "low-level" assembly language.

Besides, if you look at what I'm asking for here, it's an *explicit*
(IMO s/b required) specification of the sign bit on a conditionally
nullified branch, instead of letting the sign bit of the
assembler-calculated offset govern (instead they'd be checked against
each other).

That is, for a conditionally nullified branch, you'd (IMO have to)
write either "bcond,nf label" to branch *forward* on condition cond,
or "bcond,nb label" to branch *backward* on that condition.  (I think
HP uses 'n', SPARC uses 'a', but I've lost track.)

Instead, what programmers must *think* is "this will be a forward
branch", and yet they can only *write* "this will be a branch in some
direction", and hope the knowledge of the direction, which crucially
affects the semantics (in that it governs under what conditions the
subsequent instruction is nullified), is "properly" encoded via the
relative placement of the target label!  Blecch.

That means they can't comfortably write serial code and fill in target
labels later on as they see fit, for example.

In short, the sign bit of the offset field performs two duties in the
hardware.
One is the normal duty vis-a-vis the current PC, for which the
calculated offset is, of course, a fine source.  The other is the
rather distinct duty that bit performs when it comes to determining
whether the next instruction is to be annulled.

Since the programmer has to know what that bit's value is going to be
when writing the relevant instructions, a properly designed assembler
language would require that explicit specification, rather than
relying on subsequent assembler/linker arithmetic done on the target
label -- which the programmer *cannot*, to my knowledge, constrain in
any direct way.

After all, if you try a thought-exercise whereby you assume the two
functions were *separately encoded* -- one bit for annul "direction",
one for offset sign -- you'd realize pretty quickly that a) such a
separation would be quite useful, if the ISA had room for it, and
b) the current syntax doesn't support the separation, even optionally.

This might seem like a minor nit, but, coming from a 128-bit VLIW
background, I know firsthand how important it is to work with a
reliable assembler *and* assembly language.  Having the language
"help" by guessing at what you *must* know you mean is bad enough (as
you point out regarding the silently-zero-filled fields), but having
it *refuse* any direct, coherent specification of what you *must* know
you mean is a classic example of bad language design.

>What is horrible is that the next linear instruction is actually
>a backwards branch as far as the hardware is concerned.  This leads
>to the infamous "empty if/else block bug" on the PA with gcc-2.7.*.

I have no problem with how the hardware works, but maybe I'm not aware
of the problem to which you're referring.  In fact, maybe we're
discussing two different things!

What I *do* have a problem with is that code like this (and I'm
probably going to get the basic instruction mnemonics wrong, because I
haven't done HPPA RISC coding for well over a year now)...

	...
	ret			; Unconditional return, no fall-through
mylabel:
	ld	...
	...
	b	finishup	; Unconditional branch, no fall-through
othercode:
	...

...can work just fine.  Then you decide "hmm, for I-cache or whatever
reasons, I'd like to move the clearly distinct block at mylabel
somewhere else".

This is just the sort of code movement that's often done in any decent
programming language, including pretty much every assembler I've ever
worked with, where the worst thing that can *normally* happen is that
the assembler or linker complains that a label is now "out of range"
of a branch's offset field.

(Imagine how you'd feel if, instead of "out of range", all the
assemblers and linkers we used silently truncated the offset to fit!
That's roughly the kind of problem I'm talking about here.)

And, from every examination of the code an assembly programmer is
likely to make to verify that such a move is okay, there's no way to
realize that the code *breaks* because some *other* code does a
conditionally nullified branch to it, and that branch is now broken
because:

  -  The branch to mylabel used to be a backward branch, but now is
     forward, or vice versa, so the semantics of the conditionally
     nullified *branch* have changed, in terms of deciding at run time
     whether the next instruction is nullified, and

  -  The stupid assembler syntax gave the programmer *no way* to
     directly encode that knowledge he *did* have at the moment he
     wrote that now-broken branch -- namely, whether the branch was
     forward or backward!  (IMO, the knowledge should be *required*,
     but at least *allowing* it would be a huge improvement, assuming
     the assembler and linker error-checked it.)

In other words, an apparently innocuous change in one part of a module
breaks code in a completely different part of the module.
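The breakage can be sketched as a toy model (in Python, for clarity;
the exact pairing of direction with nullify-on-taken vs.
nullify-on-not-taken is an assumption here -- the complaint stands
whichever way around the hardware defines it):

```python
# Toy model of a conditionally nullified branch whose delay-slot
# behavior depends on the SIGN of the assembled offset.
# Assumed rule (matching the usual static-prediction rationale):
#   backward branch (offset < 0): delay slot nullified when NOT taken
#   forward branch  (offset > 0): delay slot nullified when taken
def delay_slot_executes(branch_taken: bool, offset: int) -> bool:
    """Return True if the instruction after the branch actually runs."""
    if offset < 0:                   # backward branch
        return branch_taken          # squashed on the not-taken path
    else:                            # forward branch
        return not branch_taken      # squashed on the taken path

# The branch's source text never changes, but moving its target label
# from before the branch to after it flips the offset's sign, and with
# it the delay slot's behavior on BOTH paths:
before_move = {t: delay_slot_executes(t, -8) for t in (True, False)}
after_move  = {t: delay_slot_executes(t, +8) for t in (True, False)}
print(before_move)   # label behind the branch
print(after_move)    # same instruction, label now ahead of it
```

With an explicit ",nf"/",nb" completer, the assembler could check the
programmer's stated direction against the computed offset and reject
the move instead of silently changing the semantics.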
If you think silently filling in zeros leads to subtle and
hard-to-grep-for bugs, try coping with the even rarer, even more
subtle bugs *this* kind of thing leads to, when "grep" is of limited
help (since it can't read your mind about what directions were
intended, and can't complain about "missing fields" when the assembler
doesn't even *allow* you to fill in the fields!).

I'm not in any way advocating "assembler for dummies", since tracking
things like register usage can confuse almost anyone (though at least
they can use long "mangled" names to avoid some such problems; in
short, such problems result from non-local context and thus are not
the fault of local syntax).  What I am advocating is "no more
assembler language design *by* dummies".  ;-)

In this case, I'd say the summary of the linguistic bug is that a
crucial bit of semantic knowledge that applies purely locally is
derived from a combination of local and remote information.

That is, it's the relationship between the branch instruction and its
(perhaps distant in the source code) target that determines how that
branch instruction will behave *locally*, e.g. even if it never jumps
to the remote target of the branch!

It's okay that the *hardware* behaves that way, because bits is bits,
but the assembly *language* must give the programmer a way to specify
the knowledge he must have in his head using entirely *local* syntax
(to correspond to the local semantics).

And, I'm nearly 100% sure that, if a "flag day" happened whereby all
HPPA RISC assemblers started requiring these forward/backward
notations, and every bit of code was modified to include the
*expected* direction, assuming a 100% success rate reading the minds
of the original programmers, there'd be a non-zero number of existing
bugs discovered in code in production right now.

If I had to do any development on an HP-PA RISC machine, I'd first fix
this bug in the assembler and disassembler.
Or, at least, if I wasn't coding in assembler for the project, in the
disassembler, so gdb output would be clearer.  I wouldn't let anyone
on a project of mine write assembler code without writing it in a
fixed assembly language.

There's no point having programmers waste time by knowing a thing,
having no way to tell the computer what they know, then going through
a long debugging session only to find out that they (or someone else)
"forgot" that thing when moving some other chunk of code around, and
*still* have no way to tell the computer what they learned!

Especially since we're talking about one character per relevant
instruction, though you can tell by now I must not mind typing much.

(Of course, the hacker's solution to this would be to write tools to
look at any changes to assembly source, via diff for example, then
assemble the old and new sources and make sure that no *other* diffs
vis-a-vis sign bits of offsets appeared in the assembled code.  ;-)

tq vm,
(burley)