From mboxrd@z Thu Jan 1 00:00:00 1970
From: Craig Burley
To: law@cygnus.com
Cc: martynas@nm3.ktu.lt, alan@spri.levels.unisa.edu.au, gcc2@cygnus.com, egcs@cygnus.com, gas2@cygnus.com
Subject: Re: ASM_COMMENT_START and gas divide operator
Date: Thu, 14 May 1998 15:07:00 -0000
Message-id: <199805142206.SAA22413@melange.gnu.org>
References: <7019.895176289@hurl.cygnus.com>
X-SW-Source: 1998/msg00186.html

> > (In HPPA assembler's case, one glaring stupidity was the
> > nullified branches, which have completely different semantics
> > depending on whether the label being branched to happens to assemble
> > to a negative or positive offset!

>I've never found this to be strange or stupid at all.  The behavior
>clearly matches the hardware.

Matching the hardware directly is *not* a wise way to design a
language, including "low-level" assembly language.

Besides, if you look at what I'm asking for here, it's an *explicit*
(IMO s/b required) specification of the sign bit on a conditionally
nullified branch, instead of letting the sign bit of the
assembler-calculated offset govern (instead they'd be checked against
each other).

That is, for a conditionally nullified branch, you'd (IMO have to)
write either "bcond,nf label" to branch *forward* on condition cond,
or "bcond,nb label" to branch *backward* on that condition.  (I think
HP uses 'n', SPARC uses 'a', but I've lost track.)

Instead, what programmers must *think* is "this will be a forward
branch", and yet they can only *write* "this will be a branch in some
direction", and hope the knowledge of the direction, which crucially
affects the semantics (in that it governs under what conditions the
subsequent instruction is nullified), is "properly" encoded via the
relative placement of the target label!  Blecch.

That means they can't comfortably write serial code and fill in target
labels later on as they see fit, for example.

In short, the sign bit of the offset field performs two duties in the
hardware.
One is the normal duty vis-a-vis the current PC, for which the
calculated offset is, of course, a fine source.  The other is the
rather distinct duty that bit performs when it comes to determining
whether the next instruction is to be annulled.

Since the programmer has to know what that bit's value is going to be
when writing the relevant instructions, a properly designed assembler
language would require that explicit specification, rather than
relying on subsequent assembler/linker arithmetic done on the target
label -- which the programmer *cannot*, to my knowledge, constrain in
any direct way.

After all, if you try a thought-exercise whereby you assume the two
functions were *separately encoded* -- one bit for annul "direction",
one for offset sign -- you'd realize pretty quickly that a) such a
separation would be quite useful, if the ISA had room for it, and
b) the current syntax doesn't support the separation, even optionally.

This might seem like a minor nit, but, coming from a 128-bit VLIW
background, I know firsthand how important it is to work with a
reliable assembler *and* assembly language.  Having the language
"help" by guessing at what you *must* know you mean is bad enough (as
you point out regarding the silently-zero-filled fields), but having
it *refuse* any direct, coherent specification of what you *must* know
you mean is a classic example of bad language design.

>What is horrible is that the next linear instruction is actually
>a backwards branch as far as the hardware is concerned.  This leads
>to the infamous "empty if/else block bug" on the PA with gcc-2.7.*.

I have no problem with how the hardware works, but maybe I'm not aware
of the problem to which you're referring.  In fact, maybe we're
discussing two different things!

What I *do* have a problem with is that code like this (and I'm
probably going to get the basic instruction mnemonics wrong, because I
haven't done HPPA RISC coding for well over a year now)...

	...
	ret			; Unconditional return, no fall-through
mylabel:
	ld	...
	...
	b	finishup	; Unconditional branch, no fall-through
othercode:
	...

...can work just fine.  Then you decide "hmm, for I-cache or whatever
reasons, I'd like to move the clearly distinct block at mylabel
somewhere else".

This is just the sort of code movement that's often done in any decent
programming language, including pretty much every assembler I've ever
worked with, where the worst thing that can *normally* happen is that
the assembler or linker complains that a label is now "out of range"
of a branch's offset field.

(Imagine how you'd feel if, instead of "out of range", all the
assemblers and linkers we used silently truncated the offset to fit!
That's roughly the kind of problem I'm talking about here.)

And, from every examination of the code an assembly programmer is
likely to make to verify that such a move is okay, there's no way to
realize that the code *breaks* because some *other* code does a
conditionally nullified branch to it, and that branch is now broken
because:

  -  The branch to mylabel used to be a backward branch, but now is
     forward, or vice versa, so the semantics of the conditionally
     nullified *branch* have changed, in terms of deciding at run time
     whether the next instruction is nullified, and

  -  The stupid assembler syntax gave the programmer *no way* to
     directly encode that knowledge he *did* have at the moment he
     wrote that now-broken branch -- namely, whether the branch was
     forward or backward!  (IMO, the knowledge should be *required*,
     but at least *allowing* it would be a huge improvement, assuming
     the assembler and linker error-checked it.)

In other words, an apparently innocuous change in one part of a module
breaks code in a completely different part of the module.
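The breakage can be sketched as a toy model (in Python, for clarity;
the exact pairing of direction with nullify-on-taken vs.
nullify-on-not-taken is an assumption here -- the complaint stands
whichever way around the hardware defines it):

```python
# Toy model of a conditionally nullified branch whose delay-slot
# behavior depends on the SIGN of the assembled offset.
# Assumed rule (matching the usual static-prediction rationale):
#   backward branch (offset < 0): delay slot nullified when NOT taken
#   forward branch  (offset > 0): delay slot nullified when taken
def delay_slot_executes(branch_taken: bool, offset: int) -> bool:
    """Return True if the instruction after the branch actually runs."""
    if offset < 0:                   # backward branch
        return branch_taken          # squashed on the not-taken path
    else:                            # forward branch
        return not branch_taken      # squashed on the taken path

# The branch's source text never changes, but moving its target label
# from before the branch to after it flips the offset's sign, and with
# it the delay slot's behavior on BOTH paths:
before_move = {t: delay_slot_executes(t, -8) for t in (True, False)}
after_move  = {t: delay_slot_executes(t, +8) for t in (True, False)}
print(before_move)   # label behind the branch
print(after_move)    # same instruction, label now ahead of it
```

With an explicit ",nf"/",nb" completer, the assembler could check the
programmer's stated direction against the computed offset and reject
the move instead of silently changing the semantics.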
If you think silently filling in zeros leads to subtle and
hard-to-grep-for bugs, try coping with the even rarer, even more
subtle bugs *this* kind of thing leads to, when "grep" is of limited
help (since it can't read your mind about what directions were
intended, and can't complain about "missing fields" when the assembler
doesn't even *allow* you to fill in the fields!).

I'm not in any way advocating "assembler for dummies", since tracking
things like register usage can confuse almost anyone (though at least
they can use long "mangled" names to avoid some such problems; in
short, such problems result from non-local context and thus are not
the fault of local syntax).  What I am advocating is "no more
assembler language design *by* dummies".  ;-)

In this case, I'd say the summary of the linguistic bug is that a
crucial bit of semantic knowledge that applies purely locally is
derived from a combination of local and remote information.

That is, it's the relationship between the branch instruction and its
(perhaps distant in the source code) target that determines how that
branch instruction will behave *locally*, e.g. even if it never jumps
to the remote target of the branch!

It's okay that the *hardware* behaves that way, because bits is bits,
but the assembly *language* must give the programmer a way to specify
the knowledge he must have in his head using entirely *local* syntax
(to correspond to the local semantics).

And, I'm nearly 100% sure that, if a "flag day" happened whereby all
HPPA RISC assemblers started requiring these forward/backward
notations, and every bit of code was modified to include the
*expected* direction, assuming a 100% success rate reading the minds
of the original programmers, there'd be a non-zero number of existing
bugs discovered in code in production right now.

If I had to do any development on an HP-PA RISC machine, I'd first fix
this bug in the assembler and disassembler.
Or, at least, if I wasn't coding in assembler for the project, in the
disassembler, so gdb output would be clearer.  I wouldn't let anyone
on a project of mine write assembler code without writing it in a
fixed assembly language.

There's no point having programmers waste time by knowing a thing,
having no way to tell the computer what they know, then going through
a long debugging session only to find out that they (or someone else)
"forgot" that thing when moving some other chunk of code around, and
*still* have no way to tell the computer what they learned!

Especially since we're talking about one character per relevant
instruction, though you can tell by now I must not mind typing much.

(Of course, the hacker's solution to this would be to write tools to
look at any changes to assembly source, via diff for example, then
assemble the old and new sources and make sure that no *other* diffs
vis-a-vis sign bits of offsets appeared in the assembled code.  ;-)

tq vm,
(burley)