# The Motivation

The assembly language for x86 and x86-64 has two major syntax variants: the _Microsoft assembler (MASM) syntax_ and the _GNU assembler (GAS) syntax_. The MASM syntax, also known as the _Intel syntax_, is the one prescribed by the Intel Software Developer Manual, and is used extensively by many non-GNU tools. The GNU syntax, also known as the _AT&T syntax_, derives from the PDP-11 assembly used to create Unix, and is the default and dominant syntax in the post-Unix world.

The advantages of the MASM syntax are:

1. It looks more modern and is closer to the assembly languages of many other architectures, such as ARM, MIPS and RISC-V.
2. It is the syntax used in Intel and AMD documentation.

The disadvantages of the MASM syntax are:

1. MASM is proprietary software.
2. The syntax has not been formally defined, which sometimes causes ambiguity.

For instance, the Intel Software Developer Manual contains this line:

```asm
MOV EBX, RAM_START
```

This is ambiguous in two ways. First, it could be interpreted as either of

```asm
MOV EBX, OFFSET RAM_START        ; `movl $RAM_START, %ebx`
MOV EBX, DWORD PTR [RAM_START]   ; `movl RAM_START, %ebx`
```

Second, on x86-64 the address might be RIP-relative or absolute, as in

```asm
MOV EBX, DWORD PTR [RAM_START]
  ; x86 absolute
  ;   8B 1D RAM_START
  ;   `movl RAM_START, %ebx`
  ; x86-64 RIP-relative
  ;   8B 1D RAM_START
  ;   `movl RAM_START(%rip), %ebx`
  ; x86-64 absolute
  ;   8B 1C 25 RAM_START
  ;   `movl RAM_START, %ebx`
```

The first issue is solved by always interpreting the operand as a memory reference, but ambiguity may still arise if the symbol comes from a high-level language, such as C. When targeting x86, the Microsoft compiler decorates C identifiers: external names that denote objects or functions with the `__cdecl` or `__stdcall` calling convention are prefixed with an underscore `_`; external names that denote functions with the `__fastcall` or `__vectorcall` calling convention are prefixed with an at symbol `@`. This technique prevents symbols from conflicting with keywords in assembly.
But this is no longer the case for x86-64 (or for ARM and ARM64). If a user declares an external variable with the name `RSI`, the compiler may generate the following ambiguous and incorrect code:

```asm
MOV EAX, DWORD PTR [RSI]   ; parsed as `movl (%rsi), %eax`
                           ; should have been `movl rsi, %eax`
```

This RFC proposes a formalization of the Intel syntax that resolves this ambiguity by disallowing certain constructions.

# The Proposal

1. Indirect references shall always contain a mode specifier. Plain brackets are no longer allowed.

   ```asm
   MOV EAX, [RCX]                        ; invalid: operand size and mode specifier are required
   MOV EAX, DWORD [RCX]                  ; invalid: mode specifier is required
   MOV EAX, DWORD PTR [RCX]              ; valid: `movl (%rcx), %eax`
   VMULPD ZMM0, ZMM1, QWORD BCST [RCX]   ; valid: `vmulpd (%rcx){1to8}, %zmm1, %zmm0`
   LEA RAX, bx[RIP]                      ; invalid: operand size and mode specifier are required
   LEA RAX, BYTE PTR bx[RIP]             ; valid: `leaq bx(%rip), %rax`
   ```

2. Segment register overrides shall occur before the operand size and mode specifier.

   ```asm
   MOV EAX, DWORD PTR CS:[RCX]   ; maybe invalid: symbol name cannot contain `:`
   MOV EAX, CS:DWORD PTR [RCX]   ; valid: `movl %cs:(%rcx), %eax`
   ```

3. If an identifier follows `PTR`, `BCST` or `OFFSET`, then it is always treated as a symbol, even when it is a keyword. In other words, only registers are enclosed within brackets. This idea is shared with the GAS syntax.

   ```asm
   MOV EAX, printf                  ; invalid: `printf` is not a known register
   MOV EAX, OFFSET printf           ; valid: `movl $printf, %eax`
   MOV EAX, RCX                     ; invalid: operand size mismatch
   MOV EAX, OFFSET RCX              ; valid: `movl $RCX, %eax`
   MOV EAX, DWORD PTR [RCX]         ; valid: `movl (%rcx), %eax`
   MOV EAX, DWORD PTR RCX           ; valid: `movl RCX, %eax`
   MOV EAX, DWORD PTR RCX[RIP+10]   ; valid: `movl RCX+10(%rip), %eax`
   ```

4. For instructions with a dummy memory operand (`LEA`, `NOP`, etc.) and those with an uncommon operand size (`FXSAVE`/`FXRSTOR`, `FNSAVE`/`FNRSTOR`, etc.), `BYTE PTR` shall be used.
   ```asm
   NOP DWORD PTR [RAX], EAX   ; invalid: `BYTE PTR` is required
   NOP BYTE PTR [RAX], EAX    ; valid: 0F 1F 00
   ```

5. RIP-relative operands must have `RIP` as the base register.

   ```asm
   MOV EBX, DWORD PTR foo        ; valid: `movl foo, %ebx`
                                 ; note: might cause linker errors on x86-64
   MOV EBX, DWORD PTR foo[RIP]   ; valid: `movl foo(%rip), %ebx`
   ```

6. The base, index, scale and displacement parts of a memory operand shall appear in a uniform order. The displacement comes first, immediately following the mode specifier. If there is a base or index register, the registers and the scale are all placed within a single pair of square brackets. This idea is also shared with the GAS syntax.

   ```asm
   MOV ECX, DWORD PTR [RSI+RDI*4+field]   ; invalid: `field` is not a known register
   MOV ECX, DWORD PTR field[RSI+RDI*4]    ; valid: `movl field(%rsi,%rdi,4), %ecx`
   ```

# External Links

1. GCC [Bug 53929 - [meta-bug] -masm=intel with global symbol](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929)