public inbox for binutils@sourceware.org
* RFC: Formalization of the Intel assembly syntax (PR53929)
@ 2024-01-18  5:34 LIU Hao
  2024-01-18  9:02 ` Fangrui Song
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: LIU Hao @ 2024-01-18  5:34 UTC (permalink / raw)
  To: binutils, GCC Development



Hello,

There has been no solution to https://gcc.gnu.org/PR53929 for almost a dozen years, mostly
due to compatibility with MASM. I was told that the ambiguity of Intel syntax should be regarded
as a limitation of the syntax itself, and as a reason to recommend against it.

Nevertheless, I am proposing a permanent solution to this issue, by banning constructions that
cause ambiguity. This is likely to cause incompatibility with other assemblers, but it should make
GAS parse the output of GCC flawlessly.


PR53929 contains a known ambiguous construction

    lea	rax, bx[rip]

where `bx` could denote the BX register and causes confusion. The Intel Software Developer Manual 
also contains an ambiguous construction

    MOV EBX, RAM_START

which would look like loading the offset of `RAM_START`. My proposal is that these two constructions 
are ambiguous and should be rejected. The compiler should generate assembly in the unambiguous 
subset, and we can start to implement the assembler to reject the ambiguous ones.

They are formalized as

    lea rax, BYTE PTR bx[rip]
    mov EBX, DWORD PTR RAM_START

Roughly speaking, anything after `PTR`/`BCST` (and before `[` if any) is considered a symbol even if 
it matches a keyword; any identifier between `[` and `]` is a register and not a symbol.
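
As a rough illustration of the rule above, the following Python sketch labels each identifier in a
single operand as a symbol or a register. The function name `classify_identifiers` and the token
pattern are hypothetical; this is not how GAS is implemented, only a minimal model of the rule.

```python
import re

# Hypothetical tokenizer: identifiers, hex/decimal numbers, single punctuation.
TOKEN = re.compile(r"[A-Za-z_.$][\w.$@]*|0[xX][0-9A-Fa-f]+|\d+|\S")

def classify_identifiers(operand):
    """Label each identifier in one operand as 'symbol' or 'register'.

    Sketch of the proposed rule only: an identifier after PTR/BCST and
    before '[' is a symbol; an identifier inside '[' ... ']' is a register.
    """
    result = []
    after_specifier = False  # True once PTR or BCST has been seen
    depth = 0                # current bracket nesting
    for tok in TOKEN.findall(operand):
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
        elif tok.upper() in ("PTR", "BCST") and depth == 0 and not after_specifier:
            after_specifier = True
        elif re.match(r"[A-Za-z_.$]", tok):
            if depth > 0:
                result.append((tok, "register"))
            elif after_specifier:
                result.append((tok, "symbol"))
            # Identifiers before PTR/BCST (size keywords such as BYTE) are skipped.
    return result

print(classify_identifiers("BYTE PTR bx[rip]"))
```

Under this model `bx` is reported as a symbol even though it matches a register name, because it
sits between `PTR` and `[`.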


My complete proposal can be found at 
<https://github.com/lhmouse/mcfgthread/wiki/Formalized-Intel-Syntax-for-x86>. Some ideas actually 
reflect the AT&T syntax. I hope it helps.


-- 
Best regards,
LIU Hao


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-18  5:34 RFC: Formalization of the Intel assembly syntax (PR53929) LIU Hao
@ 2024-01-18  9:02 ` Fangrui Song
  2024-01-18 12:54 ` Jan Beulich
       [not found] ` <DS7PR12MB5765DBF9500DE323DB4A8E29CB712@DS7PR12MB5765.namprd12.prod.outlook.com>
  2 siblings, 0 replies; 19+ messages in thread
From: Fangrui Song @ 2024-01-18  9:02 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On Wed, Jan 17, 2024 at 9:34 PM LIU Hao <lh_mouse@126.com> wrote:
>
> Hello,
>
> There hasn't been an solution to https://gcc.gnu.org/PR53929 since almost a dozen years ago, mostly
> due to compatibility with MASM. I was told that the ambiguity of Intel syntax should be classified
> as its own limitation and disrecommendation.
>
> Notwithstanding, I am proposing a permanent solution to this issue, by banning constructions that
> cause ambiguity. This is likely to effect incompatibility with other assemblers, but it should make
> GAS parse the output of GCC flawlessly.
>
>
> PR53929 contains a known ambiguous construction
>
>     lea rax, bx[rip]
>
> where `bx` could denote the BX register and causes confusion. The Intel Software Developer Manual
> also contains an ambiguous construction
>
>     MOV EBX, RAM_START
>
> which would look like loading the offset of `RAM_START`. My proposal is that these two constructions
> are ambiguous and should be rejected. The compiler should generate assembly in the unambiguous
> subset, and we can start to implement the assembler to reject the ambiguous ones.
>
> Their are formalized as
>
>     lea rax, BYTE PTR bx[rip]
>     mov EBX, DWORD PTR RAM_START
>
> Roughly speaking, anything after `PTR`/`BCST` (and before `[` if any) is considered a symbol even if
> it matches a keyword; any identifier between `[` and `]` is a register and not a symbol.
>
>
> My complete proposal can be found at
> <https://github.com/lhmouse/mcfgthread/wiki/Formalized-Intel-Syntax-for-x86>. Some ideas actually
> reflect the AT&T syntax. I hope it helps.

Thanks for the proposal. I hope that -masm=intel becomes more useful:)

Do you have a list of assembly in the unambiguous cases that fail to
be parsed today as a gas PR?
For example,

% as -msyntax=intel -mnaked-reg <<< 'lea rax, BYTE PTR bxx[rip]' -o a.o && objdump -d -M intel a.o | grep -A1 '>:'
0000000000000000 <.text>:
   0:   48 8d 05 00 00 00 00    lea    rax,[rip+0x0]        # 0x7
% as -msyntax=intel -mnaked-reg <<< 'lea rax, BYTE PTR bx[rip]' -o a.o && objdump -d -M intel a.o | grep -A1 '>:'
{standard input}: Assembler messages:
{standard input}:1: Error: invalid use of register
% as -msyntax=intel -mnaked-reg <<< 'mov EBX, DWORD PTR ebx' -o a.o
{standard input}: Assembler messages:
{standard input}:1: Error: invalid use of register


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-18  5:34 RFC: Formalization of the Intel assembly syntax (PR53929) LIU Hao
  2024-01-18  9:02 ` Fangrui Song
@ 2024-01-18 12:54 ` Jan Beulich
  2024-01-18 16:40   ` LIU Hao
       [not found] ` <DS7PR12MB5765DBF9500DE323DB4A8E29CB712@DS7PR12MB5765.namprd12.prod.outlook.com>
  2 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2024-01-18 12:54 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On 18.01.2024 06:34, LIU Hao wrote:
> My complete proposal can be found at 
> <https://github.com/lhmouse/mcfgthread/wiki/Formalized-Intel-Syntax-for-x86>. Some ideas actually 
> reflect the AT&T syntax. I hope it helps.

I'm sorry, but most of your proposal may even be considered for being
acceptable only if you would gain buy-off from the MASM guys. Anything
MASM treats as valid ought to be permitted by gas as well (within the
scope of certain divergence that cannot be changed in gas without
risking to break people's code). It could probably be considered to
introduce a "strict" mode of Intel syntax, following some / most of
what you propose; making this the default cannot be an option.

Commenting on individual aspects of your proposal is a little difficult,
as you didn't provide the proposal inline (and hence it cannot be easily
used as context in a reply). But to mention the imo worst aspect:
Declaring

	mov	eax, [rcx]

as invalid is a no-go. I also don't see how this would be related to the
issue at hand. What's in the square brackets may as well be a symbol
name, so requiring the "mode specifier" doesn't disambiguate things at
all.

Otoh the "offset" part of point 3 may be possible to accept even by
default, provided (didn't check) that current gas consistently rejects
that (as an invalid use of a register name).

One remark regarding the underlying pattern leading to the issue:
Personally I view it as questionable practice to have extern or static
variables in C code with names as short as register names are. Avoiding
them does not only avoid the issue here, but also is quite likely going
to improve the code (by having more descriptive variable names). And
automatic variables aren't affected aiui, so can remain short (after
all, commonly automatic variable names are as short as a single char).

That said, I can certainly also see how the introduction of new
registers can lead to new conflicts, which isn't nice. Iirc old 32-bit
MASM escaped this problem by requiring architecture extensions to be
explicitly enabled (may have changed in newer MASM). Gas, otoh, enables
everything by default (and I don't see how we could change that).

Jan


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-18 12:54 ` Jan Beulich
@ 2024-01-18 16:40   ` LIU Hao
  2024-01-19  9:13     ` Jan Beulich
  2024-01-30  4:22     ` Hans-Peter Nilsson
  0 siblings, 2 replies; 19+ messages in thread
From: LIU Hao @ 2024-01-18 16:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils, GCC Development



On 2024-01-18 20:54, Jan Beulich wrote:
> I'm sorry, but most of your proposal may even be considered for being
> acceptable only if you would gain buy-off from the MASM guys. Anything
> MASM treats as valid ought to be permitted by gas as well (within the
> scope of certain divergence that cannot be changed in gas without
> risking to break people's code). It could probably be considered to
> introduce a "strict" mode of Intel syntax, following some / most of
> what you propose; making this the default cannot be an option.

Thanks for your reply.

I have attached the Markdown source for that page, modified a few hours ago. I am planning to make 
some updates according to your advice tomorrow.

And yes, I am proposing a 'strict' mode, however not for humans, only for compilers.

My first message references a GCC bug report, where the problematic symbol `bx` comes from C source. 
I have been aware of the `/APP` and `/NO_APP` markers in generated assembly, so I suspect that GAS 
should be able to tell which parts are generated from a compiler and which parts are composed by 
hand. The proposed strict mode may apply only to the output from GCC, which is much more likely to
contain bad symbols, but is also more controllable on the GCC side.

I believe that skilled people who write x86 assembly already know that `offset`, `shr`, `si` etc. are
'bad' names for symbols. Therefore, it's less of an issue there.


> Commenting on individual aspects of your proposal is a little difficult,
> as you didn't provide the proposal inline (and hence it cannot be easily
> used as context in a reply). But to mention the imo worst aspect:
> Declaring
> 
> 	mov	eax, [rcx]
> 
> as invalid is a no-go.

I agree. I am considering declaring the absence of a symbol a special case.


> I also don't see how this would be related to the
> issue at hand. What's in the square brackets may as well be a symbol
> name, so requiring the "mode specifier" doesn't disambiguate things at
> all.

If someone declares a variable called `rcx` in C, it has to be translated to

    mov eax, DWORD PTR rcx      # `movl rcx, %eax`

instead of

    mov eax, DWORD PTR [rcx]    # `movl (%rcx), %eax`


> One remark regarding the underlying pattern leading to the issue:
> Personally I view it as questionable practice to have extern or static
> variables in C code with names as short as register names are. Avoiding
> them does not only avoid the issue here, but also is quite likely going
> to improve the code (by having more descriptive variable names). And
> automatic variables aren't affected aiui, so can remain short (after
> all, commonly automatic variable names are as short as a single char).

Yes, we agree that longer, more descriptive names increase maintainability.

However, there are scenarios where maintainability doesn't matter much. For instance, testcases, 
sometimes machine-generated testcases, which are usually short programs, created to address issues 
in something else, and are likely to contain variables with very short names. The register names 
`si` and `es` look especially risky to me.


> That said, I can certainly also see how the introduction of new
> registers can lead to new conflicts, which isn't nice. Iirc old 32-bit
> MASM escaped this problem by requiring architecture extensions to be
> explicitly enabled (may have changed in newer MASM). Gas, otoh, enables
> everything by default (and I don't see how we could change that).

I confess that I haven't investigated these compilers much, and everything below is my
presumption.


Given this C source:

    extern int rdx;
    int get_value() { return rdx;  }


I try to compile it directly to an object file, with MSVC, Clang and GCC:

    > cl /nologo /c test.c && echo Success
    test.c
    Success

    > clang -masm=intel -c test.c && echo Success
    Success

    > gcc -masm=intel -c test.c && echo Success
    C:\Users\lh_mouse\AppData\Local\Temp\ccjcy1Qj.s: Assembler messages:
    C:\Users\lh_mouse\AppData\Local\Temp\ccjcy1Qj.s:23: Error: invalid use of register
    C:\Users\lh_mouse\AppData\Local\Temp\ccjcy1Qj.s:23: Warning: register value used as expression


but if I compile it to assembly first, then assemble the result to an object file:

    > cl /nologo /c test.c /Fatest.asm && ml64 /nologo /c test.asm && echo success
    test.c
     Assembling: test.asm
    test.asm(9) : error A2008:syntax error : rdx
    test.asm(15) : error A2032:invalid use of register

    > clang -masm=intel -S test.c -o test.s && clang -masm=intel test.s && echo Success
    test.s:26:8: error: expected relocatable expression
            .quad   rdx
                    ^

    > gcc -masm=intel -S test.c -o test.s && gcc -masm=intel test.s && echo Success
    test.s: Assembler messages:
    test.s:23: Error: invalid use of register
    test.s:23: Warning: register value used as expression


It looks to me that both MSVC and Clang have integrated assemblers, so their compiler outputs do not 
really turn into assembly code before finally becoming target code. This approach is not subject to 
the ambiguity.

As GCC still relies on GAS to produce object files, (as stated in the first paragraph,) it might 
make some sense to implement a strict mode on outputs from GCC to resolve the potential ambiguity, 
while still providing a permissive mode for inline or handwritten assembly.


-- 
Best regards,
LIU Hao


[-- Attachment #1.1.2: Formalized-Intel-Syntax-for-x86.md.txt --]
[-- Type: text/plain, Size: 5468 bytes --]

# The Motivation

The assembly language for x86 and x86-64 involves two major variations of syntax: the _Microsoft assembler (MASM) syntax_ and the _GNU assembler (GAS) syntax_. The MASM syntax, also known as the _Intel syntax_, is the one prescribed by the Intel Software Developer Manual, and is used extensively by many non-GNU tools. The GNU syntax, also known as the _AT&T syntax_, derives from the PDP-11 assembly language that was used to create Unix, and is the default and dominant syntax in the post-Unix world.

The advantages of the MASM syntax are:
1. It looks more modern, closer to many other assembly languages, such as ARM, MIPS and RISC-V.
2. It is the syntax in Intel and AMD documentation.

The disadvantages of the MASM syntax are:
1. MASM is proprietary software.
2. The syntax has not been formally defined, and causes ambiguity sometimes.

For instance, the Intel Software Developer Manual contains this line:
```asm
MOV EBX, RAM_START
```

This is ambiguous in two ways. First, it could be interpreted as either of
```asm
MOV EBX, OFFSET RAM_START         ; `movl $RAM_START, %ebx`
MOV EBX, DWORD PTR [RAM_START]    ; `movl RAM_START, %ebx`
```

Second, on x86-64 the address might be RIP-relative or absolute, as in
```asm
MOV EBX, DWORD PTR [RAM_START]
          ; x86    absolute       ; 8B 1D    RAM_START   ; `movl RAM_START, %ebx`
          ; x86-64 RIP-relative   ; 8B 1D    RAM_START   ; `movl RAM_START(%rip), %ebx`
          ; x86-64 absolute       ; 8B 1C 25 RAM_START   ; `movl RAM_START, %ebx`
```

The first issue here is solved by interpreting it as a memory reference, but the ambiguity may still arise if the symbol results from a high-level language, such as C. When targeting x86, the Microsoft compiler decorates C identifiers: External names that denote objects or functions with the `__cdecl` or `__stdcall` calling convention are prefixed with an underscore `_`; external names that denote functions with the `__fastcall` calling convention are prefixed with an at symbol `@`, and those with the `__vectorcall` calling convention carry an `@@` suffix. This technique prevents symbols from conflicting with keywords in assembly.

But it is no longer the case for x86-64 (as well as ARM and ARM64). If a user declares an external variable with the name `RSI`, the compiler may generate the ambiguous and incorrect
```asm
MOV EAX, DWORD PTR [RSI]    ; parsed as `movl (%rsi), %eax`
                            ; should have been `movl rsi, %eax`
```

This RFC proposes formalization of the Intel syntax, by disallowing certain constructions, to resolve ambiguity.

# The Proposal

1. Indirect references shall always contain a mode specifier. Plain brackets are no longer allowed.
    ```asm
    MOV EAX, [RCX]                         ; invalid: operand size and mode specifier are required
    MOV EAX, DWORD [RCX]                   ; invalid: mode specifier is required
    MOV EAX, DWORD PTR [RCX]               ; valid: `movl (%rcx), %eax`
    VMULPD ZMM0, ZMM1, QWORD BCST [RCX]    ; valid: `vmulpd (%rcx){1to8}, %zmm1, %zmm0`
    LEA RAX, bx[RIP]                       ; invalid: operand size and mode specifier are required
    LEA RAX, BYTE PTR bx[RIP]              ; valid: `leaq bx(%rip), %rax`
    ```

2. Overriding segment registers shall occur before the operand size and mode specifier.
    ```asm
    MOV EAX, DWORD PTR CS:[RCX]            ; maybe invalid: symbol name cannot contain `:`
    MOV EAX, CS:DWORD PTR [RCX]            ; valid: `movl %cs:(%rcx), %eax`
    ```

3. If an identifier follows `PTR`, `BCST` or `OFFSET`, then it is always treated as a symbol, even when it is a keyword. In other words, only registers are enclosed within brackets. This idea is shared with GAS syntax.
    ```asm
    MOV EAX, printf                        ; invalid: `printf` is not a known register
    MOV EAX, OFFSET printf                 ; valid: `movl $printf, %eax`
    MOV EAX, RCX                           ; invalid: operand size mismatch
    MOV EAX, OFFSET RCX                    ; valid: `movl $RCX, %eax`
    MOV EAX, DWORD PTR [RCX]               ; valid: `movl (%rcx), %eax`
    MOV EAX, DWORD PTR RCX                 ; valid: `movl RCX, %eax`
    MOV EAX, DWORD PTR RCX[RIP+10]         ; valid: `movl RCX+10(%rip), %eax`
    ```

4. For instructions with a dummy memory operand (`LEA`, `NOP`, etc.) and those with an uncommon size (`FXSAVE`/`FXRSTOR`, `FNSAVE`/`FNRSTOR`, etc.), `BYTE PTR` shall be used.
    ```asm
    NOP DWORD PTR [RAX], EAX               ; invalid: `BYTE PTR` is required
    NOP BYTE PTR [RAX], EAX                ; valid: 0F 1F 00
    ```

5. RIP-relative operands must have `RIP` as the base register.
    ```asm
    MOV EBX, DWORD PTR foo                 ; valid: `movl foo, %ebx`
                                           ; note: might cause linker errors on x86-64
    MOV EBX, DWORD PTR foo[RIP]            ; valid: `movl foo(%rip), %ebx`
    ```

6. The base, index, scale and displacement parts of a memory operand shall appear uniformly. The displacement comes first, immediately following the mode specifier. If there is at least a base or index register, they are all placed in a pair of square brackets. This idea is also shared with GAS syntax.
    ```asm
    MOV ECX, DWORD PTR [RSI+RDI*4+field]   ; invalid: `field` is not a known register
    MOV ECX, DWORD PTR field[RSI+RDI*4]    ; valid: `movl field(%rsi,%rdi,4), %ecx`
    ```
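
The uniform operand shape in point 6 can be sketched as a small formatter. The Python below is only an illustration under stated assumptions: `format_mem_operand` and its parameters are hypothetical names, and the AT&T string is produced just to show the correspondence noted in the comments above.

```python
def format_mem_operand(size, disp=None, base=None, index=None, scale=1):
    """Return (intel, att) spellings of one memory operand.

    Sketch of rule 6 only: the displacement comes first, right after the
    mode specifier, and all registers go inside one pair of brackets.
    `size` is a keyword such as "DWORD"; `disp` is a symbol or number string.
    """
    if not (disp or base or index):
        raise ValueError("a memory operand needs a displacement or a register")

    # Intel (proposed strict form): SIZE PTR disp[base+index*scale]
    regs = []
    if base:
        regs.append(base)
    if index:
        regs.append(f"{index}*{scale}")
    intel = f"{size} PTR " + (disp or "")
    if regs:
        intel += "[" + "+".join(regs) + "]"

    # AT&T: disp(%base,%index,scale)
    att = disp or ""
    if base or index:
        att += "(" + (f"%{base.lower()}" if base else "")
        if index:
            att += f",%{index.lower()},{scale}"
        att += ")"
    return intel, att

print(format_mem_operand("DWORD", "field", "RSI", "RDI", 4))
```

Because the displacement never appears inside the brackets, a symbol named like a register cannot be confused with a base or index register in the output of such a formatter.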

# External Links

1. GCC [Bug 53929 - [meta-bug] -masm=intel with global symbol](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929)


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
       [not found] ` <DS7PR12MB5765DBF9500DE323DB4A8E29CB712@DS7PR12MB5765.namprd12.prod.outlook.com>
@ 2024-01-19  1:42   ` LIU Hao
  2024-01-19  7:41     ` Jan Beulich
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: LIU Hao @ 2024-01-19  1:42 UTC (permalink / raw)
  To: Fangrui Song; +Cc: binutils, GCC Development



On 2024-01-18 17:02, Fangrui Song wrote:
> Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> 
> Do you have a list of assembly in the unambiguous cases that fail to
> be parsed today as a gas PR?
> For example,

Not really. Most of these are results from high-level languages. For example:

    # Expected: `movl shr(%rip), %eax`
    # Actual: error: invalid use of operator "shr"
    mov eax, DWORD PTR shr[rip]

    # Expected: `movl dword(%rip), %eax`
    # Actual: accepted as `movl 4(%rip), %eax`
    mov eax, DWORD ptr dword[rip]


In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to `.intel_syntax noprefix`:

    $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
    {standard input}: Assembler messages:
    {standard input}:1: Error: invalid use of register

    $ as <<< '.intel_syntax noprefix;  mov eax, DWORD PTR gs:0x48' -o a.o && objdump -Mintel -d a.o
    ...
    0000000000000000 <.text>:
       0:	65 8b 04 25 48 00 00 	mov    eax,DWORD PTR gs:0x48



-- 
Best regards,
LIU Hao



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-19  1:42   ` LIU Hao
@ 2024-01-19  7:41     ` Jan Beulich
  2024-01-19  8:19     ` Fangrui Song
       [not found]     ` <DS7PR12MB5765654642BE3AD4C7F54E05CB702@DS7PR12MB5765.namprd12.prod.outlook.com>
  2 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2024-01-19  7:41 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development, Fangrui Song

On 19.01.2024 02:42, LIU Hao wrote:
> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to `.intel_syntax noprefix`:
> 
>     $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
>     {standard input}: Assembler messages:
>     {standard input}:1: Error: invalid use of register
> 
>     $ as <<< '.intel_syntax noprefix;  mov eax, DWORD PTR gs:0x48' -o a.o && objdump -Mintel -d a.o
>     ...
>     0000000000000000 <.text>:
>        0:	65 8b 04 25 48 00 00 	mov    eax,DWORD PTR gs:0x48

This (the error above) looks like a bug to me; I'll look into where this
odd difference in behavior is coming from.

Jan


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-19  1:42   ` LIU Hao
  2024-01-19  7:41     ` Jan Beulich
@ 2024-01-19  8:19     ` Fangrui Song
       [not found]     ` <DS7PR12MB5765654642BE3AD4C7F54E05CB702@DS7PR12MB5765.namprd12.prod.outlook.com>
  2 siblings, 0 replies; 19+ messages in thread
From: Fangrui Song @ 2024-01-19  8:19 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On Thu, Jan 18, 2024 at 5:42 PM LIU Hao <lh_mouse@126.com> wrote:
>
> On 2024-01-18 17:02, Fangrui Song wrote:
> > Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> >
> > Do you have a list of assembly in the unambiguous cases that fail to
> > be parsed today as a gas PR?
> > For example,
>
> Not really. Most of these are results from high-level languages. For example:
>
>     # Expected: `movl shr(%rip), %eax`
>     # Actual: error: invalid use of operator "shr"
>     mov eax, DWORD PTR shr[rip]
>
>     # Expected: `movl dword(%rip), %eax`
>     # Actual: accepted as `movl 4(%rip), %eax`
>     mov eax, DWORD ptr dword[rip]

GCC seems to print a symbol displacement, possibly with a modifier
(for a relocation), before the left bracket.

mov edx, DWORD PTR bx@GOT[eax]
mov edx, DWORD PTR bx[eax]
mov edx, DWORD PTR and[eax]    # Error: invalid use of operator "and"

Technically, assemblers (gas and LLVM integrated assembler) can be
made to parse "bx" as a symbol, even if it matches a register name or
an operator name ("and").
However, a straightforward approach using one lookahead token cannot
disambiguate the following two cases.

mov edx, DWORD PTR fs:[eax]   # segment override prefix
mov edx, DWORD PTR fs[eax]    # symbol

So, we would need two lookahead tokens...
(https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L2534-L2550
needs more code to parse `fs:` correctly.)
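
To make the lookahead point concrete, here is a minimal Python sketch that peeks at the two
tokens following `PTR`. The name `parse_after_ptr` and the token pattern are hypothetical; this
is not the gas or LLVM parser, only a model of the decision.

```python
import re

# Hypothetical tokenizer: identifiers (which may carry '@' modifiers such as
# bx@GOT), hex/decimal numbers, and single punctuation characters.
TOKEN = re.compile(r"[A-Za-z_.$][\w.$@]*|0[xX][0-9A-Fa-f]+|\d+|\S")
SEGMENT_REGISTERS = {"cs", "ds", "es", "fs", "gs", "ss"}

def parse_after_ptr(text):
    """Decide what the first identifier after PTR means.

    Returns ("segment", name) when a segment register name is followed
    by ':', and ("symbol", name) otherwise. Two tokens are inspected:
    the identifier itself and the token right after it.
    """
    tokens = TOKEN.findall(text)
    if not tokens:
        raise ValueError("nothing follows PTR")
    first = tokens[0]
    second = tokens[1] if len(tokens) > 1 else None
    if first.lower() in SEGMENT_REGISTERS and second == ":":
        return ("segment", first)
    return ("symbol", first)

print(parse_after_ptr("fs:[eax]"))
print(parse_after_ptr("fs[eax]"))
```

Looking only at `fs` is not enough here; the token after it decides the outcome.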

It is also unfortunate that whether the displacement is an immediate
or not changes the behavior of brackets.

mov eax, DWORD PTR 0          # mov    $0x0,%eax
mov eax, DWORD PTR [0]        # mov    0x0,%eax
mov eax, DWORD PTR sym        # mov    0x0,%eax with relocation
mov eax, DWORD PTR [sym]      # mov    0x0,%eax with relocation

The above reveals yet another inconsistency. For a memory reference,
it seems that we should use [] but [sym] could be ambiguous if sym
matches a register name or operator name.

Does the proposal change the placement of the displacement depending
on whether it is an immediate?
This is inconsistent, but perhaps there is not much we can improve...

extern int a[2];
int foo() { return a[1]+a[2]; }

GCC's PIC -masm=intel output

        mov     eax, DWORD PTR a[rip+8]
        add     eax, DWORD PTR a[rip+4]

The displacements (a+8 and a+4) involve a plus expression and `a` and
`8`/`4` are printed in two places.

> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to `.intel_syntax noprefix`:
>
>     $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
>     {standard input}: Assembler messages:
>     {standard input}:1: Error: invalid use of register
>
>     $ as <<< '.intel_syntax noprefix;  mov eax, DWORD PTR gs:0x48' -o a.o && objdump -Mintel -d a.o
>     ...
>     0000000000000000 <.text>:
>        0:       65 8b 04 25 48 00 00    mov    eax,DWORD PTR gs:0x48

Confirmed by Jan.


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-18 16:40   ` LIU Hao
@ 2024-01-19  9:13     ` Jan Beulich
  2024-01-20 12:40       ` LIU Hao
  2024-01-30  4:22     ` Hans-Peter Nilsson
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2024-01-19  9:13 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On 18.01.2024 17:40, LIU Hao wrote:
> On 2024-01-18 20:54, Jan Beulich wrote:
>> I'm sorry, but most of your proposal may even be considered for being
>> acceptable only if you would gain buy-off from the MASM guys. Anything
>> MASM treats as valid ought to be permitted by gas as well (within the
>> scope of certain divergence that cannot be changed in gas without
>> risking to break people's code). It could probably be considered to
>> introduce a "strict" mode of Intel syntax, following some / most of
>> what you propose; making this the default cannot be an option.
> 
> Thanks for your reply.
> 
> I have attached the Markdown source for that page, modified a few hours ago. I am planning to make 
> some updates according to your advice tomorrow.

Just to mention it: Attaching is in no way better than providing a link,
commenting-wise.

> And yes, I am proposing a 'strict' mode, however not for humans, only for compilers.
> 
> My first message references a GCC bug report, where the problematic symbol `bx` comes from C source. 
> I have been aware of the `/APP` and `/NO_APP` markers in generated assembly, so I suspect that GAS 
> should be able to tell which parts are generated from a compiler and which parts are composed by 
> hand. The proposed strict mode may apply only to the output from GCC, which are much more likely to 
> contain bad symbols, but are also more controllable on the GCC side.
> 
> I believe that skillful people who write x86 assembly have known that `offset`, `shr`, `si` etc. are 
> 'bad' names for symbols. Therefore, it's like an issue there.
> 
> 
>> Commenting on individual aspects of your proposal is a little difficult,
>> as you didn't provide the proposal inline (and hence it cannot be easily
>> used as context in a reply). But to mention the imo worst aspect:
>> Declaring
>>
>> 	mov	eax, [rcx]
>>
>> as invalid is a no-go.
> 
> I agree. I am considering to declare the lack of a symbol as a special case.

Well, I took this as the simplest example. But clearly there should never
be a need for an assembly programmer to needlessly write "dword ptr" or
alike, when operand size is unambiguous. Limiting "strict mode" to compiler
output would take away concerns in this regard (as machine generated
assembly has no issue with uniformly adding such redundant specifiers, much
like in AT&T mode suffixes would typically be emitted even when not needed).
But I see a severe issue with your aim at confining strict mode to
compiler generated code only: In inline assembly (see your mentioning of
APP / NO_APP above) you still potentially reference C symbols. So the
ambiguities don't disappear in APP / NO_APP regions.

>> I also don't see how this would be related to the
>> issue at hand. What's in the square brackets may as well be a symbol
>> name, so requiring the "mode specifier" doesn't disambiguate things at
>> all.
> 
> If someone declares a variable called `rcx` in C, it has be translated to
> 
>     mov eax, DWORD PTR rcx      # `movl rcx, %eax`
> 
> instead of
> 
>     mov eax, DWORD PTR [rcx]    # `movl (%rcx), %eax`

And an array happening to be indexed by rcx would then result in

    mov eax, DWORD PTR rcx[rcx]    # `movl rcx(%rcx), %eax`

? That's going to be confusing at best. I think this whole issue needs
taking care of differently, and iirc I did already suggest an alternative
in one of the bugzilla entries involved: Potentially ambiguous names
(which to a compiler may mean: all symbol names) ought to simply be
quoted, and it ought to be specified that quoted symbols are never
registers. Iirc this will require gas changes, yes, but it'll address all
ambiguities afaict.
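
A compiler-side sketch of that quoting idea, in Python for illustration only: `quote_symbol` and
`rip_relative_lea` are hypothetical helpers, and whether a given gas version treats every quoted
name as a non-register is an assumption here, not something this sketch establishes.

```python
def quote_symbol(name):
    """Wrap a symbol name in double quotes for emission into assembly.

    Names containing a double quote or a newline are rejected rather than
    escaped, since escaping rules are outside the scope of this sketch.
    """
    if '"' in name or "\n" in name:
        raise ValueError(f"cannot quote symbol name: {name!r}")
    return f'"{name}"'

def rip_relative_lea(dest_reg, symbol):
    """Emit a RIP-relative LEA in Intel syntax with a quoted symbol."""
    return f"lea {dest_reg}, {quote_symbol(symbol)}[rip]"

print(rip_relative_lea("rax", "bx"))
```

Quoting every symbol uniformly keeps the emitter simple, since it never has to know which names
collide with registers or operators.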

Jan


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
       [not found]     ` <DS7PR12MB5765654642BE3AD4C7F54E05CB702@DS7PR12MB5765.namprd12.prod.outlook.com>
@ 2024-01-20 12:32       ` LIU Hao
  0 siblings, 0 replies; 19+ messages in thread
From: LIU Hao @ 2024-01-20 12:32 UTC (permalink / raw)
  To: Fangrui Song; +Cc: binutils, GCC Development



On 2024-01-19 16:19, Fangrui Song wrote:
> It is also unfortunate that whether the displacement is an immediate
> or not change the behavior of brackets.
> 
> mov eax, DWORD PTR 0          # mov    $0x0,%eax
> mov eax, DWORD PTR [0]        # mov    0x0,%eax
> mov eax, DWORD PTR sym        # mov    0x0,%eax with relocation
> mov eax, DWORD PTR [sym]      # mov    0x0,%eax with relocation
> 
> The above reveals yet another inconsistency. For a memory reference,
> it seems that we should use [] but [sym] could be ambiguous if sym
> matches a register name or operator name.

This is sort of a tautology, as `DWORD PTR` and `[]` mean the same thing. It's unfortunate that
neither can replace the other, not to mention the `DWORD BCST` thing in AVX512.

That is to say, these two

    mov eax, DWORD PTR 0
    mov eax, DWORD PTR [0]

should denote the same operation. This is more useful with a segment override, to access 
thread-local data, as in

   mov rax, QWORD PTR 0         #    48 8B0425 00000000
   mov rax, QWORD PTR gs:0      # 65 48 8B0425 00000000


> Does the proposal change the placement of the displacement depending
> on whether it is an immediate?
> This is inconsistent, but perhaps there is not much we can improve...

The proposal is about eliminating ambiguity. Roughly speaking, an identifier that follows `PTR`,
`BCST` or `OFFSET` is to be interpreted as a symbol, and an identifier that appears within a pair
of brackets is to be interpreted as a register.

As a numeric displacement does not cause ambiguity, it can be accepted either way.



-- 
Best regards,
LIU Hao



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-19  9:13     ` Jan Beulich
@ 2024-01-20 12:40       ` LIU Hao
  2024-01-22  8:39         ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: LIU Hao @ 2024-01-20 12:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils, GCC Development



On 2024-01-19 17:13, Jan Beulich wrote:
> But I see a severe issue with your aim at confining strict mode to
> compiler generated code only: In inline assembly (see your mentioning of
> APP / NO_APP above) you still potentially reference C symbols. So the
> ambiguities don't disappear in APP / NO_APP regions.

My suggestion is that people who write inline assembly should have been aware of the existence of 
bad names, and should have been careful to avoid them.


> And an array happening to be indexed by rcx would then result in
> 
>      mov eax, DWORD PTR rcx[rcx]    # `movl rcx(%rcx), %eax`
> 
> ? That's going to be confusing at best. 

This is always confusing, no matter how it is written.

> I think this whole issue needs
> taking care of differently, and iirc I did already suggest an alternative
> in one of the bugzilla entries involved: Potentially ambiguous names
> (which to a compiler may mean: all symbol names) ought to simply be
> quoted, and it ought to be specified that quoted symbols are never
> registers. Iirc this will require gas changes, yes, but it'll address all
> ambiguities afaict.

The OP of GCC PR53929 said that 'the problem does _not_ go away even if I quote the symbol name by 
hand in the assembly output' which was 12 years ago. I tried my local installation and quoting the 
symbol turned out to avoid the issue:

    > as --version
    GNU assembler (GNU Binutils) 2.41.0.20240108

    > cat test.s
    .intel_syntax noprefix
    lea     rax, "bx"[rip]

    > as test.s -o test.o

    > objdump -d test.o
    test.o:     file format pe-x86-64
    (...)
       0:   48 8d 05 00 00 00 00    lea    rax,[rip+0x0]        # 7 <.text+0x7>
       7:   90                      nop


  So I think I had better try my patch in the next few days.


-- 
Best regards,
LIU Hao



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-20 12:40       ` LIU Hao
@ 2024-01-22  8:39         ` Jan Beulich
  2024-01-23  1:27           ` LIU Hao
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2024-01-22  8:39 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On 20.01.2024 13:40, LIU Hao wrote:
> On 2024-01-19 17:13, Jan Beulich wrote:
>> But I see a severe issue with your aim at confining strict mode to
>> compiler generated code only: In inline assembly (see your mentioning of
>> APP / NO_APP above) you still potentially reference C symbols. So the
>> ambiguities don't disappear in APP / NO_APP regions.
> 
> My suggestion is that people who write inline assembly should have been aware of the existence of 
> bad names, and should have been careful to avoid them.
> 
> 
>> And an array happening to be indexed by rcx would then result in
>>
>>      mov eax, DWORD PTR rcx[rcx]    # `movl rcx(%rcx), %eax`
>>
>> ? That's going to be confusing at best. 
> 
> This is always confusing, no matter how it is written.
> 
>> I think this whole issue needs
>> taking care of differently, and iirc I did already suggest an alternative
>> in one of the bugzilla entries involved: Potentially ambiguous names
>> (which to a compiler may mean: all symbol names) ought to simply be
>> quoted, and it ought to be specified that quoted symbols are never
>> registers. Iirc this will require gas changes, yes, but it'll address all
>> ambiguities afaict.
> 
> The OP of GCC PR53929 said that 'the problem does _not_ go away even if I quote the symbol name by 
> hand in the assembly output' which was 12 years ago. I tried my local installation and quoting the 
> symbol turned out to avoid the issue:
> 
>     > as --version
>     GNU assembler (GNU Binutils) 2.41.0.20240108
> 
>     > cat test.s
>     .intel_syntax noprefix
>     lea     rax, "bx"[rip]
> 
>     > as test.s -o test.o
> 
>     > objdump -d test.o
>     test.o:     file format pe-x86-64
>     (...)
>        0:   48 8d 05 00 00 00 00    lea    rax,[rip+0x0]        # 7 <.text+0x7>
>        7:   90                      nop

Right, I did some work in that direction a while ago. But iirc there are
still cases left to be addressed.

Jan


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-22  8:39         ` Jan Beulich
@ 2024-01-23  1:27           ` LIU Hao
  2024-01-23  8:38             ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: LIU Hao @ 2024-01-23  1:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils, GCC Development



On 2024-01-22 16:39, Jan Beulich wrote:
> Right, I did some work in that direction a while ago. But iirc there are
> still cases left to be addressed.

Attached is a draft patch for GCC, bootstrapped on {i686,x86_64}-w64-mingw32 with GCC 13.2 and 
binutils 2.41.0.

This addresses the issue when a bad name exists in the same translation unit. In the case of an 
external symbol there's still an error:

```
extern int bx;
int get(const char* p) { return p[bx]; }
```

```
lh_mouse@lhmouse-pc ~/Desktop $ x86_64-w64-mingw32-gcc -S -o - -masm=intel test.c | fgrep bx
         mov     rax, QWORD PTR .refptr.bx[rip]
         .section        .rdata$.refptr.bx, "dr"
         .globl  .refptr.bx
.refptr.bx:
         .quad   bx
lh_mouse@lhmouse-pc ~/Desktop $ x86_64-w64-mingw32-gcc  -masm=intel test.c | fgrep bx
C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s: Assembler messages:
C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s:29: Error: invalid use of register
C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s:29: Warning: register value used as expression
lh_mouse@lhmouse-pc ~/Desktop $
```




-- 
Best regards,
LIU Hao


[-- Attachment #1.1.2: 0401-Always-quote-labels-in-Intel-syntax.patch --]
[-- Type: text/x-patch, Size: 1325 bytes --]

From 2579afab42b90dceac860114acbad1ab79bca979 Mon Sep 17 00:00:00 2001
From: LIU Hao <lh_mouse@126.com>
Date: Tue, 23 Jan 2024 02:20:29 +0800
Subject: [PATCH] Always quote symbols in Intel syntax

---
 gcc/config/i386/i386.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 539083f2fbf8..785c6eda8d55 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2175,7 +2175,7 @@ extern int const svr4_debugger_register_map[FIRST_PSEUDO_REGISTER];
 #define ASM_OUTPUT_SYMBOL_REF(FILE, SYM) \
   do {							\
     const char *name					\
-      = assemble_name_resolve (XSTR (x, 0));		\
+      = assemble_name_resolve (XSTR (SYM, 0));		\
     /* In -masm=att wrap identifiers that start with $	\
        into parens.  */					\
     if (ASSEMBLER_DIALECT == ASM_ATT			\
@@ -2186,6 +2186,13 @@ extern int const svr4_debugger_register_map[FIRST_PSEUDO_REGISTER];
 	assemble_name_raw ((FILE), name);		\
 	fputc (')', (FILE));				\
       }							\
+    else if (ASSEMBLER_DIALECT == ASM_INTEL		\
+	&& name[0] != '*')				\
+      {							\
+	fputc ('\"', (FILE));				\
+	assemble_name_raw ((FILE), name);		\
+	fputc ('\"', (FILE));				\
+      }							\
     else						\
       assemble_name_raw ((FILE), name);			\
   } while (0)
-- 
2.43.0



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-23  1:27           ` LIU Hao
@ 2024-01-23  8:38             ` Jan Beulich
  2024-01-23  9:00               ` LIU Hao
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2024-01-23  8:38 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On 23.01.2024 02:27, LIU Hao wrote:
> On 2024-01-22 16:39, Jan Beulich wrote:
>> Right, I did some work in that direction a while ago. But iirc there are
>> still cases left to be addressed.
> 
> Attached is a draft patch for GCC, bootstrapped on {i686,x86_64}-w64-mingw32 with GCC 13.2 and 
> binutils 2.41.0.

Right, but this is very "draft". You can't blindly assume the gas you use
actually can deal with quotation.

> This addresses the issue when a bad name exists in the same translation unit. In the case of an 
> external symbol there's still an error:
> 
> ```
> extern int bx;
> int get(const char* p) { return p[bx]; }
> ```
> 
> ```
> lh_mouse@lhmouse-pc ~/Desktop $ x86_64-w64-mingw32-gcc -S -o - -masm=intel test.c | fgrep bx
>          mov     rax, QWORD PTR .refptr.bx[rip]
>          .section        .rdata$.refptr.bx, "dr"
>          .globl  .refptr.bx
> .refptr.bx:
>          .quad   bx

Sure, this one needs quoting then, too.

Jan

> lh_mouse@lhmouse-pc ~/Desktop $ x86_64-w64-mingw32-gcc  -masm=intel test.c | fgrep bx
> C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s: Assembler messages:
> C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s:29: Error: invalid use of register
> C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s:29: Warning: register value used as expression
> lh_mouse@lhmouse-pc ~/Desktop $
> ```
> 
> 
> 
> 



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-23  8:38             ` Jan Beulich
@ 2024-01-23  9:00               ` LIU Hao
  2024-01-23  9:03                 ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: LIU Hao @ 2024-01-23  9:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils, GCC Development



On 2024-01-23 16:38, Jan Beulich wrote:
> Right, but this is very "draft". You can't blindly assume the gas you use
> actually can deal with quotation.

Let's assume that for the time being, but there's something else; see below.


>> .refptr.bx:
>>           .quad   bx
> 
> Sure, this one needs quoting then, too.

The attached patch contains `&& name[0] != '*'` with a reason: In the function `assemble_name_raw` 
in 'gcc/varasm.cc', if `name` starts with a `*`, then its remaining part is output without 
decoration. I have no idea what `*` means; this `.quad bx` thing apparently results from something like

    assemble_name_raw (file, "*bx");

Quoting this would break the i686 DWARF2 code, which may contain an arithmetic expression like

    .long LXXYY-1    # "LXXYY" minus one

If it was quoted like `.long "LXXYY-1"`, it would mean something very different and cause linker errors.
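
The branch in the patch can be pictured with the following sketch. It is a simplification and not GCC code: `format_intel_symbol` is a hypothetical helper that writes into a buffer, and the real `assemble_name_raw` does more than strip the `*`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not a GCC function.  Formats NAME as the draft
   patch would emit it in Intel syntax: a leading '*' means "emit the
   rest verbatim", anything else is wrapped in double quotes.  Returns
   what snprintf returns (the length that was or would be written).  */
int format_intel_symbol(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    if (name[0] == '*')
        return snprintf(buf, size, "%s", name + 1);   /* unquoted */
    return snprintf(buf, size, "\"%s\"", name);       /* quoted   */
}
```

Here `bx` comes out as `"bx"`, whereas `*LXXYY-1` comes out as the bare expression `LXXYY-1`, which is exactly why the `*` case cannot simply be quoted as a whole.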


-- 
Best regards,
LIU Hao



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-23  9:00               ` LIU Hao
@ 2024-01-23  9:03                 ` Jan Beulich
  2024-01-23  9:21                   ` LIU Hao
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2024-01-23  9:03 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On 23.01.2024 10:00, LIU Hao wrote:
> On 2024-01-23 16:38, Jan Beulich wrote:
>> Right, but this is very "draft". You can't blindly assume the gas you use
>> actually can deal with quotation.
> 
> Let's assume that for the time being, but there's something else; see below.
> 
> 
>>> .refptr.bx:
>>>           .quad   bx
>>
>> Sure, this one needs quoting then, too.
> 
> The attached patch contains `&& name[0] != '*'` with a reason: In the function `assemble_name_raw` 
> in 'gcc/varasm.cc', if `name` starts with a `*`, then its remaining part is output without 
> decoration. I have no idea what `*` means; this `.quad bx` thing apparently results from something like
> 
>     assemble_name_raw (file, "*bx");
> 
> Quoting this would break the i686 DWARF2 code, which may contain an arithmetic expression like
> 
>     .long LXXYY-1    # "LXXYY" minus one
> 
> If it was quoted like `.long "LXXYY-1"`, it would mean something very different and cause linker errors.

Hmm, that would suggest to me that the Dwarf code abuses the interface.
A "name" certainly shouldn't be an expression. And hence the result of
the example ought to be

     .long "LXXYY"-1    # "LXXYY" minus one
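
As a rough sketch of that output shape, assuming the caller still has the symbol and the constant offset as two separate values (so nothing has to be parsed back out of a string), a formatter could look like this. `format_quoted_symbol_offset` is a made-up name, not an existing GCC or gas interface.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: quote only the symbol and append the offset
   outside the quotes, e.g. "LXXYY"-1.  A zero offset is omitted.
   Returns what snprintf returns.  */
int format_quoted_symbol_offset(char *buf, size_t size,
                                const char *symbol, long offset)
{
    assert(buf != NULL && symbol != NULL);
    if (offset == 0)
        return snprintf(buf, size, "\"%s\"", symbol);
    /* %+ld always prints the sign, giving "-1" or "+4". */
    return snprintf(buf, size, "\"%s\"%+ld", symbol, offset);
}
```

Called with `"LXXYY"` and `-1`, this yields the text `"LXXYY"-1`.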

Jan


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-23  9:03                 ` Jan Beulich
@ 2024-01-23  9:21                   ` LIU Hao
  2024-01-23  9:37                     ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: LIU Hao @ 2024-01-23  9:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils, GCC Development



On 2024-01-23 17:03, Jan Beulich wrote:
> Hmm, that would suggest to me that the Dwarf code abuses the interface.
> A "name" certainly shouldn't be an expression. And hence the result of
> the example ought to be
> 
>       .long "LXXYY"-1    # "LXXYY" minus one

So I shouldn't have checked for `*`, right?

The calls to `output_addr_const()` are from `dw2_assemble_integer (int size, rtx x)` in 
'gcc/dwarf2asm.cc'. Now I need some directions on how to fix this; parsing the symbol seems awkward.


-- 
Best regards,
LIU Hao



* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-23  9:21                   ` LIU Hao
@ 2024-01-23  9:37                     ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2024-01-23  9:37 UTC (permalink / raw)
  To: LIU Hao; +Cc: binutils, GCC Development

On 23.01.2024 10:21, LIU Hao wrote:
> On 2024-01-23 17:03, Jan Beulich wrote:
>> Hmm, that would suggest to me that the Dwarf code abuses the interface.
>> A "name" certainly shouldn't be an expression. And hence the result of
>> the example ought to be
>>
>>       .long "LXXYY"-1    # "LXXYY" minus one
> 
> So I shouldn't have checked for `*`, right?

I don't know.

> The calls to `output_addr_const()` are from `dw2_assemble_integer (int size, rtx x)` in 
> 'gcc/dwarf2asm.cc'. Now I need some directions on how to fix this; parsing the symbol seems awkward.

Indeed.

Jan


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-18 16:40   ` LIU Hao
  2024-01-19  9:13     ` Jan Beulich
@ 2024-01-30  4:22     ` Hans-Peter Nilsson
  2024-01-31 10:11       ` LIU Hao
  1 sibling, 1 reply; 19+ messages in thread
From: Hans-Peter Nilsson @ 2024-01-30  4:22 UTC (permalink / raw)
  To: LIU Hao; +Cc: Jan Beulich, binutils, GCC Development

On Fri, 19 Jan 2024, LIU Hao wrote:

> On 2024-01-18 20:54, Jan Beulich wrote:
> > I'm sorry, but most of your proposal may even be considered for being
> > acceptable only if you would gain buy-off from the MASM guys. Anything
> > MASM treats as valid ought to be permitted by gas as well (within the
> > scope of certain divergence that cannot be changed in gas without
> > risking to break people's code). It could probably be considered to
> > introduce a "strict" mode of Intel syntax, following some / most of
> > what you propose; making this the default cannot be an option.
> 
> Thanks for your reply.
> 
> I have attached the Markdown source for that page, modified a few hours ago. I
> am planning to make some updates according to your advice tomorrow.
> 
> And yes, I am proposing a 'strict' mode, however not for humans, only for
> compilers.
> 
> My first message references a GCC bug report, where the problematic symbol
> `bx` comes from C source. I have been aware of the `/APP` and `/NO_APP`

It's #APP #NO_APP, not /APP /NO_APP, for x86_64-linux, even for 
-masm=intel.

> markers in generated assembly, so I suspect that GAS should be able to tell
> which parts are generated from a compiler and which parts are composed by
> hand.

For a very long time now, only a very few gcc targets (not 
including i686/x86_64-linux) have emitted the initial #NO_APP, which 
has to be the very first characters of the generated assembly 
file, without which subsequent #APP/#NO_APP pairs are just for 
show.
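
To illustrate the "very first characters" condition, here is a small C sketch of the check a tool could run on compiler output. `starts_with_no_app` is a hypothetical helper of mine, not something taken from gas, and it only looks at the text it is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only if TEXT begins with "#NO_APP" at
   offset zero.  A marker anywhere later in the file does not count.  */
bool starts_with_no_app(const char *text)
{
    static const char marker[] = "#NO_APP";
    assert(text != NULL);
    return strncmp(text, marker, sizeof marker - 1) == 0;
}
```

A file that opens with `.file` and only later contains `#NO_APP` fails this check, which is the situation described above for most gcc targets.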

That said, I guess you're going to modify gas too.  But please 
don't change the #APP/#NO_APP semantics for non-intel targets.

brgds, H-P


* Re: RFC: Formalization of the Intel assembly syntax (PR53929)
  2024-01-30  4:22     ` Hans-Peter Nilsson
@ 2024-01-31 10:11       ` LIU Hao
  0 siblings, 0 replies; 19+ messages in thread
From: LIU Hao @ 2024-01-31 10:11 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Jan Beulich, binutils, GCC Development



On 2024-01-30 12:22, Hans-Peter Nilsson wrote:
> It's #APP #NO_APP, not /APP /NO_APP, for x86_64-linux, even for
> -masm=intel.
> 

For x86_64-w64-mingw32, GCC emits `/APP`, but Clang still emits `#APP`, as it does for other targets. 
(https://gcc.godbolt.org/z/oj8vdGb78)


> That said, I guess you're going to modify gas too.  But please
> don't change the #APP/#NO_APP semantics for non-intel targets.

Probably not. As mentioned earlier, there is too little interest in this. No progress will be made 
unless there are comments from MASM authors.

I'm keeping GCC patched locally for now. This patch does not look so horrible: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929#c26


-- 
Best regards,
LIU Hao



end of thread, other threads:[~2024-01-31 10:12 UTC | newest]

Thread overview: 19+ messages
2024-01-18  5:34 RFC: Formalization of the Intel assembly syntax (PR53929) LIU Hao
2024-01-18  9:02 ` Fangrui Song
2024-01-18 12:54 ` Jan Beulich
2024-01-18 16:40   ` LIU Hao
2024-01-19  9:13     ` Jan Beulich
2024-01-20 12:40       ` LIU Hao
2024-01-22  8:39         ` Jan Beulich
2024-01-23  1:27           ` LIU Hao
2024-01-23  8:38             ` Jan Beulich
2024-01-23  9:00               ` LIU Hao
2024-01-23  9:03                 ` Jan Beulich
2024-01-23  9:21                   ` LIU Hao
2024-01-23  9:37                     ` Jan Beulich
2024-01-30  4:22     ` Hans-Peter Nilsson
2024-01-31 10:11       ` LIU Hao
     [not found] ` <DS7PR12MB5765DBF9500DE323DB4A8E29CB712@DS7PR12MB5765.namprd12.prod.outlook.com>
2024-01-19  1:42   ` LIU Hao
2024-01-19  7:41     ` Jan Beulich
2024-01-19  8:19     ` Fangrui Song
     [not found]     ` <DS7PR12MB5765654642BE3AD4C7F54E05CB702@DS7PR12MB5765.namprd12.prod.outlook.com>
2024-01-20 12:32       ` LIU Hao
