public inbox for binutils@sourceware.org
From: Fangrui Song <i@maskray.me>
To: LIU Hao <lh_mouse@126.com>
Cc: binutils@sourceware.org, GCC Development <gcc@gcc.gnu.org>
Subject: Re: RFC: Formalization of the Intel assembly syntax (PR53929)
Date: Fri, 19 Jan 2024 00:19:30 -0800
Message-ID: <DS7PR12MB57652A45D6B3067AB4019AF7CB702@DS7PR12MB5765.namprd12.prod.outlook.com>
In-Reply-To: <ea9234db-2443-4f73-a2e9-d39240c1d126@126.com>

On Thu, Jan 18, 2024 at 5:42 PM LIU Hao <lh_mouse@126.com> wrote:
>
> On 2024-01-18 17:02, Fangrui Song wrote:
> > Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> >
> > Do you have a list of unambiguous assembly inputs that gas fails to
> > parse today, e.g. filed as a gas PR?
> > For example,
>
> Not really. Most of these arise from symbol names coming from high-level languages. For example:
>
>     # Expected: `movl shr(%rip), %eax`
>     # Actual: error: invalid use of operator "shr"
>     mov eax, DWORD PTR shr[rip]
>
>     # Expected: `movl dword(%rip), %eax`
>     # Actual: accepted as `movl 4(%rip), %eax`
>     mov eax, DWORD ptr dword[rip]

GCC seems to print a symbolic displacement, possibly with a relocation
modifier such as @GOT, before the left bracket:

mov edx, DWORD PTR bx@GOT[eax]
mov edx, DWORD PTR bx[eax]
mov edx, DWORD PTR and[eax]    # Error: invalid use of operator "and"

Technically, assemblers (gas and the LLVM integrated assembler) could be
made to parse "bx" as a symbol even though it matches a register name,
and likewise "and" even though it matches an operator name.
However, a straightforward approach using a single lookahead token cannot
disambiguate the following two cases.

mov edx, DWORD PTR fs:[eax]   # segment override prefix
mov edx, DWORD PTR fs[eax]    # symbol

So we would need two tokens of lookahead. (See
https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L2534-L2550 :
LLVM needs extra code to parse `fs:` correctly.)
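The decision can be sketched in C. This is a minimal sketch, assuming a hypothetical tokenizer that has already split the operand after `DWORD PTR` into string tokens; `classify`, the enum and the abbreviated register/operator tables are illustrative names, not gas or LLVM internals. The point is that the parser has to look at the identifier and at the token after it before it can decide what the identifier is.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the identifier that follows "DWORD PTR". */
enum operand_start {
    START_SEGMENT_OVERRIDE, /* fs:[eax] */
    START_SYMBOL,           /* fs[eax], bx[eax], and[eax], shr[rip] */
    START_REGISTER,         /* a bare register name */
    START_OPERATOR,         /* a bare operator keyword */
    START_OTHER             /* numbers, '[', ... */
};

/* Abbreviated tables: a real assembler knows many more names. */
static const char *const seg_regs[] = {"cs", "ds", "es", "fs", "gs", "ss"};
static const char *const gp_regs[] = {"eax", "ebx", "ecx", "edx", "bx", "rip"};
static const char *const operators[] = {"and", "or", "not", "xor", "shl", "shr"};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static int in_list(const char *s, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, list[i]) == 0)
            return 1;
    return 0;
}

/* tok0 is the identifier candidate, tok1 the token after it ("" at end). */
enum operand_start classify(const char *tok0, const char *tok1) {
    if (!(isalpha((unsigned char)tok0[0]) || tok0[0] == '_'))
        return START_OTHER;
    int is_seg = in_list(tok0, seg_regs, COUNT(seg_regs));
    if (is_seg && strcmp(tok1, ":") == 0)
        return START_SEGMENT_OVERRIDE; /* the second token decides: ':' */
    if (strcmp(tok1, "[") == 0)
        return START_SYMBOL; /* a name directly before '[' is a displacement */
    if (is_seg || in_list(tok0, gp_regs, COUNT(gp_regs)))
        return START_REGISTER;
    if (in_list(tok0, operators, COUNT(operators)))
        return START_OPERATOR;
    return START_SYMBOL;
}
```

With this rule, `classify("fs", ":")` yields a segment override, while `classify("fs", "[")`, `classify("bx", "[")` and `classify("and", "[")` all yield a symbol.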

It is also unfortunate that brackets behave differently depending on
whether the displacement is an immediate or a symbol:

mov eax, DWORD PTR 0          # mov    $0x0,%eax
mov eax, DWORD PTR [0]        # mov    0x0,%eax
mov eax, DWORD PTR sym        # mov    0x0,%eax with relocation
mov eax, DWORD PTR [sym]      # mov    0x0,%eax with relocation

The above reveals yet another inconsistency. It seems that a memory
reference should always use [], but [sym] becomes ambiguous if sym
matches a register name or an operator name.
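To make the asymmetry concrete, here is a toy C model of how the operand after `DWORD PTR` is interpreted, based only on the four gas examples above. `classify_operand` is a hypothetical helper, not gas code, and it deliberately ignores registers, segment overrides and the register/operator-name ambiguity just mentioned.

```c
#include <stdlib.h>

/* Toy model of the four GNU as examples above (operand after "DWORD PTR"). */
enum operand_kind { KIND_IMMEDIATE, KIND_MEMORY };

/* True if the whole string parses as an integer literal such as 0 or 0x48. */
static int is_number(const char *s) {
    char *end;
    (void)strtol(s, &end, 0);
    return end != s && *end == '\0';
}

enum operand_kind classify_operand(const char *op) {
    if (op[0] == '[')
        return KIND_MEMORY;    /* [0] and [sym]: memory reference */
    if (is_number(op))
        return KIND_IMMEDIATE; /* bare 0: immediate, so brackets matter */
    return KIND_MEMORY;        /* bare sym: memory, so brackets do not matter */
}
```

Only the numeric case depends on the brackets; a symbol is treated as a memory reference either way.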

Does the proposal change the placement of the displacement depending
on whether it is an immediate?
This is inconsistent, but perhaps there is not much we can do about it...

extern int a[3];
int foo(void) { return a[1]+a[2]; }

GCC's PIC -masm=intel output:

        mov     eax, DWORD PTR a[rip+8]
        add     eax, DWORD PTR a[rip+4]

Each displacement (a+8 and a+4) is a plus expression, yet its parts are
printed in two places: `a` before the bracket and `8`/`4` inside it.
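A minimal sketch of that split, assuming a hypothetical `struct mem_operand` that stores the displacement as a symbol plus a constant (this is not GCC's internal representation, and `format_mem` is an illustrative name): the printer has to emit the two halves of one sum in different places, with any relocation modifier attached to the symbol half.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a memory operand whose displacement is sym+offset. */
struct mem_operand {
    const char *sym;      /* symbolic part of the displacement, e.g. "a" */
    const char *modifier; /* relocation modifier such as "GOT", or NULL */
    long offset;          /* constant part of the displacement, e.g. 8 */
    const char *base;     /* base register, e.g. "rip" */
};

/* Print the operand the way GCC's -masm=intel output looks: the symbol
   (plus modifier) goes before '[', the constant goes inside after the base. */
int format_mem(char *buf, size_t size, const struct mem_operand *m) {
    const char *at = m->modifier ? "@" : "";
    const char *mod = m->modifier ? m->modifier : "";
    if (m->offset != 0)
        return snprintf(buf, size, "DWORD PTR %s%s%s[%s%+ld]",
                        m->sym, at, mod, m->base, m->offset);
    return snprintf(buf, size, "DWORD PTR %s%s%s[%s]",
                    m->sym, at, mod, m->base);
}
```

For symbol `a`, offset 8 and base `rip` this produces `DWORD PTR a[rip+8]`; for symbol `bx`, modifier `GOT`, offset 0 and base `eax` it produces `DWORD PTR bx@GOT[eax]`.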

> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to `.intel_syntax noprefix`:
>
>     $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
>     {standard input}: Assembler messages:
>     {standard input}:1: Error: invalid use of register
>
>     $ as <<< '.intel_syntax noprefix;  mov eax, DWORD PTR gs:0x48' -o a.o && objdump -Mintel -d a.o
>     ...
>     0000000000000000 <.text>:
>        0:       65 8b 04 25 48 00 00    mov    eax,DWORD PTR gs:0x48

Confirmed by Jan.

