public inbox for binutils@sourceware.org
From: Fangrui Song <i@maskray.me>
To: LIU Hao <lh_mouse@126.com>
Cc: binutils@sourceware.org, GCC Development <gcc@gcc.gnu.org>
Subject: Re: RFC: Formalization of the Intel assembly syntax (PR53929)
Date: Thu, 18 Jan 2024 01:02:22 -0800
Message-ID: <MN0PR12MB5761F72F6FBE6F8B6C7ACEEFCB712@MN0PR12MB5761.namprd12.prod.outlook.com>
In-Reply-To: <d3cb6174-caa6-47f3-9b5e-d39222701458@126.com>

On Wed, Jan 17, 2024 at 9:34 PM LIU Hao <lh_mouse@126.com> wrote:
>
> Hello,
>
> There hasn't been a solution to https://gcc.gnu.org/PR53929 for almost a dozen years, mostly
> due to compatibility with MASM. I was told that the ambiguity of Intel syntax should be classified
> as its own limitation and disrecommendation.
>
> Notwithstanding, I am proposing a permanent solution to this issue, by banning constructions that
> cause ambiguity. This is likely to effect incompatibility with other assemblers, but it should make
> GAS parse the output of GCC flawlessly.
>
>
> PR53929 contains a known ambiguous construction
>
>     lea rax, bx[rip]
>
> where `bx` could denote the BX register and causes confusion. The Intel Software Developer Manual
> also contains an ambiguous construction
>
>     MOV EBX, RAM_START
>
> which would look like loading the offset of `RAM_START`. My proposal is that these two constructions
> are ambiguous and should be rejected. The compiler should generate assembly in the unambiguous
> subset, and we can start to implement the assembler to reject the ambiguous ones.
>
> They are formalized as
>
>     lea rax, BYTE PTR bx[rip]
>     mov EBX, DWORD PTR RAM_START
>
> Roughly speaking, anything after `PTR`/`BCST` (and before `[` if any) is considered a symbol even if
> it matches a keyword; any identifier between `[` and `]` is a register and not a symbol.
>
>
> My complete proposal can be found at
> <https://github.com/lhmouse/mcfgthread/wiki/Formalized-Intel-Syntax-for-x86>. Some ideas actually
> reflect the AT&T syntax. I hope it helps.

Thanks for the proposal. I hope that -masm=intel becomes more useful:)

Do you have a list, filed as a gas PR, of constructs in the unambiguous
subset that gas fails to parse today?
For example,

% as -msyntax=intel -mnaked-reg <<< 'lea rax, BYTE PTR bxx[rip]' -o a.o && objdump -d -M intel a.o | grep -A1 '>:'
0000000000000000 <.text>:
   0:   48 8d 05 00 00 00 00    lea    rax,[rip+0x0]        # 0x7
% as -msyntax=intel -mnaked-reg <<< 'lea rax, BYTE PTR bx[rip]' -o a.o && objdump -d -M intel a.o | grep -A1 '>:'
{standard input}: Assembler messages:
{standard input}:1: Error: invalid use of register
% as -msyntax=intel -mnaked-reg <<< 'mov EBX, DWORD PTR ebx' -o a.o
{standard input}: Assembler messages:
{standard input}:1: Error: invalid use of register
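
For illustration only, here is a minimal Python sketch of the rule quoted
above ("anything after `PTR`/`BCST` and before `[` is a symbol; any
identifier between `[` and `]` is a register"). It is not gas code and it is
not the grammar from the linked wiki page. The helper name
classify_identifiers and the catch-all role "other" are assumptions made up
for this example. Under this reading, `bx` in `BYTE PTR bx[rip]` and `ebx`
in `DWORD PTR ebx` would both be symbols, which are the two inputs gas
rejects above with "invalid use of register".

```python
import re

# Hypothetical helper, not part of gas or of the proposal's grammar.
# Tokens: numeric literals, identifiers, and the two bracket characters.
TOKEN = re.compile(r"\d\w*|[A-Za-z_.$][\w.$]*|\[|\]")


def classify_identifiers(operand):
    """Return (identifier, role) pairs for one Intel-syntax operand."""
    roles = []
    depth = 0          # nesting level of '[' ... ']'
    after_ptr = False  # True once PTR/BCST was seen and no '[' has followed
    for tok in TOKEN.findall(operand):
        if tok == "[":
            depth += 1
            after_ptr = False
        elif tok == "]":
            depth = max(depth - 1, 0)
        elif tok[0].isdigit():
            continue  # numeric literal, not an identifier
        elif depth > 0:
            roles.append((tok, "register"))  # inside brackets
        elif after_ptr:
            roles.append((tok, "symbol"))    # after PTR/BCST, before '['
        elif tok.upper() in ("PTR", "BCST"):
            after_ptr = True
        else:
            roles.append((tok, "other"))     # the quoted rule does not cover this
    return roles


for text in ("BYTE PTR bx[rip]", "DWORD PTR ebx", "DWORD PTR RAM_START"):
    print(text, "->", classify_identifiers(text))
```

The sketch only looks at identifiers and brackets; operators, segment
overrides and everything else in the full proposal are ignored.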
