On 2024-01-18 20:54, Jan Beulich wrote:
> I'm sorry, but most of your proposal may even be considered for being
> acceptable only if you would gain buy-off from the MASM guys. Anything
> MASM treats as valid ought to be permitted by gas as well (within the
> scope of certain divergence that cannot be changed in gas without
> risking to break people's code). It could probably be considered to
> introduce a "strict" mode of Intel syntax, following some / most of
> what you propose; making this the default cannot be an option.

Thanks for your reply. I have attached the Markdown source for that page, as modified a few hours ago. I plan to update it tomorrow according to your advice.

And yes, I am proposing a 'strict' mode, though not for humans, only for compilers. My first message references a GCC bug report, where the problematic symbol `bx` comes from C source. I am aware of the `/APP` and `/NO_APP` markers in generated assembly, so I suspect that GAS should be able to tell which parts are generated by a compiler and which parts are written by hand. The proposed strict mode may apply only to the output from GCC, which is much more likely to contain bad symbols, but is also more controllable on the GCC side.

I believe that skilled people who write x86 assembly by hand already know that `offset`, `shr`, `si` etc. are 'bad' names for symbols. Therefore, it is less of an issue there.

> Commenting on individual aspects of your proposal is a little difficult,
> as you didn't provide the proposal inline (and hence it cannot be easily
> used as context in a reply). But to mention the imo worst aspect:
> Declaring
>
>     mov eax, [rcx]
>
> as invalid is a no-go.

I agree. I am considering treating the absence of a symbol as a special case.

> I also don't see how this would be related to the
> issue at hand. What's in the square brackets may as well be a symbol
> name, so requiring the "mode specifier" doesn't disambiguate things at
> all.

If someone declares a variable called `rcx` in C, it has to be translated to

    mov eax, DWORD PTR rcx      # `movl rcx, %eax`

instead of

    mov eax, DWORD PTR [rcx]    # `movl (%rcx), %eax`

> One remark regarding the underlying pattern leading to the issue:
> Personally I view it as questionable practice to have extern or static
> variables in C code with names as short as register names are. Avoiding
> them does not only avoid the issue here, but also is quite likely going
> to improve the code (by having more descriptive variable names). And
> automatic variables aren't affected aiui, so can remain short (after
> all, commonly automatic variable names are as short as a single char).

Yes, we agree that longer, more descriptive names improve maintainability. However, there are scenarios where maintainability doesn't matter much. Test cases, for instance, are sometimes machine-generated; they are usually short programs created to expose issues in something else, and are likely to contain variables with very short names. The register names `si` and `es` look especially risky to me.

> That said, I can certainly also see how the introduction of new
> registers can lead to new conflicts, which isn't nice. Iirc old 32-bit
> MASM escaped this problem by requiring architecture extensions to be
> explicitly enabled (may have changed in newer MASM). Gas, otoh, enables
> everything by default (and I don't see how we could change that).

I confess! I haven't investigated these compilers much, and everything below is my presumption.

Given this C source:

    extern int rdx;
    int get_value() { return rdx; }

I tried compiling it directly to an object file with MSVC, Clang and GCC:

    > cl /nologo /c test.c && echo Success
    test.c
    Success

    > clang -masm=intel -c test.c && echo Success
    Success

    > gcc -masm=intel -c test.c && echo Success
    C:\Users\lh_mouse\AppData\Local\Temp\ccjcy1Qj.s: Assembler messages:
    C:\Users\lh_mouse\AppData\Local\Temp\ccjcy1Qj.s:23: Error: invalid use of register
    C:\Users\lh_mouse\AppData\Local\Temp\ccjcy1Qj.s:23: Warning: register value used as expression

But if I compile it to assembly first, then assemble the result to an object file:

    > cl /nologo /c test.c /Fatest.asm && ml64 /nologo /c test.asm && echo success
    test.c
     Assembling: test.asm
    test.asm(9) : error A2008:syntax error : rdx
    test.asm(15) : error A2032:invalid use of register

    > clang -masm=intel -S test.c -o test.s && clang -masm=intel test.s && echo Success
    test.s:26:8: error: expected relocatable expression
            .quad   rdx
                    ^

    > gcc -masm=intel -S test.c -o test.s && gcc -masm=intel test.s && echo Success
    test.s: Assembler messages:
    test.s:23: Error: invalid use of register
    test.s:23: Warning: register value used as expression

It looks to me as if both MSVC and Clang have integrated assemblers, so their compiler output does not really pass through assembly text before becoming object code. That approach is not subject to the ambiguity. As GCC still relies on GAS to produce object files, it might make some sense (as stated in the first paragraph) to implement a strict mode for output from GCC to resolve the potential ambiguity, while still providing a permissive mode for inline or handwritten assembly.

-- 
Best regards,
LIU Hao
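
P.S. To make the GCC-side idea a little more concrete, here is a minimal sketch in C of the kind of check a compiler could run on a symbol name before deciding that it needs special treatment in Intel syntax. Everything in it is hypothetical: the function name `is_risky_symbol_name`, the helper `equal_ignoring_case`, and the list, which only contains names mentioned in this thread and is far from complete. It also assumes that the comparison should ignore ASCII case; that is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical and deliberately incomplete: only names mentioned in this
   thread.  A real list would have to cover every register name and
   Intel-syntax operator that the assembler recognizes.  */
static const char *const risky_names[] = {
  "bx", "si", "es", "rcx", "rdx", "offset", "shr",
};

/* Return 1 if A and B are equal when ASCII case is ignored, else 0.
   Ignoring case is an assumption of this sketch.  */
static int
equal_ignoring_case (const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  /* Equal only if both strings ended at the same position.  */
  return *a == *b;
}

/* Return 1 if NAME matches an entry in risky_names, else 0.  */
int
is_risky_symbol_name (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof risky_names / sizeof risky_names[0]; i++)
    if (equal_ignoring_case (name, risky_names[i]))
      return 1;
  return 0;
}
```

For example, `is_risky_symbol_name("rdx")` returns 1, while `is_risky_symbol_name("counter")` returns 0. How a flagged name would then be emitted is exactly the open question of this thread, so the sketch stops at detection.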