public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber
@ 2023-10-20 19:05 torvalds@linux-foundation.org
  2023-10-20 19:09 ` [Bug middle-end/111901] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: torvalds@linux-foundation.org @ 2023-10-20 19:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901

            Bug ID: 111901
           Summary: Apparently bogus CSE of inline asm with memory clobber
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: torvalds@linux-foundation.org
  Target Milestone: ---

This came up during an unrelated discussion about 'volatile' in the kernel with
Uros Bizjak, both wrt inline asms and just 'normal' volatile memory accesses.
As part of that discussion I wrote a test-program, and it shows unexpected
(arguably very buggy) gcc optimization behavior.

The test is really simple:

    int test(void)
    {
        unsigned int sum=0;
        for (int i=0 ; i < 4 ; i++){
                unsigned int val;
    #if ONE
                asm("magic1 %0":"=r"(val): :"memory");
    #else
                asm volatile("magic2 %0":"=r"(val));
    #endif
                sum+=val;
        }
        return sum;
    }

and if you compile this with

    gcc -O2 -S -DONE -funroll-all-loops t.c

the end result seems nonsensical.

The over-eager CSE combines the non-volatile asm despite the fact that it has a
memory clobber, which gcc documentation states means:

     The "memory" clobber tells the compiler that the assembly code
     performs memory reads or writes to items other than those listed in
     the input and output operands (for example, accessing the memory
     pointed to by one of the input parameters).

so combining the four identical asm statements into one seems to be actively
buggy. The inline asm may not be marked volatile, but it does clearly tell the
compiler that it does memory reads OR WRITES to operands other than those
listed. Which would on the face of it make the CSE invalid.

Imagine, for example, that there's some added statistics gathering or similar
going on inside the asm, maybe using the new Intel remote atomics for example. 

The other oddity is how it does not actually multiply the result by four in any
sane way. The above generates:

        magic1 %eax
        movl    %eax, %edx
        addl    %eax, %eax
        addl    %edx, %eax
        addl    %edx, %eax
        ret

but *without* the memory barrier it generates the much more obvious

        magic1 %eax
        sall    $2, %eax
        ret

so the memory barrier does end up affecting the end result, just in a
nonsensical way.

And no, I don't think we have cases of this kind of code in the kernel. Marking
the asm volatile will obviously disable the over-eager CSE, and is certainly
the natural thing to do for anything that actually modifies memory.

But the memory clobber really does seem to make the CSE that gcc does invalid
as per the documentation, and Uros suggested I'd do a gcc bugzilla entry for
this.

FYI: clang seems to generate the expected code for all cases (ie both
"volatile" and the memory clobber will disable CSE, and when it does CSE it, it
generates the expected single shift, rather than doing individual additions).
But clang verifies the assembler mnemonic (which I think is a misfeature - one
of the *reasons* to use inline asm is to do things the compiler doesn't
understand), so instead of "magicX", you need to use "strl" or something like
that.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/111901] Apparently bogus CSE of inline asm with memory clobber
  2023-10-20 19:05 [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber torvalds@linux-foundation.org
@ 2023-10-20 19:09 ` pinskia at gcc dot gnu.org
  2023-10-20 19:48 ` [Bug rtl-optimization/111901] " pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-20 19:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |middle-end
           Keywords|                            |inline-asm

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect without an input, the cse will happen as there is no other writes in
the loop.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug rtl-optimization/111901] Apparently bogus CSE of inline asm with memory clobber
  2023-10-20 19:05 [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber torvalds@linux-foundation.org
  2023-10-20 19:09 ` [Bug middle-end/111901] " pinskia at gcc dot gnu.org
@ 2023-10-20 19:48 ` pinskia at gcc dot gnu.org
  2023-10-20 19:52 ` torvalds@linux-foundation.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-20 19:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
          Component|middle-end                  |rtl-optimization
      Known to fail|                            |4.9.0

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The gimple level is ok.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug rtl-optimization/111901] Apparently bogus CSE of inline asm with memory clobber
  2023-10-20 19:05 [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber torvalds@linux-foundation.org
  2023-10-20 19:09 ` [Bug middle-end/111901] " pinskia at gcc dot gnu.org
  2023-10-20 19:48 ` [Bug rtl-optimization/111901] " pinskia at gcc dot gnu.org
@ 2023-10-20 19:52 ` torvalds@linux-foundation.org
  2023-10-20 19:59 ` pinskia at gcc dot gnu.org
  2023-10-22 17:05 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: torvalds@linux-foundation.org @ 2023-10-20 19:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901

--- Comment #3 from Linus Torvalds <torvalds@linux-foundation.org> ---
(In reply to Andrew Pinski from comment #1)
> I suspect without an input, the cse will happen as there is no other writes
> in the loop.

Yes, it looks to me like the CSE simply didn't think of the memory clobber as a
possible side effect.

Other clobbers (ie registers) presumably don't affect whether something can be
CSE'd, so I do think that a memory clobber is special.

Of course, memory clobbers are special in other ways too (ie they force spills
and reloads of any pseudos around them), but my reaction to this is that it
does look like gcc has simply missed one other implication of a memory clobber.

Do I think it *matters* in practice? Probably not. Hopefully these kinds of asm
uses don't exist.

But I do find the resulting code generation surprising to the point of "that
looks buggy".

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug rtl-optimization/111901] Apparently bogus CSE of inline asm with memory clobber
  2023-10-20 19:05 [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber torvalds@linux-foundation.org
                   ` (2 preceding siblings ...)
  2023-10-20 19:52 ` torvalds@linux-foundation.org
@ 2023-10-20 19:59 ` pinskia at gcc dot gnu.org
  2023-10-22 17:05 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-20 19:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks like postreload_cse is causing the issue ...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug rtl-optimization/111901] Apparently bogus CSE of inline asm with memory clobber
  2023-10-20 19:05 [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber torvalds@linux-foundation.org
                   ` (3 preceding siblings ...)
  2023-10-20 19:59 ` pinskia at gcc dot gnu.org
@ 2023-10-22 17:05 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-22 17:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-22
             Status|UNCONFIRMED                 |NEW

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. The inline-asm was merged between reload and postreload dump files .

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-10-22 17:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-20 19:05 [Bug c/111901] New: Apparently bogus CSE of inline asm with memory clobber torvalds@linux-foundation.org
2023-10-20 19:09 ` [Bug middle-end/111901] " pinskia at gcc dot gnu.org
2023-10-20 19:48 ` [Bug rtl-optimization/111901] " pinskia at gcc dot gnu.org
2023-10-20 19:52 ` torvalds@linux-foundation.org
2023-10-20 19:59 ` pinskia at gcc dot gnu.org
2023-10-22 17:05 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).