public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call
@ 2022-04-22  5:59 570070308 at qq dot com
  2022-04-22  6:23 ` [Bug middle-end/105342] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: 570070308 at qq dot com @ 2022-04-22  5:59 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

            Bug ID: 105342
           Summary: [Extended Asm]Memory barrier geater than a function
                    call
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 570070308 at qq dot com
  Target Milestone: ---

This is an enhancement request, not a bug.
According to the documentation, using the "memory" clobber effectively forms a
read/write memory barrier for the compiler. In my tests, the barrier formed by
extended asm covers more than a function call does: it also acts as a barrier
for memory in the function's own stack frame. I think this is unnecessary, and
it may even produce worse code when the function is inlined.


For example in the test case:
```test.c
#include <stddef.h> /* for size_t */

extern unsigned long int x[512];

void test(long int,long int,long int,long int,long int);
void test1(long int);
void test2();

int kkk()
{
    unsigned long int k[512];
    for (size_t i=0; i<512; ++i )
    {
        k[i]=x[i];
    }
    test1(k[0]);
    k[1]=3;
    k[2]=3;
    k[3]=3;
    k[4]=3;
    k[5]=3;
    test2();
    test(k[1], k[2], k[3], k[4], k[5]);
    return 0;
}
```
and
```test2.c
void test2()
{
    __asm__ volatile
        (""
         :
         :
         :"memory"
         );
}
```
Compile with:
```
gcc-12 -fno-stack-protector -fcf-protection=none
-fno-asynchronous-unwind-tables -mgeneral-regs-only -O3 -S test.c test2.c
```
This generates:
```test.s:
kkk:
        subq    $4096, %rsp
        orq     $0, (%rsp)
        subq    $8, %rsp
        leaq    x(%rip), %rsi
        movl    $512, %ecx
        movq    %rsp, %rdi
        rep movsq
        movq    (%rsp), %rdi
        call    test1@PLT
        xorl    %eax, %eax
        call    test2@PLT
        movl    $3, %r8d
        movl    $3, %ecx
        movl    $3, %edx
        movl    $3, %esi
        movl    $3, %edi
        call    test@PLT
        xorl    %eax, %eax
        addq    $4104, %rsp
        ret
```
```test2.s
test2:
        ret
```
The assembly code for kkk looks neat.

However, if I put the contents of test2.c in test.c, then it will generate:
```test.s
kkk:
        subq    $4096, %rsp
        orq     $0, (%rsp)
        subq    $8, %rsp
        leaq    x(%rip), %rsi
        movl    $512, %ecx
        movq    %rsp, %rdi
        rep movsq
        movq    (%rsp), %rdi
        call    test1@PLT
        movq    $3, 8(%rsp)
        movq    $3, 16(%rsp)
        movq    $3, 24(%rsp)
        movq    $3, 32(%rsp)
        movq    $3, 40(%rsp)
        movq    40(%rsp), %r8
        movq    32(%rsp), %rcx
        movq    24(%rsp), %rdx
        movq    16(%rsp), %rsi
        movq    8(%rsp), %rdi
        call    test@PLT
        xorl    %eax, %eax
        addq    $4104, %rsp
        ret
test2:
        ret
```
The compiler automatically inlines test2() and treats the extended asm as a
barrier for k[1], k[2], k[3], k[4] and k[5], so inlining test2() produces
slower code than not inlining it.
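
A minimal single-file sketch of one possible workaround, assuming GCC's
`noinline` function attribute: keeping test2() out of line leaves the asm
behind a call boundary, as in the two-file build. The definitions of x, test
and test1 and the variable `seen_sum` are hypothetical stand-ins so the file
builds on its own. Whether GCC then emits the shorter sequence depends on the
version and flags, so check the `-S` output.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the external objects in the report. */
static unsigned long int x[512];
static long int seen_sum;   /* records what test() received */

__attribute__((noinline)) static void test(long int a, long int b, long int c,
                                           long int d, long int e)
{
    seen_sum = a + b + c + d + e;
}

__attribute__((noinline)) static void test1(long int a)
{
    (void)a;
}

/* Assumption: noinline keeps the "memory" clobber inside the callee, so the
   caller only sees an ordinary call. */
__attribute__((noinline)) static void test2(void)
{
    __asm__ volatile("" : : : "memory");
}

int kkk(void)
{
    unsigned long int k[512];
    for (size_t i = 0; i < 512; ++i)
        k[i] = x[i];
    test1(k[0]);
    k[1] = 3;
    k[2] = 3;
    k[3] = 3;
    k[4] = 3;
    k[5] = 3;
    test2();                              /* call, not an inlined asm */
    test(k[1], k[2], k[3], k[4], k[5]);
    return 0;
}
```

The trade-off is the cost of the call itself, which only pays off when the
asm body is cheap compared with the stores and reloads it would otherwise
force.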

gcc-12 was installed with apt on Ubuntu 22.04. Full compile log:
```
$ gcc-12 -fno-stack-protector -fcf-protection=none
-fno-asynchronous-unwind-tables -mgeneral-regs-only -O3 -S test.c test2.c -v
Using built-in specs.
COLLECT_GCC=gcc-12
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12-20220319-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-OcsLtf/gcc-12-12-20220319/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-OcsLtf/gcc-12-12-20220319/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.1 20220319 (experimental) [master r12-7719-g8ca61ad148f]
(Ubuntu 12-20220319-1ubuntu1) 
COLLECT_GCC_OPTIONS='-fno-stack-protector' '-fcf-protection=none'
'-fno-asynchronous-unwind-tables' '-mgeneral-regs-only' '-O3' '-S' '-v'
'-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1 -quiet -v -imultiarch x86_64-linux-gnu
test.c -quiet -dumpbase test.c -dumpbase-ext .c -mgeneral-regs-only
-mtune=generic -march=x86-64 -O3 -version -fno-stack-protector
-fcf-protection=none -fno-asynchronous-unwind-tables -o test.s -Wformat
-Wformat-security -fstack-clash-protection
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
        compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/12/include
 /usr/local/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
        compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 200a3dd46f0674d1a8fcf2b133bc6014
COLLECT_GCC_OPTIONS='-fno-stack-protector' '-fcf-protection=none'
'-fno-asynchronous-unwind-tables' '-mgeneral-regs-only' '-O3' '-S' '-v'
'-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1 -quiet -v -imultiarch x86_64-linux-gnu
test2.c -quiet -dumpbase test2.c -dumpbase-ext .c -mgeneral-regs-only
-mtune=generic -march=x86-64 -O3 -version -fno-stack-protector
-fcf-protection=none -fno-asynchronous-unwind-tables -o test2.s -Wformat
-Wformat-security -fstack-clash-protection
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
        compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/12/include
 /usr/local/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
        compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 200a3dd46f0674d1a8fcf2b133bc6014
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-fno-stack-protector' '-fcf-protection=none'
'-fno-asynchronous-unwind-tables' '-mgeneral-regs-only' '-O3' '-S' '-v'
'-mtune=generic' '-march=x86-64'
```


* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
  2022-04-22  5:59 [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call 570070308 at qq dot com
@ 2022-04-22  6:23 ` rguenth at gcc dot gnu.org
  2022-04-22  6:24 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22  6:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-04-22

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think this is on purpose - on GIMPLE at least.  One would need to analyze
what RTL disambiguation does with ASMs (I would guess nothing, just like
GIMPLE).

I agree that it might be more consistent to only consider things clobbered
by a call, but then we need to look at the explicit memory outputs/clobbers.

Is it really important though?


* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
  2022-04-22  5:59 [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call 570070308 at qq dot com
  2022-04-22  6:23 ` [Bug middle-end/105342] " rguenth at gcc dot gnu.org
@ 2022-04-22  6:24 ` pinskia at gcc dot gnu.org
  2022-04-22  6:24 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-04-22  6:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Yes it is greater than a function call as it has to be a barrier to even memory
which does NOT escape the function.

Yes inlining does change it from dealing with escaped memory to even local
ones. I doubt it can be fixed though because you have to mark which memory was
active in the function where it was inlined from and such.


* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
  2022-04-22  5:59 [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call 570070308 at qq dot com
  2022-04-22  6:23 ` [Bug middle-end/105342] " rguenth at gcc dot gnu.org
  2022-04-22  6:24 ` pinskia at gcc dot gnu.org
@ 2022-04-22  6:24 ` pinskia at gcc dot gnu.org
  2022-04-22  6:35 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-04-22  6:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |inline-asm
           Severity|normal                      |enhancement


* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
  2022-04-22  5:59 [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call 570070308 at qq dot com
                   ` (2 preceding siblings ...)
  2022-04-22  6:24 ` pinskia at gcc dot gnu.org
@ 2022-04-22  6:35 ` rguenth at gcc dot gnu.org
  2022-04-22  6:53 ` 570070308 at qq dot com
  2022-04-22  7:56 ` rguenther at suse dot de
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22  6:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> Yes it is greater than a function call as it has to be a barrier to even
> memory which does NOT escape the function.
> 
> Yes inlining does change it from dealing with escaped memory to even local
> ones. I doubt it can be fixed though because you have to mark which memory
> was active in the function where it was inlined from and such.

The question is whether we allow an asm (""::: "memory") to clobber
arbitrary parts of the stack the asm somehow magically can associate with
an automatic variable (the compiler could have promoted to a register!).

Basically, it currently prevents optimization of stack storage across the
asm, even though the asm cannot rely on the variable being in stack storage
and thus has to assume it may be in a register.  The docs say

--
@item "memory"
The @code{"memory"} clobber tells the compiler that the assembly code performs
memory reads or writes to items other than those listed in the input and output
operands (for example, accessing the memory pointed to by one of the input
parameters). To ensure memory contains correct values, GCC may need to flush
specific register values to memory before executing the @code{asm}. Further,
the compiler does not assume that any values read from memory before an
@code{asm} remain unchanged after that @code{asm}; it reloads them as
needed.
Using the @code{"memory"} clobber effectively forms a read/write
memory barrier for the compiler.

Note that this clobber does not prevent the @emph{processor} from doing
speculative reads past the @code{asm} statement. To prevent that, you need
processor-specific fence instructions.
--

I think clarifying that it does not protect 'memory' that can be elided
by the compiler wouldn't make it behave differently in practice.
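
A minimal sketch of the documented behaviour, assuming GCC extended asm. The
names `compiler_barrier`, `shared_flag`, `publish_then_read` and `local_only`
are hypothetical. The global has to be stored before the asm and reloaded
after it, while a local whose address is never taken can stay in a register,
where the clobber does not affect it.

```c
/* Compiler-only barrier: the empty template emits no instruction. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" : : : "memory");
}

int shared_flag;   /* hypothetical global, visible outside this file */

int publish_then_read(int value)
{
    shared_flag = value;   /* the store must be emitted before the asm */
    compiler_barrier();
    return shared_flag;    /* reloaded from memory, not forwarded from value */
}

int local_only(int value)
{
    int tmp = value + 1;   /* address never taken: may live only in a register */
    compiler_barrier();
    return tmp;            /* the compiler is free to reuse the register value */
}
```

This is a compiler-level effect only; as the quoted documentation notes, it
does not stop the processor from reordering accesses.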


* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
  2022-04-22  5:59 [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call 570070308 at qq dot com
                   ` (3 preceding siblings ...)
  2022-04-22  6:35 ` rguenth at gcc dot gnu.org
@ 2022-04-22  6:53 ` 570070308 at qq dot com
  2022-04-22  7:56 ` rguenther at suse dot de
  5 siblings, 0 replies; 7+ messages in thread
From: 570070308 at qq dot com @ 2022-04-22  6:53 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

--- Comment #4 from 。 <570070308 at qq dot com> ---
(In reply to Richard Biener from comment #1)
> Is it really important though?

The doc says that "The asm statement allows you to include assembly
instructions directly within C code. This may help you to maximize performance
in time-sensitive code or to access assembly instructions that are not readily
available to C programs.", and I use extended asm rather than writing a whole
function in assembly precisely to maximize performance.

I try my best to avoid the "memory" clobber, but in some cases I have to use
it, for example when using asm to operate on a list structure like the `struct
list_head` in Linux. It is impossible to list all the list elements in the
input/output operands.
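
A minimal sketch of the two options, assuming GCC extended asm. `struct node`,
`touch_node`, `touch_list` and `sum_after_touch` are hypothetical names. A
"+m" operand declares exactly one object as read and written, which is enough
for a single node. For a whole list reached through pointers, nothing short of
the "memory" clobber expresses the access.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* Narrow: only *n is declared as read and written by the asm. */
static inline void touch_node(struct node *n)
{
    __asm__ volatile("" : "+m"(*n));
}

/* Broad: the asm may follow head->next->..., so every element would need its
   own operand; the "memory" clobber is the only way to cover them all. */
static inline void touch_list(struct node *head)
{
    __asm__ volatile("" : : "r"(head) : "memory");
}

int sum_after_touch(void)
{
    struct node b = { NULL, 2 };
    struct node a = { &b, 1 };

    touch_node(&a);   /* compiler must assume a changed; b is not an operand */
    touch_list(&a);   /* compiler must assume any addressable memory changed */
    return a.value + a.next->value;
}
```

The empty asm templates keep the sketch independent of the target; real code
would put the list-manipulating instructions in the template.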


* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
  2022-04-22  5:59 [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call 570070308 at qq dot com
                   ` (4 preceding siblings ...)
  2022-04-22  6:53 ` 570070308 at qq dot com
@ 2022-04-22  7:56 ` rguenther at suse dot de
  5 siblings, 0 replies; 7+ messages in thread
From: rguenther at suse dot de @ 2022-04-22  7:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 22 Apr 2022, 570070308 at qq dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
> 
> --- Comment #4 from 。 <570070308 at qq dot com> ---
> (In reply to Richard Biener from comment #1)
> > Is it really important though?
> 
> The doc says that "The asm statement allows you to include assembly
> instructions directly within C code. This may help you to maximize performance
> in time-sensitive code or to access assembly instructions that are not readily
> available to C programs.", and I use extended asm rather than writing a whole
> function in assembly precisely to maximize performance.
> 
> I try my best to avoid the "memory" clobber, but in some cases I have to use
> it, for example when using asm to operate on a list structure like the `struct
> list_head` in Linux. It is impossible to list all the list elements in the
> input/output operands.

Indeed.  I think there are duplicate bug reports about the lack of a clobber
specifying "memory reachable by pointer X"; the clobber syntax
is somewhat limited here.
