* [Bug middle-end/105342] New: [Extended Asm]Memory barrier geater than a function call
@ 2022-04-22 5:59 570070308 at qq dot com
2022-04-22 6:23 ` [Bug middle-end/105342] " rguenth at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: 570070308 at qq dot com @ 2022-04-22 5:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
Bug ID: 105342
Summary: [Extended Asm]Memory barrier geater than a function
call
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: 570070308 at qq dot com
Target Milestone: ---
This is an enhancement request, not a bug.
According to the documentation, using the "memory" clobber effectively forms a
read/write memory barrier for the compiler. In my tests, the barrier in
extended asm reaches further than a function call does: it also acts as a
barrier for memory in the function's own stack frame. I think this is
unnecessary, and it can even produce more complex code once a function
containing the asm is inlined.
For example, take this test case:
```test.c
#include <stddef.h>  /* size_t */

extern unsigned long int x[512];
void test(long int, long int, long int, long int, long int);
void test1(long int);
void test2();

int kkk()
{
    unsigned long int k[512];  /* local array; its address never escapes kkk() */
    for (size_t i = 0; i < 512; ++i)
    {
        k[i] = x[i];
    }
    test1(k[0]);
    k[1] = 3;
    k[2] = 3;
    k[3] = 3;
    k[4] = 3;
    k[5] = 3;
    test2();  /* contains only the "memory"-clobber asm */
    test(k[1], k[2], k[3], k[4], k[5]);
    return 0;
}
```
and
```test2.c
void test2()
{
    __asm__ volatile
    (""
     :
     :
     : "memory"
    );
}
```
Compile with
```
gcc-12 -fno-stack-protector -fcf-protection=none \
    -fno-asynchronous-unwind-tables -mgeneral-regs-only -O3 -S test.c test2.c
```
which generates
```test.s:
kkk:
subq $4096, %rsp
orq $0, (%rsp)
subq $8, %rsp
leaq x(%rip), %rsi
movl $512, %ecx
movq %rsp, %rdi
rep movsq
movq (%rsp), %rdi
call test1@PLT
xorl %eax, %eax
call test2@PLT
movl $3, %r8d
movl $3, %ecx
movl $3, %edx
movl $3, %esi
movl $3, %edi
call test@PLT
xorl %eax, %eax
addq $4104, %rsp
ret
```
```test2.s
test2:
ret
```
The assembly for kkk looks neat.
However, if I put the contents of test2.c into test.c, GCC generates:
```test.s
kkk:
subq $4096, %rsp
orq $0, (%rsp)
subq $8, %rsp
leaq x(%rip), %rsi
movl $512, %ecx
movq %rsp, %rdi
rep movsq
movq (%rsp), %rdi
call test1@PLT
movq $3, 8(%rsp)
movq $3, 16(%rsp)
movq $3, 24(%rsp)
movq $3, 32(%rsp)
movq $3, 40(%rsp)
movq 40(%rsp), %r8
movq 32(%rsp), %rcx
movq 24(%rsp), %rdx
movq 16(%rsp), %rsi
movq 8(%rsp), %rdi
call test@PLT
xorl %eax, %eax
addq $4104, %rsp
ret
test2:
ret
```
The compiler automatically inlines test2() and then treats k[1], k[2], k[3],
k[4] and k[5] as affected by the barrier in the extended asm, so the stores are
written to the stack and reloaded. Inlining test2() is therefore even slower
than not inlining it.
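One way to probe the difference described above within a single translation unit is to keep the barrier out of line. The sketch below is not from the report: `barrier_call` and `sum_after_barrier` are hypothetical names, and it assumes that GCC's `noinline` function attribute is enough to make the caller see an ordinary call again. Whether the stores to the local array are then folded to constants should be checked with `-S` on the compiler in question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for test2(): noinline keeps the asm statement
   out of the caller's body, so the caller only sees a function call. */
__attribute__((noinline)) void barrier_call(void)
{
    __asm__ volatile("" : : : "memory");
}

/* Copies up to 8 values into a local array whose address never escapes,
   overwrites two slots, crosses the barrier, then reads the slots back. */
unsigned long sum_after_barrier(const unsigned long *src, size_t n)
{
    unsigned long k[8] = {0};

    assert(src != NULL);
    for (size_t i = 0; i < n && i < 8; ++i)
        k[i] = src[i];
    k[1] = 3;
    k[2] = 3;
    barrier_call();
    return k[0] + k[1] + k[2];
}
```

The returned value is the same with or without the attribute; only the generated code can differ.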
gcc-12 was installed with apt on Ubuntu 22.04. Full compile log:
```
$ gcc-12 -fno-stack-protector -fcf-protection=none
-fno-asynchronous-unwind-tables -mgeneral-regs-only -O3 -S test.c test2.c -v
Using built-in specs.
COLLECT_GCC=gcc-12
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12-20220319-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-OcsLtf/gcc-12-12-20220319/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-OcsLtf/gcc-12-12-20220319/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.1 20220319 (experimental) [master r12-7719-g8ca61ad148f]
(Ubuntu 12-20220319-1ubuntu1)
COLLECT_GCC_OPTIONS='-fno-stack-protector' '-fcf-protection=none'
'-fno-asynchronous-unwind-tables' '-mgeneral-regs-only' '-O3' '-S' '-v'
'-mtune=generic' '-march=x86-64'
/usr/lib/gcc/x86_64-linux-gnu/12/cc1 -quiet -v -imultiarch x86_64-linux-gnu
test.c -quiet -dumpbase test.c -dumpbase-ext .c -mgeneral-regs-only
-mtune=generic -march=x86-64 -O3 -version -fno-stack-protector
-fcf-protection=none -fno-asynchronous-unwind-tables -o test.s -Wformat
-Wformat-security -fstack-clash-protection
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/12/include
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 200a3dd46f0674d1a8fcf2b133bc6014
COLLECT_GCC_OPTIONS='-fno-stack-protector' '-fcf-protection=none'
'-fno-asynchronous-unwind-tables' '-mgeneral-regs-only' '-O3' '-S' '-v'
'-mtune=generic' '-march=x86-64'
/usr/lib/gcc/x86_64-linux-gnu/12/cc1 -quiet -v -imultiarch x86_64-linux-gnu
test2.c -quiet -dumpbase test2.c -dumpbase-ext .c -mgeneral-regs-only
-mtune=generic -march=x86-64 -O3 -version -fno-stack-protector
-fcf-protection=none -fno-asynchronous-unwind-tables -o test2.s -Wformat
-Wformat-security -fstack-clash-protection
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/12/include
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
GNU C17 (Ubuntu 12-20220319-1ubuntu1) version 12.0.1 20220319 (experimental)
[master r12-7719-g8ca61ad148f] (x86_64-linux-gnu)
compiled by GNU C version 12.0.1 20220319 (experimental) [master
r12-7719-g8ca61ad148f], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.24-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 200a3dd46f0674d1a8fcf2b133bc6014
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-fno-stack-protector' '-fcf-protection=none'
'-fno-asynchronous-unwind-tables' '-mgeneral-regs-only' '-O3' '-S' '-v'
'-mtune=generic' '-march=x86-64'
```
* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
From: rguenth at gcc dot gnu.org @ 2022-04-22 6:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-04-22
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think this is on purpose - on GIMPLE at least. One would need to analyze
what RTL disambiguation does with ASMs (I would guess nothing, just like
GIMPLE).
I agree that it might be more consistent to only consider things clobbered
by a call, but then we need to look at the explicit memory outputs/clobbers.
Is it really important though?
* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
From: pinskia at gcc dot gnu.org @ 2022-04-22 6:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Yes it is greater than a function call as it has to be a barrier to even memory
which does NOT escape the function.
Yes inlining does change it from dealing with escaped memory to even local
ones. I doubt it can be fixed though because you have to mark which memory was
active in the function where it was inlined from and such.
* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
From: pinskia at gcc dot gnu.org @ 2022-04-22 6:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |inline-asm
           Severity|normal                      |enhancement
* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
From: rguenth at gcc dot gnu.org @ 2022-04-22 6:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> Yes it is greater than a function call as it has to be a barrier to even
> memory which does NOT escape the function.
>
> Yes inlining does change it from dealing with escaped memory to even local
> ones. I doubt it can be fixed though because you have to mark which memory
> was active in the function where it was inlined from and such.
The question is whether we allow an asm (""::: "memory") to clobber
arbitrary parts of the stack that the asm somehow magically can associate with
an automatic variable (which the compiler could have promoted to a register!).
Basically it currently prevents optimization across the asm of stack storage
that the asm cannot rely on being stack storage, and thus has to assume is
in registers. The docs say
--
@item "memory"
The @code{"memory"} clobber tells the compiler that the assembly code
performs memory
reads or writes to items other than those listed in the input and output
operands (for example, accessing the memory pointed to by one of the input
parameters). To ensure memory contains correct values, GCC may need to flush
specific register values to memory before executing the @code{asm}. Further,
the compiler does not assume that any values read from memory before an
@code{asm} remain unchanged after that @code{asm}; it reloads them as
needed.
Using the @code{"memory"} clobber effectively forms a read/write
memory barrier for the compiler.
Note that this clobber does not prevent the @emph{processor} from doing
speculative reads past the @code{asm} statement. To prevent that, you need
processor-specific fence instructions.
--
I think clarifying that it does not protect 'memory' that can be elided
by the compiler wouldn't make it behave differently in practice.
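If an asm really needs to observe a particular automatic variable, the variable can be named as an operand instead of relying on the "memory" clobber. A minimal sketch, with hypothetical helper names `keep_in_register`, `keep_in_memory` and `check_round_trip`, and with empty asm templates so that it does not depend on the target:

```c
#include <assert.h>

/* "+r": the asm reads and may modify v, which is held in a register. */
static inline long keep_in_register(long v)
{
    __asm__ volatile("" : "+r"(v));
    return v;
}

/* "+m": the asm reads and may modify v in memory, so the compiler must
   give v a memory location at this point. */
static inline long keep_in_memory(long v)
{
    __asm__ volatile("" : "+m"(v));
    return v;
}

/* The templates are empty, so both helpers hand the value back unchanged. */
void check_round_trip(long v)
{
    assert(keep_in_register(v) == v);
    assert(keep_in_memory(v) == v);
}
```

Either form tells the compiler exactly which object the asm touches, so it does not have to guess where the variable lives.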
* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
From: 570070308 at qq dot com @ 2022-04-22 6:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
--- Comment #4 from 。 <570070308 at qq dot com> ---
(In reply to Richard Biener from comment #1)
> Is it really important though?
The doc says that "The asm statement allows you to include assembly
instructions directly within C code. This may help you to maximize performance
in time-sensitive code or to access assembly instructions that are not readily
available to C programs.", and I use extended asm, rather than writing a whole
function in assembly, precisely to maximize performance.
I try my best to avoid the "memory" clobber, but in some cases I have to use
it, for example when the asm operates on a list structure like the `struct
list_head` in Linux. It is impossible to list all the list elements in the
input/output operands.
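The situation described here can be sketched as follows. This is an illustration only: the struct is a local declaration modeled on Linux's `struct list_head`, `list_asm_touch` and `list_count` are hypothetical names, and the empty template stands in for real instructions that would walk the list.

```c
#include <assert.h>
#include <stddef.h>

/* Local declaration modeled on Linux's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Only the head pointer can be passed as an operand; the nodes reachable
   through it cannot be enumerated, hence the "memory" clobber. */
static inline void list_asm_touch(struct list_head *head)
{
    __asm__ volatile("" : : "r"(head) : "memory");
}

/* Counts the entries of a circular list after the asm has run. */
size_t list_count(struct list_head *head)
{
    size_t n = 0;

    assert(head != NULL);
    list_asm_touch(head);
    for (struct list_head *p = head->next; p != head; p = p->next)
        ++n;
    return n;
}
```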
* [Bug middle-end/105342] [Extended Asm]Memory barrier geater than a function call
From: rguenther at suse dot de @ 2022-04-22 7:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 22 Apr 2022, 570070308 at qq dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105342
>
> --- Comment #4 from 。 <570070308 at qq dot com> ---
> (In reply to Richard Biener from comment #1)
> > Is it really important though?
>
> The doc says that "The asm statement allows you to include assembly
> instructions directly within C code. This may help you to maximize performance
> in time-sensitive code or to access assembly instructions that are not readily
> available to C programs.", and I use extended-asm rather than writing a whole
> function with assembly just for maximize performance.
>
> I try my best for not using "memory" clobber, but in some cases I have to use
> it, for example, using the asm to operate a list structure like the `struct
> list_head` in Linux. It is impossible to list all the list elements in the
> Input/OutputOperands.
Indeed. I think there are duplicate bug reports about the lack of a clobber
specifying "memory reachable by pointer X"; the clobber syntax
is somewhat limited here.
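For a contiguous region of known size, the GCC documentation for extended asm describes casting the pointer to an array type and passing the result as a memory operand, which is narrower than a "memory" clobber. A minimal sketch of that idiom follows; `BUF_LEN`, `touch_buffer` and `sum_buffer` are hypothetical names, and the empty template stands in for real instructions. It does not cover the linked-list case, where the reachable nodes do not form one contiguous range.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LEN 16

/* Declares that the asm may read and write exactly BUF_LEN bytes at buf. */
static inline void touch_buffer(unsigned char *buf)
{
    __asm__ volatile("" : "+m"(*(unsigned char (*)[BUF_LEN])buf));
}

/* Sums the buffer after the asm has (nominally) accessed it. */
unsigned sum_buffer(unsigned char *buf)
{
    unsigned total = 0;

    assert(buf != NULL);
    touch_buffer(buf);
    for (size_t i = 0; i < BUF_LEN; ++i)
        total += buf[i];
    return total;
}
```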