From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id DC7C83858D33; Thu, 18 Jan 2024 05:52:10 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DC7C83858D33 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1705557130; bh=Y9le9CnCErNK3kDm1whf9qegCZHKGLPadTucjqSyNgI=; h=From:To:Subject:Date:From; b=QU2bNaxgk03o8wy0HLh4AvPmop1oPIeJeW1RB+NuunK+8r67I8sOo4eWgfez7cwTD SmVbqg6oShv+MwI995P7CaAjdbu6l0wncob4jJPw31ifau6zU/csPsSFUbrL04q9o1 Cdk/PHa3pybl50jnG6RlZX41ooZR3lPSBFTlOaKQ= From: "gnu at kosak dot com" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/113468] New: copy of large struct violates semantics of 'volatile' Date: Thu, 18 Jan 2024 05:52:10 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 13.2.0 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: gnu at kosak dot com X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone attachments.created Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D113468 Bug ID: 113468 Summary: copy of large struct violates semantics of 'volatile' Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: gnu at kosak dot com Target Milestone: --- Created attachment 57132 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=3D57132&action=3Dedit the .ii file from --save-temps Hello, I'm not 100% certain, but I'm pretty sure that the assembly generated for t= his code violates the semantics for 'volatile'. The reason I think so is that it reads and writes the same memory locations more than once, and also reads a= nd writes them out of order. I believe that both of these actions violate the rules for 'volatile'. ``` struct Foo { volatile int values[256]; }; void copy(const Foo *src, Foo *dest) { *dest =3D *src; } ``` Relevant assembly on x86_64, invoked with g++ -S -O3 test.cc ``` _Z4copyPK3FooPS_: .LFB0: .cfi_startproc endbr64 movq (%rdi), %rdx movq %rdi, %rax movq %rsi, %rcx movq %rdx, (%rsi) movq 1016(%rdi), %rdx leaq 8(%rsi), %rdi andq $-8, %rdi subq %rdi, %rcx movq %rdx, 1016(%rsi) subq %rcx, %rax addl $1024, %ecx movq %rax, %rsi shrl $3, %ecx rep movsq ret .cfi_endproc ``` Analysis: The compiler wants to use "rep movsq", but Foo is only aligned to 4 bytes, = so the generated code takes extra pains to make sure that the destination regi= ster rdi is aligned to an 8 byte boundary so that "rep movsq" can be efficient. The trick used is to first move the first and last quadwords (regardless of alignment), and then do a "rep movsq" of the interior, but with care taken = so that the destination is aligned to an 8-byte boundary. There is a 2x2 matri= x of cases here: src will be aligned to either 4 or 8; likewise dest will be ali= gned to 4 or 8. To illustrate the problem, consider when src and dest are aligne= d to 4 but not 8. For the sake of the example, assume that src is 0x50000004 and dest is 0x60000004 First, the code moves a quadword from [0x50000004] to [0x60000004]. Call th= is [FIRST STEP] because we reference it below. Then it moves a quadword from [0x50000004+1016] to [0x60000004+1016]. To my mind this already is a problem because it violates the ordering assumptions= of 'volatile'. But that's not the only problem. Finally the code does a bit of arithmetic and masking, ultimately ending up with rsi =3D 0x50000008 and rdi =3D 0x60000008 and then it does the "rep mo= vsq". My main objection is that this is a partially overlapping re-read (and re-write) of one-half of the same quadword that was copied in [FIRST STEP], which I believe violates the rules for volatile. If Foo were a bank of hard= ware registers this might be an unwelcome access pattern. Similar violations happen for the other cases in the 2x2 matrix. Apologies if I'm wrong about the above. I'm certainly not a 'volatile' expe= rt but this strikes me as rather hinky. I would stress that I have no objection to the code for the non-volatile ca= se. Output from g++ -v: ``` Using built-in specs. COLLECT_GCC=3Dg++ COLLECT_LTO_WRAPPER=3D/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper OFFLOAD_TARGET_NAMES=3Dnvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=3D1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion=3D'Ubuntu 13.2.0-4ub= untu3' --with-bugurl=3Dfile:///usr/share/doc/gcc-13/README.Bugs --enable-languages=3Dc,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=3D/usr --with-gcc-major-version-only --program-suffix=3D-13 --program-prefix=3Dx86_64-linux-gnu- --enable-shared --enable-linker-build-= id --libexecdir=3D/usr/libexec --without-included-gettext --enable-threads=3Dp= osix --libdir=3D/usr/lib --enable-nls --enable-bootstrap --enable-clocale=3Dgnu --enable-libstdcxx-debug --enable-libstdcxx-time=3Dyes --with-default-libstdcxx-abi=3Dnew --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-= zlib --enable-libphobos-checking=3Drelease --with-target-system-zlib=3Dauto --enable-objc-gc=3Dauto --enable-multiarch --disable-werror --enable-cet --with-arch-32=3Di686 --with-abi=3Dm64 --with-multilib-list=3Dm32,m64,mx32 --enable-multilib --with-tune=3Dgeneric --enable-offload-targets=3Dnvptx-none=3D/build/gcc-13-XYspKM/gcc-13-13.2.0/= debian/tmp-nvptx/usr,amdgcn-amdhsa=3D/build/gcc-13-XYspKM/gcc-13-13.2.0/deb= ian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=3Drelease --build=3Dx86_64-linux-gnu --host=3Dx86_64-linux-gnu --target=3Dx86_64-linu= x-gnu --with-build-config=3Dbootstrap-lto-lean --enable-link-serialization=3D2 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.0 (Ubuntu 13.2.0-4ubuntu3) ```=