public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/113468] New: copy of large struct violates semantics of 'volatile'
@ 2024-01-18  5:52 gnu at kosak dot com
  2024-01-18  6:02 ` [Bug target/113468] " pinskia at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: gnu at kosak dot com @ 2024-01-18  5:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113468

            Bug ID: 113468
           Summary: copy of large struct violates semantics of 'volatile'
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gnu at kosak dot com
  Target Milestone: ---

Created attachment 57132
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57132&action=edit
the .ii file from --save-temps

Hello,

I'm not 100% certain, but I'm pretty sure that the assembly generated for this
code violates the semantics for 'volatile'. The reason I think so is that it
reads and writes the same memory locations more than once, and also reads and
writes them out of order. I believe that both of these actions violate the
rules for 'volatile'.

```
struct Foo {
    volatile int values[256];
};

void copy(const Foo *src, Foo *dest) {
    *dest = *src;
}

```

Relevant assembly on x86_64, invoked with g++ -S -O3 test.cc

```
_Z4copyPK3FooPS_:
.LFB0:
        .cfi_startproc
        endbr64
        movq    (%rdi), %rdx
        movq    %rdi, %rax
        movq    %rsi, %rcx
        movq    %rdx, (%rsi)
        movq    1016(%rdi), %rdx
        leaq    8(%rsi), %rdi
        andq    $-8, %rdi
        subq    %rdi, %rcx
        movq    %rdx, 1016(%rsi)
        subq    %rcx, %rax
        addl    $1024, %ecx
        movq    %rax, %rsi
        shrl    $3, %ecx
        rep movsq
        ret
        .cfi_endproc
```

Analysis:

The compiler wants to use "rep movsq", but Foo is only aligned to 4 bytes, so
the generated code takes extra pains to make sure that the destination register
rdi is aligned to an 8 byte boundary so that "rep movsq" can be efficient.

The trick used is to first move the first and last quadwords (regardless of
alignment), and then do a "rep movsq" of the interior, but with care taken so
that the destination is aligned to an 8-byte boundary. There is a 2x2 matrix of
cases here: src will be aligned to either 4 or 8; likewise dest will be aligned
to 4 or 8. To illustrate the problem, consider when src and dest are aligned to
4 but not 8. For the sake of the example, assume that src is 0x50000004 and
dest is 0x60000004

First, the code moves a quadword from [0x50000004] to [0x60000004]. Call this
[FIRST STEP] because we reference it below.

Then it moves a quadword from [0x50000004+1016] to [0x60000004+1016]. To my
mind this already is a problem because it violates the ordering assumptions of
'volatile'. But that's not the only problem.

Finally the code does a bit of arithmetic and masking, ultimately ending up
with rsi = 0x50000008 and rdi = 0x60000008 and then it does the "rep movsq".

My main objection is that this is a partially overlapping re-read (and
re-write) of one-half of the same quadword that was copied in [FIRST STEP],
which I believe violates the rules for volatile. If Foo were a bank of hardware
registers this might be an unwelcome access pattern.

Similar violations happen for the other cases in the 2x2 matrix.

Apologies if I'm wrong about the above. I'm certainly not a 'volatile' expert
but this strikes me as rather hinky.

I would stress that I have no objection to the code for the non-volatile case.

Output from g++ -v:
```
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.2.0-4ubuntu3'
--with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Ubuntu 13.2.0-4ubuntu3)
```

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [Bug target/113468] copy of large struct violates semantics of 'volatile'
  2024-01-18  5:52 [Bug target/113468] New: copy of large struct violates semantics of 'volatile' gnu at kosak dot com
@ 2024-01-18  6:02 ` pinskia at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-18  6:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113468

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 47409.

*** This bug has been marked as a duplicate of bug 47409 ***

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-01-18  6:02 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-18  5:52 [Bug target/113468] New: copy of large struct violates semantics of 'volatile' gnu at kosak dot com
2024-01-18  6:02 ` [Bug target/113468] " pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).