public inbox for gcc-bugs@sourceware.org
* [Bug c++/109821] New: vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
@ 2023-05-12  2:30 yinyuefengyi at gmail dot com
  2023-05-12  2:56 ` [Bug middle-end/109821] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: yinyuefengyi at gmail dot com @ 2023-05-12  2:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

            Bug ID: 109821
           Summary: vect: Different output with -O2 -ftree-loop-vectorize
                    compared to -O2
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

This test code aims to generate a copy pattern that differs from memcpy or
memmove. It produces different results with -O2 -ftree-loop-vectorize than
with plain -O2. Is this a vectorizer bug, namely a missing check that the gap
op - src is larger than the vector mode size (that is, vectorize only if
op - src > 16)?

copy.cpp:

#include <stdio.h>
#include <cstdint>
#include <stdlib.h>

#define UNALIGNED_LOAD64(_p) (*reinterpret_cast<const uint64_t *>(_p))
#define UNALIGNED_STORE64(_p, _val) (*reinterpret_cast<uint64_t *>(_p) = (_val))

__attribute__((__noinline__))
static void IncrementalCopyFastPath(const char* src, char* op, int len) {
    while (op - src < 8) {
        UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
        len -= op - src;
        op += op - src;
    }
    while (len > 0) {
        UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
        src += 8;
        op += 8;
        len -= 8;
    }
}

int main ()
{
  char src[] = "123456789abcdefghijklmnopqrstu";
  char *op = src+12;
  char * dst = op;
  IncrementalCopyFastPath (src, op, 36);
  int i = 0;
  while (i < 36)
    {printf("%x ", *(dst+i)), i++;}
  printf("\n");
  return 0;
}


$ gcc copy.cpp -O2 -o a.out.good
$ ./a.out.good
30 31 32 33 34 35 36 37 38 39 61 62 30 31 32 33 34 35 36 37 38 39 61 62 30 31
32 33 34 35 36 37 38 39 61 62
$ gcc copy.cpp -O2 -ftree-loop-vectorize  -o a.out.bad
$ ./a.out.bad
30 31 32 33 34 35 36 37 38 39 61 62 63 64 65 66 34 35 36 37 38 39 61 62 63 64
65 66 73 74 75 76 38 39 61 62


gimple after t.vect:

IncrementalCopyFastPath.constprop (const char * src, char * op)
{
...
  <bb 2> [local count: 118111600]:
  _4 = src_8(D) + 8;
  if (_4 != op_9(D))    // <=  the check should be op_9 > src_8 + 16 here?
    goto <bb 16>; [80.00%]
  else
    goto <bb 10>; [20.00%]
...
}


* [Bug middle-end/109821] vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
From: pinskia at gcc dot gnu.org @ 2023-05-12  2:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Two issues make this undefined. First, the "unaligned" macros still use
aligned types, and GCC assumes the alignment of the pointed-to type when
accessing through the pointer.
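
A common portable way to perform the unaligned access is to go through
memcpy, which places no alignment requirement on the pointer and which
compilers typically lower to a single load or store when optimizing. A minimal
sketch, not taken from the report; LoadU64 and StoreU64 are hypothetical
helper names:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers (names are not from the original report).
// Reads 8 bytes from p, which may have any alignment.
static inline std::uint64_t LoadU64(const void* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);  // byte-wise copy: no alignment assumption
    return v;
}

// Writes the 8 bytes of v to p, which may have any alignment.
static inline void StoreU64(void* p, std::uint64_t v) {
    std::memcpy(p, &v, sizeof v);
}
```

In the test case, UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src)) would then
become StoreU64(op, LoadU64(src)).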


* [Bug middle-end/109821] vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
From: yinyuefengyi at gmail dot com @ 2023-05-12  3:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

--- Comment #2 from Xionghu Luo (luoxhu at gcc dot gnu.org) <yinyuefengyi at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Two issues make this undefined. First, the "unaligned" macros still use
> aligned types, and GCC assumes the alignment of the pointed-to type when
> accessing through the pointer.

Thanks Andrew :), and the second issue is?


* [Bug middle-end/109821] vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
From: pinskia at gcc dot gnu.org @ 2023-05-12  3:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The second issue is a buffer overflow.

With -O0 -fsanitize=undefined -fsanitize=address, we get:

/app/example.cpp:19:9: runtime error: store to misaligned address
0x7f4fad20002c for type 'uint64_t', which requires 8 byte alignment
0x7f4fad20002c: note: pointer points here
  39 61 62 63 64 65 66 67  68 69 6a 6b 6c 6d 6e 6f  70 71 72 73 74 75 00 00  00
00 00 00 00 00 00 00
              ^ 
=================================================================
==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f4fad20003c
at pc 0x0000004013c2 bp 0x7fff80744570 sp 0x7fff80744568
WRITE of size 8 at 0x7f4fad20003c thread T0
    #0 0x4013c1 in IncrementalCopyFastPath /app/example.cpp:19
    #1 0x4015c1 in main /app/example.cpp:32
    #2 0x7f4faf58f082 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId:
1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #3 0x40114d in _start (/app/output.s+0x40114d) (BuildId:
634f65b0ae11c54185ffa1239c341eaa4dd87ee3)

Address 0x7f4fad20003c is located in stack of thread T0 at offset 60 in frame
    #0 0x401456 in main /app/example.cpp:27

  This frame has 1 object(s):
    [32, 63) 'src' (line 29) <== Memory access at offset 60 partially overflows
this variable
HINT: this may be a false positive if your program uses some custom stack
unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /app/example.cpp:20 in
IncrementalCopyFastPath
Shadow bytes around the buggy address:
  0x7f4fad1ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad1ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad1ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad1fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad1fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7f4fad200000: f1 f1 f1 f1 00 00 00[07]f3 f3 f3 f3 00 00 00 00
  0x7f4fad200080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad200100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad200180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad200200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f4fad200280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING
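
The overflowing store can be located by replaying the function's pointer
arithmetic on offsets alone. The sketch below is an illustration, not part of
the report: it assumes op > src, models only the stores (not the loads), and
FirstOverflowingStore is a hypothetical helper name. sizeof(src) in the test
case is 31 (30 characters plus the terminating NUL), which matches the
[32, 63) range in the ASan output.

```cpp
// Replays the pointer arithmetic of IncrementalCopyFastPath using offsets
// relative to the start of the source array; no memory is touched.
// Returns the offset of the first 8-byte store that would extend past
// `size` bytes, or -1 if every store fits. Assumes op > 0 (i.e. op > src).
static long FirstOverflowingStore(long size, long op, int len) {
    long src = 0;
    while (op - src < 8) {              // first loop: widen the gap to >= 8
        if (op + 8 > size) return op;   // this store would overflow
        len -= static_cast<int>(op - src);
        op += op - src;
    }
    while (len > 0) {                   // second loop: 8 bytes per iteration
        if (op + 8 > size) return op;   // this store would overflow
        src += 8;
        op += 8;
        len -= 8;
    }
    return -1;
}
```

With the values from main, FirstOverflowingStore(31, 12, 36) returns 28, that
is, frame offset 32 + 28 = 60, the offset ASan flags. With a 100-byte array
the same call returns -1.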


* [Bug middle-end/109821] vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
From: pinskia at gcc dot gnu.org @ 2023-05-12  3:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Using:
typedef uint64_t ua64_t __attribute__((aligned(1), may_alias));

#define UNALIGNED_LOAD64(_p) (*reinterpret_cast<const ua64_t *>(_p))
#define UNALIGNED_STORE64(_p, _val) (*reinterpret_cast<ua64_t *>(_p) = (_val))


This also fixes the "bug", though not the buffer overflow.
Increasing src to 100 bytes fixes that.
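
Putting both changes together, the following sketch uses the ua64_t typedef
above in the same copy loop and runs it in a 100-byte buffer. IncrementalCopy
and CopyRepeatsWithPeriod12 are hypothetical names, and the attribute syntax
assumes GCC or Clang. Because op - src is 12, a well-defined copy should
repeat the first 12 bytes, which is the period-12 pattern visible in the
plain -O2 output of the original report.

```cpp
#include <cstdint>
#include <cstring>

// 64-bit type with 1-byte alignment that may alias any other type
// (GCC/Clang attribute syntax, as in the comment above).
typedef std::uint64_t ua64_t __attribute__((aligned(1), may_alias));

// Same loop structure as the test case, using the unaligned type.
static void IncrementalCopy(const char* src, char* op, int len) {
    while (op - src < 8) {
        *reinterpret_cast<ua64_t*>(op) = *reinterpret_cast<const ua64_t*>(src);
        len -= static_cast<int>(op - src);
        op += op - src;
    }
    while (len > 0) {
        *reinterpret_cast<ua64_t*>(op) = *reinterpret_cast<const ua64_t*>(src);
        src += 8;
        op += 8;
        len -= 8;
    }
}

// Hypothetical check: runs the copy in a buffer large enough for every
// 8-byte store and reports whether the 36 bytes starting at offset 12
// repeat the first 12 bytes of the buffer.
static bool CopyRepeatsWithPeriod12() {
    char buf[100] = "123456789abcdefghijklmnopqrstu";
    char first12[12];
    std::memcpy(first12, buf, sizeof first12);
    IncrementalCopy(buf, buf + 12, 36);
    for (int i = 0; i < 36; ++i) {
        if (buf[12 + i] != first12[i % 12]) return false;
    }
    return true;
}
```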
