* [Bug c++/64704] New: software crashed when using vectorizing optimization
From: zhangyajie_koy at 126 dot com @ 2015-01-21  2:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

            Bug ID: 64704
           Summary: software crashed when using vectorizing optimization
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhangyajie_koy at 126 dot com

When the following for() loop executes, the program crashes.
uint16 MessageBuffer::icmp6Checksum(int update)
{
   TRACE_FUNCTION_ENTRY("");
   register uint32 sum = 0xffff;

   struct icmp6_hdr *icmp6Ptr = NULL;
   uint8 type = findPayloadType((void**)&icmp6Ptr);

   register int i;
   uint16 len = getLength();
   register uint16 *ptr = (uint16 *)icmp6Ptr;

   for (i = 0; i < len - 1; i += 2)
   {
        sum += *ptr++;
   }
   return (sum);
}

This code runs correctly when compiled with GCC 4.4.1 on Ubuntu 9.10, but it
crashes when compiled with GCC 4.8.2 on Ubuntu 14.04. I checked the assembly
code of this for() loop: GCC 4.8.2 optimizes it in two ways. First, loop
unrolling (the loop is pre-unrolled 10 times); second, auto-vectorization.
After several tests I found that the code runs correctly when the loop
executes fewer than 10 iterations and crashes when it executes more than 10,
so something must be wrong with the auto-vectorization.
When I modify the makefile to disable auto-vectorization with
-O3 -fno-tree-vectorize, the crash goes away. The assembly code for the for()
loop is shown below.
  for loop unrolling optimization begin:
 13081bc:       45 8d 4d ff             lea    -0x1(%r13),%r9d
 13081c0:       45 85 c9                test   %r9d,%r9d
 13081c3:       0f 8e 9e 02 00 00       jle    1308467
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3a7>
 13081c9:       41 8d 75 fe             lea    -0x2(%r13),%esi
 13081cd:       48 89 da                mov    %rbx,%rdx
 13081d0:       83 e2 0f                and    $0xf,%edx
 13081d3:       d1 ee                   shr    %esi
 13081d5:       48 d1 ea                shr    %rdx
 13081d8:       8d 7e 01                lea    0x1(%rsi),%edi
 13081db:       48 f7 da                neg    %rdx
 13081de:       83 e2 07                and    $0x7,%edx
 13081e1:       39 d7                   cmp    %edx,%edi
 13081e3:       89 f9                   mov    %edi,%ecx
 13081e5:       0f 46 d7                cmovbe %edi,%edx
 13081e8:       83 ff 0a                cmp    $0xa,%edi
 13081eb:       0f 87 0f 02 00 00       ja     1308400
<_ZN13MessageBuffer13icmp6ChecksumEi+0x340>
 13081f1:       44 0f b7 03             movzwl (%rbx),%r8d
 13081f5:       48 8d 53 02             lea    0x2(%rbx),%rdx
 13081f9:       44 01 c0                add    %r8d,%eax
 13081fc:       83 f9 01                cmp    $0x1,%ecx
 13081ff:       0f 86 95 02 00 00       jbe    130849a
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3da>
 1308205:       44 0f b7 43 02          movzwl 0x2(%rbx),%r8d
 130820a:       48 8d 53 04             lea    0x4(%rbx),%rdx
 130820e:       44 01 c0                add    %r8d,%eax
 1308211:       83 f9 02                cmp    $0x2,%ecx
 1308214:       0f 86 75 02 00 00       jbe    130848f
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3cf>
 130821a:       44 0f b7 43 04          movzwl 0x4(%rbx),%r8d
 130821f:       48 8d 53 06             lea    0x6(%rbx),%rdx
 1308223:       44 01 c0                add    %r8d,%eax
 1308226:       83 f9 03                cmp    $0x3,%ecx
 1308229:       0f 86 97 02 00 00       jbe    13084c6
<_ZN13MessageBuffer13icmp6ChecksumEi+0x406>
 130822f:       44 0f b7 43 06          movzwl 0x6(%rbx),%r8d
 1308234:       48 8d 53 08             lea    0x8(%rbx),%rdx
 1308238:       44 01 c0                add    %r8d,%eax
 130823b:       83 f9 04                cmp    $0x4,%ecx
 130823e:       0f 86 77 02 00 00       jbe    13084bb
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3fb>
 1308244:       44 0f b7 43 08          movzwl 0x8(%rbx),%r8d
 1308249:       48 8d 53 0a             lea    0xa(%rbx),%rdx
 130824d:       44 01 c0                add    %r8d,%eax
 1308250:       83 f9 05                cmp    $0x5,%ecx
 1308253:       0f 86 57 02 00 00       jbe    13084b0
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3f0>
 1308259:       44 0f b7 43 0a          movzwl 0xa(%rbx),%r8d
 130825e:       48 8d 53 0c             lea    0xc(%rbx),%rdx
 1308262:       44 01 c0                add    %r8d,%eax
 1308265:       83 f9 06                cmp    $0x6,%ecx
 1308268:       0f 86 37 02 00 00       jbe    13084a5
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3e5>
 130826e:       44 0f b7 43 0c          movzwl 0xc(%rbx),%r8d
 1308273:       48 8d 53 0e             lea    0xe(%rbx),%rdx
 1308277:       44 01 c0                add    %r8d,%eax
 130827a:       83 f9 07                cmp    $0x7,%ecx
 130827d:       0f 86 f6 01 00 00       jbe    1308479
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3b9>
 1308283:       44 0f b7 43 0e          movzwl 0xe(%rbx),%r8d
 1308288:       48 8d 53 10             lea    0x10(%rbx),%rdx
 130828c:       44 01 c0                add    %r8d,%eax
 130828f:       83 f9 08                cmp    $0x8,%ecx
 1308292:       0f 86 d6 01 00 00       jbe    130846e
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3ae>
 1308298:       44 0f b7 43 10          movzwl 0x10(%rbx),%r8d
 130829d:       48 8d 53 12             lea    0x12(%rbx),%rdx
 13082a1:       44 01 c0                add    %r8d,%eax
 13082a4:       83 f9 09                cmp    $0x9,%ecx
 13082a7:       0f 86 d7 01 00 00       jbe    1308484
<_ZN13MessageBuffer13icmp6ChecksumEi+0x3c4>
 13082ad:       44 0f b7 43 12          movzwl 0x12(%rbx),%r8d
 13082b2:       48 8d 53 14             lea    0x14(%rbx),%rdx
 13082b6:       44 01 c0                add    %r8d,%eax
 13082b9:       41 b8 14 00 00 00       mov    $0x14,%r8d
 13082bf:       39 f9                   cmp    %edi,%ecx
 13082c1:       0f 84 e2 00 00 00       je     13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 for loop unrolling optimization end:

 for loop auto vectorizing optimization begin:
 13082c7:       41 89 fe                mov    %edi,%r14d
 13082ca:       41 89 ca                mov    %ecx,%r10d
 13082cd:       41 29 ce                sub    %ecx,%r14d
 13082d0:       44 89 f5                mov    %r14d,%ebp
 13082d3:       c1 ed 03                shr    $0x3,%ebp
 13082d6:       8d 0c ed 00 00 00 00    lea    0x0(,%rbp,8),%ecx
 13082dd:       85 c9                   test   %ecx,%ecx
 13082df:       74 69                   je     130834a
<_ZN13MessageBuffer13icmp6ChecksumEi+0x28a>
 13082e1:       66 0f ef c0             pxor   %xmm0,%xmm0
 13082e5:       4e 8d 1c 53             lea    (%rbx,%r10,2),%r11
 13082e9:       66 0f ef d2             pxor   %xmm2,%xmm2
 13082ed:       45 31 d2                xor    %r10d,%r10d
 13082f0:       66 41 0f 6f 0b          movdqa (%r11),%xmm1
 13082f5:       41 83 c2 01             add    $0x1,%r10d
 13082f9:       49 83 c3 10             add    $0x10,%r11
 13082fd:       44 39 d5                cmp    %r10d,%ebp
 1308300:       66 0f 6f e1             movdqa %xmm1,%xmm4
 1308304:       66 0f 69 ca             punpckhwd %xmm2,%xmm1
 1308308:       66 0f 61 e2             punpcklwd %xmm2,%xmm4
 130830c:       66 0f fe c4             paddd  %xmm4,%xmm0
 1308310:       66 0f fe c1             paddd  %xmm1,%xmm0
 1308314:       77 da                   ja     13082f0
<_ZN13MessageBuffer13icmp6ChecksumEi+0x230>
 1308316:       66 0f 6f e8             movdqa %xmm0,%xmm5
 130831a:       41 89 ca                mov    %ecx,%r10d
 130831d:       45 8d 04 48             lea    (%r8,%rcx,2),%r8d
 1308321:       4a 8d 14 52             lea    (%rdx,%r10,2),%rdx
 1308325:       66 0f 73 dd 08          psrldq $0x8,%xmm5
 130832a:       66 0f fe c5             paddd  %xmm5,%xmm0
 130832e:       66 0f 6f f0             movdqa %xmm0,%xmm6
 1308332:       66 0f 73 de 04          psrldq $0x4,%xmm6
 1308337:       66 0f fe c6             paddd  %xmm6,%xmm0
 130833b:       66 0f 7e 44 24 0c       movd   %xmm0,0xc(%rsp)
 1308341:       03 44 24 0c             add    0xc(%rsp),%eax
 1308345:       41 39 ce                cmp    %ecx,%r14d
 1308348:       74 5f                   je     13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 130834a:       0f b7 0a                movzwl (%rdx),%ecx
 130834d:       01 c8                   add    %ecx,%eax
 130834f:       41 8d 48 02             lea    0x2(%r8),%ecx
 1308353:       44 39 c9                cmp    %r9d,%ecx
 1308356:       7d 51                   jge    13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 1308358:       0f b7 4a 02             movzwl 0x2(%rdx),%ecx
 130835c:       01 c8                   add    %ecx,%eax
 130835e:       41 8d 48 04             lea    0x4(%r8),%ecx
 1308362:       41 39 c9                cmp    %ecx,%r9d
 1308365:       7e 42                   jle    13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 1308367:       0f b7 4a 04             movzwl 0x4(%rdx),%ecx
 130836b:       01 c8                   add    %ecx,%eax
 130836d:       41 8d 48 06             lea    0x6(%r8),%ecx
 1308371:       41 39 c9                cmp    %ecx,%r9d
 1308374:       7e 33                   jle    13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 1308376:       0f b7 4a 06             movzwl 0x6(%rdx),%ecx
 130837a:       01 c8                   add    %ecx,%eax
 130837c:       41 8d 48 08             lea    0x8(%r8),%ecx
 1308380:       41 39 c9                cmp    %ecx,%r9d
 1308383:       7e 24                   jle    13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 1308385:       0f b7 4a 08             movzwl 0x8(%rdx),%ecx
 1308389:       01 c8                   add    %ecx,%eax
 130838b:       41 8d 48 0a             lea    0xa(%r8),%ecx
 130838f:       41 39 c9                cmp    %ecx,%r9d
 1308392:       7e 15                   jle    13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 1308394:       0f b7 4a 0a             movzwl 0xa(%rdx),%ecx
 1308398:       41 83 c0 0c             add    $0xc,%r8d
 130839c:       01 c8                   add    %ecx,%eax
 130839e:       45 39 c1                cmp    %r8d,%r9d
 13083a1:       7e 06                   jle    13083a9
<_ZN13MessageBuffer13icmp6ChecksumEi+0x2e9>
 13083a3:       0f b7 52 0c             movzwl 0xc(%rdx),%edx
 13083a7:       01 d0                   add    %edx,%eax
 13083a9:       48 8d 5c 73 02          lea    0x2(%rbx,%rsi,2),%rbx
 13083ae:       01 ff                   add    %edi,%edi
 for loop auto vectorizing optimization end:

Our CPU is:
Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
The machine has 2 CPUs with 8 cores each.

* [Bug c++/64704] software crashed when using vectorizing optimization
From: zhangyajie_koy at 126 dot com @ 2015-01-21  2:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

kathy <zhangyajie_koy at 126 dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major

--- Comment #1 from kathy <zhangyajie_koy at 126 dot com> ---
Is this a GCC bug? I look forward to your response.


* [Bug c++/64704] software crashed when using vectorizing optimization
From: pinskia at gcc dot gnu.org @ 2015-01-21  2:44 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Most likely because icmp6Ptr is not aligned to 16 bits like you say it is by
doing:
   register uint16 *ptr = (uint16 *)icmp6Ptr;


* [Bug c++/64704] software crashed when using vectorizing optimization
From: zhangyajie_koy at 126 dot com @ 2015-01-21  3:03 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

--- Comment #3 from kathy <zhangyajie_koy at 126 dot com> ---
(In reply to Andrew Pinski from comment #2)
> Most likely because icmp6Ptr is not aligned to 16 bits like you say it is by
> doing:
>    register uint16 *ptr = (uint16 *)icmp6Ptr;

I do not fully understand the assembly code, but it seems that if the pointer
is not aligned to 16 bits, the compiler does something to align ptr to 16 and
then executes the vectorized loop.
I think the code from address 13082c7 to 1308348 is doing something related to
alignment?


* [Bug c++/64704] software crashed when using vectorizing optimization
From: jakub at gcc dot gnu.org @ 2015-01-21  8:40 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
If the pointer is not 16-bit aligned and you dereference it, you invoke
undefined behavior and anything can happen.
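
A minimal sketch of how the alignment assumption described above could be checked at run time before the cast is made. The names is_aligned and as_u16_checked are hypothetical and do not appear in the reporter's code; the check converts the pointer to std::uintptr_t, which behaves as expected on mainstream platforms such as x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the address in `p` is a multiple of `alignment`
// (`alignment` must be a power of two).
inline bool is_aligned(const void *p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: hand out a uint16_t pointer only after checking that
// the payload address really is 2-byte aligned, instead of casting blindly.
inline const std::uint16_t *as_u16_checked(const void *payload)
{
    assert(is_aligned(payload, alignof(std::uint16_t)) &&
           "payload is not 2-byte aligned; reading it as uint16_t is undefined");
    return static_cast<const std::uint16_t *>(payload);
}
```

In a build without NDEBUG, calling as_u16_checked on the pointer returned by findPayloadType would abort with a clear message if the payload starts at an odd address, rather than faulting later inside the vectorized loop.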


* [Bug c++/64704] software crashed when using vectorizing optimization
From: maltsevm at gmail dot com @ 2015-01-21 20:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

Mikhail Maltsev <maltsevm at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maltsevm at gmail dot com

--- Comment #5 from Mikhail Maltsev <maltsevm at gmail dot com> ---
(In reply to kathy from comment #3)
> i think, form line:13082c7 to 1308348, the code is to doing something with
> align?
Yes
13082f0:       66 41 0f 6f 0b          movdqa (%r11),%xmm1
The address in %r11 is expected to be aligned to a 16-byte boundary.
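
To illustrate why the scalar iterations that run before the vector loop do not help when the pointer is odd, the and/shr/neg/and sequence at 13081d0..13081de in the report can be modelled in C++ roughly as below. This is a reading of the posted disassembly, not GCC's source; peel_count and vector_start are hypothetical names, and the clamp to the total iteration count (the cmovbe at 13081e5) is left out.

```cpp
#include <cstdint>

// Number of leading uint16_t elements handled one at a time before the vector
// loop starts. The formula assumes the address is already a multiple of 2:
// it halves the byte offset within the 16-byte line and negates it modulo 8.
constexpr std::uint32_t peel_count(std::uintptr_t addr)
{
    return static_cast<std::uint32_t>(0 - ((addr & 0xf) >> 1)) & 0x7;
}

// Address from which the vector loop would issue its first 16-byte load.
constexpr std::uintptr_t vector_start(std::uintptr_t addr)
{
    return addr + 2 * static_cast<std::uintptr_t>(peel_count(addr));
}
```

For a 2-byte-aligned address such as 0x1002 the model gives 7 peeled elements and a vector start of 0x1010. For an odd address such as 0x1001 it gives 0 peeled elements and a vector start of 0x1001, so movdqa (which, unlike movdqu, faults on an address that is not a multiple of 16) is executed on a misaligned address.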


* [Bug c++/64704] software crashed when using vectorizing optimization
From: zhangyajie_koy at 126 dot com @ 2015-01-23  1:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

--- Comment #6 from kathy <zhangyajie_koy at 126 dot com> ---
(In reply to Mikhail Maltsev from comment #5)
> (In reply to kathy from comment #3)
> > i think, form line:13082c7 to 1308348, the code is to doing something with
> > align?
> Yes
> 13082f0:       66 41 0f 6f 0b          movdqa (%r11),%xmm1
> Address in %r11 is expected to be aligned by 16-byte boundary.
What can I do to make ptr 16-byte aligned?


* [Bug c++/64704] software crashed when using vectorizing optimization
From: zhangyajie_koy at 126 dot com @ 2015-01-23  1:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

--- Comment #7 from kathy <zhangyajie_koy at 126 dot com> ---
(In reply to Mikhail Maltsev from comment #5)
> (In reply to kathy from comment #3)
> > i think, form line:13082c7 to 1308348, the code is to doing something with
> > align?
> Yes
> 13082f0:       66 41 0f 6f 0b          movdqa (%r11),%xmm1
> Address in %r11 is expected to be aligned by 16-byte boundary.

What can I do to make ptr 16-byte aligned?


* [Bug c++/64704] software crashed when using vectorizing optimization
From: zhangyajie_koy at 126 dot com @ 2015-01-23  7:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

--- Comment #8 from kathy <zhangyajie_koy at 126 dot com> ---
(In reply to Mikhail Maltsev from comment #5)
> (In reply to kathy from comment #3)
> > i think, form line:13082c7 to 1308348, the code is to doing something with
> > align?
> Yes
> 13082f0:       66 41 0f 6f 0b          movdqa (%r11),%xmm1
> Address in %r11 is expected to be aligned by 16-byte boundary.

I have heard that 16-byte alignment is not necessary on x86.


* [Bug c++/64704] software crashed when using vectorizing optimization
From: maltsevm at gmail dot com @ 2015-01-24  3:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

--- Comment #9 from Mikhail Maltsev <maltsevm at gmail dot com> ---
>what can i do to make the ptr aligned by 16-byte.
Well, you may skip the first few bytes (of course not just discard them, but
process them one by one).
Fortunately, you don't need to do it manually; the compiler can do it.
The problem is that when you use a pointer to uint16, GCC assumes that it is
already aligned to a 2-byte boundary (if that is not true, the behavior is
undefined). Consider this program:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <linux/icmpv6.h>

typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;

uint8 buf[1024] = { 0xFF, 0x01, 0x00, 0x02, 0x00 };

class MessageBuffer
{
public:
            MessageBuffer(uint8 *data, uint16 len) :
                            data_(data), len_(len) { }
    uint16  getLength() { return len_ - 1; }
    uint16  __attribute__((noinline)) icmp6Checksum_ub (int update);
    uint16  __attribute__((noinline)) icmp6Checksum_naive (int update);
    uint8   __attribute__((noinline)) findPayloadType (void **payloadStart)
    {
        uint8 *p;
        /* Compute p = data_ + 1 in inline asm so that GCC cannot use this
           information during tree optimization. %0 is the output (p) and
           %1 is the input (data_). */
        asm volatile ("leaq 1(%1), %0" : "=r"(p) : "r"(data_) : );
        *payloadStart = p;
        return ICMPV6_ECHO_REQUEST;
    }
private:
    uint8 *data_;
    uint16 len_;
};

uint16 MessageBuffer::icmp6Checksum_ub(int)
{
    register uint32 sum = 0xffff;

    struct icmp6_hdr *icmp6Ptr = NULL;
    uint8 type = findPayloadType((void**)&icmp6Ptr);
    (void)type; /* inhibit warning */
    register int i;
    uint16 len = getLength();
    register uint16 *ptr = (uint16 *)icmp6Ptr;
    for (i = 0; i < len - 1; i += 2) {
        sum += *ptr++;
    }
    return (sum);
}

uint16 MessageBuffer::icmp6Checksum_naive(int)
{
    register uint32 sum = 0xffff;

    uint8 *data;
    findPayloadType((void**)&data);
    uint16 len = getLength();
    for (int i = 0; i < len - 1; i += 2) {
        sum += data[i] | (data[i + 1] << 8);
    }
    return (sum);
}

int main()
{
    MessageBuffer buffer(buf, 1000);
    printf("0x%.4x\n", buffer.icmp6Checksum_naive(0));
    printf("0x%.4x\n", buffer.icmp6Checksum_ub(0));
}

icmp6Checksum_naive calculates the checksum (I hope at least) and
icmp6Checksum_ub causes segfault (I tried on g++ -O3 -funroll-loops -msse2, GCC
4.8.2).

>i heard of that it is not necesary to aligned by 16-byte in x86
Maybe you are confusing movdqa with movdqu (or some other instruction)?

Here is a universal implementation from the Linux kernel (there are also
platform-specific versions):
http://lxr.free-electrons.com/source/lib/checksum.c
Notice that the case when the address is odd is handled separately (especially
in platform-specific code).
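
As one possible alternative to reading through a uint16 pointer, the summation loop can be sketched with std::memcpy so that it accepts a payload at any address. This is not the kernel's approach and not the reporter's code: sum_words_unaligned is a hypothetical name, and unlike icmp6Checksum it returns the 32-bit running sum without truncating it to 16 bits. Words are read in host byte order, as in the original `sum += *ptr++` loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Adds up consecutive 16-bit words starting at `data`, which may have any
// alignment. Each word is copied out with std::memcpy instead of being read
// through a uint16_t*, so the compiler is never told the address is aligned.
// A trailing odd byte is ignored, matching the `i < len - 1` loop bound.
inline std::uint32_t sum_words_unaligned(const unsigned char *data,
                                         std::size_t len,
                                         std::uint32_t sum = 0xffff)
{
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        std::uint16_t word;
        std::memcpy(&word, data + i, sizeof word);
        sum += word;
    }
    return sum;
}
```

Compilers such as GCC typically turn a fixed-size memcpy like this into a plain load, and if the loop is vectorized the compiler has to choose instructions that tolerate unaligned addresses, because no alignment was promised.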


* [Bug c++/64704] software crashed when using vectorizing optimization
From: mpolacek at gcc dot gnu.org @ 2015-04-08  9:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64704

Marek Polacek <mpolacek at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |mpolacek at gcc dot gnu.org
         Resolution|---                         |INVALID

--- Comment #10 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
Doesn't look like a GCC bug.

