From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
    id 7349E3858D20; Fri, 28 Oct 2022 19:07:39 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7349E3858D20
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
    s=default; t=1666984059;
    bh=QFleh8YG3Ue2ccE05a5VvJEUhLajEuf+j7BnzeHKX5g=;
    h=From:To:Subject:Date:From;
    b=m29qu8X6YnhJGG+WvpvKe50DnOT63yKF+PtWeu0oeQoBw9JmxHn7XxDylaRdJ9cvj
     434BktQQK6JvfNXRKKzmAAX2BK3ac/B2m7GT4ohOjGt0zmvY3Kz4HZ256MJJvaf1Oj
     uxLtwgAFvkkYI/MgWczJJ2OiaWPgxnyjtfm2FHw4=
From: "bartoldeman at users dot sourceforge.net"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107451] New: Segmentation fault with vectorized code.
Date: Fri, 28 Oct 2022 19:07:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.3.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bartoldeman at users dot sourceforge.net
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone attachments.created
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Bug ID: 107451
Summary: Segmentation fault with vectorized code.
Product: gcc
Version: 11.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: bartoldeman at users dot sourceforge.net
Target Milestone: ---

Created attachment 53785
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case

The following code:

double dot(int n, const double *x, int inc_x, const double *y)
{
        int i, ix;
        double dot[4] = { 0.0, 0.0, 0.0, 0.0 } ;

        ix=0;
        for(i = 0; i < n; i++) {
                dot[0] += x[ix]   * y[ix] ;
                dot[1] += x[ix+1] * y[ix+1] ;
                dot[2] += x[ix]   * y[ix+1] ;
                dot[3] += x[ix+1] * y[ix] ;
                ix += inc_x ;
        }

        return dot[0] + dot[1] + dot[2] + dot[3];
}

int main(void)
{
        double x = 0, y = 0;
        return dot(1, &x, 4096*4096, &y);
}

crashes with (on Linux x86-64)

$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault

for GCC 11.3.0 and also the current prerelease (gcc version 11.3.1 20221021),
and also when patched with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212.

The loop code assembly is as follows:

  18:   c5 f9 10 1e             vmovupd (%rsi),%xmm3
  1c:   c5 f9 10 21             vmovupd (%rcx),%xmm4
  20:   ff c2                   inc    %edx
  22:   c4 e3 65 18 0c 06 01    vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
  29:   c4 e3 5d 18 04 01 01    vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
  30:   48 01 c6                add    %rax,%rsi
  33:   48 01 c1                add    %rax,%rcx
  36:   c4 e3 fd 01 c9 11       vpermpd $0x11,%ymm1,%ymm1
  3c:   c4 e3 fd 01 c0 14       vpermpd $0x14,%ymm0,%ymm0
  42:   c4 e2 f5 b8 d0          vfmadd231pd %ymm0,%ymm1,%ymm2
  47:   39 fa                   cmp    %edi,%edx
  49:   75 cd                   jne    18

What happens here is that the vinsertf128 instructions load the elements
belonging to the next loop iteration and put them in the high halves of ymm0
and ymm1. The vpermpd instructions then throw away those high halves again, so
e.g. they turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively.
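
As a cross-check of the lane selection described above, here is a minimal
sketch in plain C (no intrinsics) that emulates how a vpermpd immediate picks
source lanes. The helper names permute4x64 and ignores_high_half are made up
for illustration; they are not part of GCC or of any library.

```c
#include <stdint.h>

/* Emulates the lane selection of AVX2 "vpermpd $imm8, src, dst":
 * destination lane i receives source lane (imm8 >> (2*i)) & 3.
 * (Hypothetical helper, for illustration only.) */
static void permute4x64(const double src[4], uint8_t imm8, double dst[4])
{
        for (int i = 0; i < 4; i++)
                dst[i] = src[(imm8 >> (2 * i)) & 3];
}

/* Returns 1 if the immediate never selects source lanes 2 or 3, i.e. the
 * high 128 bits of the source register do not influence the result.
 * (Hypothetical helper, for illustration only.) */
static int ignores_high_half(uint8_t imm8)
{
        for (int i = 0; i < 4; i++)
                if (((imm8 >> (2 * i)) & 3) >= 2)
                        return 0;
        return 1;
}
```

Calling permute4x64 on the lanes 1,2,3,4 with the immediates 0x11 and 0x14
from the listing gives 2,1,2,1 and 1,2,2,1, and ignores_high_half returns 1
for both, so neither permute reads lanes 2 or 3.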
So the result is correct, but the superfluous vinsertf128 instructions access
memory potentially past the end of x or y and can thus produce a segfault.

Related issue (coming from OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387

This may also be related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(that comment shows very similar code, but for GCC 12, which vectorizes by
default at -O2; OpenBLAS worked around this by disabling the tree vectorizer
there, but only on Mac OS and Windows).
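
To put a number on how far past the data the extra loads reach, here is a
rough sketch. It assumes n >= 1, inc_x > 0, and the load pattern seen in the
assembly above (low 128 bits from x + ix, high 128 bits from x + ix + inc_x).
The function names are hypothetical and only serve this illustration.

```c
#include <assert.h>

/* Highest element index the C source reads: x[(n-1)*inc_x + 1]. */
static long long scalar_last_index(int n, int inc_x)
{
        assert(n >= 1 && inc_x > 0);    /* assumption of this sketch */
        return (long long)(n - 1) * inc_x + 1;
}

/* Highest element index the vectorized loop reads, assuming the last
 * iteration also loads x[ix + inc_x] and x[ix + inc_x + 1] into the
 * high half of the ymm register. */
static long long vector_last_index(int n, int inc_x)
{
        assert(n >= 1 && inc_x > 0);    /* assumption of this sketch */
        return (long long)n * inc_x + 1;
}

/* Distance in bytes between the last element the source needs and the
 * last element the vectorized loop touches. */
static long long overread_bytes(int n, int inc_x)
{
        return (vector_last_index(n, inc_x) - scalar_last_index(n, inc_x))
               * (long long)sizeof(double);
}
```

For the test case (n = 1, inc_x = 4096*4096) and 8-byte doubles,
overread_bytes gives 134217728, i.e. the high-half loads land 128 MiB beyond
the last element the C code reads, which is far enough to hit unmapped memory.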