From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 14883 invoked by alias); 23 Jan 2015 13:36:24 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 14782 invoked by uid 48); 23 Jan 2015 13:35:58 -0000
From: "burnus at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/64748] New: OpenACC: "is not a variable" error with deviceptr()
Date: Fri, 23 Jan 2015 13:36:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: openacc, rejects-valid
X-Bugzilla-Severity: normal
X-Bugzilla-Who: burnus at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status keywords bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg02556.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64748

            Bug ID: 64748
           Summary: OpenACC: "is not a variable" error with deviceptr()
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: openacc, rejects-valid
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org

From https://github.com/jefflarkin/openacc-interoperability/

The following program seems to compile with PGI's and Cray's compilers.
With GCC, it fails with an odd error:

foo.c:3:30: error: ‘arr’ is not a variable
 #pragma acc kernels deviceptr(arr)
                              ^

void set(int n, float val, float * restrict arr)
{
#pragma acc kernels deviceptr(arr)
  {
    for(int i=0; i

>From gcc-bugs-return-474563-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Fri Jan 23 13:40:13 2015
Return-Path:
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 17562 invoked by alias); 23 Jan 2015 13:40:08 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 17411 invoked by uid 48); 23 Jan 2015 13:39:47 -0000
From: "adrien at guinet dot me"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/64749] New: "truncating" instructions generated instead of a load one using SSE & AVX2 intrinsics
Date: Fri, 23 Jan 2015 13:40:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.8.4
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: adrien at guinet dot me
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter attachments.created
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg02557.txt.bz2
Content-length: 2410

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64749

            Bug ID: 64749
           Summary: "truncating" instructions generated instead of a load
                    one using SSE & AVX2 intrinsics
           Product: gcc
           Version: 4.8.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adrien at guinet dot me

Created attachment 34553
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34553&action=edit
test case

The attached code compiles and runs correctly (that is, the program's output
is the expected one) with GCC 4.9. When compiled with GCC 4.8, the output is
different and incorrect.

With GCC 4.8, some kind of truncation is introduced at the beginning of the
loop (in f2). Here is the relevant assembly code (output of GCC 4.8):

xor eax, eax
mov rbp, rsp
and rsp, 0FFFFFFFFFFFFFFE0h
vbroadcastss ymm3, xmm6
add rsp, 10h
nop dword ptr [rax]
loc_400970:
vpmovzxwd ymm4, xmmword ptr [rdx+rax*4]
vpmovzxwd ymm2, xmmword ptr [rcx+rax*4]
vmovdqa [rsp-8+var_28], ymm4 ; truncation happens here
vmovdqa xmm5, xmmword ptr [rsp-8+var_28]
vpmulld ymm0, ymm4, ymm2
; here it uses xmm5, which therefore does not hold the correct value.
; xmm5 and ymm4 should be set with something like this (as GCC 4.9 does):
; vmovdqa xmm5, xmmword ptr [rdx+rax*4]
; vpmovzxwd ymm4, xmm5
vpmulhuw xmm1, xmm5, xmmword ptr [r8+rax*4]
vpmovzxwd ymm1, xmm1
vpmulld ymm1, ymm1, ymm3
vpsubd ymm0, ymm0, ymm1
vmovdqa xmmword ptr [rsi+rax*4], xmm0
add rax, 8
cmp rdi, rax
ja short loc_400970

GCC 4.9 behaves correctly and generates this assembly code:

vbroadcastss ymm3, dword ptr [rbp-14h]
xor eax, eax
nop dword ptr [rax+00h]
loc_4009A8:
vmovdqa xmm0, xmmword ptr [rdx+rax*4] ; 128-bit load
vpmulhuw xmm2, xmm0, xmmword ptr [r8+rax*4] ; correctly uses xmm0
vpmovzxwd ymm2, xmm2 ; 16->32-bit conversion here
vpmulld ymm2, ymm2, ymm3
vpmovzxwd ymm1, xmm0
vpmovzxwd ymm0, xmmword ptr [rcx+rax*4]
vpmulld ymm0, ymm1, ymm0
vpsubd ymm0, ymm0, ymm2
vmovaps xmmword ptr [rsi+rax*4], xmm0
add rax, 8
cmp rdi, rax
ja short loc_4009A8

Thanks for any help with this!

P.S.: sorry, but I didn't manage to produce a shorter test case :/
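
To illustrate why the value in xmm5 differs from the original 128-bit load in
the GCC 4.8 listing, here is a minimal scalar sketch in portable C. It is not
the attached test case and uses no intrinsics. The function names
(widen_u16_to_u32, low128_as_u16, count_mismatched_lanes) are hypothetical, and
the lane layout is assumed to be the x86 little-endian one. It models
vpmovzxwd as a zero-extension of eight 16-bit lanes, then reads the low 128
bits of the widened value back as eight 16-bit lanes, which is what the
store/reload pair through the stack appears to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vpmovzxwd: zero-extend eight 16-bit lanes to 32 bits. */
static void widen_u16_to_u32(const uint16_t in[8], uint32_t out[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = in[i];
}

/* Scalar model of reloading only the low 128 bits of the widened 256-bit
 * value and viewing them as eight 16-bit lanes. Assumption: x86
 * little-endian layout, so 16-bit lane j is half (j % 2) of 32-bit
 * element j / 2. */
static void low128_as_u16(const uint32_t widened[8], uint16_t out[8])
{
    for (size_t j = 0; j < 8; j++)
        out[j] = (uint16_t)(widened[j / 2] >> (16 * (j % 2)));
}

/* Count the 16-bit lanes in which the reloaded value differs from the
 * value a direct 128-bit load of the source would have produced. */
static int count_mismatched_lanes(const uint16_t src[8])
{
    uint32_t widened[8];
    uint16_t reloaded[8];
    int mismatches = 0;

    widen_u16_to_u32(src, widened);
    low128_as_u16(widened, reloaded);
    for (size_t j = 0; j < 8; j++)
        if (reloaded[j] != src[j])
            mismatches++;
    return mismatches;
}
```

Under this model, a source of {1, 2, 3, 4, 5, 6, 7, 8} is reloaded as
{1, 0, 2, 0, 3, 0, 4, 0}: only the first lane survives, which is consistent
with a later 16-bit multiply on xmm5 producing wrong results.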