From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-474562-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14883 invoked by alias); 23 Jan 2015 13:36:24 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 14782 invoked by uid 48); 23 Jan 2015 13:35:58 -0000
From: "burnus at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/64748] New: OpenACC: "is not a variable" error with deviceptr()
Date: Fri, 23 Jan 2015 13:36:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: openacc, rejects-valid
X-Bugzilla-Severity: normal
X-Bugzilla-Who: burnus at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status keywords bug_severity priority component assigned_to reporter
Message-ID: <bug-64748-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg02556.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64748

            Bug ID: 64748
           Summary: OpenACC: "is not a variable" error with deviceptr()
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: openacc, rejects-valid
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org

Taken from https://github.com/jefflarkin/openacc-interoperability/

The program below seems to compile with PGI's and Cray's compilers. GCC
rejects it with this odd error:

foo.c:3:30: error: ‘arr’ is not a variable
 #pragma acc kernels deviceptr(arr)
                              ^

void set(int n, float val, float * restrict arr)
{
#pragma acc kernels deviceptr(arr)
  {
    for(int i=0; i<n; i++)
    {
      arr[i] = val;
    }
  }
}
>>From gcc-bugs-return-474563-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Fri Jan 23 13:40:13 2015
Return-Path: <gcc-bugs-return-474563-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 17562 invoked by alias); 23 Jan 2015 13:40:08 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 17411 invoked by uid 48); 23 Jan 2015 13:39:47 -0000
From: "adrien at guinet dot me" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/64749] New: "truncating" instructions generated instead of a load one using SSE & AVX2 intrinsics
Date: Fri, 23 Jan 2015 13:40:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.8.4
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: adrien at guinet dot me
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter attachments.created
Message-ID: <bug-64749-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg02557.txt.bz2
Content-length: 2410

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64749

            Bug ID: 64749
           Summary: "truncating" instructions generated instead of a load
                    one using SSE & AVX2 intrinsics
           Product: gcc
           Version: 4.8.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adrien at guinet dot me

Created attachment 34553
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34553&action=edit
test case

The attached code compiles and runs correctly (that is, the program produces
the expected output) with GCC 4.9. When compiled with GCC 4.8, the output is
different and incorrect.

With GCC 4.8, a truncation is introduced at the beginning of the loop (in f2).
Here is the relevant assembly code (output of GCC 4.8):

xor     eax, eax 
mov     rbp, rsp 
and     rsp, 0FFFFFFFFFFFFFFE0h
vbroadcastss ymm3, xmm6
add     rsp, 10h 
nop     dword ptr [rax]

loc_400970:
  vpmovzxwd ymm4, xmmword ptr [rdx+rax*4]
  vpmovzxwd ymm2, xmmword ptr [rcx+rax*4]
  vmovdqa [rsp-8+var_28], ymm4
; the truncation happens here
  vmovdqa xmm5, xmmword ptr [rsp-8+var_28]
  vpmulld ymm0, ymm4, ymm2
; xmm5 is used here, but it does not hold the right value.
; xmm5 and ymm4 should be set with something like this (as GCC 4.9 does):
; vmovdqa xmm5, xmmword ptr [rdx+rax*4]
; vpmovzxwd ymm4, xmm5
  vpmulhuw xmm1, xmm5, xmmword ptr [r8+rax*4]
  vpmovzxwd ymm1, xmm1
  vpmulld ymm1, ymm1, ymm3
  vpsubd  ymm0, ymm0, ymm1
  vmovdqa xmmword ptr [rsi+rax*4], xmm0
  add     rax, 8
  cmp     rdi, rax 
  ja      short loc_400970

GCC 4.9 behaves correctly and generates this assembly code:

vbroadcastss ymm3, dword ptr [rbp-14h]
xor     eax, eax
nop     dword ptr [rax+00h]
loc_4009A8:                             
  vmovdqa xmm0, xmmword ptr [rdx+rax*4] ; 128-bits load
  vpmulhuw xmm2, xmm0, xmmword ptr [r8+rax*4] ; correctly uses xmm0
  vpmovzxwd ymm2, xmm2 ; 16->32 bits conversion here
  vpmulld ymm2, ymm2, ymm3
  vpmovzxwd ymm1, xmm0
  vpmovzxwd ymm0, xmmword ptr [rcx+rax*4]
  vpmulld ymm0, ymm1, ymm0
  vpsubd  ymm0, ymm0, ymm2
  vmovaps xmmword ptr [rsi+rax*4], xmm0
  add     rax, 8
  cmp     rdi, rax
  ja      short loc_4009A8
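
To make the difference between the two listings concrete, here is a minimal
scalar sketch in plain C of what each sequence hands to vpmulhuw. It is not
taken from the attached test case; the helper names widen_u16_to_u32 and
reload_low_half_as_u16 are hypothetical, and the model only assumes the
little-endian lane layout used on x86.

```c
#include <stdint.h>

/* Scalar model of vpmovzxwd: zero-extend eight 16-bit lanes to 32 bits.
   (Hypothetical helper, for illustration only.) */
static void widen_u16_to_u32(const uint16_t in[8], uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = in[i];
}

/* Model of the GCC 4.8 sequence: the widened 256-bit value is spilled to
   the stack, then only its low 128 bits are reloaded and treated as eight
   16-bit lanes. Each 32-bit lane contributes its low half followed by its
   high half, which is always zero after a zero-extension. */
static void reload_low_half_as_u16(const uint32_t wide[8], uint16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = (uint16_t)(wide[i] & 0xFFFFu);
        out[2 * i + 1] = (uint16_t)(wide[i] >> 16);
    }
}
```

With source lanes 1, 2, 3, 4, 5, 6, 7, 8, the reloaded lanes come out as
1, 0, 2, 0, 3, 0, 4, 0, whereas the direct 128-bit load in the GCC 4.9
listing keeps all eight original lanes.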

Thanks for any help with this!

P.S.: sorry, but I didn't manage to produce a shorter test case :/