[Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)


* [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
@ 2014-10-29 15:18 peter.bumbulis at ianywhere dot com
  2014-10-29 18:01 ` [Bug c/63678] " jakub at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: peter.bumbulis at ianywhere dot com @ 2014-10-29 15:18 UTC (permalink / raw)
  To: gcc-bugs


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

            Bug ID: 63678
           Summary: __mm256_blend_epi16 only accepts 8-bit masks (should
                    accept 16-bit)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter.bumbulis at ianywhere dot com

Created attachment 33844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33844&action=edit
.i file for repro.

_mm256_blend_epi16 accepts only an 8-bit mask as its 3rd parameter, not a
16-bit one.  The Intel and Microsoft compilers handle this properly.

$ gcc -c -mavx2 -save-temps foo.c 
foo.c: In function ‘blend’:
foo.c:4:46: error: the last argument must be an 8-bit immediate
  return _mm256_blend_epi16(a, b, 0xabcd);
                                              ^
where foo.c is

#include <immintrin.h>

__m256i blend(__m256i a, __m256i b) {
        return _mm256_blend_epi16(a, b, 0xabcd);
}

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.2-19ubuntu1'
--with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.8 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap
--enable-plugin --with-system-zlib --disable-browser-plugin
--enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
From: "marxin at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/63587] [5 Regression] ICE : tree check: expected var_decl, have result_decl in add_local_variables, at tree-inline.c:4112
Date: Wed, 29 Oct 2014 15:19:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63587

--- Comment #14 from Martin Liška <marxin at gcc dot gnu.org> ---
Author: marxin
Date: Wed Oct 29 15:17:42 2014
New Revision: 216841

URL: https://gcc.gnu.org/viewcvs?rev=216841&root=gcc&view=rev
Log:
PR ipa/63587

    * g++.dg/ipa/pr63587-1.C: New test
    * g++.dg/ipa/pr63587-2.C: New test.

    * cgraphunit.c (cgraph_node::expand_thunk): Only VAR_DECLs are put
    to local declarations.
    * function.c (add_local_decl): Implementation moved from header
    file, assert introduced for tree type.
    * function.h: Likewise.


Added:
    trunk/gcc/testsuite/g++.dg/ipa/pr63587-1.C
    trunk/gcc/testsuite/g++.dg/ipa/pr63587-2.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cgraphunit.c
    trunk/gcc/function.c
    trunk/gcc/function.h
    trunk/gcc/testsuite/ChangeLog
From: "marxin at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/63587] [5 Regression] ICE : tree check: expected var_decl, have result_decl in add_local_variables, at tree-inline.c:4112
Date: Wed, 29 Oct 2014 15:44:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63587

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Martin Liška <marxin at gcc dot gnu.org> ---
Resolved.
From: "belagod at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/63679] New: [4.9 Regression][AArch64] Failure to constant fold.
Date: Wed, 29 Oct 2014 16:40:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63679

            Bug ID: 63679
           Summary: [4.9 Regression][AArch64] Failure to constant fold.
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: belagod at gcc dot gnu.org

When this piece of code is compiled with -O3 -mgeneral-regs-only

int __attribute__ ((noinline))
foo ()
{
  const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int i, sum;

  sum = 0;
  for (i = 0; i < sizeof (a) / sizeof (*a); i++)
    sum += a[i];

  return sum;
}

4.9 gcc generates:

foo:
    sub    sp, sp, #32
    mov    w0, 28
    add    sp, sp, 32
    ret
    .size    foo, .-foo
    .ident    "GCC: (unknown) 4.9.2 20140930 (prerelease)"

5.0 generates:

foo:
    adrp    x0, .LANCHOR0
    sub    sp, sp, #32
    add    x0, x0, :lo12:.LANCHOR0
    ldr    x7, [x0]
    ldr    x6, [x0, 16]
    ldr    x1, [x0, 8]
    sbfx    x5, x7, 32, 32
    ldr    x0, [x0, 24]
    add    w2, w6, w7
    str    x0, [sp, 24]
    mov    x4, x1
    str    x1, [sp, 8]
    sbfx    x1, x6, 32, 32
    ldr    x3, [sp, 24]
    add    w1, w1, w5
    add    w1, w1, w2
    str    x7, [sp]
    add    w0, w3, w4
    sbfx    x4, x4, 32, 32
    sbfx    x3, x3, 32, 32
    add    w0, w0, w1
    add    w3, w4, w3
    str    x6, [sp, 16]
    add    w0, w3, w0
    add    sp, sp, 32
    ret
    .size    foo, .-foo
    .section    .rodata
    .align    3
.LANCHOR0 = . + 0
.LC0:
    .word    0
    .word    1
    .word    2
    .word    3
    .word    4
    .word    5
    .word    6
    .word    7
    .ident    "GCC: (unknown) 5.0.0 20141023 (experimental)"

Constant-folding seems to have got a bit messed up. I've observed this only on
aarch64-none-elf-gcc. 5.0 x86_64 seems to work fine.

foo:
.LFB0:
    .cfi_startproc
    movl    $28, %eax
    ret
    .cfi_endproc
.LFE0:
    .size    foo, .-foo
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .ident    "GCC: (GNU) 5.0.0 20141023 (experimental)"
    .section    .note.GNU-stack,"",@progbits


Looks like an aarch64-specific backend issue.

$ aarch64-none-elf-gcc -v
Target: aarch64-none-elf
Configured with: /work/dev/arm/src/gcc/configure --target=aarch64-none-elf
--prefix=/work/dev/arm/bin//install --with-gmp=/work/dev/arm/bin//host-tools
--with-mpfr=/work/dev/arm/bin//host-tools
--with-mpc=/work/dev/arm/bin//host-tools
--with-cloog=/work/dev/arm/bin//host-tools
--with-isl=/work/dev/arm/bin//host-tools --with-pkgversion=unknown
--disable-shared --disable-nls --disable-threads --disable-tls
--enable-checking=yes --enable-languages=c,c++ --with-newlib
Thread model: single
gcc version 5.0.0 20141023 (experimental) (unknown)



* [Bug c/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
  2014-10-29 15:18 [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit) peter.bumbulis at ianywhere dot com
@ 2014-10-29 18:01 ` jakub at gcc dot gnu.org
  2014-10-29 18:22 ` [Bug target/63678] " peter.bumbulis at ianywhere dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-10-29 18:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |kyukhin at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What does "properly" mean?  The underlying instruction (256-bit VPBLENDW)
certainly accepts only an 8-bit mask, and e.g.
https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5369B2B5-B1E1-4D96-85AB-2019982667B4.htm
says nothing about what the upper bits would mean; it also says that the mask
is an 8-bit immediate.  Perhaps icc just doesn't diagnose incorrect masks?
Or do you see that for 16-bit masks _mm256_blend_epi16 would actually emit more
than one insn (say, separate blends with the low 8-bit mask and the high 8-bit
mask, and then a blend of the two results)?
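
For illustration only, the multi-instruction expansion speculated about above can be modelled in plain scalar C.  This is a sketch under stated assumptions, not what any compiler emits: the names blend8_model and blend16_emulated are hypothetical, and the "16-bit mask" semantics (bit i selects word i) is an assumption about what the reporter expected, not documented behaviour of _mm256_blend_epi16.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 8-bit-mask blend over 16 words:
   bit (i % 8) of m selects word i from b, otherwise from a, so both
   8-word halves see the same mask. */
static void blend8_model(const uint16_t a[16], const uint16_t b[16],
                         uint8_t m, uint16_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = ((m >> (i % 8)) & 1) ? b[i] : a[i];
}

/* Assumed 16-bit-mask semantics: bit i of mask16 selects word i from b.
   Built from two 8-bit blends plus a final half-select, mirroring the
   "low mask, high mask, then blend together" idea. */
static void blend16_emulated(const uint16_t a[16], const uint16_t b[16],
                             uint16_t mask16, uint16_t out[16])
{
    uint16_t lo[16], hi[16];

    blend8_model(a, b, (uint8_t)(mask16 & 0xff), lo);  /* low byte of mask  */
    blend8_model(a, b, (uint8_t)(mask16 >> 8), hi);    /* high byte of mask */

    /* Lower 8 words come from the low-mask result, upper 8 from the other. */
    for (int i = 0; i < 16; i++)
        out[i] = (i < 8) ? lo[i] : hi[i];
}
```

The point of the sketch is only that a true 16-bit mask would need three operations in this model rather than one.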



* [Bug target/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
  2014-10-29 15:18 [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit) peter.bumbulis at ianywhere dot com
  2014-10-29 18:01 ` [Bug c/63678] " jakub at gcc dot gnu.org
@ 2014-10-29 18:22 ` peter.bumbulis at ianywhere dot com
  2014-10-29 18:34 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: peter.bumbulis at ianywhere dot com @ 2014-10-29 18:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

--- Comment #2 from Peter Bumbulis <peter.bumbulis at ianywhere dot com> ---
The referenced web page is incorrect.  Look in the instruction set reference
manual
(https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf,
search for VPBLENDMW) or the intrinsics guide
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/).

These instructions blend 16 bit quantities:  you can fit 16 of these in a 256
bit register.  For AVX512 it's a 32-bit constant.



* [Bug target/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
  2014-10-29 15:18 [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit) peter.bumbulis at ianywhere dot com
  2014-10-29 18:01 ` [Bug c/63678] " jakub at gcc dot gnu.org
  2014-10-29 18:22 ` [Bug target/63678] " peter.bumbulis at ianywhere dot com
@ 2014-10-29 18:34 ` jakub at gcc dot gnu.org
  2014-10-29 18:37 ` peter.bumbulis at ianywhere dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-10-29 18:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Peter Bumbulis from comment #2)
> The referenced web page is incorrect.  Look in the instruction set reference
> manual
> (https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf,
> search for VPBLENDMW) or the intrinsics guide
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/).
> 
> These instructions blend 16 bit quantities:  you can fit 16 of these in a
> 256 bit register.  For AVX512 it's a 32-bit constant.

Your first reference is AVX512 documentation; _mm256_blend_epi16 is not
_mm256_mask_blend_epi16.  _mm256_blend_epi16 is for the VPBLENDW instruction,
and https://software.intel.com/sites/landingpage/IntrinsicsGuide/ looks
incorrect, because it doesn't describe what the VPBLENDW instruction does.  In
particular, the instruction has only an 8-bit immediate, and both 128-bit lanes
are blended the same way by that mask:
IF (imm8[0] == 1) THEN DEST[15:0] <- SRC2[15:0]
ELSE DEST[15:0] <- SRC1[15:0]
IF (imm8[1] == 1) THEN DEST[31:16] <- SRC2[31:16]
ELSE DEST[31:16] <- SRC1[31:16]
IF (imm8[2] == 1) THEN DEST[47:32] <- SRC2[47:32]
ELSE DEST[47:32] <- SRC1[47:32]
IF (imm8[3] == 1) THEN DEST[63:48] <- SRC2[63:48]
ELSE DEST[63:48] <- SRC1[63:48]
IF (imm8[4] == 1) THEN DEST[79:64] <- SRC2[79:64]
ELSE DEST[79:64] <- SRC1[79:64]
IF (imm8[5] == 1) THEN DEST[95:80] <- SRC2[95:80]
ELSE DEST[95:80] <- SRC1[95:80]
IF (imm8[6] == 1) THEN DEST[111:96] <- SRC2[111:96]
ELSE DEST[111:96] <- SRC1[111:96]
IF (imm8[7] == 1) THEN DEST[127:112] <- SRC2[127:112]
ELSE DEST[127:112] <- SRC1[127:112]
IF (imm8[0] == 1) THEN DEST[143:128] <- SRC2[143:128]
ELSE DEST[143:128] <- SRC1[143:128]
IF (imm8[1] == 1) THEN DEST[159:144] <- SRC2[159:144]
ELSE DEST[159:144] <- SRC1[159:144]
IF (imm8[2] == 1) THEN DEST[175:160] <- SRC2[175:160]
ELSE DEST[175:160] <- SRC1[175:160]
IF (imm8[3] == 1) THEN DEST[191:176] <- SRC2[191:176]
ELSE DEST[191:176] <- SRC1[191:176]
IF (imm8[4] == 1) THEN DEST[207:192] <- SRC2[207:192]
ELSE DEST[207:192] <- SRC1[207:192]
IF (imm8[5] == 1) THEN DEST[223:208] <- SRC2[223:208]
ELSE DEST[223:208] <- SRC1[223:208]
IF (imm8[6] == 1) THEN DEST[239:224] <- SRC2[239:224]
ELSE DEST[239:224] <- SRC1[239:224]
IF (imm8[7] == 1) THEN DEST[255:240] <- SRC2[255:240]
ELSE DEST[255:240] <- SRC1[255:240]
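
The pseudocode above collapses to a short loop.  The following is a minimal scalar sketch of it, with the hypothetical name vpblendw256_model; it models the semantics only and does not use the real intrinsic.

```c
#include <stdint.h>

/* Scalar model of the 256-bit VPBLENDW pseudocode: 16 words, one imm8.
   Word i is taken from src2 when bit (i % 8) of imm8 is set, so words
   0..7 and words 8..15 (the two 128-bit lanes) are governed by the same
   eight mask bits. */
static void vpblendw256_model(const uint16_t src1[16], const uint16_t src2[16],
                              uint8_t imm8, uint16_t dest[16])
{
    for (int i = 0; i < 16; i++)
        dest[i] = ((imm8 >> (i % 8)) & 1) ? src2[i] : src1[i];
}
```

With imm8 = 0xcd (binary 11001101), words 0, 2, 3, 6 and 7 of each lane come from src2 and the rest from src1.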



* [Bug target/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
  2014-10-29 15:18 [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit) peter.bumbulis at ianywhere dot com
                   ` (2 preceding siblings ...)
  2014-10-29 18:34 ` jakub at gcc dot gnu.org
@ 2014-10-29 18:37 ` peter.bumbulis at ianywhere dot com
  2014-10-29 18:38 ` jakub at gcc dot gnu.org
  2014-10-29 19:19 ` jakub at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: peter.bumbulis at ianywhere dot com @ 2014-10-29 18:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

--- Comment #4 from Peter Bumbulis <peter.bumbulis at ianywhere dot com> ---
(In reply to Peter Bumbulis from comment #2)
> The referenced web page is incorrect.  Look in the instruction set reference
> manual
> (https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf,
> search for VPBLENDMW) or the intrinsics guide
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/).
> 
> These instructions blend 16 bit quantities:  you can fit 16 of these in a
> 256 bit register.  For AVX512 it's a 32-bit constant.

My mistake:  it looks like the generated code only uses the low 8 bits.  Sorry
for any wasted bandwidth.



* [Bug target/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
  2014-10-29 15:18 [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit) peter.bumbulis at ianywhere dot com
                   ` (3 preceding siblings ...)
  2014-10-29 18:37 ` peter.bumbulis at ianywhere dot com
@ 2014-10-29 18:38 ` jakub at gcc dot gnu.org
  2014-10-29 19:19 ` jakub at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-10-29 18:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Trying icc 14.0.2.144 Build 2014012, I see that
a) it indeed fails to report the bug in your source
b) when using -c, it silently discards the upper 8 bits of the immediate, so
   you end up with:
   0:    c4 e3 7d 0e c1 cd        vpblendw $0xcd,%ymm1,%ymm0,%ymm0
c) when using -S, it generates invalid assembly:
        vpblendw  $43981, %ymm1, %ymm0, %ymm0                   #4.16
   which doesn't assemble at least with gas.
So, I believe erroring out on this is significantly better than what icc does
with it.
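
The two numbers in (b) and (c) are the same constant seen two ways; a small arithmetic sketch (the helper name truncate_to_imm8 is hypothetical) makes the relationship explicit:

```c
#include <stdint.h>

/* What remains when a wider constant is squeezed into an 8-bit immediate
   field: only the low 8 bits survive. */
static uint8_t truncate_to_imm8(unsigned mask)
{
    return (uint8_t)(mask & 0xffu);
}
```

Here 0xabcd is 43981 in decimal, the value printed in the -S output, and truncate_to_imm8(0xabcd) is 0xcd, the immediate seen in the disassembly of the -c output.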



* [Bug target/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)
  2014-10-29 15:18 [Bug c/63678] New: __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit) peter.bumbulis at ianywhere dot com
                   ` (4 preceding siblings ...)
  2014-10-29 18:38 ` jakub at gcc dot gnu.org
@ 2014-10-29 19:19 ` jakub at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-10-29 19:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
.


