From: Jakub Jelinek
To: Uros Bizjak, Kirill Yukhin
Cc: gcc-patches@gcc.gnu.org
Date: Mon, 09 May 2016 16:55:00 -0000
Subject: [PATCH] vec_extract XMM16-XMM17 improvements
Message-ID: <20160509165520.GJ28550@tucnak.redhat.com>

Hi!

vpextr{b,w} are in AVX512BW, so is vpsrldq, and vpextr{d,q} are in
AVX512DQ.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-09  Jakub Jelinek

        * config/i386/i386.md (isa): Add x64_avx512dq, enable if
        TARGET_64BIT && TARGET_AVX512DQ.
        * config/i386/sse.md (*vec_extract): Add avx512bw alternatives.
        (*vec_extract_zext): Add avx512bw alternative.
        (*vec_extract_0, *vec_extractv4si_0_zext, *vec_extractv2di_0_sse):
        Use v constraint instead of x constraint.
        (*vec_extractv4si): Add avx512dq and avx512bw alternatives.
        (*vec_extractv4si_zext): Add avx512dq alternative.
        (*vec_extractv2di_1): Add x64_avx512dq and avx512bw alternatives,
        use v instead of x constraint in other alternatives where possible.

        * gcc.target/i386/avx512bw-vpextr-1.c: New test.
        * gcc.target/i386/avx512dq-vpextr-1.c: New test.

--- gcc/config/i386/i386.md.jj 2016-05-09 13:33:12.000000000 +0200
+++ gcc/config/i386/i386.md 2016-05-09 16:32:32.219961730 +0200
@@ -796,7 +796,7 @@ (define_attr "isa" "base,x64,x64_sse4,x6
                     sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
                     avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
                     fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
-                    avx512vl,noavx512vl"
+                    avx512vl,noavx512vl,x64_avx512dq"
   (const_string "base"))

 (define_attr "enabled" ""
@@ -807,6 +807,8 @@ (define_attr "enabled" ""
            (symbol_ref "TARGET_64BIT && TARGET_SSE4_1 && !TARGET_AVX")
          (eq_attr "isa" "x64_avx")
            (symbol_ref "TARGET_64BIT && TARGET_AVX")
+         (eq_attr "isa" "x64_avx512dq")
+           (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
          (eq_attr "isa" "nox64") (symbol_ref "!TARGET_64BIT")
          (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
          (eq_attr "isa" "sse2_noavx")
--- gcc/config/i386/sse.md.jj 2016-05-09 15:08:36.000000000 +0200
+++ gcc/config/i386/sse.md 2016-05-09 16:43:54.213638239 +0200
@@ -13036,39 +13036,44 @@ (define_mode_iterator PEXTR_MODE12
   [(V16QI "TARGET_SSE4_1") V8HI])

 (define_insn "*vec_extract"
-  [(set (match_operand: 0 "register_sse4nonimm_operand" "=r,m")
+  [(set (match_operand: 0 "register_sse4nonimm_operand" "=r,m,r,m")
         (vec_select:
-          (match_operand:PEXTR_MODE12 1 "register_operand" "x,x")
+          (match_operand:PEXTR_MODE12 1 "register_operand" "x,x,v,v")
           (parallel
             [(match_operand:SI 2 "const_0_to__operand")])))]
   "TARGET_SSE2"
   "@
    %vpextr\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4")
+   %vpextr\t{%2, %1, %0|%0, %1, %2}
+   vpextr\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse4,avx512bw,avx512bw")
    (set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set (attr "prefix_extra")
      (if_then_else
-       (and (eq_attr "alternative" "0")
+       (and (eq_attr "alternative" "0,2")
             (eq (const_string "mode") (const_string "V8HImode")))
        (const_string "*")
        (const_string "1")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "maybe_vex,maybe_vex,evex,evex")
    (set_attr "mode" "TI")])

 (define_insn "*vec_extract_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
         (zero_extend:SWI48
           (vec_select:
-            (match_operand:PEXTR_MODE12 1 "register_operand" "x")
+            (match_operand:PEXTR_MODE12 1 "register_operand" "x,v")
             (parallel
               [(match_operand:SI 2
                 "const_0_to__operand")]))))]
   "TARGET_SSE2"
-  "%vpextr\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  "@
+   %vpextr\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,avx512bw")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set (attr "prefix_extra")
      (if_then_else
@@ -13089,9 +13094,9 @@ (define_insn "*vec_extract_mem"
   "#")

 (define_insn "*vec_extract_0"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r ,r,x ,m")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r ,r,v ,m")
         (vec_select:SWI48
-          (match_operand: 1 "nonimmediate_operand" "mYj,x,xm,x")
+          (match_operand: 1 "nonimmediate_operand" "mYj,v,vm,v")
           (parallel [(const_int 0)])))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
@@ -13101,7 +13106,7 @@ (define_insn_and_split "*vec_extractv4si
   [(set (match_operand:DI 0 "register_operand" "=r")
         (zero_extend:DI
           (vec_select:SI
-            (match_operand:V4SI 1 "register_operand" "x")
+            (match_operand:V4SI 1 "register_operand" "v")
             (parallel [(const_int 0)]))))]
   "TARGET_64BIT && TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_FROM_VEC"
   "#"
@@ -13110,9 +13115,9 @@ (define_insn_and_split "*vec_extractv4si
   "operands[1] = gen_lowpart (SImode, operands[1]);")

 (define_insn "*vec_extractv2di_0_sse"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=x,m")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=v,m")
         (vec_select:DI
-          (match_operand:V2DI 1 "nonimmediate_operand" "xm,x")
+          (match_operand:V2DI 1 "nonimmediate_operand" "vm,v")
           (parallel [(const_int 0)])))]
   "TARGET_SSE && !TARGET_64BIT
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
@@ -13128,46 +13133,49 @@ (define_split
   "operands[1] = gen_lowpart (mode, operands[1]);")

 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,Yr,*x,x")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,Yr,*x,x,Yv")
         (vec_select:SI
-          (match_operand:V4SI 1 "register_operand" "x,0,0,x")
+          (match_operand:V4SI 1 "register_operand" "x,v,0,0,x,v")
           (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
   switch (which_alternative)
     {
     case 0:
+    case 1:
       return "%vpextrd\t{%2, %1, %0|%0, %1, %2}";

-    case 1:
     case 2:
-      operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
+    case 3:
+      operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "psrldq\t{%2, %0|%0, %2}";

-    case 3:
-      operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
+    case 4:
+    case 5:
+      operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";

     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,noavx,noavx,avx")
-   (set_attr "type" "sselog1,sseishft1,sseishft1,sseishft1")
-   (set_attr "prefix_extra" "1,*,*,*")
+  [(set_attr "isa" "*,avx512dq,noavx,noavx,avx,avx512bw")
+   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1,sseishft1,sseishft1")
+   (set_attr "prefix_extra" "1,1,*,*,*,*")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,orig,orig,vex")
+   (set_attr "prefix" "maybe_vex,evex,orig,orig,vex,evex")
    (set_attr "mode" "TI")])

 (define_insn "*vec_extractv4si_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
         (zero_extend:DI
           (vec_select:SI
-            (match_operand:V4SI 1 "register_operand" "x")
+            (match_operand:V4SI 1 "register_operand" "x,v")
             (parallel [(match_operand:SI 2 "const_0_to_3_operand")]))))]
   "TARGET_64BIT && TARGET_SSE4_1"
   "%vpextrd\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "*,avx512dq")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -13196,26 +13204,28 @@ (define_insn_and_split "*vec_extractv4si
 })

 (define_insn "*vec_extractv2di_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=rm,m,x,x,x,x,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=rm,rm,m,x,x,Yv,x,v,r")
         (vec_select:DI
-          (match_operand:V2DI 1 "nonimmediate_operand" "x ,x,0,x,x,o,o")
+          (match_operand:V2DI 1 "nonimmediate_operand" "x ,v ,v,0,x, v,x,o,o")
           (parallel [(const_int 1)])))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
    %vpextrq\t{$1, %1, %0|%0, %1, 1}
+   vpextrq\t{$1, %1, %0|%0, %1, 1}
    %vmovhps\t{%1, %0|%0, %1}
    psrldq\t{$8, %0|%0, 8}
    vpsrldq\t{$8, %1, %0|%0, %1, 8}
+   vpsrldq\t{$8, %1, %0|%0, %1, 8}
    movhlps\t{%1, %0|%0, %1}
    #
    #"
-  [(set_attr "isa" "x64_sse4,*,sse2_noavx,avx,noavx,*,x64")
-   (set_attr "type" "sselog1,ssemov,sseishft1,sseishft1,ssemov,ssemov,imov")
-   (set_attr "length_immediate" "1,*,1,1,*,*,*")
-   (set_attr "prefix_rex" "1,*,*,*,*,*,*")
-   (set_attr "prefix_extra" "1,*,*,*,*,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,orig,vex,orig,*,*")
-   (set_attr "mode" "TI,V2SF,TI,TI,V4SF,DI,DI")])
+  [(set_attr "isa" "x64_sse4,x64_avx512dq,*,sse2_noavx,avx,avx512bw,noavx,*,x64")
+   (set_attr "type" "sselog1,sselog1,ssemov,sseishft1,sseishft1,sseishft1,ssemov,ssemov,imov")
+   (set_attr "length_immediate" "1,1,*,1,1,1,*,*,*")
+   (set_attr "prefix_rex" "1,1,*,*,*,*,*,*,*")
+   (set_attr "prefix_extra" "1,1,*,*,*,*,*,*,*")
+   (set_attr "prefix" "maybe_vex,evex,maybe_vex,orig,vex,evex,orig,*,*")
+   (set_attr "mode" "TI,TI,V2SF,TI,TI,TI,V4SF,DI,DI")])

 (define_split
   [(set (match_operand: 0 "register_operand")
--- gcc/testsuite/gcc.target/i386/avx512bw-vpextr-1.c.jj 2016-05-09 15:52:04.847639780 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpextr-1.c 2016-05-09 16:45:46.662102460 +0200
@@ -0,0 +1,109 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512bw" } */
+
+typedef char v16qi __attribute__((vector_size (16)));
+typedef short v8hi __attribute__((vector_size (16)));
+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+void
+f1 (v16qi a)
+{
+  register v16qi c __asm ("xmm16") = a;
+  register unsigned char e __asm ("dl");
+  asm volatile ("" : "+v" (c));
+  v16qi d = c;
+  e = ((unsigned char *) &d)[3];
+  asm volatile ("" : : "q" (e));
+}
+
+unsigned short
+f2 (v8hi a)
+{
+  register v8hi c __asm ("xmm16") = a;
+  register unsigned short e __asm ("dx");
+  asm volatile ("" : "+v" (c));
+  v8hi d = c;
+  e = ((unsigned short *) &d)[3];
+  asm volatile ("" : : "r" (e));
+}
+
+unsigned int
+f3 (v16qi a)
+{
+  register v16qi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v16qi d = c;
+  return ((unsigned char *) &d)[3];
+}
+
+unsigned int
+f4 (v8hi a)
+{
+  register v8hi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v8hi d = c;
+  return ((unsigned short *) &d)[3];
+}
+
+unsigned long long
+f5 (v16qi a)
+{
+  register v16qi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v16qi d = c;
+  return ((unsigned char *) &d)[3];
+}
+
+unsigned long long
+f6 (v8hi a)
+{
+  register v8hi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v8hi d = c;
+  return ((unsigned short *) &d)[3];
+}
+
+void
+f7 (v16qi a, unsigned char *p)
+{
+  register v16qi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v16qi d = c;
+  *p = ((unsigned char *) &d)[3];
+}
+
+void
+f8 (v8hi a, unsigned short *p)
+{
+  register v8hi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v8hi d = c;
+  *p = ((unsigned short *) &d)[3];
+}
+
+void
+f9 (v4si a)
+{
+  register v4si c __asm ("xmm16") = a;
+  register unsigned int e __asm ("xmm17");
+  asm volatile ("" : "+v" (c));
+  v4si d = c;
+  e = ((unsigned int *) &d)[3];
+  asm volatile ("" : "+v" (e));
+}
+
+void
+f10 (v2di a)
+{
+  register v2di c __asm ("xmm16") = a;
+  register unsigned long long e __asm ("xmm17");
+  asm volatile ("" : "+v" (c));
+  v2di d = c;
+  e = ((unsigned long long *) &d)[1];
+  asm volatile ("" : "+v" (e));
+}
+
+/* { dg-final { scan-assembler-times "vpextrb\[^\n\r]*xmm16" 4 } } */
+/* { dg-final { scan-assembler-times "vpextrw\[^\n\r]*xmm16" 4 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[^\n\r]*xmm1\[67\]\[^\n\r]*xmm1\[67\]" 2 } } */
--- gcc/testsuite/gcc.target/i386/avx512dq-vpextr-1.c.jj 2016-05-09 16:02:02.183614536 +0200
+++ gcc/testsuite/gcc.target/i386/avx512dq-vpextr-1.c 2016-05-09 16:01:24.000000000 +0200
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512dq" } */
+
+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+unsigned int
+f1 (v4si a)
+{
+  register v4si c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v4si d = c;
+  return ((unsigned int *) &d)[3];
+}
+
+unsigned long long
+f2 (v2di a)
+{
+  register v2di c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v2di d = c;
+  return ((unsigned long long *) &d)[1];
+}
+
+unsigned long long
+f3 (v4si a)
+{
+  register v4si c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v4si d = c;
+  return ((unsigned int *) &d)[3];
+}
+
+void
+f4 (v4si a, unsigned int *p)
+{
+  register v4si c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v4si d = c;
+  *p = ((unsigned int *) &d)[3];
+}
+
+void
+f5 (v2di a, unsigned long long *p)
+{
+  register v2di c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v2di d = c;
+  *p = ((unsigned long long *) &d)[1];
+}
+
+/* { dg-final { scan-assembler-times "vpextrd\[^\n\r]*xmm16" 3 } } */
+/* { dg-final { scan-assembler-times "vpextrq\[^\n\r]*xmm16" 2 } } */

        Jakub
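
A note on the psrldq/vpsrldq alternatives of *vec_extractv4si: they
rewrite operands[2] as INTVAL (operands[2]) * 4 because the shift count
is in bytes, and after the shift the requested 32-bit element sits in
lane 0 of the destination. The following is only a portable C sketch of
that arithmetic, not GCC code. The type vec128 and the helpers make_v4si,
shift_right_bytes, lane_by_byte_shift and lane_direct are hypothetical
names introduced for illustration, and the 128-bit register is modelled
as a plain 16-byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit vector register: lane i of a V4SI value occupies
   bytes 4*i .. 4*i+3.  (Illustrative type, not part of GCC.)  */
typedef struct { uint8_t bytes[16]; } vec128;

/* Hypothetical helper: build a vec128 from four 32-bit lanes.  */
static vec128
make_v4si (const uint32_t lanes[4])
{
  vec128 v;
  memcpy (v.bytes, lanes, sizeof v.bytes);
  return v;
}

/* Model of psrldq: shift the whole register right by COUNT bytes,
   filling the vacated high bytes with zeros.  */
static vec128
shift_right_bytes (vec128 v, unsigned count)
{
  vec128 r;
  memset (r.bytes, 0, sizeof r.bytes);
  if (count < 16)
    memcpy (r.bytes, v.bytes + count, 16 - count);
  return r;
}

/* Extract 32-bit lane INDEX the way the shift alternatives do:
   scale the index by 4 to get a byte count, shift, then read lane 0.  */
static uint32_t
lane_by_byte_shift (vec128 v, unsigned index)
{
  assert (index < 4);
  vec128 shifted = shift_right_bytes (v, index * 4);
  uint32_t lane0;
  memcpy (&lane0, shifted.bytes, sizeof lane0);
  return lane0;
}

/* Reference: read lane INDEX directly, which is the value a pextrd
   style extraction produces.  */
static uint32_t
lane_direct (vec128 v, unsigned index)
{
  assert (index < 4);
  uint32_t lane;
  memcpy (&lane, v.bytes + index * 4, sizeof lane);
  return lane;
}
```

With lanes { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
lane_by_byte_shift (v, 3) shifts by 12 bytes and yields 0x44444444, the
same value lane_direct (v, 3) reads. The difference in the real patterns
is where the result lives: the shift form leaves it in a vector register,
while the vpextrd form writes a general register or memory.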