From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 115465 invoked by alias); 4 Apr 2017 06:40:05 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 115412 invoked by uid 89); 4 Apr 2017 06:40:03 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=ham version=3.3.2 spammy=instructing X-HELO: mail-vk0-f52.google.com Received: from mail-vk0-f52.google.com (HELO mail-vk0-f52.google.com) (209.85.213.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 04 Apr 2017 06:40:00 +0000 Received: by mail-vk0-f52.google.com with SMTP id z204so164614820vkd.1 for ; Mon, 03 Apr 2017 23:40:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=7sJOf6XGRC8SGJDaRs075BeFWbG9YCILpUg/24NyYhM=; b=g9mx5+4rylHOIM9mHKDjX352ZvvELjsOYr8uEfV1IXV2IonWIKEraCmXPlqf5+Onm0 GwzrN7KegxRep5cb8DilPmTCdaCsIqzWdZvI9FUXA1WDIVPsc83soOfy5jDEjoMGQhSP omDvbheaYfX0Vt8hUSx3TX4xcnz86V7KLIqDhqw4mARvWEXsulJpxmV6Qltqvy+5UkOH Me7DRoTW14eLqZNO9D4qTFkRlZ6hYcJ2TARyaLMfBYtBLxhJYcuaH2Ootgep9B2TvhsC w3B3kd2X3j6uK4hhk2G0uelwJirI3d9MAhWFynmJ26eQykX3L7KmAUVzFsmevQZC+qnB 4utw== X-Gm-Message-State: AFeK/H1spmT/Yf7O1/iqRHZ+rMhQDqqs2lkRsz4HqqWi8u/PoIs5azuNwblI0Llobg6UvLvBKNFZ0Tw+0Mjssg== X-Received: by 10.176.83.124 with SMTP id y57mr9399784uay.141.1491288000206; Mon, 03 Apr 2017 23:40:00 -0700 (PDT) MIME-Version: 1.0 Received: by 10.103.183.3 with HTTP; Mon, 3 Apr 2017 23:39:59 -0700 (PDT) In-Reply-To: <20170403203437.GF17461@tucnak> References: <20170403203437.GF17461@tucnak> From: Uros Bizjak Date: Tue, 04 Apr 2017 06:40:00 -0000 Message-ID: Subject: Re: [PATCH] Fix MMX/SSE/AVX* shifts by non-immediate scalar (PR target/80286) To: Jakub Jelinek Cc: Kirill Yukhin , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2017-04/txt/msg00119.txt.bz2 On Mon, Apr 3, 2017 at 10:34 PM, Jakub Jelinek wrote: > Hi! > > This patch deals just with correctness of vector shifts by scalar > non-immediate. The manuals say the shift count is bits [0:63] of > the corresponding source operand (XMM reg or memory in some cases), > and if the count is bigger than number of bits - 1 in the vector element, > it is treated as number of bits shift count. > We are modelling it as SImode shift count though, the upper 32 bits > may be random in some cases which causes wrong-code. > Fixed by using DImode that matches what the insns do. IIRC, SImode was choosen to simplify GPR->XMM register moves on 32bit target. It does look this was wrong choice from the correctness point. > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Any thoughts on what to do to generate reasonable code when the shift count > comes from memory (e.g. as int variable) or is in the low bits of some XMM > regioster? The problem with int variable from memory is, that shifts access full 128bits for their count operand, so this is effectively a no-go. If there is a 128bit count value in memory, we can maybe define shift pattern with: (subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN")) ? > First of all, perhaps we could have some combiner (or peephole) pattern that would > transform sign-extend from e.g. SI to DI on the shift count into zero-extend > if there are no other uses of the extension result - if the shift count is > negative in SImode (or even QImode), then it is already large number and the > upper 32 bits or more don't really change anything on that. We can introduce shift patterns with embedded extensions, and split them to zext + shift. These new patterns can be easily macroized with any_extend code iterator and SWI124 mode iterator, so we avoid pattern explosion. > Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through > GPRs and back, or for SSE2 pxor on a scratch reg and punpck* to get it zero > extended. Not sure if we want to add =v / vm alternative to > zero_extendsidi2*, it already has some x but with ?s that prevent the RA > from using it. So thoughts on that? The ? is there to discourage RA from allocating xmm reg (all these alternatives have * on xmm reg), in effect instructing RA to prefer GPRs. If the value is already in xmm reg, then I expect ? alternative will be used. So, yes, v/v alternative as you proposed would be a good addition to zero_extendsidi alternatives. Please note though that pmovzxdq operates on a vector value, so memory operands should be avoided. > > 2017-04-03 Jakub Jelinek > > PR target/80286 > * config/i386/i386.c (ix86_expand_args_builtin): If op has scalar > int mode, convert_modes it to mode as unsigned, otherwise use > lowpart_subreg to mode rather than SImode. > * config/i386/sse.md (ashr3, > ashr3, ashr3, 3): > Use DImode instead of SImode for the shift count operand. > * config/i386/mmx.md (mmx_ashr3, mmx_3): > Likewise. > testsuite/ > * gcc.target/i386/avx-pr80286.c: New test. > * gcc.dg/pr80286.c: New test. OK for trunk and backports. Thanks, Uros. > --- gcc/config/i386/i386.c.jj 2017-04-03 10:40:22.000000000 +0200 > +++ gcc/config/i386/i386.c 2017-04-03 18:31:39.482367634 +0200 > @@ -35582,10 +35582,17 @@ ix86_expand_args_builtin (const struct b > { > /* SIMD shift insns take either an 8-bit immediate or > register as count. But builtin functions take int as > - count. If count doesn't match, we put it in register. */ > + count. If count doesn't match, we put it in register. > + The instructions are using 64-bit count, if op is just > + 32-bit, zero-extend it, as negative shift counts > + are undefined behavior and zero-extension is more > + efficient. */ > if (!match) > { > - op = lowpart_subreg (SImode, op, GET_MODE (op)); > + if (SCALAR_INT_MODE_P (GET_MODE (op))) > + op = convert_modes (mode, GET_MODE (op), op, 1); > + else > + op = lowpart_subreg (mode, op, GET_MODE (op)); > if (!insn_p->operand[i + 1].predicate (op, mode)) > op = copy_to_reg (op); > } > --- gcc/config/i386/sse.md.jj 2017-04-03 13:43:50.179572564 +0200 > +++ gcc/config/i386/sse.md 2017-04-03 18:01:19.713852914 +0200 > @@ -10620,7 +10620,7 @@ (define_insn "ashr3< > [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v") > (ashiftrt:VI24_AVX512BW_1 > (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm") > - (match_operand:SI 2 "nonmemory_operand" "v,N")))] > + (match_operand:DI 2 "nonmemory_operand" "v,N")))] > "TARGET_AVX512VL" > "vpsra\t{%2, %1, %0|%0, %1, %2}" > [(set_attr "type" "sseishft") > @@ -10634,7 +10634,7 @@ (define_insn "ashr3" > [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x") > (ashiftrt:VI24_AVX2 > (match_operand:VI24_AVX2 1 "register_operand" "0,x") > - (match_operand:SI 2 "nonmemory_operand" "xN,xN")))] > + (match_operand:DI 2 "nonmemory_operand" "xN,xN")))] > "TARGET_SSE2" > "@ > psra\t{%2, %0|%0, %2} > @@ -10667,7 +10667,7 @@ (define_insn "ashr3" > [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v") > (ashiftrt:VI248_AVX512BW_AVX512VL > (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand" "v,vm") > - (match_operand:SI 2 "nonmemory_operand" "v,N")))] > + (match_operand:DI 2 "nonmemory_operand" "v,N")))] > "TARGET_AVX512F" > "vpsra\t{%2, %1, %0|%0, %1, %2}" > [(set_attr "type" "sseishft") > @@ -10681,7 +10681,7 @@ (define_insn "3 [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v") > (any_lshift:VI2_AVX2_AVX512BW > (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v") > - (match_operand:SI 2 "nonmemory_operand" "xN,vN")))] > + (match_operand:DI 2 "nonmemory_operand" "xN,vN")))] > "TARGET_SSE2 && && " > "@ > p\t{%2, %0|%0, %2} > @@ -10700,7 +10700,7 @@ (define_insn "3 [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v") > (any_lshift:VI48_AVX2 > (match_operand:VI48_AVX2 1 "register_operand" "0,x,v") > - (match_operand:SI 2 "nonmemory_operand" "xN,xN,vN")))] > + (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))] > "TARGET_SSE2 && " > "@ > p\t{%2, %0|%0, %2} > @@ -10720,7 +10720,7 @@ (define_insn "3 [(set (match_operand:VI48_512 0 "register_operand" "=v,v") > (any_lshift:VI48_512 > (match_operand:VI48_512 1 "nonimmediate_operand" "v,m") > - (match_operand:SI 2 "nonmemory_operand" "vN,N")))] > + (match_operand:DI 2 "nonmemory_operand" "vN,N")))] > "TARGET_AVX512F && " > "vp\t{%2, %1, %0|%0, %1, %2}" > [(set_attr "isa" "avx512f") > --- gcc/config/i386/mmx.md.jj 2017-04-03 13:43:50.119573339 +0200 > +++ gcc/config/i386/mmx.md 2017-04-03 18:01:19.708852979 +0200 > @@ -930,7 +930,7 @@ (define_insn "mmx_ashr3" > [(set (match_operand:MMXMODE24 0 "register_operand" "=y") > (ashiftrt:MMXMODE24 > (match_operand:MMXMODE24 1 "register_operand" "0") > - (match_operand:SI 2 "nonmemory_operand" "yN")))] > + (match_operand:DI 2 "nonmemory_operand" "yN")))] > "TARGET_MMX" > "psra\t{%2, %0|%0, %2}" > [(set_attr "type" "mmxshft") > @@ -944,7 +944,7 @@ (define_insn "mmx_3" > [(set (match_operand:MMXMODE248 0 "register_operand" "=y") > (any_lshift:MMXMODE248 > (match_operand:MMXMODE248 1 "register_operand" "0") > - (match_operand:SI 2 "nonmemory_operand" "yN")))] > + (match_operand:DI 2 "nonmemory_operand" "yN")))] > "TARGET_MMX" > "p\t{%2, %0|%0, %2}" > [(set_attr "type" "mmxshft") > --- gcc/testsuite/gcc.target/i386/avx-pr80286.c.jj 2017-04-03 18:44:07.552698281 +0200 > +++ gcc/testsuite/gcc.target/i386/avx-pr80286.c 2017-04-03 18:43:51.000000000 +0200 > @@ -0,0 +1,26 @@ > +/* PR target/80286 */ > +/* { dg-do run { target avx } } */ > +/* { dg-options "-O2 -mavx" } */ > + > +#include "avx-check.h" > +#include > + > +__m256i m; > + > +__attribute__((noinline, noclone)) __m128i > +foo (__m128i x) > +{ > + int s = _mm_cvtsi128_si32 (_mm256_castsi256_si128 (m)); > + return _mm_srli_epi16 (x, s); > +} > + > +static void > +avx_test (void) > +{ > + __m128i a = (__m128i) (__v8hi) { 1 << 7, 2 << 8, 3 << 9, 4 << 10, 5 << 11, 6 << 12, 7 << 13, 8 << 12 }; > + m = (__m256i) (__v8si) { 7, 8, 9, 10, 11, 12, 13, 14 }; > + __m128i c = foo (a); > + __m128i b = (__m128i) (__v8hi) { 1, 2 << 1, 3 << 2, 4 << 3, 5 << 4, 6 << 5, 7 << 6, 8 << 5 }; > + if (__builtin_memcmp (&c, &b, sizeof (__m128i))) > + __builtin_abort (); > +} > --- gcc/testsuite/gcc.dg/pr80286.c.jj 2017-04-03 18:45:27.574663948 +0200 > +++ gcc/testsuite/gcc.dg/pr80286.c 2017-04-03 18:45:18.386782707 +0200 > @@ -0,0 +1,23 @@ > +/* PR target/80286 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -Wno-psabi" } */ > + > +typedef int V __attribute__((vector_size (4 * sizeof (int)))); > + > +__attribute__((noinline, noclone)) V > +foo (V x, V y) > +{ > + return x << y[0]; > +} > + > +int > +main () > +{ > + V x = { 1, 2, 3, 4 }; > + V y = { 5, 6, 7, 8 }; > + V z = foo (x, y); > + V e = { 1 << 5, 2 << 5, 3 << 5, 4 << 5 }; > + if (__builtin_memcmp (&z, &e, sizeof (V))) > + __builtin_abort (); > + return 0; > +} > > Jakub