Date: Tue, 02 May 2017 10:17:00 -0000
From: Jakub Jelinek
To: Allan Sandfeld Jensen, Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts
Message-ID: <20170502101147.GA1809@tucnak>
References: <201704221338.46300.linux@carewolf.com>
 <201704240933.09704.linux@carewolf.com>
 <20170424074349.GG1809@tucnak>
 <201704241515.11173.linux@carewolf.com>
In-Reply-To: <201704241515.11173.linux@carewolf.com>

On Mon, Apr 24, 2017 at 03:15:11PM +0200, Allan Sandfeld Jensen wrote:
> Okay, I have tried that, and I also made it more obvious how the intrinsics
> can become non-immediate shift.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index b58f5050db0..b9406550fc5 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2017-04-22  Allan Sandfeld Jensen
> +
> +	* config/i386/emmintrin.h (_mm_slli_*, _mm_srli_*):
> +	Use vector intrinstics instead of builtins.
> +	* config/i386/avx2intrin.h (_mm256_slli_*, _mm256_srli_*):
> +	Use vector intrinstics instead of builtins.
> +
>  2017-04-21  Uros Bizjak
>
>  	* config/i386/i386.md (*extzvqi_mem_rex64): Move above *extzv.
> diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
> index 82f170a3d61..64ba52b244e 100644
> --- a/gcc/config/i386/avx2intrin.h
> +++ b/gcc/config/i386/avx2intrin.h
> @@ -665,13 +665,6 @@ _mm256_slli_si256 (__m256i __A, const int __N)
>
>  extern __inline __m256i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm256_slli_epi16 (__m256i __A, int __B)
> -{
> -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> -}
> -
> -extern __inline __m256i
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_sll_epi16 (__m256i __A, __m128i __B)
>  {
>    return (__m256i)__builtin_ia32_psllw256((__v16hi)__A, (__v8hi)__B);
> @@ -679,9 +672,11 @@ _mm256_sll_epi16 (__m256i __A, __m128i __B)
>
>  extern __inline __m256i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm256_slli_epi32 (__m256i __A, int __B)
> +_mm256_slli_epi16 (__m256i __A, int __B)
>  {
> -  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
> +  if (__builtin_constant_p(__B))
> +    return ((unsigned int)__B < 16) ? (__m256i)((__v16hi)__A << __B) : _mm256_setzero_si256();
> +  return _mm256_sll_epi16(__A, _mm_cvtsi32_si128(__B));
>  }

The formatting is wrong: spaces are missing between function names and the
opening (, and the lines are too long.
Also, you've removed some builtin uses like __builtin_ia32_psllwi256 above,
but haven't removed those builtins from the compiler (unlike the intrinsics,
the builtins are not supported and can be removed).
But I guess the primary question is for Uros: do we want to handle this
in the *intrin.h headers and thus increase their size (and their
parsing time etc.), or do we want to handle this in the target folders
(the tree one as well as the gimple one), where we'd convert e.g.
__builtin_ia32_psllwi256 to the shift if the shift count is constant?

	Jakub
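
For illustration only, here is a minimal sketch of the guarded vector shift
from the quoted hunk, laid out with a space between each function name and
its opening ( and with lines kept short. It is not the patch itself: it uses
the generic GCC vector extension on a 128-bit vector so that it builds
without -mavx2, it leaves out the __builtin_constant_p dispatch, and the
names v8hi_sketch and slli_epi16_sketch are hypothetical.

```c
/* Hypothetical 8 x 16-bit vector type, standing in for __v8hi.  */
typedef short v8hi_sketch __attribute__ ((__vector_size__ (16)));

/* Shift each 16-bit lane of A left by COUNT.  Counts outside 0..15
   give an all-zero result, which is what the guard in the quoted
   hunk does for constant counts; a plain C shift by such a count
   would be undefined for 16-bit lanes.  */
static inline v8hi_sketch
slli_epi16_sketch (v8hi_sketch a, int count)
{
  v8hi_sketch zero = { 0, 0, 0, 0, 0, 0, 0, 0 };

  if ((unsigned int) count >= 16)
    return zero;
  /* The scalar count is applied to every lane.  */
  return a << count;
}
```

With a vector holding 1 through 8, a count of 3 multiplies every lane by 8,
and a count of 16 returns all zeros.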