From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 108175 invoked by alias); 13 Feb 2020 13:08:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 108159 invoked by uid 89); 13 Feb 2020 13:08:38 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-4.9 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-oi1-f181.google.com Received: from mail-oi1-f181.google.com (HELO mail-oi1-f181.google.com) (209.85.167.181) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 13 Feb 2020 13:08:35 +0000 Received: by mail-oi1-f181.google.com with SMTP id z2so5679084oih.6 for ; Thu, 13 Feb 2020 05:08:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=BiKwU4VVRurZNaDpsCP1mdsM7rnnYacrdq1urJjt04k=; b=Z0Pd+MtIMzKpO/3NMDQq2UJ4UFNb+Qcp5de8KMr79od7xiHnnctGYs4BOnlO5+2eM8 GLiTF/ikWI2jegXSvfWohsYoreJwzalRt7EvH1DiUqzmr0L8Q2SZMvdomuhMY9c5W8Qd uk82ZRkSU5ObwVY2o66sfbUYVQP8emDmkO80Q7TWpj9dMrg5jmETIcFKeOUd3Mi8X4My T1/A8mVo0/rU8wVapf2e3hmqMqQuMjOAnbkcK1s/o+k/hpqfXYu/Qc5k/AjyEL6Bep6M 6dx7YuxWwd8cSYx5FdqbECd/pxnPl9J1n2WYRHxDF6z6cknfFwoHr2eslJO8AmF/LK/7 D8kw== MIME-Version: 1.0 References: <20190222162451.GA18480@intel.com> In-Reply-To: From: "H.J. Lu" Date: Thu, 13 Feb 2020 13:08:00 -0000 Message-ID: Subject: PING^7: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move To: Jakub Jelinek , Jeffrey Law Cc: GCC Patches , Jan Hubicka , Uros Bizjak Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2020-02/txt/msg00788.txt.bz2 On Thu, Feb 6, 2020 at 8:17 PM H.J. Lu wrote: > > On Mon, Jan 27, 2020 at 10:59 AM H.J. Lu wrote: > > > > On Mon, Jul 8, 2019 at 8:19 AM H.J. Lu wrote: > > > > > > On Tue, Jun 18, 2019 at 8:59 AM H.J. Lu wrote: > > > > > > > > On Fri, May 31, 2019 at 10:38 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 21, 2019 at 2:43 PM H.J. Lu wrote: > > > > > > > > > > > > On Fri, Feb 22, 2019 at 8:25 AM H.J. Lu wrote: > > > > > > > > > > > > > > Hi Jan, Uros, > > > > > > > > > > > > > > This patch fixes the wrong code bug: > > > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229 > > > > > > > > > > > > > > Tested on AVX2 and AVX512 with and without --with-arch=native. > > > > > > > > > > > > > > OK for trunk? > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > H.J. > > > > > > > -- > > > > > > > i386 backend has > > > > > > > > > > > > > > INT_MODE (OI, 32); > > > > > > > INT_MODE (XI, 64); > > > > > > > > > > > > > > So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation, > > > > > > > in case of const_1, all 512 bits set. > > > > > > > > > > > > > > We can load zeros with narrower instruction, (e.g. 256 bit by inherent > > > > > > > zeroing of highpart in case of 128 bit xor), so TImode in this case. > > > > > > > > > > > > > > Some targets prefer V4SF mode, so they will emit float xorps for zeroing. > > > > > > > > > > > > > > sse.md has > > > > > > > > > > > > > > (define_insn "mov_internal" > > > > > > > [(set (match_operand:VMOVE 0 "nonimmediate_operand" > > > > > > > "=v,v ,v ,m") > > > > > > > (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand" > > > > > > > " C,BC,vm,v"))] > > > > > > > .... > > > > > > > /* There is no evex-encoded vmov* for sizes smaller than 64-bytes > > > > > > > in avx512f, so we need to use workarounds, to access sse registers > > > > > > > 16-31, which are evex-only. In avx512vl we don't need workarounds. */ > > > > > > > if (TARGET_AVX512F && < 64 && !TARGET_AVX512VL > > > > > > > && (EXT_REX_SSE_REG_P (operands[0]) > > > > > > > || EXT_REX_SSE_REG_P (operands[1]))) > > > > > > > { > > > > > > > if (memory_operand (operands[0], mode)) > > > > > > > { > > > > > > > if ( == 32) > > > > > > > return "vextract64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}"; > > > > > > > else if ( == 16) > > > > > > > return "vextract32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}"; > > > > > > > else > > > > > > > gcc_unreachable (); > > > > > > > } > > > > > > > ... > > > > > > > > > > > > > > However, since ix86_hard_regno_mode_ok has > > > > > > > > > > > > > > /* TODO check for QI/HI scalars. */ > > > > > > > /* AVX512VL allows sse regs16+ for 128/256 bit modes. */ > > > > > > > if (TARGET_AVX512VL > > > > > > > && (mode == OImode > > > > > > > || mode == TImode > > > > > > > || VALID_AVX256_REG_MODE (mode) > > > > > > > || VALID_AVX512VL_128_REG_MODE (mode))) > > > > > > > return true; > > > > > > > > > > > > > > /* xmm16-xmm31 are only available for AVX-512. */ > > > > > > > if (EXT_REX_SSE_REGNO_P (regno)) > > > > > > > return false; > > > > > > > > > > > > > > if (TARGET_AVX512F && < 64 && !TARGET_AVX512VL > > > > > > > && (EXT_REX_SSE_REG_P (operands[0]) > > > > > > > || EXT_REX_SSE_REG_P (operands[1]))) > > > > > > > > > > > > > > is a dead code. > > > > > > > > > > > > > > Also for > > > > > > > > > > > > > > long long *p; > > > > > > > volatile __m256i yy; > > > > > > > > > > > > > > void > > > > > > > foo (void) > > > > > > > { > > > > > > > _mm256_store_epi64 (p, yy); > > > > > > > } > > > > > > > > > > > > > > with AVX512VL, we should generate > > > > > > > > > > > > > > vmovdqa %ymm0, (%rax) > > > > > > > > > > > > > > not > > > > > > > > > > > > > > vmovdqa64 %ymm0, (%rax) > > > > > > > > > > > > > > All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov: > > > > > > > > > > > > > > 1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector > > > > > > > moves will be generated. > > > > > > > 2. If xmm16-xmm31/ymm16-ymm31 registers are used: > > > > > > > a. With AVX512VL, AVX512VL vector moves will be generated. > > > > > > > b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register > > > > > > > move will be done with zmm register move. > > > > > > > > > > > > > > ext_sse_reg_operand is removed since it is no longer needed. > > > > > > > > > > > > > > Tested on AVX2 and AVX512 with and without --with-arch=native. > > > > > > > > > > > > > > gcc/ > > > > > > > > > > > > > > PR target/89229 > > > > > > > PR target/89346 > > > > > > > * config/i386/i386-protos.h (ix86_output_ssemov): New prototype. > > > > > > > * config/i386/i386.c (ix86_get_ssemov): New function. > > > > > > > (ix86_output_ssemov): Likewise. > > > > > > > * config/i386/i386.md (*movxi_internal_avx512f): Call > > > > > > > ix86_output_ssemov for TYPE_SSEMOV. > > > > > > > (*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV. > > > > > > > Remove ext_sse_reg_operand and TARGET_AVX512VL check. > > > > > > > (*movti_internal): Likewise. > > > > > > > (*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV. > > > > > > > Remove ext_sse_reg_operand check. > > > > > > > (*movsi_internal): Likewise. > > > > > > > (*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV. > > > > > > > (*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV. > > > > > > > Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL > > > > > > > and ext_sse_reg_operand check. > > > > > > > (*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV. > > > > > > > Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and > > > > > > > ext_sse_reg_operand check. > > > > > > > * config/i386/mmx.md (MMXMODE:*mov_internal): Call > > > > > > > ix86_output_ssemov for TYPE_SSEMOV. Remove ext_sse_reg_operand > > > > > > > check. > > > > > > > * config/i386/sse.md (VMOVE:mov_internal): Call > > > > > > > ix86_output_ssemov for TYPE_SSEMOV. Remove TARGET_AVX512VL > > > > > > > check. > > > > > > > * config/i386/predicates.md (ext_sse_reg_operand): Removed. > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > > > > > PR target/89229 > > > > > > > PR target/89346 > > > > > > > * gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated. > > > > > > > * gcc.target/i386/pr89229-2a.c: New test. > > > > > > > * gcc.target/i386/pr89229-2b.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-2c.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-3a.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-3b.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-3c.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-4a.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-4b.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-4c.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-5a.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-5b.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-5c.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-6a.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-6b.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-6c.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-7a.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-7b.c: Likewise. > > > > > > > * gcc.target/i386/pr89229-7c.c: Likewise. > > > > > > > --- > > > > > > > > > > > > PING: > > > > > > > > > > > > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html > > > > > > > > > > > > > > > > > > > > > > PING. > > > > > > > > > > > > > PING. > > > > > > > > > > PING. > > > > > > > Here is the rebased patch. I'd like to see it got fixed for GCC 10. > > > > PING. Here is the rebased patch. > PING. https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00415.html -- H.J.