From: Uros Bizjak
Date: Mon, 18 Feb 2019 14:37:00 -0000
Subject: Re: [PATCH 00/41] V8: Emulate MMX intrinsics with SSE
To: "H.J. Lu"
Cc: GCC Patches

On Mon, Feb 18, 2019 at 3:22 PM H.J. Lu wrote:
>
> > > > > > > > > > > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > > > > > > > > > > emulate MMX intrinsics with SSE instructions.  To support it, we added
> > > > > > > > > > >
> > > > > > > > > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > > > > > > > > >
> > > > > > > > > > > ;; Define instruction set of MMX instructions
> > > > > > > > > > > (define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
> > > > > > > > > > >   (const_string "base"))
> > > > > > > > > > >
> > > > > > > > > > >          (eq_attr "mmx_isa" "native")
> > > > > > > > > > >            (symbol_ref "!TARGET_MMX_WITH_SSE")
> > > > > > > > > > >          (eq_attr "mmx_isa" "x64")
> > > > > > > > > > >            (symbol_ref "TARGET_MMX_WITH_SSE")
> > > > > > > > > > >          (eq_attr "mmx_isa" "x64_avx")
> > > > > > > > > > >            (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
> > > > > > > > > > >          (eq_attr "mmx_isa" "x64_noavx")
> > > > > > > > > > >            (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
> > > > > > > > > > >
> > > > > > > > > > > We added SSE emulation to MMX patterns and disabled MMX alternatives
> > > > > > > > > > > with TARGET_MMX_WITH_SSE.
> > > > > > > > > > >
> > > > > > > > > > > Most MMX instructions have equivalent SSE versions, and the results of
> > > > > > > > > > > some SSE versions need to be reshuffled into the right order for MMX.
> > > > > > > > > > > There are a couple of tricky cases, sketched in code after this list:
> > > > > > > > > > >
> > > > > > > > > > > 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
> > > > > > > > > > > maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> > > > > > > > > > > mask operand and handle unmapped bits 64:127 at the memory address by
> > > > > > > > > > > adjusting the source and mask operands together with the memory address.
> > > > > > > > > > >
> > > > > > > > > > > 2. MMX movntq is emulated with SSE2 DImode movnti, which is available
> > > > > > > > > > > in 64-bit mode.
> > > > > > > > > > >
> > > > > > > > > > > 3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
> > > > > > > > > > > SSE emulation must clear the fourth index bit (0x08) in the shuffle
> > > > > > > > > > > control mask.
> > > > > > > > > > >
> > > > > > > > > > > 4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
> > > > > > > > > > > the upper 64 bits of the destination XMM register.
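[A hedged sketch of the four cases above in intrinsics form. The helper names are made up for illustration; the patch series implements this in RTL, and the page-crossing address adjustment mentioned in case 1 is omitted here. Cases 2 uses 64-bit-only intrinsics, and case 3 needs SSSE3.]

    #include <mmintrin.h>
    #include <emmintrin.h>
    #include <tmmintrin.h>

    /* 1. maskmovq via maskmovdqu: movq2dq zero-extends to 128 bits,
       so mask bits 64:127 are zero and bytes 8..15 at addr are never
       stored.  */
    static void
    maskmovq_sketch (__m64 src, __m64 mask, char *addr)
    {
      _mm_maskmoveu_si128 (_mm_movpi64_epi64 (src),
                           _mm_movpi64_epi64 (mask), addr);
    }

    /* 2. movntq via DImode movnti (64-bit mode only).  */
    static void
    movntq_sketch (__m64 *dst, __m64 a)
    {
      _mm_stream_si64 ((long long *) dst, _mm_cvtm64_si64 (a));
    }

    /* 3. MMX pshufb via SSE pshufb: clear the 0x08 index bit of every
       byte so all indexes stay within the low 8 bytes; bit 7 (the
       zeroing bit) is preserved.  */
    static __m128i
    pshufb_sketch (__m128i a, __m128i sel)
    {
      sel = _mm_and_si128 (sel, _mm_set1_epi8 ((char) 0xf7));
      return _mm_shuffle_epi8 (a, sel);
    }

    /* 4. cvtpi2ps via cvtdq2ps: convert into a scratch register, then
       take elements 0-1 from the conversion and 2-3 from the original
       destination, preserving its upper 64 bits.  */
    static __m128
    cvtpi2ps_sketch (__m128 dst, __m64 a)
    {
      __m128 cvt = _mm_cvtepi32_ps (_mm_movpi64_epi64 (a));
      return _mm_shuffle_ps (cvt, dst, _MM_SHUFFLE (3, 2, 1, 0));
    }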
> > > > > > > > > > > Tests are also added to check each SSE emulation of MMX intrinsics.
> > > > > > > > > > >
> > > > > > > > > > > There are no regressions on i686 and x86-64.  For x86-64, GCC is also
> > > > > > > > > > > tested with
> > > > > > > > > > >
> > > > > > > > > > > --with-arch=native --with-cpu=native
> > > > > > > > > > >
> > > > > > > > > > > on AVX2 and AVX512F machines.
> > > > > > > > > >
> > > > > > > > > > An idea that would take the patch a step further also on 32-bit targets:
> > > > > > > > > >
> > > > > > > > > > *Assuming* that operations on XMM registers are as fast (or perhaps
> > > > > > > > > > faster) than operations on MMX registers, we can change the mmx_isa
> > > > > > > > > > attribute in e.g.
> > > > > > > > > >
> > > > > > > > > > +  "@
> > > > > > > > > > +   p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > > +   p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > > +   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > > > > > > +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > > > > > >
> > > > > > > > > > to:
> > > > > > > > > >
> > > > > > > > > >   [(set_attr "isa" "*,noavx,avx")
> > > > > > > > > >    (set_attr "mmx_isa" "native,*,*")]
> > > > > > > > > >
> > > > > > > > > > So, for x86_64 everything stays the same, but for x86_32 we now allow
> > > > > > > > > > intrinsics to use xmm registers in addition to mmx registers.  We can't
> > > > > > > > > > disable MMX for x86_32 anyway due to ISA constraints (and some tricky
> > > > > > > > > > cases, e.g. movnti, which works only for 64-bit targets, and e.g. maskmovq
> > > > > > > > > > & similar, which are more efficient with MMX regs), but the RA gets much
> > > > > > > > > > more freedom to allocate the most effective register set even for
> > > > > > > > > > 32-bit targets.
> > > > > > > > > >
> > > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > Since MMX registers are used to pass and return __m64 values,
> > > > > > > > > we can't really get rid of MMX instructions in 32-bit mode.  If people
> > > > > > > > > have to stay with 32-bit mode, they need MMX.  I don't think we should
> > > > > > > > > extend TARGET_MMX_WITH_SSE to 32-bit mode.
> > > > > > > >
> > > > > > > > No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets.  We
> > > > > > > > should not *disable* SSE alternatives on 32-bit targets.
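[In intrinsics form, the substitution under discussion looks roughly like this. A hedged sketch, not code from the patch set; on x86-64 the __m64 values already live in XMM registers, so the movq2dq/movdq2q transfers shown here degenerate to plain moves.]

    #include <mmintrin.h>
    #include <emmintrin.h>

    /* Illustrative only: an MMX intrinsic carried out with the SSE2
       equivalent of the same operation.  */
    __m64
    add_pi32_via_sse (__m64 a, __m64 b)
    {
      __m128i xa = _mm_movpi64_epi64 (a);   /* movq2dq: MMX -> XMM */
      __m128i xb = _mm_movpi64_epi64 (b);
      __m128i sum = _mm_add_epi32 (xa, xb); /* paddd on XMM, not MMX */
      return _mm_movepi64_pi64 (sum);       /* movdq2q: XMM -> MMX */
    }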
> > > > > > I don't think my patch set disables any SSE alternatives in 32-bit
> > > > > > mode.  However, it DOES NOT enable any SSE alternatives in 32-bit
> > > > > > mode.  To really enable SSE alternatives in
> > > > > >
> > > > > > (define_insn "*mmx_<code><mode>3"
> > > > > >   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
> > > > > >         (any_logic:MMXMODEI
> > > > > >           (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
> > > > > >           (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
> > > > > >   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> > > > > >    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > > > > >   "@
> > > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > > >    vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > >   [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > >    (set_attr "type" "mmxadd,sselog,sselog")
> > > > > >    (set_attr "mode" "DI,TI,TI")])
> > > > > >
> > > > > > register_mmxmem_operand must return true for SSE alternatives:
> > > > >
> > > > > It returns true for register and memory operands for 32-bit targets,
> > > > > because
> > > > >
> > > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > >
> > > > Will
> > > >
> > > >   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]
> > > >
> > > > work well with RA?  I got some wrong code before register_mmxmem_operand
> > > > was added to match "ym,x,Yv".
> > >
> > > I see no reason why it shouldn't.
> >
> > This will be equivalent to replacing register_operand in
> >
> >   [(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")
> >
> > with nonimmediate_operand.  If it should work, I can do it in i386.md and
> > sse.md to check it out.
> >
> > I tried:
> >
> > sed -i -e "s/\"register_operand\"[ \t]\+\(\"[^=^\+^f]\+\"[^=]\+$\)/\"nonimmediate_operand\" \1/" i386.md

I don't know what is the point in changing these operands, but

(match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")

should work without problems.

Uros.

> and got
>
> (gdb) call debug_rtx (insn)
> (insn 65 19 67 2 (parallel [
>             (set (reg/f:SI 97)
>                 (plus:SI (mem/u/c:SI (plus:SI (reg:SI 82)
>                             (const:SI (unspec:SI [
>                                         (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl ... gomp_tls_data>)
>                                     ] UNSPEC_GOTNTPOFF))) [17 S4 A8])
>                     (mem/u/c:SI (const_int 0 [0]) [0 S4 A8 AS2])))
>             (clobber (reg:CC 17 flags))
>         ]) "/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c":139:7 -1
>      (expr_list:REG_DEAD (reg:SI 82)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (expr_list:REG_EQUIV (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl ... gomp_tls_data>)
>                 (nil)))))
> (gdb) c
> Continuing.
> during RTL pass: ira
> /export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c: In
> function ‘gomp_test_nest_lock_25’:
> /export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c:149:1:
> internal compiler error: in elimination_costs_in_insn, at
> reload1.c:3640
>   149 | }
>       | ^
> 0x108b258 elimination_costs_in_insn
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:3637
> 0x108596f calculate_elim_costs_all_insns()
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:1609
> 0xe61a7a ira_costs()
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira-costs.c:2298
> 0xe56613 ira_build()
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira-build.c:3432
> 0xe4b31d ira
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5346
> 0xe4bba0 execute
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5657
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
>
> --
> H.J.
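[For reference, a minimal sketch of the kind of test that exercises the *mmx_<code><mode>3 pattern quoted above. The dg- directives and scan-assembler expectations here are illustrative assumptions, not taken from the thread.]

    /* With TARGET_MMX_WITH_SSE the "x"/"Yv" alternatives should be
       selected, so the logic op is done on XMM registers and no MMX
       register appears in the output.  */
    /* { dg-do compile { target { ! ia32 } } } */
    /* { dg-options "-O2 -msse2" } */

    #include <mmintrin.h>

    __m64
    test_and (__m64 a, __m64 b)
    {
      return _mm_and_si64 (a, b);
    }

    /* { dg-final { scan-assembler "pand" } } */
    /* { dg-final { scan-assembler-not "%mm" } } */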