From: Uros Bizjak
Date: Mon, 18 Feb 2019 14:37:00 -0000
Subject: Re: [PATCH 00/41] V8: Emulate MMX intrinsics with SSE
To: "H.J. Lu"
Cc: GCC Patches

On Mon, Feb 18, 2019 at 3:22 PM H.J. Lu wrote:
>
> > > > > > > > > > > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > > > > > > > > > > emulate MMX intrinsics with SSE instructions.  To support it, we added
> > > > > > > > > > >
> > > > > > > > > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > > > > > > > > >
> > > > > > > > > > > ;; Define instruction set of MMX instructions
> > > > > > > > > > > (define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
> > > > > > > > > > >   (const_string "base"))
> > > > > > > > > > >
> > > > > > > > > > >          (eq_attr "mmx_isa" "native")
> > > > > > > > > > >            (symbol_ref "!TARGET_MMX_WITH_SSE")
> > > > > > > > > > >          (eq_attr "mmx_isa" "x64")
> > > > > > > > > > >            (symbol_ref "TARGET_MMX_WITH_SSE")
> > > > > > > > > > >          (eq_attr "mmx_isa" "x64_avx")
> > > > > > > > > > >            (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
> > > > > > > > > > >          (eq_attr "mmx_isa" "x64_noavx")
> > > > > > > > > > >            (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
> > > > > > > > > > >
> > > > > > > > > > > We added SSE emulation to MMX patterns and disabled MMX alternatives
> > > > > > > > > > > with TARGET_MMX_WITH_SSE.
> > > > > > > > > > >
> > > > > > > > > > > Most MMX instructions have equivalent SSE versions, and the results of
> > > > > > > > > > > some SSE versions need to be reshuffled into the right order for MMX.
> > > > > > > > > > > There are a couple of tricky cases, sketched in code after this list:
> > > > > > > > > > >
> > > > > > > > > > > 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
> > > > > > > > > > > maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> > > > > > > > > > > mask operand and handle unmapped bits 64:127 at the memory address by
> > > > > > > > > > > adjusting the source and mask operands together with the memory address.
> > > > > > > > > > >
> > > > > > > > > > > 2. MMX movntq is emulated with SSE2 DImode movnti, which is available
> > > > > > > > > > > in 64-bit mode.
> > > > > > > > > > >
> > > > > > > > > > > 3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
> > > > > > > > > > > SSE emulation must clear the fourth index bit (0x08) in the shuffle
> > > > > > > > > > > control mask.
> > > > > > > > > > >
> > > > > > > > > > > 4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
> > > > > > > > > > > the upper 64 bits of the destination XMM register.
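[A hedged sketch of the four cases above in intrinsics form. The helper names are made up for illustration; the patch series implements this in RTL, and the page-crossing address adjustment mentioned in case 1 is omitted here. Cases 2 uses 64-bit-only intrinsics, and case 3 needs SSSE3.]

    #include <mmintrin.h>
    #include <emmintrin.h>
    #include <tmmintrin.h>

    /* 1. maskmovq via maskmovdqu: movq2dq zero-extends to 128 bits,
       so mask bits 64:127 are zero and bytes 8..15 at addr are never
       stored.  */
    static void
    maskmovq_sketch (__m64 src, __m64 mask, char *addr)
    {
      _mm_maskmoveu_si128 (_mm_movpi64_epi64 (src),
                           _mm_movpi64_epi64 (mask), addr);
    }

    /* 2. movntq via DImode movnti (64-bit mode only).  */
    static void
    movntq_sketch (__m64 *dst, __m64 a)
    {
      _mm_stream_si64 ((long long *) dst, _mm_cvtm64_si64 (a));
    }

    /* 3. MMX pshufb via SSE pshufb: clear the 0x08 index bit of every
       byte so all indexes stay within the low 8 bytes; bit 7 (the
       zeroing bit) is preserved.  */
    static __m128i
    pshufb_sketch (__m128i a, __m128i sel)
    {
      sel = _mm_and_si128 (sel, _mm_set1_epi8 ((char) 0xf7));
      return _mm_shuffle_epi8 (a, sel);
    }

    /* 4. cvtpi2ps via cvtdq2ps: convert into a scratch register, then
       take elements 0-1 from the conversion and 2-3 from the original
       destination, preserving its upper 64 bits.  */
    static __m128
    cvtpi2ps_sketch (__m128 dst, __m64 a)
    {
      __m128 cvt = _mm_cvtepi32_ps (_mm_movpi64_epi64 (a));
      return _mm_shuffle_ps (cvt, dst, _MM_SHUFFLE (3, 2, 1, 0));
    }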
> > > > > > > > > > > Tests are also added to check each SSE emulation of MMX intrinsics.
> > > > > > > > > > >
> > > > > > > > > > > There are no regressions on i686 and x86-64.  For x86-64, GCC is also
> > > > > > > > > > > tested with
> > > > > > > > > > >
> > > > > > > > > > > --with-arch=native --with-cpu=native
> > > > > > > > > > >
> > > > > > > > > > > on AVX2 and AVX512F machines.
> > > > > > > > > >
> > > > > > > > > > An idea that would take the patch a step further also on 32-bit targets:
> > > > > > > > > >
> > > > > > > > > > *Assuming* that operations on XMM registers are as fast (or perhaps
> > > > > > > > > > faster) than operations on MMX registers, we can change the mmx_isa
> > > > > > > > > > attribute in e.g.
> > > > > > > > > >
> > > > > > > > > > +  "@
> > > > > > > > > > +   p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > > +   p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > > +   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > > > > > > +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > > > > > >
> > > > > > > > > > to:
> > > > > > > > > >
> > > > > > > > > >   [(set_attr "isa" "*,noavx,avx")
> > > > > > > > > >    (set_attr "mmx_isa" "native,*,*")]
> > > > > > > > > >
> > > > > > > > > > So, for x86_64 everything stays the same, but for x86_32 we now allow
> > > > > > > > > > intrinsics to use xmm registers in addition to mmx registers.  We can't
> > > > > > > > > > disable MMX for x86_32 anyway due to ISA constraints (and some tricky
> > > > > > > > > > cases, e.g. movnti, which works only for 64-bit targets, and e.g. maskmovq
> > > > > > > > > > & similar, which are more efficient with MMX regs), but the RA gets much
> > > > > > > > > > more freedom to allocate the most effective register set even for
> > > > > > > > > > 32-bit targets.
> > > > > > > > > >
> > > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > Since MMX registers are used to pass and return __m64 values,
> > > > > > > > > we can't really get rid of MMX instructions in 32-bit mode.  If people
> > > > > > > > > have to stay with 32-bit mode, they need MMX.  I don't think we should
> > > > > > > > > extend TARGET_MMX_WITH_SSE to 32-bit mode.
> > > > > > > >
> > > > > > > > No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets.  We
> > > > > > > > should not *disable* SSE alternatives on 32-bit targets.
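[In intrinsics form, the substitution under discussion looks roughly like this. A hedged sketch, not code from the patch set; on x86-64 the __m64 values already live in XMM registers, so the movq2dq/movdq2q transfers shown here degenerate to plain moves.]

    #include <mmintrin.h>
    #include <emmintrin.h>

    /* Illustrative only: an MMX intrinsic carried out with the SSE2
       equivalent of the same operation.  */
    __m64
    add_pi32_via_sse (__m64 a, __m64 b)
    {
      __m128i xa = _mm_movpi64_epi64 (a);   /* movq2dq: MMX -> XMM */
      __m128i xb = _mm_movpi64_epi64 (b);
      __m128i sum = _mm_add_epi32 (xa, xb); /* paddd on XMM, not MMX */
      return _mm_movepi64_pi64 (sum);       /* movdq2q: XMM -> MMX */
    }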
> > > > > > I don't think my patch set disables any SSE alternatives in 32-bit
> > > > > > mode.  However, it DOES NOT enable any SSE alternatives in 32-bit
> > > > > > mode.  To really enable SSE alternatives in
> > > > > >
> > > > > > (define_insn "*mmx_<code><mode>3"
> > > > > >   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
> > > > > >         (any_logic:MMXMODEI
> > > > > >           (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
> > > > > >           (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
> > > > > >   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> > > > > >    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > > > > >   "@
> > > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > > >    vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > >   [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > >    (set_attr "type" "mmxadd,sselog,sselog")
> > > > > >    (set_attr "mode" "DI,TI,TI")])
> > > > > >
> > > > > > register_mmxmem_operand must return true for SSE alternatives:
> > > > >
> > > > > It returns true for register and memory operands for 32-bit targets,
> > > > > because
> > > > >
> > > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > >
> > > > Will
> > > >
> > > >   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]
> > > >
> > > > work well with RA?  I got some wrong code before register_mmxmem_operand
> > > > was added to match "ym,x,Yv".
> > >
> > > I see no reason why it shouldn't.
> >
> > This will be equivalent to replacing register_operand in
> >
> >   [(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")
> >
> > with nonimmediate_operand.  If it should work, I can do it in i386.md and
> > sse.md to check it out.
> >
> > I tried:
> >
> > sed -i -e "s/\"register_operand\"[ \t]\+\(\"[^=^\+^f]\+\"[^=]\+$\)/\"nonimmediate_operand\" \1/" i386.md

I don't know what is the point in changing these operands, but

(match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")

should work without problems.

Uros.

> and got
>
> (gdb) call debug_rtx (insn)
> (insn 65 19 67 2 (parallel [
>             (set (reg/f:SI 97)
>                 (plus:SI (mem/u/c:SI (plus:SI (reg:SI 82)
>                             (const:SI (unspec:SI [
>                                         (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl ... gomp_tls_data>)
>                                     ] UNSPEC_GOTNTPOFF))) [17 S4 A8])
>                     (mem/u/c:SI (const_int 0 [0]) [0 S4 A8 AS2])))
>             (clobber (reg:CC 17 flags))
>         ]) "/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c":139:7 -1
>      (expr_list:REG_DEAD (reg:SI 82)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (expr_list:REG_EQUIV (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl ... gomp_tls_data>)
>                 (nil)))))
> (gdb) c
> Continuing.
> during RTL pass: ira
> /export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c: In
> function ‘gomp_test_nest_lock_25’:
> /export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c:149:1:
> internal compiler error: in elimination_costs_in_insn, at
> reload1.c:3640
>   149 | }
>       | ^
> 0x108b258 elimination_costs_in_insn
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:3637
> 0x108596f calculate_elim_costs_all_insns()
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:1609
> 0xe61a7a ira_costs()
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira-costs.c:2298
> 0xe56613 ira_build()
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira-build.c:3432
> 0xe4b31d ira
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5346
> 0xe4bba0 execute
>         /export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5657
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
>
> --
> H.J.
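[For reference, a minimal sketch of the kind of test that exercises the *mmx_<code><mode>3 pattern quoted above. The dg- directives and scan-assembler expectations here are illustrative assumptions, not taken from the thread.]

    /* With TARGET_MMX_WITH_SSE the "x"/"Yv" alternatives should be
       selected, so the logic op is done on XMM registers and no MMX
       register appears in the output.  */
    /* { dg-do compile { target { ! ia32 } } } */
    /* { dg-options "-O2 -msse2" } */

    #include <mmintrin.h>

    __m64
    test_and (__m64 a, __m64 b)
    {
      return _mm_and_si64 (a, b);
    }

    /* { dg-final { scan-assembler "pand" } } */
    /* { dg-final { scan-assembler-not "%mm" } } */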