From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <binutils-return-105107-listarch-binutils=sources.redhat.com@sourceware.org>
Received: (qmail 110032 invoked by alias); 19 Mar 2019 08:30:30 -0000
Mailing-List: contact binutils-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <binutils.sourceware.org>
List-Subscribe: <mailto:binutils-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/binutils/>
List-Post: <mailto:binutils@sourceware.org>
List-Help: <mailto:binutils-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: binutils-owner@sourceware.org
Received: (qmail 109845 invoked by uid 89); 19 Mar 2019 08:30:19 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-7.2 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy=
X-HELO: prv1-mh.provo.novell.com
Received: from prv1-mh.provo.novell.com (HELO prv1-mh.provo.novell.com) (137.65.248.33) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 19 Mar 2019 08:30:12 +0000
Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com	with Novell_GroupWise; Tue, 19 Mar 2019 02:30:10 -0600
Message-Id: <5C90A88F020000780022025A@prv1-mh.provo.novell.com>
Date: Tue, 19 Mar 2019 08:30:00 -0000
From: "Jan Beulich" <JBeulich@suse.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: <binutils@sourceware.org>
Subject: Re: [PATCH] x86: Correct EVEX vector load/store optimization
References: <20190315235414.11609-1-hjl.tools@gmail.com> <20190317204712.GA6721@gmail.com> <5C8FA1E8020000780021FE41@prv1-mh.provo.novell.com> <CAMe9rOqe1z6ESMcF_kFkaJLN4j9HBvJf2POVCXHsQxic5dzeTQ@mail.gmail.com>
In-Reply-To: <CAMe9rOqe1z6ESMcF_kFkaJLN4j9HBvJf2POVCXHsQxic5dzeTQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
X-SW-Source: 2019-03/txt/msg00120.txt.bz2

>>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote:
> On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote:
>> > --- a/gas/config/tc-i386.c
>> > +++ b/gas/config/tc-i386.c
>> > @@ -4075,6 +4075,56 @@ optimize_encoding (void)
>> >           i.types[j].bitfield.ymmword = 0;
>> >         }
>> >      }
>> > +  else if ((cpu_arch_flags.bitfield.cpuavx
>> > +         || cpu_arch_isa_flags.bitfield.cpuavx)
>>
>> Once again a questionable condition, as per earlier replies to
>> other patches of yours.
>
> Fixed.
>
>> > +        && i.vec_encoding != vex_encoding_evex
>> > +        && !i.types[0].bitfield.zmmword
>> > +        && !i.mask
>> > +        && is_evex_encoding (&i.tm)
>> > +        && (i.tm.base_opcode == 0x666f
>> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
>> > +            || i.tm.base_opcode == 0xf36f
>> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
>> > +            || i.tm.base_opcode == 0xf26f
>> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>>
>> All three of these can be expressed with just a single comparison,
>> using & or | instead of ^ and (if necessary) adjusting the literal
>> value compared against.
>
> Fixed.
>
>> > +        && i.tm.extension_opcode == None)
>> > +    {
>> > +      /* Optimize: -O1:
>> > +        VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16,
>> > +        vmovdqu32 and vmovdqu64:
>> > +          EVEX VOP %xmmM, %xmmN
>> > +            -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16)
>> > +          EVEX VOP %ymmM, %ymmN
>> > +            -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16)
>> > +          EVEX VOP %xmmM, mem
>> > +            -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16)
>> > +          EVEX VOP %ymmM, mem
>> > +            -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16)
>> > +          EVEX VOP mem, %xmmN
>> > +            -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16)
>>
>> There's some confusion on this line.
>>
>> > +          EVEX VOP mem, %ymmN
>> > +            -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
>> > +       */
>>
>> For the variants with a memory operand I doubt the conversion
>> is always a win, and it may be against the user request in case of
>> -Os. This is because of the Disp8 scaling the EVEX encoding permits.
>
> Fixed.
>
>> > +      if (i.tm.base_opcode == 0xf26f)
>> > +     i.tm.base_opcode = 0xf36f;
>> > +      else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>> > +     i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
>>
>> This again can be expressed without "else if()" afaict.
>>
>
> Fixed.
>
> Here is the patch.

Thanks.

>--- a/gas/config/tc-i386.c
>+++ b/gas/config/tc-i386.c
>@@ -4068,18 +4068,14 @@ optimize_encoding (void)
> 	    i.types[j].bitfield.ymmword = 0;
> 	  }
>     }
>-  else if ((cpu_arch_flags.bitfield.cpuavx
>-	    || cpu_arch_isa_flags.bitfield.cpuavx)
>-	   && i.vec_encoding != vex_encoding_evex
>+  else if (i.vec_encoding != vex_encoding_evex
> 	   && !i.types[0].bitfield.zmmword

Ah, here the remaining cpuavx goes away as well.

>+      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
>+	{
>+	  i.tm.base_opcode &= Opcode_SIMD_IntD;
>+	  i.tm.base_opcode |= 0xf36f;
>+	}
>+	}

How about the even simpler

      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
	i.tm.base_opcode ^= 0xf36f ^ 0xf26f;

?

Jan