public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
@ 2014-07-02 17:03 Uros Bizjak
  2014-07-03 10:45 ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-07-02 17:03 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ilya Enkovich

Hello!

> Silvermont processors have a penalty for instructions with 4+ bytes of prefixes (including escape
> bytes in the opcode).  This situation happens when a REX prefix is used in SSE4 instructions.  This
> patch tries to avoid such situations by preferring xmm0-xmm7 over xmm8-xmm15 in those
> instructions.  I achieved this by adding a new tuning flag and new alternatives affected by the tuning.

> SSE4 instructions are not very widely used by GCC, but I see some significant gains from
> this patch (tested on Avoton at -O3).

> 2014-07-02  Ilya Enkovich  <ilya.enkovich@intel.com>

> * config/i386/constraints.md (Yr): New.
> * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
> (REG_CLASS_NAMES): Likewise.
> (REG_CLASS_CONTENTS): Likewise.
> * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
> which use only NO_REX_SSE_REGS.

You don't need to add alternatives; just change the existing
alternatives from "x" to "Yr". The allocator will handle the reduced
register set just fine.
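For reference, a sketch of how such a "Yr" constraint could be defined in
config/i386/constraints.md. The class and tuning-flag names
(NO_REX_SSE_REGS, X86_TUNE_AVOID_4BYTE_PREFIXES) follow the ChangeLog
above; treat this as an illustration of the shape, not the committed
definition:

```lisp
;; Sketch only: degrade to the non-REX SSE registers (xmm0-xmm7) when
;; the prefix-avoidance tuning is active, otherwise allow all SSE regs.
(define_register_constraint "Yr"
 "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
 "@internal Lower SSE register when avoiding REX prefix, any SSE register otherwise.")
```

Because it is a plain register constraint, existing patterns can adopt it
by swapping the constraint letter without changing operand predicates.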

BTW: I think that "Yr" is a very confusing name for the alternative.
I'd suggest "Ya".

Uros.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 17:03 [PATCH, i386] Add prefixes avoidance tuning for silvermont target Uros Bizjak
@ 2014-07-03 10:45 ` Ilya Enkovich
  2014-07-03 11:11   ` Uros Bizjak
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-03 10:45 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

2014-07-02 21:03 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> Hello!
>
>> Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape
>> bytes in opcode).  This situation happens when REX prefix is used in SSE4 instructions.  This
>> patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those
>> instructions.  I achieved it by adding new tuning flag and new alternatives affected by tuning.
>
>> SSE4 instructions are not very widely used by GCC but I see some significant gains caused by
>> this patch (tested on Avoton on -O3).
>
>> 2014-07-02  Ilya Enkovich  <ilya.enkovich@intel.com>
>
>> * config/i386/constraints.md (Yr): New.
>> * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>> (REG_CLASS_NAMES): Likewise.
>> (REG_CLASS_CONTENTS): Likewise.
>> * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>> which use only NO_REX_SSE_REGS.
>
> You don't need to add alternatives, just change existing alternatives
> from "x" to "Yr". The allocator will handle reduced register set just
> fine.

Hi,

Thanks for review!

My first patch version did exactly that replacement.  Performance
results were OK, but I ran into stability issues due to the peephole2
pass.  Peepholes may exchange operands of instructions and ignore
register restrictions, assuming all SSE registers are homogeneous.
That caused unrecognized instructions on some tests.  I preferred to
add a new alternative instead of fixing the peephole and possibly
other similar problems.

>
> BTW: I think that "Yr" is a very confusing name for the alternative.
> I'd suggest "Ya".

Will rename.

Ilya

>
> Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 10:45 ` Ilya Enkovich
@ 2014-07-03 11:11   ` Uros Bizjak
  2014-07-03 11:51     ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-07-03 11:11 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches

On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

>>> Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape
>>> bytes in opcode).  This situation happens when REX prefix is used in SSE4 instructions.  This
>>> patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those
>>> instructions.  I achieved it by adding new tuning flag and new alternatives affected by tuning.
>>
>>> SSE4 instructions are not very widely used by GCC but I see some significant gains caused by
>>> this patch (tested on Avoton on -O3).
>>
>>> 2014-07-02  Ilya Enkovich  <ilya.enkovich@intel.com>
>>
>>> * config/i386/constraints.md (Yr): New.
>>> * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>>> (REG_CLASS_NAMES): Likewise.
>>> (REG_CLASS_CONTENTS): Likewise.
>>> * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>>> which use only NO_REX_SSE_REGS.
>>
>> You don't need to add alternatives, just change existing alternatives
>> from "x" to "Yr". The allocator will handle reduced register set just
>> fine.
>
> Hi,
>
> Thanks for review!
>
> My first patch version did such replacement. Performance results were
> OK but I got into stability issues due to peephole2 pass.  Peepholes
> may exchange operands of instructions and ignore register restrictions
> assuming all SSE registers are homogeneous.  It caused unrecognized
> instructions on some tests.  I preferred to add a new alternative
> instead of fixing peephole and possibly other similar problems.

No, please rather fix the peephole2 patterns. It is just a matter of
putting satisfies_constraint_Xx into their insn condition. In effect,
the peephole2 pass is nullifying your optimization. Also, the RA is
still free to allocate unwanted registers, even when prefixed with "?".
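A hypothetical sketch of what such a guard looks like (the pattern and
operand shapes below are invented, not copied from sse.md; the regno
test anticipates the REX_SSE_REGNO_P check discussed later in the
thread). The insn C condition simply refuses to fire for the penalized
registers:

```lisp
;; Sketch only: the C condition string can inspect matched operands, so
;; this peephole2 fires only when operand 0 is outside xmm8-xmm15.
(define_peephole2
  [(set (match_operand:V4SI 0 "register_operand")
        (match_operand:V4SI 1 "register_operand"))]
  "TARGET_SSE4_1
   && !REX_SSE_REGNO_P (REGNO (operands[0]))"
  [(set (match_dup 0) (match_dup 1))])
```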

Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 11:11   ` Uros Bizjak
@ 2014-07-03 11:51     ` Ilya Enkovich
  2014-07-03 12:07       ` Uros Bizjak
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-03 11:51 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

2014-07-03 15:11 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
>>>> Silvermont processors have penalty for instructions having 4+ bytes of prefixes (including escape
>>>> bytes in opcode).  This situation happens when REX prefix is used in SSE4 instructions.  This
>>>> patch tries to avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those
>>>> instructions.  I achieved it by adding new tuning flag and new alternatives affected by tuning.
>>>
>>>> SSE4 instructions are not very widely used by GCC but I see some significant gains caused by
>>>> this patch (tested on Avoton on -O3).
>>>
>>>> 2014-07-02  Ilya Enkovich  <ilya.enkovich@intel.com>
>>>
>>>> * config/i386/constraints.md (Yr): New.
>>>> * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>>>> (REG_CLASS_NAMES): Likewise.
>>>> (REG_CLASS_CONTENTS): Likewise.
>>>> * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>>>> which use only NO_REX_SSE_REGS.
>>>
>>> You don't need to add alternatives, just change existing alternatives
>>> from "x" to "Yr". The allocator will handle reduced register set just
>>> fine.
>>
>> Hi,
>>
>> Thanks for review!
>>
>> My first patch version did such replacement. Performance results were
>> OK but I got into stability issues due to peephole2 pass.  Peepholes
>> may exchange operands of instructions and ignore register restrictions
>> assuming all SSE registers are homogeneous.  It caused unrecognized
>> instructions on some tests.  I preferred to add a new alternative
>> instead of fixing peephole and possibly other similar problems.
>
> No, please rather fix the peephole2 patterns. It is just a matter of
> putting satisfies_constraint_Xx to their insn condition. In effect,
> peephole2 pass is nullifying your optimization. Also, RA is still free
> to allocate unwanted registers, even when prefixed with "?".

I didn't find a nice way to fix peephole2 patterns to take register
constraints into account. Is there any way to do it?
Also, fully restricting xmm8-15 does not seem right.  Using them is
just costly, not fully disallowed.

Ilya

>
> Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 11:51     ` Ilya Enkovich
@ 2014-07-03 12:07       ` Uros Bizjak
  2014-07-03 13:38         ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-07-03 12:07 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches

On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

> I didn't find a nice way to fix peephole2 patterns to take register
> constraints into account. Is there any way to do it?

Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C condition.
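As a rough standalone model of what that test computes (the register
numbers below are invented for the example; only the shape of the range
check mirrors the i386.h macro):

```c
#include <assert.h>

/* Hypothetical hard-register numbering, for illustration only (the
   real numbers live in i386.h and differ): xmm0-xmm7 first, then the
   REX-only registers xmm8-xmm15.  */
enum
{
  FIRST_SSE_REG = 20,                       /* xmm0  */
  FIRST_REX_SSE_REG = FIRST_SSE_REG + 8,    /* xmm8  */
  LAST_REX_SSE_REG = FIRST_REX_SSE_REG + 7  /* xmm15 */
};

/* Shape of the backend check: is this hard regno one of xmm8-xmm15,
   i.e. a register whose encoding forces a REX prefix?  */
#define REX_SSE_REGNO_P(n) \
  ((n) >= FIRST_REX_SSE_REG && (n) <= LAST_REX_SSE_REG)
```

A peephole2 insn condition can then include, e.g.,
"&& !REX_SSE_REGNO_P (REGNO (operands[0]))" so the transformation
refuses to fire for xmm8-xmm15.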

> Also fully restrict xmm8-15 does not seem right.  It is just costly
> but not fully disallowed.

As said earlier, you can try "Ya*x" as a constraint.

Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 12:07       ` Uros Bizjak
@ 2014-07-03 13:38         ` Ilya Enkovich
  2014-07-14 14:22           ` Ilya Enkovich
  2014-07-14 19:58           ` Uros Bizjak
  0 siblings, 2 replies; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-03 13:38 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

2014-07-03 16:07 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
>> I didn't find a nice way to fix peephole2 patterns to take register
>> constraints into account. Is there any way to do it?
>
> Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C constraint.

A peephole doesn't know whether it works with a tuned instruction or
not, right? I would need to mark all instructions I modify with some
attribute and then check for it in the peephole.

>
>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>> but not fully disallowed.
>
> As said earlier, you can try "Ya*x" as a constraint.

I tried it. It does not seem to affect allocation much; I do not see
any gain on the targeted tests.

Ilya

>
> Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 13:38         ` Ilya Enkovich
@ 2014-07-14 14:22           ` Ilya Enkovich
  2014-07-14 19:58           ` Uros Bizjak
  1 sibling, 0 replies; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-14 14:22 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

Ping

2014-07-03 17:38 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2014-07-03 16:07 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>
>>> I didn't find a nice way to fix peephole2 patterns to take register
>>> constraints into account. Is there any way to do it?
>>
>> Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C constraint.
>
> Peephole doesn't know whether it works with tuned instruction or not,
> right? I would need to mark all instructions I modify with some
> attribute and then check for it in peephole.
>
>>
>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>> but not fully disallowed.
>>
>> As said earlier, you can try "Ya*x" as a constraint.
>
> I tried it. It does not seem to affect allocation much. I do not see
> any gain on targeted tests.
>
> Ilya
>
>>
>> Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 13:38         ` Ilya Enkovich
  2014-07-14 14:22           ` Ilya Enkovich
@ 2014-07-14 19:58           ` Uros Bizjak
  2014-07-15  8:30             ` Ilya Enkovich
  1 sibling, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-07-14 19:58 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jakub Jelinek

On Thu, Jul 3, 2014 at 3:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2014-07-03 16:07 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>
>>> I didn't find a nice way to fix peephole2 patterns to take register
>>> constraints into account. Is there any way to do it?
>>
>> Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C constraint.
>
> Peephole doesn't know whether it works with tuned instruction or not,
> right? I would need to mark all instructions I modify with some
> attribute and then check for it in peephole.
>
>>
>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>> but not fully disallowed.
>>
>> As said earlier, you can try "Ya*x" as a constraint.
>
> I tried it. It does not seem to affect allocation much. I do not see
> any gain on targeted tests.

Strange, because the documentation claims:

'*'
     Says that the following character should be ignored when choosing
     register preferences.  '*' has no effect on the meaning of the
     constraint as a constraint, and no effect on reloading.  For LRA
     '*' additionally disparages slightly the alternative if the
     following character matches the operand.

Let me rethink this a bit. Perhaps we could reconsider Jakub's
proposal with "Ya,!x" (with two alternatives). IIRC this approach was
needed for some MMX alternatives, where we didn't want the RA to
allocate an MMX register when the value could be passed in integer
regs, but the value was still allowed in an MMX register.

As a side note, I'll investigate the pushdf pattern to see if it needs
to be updated. The proposed approach is from the reload era, and things
have changed substantially since then.

Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-14 19:58           ` Uros Bizjak
@ 2014-07-15  8:30             ` Ilya Enkovich
  2014-07-15  8:46               ` Uros Bizjak
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-15  8:30 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jakub Jelinek

2014-07-14 23:58 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Jul 3, 2014 at 3:38 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2014-07-03 16:07 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
>>> On Thu, Jul 3, 2014 at 1:50 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>
>>>> I didn't find a nice way to fix peephole2 patterns to take register
>>>> constraints into account. Is there any way to do it?
>>>
>>> Use REX_SSE_REGNO_P (REGNO (operands[...])) in the insn C constraint.
>>
>> Peephole doesn't know whether it works with tuned instruction or not,
>> right? I would need to mark all instructions I modify with some
>> attribute and then check for it in peephole.
>>
>>>
>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>>> but not fully disallowed.
>>>
>>> As said earlier, you can try "Ya*x" as a constraint.
>>
>> I tried it. It does not seem to affect allocation much. I do not see
>> any gain on targeted tests.
>
> Strange, because the documentation claims:
>
> '*'
>      Says that the following character should be ignored when choosing
>      register preferences.  '*' has no effect on the meaning of the
>      constraint as a constraint, and no effect on reloading.  For LRA
>      '*' additionally disparages slightly the alternative if the
>      following character matches the operand.
>
> Let me rethink this a bit. Perhaps we could reconsider Jakub's
> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
> needed for some MMX alternatives, where we didn't want RA to allocate
> a MMX register when the value could be passed in integer regs, but the
> value was still allowed in MMX register.

That's what my patch already does, but with '?' instead of '!'.
I will try it with '!' and see if there is a big difference.

Ilya

>
> As a side note, I'll investigate pushdf pattern, if it needs to be
> updated. The proposed approach is from reload era, and things changed
> substantially since then.
>
> Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-15  8:30             ` Ilya Enkovich
@ 2014-07-15  8:46               ` Uros Bizjak
  2014-07-15 11:02                 ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-07-15  8:46 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

>>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>>>> but not fully disallowed.
>>>>
>>>> As said earlier, you can try "Ya*x" as a constraint.
>>>
>>> I tried it. It does not seem to affect allocation much. I do not see
>>> any gain on targeted tests.
>>
>> Strange, because the documentation claims:
>>
>> '*'
>>      Says that the following character should be ignored when choosing
>>      register preferences.  '*' has no effect on the meaning of the
>>      constraint as a constraint, and no effect on reloading.  For LRA
>>      '*' additionally disparages slightly the alternative if the
>>      following character matches the operand.
>>
>> Let me rethink this a bit. Perhaps we could reconsider Jakub's
>> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>> needed for some MMX alternatives, where we didn't want RA to allocate
>> a MMX register when the value could be passed in integer regs, but the
>> value was still allowed in MMX register.
>
> That's what my patch already does, but with '?' instead of '!'.

Yes, I know. The problem is that "Ya*x"-type conditional allocation
worked OK in the past for "not preferred, but still allowed regclass"
registers.  There are several patterns in i386.md that live by this
premise, including movsf_internal and movdf_internal. If this approach
doesn't work anymore, then we have to either figure out what the
reason is, or invent a new strategy that will be applicable to all cases.

Can you please post a small test that illustrates the case where Ya,!x
works, but Ya*x doesn't?

I have added Vlad to CC for his opinion on this matter. This is
clearly a RA issue that has to be resolved before your patch is
committed.

Thanks,
Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-15  8:46               ` Uros Bizjak
@ 2014-07-15 11:02                 ` Ilya Enkovich
  2014-07-15 14:03                   ` Uros Bizjak
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-15 11:02 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

On 15 Jul 10:42, Uros Bizjak wrote:
> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 
> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
> >>>>> but not fully disallowed.
> >>>>
> >>>> As said earlier, you can try "Ya*x" as a constraint.
> >>>
> >>> I tried it. It does not seem to affect allocation much. I do not see
> >>> any gain on targeted tests.
> >>
> >> Strange, because the documentation claims:
> >>
> >> '*'
> >>      Says that the following character should be ignored when choosing
> >>      register preferences.  '*' has no effect on the meaning of the
> >>      constraint as a constraint, and no effect on reloading.  For LRA
> >>      '*' additionally disparages slightly the alternative if the
> >>      following character matches the operand.
> >>
> >> Let me rethink this a bit. Perhaps we could reconsider Jakub's
> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
> >> needed for some MMX alternatives, where we didn't want RA to allocate
> >> a MMX register when the value could be passed in integer regs, but the
> >> value was still allowed in MMX register.
> >
> > That's what my patch already does, but with '?' instead of '!'.
> 
> Yes, I know. The problem is that Ya*x type conditional allocation
> worked OK in the past for "not preferred, but still allowed regclass"
> registers.  There are several patterns in i386.md that live by this
> premise, including movsf_internal and movdf_internal. If this approach
> doesn't work anymore, then we have to either figure out what is the
> reason, or invent a new strategy that will be applicable to all cases.
> 
> Can you please post a small test that illustrates the case where Ya,!x
> works, but Ya*x doesn't?

It's hard to compose a small testcase that gets SSE4 instructions generated with the required register usage.  I use the tcpjumbo test from TCPmark for an initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated, and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only the pmovzxwd instruction.

Patch1:

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..6b03b72 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11852,10 +11852,10 @@
    (set_attr "mode" "OI")])

 (define_insn "sse4_1_<code>v4hiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=x")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
        (any_extend:V4SI
          (vec_select:V4HI
-           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1"

Patch2:

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..b3721c4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11852,10 +11852,10 @@
    (set_attr "mode" "OI")])

 (define_insn "sse4_1_<code>v4hiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=x")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
        (any_extend:V4SI
          (vec_select:V4HI
-           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1"


Here are results of looking for pmovzxwd in resulting binaries:
#objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
76
#objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
0
#objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
76

Therefore I conclude that "Yr*x" does not really differ much from "x".


Thanks,
Ilya

> 
> I have added Vlad to CC for his opinion on this matter. This is
> clearly a RA issue that has to be resolved before your patch is
> committed.
> 
> Thanks,
> Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-15 11:02                 ` Ilya Enkovich
@ 2014-07-15 14:03                   ` Uros Bizjak
  2014-11-05  9:35                     ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-07-15 14:03 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> On 15 Jul 10:42, Uros Bizjak wrote:
>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>
>> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>> >>>>> but not fully disallowed.
>> >>>>
>> >>>> As said earlier, you can try "Ya*x" as a constraint.
>> >>>
>> >>> I tried it. It does not seem to affect allocation much. I do not see
>> >>> any gain on targeted tests.
>> >>
>> >> Strange, because the documentation claims:
>> >>
>> >> '*'
>> >>      Says that the following character should be ignored when choosing
>> >>      register preferences.  '*' has no effect on the meaning of the
>> >>      constraint as a constraint, and no effect on reloading.  For LRA
>> >>      '*' additionally disparages slightly the alternative if the
>> >>      following character matches the operand.
>> >>
>> >> Let me rethink this a bit. Perhaps we could reconsider Jakub's
>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>> >> needed for some MMX alternatives, where we didn't want RA to allocate
>> >> a MMX register when the value could be passed in integer regs, but the
>> >> value was still allowed in MMX register.
>> >
>> > That's what my patch already does, but with '?' instead of '!'.
>>
>> Yes, I know. The problem is that Ya*x type conditional allocation
>> worked OK in the past for "not preferred, but still allowed regclass"
>> registers.  There are several patterns in i386.md that live by this
>> premise, including movsf_internal and movdf_internal. If this approach
>> doesn't work anymore, then we have to either figure out what is the
>> reason, or invent a new strategy that will be applicable to all cases.
>>
>> Can you please post a small test that illustrates the case where Ya,!x
>> works, but Ya*x doesn't?
>
> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage.  I use tcpjumbo test from TCPmark for initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only pmovzxwd instruction.
>
> Patch1:
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d907353..6b03b72 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -11852,10 +11852,10 @@
>     (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_<code>v4hiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
>         (any_extend:V4SI
>           (vec_select:V4HI
> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
>             (parallel [(const_int 0) (const_int 1)
>                        (const_int 2) (const_int 3)]))))]
>    "TARGET_SSE4_1"
>
> Patch2:
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d907353..b3721c4 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -11852,10 +11852,10 @@
>     (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_<code>v4hiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
>         (any_extend:V4SI
>           (vec_select:V4HI
> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
>             (parallel [(const_int 0) (const_int 1)
>                        (const_int 2) (const_int 3)]))))]
>    "TARGET_SSE4_1"
>
>
> Here are results of looking for pmovzxwd in resulting binaries:
> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 76
> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 0
> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 76
>
> Therefore I make a conclusion that Yr*x does not really differ much from x.

Just FTR:

Using "Yr,*x" is also a viable option:

#objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
0

I believe that the above is the way to go with LRA. Vladimir, what do you think?

Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-15 14:03                   ` Uros Bizjak
@ 2014-11-05  9:35                     ` Ilya Enkovich
  2014-11-05 10:00                       ` Uros Bizjak
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-11-05  9:35 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

Hi,

With stage1 close to its end, may we make a decision regarding this
patch? Having a couple of working variants, may we choose and use one
of them?

Thanks,
Ilya

2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> On 15 Jul 10:42, Uros Bizjak wrote:
>>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>
>>> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>> >>>>> but not fully disallowed.
>>> >>>>
>>> >>>> As said earlier, you can try "Ya*x" as a constraint.
>>> >>>
>>> >>> I tried it. It does not seem to affect allocation much. I do not see
>>> >>> any gain on targeted tests.
>>> >>
>>> >> Strange, because the documentation claims:
>>> >>
>>> >> '*'
>>> >>      Says that the following character should be ignored when choosing
>>> >>      register preferences.  '*' has no effect on the meaning of the
>>> >>      constraint as a constraint, and no effect on reloading.  For LRA
>>> >>      '*' additionally disparages slightly the alternative if the
>>> >>      following character matches the operand.
>>> >>
>>> >> Let me rethink this a bit. Perhaps we could reconsider Jakub's
>>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>>> >> needed for some MMX alternatives, where we didn't want RA to allocate
>>> >> a MMX register when the value could be passed in integer regs, but the
>>> >> value was still allowed in MMX register.
>>> >
>>> > That's what my patch already does, but with '?' instead of '!'.
>>>
>>> Yes, I know. The problem is that Ya*x type conditional allocation
>>> worked OK in the past for "not preferred, but still allowed regclass"
>>> registers.  There are several patterns in i386.md that live by this
>>> premise, including movsf_internal and movdf_internal. If this approach
>>> doesn't work anymore, then we have to either figure out what is the
>>> reason, or invent a new strategy that will be applicable to all cases.
>>>
>>> Can you please post a small test that illustrates the case where Ya,!x
>>> works, but Ya*x doesn't?
>>
>> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage.  I use tcpjumbo test from TCPmark for initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only pmovzxwd instruction.
>>
>> Patch1:
>>
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index d907353..6b03b72 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -11852,10 +11852,10 @@
>>     (set_attr "mode" "OI")])
>>
>>  (define_insn "sse4_1_<code>v4hiv4si2"
>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
>>         (any_extend:V4SI
>>           (vec_select:V4HI
>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
>>             (parallel [(const_int 0) (const_int 1)
>>                        (const_int 2) (const_int 3)]))))]
>>    "TARGET_SSE4_1"
>>
>> Patch2:
>>
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index d907353..b3721c4 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -11852,10 +11852,10 @@
>>     (set_attr "mode" "OI")])
>>
>>  (define_insn "sse4_1_<code>v4hiv4si2"
>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
>>         (any_extend:V4SI
>>           (vec_select:V4HI
>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
>>             (parallel [(const_int 0) (const_int 1)
>>                        (const_int 2) (const_int 3)]))))]
>>    "TARGET_SSE4_1"
>>
>>
>> Here are results of looking for pmovzxwd in resulting binaries:
>> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 76
>> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 0
>> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 76
>>
>> Therefore I conclude that Yr*x does not really differ much from x.
>
> Just FTR:
>
> Using "Yr,*x" is also a viable option:
>
> #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep
> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 0
>
> I believe that the above is the way to go with LRA. Vladimir, what do you think?
>
> Uros.


* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-11-05  9:35                     ` Ilya Enkovich
@ 2014-11-05 10:00                       ` Uros Bizjak
  2014-12-02 11:08                         ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-11-05 10:00 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

On Wed, Nov 5, 2014 at 10:35 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi,
>
> With stage1 close to its end, may we make a decision regarding this
> patch? Since we have a couple of working variants, may we choose and
> use one of them?

I propose to wait for Vlad for an update about his plans for the register
preference algorithm that would fix this (and other "Ya*r"-type
issues).

In the absence of the fix, we'll go with "Yr,*x".

Uros.

>
> Thanks,
> Ilya
>
> 2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> On 15 Jul 10:42, Uros Bizjak wrote:
>>>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>
>>>> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>>> >>>>> but not fully disallowed.
>>>> >>>>
>>>> >>>> As said earlier, you can try "Ya*x" as a constraint.
>>>> >>>
>>>> >>> I tried it. It does not seem to affect allocation much. I do not see
>>>> >>> any gain on targeted tests.
>>>> >>
>>>> >> Strange, because the documentation claims:
>>>> >>
>>>> >> '*'
>>>> >>      Says that the following character should be ignored when choosing
>>>> >>      register preferences.  '*' has no effect on the meaning of the
>>>> >>      constraint as a constraint, and no effect on reloading.  For LRA
>>>> >>      '*' additionally disparages slightly the alternative if the
>>>> >>      following character matches the operand.
>>>> >>
>>>> >> Let me rethink this a bit. Perhaps we could reconsider Jakub's
>>>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>>>> >> needed for some MMX alternatives, where we didn't want RA to allocate
>>>> >> a MMX register when the value could be passed in integer regs, but the
>>>> >> value was still allowed in MMX register.
>>>> >
>>>> > That's what my patch already does, but with '?' instead of '!'.
>>>>
>>>> Yes, I know. The problem is that Ya*x type conditional allocation
>>>> worked OK in the past for "not preferred, but still allowed regclass"
>>>> registers. There are several patterns in i386.md that live by this
>>>> premise, including movsf_internal and movdf_internal. If this approach
>>>> doesn't work anymore, then we have to either figure out what is the
>>>> reason, or invent a new strategy that will be applicable to all cases.
>>>>
>>>> Can you please post a small test that illustrates the case where Ya,!x
>>>> works, but Ya*x doesn't?
>>>
>>> It's hard to compose a small testcase which will have SSE4 instructions generated with the required register usage.  I use the tcpjumbo test from TCPmark for an initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only the pmovzxwd instruction.
>>>
>>> Patch1:
>>>
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index d907353..6b03b72 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -11852,10 +11852,10 @@
>>>     (set_attr "mode" "OI")])
>>>
>>>  (define_insn "sse4_1_<code>v4hiv4si2"
>>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
>>>         (any_extend:V4SI
>>>           (vec_select:V4HI
>>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
>>>             (parallel [(const_int 0) (const_int 1)
>>>                        (const_int 2) (const_int 3)]))))]
>>>    "TARGET_SSE4_1"
>>>
>>> Patch2:
>>>
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index d907353..b3721c4 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -11852,10 +11852,10 @@
>>>     (set_attr "mode" "OI")])
>>>
>>>  (define_insn "sse4_1_<code>v4hiv4si2"
>>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
>>>         (any_extend:V4SI
>>>           (vec_select:V4HI
>>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
>>>             (parallel [(const_int 0) (const_int 1)
>>>                        (const_int 2) (const_int 3)]))))]
>>>    "TARGET_SSE4_1"
>>>
>>>
>>> Here are results of looking for pmovzxwd in resulting binaries:
>>> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>>> 76
>>> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>>> 0
>>> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>>> 76
>>>
>>> Therefore I conclude that Yr*x does not really differ much from x.
>>
>> Just FTR:
>>
>> Using "Yr,*x" is also a viable option:
>>
>> #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep
>> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 0
>>
>> I believe that the above is the way to go with LRA. Vladimir, what do you think?
>>
>> Uros.


* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-11-05 10:00                       ` Uros Bizjak
@ 2014-12-02 11:08                         ` Ilya Enkovich
  2014-12-02 11:21                           ` Uros Bizjak
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-12-02 11:08 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 2563 bytes --]

On 05 Nov 11:00, Uros Bizjak wrote:
> On Wed, Nov 5, 2014 at 10:35 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> > Hi,
> >
> > With stage1 close to its end, may we make a decision regarding this
> > patch? Since we have a couple of working variants, may we choose and
> > use one of them?
> 
> I propose to wait for Vlad for an update about his plans for the register
> preference algorithm that would fix this (and other "Ya*r"-type
> issues).
> 
> In the absence of the fix, we'll go with "Yr,*x".
> 
> Uros.
> 

Hi,

Here is an updated patch which uses the "Yr,*x" option to avoid long prefixes for Silvermont.  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2014-12-02  Ilya Enkovich  <ilya.enkovich@intel.com>

	* config/i386/constraints.md (Yr): New.
	* config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	* config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
	which use only NO_REX_SSE_REGS.
	(vec_set<mode>_0): Likewise.
	(*vec_setv4sf_sse4_1): Likewise.
	(sse4_1_insertps): Likewise.
	(*sse4_1_extractps): Likewise.
	(*sse4_1_mulv2siv2di3<mask_name>): Likewise.
	(*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
	(*sse4_1_<code><mode>3<mask_name>): Likewise.
	(*sse4_1_<code><mode>3): Likewise.
	(*sse4_1_eqv2di3): Likewise.
	(sse4_2_gtv2di3): Likewise.
	(*vec_extractv4si): Likewise.
	(*vec_concatv2si_sse4_1): Likewise.
	(vec_concatv2di): Likewise.
	(<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
	(<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise.
	(<sse4_1_avx2>_mpsadbw): Likewise.
	(<sse4_1_avx2>packusdw<mask_name>): Likewise.
	(<sse4_1_avx2>_pblendvb): Likewise.
	(sse4_1_pblendw): Likewise.
	(sse4_1_phminposuw): Likewise.
	(sse4_1_<code>v8qiv8hi2<mask_name>): Likewise.
	(sse4_1_<code>v4qiv4si2<mask_name>): Likewise.
	(sse4_1_<code>v4hiv4si2<mask_name>): Likewise.
	(sse4_1_<code>v2qiv2di2<mask_name>): Likewise.
	(sse4_1_<code>v2hiv2di2<mask_name>): Likewise.
	(sse4_1_<code>v2siv2di2<mask_name>): Likewise.
	(sse4_1_ptest): Likewise.
	(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
	(sse4_1_round<ssescalarmodesuffix>): Likewise.
	* config/i386/subst.md (mask_prefix4): New.
	* config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New.

gcc/testsuite/

2014-12-02  Ilya Enkovich  <ilya.enkovich@intel.com>

	* gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed
	vec_concatv2di template.

[-- Attachment #2: slm-prefixes1.patch --]
[-- Type: text/plain, Size: 38486 bytes --]

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 4e07d70..72925cd 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -108,6 +108,8 @@
 ;;  d	Integer register when integer DFmode moves are enabled
 ;;  x	Integer register when integer XFmode moves are enabled
 ;;  f	x87 register when 80387 floating point arithmetic is enabled
+;;  r	SSE regs not requiring REX prefix when prefixes avoidance is enabled
+;;	and all SSE regs otherwise
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -150,6 +152,10 @@
  "(ix86_fpmath & FPMATH_387) ? FLOAT_REGS : NO_REGS"
  "@internal Any x87 register when 80387 FP arithmetic is enabled.")
 
+(define_register_constraint "Yr"
+ "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
+ "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3f5f979..6d9df65 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1311,6 +1311,7 @@ enum reg_class
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
+  NO_REX_SSE_REGS,
   SSE_REGS,
   EVEX_SSE_REGS,
   BND_REGS,
@@ -1369,6 +1370,7 @@ enum reg_class
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
+   "NO_REX_SSE_REGS",			\
    "SSE_REGS",				\
    "EVEX_SSE_REGS",			\
    "BND_REGS",				\
@@ -1409,6 +1411,7 @@ enum reg_class
     { 0x0200,       0x0,    0x0 },       /* FP_SECOND_REG */             \
     { 0xff00,       0x0,    0x0 },       /* FLOAT_REGS */                \
   { 0x200000,       0x0,    0x0 },       /* SSE_FIRST_REG */             \
+{ 0x1fe00000,  0x000000,    0x0 },       /* NO_REX_SSE_REGS */           \
 { 0x1fe00000,  0x1fe000,    0x0 },       /* SSE_REGS */                  \
        { 0x0,0xffe00000,   0x1f },       /* EVEX_SSE_REGS */             \
        { 0x0,       0x0,0x1e000 },       /* BND_REGS */			 \
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..ac311c1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6338,26 +6338,28 @@
 ;; Although insertps takes register source, we prefer
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
-  [(set (match_operand:V2SF 0 "register_operand"     "=x,x,x,x,x,*y ,*y")
+  [(set (match_operand:V2SF 0 "register_operand"     "=Yr,*x,x,Yr,*x,x,x,*y ,*y")
 	(vec_concat:V2SF
-	  (match_operand:SF 1 "nonimmediate_operand" " 0,x,0,x,m, 0 , m")
-	  (match_operand:SF 2 "vector_move_operand"  " x,x,m,m,C,*ym, C")))]
+	  (match_operand:SF 1 "nonimmediate_operand" "  0, 0,x, 0,0, x,m, 0 , m")
+	  (match_operand:SF 2 "vector_move_operand"  " Yr,*x,x, m,m, m,C,*ym, C")))]
   "TARGET_SSE4_1"
   "@
    unpcklps\t{%2, %0|%0, %2}
+   unpcklps\t{%2, %0|%0, %2}
    vunpcklps\t{%2, %1, %0|%0, %1, %2}
    insertps\t{$0x10, %2, %0|%0, %2, 0x10}
+   insertps\t{$0x10, %2, %0|%0, %2, 0x10}
    vinsertps\t{$0x10, %2, %1, %0|%0, %1, %2, 0x10}
    %vmovss\t{%1, %0|%0, %1}
    punpckldq\t{%2, %0|%0, %2}
    movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
-   (set_attr "type" "sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_data16" "*,*,1,*,*,*,*")
-   (set_attr "prefix_extra" "*,*,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,1,1,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+   (set_attr "type" "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_data16" "*,*,*,1,1,*,*,*,*")
+   (set_attr "prefix_extra" "*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
@@ -6405,49 +6407,51 @@
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "vec_set<mode>_0"
   [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
-	  "=v,v,v ,x,x,v,x  ,x  ,m ,m   ,m")
+	  "=Yr,*v,v,v ,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
 	(vec_merge:VI4F_128
 	  (vec_duplicate:VI4F_128
 	    (match_operand:<ssescalarmode> 2 "general_operand"
-	  " v,m,*r,m,x,v,*rm,*rm,!x,!*re,!*fF"))
+	  " Yr,*v,m,*r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
 	  (match_operand:VI4F_128 1 "vector_move_operand"
-	  " C,C,C ,C,0,v,0  ,x  ,0 ,0   ,0")
+	  " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
+   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
    %vmov<ssescalarmodesuffix>\t{%2, %0|%0, %2}
    %vmovd\t{%2, %0|%0, %2}
    movss\t{%2, %0|%0, %2}
    movss\t{%2, %0|%0, %2}
    vmovss\t{%2, %1, %0|%0, %1, %2}
    pinsrd\t{$0, %2, %0|%0, %2, 0}
+   pinsrd\t{$0, %2, %0|%0, %2, 0}
    vpinsrd\t{$0, %2, %1, %0|%0, %1, %2, 0}
    #
    #
    #"
-  [(set_attr "isa" "sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,avx,*,*,*")
+  [(set_attr "isa" "sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
    (set (attr "type")
-     (cond [(eq_attr "alternative" "0,6,7")
+     (cond [(eq_attr "alternative" "0,1,7,8,9")
 	      (const_string "sselog")
-	    (eq_attr "alternative" "9")
+	    (eq_attr "alternative" "11")
 	      (const_string "imov")
-	    (eq_attr "alternative" "10")
+	    (eq_attr "alternative" "12")
 	      (const_string "fmov")
 	   ]
 	   (const_string "ssemov")))
-   (set_attr "prefix_extra" "*,*,*,*,*,*,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,*,*,*,*,1,1,*,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,vex,*,*,*")
-   (set_attr "mode" "SF,<ssescalarmode>,SI,SF,SF,SF,TI,TI,*,*,*")])
+   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
+   (set_attr "mode" "SF,SF,<ssescalarmode>,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
 
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (match_operand:SF 2 "nonimmediate_operand" "xm,xm"))
-	  (match_operand:V4SF 1 "register_operand" "0,x")
+	    (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,xm"))
+	  (match_operand:V4SF 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
    && ((unsigned) exact_log2 (INTVAL (operands[3]))
@@ -6457,26 +6461,27 @@
   switch (which_alternative)
     {
     case 0:
-      return "insertps\t{%3, %2, %0|%0, %2, %3}";
     case 1:
+      return "insertps\t{%3, %2, %0|%0, %2, %3}";
+    case 2:
       return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "V4SF")])
 
 (define_insn "sse4_1_insertps"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
-	(unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "xm,xm")
-		      (match_operand:V4SF 1 "register_operand" "0,x")
-		      (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,x")
+	(unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "Yrm,*xm,xm")
+		      (match_operand:V4SF 1 "register_operand" "0,0,x")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 		     UNSPEC_INSERTPS))]
   "TARGET_SSE4_1"
 {
@@ -6490,19 +6495,20 @@
   switch (which_alternative)
     {
     case 0:
-      return "insertps\t{%3, %2, %0|%0, %2, %3}";
     case 1:
+      return "insertps\t{%3, %2, %0|%0, %2, %3}";
+    case 2:
       return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "V4SF")])
 
 (define_split
@@ -6544,13 +6550,14 @@
 })
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=*rm,rm,x,x")
 	(vec_select:SF
-	  (match_operand:V4SF 1 "register_operand" "x,0,x")
-	  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n")])))]
+	  (match_operand:V4SF 1 "register_operand" "Yr,*x,0,x")
+	  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
   "TARGET_SSE4_1"
   "@
    %vextractps\t{%2, %1, %0|%0, %1, %2}
+   %vextractps\t{%2, %1, %0|%0, %1, %2}
    #
    #"
   "&& reload_completed && SSE_REG_P (operands[0])"
@@ -6575,13 +6582,13 @@
     }
   DONE;
 }
-  [(set_attr "isa" "*,noavx,avx")
-   (set_attr "type" "sselog,*,*")
-   (set_attr "prefix_data16" "1,*,*")
-   (set_attr "prefix_extra" "1,*,*")
-   (set_attr "length_immediate" "1,*,*")
-   (set_attr "prefix" "maybe_vex,*,*")
-   (set_attr "mode" "V4SF,*,*")])
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "type" "sselog,sselog,*,*")
+   (set_attr "prefix_data16" "1,1,*,*")
+   (set_attr "prefix_extra" "1,1,*,*")
+   (set_attr "length_immediate" "1,1,*,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,*,*")
+   (set_attr "mode" "V4SF,V4SF,*,*")])
 
 (define_insn_and_split "*vec_extractv4sf_mem"
   [(set (match_operand:SF 0 "register_operand" "=x,*r,f")
@@ -9553,26 +9560,27 @@
   "ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);")
 
 (define_insn "*sse4_1_mulv2siv2di3<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%0,v")
+	      (match_operand:V4SI 1 "nonimmediate_operand" "%0,0,v")
 	      (parallel [(const_int 0) (const_int 2)])))
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 2 "nonimmediate_operand" "xm,vm")
+	      (match_operand:V4SI 2 "nonimmediate_operand" "Yrm,*xm,vm")
 	      (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>
    && ix86_binary_operator_ok (MULT, V4SImode, operands)"
   "@
    pmuldq\t{%2, %0|%0, %2}
+   pmuldq\t{%2, %0|%0, %2}
    vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseimul")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "avx512bw_pmaddwd512<mode><mask_name>"
@@ -9752,19 +9760,20 @@
 })
 
 (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
-  [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v")
+  [(set (match_operand:VI4_AVX512F 0 "register_operand" "=Yr,*x,v")
 	(mult:VI4_AVX512F
-	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "Yrm,*xm,vm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    pmulld\t{%2, %0|%0, %2}
+   pmulld\t{%2, %0|%0, %2}
    vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "<mask_prefix3>")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "<mask_prefix4>")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "mul<mode>3"
@@ -10241,20 +10250,21 @@
 })
 
 (define_insn "*sse4_1_<code><mode>3<mask_name>"
-  [(set (match_operand:VI14_128 0 "register_operand" "=x,v")
+  [(set (match_operand:VI14_128 0 "register_operand" "=Yr,*x,v")
 	(smaxmin:VI14_128
-	  (match_operand:VI14_128 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI14_128 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI14_128 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI14_128 2 "nonimmediate_operand" "Yrm,*xm,vm")))]
   "TARGET_SSE4_1
    && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
+   p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*<code>v8hi3"
@@ -10324,20 +10334,21 @@
 })
 
 (define_insn "*sse4_1_<code><mode>3<mask_name>"
-  [(set (match_operand:VI24_128 0 "register_operand" "=x,v")
+  [(set (match_operand:VI24_128 0 "register_operand" "=Yr,*x,v")
 	(umaxmin:VI24_128
-	  (match_operand:VI24_128 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI24_128 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI24_128 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI24_128 2 "nonimmediate_operand" "Yrm,*xm,vm")))]
   "TARGET_SSE4_1
    && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
+   p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*<code>v16qi3"
@@ -10427,18 +10438,19 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "*sse4_1_eqv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(eq:V2DI
-	  (match_operand:V2DI 1 "nonimmediate_operand" "%0,x")
-	  (match_operand:V2DI 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:V2DI 1 "nonimmediate_operand" "%0,0,x")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "Yrm,*xm,xm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (EQ, V2DImode, operands)"
   "@
    pcmpeqq\t{%2, %0|%0, %2}
+   pcmpeqq\t{%2, %0|%0, %2}
    vpcmpeqq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*sse2_eq<mode>3"
@@ -10474,18 +10486,19 @@
   "ix86_fixup_binary_operands_no_copy (EQ, V2DImode, operands);")
 
 (define_insn "sse4_2_gtv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(gt:V2DI
-	  (match_operand:V2DI 1 "register_operand" "0,x")
-	  (match_operand:V2DI 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:V2DI 1 "register_operand" "0,0,x")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "Yrm,*xm,xm")))]
   "TARGET_SSE4_2"
   "@
    pcmpgtq\t{%2, %0|%0, %2}
+   pcmpgtq\t{%2, %0|%0, %2}
    vpcmpgtq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "avx2_gt<mode>3"
@@ -12705,9 +12718,9 @@
   "operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));")
 
 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,Yr,*x,x")
 	(vec_select:SI
-	  (match_operand:V4SI 1 "register_operand" "x,0,x")
+	  (match_operand:V4SI 1 "register_operand" "x,0,0,x")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
@@ -12717,10 +12730,11 @@
       return "%vpextrd\t{%2, %1, %0|%0, %1, %2}";
 
     case 1:
+    case 2:
       operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "psrldq\t{%2, %0|%0, %2}";
 
-    case 2:
+    case 3:
       operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -12728,11 +12742,11 @@
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,noavx,avx")
-   (set_attr "type" "sselog1,sseishft1,sseishft1")
-   (set_attr "prefix_extra" "1,*,*")
+  [(set_attr "isa" "*,noavx,noavx,avx")
+   (set_attr "type" "sselog1,sseishft1,sseishft1,sseishft1")
+   (set_attr "prefix_extra" "1,*,*,*")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,orig,vex")
+   (set_attr "prefix" "maybe_vex,orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv4si_zext"
@@ -12839,25 +12853,27 @@
    (set_attr "mode" "TI,TI,DF,V4SF")])
 
 (define_insn "*vec_concatv2si_sse4_1"
-  [(set (match_operand:V2SI 0 "register_operand"     "=x, x,x,x, x, *y,*y")
+  [(set (match_operand:V2SI 0 "register_operand"     "=Yr,*x,x, Yr,*x,x, x, *y,*y")
 	(vec_concat:V2SI
-	  (match_operand:SI 1 "nonimmediate_operand" " 0, x,0,x,rm,  0,rm")
-	  (match_operand:SI 2 "vector_move_operand"  "rm,rm,x,x, C,*ym, C")))]
+	  (match_operand:SI 1 "nonimmediate_operand" "  0, 0,x,  0,0, x,rm,  0,rm")
+	  (match_operand:SI 2 "vector_move_operand"  " rm,rm,rm,Yr,*x,x, C,*ym, C")))]
   "TARGET_SSE4_1"
   "@
    pinsrd\t{$1, %2, %0|%0, %2, 1}
+   pinsrd\t{$1, %2, %0|%0, %2, 1}
    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
    punpckldq\t{%2, %0|%0, %2}
+   punpckldq\t{%2, %0|%0, %2}
    vpunpckldq\t{%2, %1, %0|%0, %1, %2}
    %vmovd\t{%1, %0|%0, %1}
    punpckldq\t{%2, %0|%0, %2}
    movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
-   (set_attr "type" "sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_extra" "1,1,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,*,*,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "TI,TI,TI,TI,TI,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+   (set_attr "type" "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
@@ -12900,15 +12916,16 @@
 ;; movd instead of movq is required to handle broken assemblers.
 (define_insn "vec_concatv2di"
   [(set (match_operand:V2DI 0 "register_operand"
-	  "=x,x ,Yi,x ,!x,x,x,x,x,x")
+	  "=Yr,*x,x ,Yi,x ,!x,x,x,x,x,x")
 	(vec_concat:V2DI
 	  (match_operand:DI 1 "nonimmediate_operand"
-	  " 0,x ,r ,xm,*y,0,x,0,0,x")
+	  "  0, 0,x ,r ,xm,*y,0,x,0,0,x")
 	  (match_operand:DI 2 "vector_move_operand"
-	  "rm,rm,C ,C ,C ,x,x,x,m,m")))]
+	  "*rm,rm,rm,C ,C ,C ,x,x,x,m,m")))]
   "TARGET_SSE"
   "@
    pinsrq\t{$1, %2, %0|%0, %2, 1}
+   pinsrq\t{$1, %2, %0|%0, %2, 1}
    vpinsrq\t{$1, %2, %1, %0|%0, %1, %2, 1}
    * return HAVE_AS_IX86_INTERUNIT_MOVQ ? \"%vmovq\t{%1, %0|%0, %1}\" : \"%vmovd\t{%1, %0|%0, %1}\";
    %vmovq\t{%1, %0|%0, %1}
@@ -12918,17 +12935,17 @@
    movlhps\t{%2, %0|%0, %2}
    movhps\t{%2, %0|%0, %2}
    vmovhps\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "x64_sse4_noavx,x64_avx,x64,sse2,sse2,sse2_noavx,avx,noavx,noavx,avx")
+  [(set_attr "isa" "x64_sse4_noavx,x64_sse4_noavx,x64_avx,x64,sse2,sse2,sse2_noavx,avx,noavx,noavx,avx")
    (set (attr "type")
      (if_then_else
-       (eq_attr "alternative" "0,1,5,6")
+       (eq_attr "alternative" "0,1,2,6,7")
        (const_string "sselog")
        (const_string "ssemov")))
-   (set_attr "prefix_rex" "1,1,1,*,*,*,*,*,*,*")
-   (set_attr "prefix_extra" "1,1,*,*,*,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,*,*,*,*,*,*,*,*")
-   (set_attr "prefix" "orig,vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex")
-   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
+   (set_attr "prefix_rex" "1,1,1,1,*,*,*,*,*,*,*")
+   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
 
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
@@ -13968,61 +13985,64 @@
   [(V8SF "255") (V4SF "15") (V4DF "15") (V2DF "3")])
 
 (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:VF_128_256
-	  (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	  (match_operand:VF_128_256 1 "register_operand" "0,x")
+	  (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	  (match_operand:VF_128_256 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_<blendbits>_operand")))]
   "TARGET_SSE4_1"
   "@
    blend<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   blend<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vblend<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "register_operand" "0,x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:VF_128_256 3 "register_operand" "Yz,x")]
+	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:VF_128_256 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "@
    blendv<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   blendv<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vblendv<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector") 
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector") 
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,x")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 	  UNSPEC_DP))]
   "TARGET_SSE4_1"
   "@
    dp<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   dp<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vdp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemul")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<MODE>")])
 
 ;; Mode attribute used by `vmovntdqa' pattern
@@ -14030,86 +14050,90 @@
    [(V2DI "sse4_1") (V4DI "avx2") (V8DI "avx512f")])
 
 (define_insn "<vi8_sse4_1_avx2_avx512>_movntdqa"
-  [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=x, v")
-	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m")]
+  [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=Yr,*x, v")
+	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m, m")]
 		     UNSPEC_MOVNTDQA))]
   "TARGET_SSE4_1"
   "%vmovntdqa\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
-   (set_attr "prefix_extra" "1, *")
-   (set_attr "prefix" "maybe_vex, evex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,evex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_mpsadbw"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
-	  [(match_operand:VI1_AVX2 1 "register_operand" "0,x")
-	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+	  [(match_operand:VI1_AVX2 1 "register_operand" "0,0,x")
+	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 	  UNSPEC_MPSADBW))]
   "TARGET_SSE4_1"
   "@
    mpsadbw\t{%3, %2, %0|%0, %2, %3}
+   mpsadbw\t{%3, %2, %0|%0, %2, %3}
    vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_packusdw<mask_name>"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=Yr,*x,v")
 	(vec_concat:VI2_AVX2
 	  (us_truncate:<ssehalfvecmode>
-	    (match_operand:<sseunpackmode> 1 "register_operand" "0,v"))
+	    (match_operand:<sseunpackmode> 1 "register_operand" "0,0,v"))
 	  (us_truncate:<ssehalfvecmode>
-	    (match_operand:<sseunpackmode> 2 "nonimmediate_operand" "xm,vm"))))]
+	    (match_operand:<sseunpackmode> 2 "nonimmediate_operand" "Yrm,*xm,vm"))))]
   "TARGET_SSE4_1 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
    packusdw\t{%2, %0|%0, %2}
+   packusdw\t{%2, %0|%0, %2}
    vpackusdw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_pblendvb"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
-	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,x")
-	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,x")]
+	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
+	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "@
    pblendvb\t{%3, %2, %0|%0, %2, %3}
+   pblendvb\t{%3, %2, %0|%0, %2, %3}
    vpblendvb\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "*,1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "length_immediate" "*,*,1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "sse4_1_pblendw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:V8HI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm")
-	  (match_operand:V8HI 1 "register_operand" "0,x")
-	  (match_operand:SI 3 "const_0_to_255_operand" "n,n")))]
+	  (match_operand:V8HI 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	  (match_operand:V8HI 1 "register_operand" "0,0,x")
+	  (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")))]
   "TARGET_SSE4_1"
   "@
    pblendw\t{%3, %2, %0|%0, %2, %3}
+   pblendw\t{%3, %2, %0|%0, %2, %3}
    vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 ;; The builtin uses an 8-bit immediate.  Expand that.
@@ -14157,8 +14181,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "sse4_1_phminposuw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x")
-	(unspec:V8HI [(match_operand:V8HI 1 "nonimmediate_operand" "xm")]
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x")
+	(unspec:V8HI [(match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*xm")]
 		     UNSPEC_PHMINPOSUW))]
   "TARGET_SSE4_1"
   "%vphminposuw\t{%1, %0|%0, %1}"
@@ -14190,10 +14214,10 @@
    (set_attr "mode" "XI")])
 
 (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*v")
 	(any_extend:V8HI
 	  (vec_select:V8QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
@@ -14233,10 +14257,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v4qiv4si2<mask_name>"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
 	(any_extend:V4SI
 	  (vec_select:V4QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
@@ -14269,10 +14293,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v4hiv4si2<mask_name>"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
 	(any_extend:V4SI
 	  (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
@@ -14313,10 +14337,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2qiv2di2<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
 	(any_extend:V2DI
 	  (vec_select:V2QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %w1}"
@@ -14351,10 +14375,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2hiv2di2<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
 	(any_extend:V2DI
 	  (vec_select:V2HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
@@ -14386,10 +14410,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2siv2di2<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
 	(any_extend:V2DI
 	  (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
@@ -14430,8 +14454,8 @@
 
 (define_insn "sse4_1_ptest"
   [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(match_operand:V2DI 0 "register_operand" "x")
-		    (match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+	(unspec:CC [(match_operand:V2DI 0 "register_operand" "Yr,*x")
+		    (match_operand:V2DI 1 "nonimmediate_operand" "Yrm,*xm")]
 		   UNSPEC_PTEST))]
   "TARGET_SSE4_1"
   "%vptest\t{%1, %0|%0, %1}"
@@ -14441,10 +14465,10 @@
    (set_attr "mode" "TI")])
 
 (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "xm")
-	   (match_operand:SI 2 "const_0_to_15_operand" "n")]
+	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "Yrm,*xm")
+	   (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
 	  UNSPEC_ROUND))]
   "TARGET_ROUND"
   "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
@@ -14524,24 +14548,25 @@
 })
 
 (define_insn "sse4_1_round<ssescalarmodesuffix>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "register_operand" "x,x")
-	     (match_operand:SI 3 "const_0_to_15_operand" "n,n")]
+	    [(match_operand:VF_128 2 "register_operand" "Yr,*x,x")
+	     (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")]
 	    UNSPEC_ROUND)
-	  (match_operand:VF_128 1 "register_operand" "0,x")
+	  (match_operand:VF_128 1 "register_operand" "0,0,x")
 	  (const_int 1)))]
   "TARGET_ROUND"
   "@
    round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "<MODE>")])
 
 (define_expand "round<mode>2"
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 91228c8..d4ce519 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -63,6 +63,7 @@
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
 (define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
 (define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex")
+(define_subst_attr "mask_prefix4" "mask" "orig,orig,vex" "evex")
 (define_subst_attr "mask_expand_op3" "mask" "3" "5")
 
 (define_subst "mask"
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 735e6e5..b5c6e4f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -395,6 +395,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb",
 DEF_TUNE (X86_TUNE_VECTOR_PARALLEL_EXECUTION, "vec_parallel",
           m_NEHALEM | m_SANDYBRIDGE | m_HASWELL)
 
+/* X86_TUNE_AVOID_4BYTE_PREFIXES: Avoid instructions requiring 4+ bytes of prefixes.  */
+DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
+          m_SILVERMONT | m_INTEL)
+
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
 /*****************************************************************************/
diff --git a/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c b/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
index 0aa5264..b347a4a 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
@@ -10,4 +10,4 @@ test (long long b)
   return _mm_cvtsi64_si128 (b); 
 }
 
-/* { dg-final { scan-assembler-times "vec_concatv2di/3" 1 } } */
+/* { dg-final { scan-assembler-times "vec_concatv2di/4" 1 } } */


* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-12-02 11:08                         ` Ilya Enkovich
@ 2014-12-02 11:21                           ` Uros Bizjak
  2014-12-04 14:05                             ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Uros Bizjak @ 2014-12-02 11:21 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

On Tue, Dec 2, 2014 at 12:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

>> > Having stage1 close to end, may we make some decision regarding this
>> > patch? Having a couple of working variants, may we choose and use one
>> > of them?
>>
>> I propose to wait for Vlad for an update about his plans on register
>> preference algorythm that would fix this (and other "Ya*r"-type
>> issues).
>>
>> In the absence of the fix, we'll go with "Yr,*x".
>>
>> Uros.
>>
>
> Hi,
>
> Here is an updated patch which uses "Yr,*x" option to avoid long prefixes for Silvermont.  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2014-12-02  Ilya Enkovich  <ilya.enkovich@intel.com>
>
>         * config/i386/constraints.md (Yr): New.
>         * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>         (REG_CLASS_NAMES): Likewise.
>         (REG_CLASS_CONTENTS): Likewise.
>         * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>         which use only NO_REX_SSE_REGS.
>         (vec_set<mode>_0): Likewise.
>         (*vec_setv4sf_sse4_1): Likewise.
>         (sse4_1_insertps): Likewise.
>         (*sse4_1_extractps): Likewise.
>         (*sse4_1_mulv2siv2di3<mask_name>): Likewise.
>         (*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
>         (*sse4_1_<code><mode>3<mask_name>): Likewise.
>         (*sse4_1_<code><mode>3): Likewise.
>         (*sse4_1_eqv2di3): Likewise.
>         (sse4_2_gtv2di3): Likewise.
>         (*vec_extractv4si): Likewise.
>         (*vec_concatv2si_sse4_1): Likewise.
>         (vec_concatv2di): Likewise.
>         (<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
>         (<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
>         (<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
>         (<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise.
>         (<sse4_1_avx2>_mpsadbw): Likewise.
>         (<sse4_1_avx2>_packusdw<mask_name>): Likewise.
>         (<sse4_1_avx2>_pblendvb): Likewise.
>         (sse4_1_pblendw): Likewise.
>         (sse4_1_phminposuw): Likewise.
>         (sse4_1_<code>v8qiv8hi2<mask_name>): Likewise.
>         (sse4_1_<code>v4qiv4si2<mask_name>): Likewise.
>         (sse4_1_<code>v4hiv4si2<mask_name>): Likewise.
>         (sse4_1_<code>v2qiv2di2<mask_name>): Likewise.
>         (sse4_1_<code>v2hiv2di2<mask_name>): Likewise.
>         (sse4_1_<code>v2siv2di2<mask_name>): Likewise.
>         (sse4_1_ptest): Likewise.
>         (<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
>         (sse4_1_round<ssescalarmodesuffix>): Likewise.
>         * config/i386/subst.md (mask_prefix4): New.
>         * config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New.
>
> gcc/testsuite/
>
> 2014-12-02  Ilya Enkovich  <ilya.enkovich@intel.com>
>
>         * gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed
>         vec_concatv2di template.

OK for mainline with a change below.

@@ -6544,13 +6550,14 @@
 })

 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=*rm,rm,x,x")
  (vec_select:SF

Please do not change preferences of non-SSE registers. Please check
the patch that similar changes didn't creep in.

Thanks,
Uros.


* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-12-02 11:21                           ` Uros Bizjak
@ 2014-12-04 14:05                             ` Ilya Enkovich
  0 siblings, 0 replies; 27+ messages in thread
From: Ilya Enkovich @ 2014-12-04 14:05 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jakub Jelinek, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 3617 bytes --]

On 02 Dec 12:21, Uros Bizjak wrote:
> On Tue, Dec 2, 2014 at 12:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 
> >> > Having stage1 close to end, may we make some decision regarding this
> >> > patch? Having a couple of working variants, may we choose and use one
> >> > of them?
> >>
> >> I propose to wait for Vlad for an update about his plans on register
> >> preference algorithm that would fix this (and other "Ya*r"-type
> >> issues).
> >>
> >> In the absence of the fix, we'll go with "Yr,*x".
> >>
> >> Uros.
> >>
> >
> > Hi,
> >
> > Here is an updated patch which uses "Yr,*x" option to avoid long prefixes for Silvermont.  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
> >
> > Thanks,
> > Ilya
> > --
> > gcc/
> >
> > 2014-12-02  Ilya Enkovich  <ilya.enkovich@intel.com>
> >
> >         * config/i386/constraints.md (Yr): New.
> >         * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
> >         (REG_CLASS_NAMES): Likewise.
> >         (REG_CLASS_CONTENTS): Likewise.
> >         * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
> >         which use only NO_REX_SSE_REGS.
> >         (vec_set<mode>_0): Likewise.
> >         (*vec_setv4sf_sse4_1): Likewise.
> >         (sse4_1_insertps): Likewise.
> >         (*sse4_1_extractps): Likewise.
> >         (*sse4_1_mulv2siv2di3<mask_name>): Likewise.
> >         (*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
> >         (*sse4_1_<code><mode>3<mask_name>): Likewise.
> >         (*sse4_1_<code><mode>3): Likewise.
> >         (*sse4_1_eqv2di3): Likewise.
> >         (sse4_2_gtv2di3): Likewise.
> >         (*vec_extractv4si): Likewise.
> >         (*vec_concatv2si_sse4_1): Likewise.
> >         (vec_concatv2di): Likewise.
> >         (<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
> >         (<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
> >         (<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
> >         (<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise.
> >         (<sse4_1_avx2>_mpsadbw): Likewise.
> >         (<sse4_1_avx2>_packusdw<mask_name>): Likewise.
> >         (<sse4_1_avx2>_pblendvb): Likewise.
> >         (sse4_1_pblendw): Likewise.
> >         (sse4_1_phminposuw): Likewise.
> >         (sse4_1_<code>v8qiv8hi2<mask_name>): Likewise.
> >         (sse4_1_<code>v4qiv4si2<mask_name>): Likewise.
> >         (sse4_1_<code>v4hiv4si2<mask_name>): Likewise.
> >         (sse4_1_<code>v2qiv2di2<mask_name>): Likewise.
> >         (sse4_1_<code>v2hiv2di2<mask_name>): Likewise.
> >         (sse4_1_<code>v2siv2di2<mask_name>): Likewise.
> >         (sse4_1_ptest): Likewise.
> >         (<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
> >         (sse4_1_round<ssescalarmodesuffix>): Likewise.
> >         * config/i386/subst.md (mask_prefix4): New.
> >         * config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New.
> >
> > gcc/testsuite/
> >
> > 2014-12-02  Ilya Enkovich  <ilya.enkovich@intel.com>
> >
> >         * gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed
> >         vec_concatv2di template.
> 
> OK for mainline with a change below.
> 
> @@ -6544,13 +6550,14 @@
>  })
> 
>  (define_insn_and_split "*sse4_1_extractps"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=*rm,rm,x,x")
>   (vec_select:SF
> 
> Please do not change preferences of non-SSE registers. Please check
> the patch that similar changes didn't creep in.
> 
> Thanks,
> Uros.

Thanks for review!  Didn't find any other typo.  Attached is a committed version.

Thanks,
Ilya

[-- Attachment #2: slm-prefixes1.patch --]
[-- Type: text/plain, Size: 38485 bytes --]

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 4e07d70..72925cd 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -108,6 +108,8 @@
 ;;  d	Integer register when integer DFmode moves are enabled
 ;;  x	Integer register when integer XFmode moves are enabled
 ;;  f	x87 register when 80387 floating point arithmetic is enabled
+;;  r	SSE regs not requiring REX prefix when prefixes avoidance is enabled
+;;	and all SSE regs otherwise
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -150,6 +152,10 @@
  "(ix86_fpmath & FPMATH_387) ? FLOAT_REGS : NO_REGS"
  "@internal Any x87 register when 80387 FP arithmetic is enabled.")
 
+(define_register_constraint "Yr"
+ "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
+ "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3f5f979..6d9df65 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1311,6 +1311,7 @@ enum reg_class
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
+  NO_REX_SSE_REGS,
   SSE_REGS,
   EVEX_SSE_REGS,
   BND_REGS,
@@ -1369,6 +1370,7 @@ enum reg_class
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
+   "NO_REX_SSE_REGS",			\
    "SSE_REGS",				\
    "EVEX_SSE_REGS",			\
    "BND_REGS",				\
@@ -1409,6 +1411,7 @@ enum reg_class
     { 0x0200,       0x0,    0x0 },       /* FP_SECOND_REG */             \
     { 0xff00,       0x0,    0x0 },       /* FLOAT_REGS */                \
   { 0x200000,       0x0,    0x0 },       /* SSE_FIRST_REG */             \
+{ 0x1fe00000,  0x000000,    0x0 },       /* NO_REX_SSE_REGS */           \
 { 0x1fe00000,  0x1fe000,    0x0 },       /* SSE_REGS */                  \
        { 0x0,0xffe00000,   0x1f },       /* EVEX_SSE_REGS */             \
        { 0x0,       0x0,0x1e000 },       /* BND_REGS */			 \
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..c3aaea3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6338,26 +6338,28 @@
 ;; Although insertps takes register source, we prefer
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
-  [(set (match_operand:V2SF 0 "register_operand"     "=x,x,x,x,x,*y ,*y")
+  [(set (match_operand:V2SF 0 "register_operand"     "=Yr,*x,x,Yr,*x,x,x,*y ,*y")
 	(vec_concat:V2SF
-	  (match_operand:SF 1 "nonimmediate_operand" " 0,x,0,x,m, 0 , m")
-	  (match_operand:SF 2 "vector_move_operand"  " x,x,m,m,C,*ym, C")))]
+	  (match_operand:SF 1 "nonimmediate_operand" "  0, 0,x, 0,0, x,m, 0 , m")
+	  (match_operand:SF 2 "vector_move_operand"  " Yr,*x,x, m,m, m,C,*ym, C")))]
   "TARGET_SSE4_1"
   "@
    unpcklps\t{%2, %0|%0, %2}
+   unpcklps\t{%2, %0|%0, %2}
    vunpcklps\t{%2, %1, %0|%0, %1, %2}
    insertps\t{$0x10, %2, %0|%0, %2, 0x10}
+   insertps\t{$0x10, %2, %0|%0, %2, 0x10}
    vinsertps\t{$0x10, %2, %1, %0|%0, %1, %2, 0x10}
    %vmovss\t{%1, %0|%0, %1}
    punpckldq\t{%2, %0|%0, %2}
    movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
-   (set_attr "type" "sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_data16" "*,*,1,*,*,*,*")
-   (set_attr "prefix_extra" "*,*,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,1,1,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+   (set_attr "type" "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_data16" "*,*,*,1,1,*,*,*,*")
+   (set_attr "prefix_extra" "*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
@@ -6405,49 +6407,51 @@
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "vec_set<mode>_0"
   [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
-	  "=v,v,v ,x,x,v,x  ,x  ,m ,m   ,m")
+	  "=Yr,*v,v,v ,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
 	(vec_merge:VI4F_128
 	  (vec_duplicate:VI4F_128
 	    (match_operand:<ssescalarmode> 2 "general_operand"
-	  " v,m,*r,m,x,v,*rm,*rm,!x,!*re,!*fF"))
+	  " Yr,*v,m,*r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
 	  (match_operand:VI4F_128 1 "vector_move_operand"
-	  " C,C,C ,C,0,v,0  ,x  ,0 ,0   ,0")
+	  " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
+   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
    %vmov<ssescalarmodesuffix>\t{%2, %0|%0, %2}
    %vmovd\t{%2, %0|%0, %2}
    movss\t{%2, %0|%0, %2}
    movss\t{%2, %0|%0, %2}
    vmovss\t{%2, %1, %0|%0, %1, %2}
    pinsrd\t{$0, %2, %0|%0, %2, 0}
+   pinsrd\t{$0, %2, %0|%0, %2, 0}
    vpinsrd\t{$0, %2, %1, %0|%0, %1, %2, 0}
    #
    #
    #"
-  [(set_attr "isa" "sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,avx,*,*,*")
+  [(set_attr "isa" "sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
    (set (attr "type")
-     (cond [(eq_attr "alternative" "0,6,7")
+     (cond [(eq_attr "alternative" "0,1,7,8,9")
 	      (const_string "sselog")
-	    (eq_attr "alternative" "9")
+	    (eq_attr "alternative" "11")
 	      (const_string "imov")
-	    (eq_attr "alternative" "10")
+	    (eq_attr "alternative" "12")
 	      (const_string "fmov")
 	   ]
 	   (const_string "ssemov")))
-   (set_attr "prefix_extra" "*,*,*,*,*,*,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,*,*,*,*,1,1,*,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,vex,*,*,*")
-   (set_attr "mode" "SF,<ssescalarmode>,SI,SF,SF,SF,TI,TI,*,*,*")])
+   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
+   (set_attr "mode" "SF,SF,<ssescalarmode>,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
 
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (match_operand:SF 2 "nonimmediate_operand" "xm,xm"))
-	  (match_operand:V4SF 1 "register_operand" "0,x")
+	    (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,xm"))
+	  (match_operand:V4SF 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
    && ((unsigned) exact_log2 (INTVAL (operands[3]))
@@ -6457,26 +6461,27 @@
   switch (which_alternative)
     {
     case 0:
-      return "insertps\t{%3, %2, %0|%0, %2, %3}";
     case 1:
+      return "insertps\t{%3, %2, %0|%0, %2, %3}";
+    case 2:
       return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "V4SF")])
 
 (define_insn "sse4_1_insertps"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
-	(unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "xm,xm")
-		      (match_operand:V4SF 1 "register_operand" "0,x")
-		      (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,x")
+	(unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "Yrm,*xm,xm")
+		      (match_operand:V4SF 1 "register_operand" "0,0,x")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 		     UNSPEC_INSERTPS))]
   "TARGET_SSE4_1"
 {
@@ -6490,19 +6495,20 @@
   switch (which_alternative)
     {
     case 0:
-      return "insertps\t{%3, %2, %0|%0, %2, %3}";
     case 1:
+      return "insertps\t{%3, %2, %0|%0, %2, %3}";
+    case 2:
       return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "V4SF")])
 
 (define_split
@@ -6544,13 +6550,14 @@
 })
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,x,x")
 	(vec_select:SF
-	  (match_operand:V4SF 1 "register_operand" "x,0,x")
-	  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n")])))]
+	  (match_operand:V4SF 1 "register_operand" "Yr,*x,0,x")
+	  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
   "TARGET_SSE4_1"
   "@
    %vextractps\t{%2, %1, %0|%0, %1, %2}
+   %vextractps\t{%2, %1, %0|%0, %1, %2}
    #
    #"
   "&& reload_completed && SSE_REG_P (operands[0])"
@@ -6575,13 +6582,13 @@
     }
   DONE;
 }
-  [(set_attr "isa" "*,noavx,avx")
-   (set_attr "type" "sselog,*,*")
-   (set_attr "prefix_data16" "1,*,*")
-   (set_attr "prefix_extra" "1,*,*")
-   (set_attr "length_immediate" "1,*,*")
-   (set_attr "prefix" "maybe_vex,*,*")
-   (set_attr "mode" "V4SF,*,*")])
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "type" "sselog,sselog,*,*")
+   (set_attr "prefix_data16" "1,1,*,*")
+   (set_attr "prefix_extra" "1,1,*,*")
+   (set_attr "length_immediate" "1,1,*,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,*,*")
+   (set_attr "mode" "V4SF,V4SF,*,*")])
 
 (define_insn_and_split "*vec_extractv4sf_mem"
   [(set (match_operand:SF 0 "register_operand" "=x,*r,f")
@@ -9553,26 +9560,27 @@
   "ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);")
 
 (define_insn "*sse4_1_mulv2siv2di3<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%0,v")
+	      (match_operand:V4SI 1 "nonimmediate_operand" "%0,0,v")
 	      (parallel [(const_int 0) (const_int 2)])))
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 2 "nonimmediate_operand" "xm,vm")
+	      (match_operand:V4SI 2 "nonimmediate_operand" "Yrm,*xm,vm")
 	      (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>
    && ix86_binary_operator_ok (MULT, V4SImode, operands)"
   "@
    pmuldq\t{%2, %0|%0, %2}
+   pmuldq\t{%2, %0|%0, %2}
    vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseimul")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "avx512bw_pmaddwd512<mode><mask_name>"
@@ -9752,19 +9760,20 @@
 })
 
 (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
-  [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v")
+  [(set (match_operand:VI4_AVX512F 0 "register_operand" "=Yr,*x,v")
 	(mult:VI4_AVX512F
-	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "Yrm,*xm,vm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    pmulld\t{%2, %0|%0, %2}
+   pmulld\t{%2, %0|%0, %2}
    vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "<mask_prefix3>")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "<mask_prefix4>")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "mul<mode>3"
@@ -10241,20 +10250,21 @@
 })
 
 (define_insn "*sse4_1_<code><mode>3<mask_name>"
-  [(set (match_operand:VI14_128 0 "register_operand" "=x,v")
+  [(set (match_operand:VI14_128 0 "register_operand" "=Yr,*x,v")
 	(smaxmin:VI14_128
-	  (match_operand:VI14_128 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI14_128 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI14_128 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI14_128 2 "nonimmediate_operand" "Yrm,*xm,vm")))]
   "TARGET_SSE4_1
    && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
+   p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*<code>v8hi3"
@@ -10324,20 +10334,21 @@
 })
 
 (define_insn "*sse4_1_<code><mode>3<mask_name>"
-  [(set (match_operand:VI24_128 0 "register_operand" "=x,v")
+  [(set (match_operand:VI24_128 0 "register_operand" "=Yr,*x,v")
 	(umaxmin:VI24_128
-	  (match_operand:VI24_128 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI24_128 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI24_128 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI24_128 2 "nonimmediate_operand" "Yrm,*xm,vm")))]
   "TARGET_SSE4_1
    && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
+   p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*<code>v16qi3"
@@ -10427,18 +10438,19 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "*sse4_1_eqv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(eq:V2DI
-	  (match_operand:V2DI 1 "nonimmediate_operand" "%0,x")
-	  (match_operand:V2DI 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:V2DI 1 "nonimmediate_operand" "%0,0,x")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "Yrm,*xm,xm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (EQ, V2DImode, operands)"
   "@
    pcmpeqq\t{%2, %0|%0, %2}
+   pcmpeqq\t{%2, %0|%0, %2}
    vpcmpeqq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*sse2_eq<mode>3"
@@ -10474,18 +10486,19 @@
   "ix86_fixup_binary_operands_no_copy (EQ, V2DImode, operands);")
 
 (define_insn "sse4_2_gtv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(gt:V2DI
-	  (match_operand:V2DI 1 "register_operand" "0,x")
-	  (match_operand:V2DI 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:V2DI 1 "register_operand" "0,0,x")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "Yrm,*xm,xm")))]
   "TARGET_SSE4_2"
   "@
    pcmpgtq\t{%2, %0|%0, %2}
+   pcmpgtq\t{%2, %0|%0, %2}
    vpcmpgtq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "avx2_gt<mode>3"
@@ -12705,9 +12718,9 @@
   "operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));")
 
 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,Yr,*x,x")
 	(vec_select:SI
-	  (match_operand:V4SI 1 "register_operand" "x,0,x")
+	  (match_operand:V4SI 1 "register_operand" "x,0,0,x")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
@@ -12717,10 +12730,11 @@
       return "%vpextrd\t{%2, %1, %0|%0, %1, %2}";
 
     case 1:
+    case 2:
       operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "psrldq\t{%2, %0|%0, %2}";
 
-    case 2:
+    case 3:
       operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -12728,11 +12742,11 @@
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,noavx,avx")
-   (set_attr "type" "sselog1,sseishft1,sseishft1")
-   (set_attr "prefix_extra" "1,*,*")
+  [(set_attr "isa" "*,noavx,noavx,avx")
+   (set_attr "type" "sselog1,sseishft1,sseishft1,sseishft1")
+   (set_attr "prefix_extra" "1,*,*,*")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,orig,vex")
+   (set_attr "prefix" "maybe_vex,orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv4si_zext"
@@ -12839,25 +12853,27 @@
    (set_attr "mode" "TI,TI,DF,V4SF")])
 
 (define_insn "*vec_concatv2si_sse4_1"
-  [(set (match_operand:V2SI 0 "register_operand"     "=x, x,x,x, x, *y,*y")
+  [(set (match_operand:V2SI 0 "register_operand"     "=Yr,*x,x, Yr,*x,x, x, *y,*y")
 	(vec_concat:V2SI
-	  (match_operand:SI 1 "nonimmediate_operand" " 0, x,0,x,rm,  0,rm")
-	  (match_operand:SI 2 "vector_move_operand"  "rm,rm,x,x, C,*ym, C")))]
+	  (match_operand:SI 1 "nonimmediate_operand" "  0, 0,x,  0,0, x,rm,  0,rm")
+	  (match_operand:SI 2 "vector_move_operand"  " rm,rm,rm,Yr,*x,x, C,*ym, C")))]
   "TARGET_SSE4_1"
   "@
    pinsrd\t{$1, %2, %0|%0, %2, 1}
+   pinsrd\t{$1, %2, %0|%0, %2, 1}
    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
    punpckldq\t{%2, %0|%0, %2}
+   punpckldq\t{%2, %0|%0, %2}
    vpunpckldq\t{%2, %1, %0|%0, %1, %2}
    %vmovd\t{%1, %0|%0, %1}
    punpckldq\t{%2, %0|%0, %2}
    movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
-   (set_attr "type" "sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_extra" "1,1,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,*,*,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "TI,TI,TI,TI,TI,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+   (set_attr "type" "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
@@ -12900,15 +12916,16 @@
 ;; movd instead of movq is required to handle broken assemblers.
 (define_insn "vec_concatv2di"
   [(set (match_operand:V2DI 0 "register_operand"
-	  "=x,x ,Yi,x ,!x,x,x,x,x,x")
+	  "=Yr,*x,x ,Yi,x ,!x,x,x,x,x,x")
 	(vec_concat:V2DI
 	  (match_operand:DI 1 "nonimmediate_operand"
-	  " 0,x ,r ,xm,*y,0,x,0,0,x")
+	  "  0, 0,x ,r ,xm,*y,0,x,0,0,x")
 	  (match_operand:DI 2 "vector_move_operand"
-	  "rm,rm,C ,C ,C ,x,x,x,m,m")))]
+	  "*rm,rm,rm,C ,C ,C ,x,x,x,m,m")))]
   "TARGET_SSE"
   "@
    pinsrq\t{$1, %2, %0|%0, %2, 1}
+   pinsrq\t{$1, %2, %0|%0, %2, 1}
    vpinsrq\t{$1, %2, %1, %0|%0, %1, %2, 1}
    * return HAVE_AS_IX86_INTERUNIT_MOVQ ? \"%vmovq\t{%1, %0|%0, %1}\" : \"%vmovd\t{%1, %0|%0, %1}\";
    %vmovq\t{%1, %0|%0, %1}
@@ -12918,17 +12935,17 @@
    movlhps\t{%2, %0|%0, %2}
    movhps\t{%2, %0|%0, %2}
    vmovhps\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "x64_sse4_noavx,x64_avx,x64,sse2,sse2,sse2_noavx,avx,noavx,noavx,avx")
+  [(set_attr "isa" "x64_sse4_noavx,x64_sse4_noavx,x64_avx,x64,sse2,sse2,sse2_noavx,avx,noavx,noavx,avx")
    (set (attr "type")
      (if_then_else
-       (eq_attr "alternative" "0,1,5,6")
+       (eq_attr "alternative" "0,1,2,6,7")
        (const_string "sselog")
        (const_string "ssemov")))
-   (set_attr "prefix_rex" "1,1,1,*,*,*,*,*,*,*")
-   (set_attr "prefix_extra" "1,1,*,*,*,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,*,*,*,*,*,*,*,*")
-   (set_attr "prefix" "orig,vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex")
-   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
+   (set_attr "prefix_rex" "1,1,1,1,*,*,*,*,*,*,*")
+   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
 
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
@@ -13968,61 +13985,64 @@
   [(V8SF "255") (V4SF "15") (V4DF "15") (V2DF "3")])
 
 (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:VF_128_256
-	  (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	  (match_operand:VF_128_256 1 "register_operand" "0,x")
+	  (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	  (match_operand:VF_128_256 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_<blendbits>_operand")))]
   "TARGET_SSE4_1"
   "@
    blend<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   blend<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vblend<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "register_operand" "0,x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:VF_128_256 3 "register_operand" "Yz,x")]
+	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:VF_128_256 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "@
    blendv<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   blendv<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vblendv<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector") 
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector") 
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,x")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 	  UNSPEC_DP))]
   "TARGET_SSE4_1"
   "@
    dp<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   dp<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vdp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemul")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<MODE>")])
 
 ;; Mode attribute used by `vmovntdqa' pattern
@@ -14030,86 +14050,90 @@
    [(V2DI "sse4_1") (V4DI "avx2") (V8DI "avx512f")])
 
 (define_insn "<vi8_sse4_1_avx2_avx512>_movntdqa"
-  [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=x, v")
-	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m")]
+  [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=Yr,*x, v")
+	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m, m")]
 		     UNSPEC_MOVNTDQA))]
   "TARGET_SSE4_1"
   "%vmovntdqa\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
-   (set_attr "prefix_extra" "1, *")
-   (set_attr "prefix" "maybe_vex, evex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,evex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_mpsadbw"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
-	  [(match_operand:VI1_AVX2 1 "register_operand" "0,x")
-	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+	  [(match_operand:VI1_AVX2 1 "register_operand" "0,0,x")
+	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 	  UNSPEC_MPSADBW))]
   "TARGET_SSE4_1"
   "@
    mpsadbw\t{%3, %2, %0|%0, %2, %3}
+   mpsadbw\t{%3, %2, %0|%0, %2, %3}
    vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_packusdw<mask_name>"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=Yr,*x,v")
 	(vec_concat:VI2_AVX2
 	  (us_truncate:<ssehalfvecmode>
-	    (match_operand:<sseunpackmode> 1 "register_operand" "0,v"))
+	    (match_operand:<sseunpackmode> 1 "register_operand" "0,0,v"))
 	  (us_truncate:<ssehalfvecmode>
-	    (match_operand:<sseunpackmode> 2 "nonimmediate_operand" "xm,vm"))))]
+	    (match_operand:<sseunpackmode> 2 "nonimmediate_operand" "Yrm,*xm,vm"))))]
   "TARGET_SSE4_1 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
    packusdw\t{%2, %0|%0, %2}
+   packusdw\t{%2, %0|%0, %2}
    vpackusdw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_pblendvb"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
-	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,x")
-	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,x")]
+	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
+	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "@
    pblendvb\t{%3, %2, %0|%0, %2, %3}
+   pblendvb\t{%3, %2, %0|%0, %2, %3}
    vpblendvb\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "*,1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "length_immediate" "*,*,1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "sse4_1_pblendw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:V8HI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm")
-	  (match_operand:V8HI 1 "register_operand" "0,x")
-	  (match_operand:SI 3 "const_0_to_255_operand" "n,n")))]
+	  (match_operand:V8HI 2 "nonimmediate_operand" "Yrm,*xm,xm")
+	  (match_operand:V8HI 1 "register_operand" "0,0,x")
+	  (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")))]
   "TARGET_SSE4_1"
   "@
    pblendw\t{%3, %2, %0|%0, %2, %3}
+   pblendw\t{%3, %2, %0|%0, %2, %3}
    vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 ;; The builtin uses an 8-bit immediate.  Expand that.
@@ -14157,8 +14181,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "sse4_1_phminposuw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x")
-	(unspec:V8HI [(match_operand:V8HI 1 "nonimmediate_operand" "xm")]
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x")
+	(unspec:V8HI [(match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*xm")]
 		     UNSPEC_PHMINPOSUW))]
   "TARGET_SSE4_1"
   "%vphminposuw\t{%1, %0|%0, %1}"
@@ -14190,10 +14214,10 @@
    (set_attr "mode" "XI")])
 
 (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*v")
 	(any_extend:V8HI
 	  (vec_select:V8QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
@@ -14233,10 +14257,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v4qiv4si2<mask_name>"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
 	(any_extend:V4SI
 	  (vec_select:V4QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
@@ -14269,10 +14293,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v4hiv4si2<mask_name>"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
 	(any_extend:V4SI
 	  (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
@@ -14313,10 +14337,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2qiv2di2<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
 	(any_extend:V2DI
 	  (vec_select:V2QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %w1}"
@@ -14351,10 +14375,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2hiv2di2<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
 	(any_extend:V2DI
 	  (vec_select:V2HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
@@ -14386,10 +14410,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2siv2di2<mask_name>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
 	(any_extend:V2DI
 	  (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "Yrm,*vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
@@ -14430,8 +14454,8 @@
 
 (define_insn "sse4_1_ptest"
   [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(match_operand:V2DI 0 "register_operand" "x")
-		    (match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+	(unspec:CC [(match_operand:V2DI 0 "register_operand" "Yr,*x")
+		    (match_operand:V2DI 1 "nonimmediate_operand" "Yrm,*xm")]
 		   UNSPEC_PTEST))]
   "TARGET_SSE4_1"
   "%vptest\t{%1, %0|%0, %1}"
@@ -14441,10 +14465,10 @@
    (set_attr "mode" "TI")])
 
 (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "xm")
-	   (match_operand:SI 2 "const_0_to_15_operand" "n")]
+	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "Yrm,*xm")
+	   (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
 	  UNSPEC_ROUND))]
   "TARGET_ROUND"
   "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
@@ -14524,24 +14548,25 @@
 })
 
 (define_insn "sse4_1_round<ssescalarmodesuffix>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "register_operand" "x,x")
-	     (match_operand:SI 3 "const_0_to_15_operand" "n,n")]
+	    [(match_operand:VF_128 2 "register_operand" "Yr,*x,x")
+	     (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")]
 	    UNSPEC_ROUND)
-	  (match_operand:VF_128 1 "register_operand" "0,x")
+	  (match_operand:VF_128 1 "register_operand" "0,0,x")
 	  (const_int 1)))]
   "TARGET_ROUND"
   "@
    round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "<MODE>")])
 
 (define_expand "round<mode>2"
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 91228c8..d4ce519 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -63,6 +63,7 @@
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
 (define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
 (define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex")
+(define_subst_attr "mask_prefix4" "mask" "orig,orig,vex" "evex")
 (define_subst_attr "mask_expand_op3" "mask" "3" "5")
 
 (define_subst "mask"
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 735e6e5..b5c6e4f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -395,6 +395,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb",
 DEF_TUNE (X86_TUNE_VECTOR_PARALLEL_EXECUTION, "vec_parallel",
           m_NEHALEM | m_SANDYBRIDGE | m_HASWELL)
 
+/* X86_TUNE_AVOID_4BYTE_PREFIXES: Avoid instructions requiring 4+ bytes of prefixes.  */
+DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
+          m_SILVERMONT | m_INTEL)
+
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
 /*****************************************************************************/
diff --git a/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c b/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
index 0aa5264..b347a4a 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
@@ -10,4 +10,4 @@ test (long long b)
   return _mm_cvtsi64_si128 (b); 
 }
 
-/* { dg-final { scan-assembler-times "vec_concatv2di/3" 1 } } */
+/* { dg-final { scan-assembler-times "vec_concatv2di/4" 1 } } */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
@ 2014-07-03 11:24 Uros Bizjak
  0 siblings, 0 replies; 27+ messages in thread
From: Uros Bizjak @ 2014-07-03 11:24 UTC (permalink / raw)
  To: gcc-patches
  Cc: Илья
	Энкович,
	Andi Kleen, Vladimir Makarov, Jakub Jelinek

Hello!

>> There is already a higher priority for registers not requiring REX.
>> My patch affects cases where the compiler has to use xmm8-15, and it
>> just tries to tell LRA to assign them to non-SSE4 instructions.  I
>> doubt it would be of much use for targets other than Silvermont.
>
> When it is just a hint, shouldn't there be something like Ya,???x
> or Ya,!x or similar in the SSE4 constraints?  I mean, xmm{8-15} can
> still be used, it is just costly.

Maybe we can use "Ya*x", similar to the *pushdf pattern, where it is
costly - but tolerable - to push a DFmode value through integer
registers.

Oh, and I didn't notice that the Ya name is already taken...

Uros.

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 10:56     ` Jakub Jelinek
@ 2014-07-03 11:07       ` Ilya Enkovich
  0 siblings, 0 replies; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-03 11:07 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andi Kleen, gcc-patches, Vladimir Makarov

2014-07-03 14:56 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> On Thu, Jul 03, 2014 at 02:49:10PM +0400, Ilya Enkovich wrote:
>> 2014-07-02 20:21 GMT+04:00 Andi Kleen <andi@firstfloor.org>:
>> > Ilya Enkovich <enkovich.gnu@gmail.com> writes:
>> >
>> >> Silvermont processors have a penalty for instructions with 4+ bytes
>> >> of prefixes (including escape bytes in the opcode).  This situation
>> >> arises when a REX prefix is used in SSE4 instructions.  This patch
>> >> tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
>> >> instructions.  I achieved this by adding a new tuning flag and new
>> >> alternatives affected by the tuning.
>> >
>> > Why make it a tuning flag? Shouldn't this help unconditionally for code
>> > size everywhere? Or is there some drawback?
>>
>> There is already a higher priority for registers not requiring REX.
>> My patch affects cases where the compiler has to use xmm8-15, and it
>> just tries to tell LRA to assign them to non-SSE4 instructions.  I
>> doubt it would be of much use for targets other than Silvermont.
>
> When it is just a hint, shouldn't there be something like Ya,???x
> or Ya,!x or similar in the SSE4 constraints?  I mean, xmm{8-15} can
> still be used; it is just costly.
>

I made it Ya,?x.  I do not know how many '?'s would be reasonable here.
One was enough to get a gain on the tests where I expected it.
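
As a toy illustration of why a '?' hint works here (cost values below are assumptions for illustration only, not LRA's actual algorithm): each '?' disparages its alternative by one unit of cost, so xmm0-xmm7 are preferred when available, while xmm8-xmm15 remain usable whenever the cheaper class would force a spill:

```python
# Toy model: each '?' adds one cost unit to its alternative, so "Yr"
# (xmm0-7) wins when those registers are free, but "?x" (all xmm regs)
# still beats spilling when they are not.  Cost values are assumed.

SPILL_COST = 4  # assumed: freeing an xmm0-7 register via memory

def pick_alternative(low_regs_free):
    alternatives = {"Yr": 0, "?x": 1}  # cost = number of '?' modifiers
    costs = {}
    for name, hint_cost in alternatives.items():
        cost = hint_cost
        if name == "Yr" and not low_regs_free:
            cost += SPILL_COST   # would have to spill to use xmm0-7
        costs[name] = cost
    return min(costs, key=costs.get)

assert pick_alternative(low_regs_free=True) == "Yr"   # prefer no-REX regs
assert pick_alternative(low_regs_free=False) == "?x"  # xmm8-15 still allowed
```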

Ilya

>         Jakub

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-03 10:49   ` Ilya Enkovich
@ 2014-07-03 10:56     ` Jakub Jelinek
  2014-07-03 11:07       ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Jelinek @ 2014-07-03 10:56 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Andi Kleen, gcc-patches, Vladimir Makarov

On Thu, Jul 03, 2014 at 02:49:10PM +0400, Ilya Enkovich wrote:
> 2014-07-02 20:21 GMT+04:00 Andi Kleen <andi@firstfloor.org>:
> > Ilya Enkovich <enkovich.gnu@gmail.com> writes:
> >
> >> Silvermont processors have a penalty for instructions with 4+ bytes
> >> of prefixes (including escape bytes in the opcode).  This situation
> >> arises when a REX prefix is used in SSE4 instructions.  This patch
> >> tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
> >> instructions.  I achieved this by adding a new tuning flag and new
> >> alternatives affected by the tuning.
> >
> > Why make it a tuning flag? Shouldn't this help unconditionally for code
> > size everywhere? Or is there some drawback?
> 
> There is already a higher priority for registers not requiring REX.
> My patch affects cases where the compiler has to use xmm8-15, and it
> just tries to tell LRA to assign them to non-SSE4 instructions.  I
> doubt it would be of much use for targets other than Silvermont.

When it is just a hint, shouldn't there be something like Ya,???x
or Ya,!x or similar in the SSE4 constraints?  I mean, xmm{8-15} can
still be used; it is just costly.

	Jakub

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 16:27   ` Jakub Jelinek
  2014-07-02 17:08     ` Mike Stump
@ 2014-07-03 10:54     ` Ilya Enkovich
  1 sibling, 0 replies; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-03 10:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andi Kleen, gcc-patches

2014-07-02 20:27 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> On Wed, Jul 02, 2014 at 09:21:25AM -0700, Andi Kleen wrote:
>> Ilya Enkovich <enkovich.gnu@gmail.com> writes:
>>
>> > Silvermont processors have a penalty for instructions with 4+ bytes
>> > of prefixes (including escape bytes in the opcode).  This situation
>> > arises when a REX prefix is used in SSE4 instructions.  This patch
>> > tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
>> > instructions.  I achieved this by adding a new tuning flag and new
>> > alternatives affected by the tuning.
>>
>> Why make it a tuning flag? Shouldn't this help unconditionally for code
>> size everywhere? Or is there some drawback?
>
> I don't think it will make the code smaller: if you already have some
> value in an xmm8..xmm15 register, then not allowing those registers
> directly in SSE4 insns just means reloading it and larger code.
>
> BTW, is that change needed also when emitting AVX insns instead of SSE4?

This is for Silvermont only. It does not have AVX.

Ilya

>
>         Jakub

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 16:21 ` Andi Kleen
  2014-07-02 16:27   ` Jakub Jelinek
@ 2014-07-03 10:49   ` Ilya Enkovich
  2014-07-03 10:56     ` Jakub Jelinek
  1 sibling, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-03 10:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

2014-07-02 20:21 GMT+04:00 Andi Kleen <andi@firstfloor.org>:
> Ilya Enkovich <enkovich.gnu@gmail.com> writes:
>
>> Silvermont processors have a penalty for instructions with 4+ bytes
>> of prefixes (including escape bytes in the opcode).  This situation
>> arises when a REX prefix is used in SSE4 instructions.  This patch
>> tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
>> instructions.  I achieved this by adding a new tuning flag and new
>> alternatives affected by the tuning.
>
> Why make it a tuning flag? Shouldn't this help unconditionally for code
> size everywhere? Or is there some drawback?

There is already a higher priority for registers not requiring REX.
My patch affects cases where the compiler has to use xmm8-15, and it
just tries to tell LRA to assign them to non-SSE4 instructions.  I
doubt it would be of much use for targets other than Silvermont.

Ilya

>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 17:08     ` Mike Stump
@ 2014-07-02 18:30       ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-07-02 18:30 UTC (permalink / raw)
  To: Mike Stump; +Cc: Jakub Jelinek, Ilya Enkovich, gcc-patches, hubicka

Mike Stump <mikestump@comcast.net> writes:

>>>> Silvermont processors have a penalty for instructions with 4+ bytes
>>>> of prefixes (including escape bytes in the opcode).  This situation
>>>> arises when a REX prefix is used in SSE4 instructions.  This patch
>>>> tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
>>>> instructions.  I achieved this by adding a new tuning flag and new
>>>> alternatives affected by the tuning.
>>> 
>>> Why make it a tuning flag? Shouldn't this help unconditionally for code
>>> size everywhere? Or is there some drawback? 
>> 
>> I don't think it will make the code smaller: if you already have some
>> value in an xmm8..xmm15 register, then not allowing those registers
>> directly in SSE4 insns just means reloading it and larger code.
>
> I can't help but think a better way to do this is to model the REX
> registers as more expensive, then let the register allocator prefer
> the cheaper registers.  You then leave them all as valid, which I
> think is better than making half of the registers disappear.

Yes, that sounds like a much better strategy.

BTW, I thought GCC already did that for the integer registers to
avoid unnecessary prefixes, but maybe I misremember.  Copying Honza.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 16:27   ` Jakub Jelinek
@ 2014-07-02 17:08     ` Mike Stump
  2014-07-02 18:30       ` Andi Kleen
  2014-07-03 10:54     ` Ilya Enkovich
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Stump @ 2014-07-02 17:08 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andi Kleen, Ilya Enkovich, gcc-patches

On Jul 2, 2014, at 9:27 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jul 02, 2014 at 09:21:25AM -0700, Andi Kleen wrote:
>> Ilya Enkovich <enkovich.gnu@gmail.com> writes:
>> 
>>> Silvermont processors have a penalty for instructions with 4+ bytes
>>> of prefixes (including escape bytes in the opcode).  This situation
>>> arises when a REX prefix is used in SSE4 instructions.  This patch
>>> tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
>>> instructions.  I achieved this by adding a new tuning flag and new
>>> alternatives affected by the tuning.
>> 
>> Why make it a tuning flag? Shouldn't this help unconditionally for code
>> size everywhere? Or is there some drawback? 
> 
> I don't think it will make the code smaller: if you already have some
> value in an xmm8..xmm15 register, then not allowing those registers
> directly in SSE4 insns just means reloading it and larger code.

I can't help but think a better way to do this is to model the REX registers as more expensive, then let the register allocator prefer the cheaper registers.  You then leave them all as valid, which I think is better than making half of the registers disappear.  Is it really cheaper to spill and reload those than to use a REX register?

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 16:21 ` Andi Kleen
@ 2014-07-02 16:27   ` Jakub Jelinek
  2014-07-02 17:08     ` Mike Stump
  2014-07-03 10:54     ` Ilya Enkovich
  2014-07-03 10:49   ` Ilya Enkovich
  1 sibling, 2 replies; 27+ messages in thread
From: Jakub Jelinek @ 2014-07-02 16:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ilya Enkovich, gcc-patches

On Wed, Jul 02, 2014 at 09:21:25AM -0700, Andi Kleen wrote:
> Ilya Enkovich <enkovich.gnu@gmail.com> writes:
> 
> > Silvermont processors have a penalty for instructions with 4+ bytes
> > of prefixes (including escape bytes in the opcode).  This situation
> > arises when a REX prefix is used in SSE4 instructions.  This patch
> > tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
> > instructions.  I achieved this by adding a new tuning flag and new
> > alternatives affected by the tuning.
> 
> Why make it a tuning flag? Shouldn't this help unconditionally for code
> size everywhere? Or is there some drawback? 

I don't think it will make the code smaller: if you already have some
value in an xmm8..xmm15 register, then not allowing those registers
directly in SSE4 insns just means reloading it and larger code.

BTW, is that change needed also when emitting AVX insns instead of SSE4?

	Jakub

* Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
  2014-07-02 10:44 Ilya Enkovich
@ 2014-07-02 16:21 ` Andi Kleen
  2014-07-02 16:27   ` Jakub Jelinek
  2014-07-03 10:49   ` Ilya Enkovich
  0 siblings, 2 replies; 27+ messages in thread
From: Andi Kleen @ 2014-07-02 16:21 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches

Ilya Enkovich <enkovich.gnu@gmail.com> writes:

> Silvermont processors have a penalty for instructions with 4+ bytes
> of prefixes (including escape bytes in the opcode).  This situation
> arises when a REX prefix is used in SSE4 instructions.  This patch
> tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those
> instructions.  I achieved this by adding a new tuning flag and new
> alternatives affected by the tuning.

Why make it a tuning flag? Shouldn't this help unconditionally for code
size everywhere? Or is there some drawback? 

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

* [PATCH, i386] Add prefixes avoidance tuning for silvermont target
@ 2014-07-02 10:44 Ilya Enkovich
  2014-07-02 16:21 ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2014-07-02 10:44 UTC (permalink / raw)
  To: gcc-patches

Hi,

Silvermont processors have a penalty for instructions with 4+ bytes of prefixes (including escape bytes in the opcode).  This situation arises when a REX prefix is used in SSE4 instructions.  This patch tries to avoid it by preferring xmm0-xmm7 over xmm8-xmm15 in those instructions.  I achieved this by adding a new tuning flag and new alternatives affected by the tuning.
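
For concreteness, the 4-byte condition can be sketched by counting the bytes that precede the opcode byte in a legacy SSE4.1 encoding such as pmulld (66 0F 38 40 /r).  This is a minimal illustrative sketch, not an encoder; the byte values are the documented mandatory prefix and two-byte escape, and the threshold is the one described above:

```python
# Sketch: bytes preceding the opcode in a legacy-encoded SSE4.1 pmulld
# (66 0F 38 40 /r): mandatory 0x66 prefix, optional REX, 0F 38 escape.

def prefix_and_escape_bytes(uses_xmm8_15):
    """Count prefix/escape bytes; REX is needed once xmm8-xmm15 appear."""
    encoding = [0x66]            # mandatory operand-size prefix
    if uses_xmm8_15:
        encoding.append(0x41)    # a REX prefix, e.g. REX.B
    encoding += [0x0F, 0x38]     # two-byte escape, counted toward the limit
    return len(encoding)

# xmm0-xmm7 stay below the 4-byte penalty threshold; xmm8-xmm15 hit it.
assert prefix_and_escape_bytes(False) == 3
assert prefix_and_escape_bytes(True) == 4
```

This is why steering SSE4 operands toward xmm0-xmm7 avoids the decoder penalty on this microarchitecture.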

SSE4 instructions are not very widely used by GCC, but I see some significant gains from this patch (tested on Avoton at -O3).

DENmark (EEMBC2.0) +2%
TCPmark (EEMBC2.0) +32%
SPEC2000 +0.5%
	  
Bootstrapped and tested on linux-x86_64.

Does it look OK for trunk?

Thanks,
Ilya
--
gcc/

2014-07-02  Ilya Enkovich  <ilya.enkovich@intel.com>

	* config/i386/constraints.md (Yr): New.
	* config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	* config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
	which use only NO_REX_SSE_REGS.
	(vec_set<mode>_0): Likewise.
	(*vec_setv4sf_sse4_1): Likewise.
	(sse4_1_insertps): Likewise.
	(*sse4_1_extractps): Likewise.
	(*sse4_1_mulv2siv2di3): Likewise.
	(*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
	(*sse4_1_<code><mode>3): Likewise.
	(*sse4_1_<code><mode>3): Likewise.
	(*sse4_1_eqv2di3): Likewise.
	(sse4_2_gtv2di3): Likewise.
	(*vec_extractv4si): Likewise.
	(*vec_concatv2si_sse4_1): Likewise.
	(vec_concatv2di): Likewise.
	(<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1_avx2>_movntdqa): Likewise.
	(<sse4_1_avx2>_mpsadbw): Likewise.
	(sse4_1_packusdw): Likewise.
	(<sse4_1_avx2>_pblendvb): Likewise.
	(sse4_1_pblendw): Likewise.
	(sse4_1_phminposuw): Likewise.
	(sse4_1_<code>v8qiv8hi2): Likewise.
	(sse4_1_<code>v4qiv4si2): Likewise.
	(sse4_1_<code>v4hiv4si2): Likewise.
	(sse4_1_<code>v2qiv2di2): Likewise.
	(sse4_1_<code>v2hiv2di2): Likewise.
	(sse4_1_<code>v2siv2di2): Likewise.
	(sse4_1_ptest): Likewise.
	(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
	(sse4_1_round<ssescalarmodesuffix>): Likewise.
	* config/i386/subst.md (mask_prefix4): New.
	* config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New.

gcc/testsuites/

2014-07-02  Ilya Enkovich  <ilya.enkovich@intel.com>

	* gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed
	vec_concatv2di template.



diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 8e0a583..d6a0ea9 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -105,6 +105,8 @@
 ;;  d	Integer register when integer DFmode moves are enabled
 ;;  x	Integer register when integer XFmode moves are enabled
 ;;  f	x87 register when 80387 floating point arithmetic is enabled
+;;  r	SSE regs not requiring a REX prefix when prefix avoidance is enabled
+;;	and all SSE regs otherwise
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -147,6 +149,10 @@
  "(ix86_fpmath & FPMATH_387) ? FLOAT_REGS : NO_REGS"
  "@internal Any x87 register when 80387 FP arithmetic is enabled.")
 
+(define_register_constraint "Yr"
+ "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
+ "@internal Lower SSE registers when avoiding the REX prefix, and all SSE registers otherwise.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 9e3ef94..54239e9 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1282,6 +1282,7 @@ enum reg_class
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
+  NO_REX_SSE_REGS,
   SSE_REGS,
   EVEX_SSE_REGS,
   ALL_SSE_REGS,
@@ -1339,6 +1340,7 @@ enum reg_class
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
+   "NO_REX_SSE_REGS",			\
    "SSE_REGS",				\
    "EVEX_SSE_REGS",			\
    "ALL_SSE_REGS",			\
@@ -1378,6 +1380,7 @@ enum reg_class
     { 0x0200,       0x0,   0x0 },       /* FP_SECOND_REG */             \
     { 0xff00,       0x0,   0x0 },       /* FLOAT_REGS */                \
   { 0x200000,       0x0,   0x0 },       /* SSE_FIRST_REG */             \
+{ 0x1fe00000,  0x000000,   0x0 },       /* NO_REX_SSE_REGS */           \
 { 0x1fe00000,  0x1fe000,   0x0 },       /* SSE_REGS */                  \
        { 0x0,0xffe00000,  0x1f },       /* EVEX_SSE_REGS */             \
 { 0x1fe00000,0xffffe000,  0x1f },       /* ALL_SSE_REGS */              \
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..1e561c2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5585,26 +5585,28 @@
 ;; Although insertps takes register source, we prefer
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
-  [(set (match_operand:V2SF 0 "register_operand"     "=x,x,x,x,x,*y ,*y")
+  [(set (match_operand:V2SF 0 "register_operand"     "=Yr,?x,x,Yr,?x,x,x,*y ,*y")
 	(vec_concat:V2SF
-	  (match_operand:SF 1 "nonimmediate_operand" " 0,x,0,x,m, 0 , m")
-	  (match_operand:SF 2 "vector_move_operand"  " x,x,m,m,C,*ym, C")))]
+	  (match_operand:SF 1 "nonimmediate_operand" "  0, 0,x, 0,0, x,m, 0 , m")
+	  (match_operand:SF 2 "vector_move_operand"  " Yr,?x,x, m,m, m,C,*ym, C")))]
   "TARGET_SSE4_1"
   "@
    unpcklps\t{%2, %0|%0, %2}
+   unpcklps\t{%2, %0|%0, %2}
    vunpcklps\t{%2, %1, %0|%0, %1, %2}
    insertps\t{$0x10, %2, %0|%0, %2, 0x10}
+   insertps\t{$0x10, %2, %0|%0, %2, 0x10}
    vinsertps\t{$0x10, %2, %1, %0|%0, %1, %2, 0x10}
    %vmovss\t{%1, %0|%0, %1}
    punpckldq\t{%2, %0|%0, %2}
    movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
-   (set_attr "type" "sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_data16" "*,*,1,*,*,*,*")
-   (set_attr "prefix_extra" "*,*,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,1,1,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+   (set_attr "type" "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_data16" "*,*,*,1,1,*,*,*,*")
+   (set_attr "prefix_extra" "*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
@@ -5652,49 +5654,51 @@
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "vec_set<mode>_0"
   [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
-	  "=x,x,x ,x,x,x,x  ,x  ,m ,m   ,m")
+	  "=Yr,?x,x,x ,x,x,x, Yr, ?x ,x  ,m ,m   ,m")
 	(vec_merge:VI4F_128
 	  (vec_duplicate:VI4F_128
 	    (match_operand:<ssescalarmode> 2 "general_operand"
-	  " x,m,*r,m,x,x,*rm,*rm,!x,!*re,!*fF"))
+	  " Yr,?x,m,*r,m,x,x,*rm,*rm,*rm,!x,!*re,!*fF"))
 	  (match_operand:VI4F_128 1 "vector_move_operand"
-	  " C,C,C ,C,0,x,0  ,x  ,0 ,0   ,0")
+	  "  C,C, C,C ,C,0,x, 0, 0  ,x  ,0 ,0   ,0")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
+   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
    %vmov<ssescalarmodesuffix>\t{%2, %0|%0, %2}
    %vmovd\t{%2, %0|%0, %2}
    movss\t{%2, %0|%0, %2}
    movss\t{%2, %0|%0, %2}
    vmovss\t{%2, %1, %0|%0, %1, %2}
    pinsrd\t{$0, %2, %0|%0, %2, 0}
+   pinsrd\t{$0, %2, %0|%0, %2, 0}
    vpinsrd\t{$0, %2, %1, %0|%0, %1, %2, 0}
    #
    #
    #"
-  [(set_attr "isa" "sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,avx,*,*,*")
+  [(set_attr "isa" "sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
    (set (attr "type")
-     (cond [(eq_attr "alternative" "0,6,7")
+     (cond [(eq_attr "alternative" "0,7,9")
 	      (const_string "sselog")
-	    (eq_attr "alternative" "9")
+	    (eq_attr "alternative" "11")
 	      (const_string "imov")
-	    (eq_attr "alternative" "10")
+	    (eq_attr "alternative" "12")
 	      (const_string "fmov")
 	   ]
 	   (const_string "ssemov")))
-   (set_attr "prefix_extra" "*,*,*,*,*,*,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,*,*,*,*,1,1,*,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,vex,*,*,*")
-   (set_attr "mode" "SF,<ssescalarmode>,SI,SF,SF,SF,TI,TI,*,*,*")])
+   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
+   (set_attr "mode" "SF,SF,<ssescalarmode>,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
 
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,?x,x")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (match_operand:SF 2 "nonimmediate_operand" "xm,xm"))
-	  (match_operand:V4SF 1 "register_operand" "0,x")
+	    (match_operand:SF 2 "nonimmediate_operand" "Yrm,?xm,xm"))
+	  (match_operand:V4SF 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
    && ((unsigned) exact_log2 (INTVAL (operands[3]))
@@ -5704,26 +5708,27 @@
   switch (which_alternative)
     {
     case 0:
-      return "insertps\t{%3, %2, %0|%0, %2, %3}";
     case 1:
+      return "insertps\t{%3, %2, %0|%0, %2, %3}";
+    case 2:
       return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "V4SF")])
 
 (define_insn "sse4_1_insertps"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
-	(unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "xm,xm")
-		      (match_operand:V4SF 1 "register_operand" "0,x")
-		      (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,?x,x")
+	(unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "Yrm,?xm,xm")
+		      (match_operand:V4SF 1 "register_operand" "0,0,x")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 		     UNSPEC_INSERTPS))]
   "TARGET_SSE4_1"
 {
@@ -5737,19 +5742,20 @@
   switch (which_alternative)
     {
     case 0:
-      return "insertps\t{%3, %2, %0|%0, %2, %3}";
     case 1:
+      return "insertps\t{%3, %2, %0|%0, %2, %3}";
+    case 2:
       return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "V4SF")])
 
 (define_split
@@ -5791,13 +5797,14 @@
 })
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=*rm,rm,x,x")
 	(vec_select:SF
-	  (match_operand:V4SF 1 "register_operand" "x,0,x")
-	  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n")])))]
+	  (match_operand:V4SF 1 "register_operand" "Yr,?x,0,x")
+	  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
   "TARGET_SSE4_1"
   "@
    %vextractps\t{%2, %1, %0|%0, %1, %2}
+   %vextractps\t{%2, %1, %0|%0, %1, %2}
    #
    #"
   "&& reload_completed && SSE_REG_P (operands[0])"
@@ -5822,13 +5829,13 @@
     }
   DONE;
 }
-  [(set_attr "isa" "*,noavx,avx")
-   (set_attr "type" "sselog,*,*")
-   (set_attr "prefix_data16" "1,*,*")
-   (set_attr "prefix_extra" "1,*,*")
-   (set_attr "length_immediate" "1,*,*")
-   (set_attr "prefix" "maybe_vex,*,*")
-   (set_attr "mode" "V4SF,*,*")])
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "type" "sselog,sselog,*,*")
+   (set_attr "prefix_data16" "1,1,*,*")
+   (set_attr "prefix_extra" "1,1,*,*")
+   (set_attr "length_immediate" "1,1,*,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,*,*")
+   (set_attr "mode" "V4SF,V4SF,*,*")])
 
 (define_insn_and_split "*vec_extractv4sf_mem"
   [(set (match_operand:SF 0 "register_operand" "=x,*r,f")
@@ -7980,25 +7987,26 @@
   "ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);")
 
 (define_insn "*sse4_1_mulv2siv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,?x,x")
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%0,x")
+	      (match_operand:V4SI 1 "nonimmediate_operand" "%0,0,x")
 	      (parallel [(const_int 0) (const_int 2)])))
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 2 "nonimmediate_operand" "xm,xm")
+	      (match_operand:V4SI 2 "nonimmediate_operand" "Yrm,?xm,xm")
 	      (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, V4SImode, operands)"
   "@
    pmuldq\t{%2, %0|%0, %2}
+   pmuldq\t{%2, %0|%0, %2}
    vpmuldq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseimul")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_expand "avx2_pmaddwd"
@@ -8155,19 +8163,20 @@
 })
 
 (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
-  [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v")
+  [(set (match_operand:VI4_AVX512F 0 "register_operand" "=Yr,?x,v")
 	(mult:VI4_AVX512F
-	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,0,v")
+	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "Yrm,?xm,vm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    pmulld\t{%2, %0|%0, %2}
+   pmulld\t{%2, %0|%0, %2}
    vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "<mask_prefix3>")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "<mask_prefix4>")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "mul<mode>3"
@@ -8524,18 +8533,19 @@
 })
 
 (define_insn "*sse4_1_<code><mode>3"
-  [(set (match_operand:VI14_128 0 "register_operand" "=x,x")
+  [(set (match_operand:VI14_128 0 "register_operand" "=Yr,?x,x")
 	(smaxmin:VI14_128
-	  (match_operand:VI14_128 1 "nonimmediate_operand" "%0,x")
-	  (match_operand:VI14_128 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:VI14_128 1 "nonimmediate_operand" "%0,0,x")
+	  (match_operand:VI14_128 2 "nonimmediate_operand" "Yrm,?xm,xm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
+   p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*<code>v8hi3"
@@ -8605,18 +8615,19 @@
 })
 
 (define_insn "*sse4_1_<code><mode>3"
-  [(set (match_operand:VI24_128 0 "register_operand" "=x,x")
+  [(set (match_operand:VI24_128 0 "register_operand" "=Yr,?x,x")
 	(umaxmin:VI24_128
-	  (match_operand:VI24_128 1 "nonimmediate_operand" "%0,x")
-	  (match_operand:VI24_128 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:VI24_128 1 "nonimmediate_operand" "%0,0,x")
+	  (match_operand:VI24_128 2 "nonimmediate_operand" "Yrm,?xm,xm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
+   p<maxmin_int><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
-   (set_attr "prefix_extra" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*<code>v16qi3"
@@ -8684,18 +8695,19 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "*sse4_1_eqv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,?x,x")
 	(eq:V2DI
-	  (match_operand:V2DI 1 "nonimmediate_operand" "%0,x")
-	  (match_operand:V2DI 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:V2DI 1 "nonimmediate_operand" "%0,0,x")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "Yrm,?xm,xm")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (EQ, V2DImode, operands)"
   "@
    pcmpeqq\t{%2, %0|%0, %2}
+   pcmpeqq\t{%2, %0|%0, %2}
    vpcmpeqq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*sse2_eq<mode>3"
@@ -8731,18 +8743,19 @@
   "ix86_fixup_binary_operands_no_copy (EQ, V2DImode, operands);")
 
 (define_insn "sse4_2_gtv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,?x,x")
 	(gt:V2DI
-	  (match_operand:V2DI 1 "register_operand" "0,x")
-	  (match_operand:V2DI 2 "nonimmediate_operand" "xm,xm")))]
+	  (match_operand:V2DI 1 "register_operand" "0,0,x")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "Yrm,?xm,xm")))]
   "TARGET_SSE4_2"
   "@
    pcmpgtq\t{%2, %0|%0, %2}
+   pcmpgtq\t{%2, %0|%0, %2}
    vpcmpgtq\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "avx2_gt<mode>3"
@@ -10432,9 +10445,9 @@
   "operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));")
 
 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,x,x")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,Yr,?x,x")
 	(vec_select:SI
-	  (match_operand:V4SI 1 "register_operand" "x,0,x")
+	  (match_operand:V4SI 1 "register_operand" "x,0,0,x")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
@@ -10444,10 +10457,11 @@
       return "%vpextrd\t{%2, %1, %0|%0, %1, %2}";
 
     case 1:
+    case 2:
       operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "psrldq\t{%2, %0|%0, %2}";
 
-    case 2:
+    case 3:
       operands [2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -10455,11 +10469,11 @@
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,noavx,avx")
-   (set_attr "type" "sselog1,sseishft1,sseishft1")
-   (set_attr "prefix_extra" "1,*,*")
+  [(set_attr "isa" "*,noavx,noavx,avx")
+   (set_attr "type" "sselog1,sseishft1,sseishft1,sseishft1")
+   (set_attr "prefix_extra" "1,*,*,*")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,orig,vex")
+   (set_attr "prefix" "maybe_vex,orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv4si_zext"
@@ -10566,25 +10580,27 @@
    (set_attr "mode" "TI,TI,DF,V4SF")])
 
 (define_insn "*vec_concatv2si_sse4_1"
-  [(set (match_operand:V2SI 0 "register_operand"     "=x, x,x,x, x, *y,*y")
+  [(set (match_operand:V2SI 0 "register_operand"     "=Yr,?x,x, Yr,?x,x, x, *y,*y")
 	(vec_concat:V2SI
-	  (match_operand:SI 1 "nonimmediate_operand" " 0, x,0,x,rm,  0,rm")
-	  (match_operand:SI 2 "vector_move_operand"  "rm,rm,x,x, C,*ym, C")))]
+	  (match_operand:SI 1 "nonimmediate_operand" "  0, 0,x,  0,0, x,rm,  0,rm")
+	  (match_operand:SI 2 "vector_move_operand"  " rm,rm,rm,Yr,?x,x, C,*ym, C")))]
   "TARGET_SSE4_1"
   "@
    pinsrd\t{$1, %2, %0|%0, %2, 1}
+   pinsrd\t{$1, %2, %0|%0, %2, 1}
    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
    punpckldq\t{%2, %0|%0, %2}
+   punpckldq\t{%2, %0|%0, %2}
    vpunpckldq\t{%2, %1, %0|%0, %1, %2}
    %vmovd\t{%1, %0|%0, %1}
    punpckldq\t{%2, %0|%0, %2}
    movd\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
-   (set_attr "type" "sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
-   (set_attr "prefix_extra" "1,1,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,*,*,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig")
-   (set_attr "mode" "TI,TI,TI,TI,TI,DI,DI")])
+  [(set_attr "isa" "noavx,noavx,avx,noavx,noavx,avx,*,*,*")
+   (set_attr "type" "sselog,sselog,sselog,sselog,sselog,sselog,ssemov,mmxcvt,mmxmov")
+   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
 ;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
@@ -10627,15 +10643,16 @@
 ;; movd instead of movq is required to handle broken assemblers.
 (define_insn "vec_concatv2di"
   [(set (match_operand:V2DI 0 "register_operand"
-	  "=x,x ,Yi,x ,!x,x,x,x,x,x")
+	  "=Yr,?x,x ,Yi,x ,!x,x,x,x,x,x")
 	(vec_concat:V2DI
 	  (match_operand:DI 1 "nonimmediate_operand"
-	  " 0,x ,r ,xm,*y,0,x,0,0,x")
+	  "  0, 0,x ,r ,xm,*y,0,x,0,0,x")
 	  (match_operand:DI 2 "vector_move_operand"
-	  "rm,rm,C ,C ,C ,x,x,x,m,m")))]
+	  "*rm,rm,rm,C ,C ,C ,x,x,x,m,m")))]
   "TARGET_SSE"
   "@
    pinsrq\t{$1, %2, %0|%0, %2, 1}
+   pinsrq\t{$1, %2, %0|%0, %2, 1}
    vpinsrq\t{$1, %2, %1, %0|%0, %1, %2, 1}
    * return HAVE_AS_IX86_INTERUNIT_MOVQ ? \"%vmovq\t{%1, %0|%0, %1}\" : \"%vmovd\t{%1, %0|%0, %1}\";
    %vmovq\t{%1, %0|%0, %1}
@@ -10645,17 +10662,17 @@
    movlhps\t{%2, %0|%0, %2}
    movhps\t{%2, %0|%0, %2}
    vmovhps\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "x64_sse4_noavx,x64_avx,x64,sse2,sse2,sse2_noavx,avx,noavx,noavx,avx")
+  [(set_attr "isa" "x64_sse4_noavx,x64_sse4_noavx,x64_avx,x64,sse2,sse2,sse2_noavx,avx,noavx,noavx,avx")
    (set (attr "type")
      (if_then_else
-       (eq_attr "alternative" "0,1,5,6")
+       (eq_attr "alternative" "0,2,6,7")
        (const_string "sselog")
        (const_string "ssemov")))
-   (set_attr "prefix_rex" "1,1,1,*,*,*,*,*,*,*")
-   (set_attr "prefix_extra" "1,1,*,*,*,*,*,*,*,*")
-   (set_attr "length_immediate" "1,1,*,*,*,*,*,*,*,*")
-   (set_attr "prefix" "orig,vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex")
-   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
+   (set_attr "prefix_rex" "1,1,1,1,*,*,*,*,*,*,*")
+   (set_attr "prefix_extra" "1,1,1,*,*,*,*,*,*,*,*")
+   (set_attr "length_immediate" "1,1,1,*,*,*,*,*,*,*,*")
+   (set_attr "prefix" "orig,orig,vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex")
+   (set_attr "mode" "TI,TI,TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
 
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<sseunpackmode> 0 "register_operand")
@@ -11552,91 +11569,95 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,?x,x")
 	(vec_merge:VF_128_256
-	  (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	  (match_operand:VF_128_256 1 "register_operand" "0,x")
+	  (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,?xm,xm")
+	  (match_operand:VF_128_256 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_<blendbits>_operand")))]
   "TARGET_SSE4_1"
   "@
    blend<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   blend<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vblend<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,?x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "register_operand" "0,x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:VF_128_256 3 "register_operand" "Yz,x")]
+	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,?xm,xm")
+	   (match_operand:VF_128_256 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "@
    blendv<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   blendv<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vblendv<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector") 
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector") 
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,?x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,x")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "Yrm,?xm,xm")
+	   (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 	  UNSPEC_DP))]
   "TARGET_SSE4_1"
   "@
    dp<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   dp<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vdp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemul")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1_avx2>_movntdqa"
-  [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=x, v")
-	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m")]
+  [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=Yr,?x, v")
+	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m, m")]
 		     UNSPEC_MOVNTDQA))]
   "TARGET_SSE4_1"
   "%vmovntdqa\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
-   (set_attr "prefix_extra" "1, *")
-   (set_attr "prefix" "maybe_vex, evex")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "maybe_vex,maybe_vex,evex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse4_1_avx2>_mpsadbw"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,?x,x")
 	(unspec:VI1_AVX2
-	  [(match_operand:VI1_AVX2 1 "register_operand" "0,x")
-	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:SI 3 "const_0_to_255_operand" "n,n")]
+	  [(match_operand:VI1_AVX2 1 "register_operand" "0,0,x")
+	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "Yrm,?xm,xm")
+	   (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 	  UNSPEC_MPSADBW))]
   "TARGET_SSE4_1"
   "@
    mpsadbw\t{%3, %2, %0|%0, %2, %3}
+   mpsadbw\t{%3, %2, %0|%0, %2, %3}
    vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_packusdw"
@@ -11654,56 +11675,59 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_packusdw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,?x,x")
 	(vec_concat:V8HI
 	  (us_truncate:V4HI
-	    (match_operand:V4SI 1 "register_operand" "0,x"))
+	    (match_operand:V4SI 1 "register_operand" "0,0,x"))
 	  (us_truncate:V4HI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,xm"))))]
+	    (match_operand:V4SI 2 "nonimmediate_operand" "Yrm,?xm,xm"))))]
   "TARGET_SSE4_1"
   "@
    packusdw\t{%2, %0|%0, %2}
+   packusdw\t{%2, %0|%0, %2}
    vpackusdw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "<sse4_1_avx2>_pblendvb"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,?x,x")
 	(unspec:VI1_AVX2
-	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,x")
-	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
-	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,x")]
+	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
+	   (match_operand:VI1_AVX2 2 "nonimmediate_operand" "Yrm,?xm,xm")
+	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "@
    pblendvb\t{%3, %2, %0|%0, %2, %3}
+   pblendvb\t{%3, %2, %0|%0, %2, %3}
    vpblendvb\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "*,1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "vector,vector")
+   (set_attr "length_immediate" "*,*,1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "sse4_1_pblendw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,?x,x")
 	(vec_merge:V8HI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm")
-	  (match_operand:V8HI 1 "register_operand" "0,x")
-	  (match_operand:SI 3 "const_0_to_255_operand" "n,n")))]
+	  (match_operand:V8HI 2 "nonimmediate_operand" "Yrm,?xm,xm")
+	  (match_operand:V8HI 1 "register_operand" "0,0,x")
+	  (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")))]
   "TARGET_SSE4_1"
   "@
    pblendw\t{%3, %2, %0|%0, %2, %3}
+   pblendw\t{%3, %2, %0|%0, %2, %3}
    vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
 ;; The builtin uses an 8-bit immediate.  Expand that.
@@ -11751,8 +11775,8 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "sse4_1_phminposuw"
-  [(set (match_operand:V8HI 0 "register_operand" "=x")
-	(unspec:V8HI [(match_operand:V8HI 1 "nonimmediate_operand" "xm")]
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,?x")
+	(unspec:V8HI [(match_operand:V8HI 1 "nonimmediate_operand" "Yrm,?xm")]
 		     UNSPEC_PHMINPOSUW))]
   "TARGET_SSE4_1"
   "%vphminposuw\t{%1, %0|%0, %1}"
@@ -11773,10 +11797,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v8qiv8hi2"
-  [(set (match_operand:V8HI 0 "register_operand" "=x")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,?x")
 	(any_extend:V8HI
 	  (vec_select:V8QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,?xm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
@@ -11816,10 +11840,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v4qiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=x")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,?x")
 	(any_extend:V4SI
 	  (vec_select:V4QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,?xm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1"
@@ -11852,10 +11876,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v4hiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=x")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,?x")
 	(any_extend:V4SI
 	  (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,?xm")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1"
@@ -11896,10 +11920,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2qiv2di2"
-  [(set (match_operand:V2DI 0 "register_operand" "=x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,?x")
 	(any_extend:V2DI
 	  (vec_select:V2QI
-	    (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	    (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,?xm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1"
   "%vpmov<extsuffix>bq\t{%1, %0|%0, %w1}"
@@ -11934,10 +11958,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2hiv2di2"
-  [(set (match_operand:V2DI 0 "register_operand" "=x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,?x")
 	(any_extend:V2DI
 	  (vec_select:V2HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,?xm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1"
   "%vpmov<extsuffix>wq\t{%1, %0|%0, %k1}"
@@ -11968,10 +11992,10 @@
    (set_attr "mode" "OI")])
 
 (define_insn "sse4_1_<code>v2siv2di2"
-  [(set (match_operand:V2DI 0 "register_operand" "=x")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,?x")
 	(any_extend:V2DI
 	  (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "xm")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "Yrm,?xm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE4_1"
   "%vpmov<extsuffix>dq\t{%1, %0|%0, %q1}"
@@ -12012,8 +12036,8 @@
 
 (define_insn "sse4_1_ptest"
   [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(match_operand:V2DI 0 "register_operand" "x")
-		    (match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+	(unspec:CC [(match_operand:V2DI 0 "register_operand" "Yr,?x")
+		    (match_operand:V2DI 1 "nonimmediate_operand" "Yrm,?xm")]
 		   UNSPEC_PTEST))]
   "TARGET_SSE4_1"
   "%vptest\t{%1, %0|%0, %1}"
@@ -12023,10 +12047,10 @@
    (set_attr "mode" "TI")])
 
 (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,?x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "xm")
-	   (match_operand:SI 2 "const_0_to_15_operand" "n")]
+	  [(match_operand:VF_128_256 1 "nonimmediate_operand" "Yrm,?xm")
+	   (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
 	  UNSPEC_ROUND))]
   "TARGET_ROUND"
   "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
@@ -12106,24 +12130,25 @@
 })
 
 (define_insn "sse4_1_round<ssescalarmodesuffix>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=Yr,?x,x")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "register_operand" "x,x")
-	     (match_operand:SI 3 "const_0_to_15_operand" "n,n")]
+	    [(match_operand:VF_128 2 "register_operand" "Yr,?x,x")
+	     (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")]
 	    UNSPEC_ROUND)
-	  (match_operand:VF_128 1 "register_operand" "0,x")
+	  (match_operand:VF_128 1 "register_operand" "0,0,x")
 	  (const_int 1)))]
   "TARGET_ROUND"
   "@
    round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
    vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "<MODE>")])
 
 (define_expand "round<mode>2"
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 1654cba..c6d7178 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -57,6 +57,7 @@
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
 (define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
 (define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex")
+(define_subst_attr "mask_prefix4" "mask" "orig,orig,vex" "evex")
 
 (define_subst "mask"
   [(set (match_operand:SUBST_V 0)
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index cb44dc3..825ef14 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -395,6 +395,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb",
 DEF_TUNE (X86_TUNE_VECTOR_PARALLEL_EXECUTION, "vec_parallel",
           m_NEHALEM | m_SANDYBRIDGE | m_HASWELL)
 
+/* X86_TUNE_AVOID_4BYTE_PREFIXES: Avoid instructions requiring 4+ bytes of prefixes.  */
+DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
+          m_SILVERMONT | m_INTEL)
+
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
 /*****************************************************************************/
diff --git a/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c b/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
index 6a50573..4739e4f 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c
@@ -10,4 +10,4 @@ test (long long b)
   return _mm_cvtsi64_si128 (b); 
 }
 
-/* { dg-final { scan-assembler-times "vec_concatv2di/3" 1 } } */
+/* { dg-final { scan-assembler-times "vec_concatv2di/4" 1 } } */

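All of the pattern changes in this patch follow one idiom, which may be easier to see in isolation. Below is a minimal, hypothetical `define_insn` sketch (not taken from the patch; the insn name, unspec, and mnemonics are made up for illustration) showing how a two-alternative noavx/avx pattern is split so that the preferred noavx alternative is restricted to the REX-free `Yr` class:

```lisp
;; Hypothetical sketch of the idiom used throughout this patch:
;; the single noavx alternative "x" becomes "Yr" (xmm0-xmm7, so no
;; REX prefix is needed) plus a disparaged "?x" fallback, while the
;; avx alternative is left untouched.
(define_insn "*example_sse4_1_insn"
  [(set (match_operand:V4SI 0 "register_operand" "=Yr,?x,x")
        (unspec:V4SI
          [(match_operand:V4SI 1 "register_operand" "0,0,x")
           (match_operand:V4SI 2 "nonimmediate_operand" "Yrm,?xm,xm")]
          UNSPEC_EXAMPLE))]
  "TARGET_SSE4_1"
  "@
   example\t{%2, %0|%0, %2}
   example\t{%2, %0|%0, %2}
   vexample\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "isa" "noavx,noavx,avx")
   (set_attr "prefix" "orig,orig,vex")
   (set_attr "mode" "TI")])
```

The `?` modifier only disparages the second alternative: the allocator still falls back to xmm8-xmm15 when xmm0-xmm7 are unavailable, so correctness is unaffected and only the cost of the REX-prefixed encoding changes.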