[MIPS] Add sbasic supoert ffor MSA (SIMD)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [MIPS] Add sbasic supoert ffor MSA (SIMD)
@ 2014-05-21 13:50 Graham Stott
  2014-05-21 17:59 ` Joseph S. Myers
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Graham Stott @ 2014-05-21 13:50 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1859 bytes --]

RichardS

I sent this last week but it never made it to the list

This is the latest incarnation of the patch to add basic SIMD support to the MIPS backend
find attached a msa.tgz with MSA.patch and MSA.ChangeLog.

RichardS as you are aware from the various patches that have been hitting your inbox
recently that the  MIPS backend is undergoing some churn with many pending patches.
For things such as  LRA, FPXX/modeless, R6 and MSA has is now added to that list.

A few things to be note.

FPXX/modeless.
=============
Currently MSA is using  its own ABI but will switch to use the new FPXX/modeless ABI 
making things easier when mixing -msa and -mno-msa code together.

The existing MSA ABI is a stop gap until then.

A patch switch to the new MSA ABI will follow.

LRA.
====
MSA doesn't have an explicit dependency on LRA.

R6
==
MSA doesn't have an explicit dependency on LRA.

Pipeline description
=================
A pipeline description p5600.md and updates to mips-msa.md 
to use that when scheduling MSA code.

Header msa.h for MSA.
====================
msa.h  is  included, which will be installed when configured/built/installed.
 It provides prototypes, typedefs etc for the vector types used MSA its contents follow the MSA whitepaper.

No testsuite effect target bits autovect.
=================================
The autovec tests are skipped for MSA because of the absence of any effective target
configuration that enable these tests on MIPS.

A testsuite patch to add the basic mips effective target bits for MSA will follow.

No cost model for autovectorization.
===============================
This  MSA support included does not include any cost model for use by autovectorization.

A basic cost model for MSA will follow with later patches that refine the cost model.

Graham 

[-- Attachment #2: msa.tgz --]
[-- Type: application/x-compressed, Size: 56921 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-21 13:50 [MIPS] Add sbasic supoert ffor MSA (SIMD) Graham Stott
@ 2014-05-21 17:59 ` Joseph S. Myers
  2014-05-28  8:03   ` Matthew Fortune
  2014-05-21 19:21 ` Richard Henderson
  2014-06-01 20:42 ` Richard Sandiford
  2 siblings, 1 reply; 11+ messages in thread
From: Joseph S. Myers @ 2014-05-21 17:59 UTC (permalink / raw)
  To: Graham Stott; +Cc: gcc-patches

On Wed, 21 May 2014, Graham Stott wrote:

> msa.h  is  included, which will be installed when configured/built/installed.
>  It provides prototypes, typedefs etc for the vector types used MSA its contents follow the MSA whitepaper.

Unless it's part of the defined interface that the user may not have 
macros called "vector_size", "aligned", "a", "b" and "c", you should use 
the __*__ attribute names, and __a etc. parameter names, to be 
namespace-clean.

You shouldn't need to declare __builtin_* functions anyway.  And if a 
function can be represented directly with GNU C vector extensions, it's 
preferred to implement it that way inline in the header rather than having 
built-in functions duplicating existing GNU C functionality.  (Look at 
what AArch64 arm_neon.h does where possible, and what ARM arm_neon.h has 
been moved towards lately.  I don't now what the msa.h functions do, so I 
don't know if this actually applies to any of them - but it's something to 
consider, so that built-in functions are only defined where actually 
needed.)

Use appropriate @dots{} and @minus{} markup in the documentation.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-21 13:50 [MIPS] Add sbasic supoert ffor MSA (SIMD) Graham Stott
  2014-05-21 17:59 ` Joseph S. Myers
@ 2014-05-21 19:21 ` Richard Henderson
  2014-06-01 20:42 ` Richard Sandiford
  2 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2014-05-21 19:21 UTC (permalink / raw)
  To: Graham Stott, gcc-patches

> +(define_expand "one_cmpl<mode>2"
> +  [(match_operand:IMSA 0 "register_operand")
> +   (match_operand:IMSA 1 "register_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  if (<MODE>mode == V16QImode)
> +    emit_insn (gen_msa_nori_b (operands[0], operands[1], const0_rtx));
> +  else
> +    {
> +      rtx reg = gen_reg_rtx (<MODE>mode);
> +      emit_insn (gen_msa_ldi<mode> (reg, const0_rtx));
> +      emit_insn (gen_msa_nor_v_<msafmt> (operands[0], reg, operands[1]));
> +    }
> +  DONE;
> +})

Surely ~(x | x) would be preferable to ~(x | 0) if you actually have to load 0
into a register.

And most definitely combine would prefer to see NOT instead of a complex
expression.  I think you're better off with a define_insn than an expand.


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-21 17:59 ` Joseph S. Myers
@ 2014-05-28  8:03   ` Matthew Fortune
  2014-05-28  8:25     ` pinskia
  2014-05-28 14:28     ` Richard Earnshaw
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Fortune @ 2014-05-28  8:03 UTC (permalink / raw)
  To: Joseph S. Myers, Graham Stott
  Cc: gcc-patches, Ilie Garbacea, Rich Fuhler, Doug Gilmore

> You shouldn't need to declare __builtin_* functions anyway.  And if a
> function can be represented directly with GNU C vector extensions, it's
> preferred to implement it that way inline in the header rather than having
> built-in functions duplicating existing GNU C functionality.  (Look at
> what AArch64 arm_neon.h does where possible, and what ARM arm_neon.h has
> been moved towards lately.  I don't now what the msa.h functions do, so I
> don't know if this actually applies to any of them - but it's something to
> consider, so that built-in functions are only defined where actually
> needed.)

In the aarch64 arm_neon.h header there are a decent number of inline asm
implementations too instead of builtins. It is not immediately obvious to me
as to what the deciding factor is between adding a builtin and using inline
asm when vector extensions do not support the operation. Do you happen to
know why inline asm is used in places?

This looks like a reasonable idea to use GNU extensions where available. The
down-side to this approach is that it may be necessary to write quite
dis-similar headers for LLVM vs GCC which I think is part of the reason why
the header is written as it is. I don't know if that is a good reason to
require builtins or not though.

Regards,
Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-28  8:03   ` Matthew Fortune
@ 2014-05-28  8:25     ` pinskia
  2014-05-28 14:28     ` Richard Earnshaw
  1 sibling, 0 replies; 11+ messages in thread
From: pinskia @ 2014-05-28  8:25 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Joseph S. Myers, Graham Stott, gcc-patches, Ilie Garbacea,
	Rich Fuhler, Doug Gilmore



On May 28, 2014, at 1:03 AM, Matthew Fortune <Matthew.Fortune@imgtec.com> wrote:

>> You shouldn't need to declare __builtin_* functions anyway.  And if a
>> function can be represented directly with GNU C vector extensions, it's
>> preferred to implement it that way inline in the header rather than having
>> built-in functions duplicating existing GNU C functionality.  (Look at
>> what AArch64 arm_neon.h does where possible, and what ARM arm_neon.h has
>> been moved towards lately.  I don't now what the msa.h functions do, so I
>> don't know if this actually applies to any of them - but it's something to
>> consider, so that built-in functions are only defined where actually
>> needed.)
> 
> In the aarch64 arm_neon.h header there are a decent number of inline asm
> implementations too instead of builtins. It is not immediately obvious to me
> as to what the deciding factor is between adding a builtin and using inline
> asm when vector extensions do not support the operation. Do you happen to
> know why inline asm is used in places?

Most likely simplify implementation at the time. Inline-asm is useless when it comes to scheduling code.  So the answer should be easy there. 


> 
> This looks like a reasonable idea to use GNU extensions where available. The
> down-side to this approach is that it may be necessary to write quite
> dis-similar headers for LLVM vs GCC which I think is part of the reason why
> the header is written as it is. I don't know if that is a good reason to
> require builtins or not though.

Well clang supports Opencl and Opencl got many of its vector behaviors from gcc. So I doubt it would be too hard for so ifdefs in there.

Thanks,
Andrew

> 
> Regards,
> Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-28  8:03   ` Matthew Fortune
  2014-05-28  8:25     ` pinskia
@ 2014-05-28 14:28     ` Richard Earnshaw
  2014-05-28 17:49       ` Mike Stump
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Earnshaw @ 2014-05-28 14:28 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Joseph S. Myers, Graham Stott, gcc-patches, Ilie Garbacea,
	Rich Fuhler, Doug Gilmore

On 28/05/14 09:03, Matthew Fortune wrote:
>> You shouldn't need to declare __builtin_* functions anyway.  And if a
>> function can be represented directly with GNU C vector extensions, it's
>> preferred to implement it that way inline in the header rather than having
>> built-in functions duplicating existing GNU C functionality.  (Look at
>> what AArch64 arm_neon.h does where possible, and what ARM arm_neon.h has
>> been moved towards lately.  I don't now what the msa.h functions do, so I
>> don't know if this actually applies to any of them - but it's something to
>> consider, so that built-in functions are only defined where actually
>> needed.)
> 
> In the aarch64 arm_neon.h header there are a decent number of inline asm
> implementations too instead of builtins. It is not immediately obvious to me
> as to what the deciding factor is between adding a builtin and using inline
> asm when vector extensions do not support the operation. Do you happen to
> know why inline asm is used in places?
> 

Speed of implementation.  We're gradually replacing these with proper
builtins, but that takes a lot more work.

> This looks like a reasonable idea to use GNU extensions where available. The
> down-side to this approach is that it may be necessary to write quite
> dis-similar headers for LLVM vs GCC which I think is part of the reason why
> the header is written as it is. I don't know if that is a good reason to
> require builtins or not though.
> 

I regard these headers as part of the compiler.  In an ideal world the
contents of arm_neon.h should be replacable with

#pragma GCC neon_intrinsics

which would make parsing pretty much instantaneous.

It's never that simple, though; sadly :-(

R.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-28 14:28     ` Richard Earnshaw
@ 2014-05-28 17:49       ` Mike Stump
  2014-05-29  9:09         ` Matthew Fortune
  2014-05-29  9:39         ` Ramana Radhakrishnan
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Stump @ 2014-05-28 17:49 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Matthew Fortune, Joseph S. Myers, Graham Stott, gcc-patches,
	Ilie Garbacea, Rich Fuhler, Doug Gilmore

On May 28, 2014, at 7:27 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
> 
> Speed of implementation.  We're gradually replacing these with proper
> builtins, but that takes a lot more work.

As an owner of a port with more builtins that yours, I can offer a technological solution to reduce the cost of builtins to:

(define_builtin “my_stop"
  [
    (define_outputs [(void_operand 0)])
    (define_rtl_pattern “my_stop" [])
  ]
)

(define_insn “my_stop"
  [(unspec_volatile [(const_int 0)]
                    UNSPECV_STOP)]
  ""
  “stop”)

for example.  This creates the builtins, allows overloading, allows input/output parameters, can reorder operands, allows for complex types, allows memory reference parameters, allows pure markings, does vectors, conditional availability, generates documentation, creates test suites and more.  If you wire up a speaker it even sings.

Someone would have have to step forward with a need and some time to port their port over to the new scheme and help with the reason for why the technology should go in.  It is mostly contained in 5600 lines of self contained python code, and is built to solve the problem generally.  It adds about 800 lines to builtins.c.  It has a macro system that is more powerful than the macro system .md files use, so one gets to share and collapse builtins rather nicely.  It is known to work for C and C++.  Other languages may need extending; C for example cost is around 250 lines to support.

One promise, you will never have to create an argument list, or a type, for example here is a two output, type input functional instruction with some doc content:

(define_mode_iterator MYTYPE
        [V8QI V4HI V2SI DI ...])

(define_builtin “my_foo” "my_foo2_<type>"
  [
    (define_desc    “Doc string for operation")
    (define_outputs [(var_operand:T_MYTYPE 0)
                     (var_operand:T_MYTYPE 1)])
    (define_inputs  [(var_operand:T_MYTYPE 2)
                     (var_operand:T_MYTYPE 3)])
    (define_rtl_pattern “my_foo2_<mode>" [0 2 1 3])
    (attributes [pure])
  ]
)

I stripped it so you can’t know what the instruction was, but you get a flavor of multiple outputs, doc bits, pure, overloading, arguments and argument rearranging.

Let me know if you’re interested.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-28 17:49       ` Mike Stump
@ 2014-05-29  9:09         ` Matthew Fortune
  2014-06-03 18:57           ` Richard Sandiford
  2014-05-29  9:39         ` Ramana Radhakrishnan
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Fortune @ 2014-05-29  9:09 UTC (permalink / raw)
  To: Mike Stump, Richard Sandiford
  Cc: Joseph S. Myers, Graham Stott, gcc-patches, Ilie Garbacea,
	Rich Fuhler, Doug Gilmore, Richard Earnshaw

Mike Stump <mikestump@comcast.net> writes:
> On May 28, 2014, at 7:27 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
> >
> > Speed of implementation.  We're gradually replacing these with proper
> > builtins, but that takes a lot more work.
> 
> As an owner of a port with more builtins that yours, I can offer a
> technological solution to reduce the cost of builtins to:
> 
> (define_builtin "my_stop"
>   [
>     (define_outputs [(void_operand 0)])
>     (define_rtl_pattern "my_stop" [])
>   ]
> )
> 
> (define_insn "my_stop"
>   [(unspec_volatile [(const_int 0)]
>                     UNSPECV_STOP)]
>   ""
>   "stop")
> 
> for example.  This creates the builtins, allows overloading, allows
> input/output parameters, can reorder operands, allows for complex types,
> allows memory reference parameters, allows pure markings, does vectors,
> conditional availability, generates documentation, creates test suites and
> more.  If you wire up a speaker it even sings.
> 
> Someone would have have to step forward with a need and some time to port
> their port over to the new scheme and help with the reason for why the
> technology should go in.  It is mostly contained in 5600 lines of self
> contained python code, and is built to solve the problem generally.  It adds
> about 800 lines to builtins.c.  It has a macro system that is more powerful
> than the macro system .md files use, so one gets to share and collapse
> builtins rather nicely.  It is known to work for C and C++.  Other languages
> may need extending; C for example cost is around 250 lines to support.

Myself and others at IMG would be interested in reviewing/evaluating the
implementation and assuming it looks useful then we would of course help to
get it in shape for submission.
 
> One promise, you will never have to create an argument list, or a type, for
> example here is a two output, type input functional instruction with some
> doc content:
> 
> (define_mode_iterator MYTYPE
>         [V8QI V4HI V2SI DI ...])
> 
> (define_builtin "my_foo" "my_foo2_<type>"
>   [
>     (define_desc    "Doc string for operation")
>     (define_outputs [(var_operand:T_MYTYPE 0)
>                      (var_operand:T_MYTYPE 1)])
>     (define_inputs  [(var_operand:T_MYTYPE 2)
>                      (var_operand:T_MYTYPE 3)])
>     (define_rtl_pattern "my_foo2_<mode>" [0 2 1 3])
>     (attributes [pure])
>   ]
> )
> 
> I stripped it so you can't know what the instruction was, but you get a
> flavor of multiple outputs, doc bits, pure, overloading, arguments and
> argument rearranging.

Can you post the implementation as an RFC? I suspect the python aspect
will cause the most trouble as GCC builds do not currently require python
I guess that could change depending on the value added. Otherwise it would
be a rewrite I guess.

Before digging in too deep though it would be useful to know if RichardS
would be willing to consider this kind of thing for the MIPS port?

Regards,
Matthew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-28 17:49       ` Mike Stump
  2014-05-29  9:09         ` Matthew Fortune
@ 2014-05-29  9:39         ` Ramana Radhakrishnan
  1 sibling, 0 replies; 11+ messages in thread
From: Ramana Radhakrishnan @ 2014-05-29  9:39 UTC (permalink / raw)
  To: Mike Stump
  Cc: Richard Earnshaw, Matthew Fortune, Joseph S. Myers, Graham Stott,
	gcc-patches, Ilie Garbacea, Rich Fuhler, Doug Gilmore

On Wed, May 28, 2014 at 6:49 PM, Mike Stump <mikestump@comcast.net> wrote:
> On May 28, 2014, at 7:27 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>
>> Speed of implementation.  We're gradually replacing these with proper
>> builtins, but that takes a lot more work.
>
> As an owner of a port with more builtins that yours, I can offer a technological solution to reduce the cost of builtins to:
>
> (define_builtin “my_stop"
>   [
>     (define_outputs [(void_operand 0)])
>     (define_rtl_pattern “my_stop" [])
>   ]
> )
>
> (define_insn “my_stop"
>   [(unspec_volatile [(const_int 0)]
>                     UNSPECV_STOP)]
>   ""
>   “stop”)
>
> for example.  This creates the builtins, allows overloading, allows input/output parameters, can reorder operands, allows for complex types, allows memory reference parameters, allows pure markings, does vectors, conditional availability, generates documentation, creates test suites and more.  If you wire up a speaker it even sings.
>
> Someone would have have to step forward with a need and some time to port their port over to the new scheme and help with the reason for why the technology should go in.  It is mostly contained in 5600 lines of self contained python code, and is built to solve the problem generally.  It adds about 800 lines to builtins.c.  It has a macro system that is more powerful than the macro system .md files use, so one gets to share and collapse builtins rather nicely.  It is known to work for C and C++.  Other languages may need extending; C for example cost is around 250 lines to support.
>
> One promise, you will never have to create an argument list, or a type, for example here is a two output, type input functional instruction with some doc content:
>
> (define_mode_iterator MYTYPE
>         [V8QI V4HI V2SI DI ...])
>
> (define_builtin “my_foo” "my_foo2_<type>"
>   [
>     (define_desc    “Doc string for operation")
>     (define_outputs [(var_operand:T_MYTYPE 0)
>                      (var_operand:T_MYTYPE 1)])
>     (define_inputs  [(var_operand:T_MYTYPE 2)
>                      (var_operand:T_MYTYPE 3)])
>     (define_rtl_pattern “my_foo2_<mode>" [0 2 1 3])
>     (attributes [pure])
>   ]
> )
>
> I stripped it so you can’t know what the instruction was, but you get a flavor of multiple outputs, doc bits, pure, overloading, arguments and argument rearranging.
>
>
> Let me know if you’re interested.

This sounds interesting - could you post something for an RFC or in a
branch so that one can play with it ?

Ramana

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-21 13:50 [MIPS] Add sbasic supoert ffor MSA (SIMD) Graham Stott
  2014-05-21 17:59 ` Joseph S. Myers
  2014-05-21 19:21 ` Richard Henderson
@ 2014-06-01 20:42 ` Richard Sandiford
  2 siblings, 0 replies; 11+ messages in thread
From: Richard Sandiford @ 2014-06-01 20:42 UTC (permalink / raw)
  To: Graham Stott; +Cc: gcc-patches

Hi Graham,

Thanks for the patch.  I agree with what Richard and Joseph said.  Also...

I think it'd be better to keep the p5600 bits separate and wait until
the main p5600 patch has gone in.  It looks like it uses a different
naming scheme for the insn reservations.

Graham Stott <Graham.Stott@imgtec.com> writes:
> +(define_constraint "Un31"
> +  "@internal
> +   A replicated vector const in which the replicated value is negative
> +   integer number in range [-31,0]."
> +  (and (match_code "const_vector")
> +       (match_test "mips_const_vector_same_int_p (op, mode, -31, 0)")))
> +
> +(define_constraint "Up31"
> +  "@internal
> +   A replicated vector const in which the replicated value is positive
> +   integer number in range [0,31]."
> +  (and (match_code "const_vector")
> +       (match_test "mips_const_vector_same_int_p (op, mode, 0, 31)")))

The convention so far (and followed by the other constraints in the patch)
is for the number to be a bit count rather than a limit.  So "Unv5" and
"Uuv5" would probably be better ("u" for "unsigned").

> +;; Same as MODE128.  Used by vcond to iterate two modes.
> +(define_mode_iterator MSA_2     [V2DF V4SF V2DI V4SI V8HI V16QI])

"Same as MSA".

> +;; This attribute is used to form the MODE for reg_or_0_operand
> +;; constraint.
> +(define_mode_attr REGOR0
> +  [(V2DF "DF")
> +   (V4SF "SF")
> +   (V2DI "DI")
> +   (V4SI "SI")
> +   (V8HI "SI")
> +   (V16QI "SI")])

I still don't really like this part.  It's used only for two rvalues
and in both cases I think should be using UNITMODE instead.  If necessary,
we should take a lowpart subreg _before_ calling the gen_* routine.
As well as (IMO) being cleaner, it will tell the rtl optimisers that
bits above the unit mode are "don't care".

> +(define_expand "vec_extract<mode>"
> +  [(match_operand:<UNITMODE> 0 "register_operand")
> +   (match_operand:IMSA 1 "register_operand")
> +   (match_operand 2 "const_int_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  gcc_assert (UINTVAL (operands[2]) < GET_MODE_NUNITS (<MODE>mode));

Why not:

(define_expand "vec_extract<mode>"
  [(match_operand:<UNITMODE> 0 "register_operand")
   (match_operand:IMSA 1 "register_operand")
   (match_operand 2 "const_<indeximm>_operand")]
  "ISA_HAS_MSA"
{
...
}

without the assert?

> +      rtx dest1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_msa_copy_s_<msafmt> (dest1, operands[1], operands[2]));
> +      emit_move_insn (operands[0],
> +		      gen_lowpart (<UNITMODE>mode, dest1));

Why not just make copy_s assign directly in UNITMODE?

> +(define_expand "vec_extract<mode>"
> +  [(match_operand:<UNITMODE> 0 "register_operand")
> +   (match_operand:FMSA 1 "register_operand")
> +   (match_operand 2 "const_int_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  rtx temp;
> +  HOST_WIDE_INT val = UINTVAL (operands[2]);
> +
> +  gcc_assert (val < GET_MODE_NUNITS (<MODE>mode));

Looks like const_<indeximm>_operand would be better than an assert
here too.  Several other cases later; won't list them all.

> +      /* We need to do the SLDI operation in V16QImode and adjust
> +       * operand[2] accordingly.  */

No "*" at the beginning of the second line.

> +      rtx op2b  = GEN_INT (val * GET_MODE_SIZE (<UNITMODE>mode));

Should only be one space before "=".

> +      rtx res = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx temp1 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx temp2 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx xres = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx xop1 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx xop2 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +
> +      mips_expand_msa_vcond (res, true_val, false_val,
> +			     GET_CODE (operands[3]), operands[4], operands[5]);
> +      // Results in -1 or 0 so need to convert this to correct result for the
> +      // correct true/false given by operands[1]/operands[2] repectively.
> +      emit_move_insn (xres, res);
> +      if (operands[1] != true_val)
> +	{
> +	  emit_move_insn (xop1, operands[1]);
> +	  emit_insn (gen_and<MSA_2:mode_i>3 (temp1, xres, xop1));
> +	}
> +      else
> +	emit_move_insn (temp1, xres);

Should only create xop1 and xop2 if they're needed (i.e. in the
"if" arms), otherwise the register number goes to waste.  Why is
the temporary needed though?  The predicates require a register_operand
or true_val, so can't you use operands[1] directly?

> +      emit_move_insn (temp2, CONSTM1_RTX (<MSA_2:VIMODE>mode));
> +      emit_insn (gen_xor<MSA_2:mode_i>3 (temp2, xres, temp2));

Since there's a NOR instruction, I think we should have a (not ...)
pattern and use that instead of this sequence.

> +      if (operands[2] != false_val)
> +	{
> +	  emit_move_insn (xop2, operands[2]);
> +	  emit_insn (gen_and<MSA_2:mode_i>3 (temp2, temp2, xop2));
> +	}
> +      emit_insn (gen_ior<MSA_2:mode_i>3 (xres, temp1, temp2));
> +      emit_move_insn (operands[0], xres);

Same comment about needing xop2 here.

> +(define_expand "vcond<MSA_2:mode><MSA:mode>"
> +  [(set (match_operand:MSA_2 0 "register_operand")
> +	(if_then_else:MSA_2
> +	  (match_operator 3 ""
> +	    [(match_operand:MSA 4 "register_operand")
> +	     (match_operand:MSA 5 "register_operand")])
> +	  (match_operand:MSA_2 1 "reg_or_m1_operand")
> +	  (match_operand:MSA_2 2 "reg_or_0_operand")))]
> +  "ISA_HAS_MSA
> +   && (GET_MODE_NUNITS (<MSA_2:MODE>mode)
> +       == GET_MODE_NUNITS (<MSA:MODE>mode))"

The main C code for this pattern is a cut-&-paste of vcondu.  It'd be
better to factor it into a common routine.

> +(define_insn "msa_insert_<msafmt>"
> +  [(set (match_operand:IMSA 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<REGOR0> 3 "reg_or_0_operand" "dJ")]
> +		       UNSPEC_MSA_INSERT))]

<MODE> looks weird here.  Isn't that just IMSA?  I think we should
use IMSA consistently, so that the rhs of the (set ...) obviously
agrees with the lhs and so that operand 1 obviously agrees with
operand 0 (which it matches).

Same for other patterns where :<MODE> is used.

> +; Similar to msa_insert_<msafmt> but with <UNITMODE>mode for operand 3.
> +(define_insn "*msa_insert_<msafmt_f>"
> +  [(set (match_operand:MSA_3 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<UNITMODE> 3 "reg_or_0_operand" "dJ")]
> +		       UNSPEC_MSA_INSERT))]

As above, I don't think we want both these patterns.  We should just have
the <UNITMODE> one and if necessary take the <UNITMODE> subreg before
calling the gen_* pattern.

> +;; Note that insert.d and insert.d_f will be split later if !TARGET_64BIT.
> +
> +(define_split
> +  [(set (match_operand:V2DI 0 "register_operand")
> +	(unspec:V2DI [(match_operand:V2DI 1 "register_operand")
> +		      (match_operand 2 "const_0_or_1_operand")
> +		      (match_operand:DI 3 "reg_or_0_operand")]
> +		     UNSPEC_MSA_INSERT))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_insert_d (operands[0], operands[1], operands[2], operands[3]);
> +  DONE;
> +})

Comment seems misplaced; probably belongs above the insn definition instead.

> +(define_insn "msa_insve_<msafmt_f>"
> +  [(set (match_operand:MSA 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<MODE> 3 "register_operand" "f")]
> +		       UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.<msafmt>\t%w0[%2],%w3[0]"
> +  [(set_attr "type"     "arith")
> +   (set_attr "mode"     "TI")
> +   (set_attr "msa_execunit" "msa_eu_logic_l")])
> +
> +;; operand 3 is a scalar
> +(define_insn "msa_insve_<msafmt>_f_s"
> +  [(set (match_operand:FMSA 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<UNITMODE> 3 "register_operand" "f")]
> +		       UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.<msafmt>\t%w0[%2],%w3[0]"
> +  [(set_attr "type"     "arith")
> +   (set_attr "mode"     "TI")
> +   (set_attr "msa_execunit" "msa_eu_logic_l")])

Here too I think we just want the <UNITMODE> version, creating lowparts
where necessary.

> +;; Note that copy_s.d will be split later if !TARGET_64BIT.
> +;; Note that copy_s.d_f will be split later if !TARGET_64BIT.
> +
> +(define_split
> +  [(set (match_operand:DI 0 "register_operand")
> +	(unspec:DI [(match_operand:V2DI 1 "register_operand")
> +		    (match_operand 2 "const_0_or_1_operand")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_copy_d (operands[0], operands[1], operands[2], gen_msa_copy_s_w);
> +  DONE;
> +})
> +
> +(define_split
> +  [(set (match_operand:DF 0 "register_operand")
> +	(unspec:DF [(match_operand:V2DF 1 "register_operand")
> +		    (match_operand 2 "const_0_or_1_operand")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_copy_d (operands[0], operands[1], operands[2], gen_msa_copy_s_w);
> +  DONE;
> +})

Same misplaced comment.  Please use mode iterators to combine the splits.

> +(define_expand "vec_perm<mode>"
> +  [(match_operand:MSA 0 "register_operand")
> +   (match_operand:MSA 1 "register_operand")
> +   (match_operand:MSA 2 "register_operand")
> +   (match_operand:<VIMODE> 3 "register_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  /* Note that GCC always uses memory order (as big-endian) in indexing,
> +     and layouts operands[1] frist and then operands[2] next.
> +     However, vshf starts indexes from wt to ws, so so we need to swap
> +     two operands.  MSA loads or stores elements to or from the rightmost
> +     position of vector registers, for both big-endian and little-endian CPUs.
> +     No need to change any index numbers.  */
> +  emit_insn (gen_msa_vshf<mode> (operands[0], operands[3], operands[2],
> +				 operands[1]));
> +  DONE;
> +})

TBH I still prefer the wording I suggested in the previous review
to this version.

> +(define_insn "msa_lsa"
> + [(set (match_operand:SI 0 "register_operand" "=d")
> +       (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d")
> +			 (match_operand    2 "const_immlsa_operand" ""))
> +		(match_operand:SI 3 "register_operand" "d")))]
> + "ISA_HAS_LSA"
> + "lsa\t%0,%1,%3,%y2"
> + [(set_attr "type"      "arith")
> +  (set_attr "mode"      "SI")])

Seems like this should be a :P pattern, to cope with 64-bit addresses too.
Operand 2 should have a mode.

> +;; 128-bit integer/MSA vector registers moves
> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and J, so that move_type for J is fmove and its
> +;; instruction length can be 1.
> +(define_insn "movti_msa"
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=*d,*d,*d,*R,*d,*f,f,R,f,*m")
> +	(match_operand:TI 1 "move_operand" "*d,*i,*R,*d*J,*f,*d,R,f,fJ,*i*d"))]
> +  "ISA_HAS_MSA
> +   && !TARGET_64BIT
> +   && (register_operand (operands[0], TImode)
> +       || reg_or_0_operand (operands[1], TImode))"
> +  { return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type"	"move,const,load,store,mfc,mtc,fpload,fpstore,fmove,store")
> +   (set_attr "mode"     "TI")])

As I said in the first review:

  I don't understand the last sentence.  It looks like mips_output_move
  uses LDI.x for the f<-J case, and msa_ldi<mode> has type arith.
  I think these should be kept as separate alternatives.

Please address this.  The type of LDI.x should be consistent whether
it comes from msa_ldi<mode> or the move patterns.

Again from the first review:

  Why do we need to allow TImode in GPRs for 32-bit mode?  I think we should
  try to avoid that if at all possible.

Please address this too.  If we really do need TImode in GPRs for some
reason, please say what it is.

> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and J, so that move_type for J is fmove and its
> +;; instruction length can be 1.
> +(define_insn "movti_msa_64bit"
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=*d,*d,*d,*R,*a,*d,*d,*f,f,R,f,*m")
> +	(match_operand:TI 1 "move_operand" "*d,*i,*R,*d*J,*d*J,*a,*f,*d,R,f,fJ,*i*d"))]
> +  "ISA_HAS_MSA
> +   && TARGET_64BIT
> +   && (register_operand (operands[0], TImode)
> +       || reg_or_0_operand (operands[1], TImode))"
> +  { return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type" "move,const,load,store,mtlo,mflo,mfc,mtc,fpload,fpstore,fmove,store")
> +   (set_attr "mode" "TI")])

Rather than copying the comment from the previous pattern, maybe use
something like:

;; Similarly for 64-bit hosts.  In this case we also need to provide
;; alternatives for the accumulator registers.

> +(define_expand "mov<mode>"
> +  [(set (match_operand:MODE128 0)
> +	(match_operand:MODE128 1))]
> +  "TARGET_64BIT || TARGET_MSA"
> +{
> +  if (register_operand (operands[0], <MODE>mode)
> +      && mips_const_vector_same_int_p (operands[1], GET_MODE (operands[1]), -512, 511))
> +    {
> +      emit_insn (gen_msa_ldi<mode> (operands[0], CONST_VECTOR_ELT (operands[1], 0)));
> +      DONE;
> +    }
> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
> +    DONE;

Again from the first review:

  Why the "TARGET_64BIT || "?  Why do we want to allow these modes without
  MSA in that case?

Two lines longer than 80 chars.

> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and YG, so that move_type for YG is fmove and its
> +;; instruction length can be 1.
> +(define_insn "mov<mode>_msa"
> +  [(set (match_operand:MODE128 0 "nonimmediate_operand" "=f,f,R,R,*f,*d,*d,*d,*R")
> +	(match_operand:MODE128 1 "move_operand" "fYG,R,f,YG,*d,*f,*d*YG,*R,*d"))]
> +  "ISA_HAS_MSA
> +   && (register_operand (operands[0], <MODE>mode)
> +       || reg_or_0_operand (operands[1], <MODE>mode))"
> +{ return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type"	"fmove,fpload,fpstore,store,mtc,mfc,move,load,store")
> +   (set_attr "mode"     "TI")])

Also from the first review:

  Same question I suppose: why do we want to allow these vectors into GPRs?

I.e. why not restrict MSA vector modes to vector registers?  I don't
see any reason off-hand why we'd need to put those in GPRs.

> +(define_split
> +  [(set (match_operand:TI 0 "nonimmediate_operand")
> +	(match_operand:TI 1 "move_operand"))]
> +  "reload_completed && TARGET_MSA
> +   && mips_split_128bit_move_p (operands[0], operands[1])"
> +  [(const_int 0)]
> +{
> +  mips_split_128bit_move (operands[0], operands[1]);
> +  DONE;
> +})
> +
> +(define_split
> +  [(set (match_operand:MSA 0 "nonimmediate_operand")
> +	(match_operand:MSA 1 "move_operand"))]
> +  "reload_completed && TARGET_MSA
> +   && mips_split_128bit_move_p (operands[0], operands[1])"
> +  [(const_int 0)]
> +{
> +  mips_split_128bit_move (operands[0], operands[1]);
> +  DONE;
> +})

Also from the first review:

  Please instead add whatever needs to be done to the existing
  mips_split_move_insn_p/mips_split_move_insn pair.

I'm not sure whether this patch was supposed to address the previous
comments or not, but it probably isn't productive for me to just quote
what I said first time round.  If the patch is still WIP and you're just
asking for comments about particular parts then I'd be happy to review those,
but given the size of this patch, I'd rather not re-review the whole
thing until the points from earlier reviews have been sorted out.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
  2014-05-29  9:09         ` Matthew Fortune
@ 2014-06-03 18:57           ` Richard Sandiford
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Sandiford @ 2014-06-03 18:57 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Mike Stump, Joseph S. Myers, Graham Stott, gcc-patches,
	Ilie Garbacea, Rich Fuhler, Doug Gilmore, Richard Earnshaw

Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Mike Stump <mikestump@comcast.net> writes:
>> On May 28, 2014, at 7:27 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> >
>> > Speed of implementation.  We're gradually replacing these with proper
>> > builtins, but that takes a lot more work.
>> 
>> As an owner of a port with more builtins that yours, I can offer a
>> technological solution to reduce the cost of builtins to:
>> 
>> (define_builtin "my_stop"
>>   [
>>     (define_outputs [(void_operand 0)])
>>     (define_rtl_pattern "my_stop" [])
>>   ]
>> )
>> 
>> (define_insn "my_stop"
>>   [(unspec_volatile [(const_int 0)]
>>                     UNSPECV_STOP)]
>>   ""
>>   "stop")
>> 
>> for example.  This creates the builtins, allows overloading, allows
>> input/output parameters, can reorder operands, allows for complex types,
>> allows memory reference parameters, allows pure markings, does vectors,
>> conditional availability, generates documentation, creates test suites and
>> more.  If you wire up a speaker it even sings.
>> 
>> Someone would have have to step forward with a need and some time to port
>> their port over to the new scheme and help with the reason for why the
>> technology should go in.  It is mostly contained in 5600 lines of self
>> contained python code, and is built to solve the problem generally.  It adds
>> about 800 lines to builtins.c.  It has a macro system that is more powerful
>> than the macro system .md files use, so one gets to share and collapse
>> builtins rather nicely.  It is known to work for C and C++.  Other languages
>> may need extending; C for example cost is around 250 lines to support.
>
> Myself and others at IMG would be interested in reviewing/evaluating the
> implementation and assuming it looks useful then we would of course help to
> get it in shape for submission.
>  
>> One promise, you will never have to create an argument list, or a type, for
>> example here is a two output, type input functional instruction with some
>> doc content:
>> 
>> (define_mode_iterator MYTYPE
>>         [V8QI V4HI V2SI DI ...])
>> 
>> (define_builtin "my_foo" "my_foo2_<type>"
>>   [
>>     (define_desc    "Doc string for operation")
>>     (define_outputs [(var_operand:T_MYTYPE 0)
>>                      (var_operand:T_MYTYPE 1)])
>>     (define_inputs  [(var_operand:T_MYTYPE 2)
>>                      (var_operand:T_MYTYPE 3)])
>>     (define_rtl_pattern "my_foo2_<mode>" [0 2 1 3])
>>     (attributes [pure])
>>   ]
>> )
>> 
>> I stripped it so you can't know what the instruction was, but you get a
>> flavor of multiple outputs, doc bits, pure, overloading, arguments and
>> argument rearranging.
>
> Can you post the implementation as an RFC? I suspect the python aspect
> will cause the most trouble as GCC builds do not currently require python
> I guess that could change depending on the value added. Otherwise it would
> be a rewrite I guess.
>
> Before digging in too deep though it would be useful to know if RichardS
> would be willing to consider this kind of thing for the MIPS port?

Yeah, it definitely sounds good in principle.

Richard

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-06-03 18:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-21 13:50 [MIPS] Add sbasic supoert ffor MSA (SIMD) Graham Stott
2014-05-21 17:59 ` Joseph S. Myers
2014-05-28  8:03   ` Matthew Fortune
2014-05-28  8:25     ` pinskia
2014-05-28 14:28     ` Richard Earnshaw
2014-05-28 17:49       ` Mike Stump
2014-05-29  9:09         ` Matthew Fortune
2014-06-03 18:57           ` Richard Sandiford
2014-05-29  9:39         ` Ramana Radhakrishnan
2014-05-21 19:21 ` Richard Henderson
2014-06-01 20:42 ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).