md description for instruction that modifies multiple operands

public inbox for gcc@gcc.gnu.org

* md description for instruction that modifies multiple operands
@ 2003-05-25  2:34 ` Fred Fish
  2003-05-25  7:16   ` David Edelsohn
  2003-05-26 23:54   ` Fred Fish
  0 siblings, 2 replies; 17+ messages in thread
From: Fred Fish @ 2003-05-25  2:34 UTC (permalink / raw)
  To: gcc; +Cc: fnf

I'm working on a machine which has an instruction that modifies
multiple operands, with all the operands being both input and output
operands.  In addition, the operands have to be in sequential
registers.  The C interface is a builtin function call.

For the sake of discussion, assume that the instruction is called
"block4" and the C interface is used something like:

	int foo (int a, int b, int c, int d)
	{
	  __builtin_block4 (a, b, c, d);
	  __builtin_block4 (c, b, a, d);
	  return (a);
	}

The incomplete machine description entry for this instruction, which
is used when the compiler sees a call to __builtin_block4, might look
something like:

	(define_insn "block4"
	  [(set (match_operand:SI 0 "register_operand" "=d")
	        (unspec:SI [(match_operand:SI 1 "register_operand" "d")
	                    (match_operand:SI 2 "register_operand" "d")
	                    (match_operand:SI 3 "register_operand" "d")] 123))]
	  "TARGET_FOO"
	  "block4\\t%0 # %0,%1,%2,%3")

There are a couple of problems with this entry, and any advice on the
best way to fix them would be appreciated.

(1) The above entry only tells the compiler that the first operand is an output
operand, so I tried changing it to something like:

	(define_insn "block4"
	  [(set (match_operand:SI 0 "register_operand" "+d")
	        (unspec:SI [(match_operand:SI 1 "register_operand" "+d")
	                    (match_operand:SI 2 "register_operand" "+d")
	                    (match_operand:SI 3 "register_operand" "+d")] 123))]
	  "TARGET_FOO"
	  "block4\\t%0 # %0,%1,%2,%3")

to let it know that all operands are both read/write.  I'm not
entirely sure this is the correct approach.

(2) There doesn't appear to be any obvious way to tell the compiler
that the registers allocated to the operands have to be sequential.

Even harder though, there is another instruction where the registers
have to be "every fourth register", i.e. something like R(N), R(N+4),
R(N+8), R(N+12).  I'm pretty sure that satisfying this requirement is
something that is going to take a little work.  :-)
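
To make the two register constraints concrete, here is a minimal
sketch in plain C.  It is only a model of the requirement, not GCC
code: the function name regs_have_stride is hypothetical and it uses
no compiler internals.  A stride of 1 stands for the "sequential"
case and a stride of 4 for the "every fourth register" case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true when regs[0..n-1] are register numbers spaced exactly
   `stride` apart, in increasing order.  Stride 1 models the
   "sequential registers" requirement; stride 4 models the
   "every fourth register" requirement.  */
static bool
regs_have_stride (const unsigned int *regs, size_t n, unsigned int stride)
{
  for (size_t i = 1; i < n; i++)
    if (regs[i] != regs[i - 1] + stride)
      return false;
  return true;
}
```

A check of this shape is what an insn condition or a peephole would
presumably need to apply to the hard register numbers of the
operands; where such a check would live in a real port is left open
here.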

Thanks for any advice.

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-25  2:34 ` md description for instruction that modifies multiple operands Fred Fish
@ 2003-05-25  7:16   ` David Edelsohn
  2003-05-25 15:52     ` Fred Fish
  2003-05-26 23:54   ` Fred Fish
  1 sibling, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2003-05-25  7:16 UTC (permalink / raw)
  To: fnf; +Cc: gcc

	For the consecutive registers, you might allocate one TImode
instead of four SImode registers.

David

* Re: md description for instruction that modifies multiple operands
  2003-05-25  7:16   ` David Edelsohn
@ 2003-05-25 15:52     ` Fred Fish
  0 siblings, 0 replies; 17+ messages in thread
From: Fred Fish @ 2003-05-25 15:52 UTC (permalink / raw)
  To: David Edelsohn; +Cc: fnf, gcc

> 	For the consecutive registers, you might allocate one TImode
> instead of four SImode registers.

What is the best way to associate the 4 independent variables to the
four chunks of the TImode?

What I posted was a gross simplification of the actual situation.  The
operands are actually a vector type of mode V16SI and we currently do
have a machine description that uses vec_concat to pack the operands
into a V64SI and vec_select to pull a V64SI apart into its V16SI
parts.  But it's somewhat hairy, and in addition the compiler seems to
want to allocate 4 new registers for each concatenated operand, which
manifests itself as spill failures in reload when the register
pressure becomes too high.  I was hoping to be able to dump all that
extra cruft and simply allocate the individual operand registers in a
specific order.

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-25  2:34 ` md description for instruction that modifies multiple operands Fred Fish
  2003-05-25  7:16   ` David Edelsohn
@ 2003-05-26 23:54   ` Fred Fish
  2003-05-29 16:28     ` Eric Christopher
  1 sibling, 1 reply; 17+ messages in thread
From: Fred Fish @ 2003-05-26 23:54 UTC (permalink / raw)
  To: fnf; +Cc: gcc

> (1) The above entry only tells the compiler that the first operand is an output
> operand, so I tried changing it to something like:
> 
> 	(define_insn "block4"
> 	  [(set (match_operand:SI 0 "register_operand" "+d")
> 	        (unspec:SI [(match_operand:SI 1 "register_operand" "+d")
> 	                    (match_operand:SI 2 "register_operand" "+d")
> 	                    (match_operand:SI 3 "register_operand" "+d")] 123))]
> 	  "TARGET_FOO"
> 	  "block4\\t%0 # %0,%1,%2,%3")
> 
> to let it know that all operands are both read/write.  I'm not
> entirely sure this is the correct approach.

That didn't seem to work.  I also tried something like:

	(define_insn "block4"
	  [(set (match_operand:SI 0 "register_operand" "+d")
	        (unspec:SI [(match_operand:SI 1 "register_operand" "+d")
	                    (match_operand:SI 2 "register_operand" "+d")
	                    (match_operand:SI 3 "register_operand" "+d")] 123))
	   (set (match_dup 1) (unspec:SI [(match_dup 0) (match_dup 2) (match_dup 3)] 124))
	   (set (match_dup 2) (unspec:SI [(match_dup 0) (match_dup 1) (match_dup 3)] 125))
	   (set (match_dup 3) (unspec:SI [(match_dup 0) (match_dup 1) (match_dup 2)] 126))]
	  "TARGET_FOO"
	  "block4\\t%0 # %0,%1,%2,%3")

which works a little better, but still seems to have some problems
with the compiler recognizing that every operand is both an input and
output.

Am I totally off track here?

Thanks.

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-26 23:54   ` Fred Fish
@ 2003-05-29 16:28     ` Eric Christopher
  2003-05-29 17:27       ` Richard Earnshaw
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Christopher @ 2003-05-29 16:28 UTC (permalink / raw)
  To: fnf; +Cc: gcc


> That didn't seem to work.  I also tried something like:
> 
> 	(define_insn "block4"
> 	  [(set (match_operand:SI 0 "register_operand" "+d")
> 	        (unspec:SI [(match_operand:SI 1 "register_operand" "+d")
> 	                    (match_operand:SI 2 "register_operand" "+d")
> 	                    (match_operand:SI 3 "register_operand" "+d")] 123))
> 	   (set (match_dup 1) (unspec:SI [(match_dup 0) (match_dup 2) (match_dup 3)] 124))
> 	   (set (match_dup 2) (unspec:SI [(match_dup 0) (match_dup 1) (match_dup 3)] 125))
> 	   (set (match_dup 3) (unspec:SI [(match_dup 0) (match_dup 1) (match_dup 2)] 126))]
> 	  "TARGET_FOO"
> 	  "block4\\t%0 # %0,%1,%2,%3")
> 
> which works a little better, but still seems to have some problems
> with the compiler recognizing that every operand is both an input and
> output.
> 
> Am I totally off track here?

I don't think so. Though it looks like you might want to define a single
unspec number for the pattern and maybe use a parallel? *guesses*

-eric

-- 
Eric Christopher <echristo@redhat.com>

* Re: md description for instruction that modifies multiple operands
  2003-05-29 16:28     ` Eric Christopher
@ 2003-05-29 17:27       ` Richard Earnshaw
  2003-05-29 17:30         ` Fred Fish
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Earnshaw @ 2003-05-29 17:27 UTC (permalink / raw)
  To: Eric Christopher; +Cc: fnf, gcc, Richard.Earnshaw

> 
> > That didn't seem to work.  I also tried something like:
> > 
> > 	(define_insn "block4"
> > 	  [(set (match_operand:SI 0 "register_operand" "+d")
> > 	        (unspec:SI [(match_operand:SI 1 "register_operand" "+d")
> > 	                    (match_operand:SI 2 "register_operand" "+d")
> > 	                    (match_operand:SI 3 "register_operand" "+d")] 123))
> > 	   (set (match_dup 1) (unspec:SI [(match_dup 0) (match_dup 2) (match_dup 3)] 124))
> > 	   (set (match_dup 2) (unspec:SI [(match_dup 0) (match_dup 1) (match_dup 3)] 125))
> > 	   (set (match_dup 3) (unspec:SI [(match_dup 0) (match_dup 1) (match_dup 2)] 126))]
> > 	  "TARGET_FOO"
> > 	  "block4\\t%0 # %0,%1,%2,%3")
> > 
> > which works a little better, but still seems to have some problems
> > with the compiler recognizing that every operand is both an input and
> > output.
> > 
> > Am I totally off track here?
> 
> I don't think so. Though it looks like you might want to define a single
> unspec number for the pattern and maybe use a parallel? *guesses*
> 

You are probably better off if you only use match_dup to match inputs to 
inputs and outputs to outputs.  Use tied register allocation for inputs to 
outputs.  Ties are best done using adjacent number pairs.  Hence something 
like:

 	(define_insn "block4"
 	  [(set (match_operand:SI 0 "register_operand" "=d")
 	        (unspec:SI [(match_operand:SI 3 "register_operand" "2")
 	                    (match_operand:SI 5 "register_operand" "4")
 	                    (match_operand:SI 7 "register_operand" "6")] 123))
 	   (set (match_operand:SI 2 "register_operand" "=d")
		 (unspec:SI [(match_operand:SI 1 "register_operand" "0")
			     (match_dup 5) (match_dup 7)] 124))
 	   (set (match_operand:SI 4 "register_operand" "=d")
		 (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 7)] 125))
 	   (set (match_operand:SI 6 "register_operand" "=d")
		 (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 7)] 126))]

 	  "TARGET_FOO"
 	  "block4\\t%0 # %0,%2,%4,%6")


* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:27       ` Richard Earnshaw
@ 2003-05-29 17:30         ` Fred Fish
  2003-05-29 17:37           ` Richard Earnshaw
  0 siblings, 1 reply; 17+ messages in thread
From: Fred Fish @ 2003-05-29 17:30 UTC (permalink / raw)
  To: Richard.Earnshaw; +Cc: Eric Christopher, fnf, gcc

>> I don't think so. Though it looks like you might want to define a single
>> unspec number for the pattern and maybe use a parallel? *guesses*

I thought each unspec had to have a unique number.

Where are you suggesting placing a "parallel"?

> You are probably better off if you only use match_dup to match inputs to
> inputs and outputs to outputs.  Use tied register allocation for inputs to
> outputs.  Ties are best done using adjacent number pairs.  Hence something
> like:

Thanks much for the example.  I didn't see anything in the docs about
"tied register allocation".  What specifically does this mean?  Is it
a way to get registers allocated in sequence?

Perhaps I should give a more realistic code example and *.md entry.

The hardware handles vectors of 512 bits each, which can be organized as
a 4x4 matrix of 16 32-bit ints.  We typedef a "matrix_t" to be a V16SI type.
Here is an actual code example:

  typedef int matrix_t __attribute__((__mode__(V16SI)));

  matrix_t foo (matrix_t t0, matrix_t t1, matrix_t t2, matrix_t t3)
  {
    __BLOCK4_M (t0, t1, t2, t3);
    return (t0);
  }

This example takes four matrix_t (V16SI) types as function arguments,
passed in hardware registers $m0, $m1, $m2, and $m3, for t0, t1, t2,
and t3 respectively.  The __BLOCK4_M builtin takes four matrix_t
operands, does some matrix arithmetic on them, and returns the results
left in the four operands.  One restriction is that the block4
operands have to be allocated to sequential hardware registers.

Here is the actual md file entry I put in based on your example:

  (define_insn "fm_block4"
    [(set (match_operand:V16SI 0 "register_operand" "=v")
          (unspec:V16SI [(match_operand:V16SI 3 "register_operand" "2")
                         (match_operand:V16SI 5 "register_operand" "4")
                         (match_operand:V16SI 7 "register_operand" "6")] 460))
     (set (match_operand:V16SI 2 "register_operand" "=v")
          (unspec:V16SI [(match_operand:V16SI 1 "register_operand" "0")
                         (match_dup 5) (match_dup 7)] 461))
     (set (match_operand:V16SI 4 "register_operand" "=v")
          (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 462))
     (set (match_operand:V16SI 6 "register_operand" "=v")
          (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 463))]
    "TARGET_FM"
    "block4.m\\t%0,%2,%4,%6"
    [(set_attr "type" "fm")])

For the above example, running "cc1 -da -O2 x.c" generates the
following rtl file and then the compiler gets a segfault due to the
set of a "(nil)".  BTW, first matrix hardware register is 176, first
pseudo reg is 200.  Note also I deleted some extraneous instructions
like NOTES:

  (insn 3 2 4 (nil) (set (reg/v:V16SI 206 [ t0 ])
          (reg:V16SI 176 $m0 [ t0 ])) -1 (nil)
      (nil))
  
  (insn 4 3 5 (nil) (set (reg/v:V16SI 207 [ t1 ])
          (reg:V16SI 177 $m1 [ t1 ])) -1 (nil)
      (nil))
  
  (insn 5 4 6 (nil) (set (reg/v:V16SI 208 [ t2 ])
          (reg:V16SI 178 $m2 [ t2 ])) -1 (nil)
      (nil))
  
  (insn 6 5 7 (nil) (set (reg/v:V16SI 209 [ t3 ])
          (reg:V16SI 179 $m3 [ t3 ])) -1 (nil)
      (nil))
  
  (insn 12 10 14 (nil) (parallel [
              (set (reg/v:V16SI 206 [ t0 ])
                  (unspec:V16SI [
                          (reg/v:V16SI 209 [ t3 ])
                          (reg/v:V16SI 209 [ t3 ])
                          (reg/v:V16SI 207 [ t1 ])
                      ] 460))
              (set (reg/v:V16SI 208 [ t2 ])
                  (unspec:V16SI [
                          (reg/v:V16SI 207 [ t1 ])
                          (reg/v:V16SI 209 [ t3 ])
                          (reg/v:V16SI 207 [ t1 ])
                      ] 461))
              (set (nil)
                  (unspec:V16SI [
                          (reg/v:V16SI 207 [ t1 ])
                          (reg/v:V16SI 209 [ t3 ])
                          (reg/v:V16SI 207 [ t1 ])
                      ] 462))
              (set (reg/v:V16SI 208 [ t2 ])
                  (unspec:V16SI [
                          (reg/v:V16SI 207 [ t1 ])
                          (reg/v:V16SI 209 [ t3 ])
                          (reg/v:V16SI 207 [ t1 ])
                      ] 463))
          ]) -1 (nil)
      (nil))
  
  (insn 16 15 17 (nil) (set (reg:V16SI 205 [ <result> ])
          (reg/v:V16SI 206 [ t0 ])) -1 (nil)
      (nil))
  
  (jump_insn 17 16 18 (nil) (set (pc)
          (label_ref 22)) -1 (nil)
      (nil))

I do much appreciate all the help.  I've been a gdb hacker for the
last 14 years and a gcc hacker for all of about 2 months.  :-)

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:30         ` Fred Fish
@ 2003-05-29 17:37           ` Richard Earnshaw
  2003-05-29 17:41             ` Richard Earnshaw
  2003-05-29 17:53             ` Fred Fish
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Earnshaw @ 2003-05-29 17:37 UTC (permalink / raw)
  To: fnf; +Cc: Richard.Earnshaw, Eric Christopher, gcc

> >> I don't think so. Though it looks like you might want to define a single
> >> unspec number for the pattern and maybe use a parallel? *guesses*
> 
> I thought each unspec had to have a unique number.

That was probably a cut-and-paste error on my part.  But an unspec is just 
a black-box operation to the compiler.  The number is only there as a 
discriminator to resolve potential ambiguities.  So two insn matches that 
insns of the form

insn_a: (set x (unspec [(y) (z)] 0))
insn_b: (set a (unspec [(b)] 0))

is not illegal, though it is bad style.

> 
> Where are you suggesting placing a "parallel"?

A define_insn is an implicit parallel when there are multiple statements.
Only define_expand needs an explicit parallel.

> 
> > You are probably better off if you only use match_dup to match inputs to
> > inputs and outputs to outputs.  Use tied register allocation for inputs to
> > outputs.  Ties are best done using adjacent number pairs.  Hence something
> > like:
> 
> Thanks much for the example.  I didn't see anything in the docs about
> "tied register allocation".  What specifically does this mean?  Is it
> a way to get registers allocated in sequence?

It's a way to ensure that an input operand is allocated to the same 
register as an output operand.  Look for "0 in constraint" in the 
documentation (the machine description section).
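
The same matching-constraint syntax is also available from C through
GCC's extended asm, which may be an easier place to experiment than
an .md file.  The sketch below is only an illustration of the "0"
constraint, assuming a GCC-compatible compiler; the function name
tied_passthrough is hypothetical and the asm body is deliberately
empty.

```c
/* Return `in` unchanged by routing it through an empty asm statement.
   The "0" constraint ties the input operand to the same register as
   output operand 0, so the value is already in the output register
   when the (empty) asm body finishes.  */
static int
tied_passthrough (int in)
{
  int out;
  __asm__ ("" : "=r" (out) : "0" (in));
  return out;
}
```

Because input and output share one register, the function returns
its argument even though the asm emits no instructions.  Note that
the tie says nothing about which register is chosen, only that both
operands get the same one.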

> 
> Perhaps I should give a more realistic code example and *.md entry.
> 
> The hardware handles vectors of 512 bits each, which can be organized as
> a 4x4 matrix of 16 32-bit ints.  We typedef a "matrix_t" to be a V16SI type.
> Here is an actual code example:
> 
>   typedef int matrix_t __attribute__((__mode__(V16SI)));
> 
>   matrix_t foo (matrix_t t0, matrix_t t1, matrix_t t2, matrix_t t3)
>   {
>     __BLOCK4_M (t0, t1, t2, t3);
>     return (t0);
>   }
> 
> This example takes four matrix_t (V16SI) types as function arguments,
> passed in hardware registers $m0, $m1, $m2, and $m3, for t0, t1, t2,
> and t3 respectively.  The __BLOCK4_M builtin takes four matrix_t
> operands, does some matrix arithmetic on them, and returns the results
> left in the four operands.  One restriction is that the block4
> operands have to be allocated to sequential hardware registers.
> 
> Here is the actual md file entry I put in based on your example:
> 
>   (define_insn "fm_block4"
>     [(set (match_operand:V16SI 0 "register_operand" "=v")
>           (unspec:V16SI [(match_operand:V16SI 3 "register_operand" "2")
>                          (match_operand:V16SI 5 "register_operand" "4")
>                          (match_operand:V16SI 7 "register_operand" "6")] 460))
>      (set (match_operand:V16SI 2 "register_operand" "=v")
>           (unspec:V16SI [(match_operand:V16SI 1 "register_operand" "0")
>                          (match_dup 5) (match_dup 7)] 461))
>      (set (match_operand:V16SI 4 "register_operand" "=v")
>           (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 462))
>      (set (match_operand:V16SI 6 "register_operand" "=v")
>           (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 463))]
>     "TARGET_FM"
>     "block4.m\\t%0,%2,%4,%6"
>     [(set_attr "type" "fm")])
> 
> For the above example, running "cc1 -da -O2 x.c" generates the
> following rtl file and then the compiler gets a segfault due to the
> set of a "(nil)".  BTW, first matrix hardware register is 176, first
> pseudo reg is 200.  Note also I deleted some extraneous instructions
> like NOTES:
> 
>   (insn 3 2 4 (nil) (set (reg/v:V16SI 206 [ t0 ])
>           (reg:V16SI 176 $m0 [ t0 ])) -1 (nil)
>       (nil))
>   
>   (insn 4 3 5 (nil) (set (reg/v:V16SI 207 [ t1 ])
>           (reg:V16SI 177 $m1 [ t1 ])) -1 (nil)
>       (nil))
>   
>   (insn 5 4 6 (nil) (set (reg/v:V16SI 208 [ t2 ])
>           (reg:V16SI 178 $m2 [ t2 ])) -1 (nil)
>       (nil))
>   
>   (insn 6 5 7 (nil) (set (reg/v:V16SI 209 [ t3 ])
>           (reg:V16SI 179 $m3 [ t3 ])) -1 (nil)
>       (nil))
>   
>   (insn 12 10 14 (nil) (parallel [
>               (set (reg/v:V16SI 206 [ t0 ])
>                   (unspec:V16SI [
>                           (reg/v:V16SI 209 [ t3 ])
>                           (reg/v:V16SI 209 [ t3 ])
>                           (reg/v:V16SI 207 [ t1 ])
>                       ] 460))
>               (set (reg/v:V16SI 208 [ t2 ])
>                   (unspec:V16SI [
>                           (reg/v:V16SI 207 [ t1 ])
>                           (reg/v:V16SI 209 [ t3 ])
>                           (reg/v:V16SI 207 [ t1 ])
>                       ] 461))
>               (set (nil)
>                   (unspec:V16SI [
>                           (reg/v:V16SI 207 [ t1 ])
>                           (reg/v:V16SI 209 [ t3 ])
>                           (reg/v:V16SI 207 [ t1 ])
>                       ] 462))
>               (set (reg/v:V16SI 208 [ t2 ])
>                   (unspec:V16SI [
>                           (reg/v:V16SI 207 [ t1 ])
>                           (reg/v:V16SI 209 [ t3 ])
>                           (reg/v:V16SI 207 [ t1 ])
>                       ] 463))
>           ]) -1 (nil)
>       (nil))
>   
>   (insn 16 15 17 (nil) (set (reg:V16SI 205 [ <result> ])
>           (reg/v:V16SI 206 [ t0 ])) -1 (nil)
>       (nil))
>   
>   (jump_insn 17 16 18 (nil) (set (pc)
>           (label_ref 22)) -1 (nil)
>       (nil))
> 

This looks like an expansion problem.  How are you calling 
gen_fm_block4()?  You need to pass 8 arguments to it now, something like

	gen_fm_block4(t0, t0, t1, t1, t2, t2, t3, t3);

> I do much appreciate all the help.  I've been a gdb hacker for the
> last 14 years and a gcc hacker for all of about 2 months.  :-)

I'm nearer the reverse.  Expect me to call in the favour sometime :-)

R.


* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:37           ` Richard Earnshaw
@ 2003-05-29 17:41             ` Richard Earnshaw
  2003-05-29 17:49               ` Fred Fish
  2003-05-29 17:53             ` Fred Fish
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Earnshaw @ 2003-05-29 17:41 UTC (permalink / raw)
  To: fnf; +Cc: Richard.Earnshaw, Eric Christopher, gcc

>  So two insn matches that insns of the form

Err, I should re-read what I write before I post :-(

Take that as:

So two insn matchers that match insns of the form:

> 
> insn_a: (set x (unspec [(y) (z)] 0))
> insn_b: (set a (unspec [(b)] 0))
> 
> is not illegal, though it is bad style.
> 

Note also that it is regarded as good style now to define constants in the
MD file for unspec operations.  Hence on ARM we now have

(define_constants
  [(UNSPEC_SIN       0) ; `sin' operation (MODE_FLOAT):
                        ;   operand 0 is the result,
                        ;   operand 1 the parameter.
   (UNSPEC_COS       1) ; `cos' operation (MODE_FLOAT):
                        ;   operand 0 is the result,
                        ;   operand 1 the parameter.
   (UNSPEC_PUSH_MULT 2) ; `push multiple' operation:
                        ;   operand 0 is the first register,
                        ;   subsequent registers are in parallel (use ...)
                        ;   expressions.
   (UNSPEC_PIC_SYM   3) ; A symbol that has been treated properly for pic
                        ;   usage, that is, we will add the pic_register
                        ;   value to it before trying to dereference it.
   (UNSPEC_PIC_BASE  4) ; Adding the PC value to the offset to the
                        ;   GLOBAL_OFFSET_TABLE.  The operation is fully
                        ;   described by the RTL but must be wrapped to
                        ;   prevent combine from trying to rip it apart.
   (UNSPEC_PRLG_STK  5) ; A special barrier that prevents frame accesses
                        ;   being scheduled before the stack adjustment
                        ;   insn.
...
])

Then 

(define_insn "pic_load_addr_arm"
  [(set (match_operand:SI 0 "s_register_operand" "=r")
        (unspec:SI [(match_operand:SI 1 "" "mX")] UNSPEC_PIC_SYM))]
  "TARGET_ARM && flag_pic"
  "ldr%?\\t%0, %1"
  [(set_attr "type" "load")
   (set (attr "pool_range")     (const_int 4096))
   (set (attr "neg_pool_range") (const_int 4084))]
)

This reduces the risk of getting two unspecs that conflict in their 
discriminator field.

R.

* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:41             ` Richard Earnshaw
@ 2003-05-29 17:49               ` Fred Fish
  0 siblings, 0 replies; 17+ messages in thread
From: Fred Fish @ 2003-05-29 17:49 UTC (permalink / raw)
  To: Richard.Earnshaw; +Cc: fnf, Eric Christopher, gcc

> Note also that it is regarded as good style now to define constants in the
> MD file for unspec operations.  Hence on ARM we now have

I've been putting that off since we need about 450 of them and
there have been quite a few changes since we mostly mechanically
generated all the templates.  Certainly before we submit our
changes to the baseline gcc we'll fix that.  :-)

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:37           ` Richard Earnshaw
  2003-05-29 17:41             ` Richard Earnshaw
@ 2003-05-29 17:53             ` Fred Fish
  2003-05-29 18:03               ` Fred Fish
  2003-05-30 10:43               ` Richard Earnshaw
  1 sibling, 2 replies; 17+ messages in thread
From: Fred Fish @ 2003-05-29 17:53 UTC (permalink / raw)
  To: Richard.Earnshaw; +Cc: fnf, Eric Christopher, gcc

> >      (set (match_operand:V16SI 4 "register_operand" "=v")
> >           (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 462))
> >      (set (match_operand:V16SI 6 "register_operand" "=v")
> >           (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 463))]

BTW, after seeing the new RTL, I realized that the last "set" above
should probably be:

   (set (match_operand:V16SI 6 "register_operand" "=v")
        (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 5)] 463))]

Note the change from 7 -> 5.  Hopefully this is correct.

> This looks like an expansion problem.  How are you calling 
> gen_fm_block4()?  You need to pass 8 arguments to it now, something like
> 
> 	gen_fm_block4(t0, t0, t1, t1, t2, t2, t3, t3);

That was the problem.  I fixed it and the generated code for the example
is now:

 foo:
        j       $31
        block4.m        $m0,$m1,$m2,$m3

which is completely optimal.  The function args are passed in m0
through m3, the block4 is called with them in the right order, and the
function returns with the result left in m0.

However, I'm not clear on whether or not the template guarantees that
the register allocation will be sequential.  I suspect not.  So we may
still have the problem of training the register allocator to ensure
that the operands to the block4.m instruction are always some
sequential set of four registers out of the possible 16 (m0-m15).

I won't even try to think yet about the block4v instruction, which
requires a set like {m0,m4,m8,m12} or {m1,m5,m9,m13}.  :-(

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:53             ` Fred Fish
@ 2003-05-29 18:03               ` Fred Fish
  2003-05-29 18:41                 ` Eric Christopher
  2003-05-30 12:12                 ` Richard Earnshaw
  2003-05-30 10:43               ` Richard Earnshaw
  1 sibling, 2 replies; 17+ messages in thread
From: Fred Fish @ 2003-05-29 18:03 UTC (permalink / raw)
  To: fnf; +Cc: Richard.Earnshaw, Eric Christopher, gcc

> However, I'm not clear on whether or not the template guarantees that
> the register allocation will be sequential.  I suspect not.  So we may
> still have the problem of training the register allocator to ensure
> that the operands to the block4.m instruction are always some
> sequential set of four registers out of the possible 16 (m0-m15).

OK, I answered this myself by trying the example code:

  typedef int matrix_t __attribute__((__mode__(V16SI)));

  matrix_t foo (matrix_t t0, matrix_t t1, matrix_t t2, matrix_t t3)
  {
    __BLOCK4_M (t0, t1, t2, t3);
    __BLOCK4_M (t3, t2, t1, t0);
    return (t0);
  }

which generated:

foo:
        block4.m        $m0,$m1,$m2,$m3
        j       $31
        block4.m        $m3,$m2,$m1,$m0

when what it needed to do was to shuffle the contents of m0 through m3
out of the first block4 into a new set of registers, or the same set
using a temporary and some register swaps.
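
As a rough illustration of the "same set using a temporary" option,
here is a small C model.  It is a sketch only: the array stands in
for m0 through m3, each element is just an id for the value held
there, and reverse_regs is a hypothetical name, not anything in the
compiler.

```c
#include <stddef.h>

/* Model of four matrix registers; each slot stands for one of m0..m3
   and holds an id naming the value currently stored there.  */
enum { NREGS = 4 };

/* Reverse the register contents in place using a single temporary,
   i.e. swap m0<->m3 and m1<->m2.  */
static void
reverse_regs (int regs[NREGS])
{
  for (size_t lo = 0, hi = NREGS - 1; lo < hi; lo++, hi--)
    {
      int tmp = regs[lo];
      regs[lo] = regs[hi];
      regs[hi] = tmp;
    }
}
```

Starting from t0..t3 in m0..m3, the reversal leaves t3,t2,t1,t0 in
m0..m3, so the second block4.m could be issued on $m0,$m1,$m2,$m3.
The new t0 would then sit in m3 and would still need a move to m0
before the return.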

Oh well.  This is still a huge improvement over our first cut, which
uses an ugly set of intermediate instructions that do vector
concatenations to get a V64SI type for the unspec for block4 (which
then takes just one V64SI operand), and then a bunch more vector
splits to pick the result apart as needed.  The register allocator
does know to put a V64SI type in four sequential matrix registers,
each of which holds an V16SI type.

-Fred

* Re: md description for instruction that modifies multiple operands
  2003-05-29 18:03               ` Fred Fish
@ 2003-05-29 18:41                 ` Eric Christopher
  2003-05-30 12:12                 ` Richard Earnshaw
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Christopher @ 2003-05-29 18:41 UTC (permalink / raw)
  To: fnf; +Cc: Richard.Earnshaw, gcc


> when what it needed to do was to shuffle the contents of m0 through m3
> out of the first block4 into a new set of registers, or the same set
> using a temporary and some register swaps.
> 

Right. This is because gcc allocates in a linear order according to
REG_ALLOC_ORDER... I don't think you can depend on it allocating
particular numbers at all. :(

-eric

-- 
Eric Christopher <echristo@redhat.com>

* Re: md description for instruction that modifies multiple operands
  2003-05-29 17:53             ` Fred Fish
  2003-05-29 18:03               ` Fred Fish
@ 2003-05-30 10:43               ` Richard Earnshaw
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Earnshaw @ 2003-05-30 10:43 UTC (permalink / raw)
  To: fnf; +Cc: Richard.Earnshaw, Eric Christopher, gcc

> > >      (set (match_operand:V16SI 4 "register_operand" "=v")
> > >           (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 462))
> > >      (set (match_operand:V16SI 6 "register_operand" "=v")
> > >           (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 463))]
> 
> BTW, after seeing the new RTL, I realized that the last "set" above
> should probably be:
> 
>    (set (match_operand:V16SI 6 "register_operand" "=v")
>         (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 5)] 463))]
> 
> Note the change from 7 -> 5.  Hopefully this is correct.
> 
> > This looks like an expansion problem.  How are you calling 
> > gen_fm_block4()?  You need to pass 8 arguments to it now, something like
> > 
> > 	gen_fm_block4(t0, t0, t1, t1, t2, t2, t3, t3);
> 
> That was the problem.  I fixed it and the generated code for the example
> is now:
> 
>  foo:
>         j       $31
>         block4.m        $m0,$m1,$m2,$m3
> 
> which is completely optimal.  The function args are passed in m0
> through m3, the block4 is called with them in the right order, and the
> function returns with the result left in m0.
> 
> However, I'm not clear on whether or not the template guarantees that
> the register allocation will be sequential.  I suspect not.  So we may
> still have the problem of training the register allocator to ensure
> that the operands to the block4.m instruction are always some
> sequential set of four registers out of the possible 16 (m0-m15).

There's no way to do this, unfortunately.  ARM has a similar problem with 
the load-multiple operations.  In that case a bit set of registers to load 
is encoded in the instruction and the marked registers are filled 
sequentially from memory from the lowest numbered register at the lowest 
address.  We work around this by using specific hard registers for that 
pattern and then using peepholes for spotting a few cases 
opportunistically.  Take a look at the movstrqi pattern in arm.md if you 
want some ideas.

> 
> I won't even try to think yet about the block4v instruction, which
> requires a set like {m0,m4,m8,m12} or {m1,m5,m9,m13}.  :-(
> 

Equally impossible for the same reasons, and maybe more ;-(

R.



* Re: md description for intruction that modifies multiple operands
  2003-05-29 18:03               ` Fred Fish
  2003-05-29 18:41                 ` Eric Christopher
@ 2003-05-30 12:12                 ` Richard Earnshaw
  2003-05-30 17:33                   ` Fred Fish
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Earnshaw @ 2003-05-30 12:12 UTC (permalink / raw)
  To: fnf; +Cc: Richard.Earnshaw, Eric Christopher, gcc

> > However, I'm not clear on whether or not the template guarantees that
> > the register allocation will be sequential.  I suspect not.  So we may
> > still have the problem of training the register allocator to ensure
> > that the operands to the block4.m instruction are always some
> > sequential set of four registers out of the possible 16 (m0-m15).
> 
> OK, I answered this myself by trying the example code:
> 
>   typedef int matrix_t __attribute__((__mode__(V16SI)));
> 
>   matrix_t foo (matrix_t t0, matrix_t t1, matrix_t t2, matrix_t t3)
>   {
>     __BLOCK4_M (t0, t1, t2, t3);
>     __BLOCK4_M (t3, t2, t1, t0);
>     return (t0);
>   }
> 
> which generated:
> 
> foo:
>         block4.m        $m0,$m1,$m2,$m3
>         j       $31
>         block4.m        $m3,$m2,$m1,$m0
> 
> when what it needed to do was to shuffle the contents of m0 through m3
> out of the first block4 into a new set of registers, or the same set
> using a temporary and some register swaps.
> 
> Oh well.  This is still a huge improvement over our first cut, which
> uses an ugly set of intermediate instructions that do vector
> concatenations to get a V64SI type for the unspec for block4 (which
> then takes just one V64SI operand), and then a bunch more vector
> splits to pick the result apart as needed.  The register allocator
> does know to put a V64SI type in four sequential matrix registers,
> each of which holds a V16SI type.

If that's the case, then you might be able to make something like the 
following work:

	(set (subreg:V16SI (reg:V64SI tmp1) 0) (reg:V16SI t0))
	(set (subreg:V16SI (reg:V64SI tmp1) 1) (reg:V16SI t1))
	(set (subreg:V16SI (reg:V64SI tmp1) 2) (reg:V16SI t2))
	(set (subreg:V16SI (reg:V64SI tmp1) 3) (reg:V16SI t3))

 	(parallel [(set (reg:V64SI tmp2)
		        (unspec:V64SI [(reg:V64SI tmp1)] VEC_MMUL))
		   (use (subreg:V16SI (reg:V64SI tmp2) 0))
		   (use (subreg:V16SI (reg:V64SI tmp2) 1))
		   (use (subreg:V16SI (reg:V64SI tmp2) 2))
		   (use (subreg:V16SI (reg:V64SI tmp2) 3))])

	(set (reg:V16SI t0) (subreg:V16SI (reg:V64SI tmp2) 0))
	(set (reg:V16SI t1) (subreg:V16SI (reg:V64SI tmp2) 1))
	(set (reg:V16SI t2) (subreg:V16SI (reg:V64SI tmp2) 2))
	(set (reg:V16SI t3) (subreg:V16SI (reg:V64SI tmp2) 3))

Where each line (except the body of the parallel) is a separate 
instruction.  Hopefully the register allocator will be able to eliminate 
most of the move instructions by operating directly on the subregs.  
Remember to use operand_subword() to generate the SUBREGs or you might run 
into problems with subregs of subregs...
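
The pack / operate / unpack sequence above can be modelled in plain C,
which may help when reasoning about which moves are redundant.  This is
only an illustrative sketch: the type and function names are
hypothetical, and fake_block4() is a placeholder for the opaque unspec,
not the real instruction's semantics.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int e[16]; } v16si;      /* stand-in for V16SImode */
typedef struct { v16si part[4]; } v64si;  /* stand-in for V64SImode */

/* Byte offset of part n (0..3) within the wide value.  */
static size_t part_offset(int n)
{
    return offsetof(v64si, part) + (size_t)n * sizeof(v16si);
}

/* Placeholder for the unspec: adds 1 to every element.  */
static v64si fake_block4(v64si in)
{
    for (int p = 0; p < 4; p++)
        for (int i = 0; i < 16; i++)
            in.part[p].e[i] += 1;
    return in;
}

/* Mirrors the RTL sketch: four moves in, one wide operation, four
   moves out.  */
static void block4_via_wide_reg(v16si *t0, v16si *t1, v16si *t2, v16si *t3)
{
    v64si tmp1, tmp2;

    tmp1.part[0] = *t0;
    tmp1.part[1] = *t1;
    tmp1.part[2] = *t2;
    tmp1.part[3] = *t3;

    tmp2 = fake_block4(tmp1);

    *t0 = tmp2.part[0];
    *t1 = tmp2.part[1];
    *t2 = tmp2.part[2];
    *t3 = tmp2.part[3];
}
```

With 4-byte ints the parts sit at byte offsets 0, 64, 128 and 192.  If
the subreg offset is expressed in bytes rather than as a part index,
those are the values to use in place of the 0..3 shorthand in the RTL
sketch.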

You would then have a pattern to match the parallel, something like:

(define_insn "*fm_block4_body"
  [(set (match_operand:V64SI 0 "register_operand" "=d")
        (unspec:V64SI [(match_operand:V64SI 1 "register_operand" "0")]
                      VEC_MMUL))
   (use (match_operand:V16SI 2 "register_operand" "X"))
   (use (match_operand:V16SI 3 "register_operand" "X"))
   (use (match_operand:V16SI 4 "register_operand" "X"))
   (use (match_operand:V16SI 5 "register_operand" "X"))]
  ""
  "block4.m\t%2, %3, %4, %5")

Note that the USE operands use the X constraint letter.  This should cause 
the register allocator to ignore these parts of the instruction for 
register allocation purposes.  This shouldn't matter since they are tied 
in the pattern to use tmp2 and exist solely so that we can extract the 
subregs of the V64SI register that finally gets allocated.

Finally, if the above seems to work but produces poor code, then try it 
with the new register allocator (-fnew-ra).

Note, I haven't tried any of the above.  So you will have to experiment.  
There's a possibility that the above will fail completely with the 
register renumbering pass at -O3.

R.



* Re: md description for intruction that modifies multiple operands
  2003-05-30 12:12                 ` Richard Earnshaw
@ 2003-05-30 17:33                   ` Fred Fish
  0 siblings, 0 replies; 17+ messages in thread
From: Fred Fish @ 2003-05-30 17:33 UTC (permalink / raw)
  To: Richard.Earnshaw; +Cc: fnf, Eric Christopher, gcc

I do appreciate all the input I've gotten so far.  It has helped to
clarify that the hardest part of this problem is the requirement for
sequential register allocation.  Given that, perhaps we can simplify
how we use concatenated vector types.

> If that's the case, then you might be able to make something like the 
> following work ...

I've read the docs on "subreg" and looked at some of the code
templates in the mips.md file and it does look promising, though I'm
not sure if the vec_select that we use does essentially the same thing
or not.

Perhaps it would be useful to more closely examine the specific case
of the block2 instruction, which is similar to block4, but we only
have to deal with two matrix_t operands.  Here is a source code
example:

	typedef int matrix_t __attribute__((__mode__(V16SI)));

	matrix_t foo (matrix_t t0, matrix_t t1)
	{
	  __BLOCK2_M (t0, t1);
	  return (t0);
	}

When compiled with our current implementation, this produces the
rather ugly code below, even though optimization (-O2) has been used:

    foo:
        set.m.m $m2,$m0    # 10   fm_block2_concat/1  [length = 8]
        set.m.m $m3,$m1
        set.m.m $m1,$m3    # 34   movv32si_regreg     [length = 8]
        set.m.m $m0,$m2
        block2.m $m0       # 11   fm_block2_internal  [length = 4]
        j       $31        # 37   return              [length = 4]
        set.m.m $m0,$m0    # 24   fm_block2_split0/1  [length = 4]

My interpretation of what the compiler has done is:

	(1) The vec_concat instruction allocates a V32SI type
	and generates the code to copy the component parts into
	the register pair m2/m3 allocated to hold the V32SI.
	Perhaps this copying would go away if we could use subreg?

          set.m.m $m2,$m0    # 10   fm_block2_concat/1  [length = 8]
          set.m.m $m3,$m1

	(2) Copy the V32SI to another V32SI as input to the block2.
	Not sure why??

          set.m.m $m1,$m3    # 34   movv32si_regreg     [length = 8]
          set.m.m $m0,$m2

	(3) Expand code for the block2, leaving the output V32SI
	in m0/m1

          block2.m $m0       # 11   fm_block2_internal  [length = 4]

	(4) Split out the V16SI part of the V32SI that we want to
	return to the caller.  (Note this can be eliminated)

          set.m.m $m0,$m0    # 24   fm_block2_split0/1  [length = 4]
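
The claim that most of these moves are redundant can be checked with a
tiny register-file simulation.  This is a sketch only: each matrix
register is modelled as a single int tag standing for its contents, and
the helper names are hypothetical.

```c
#include <assert.h>

enum { M0, M1, M2, M3, NUM_M };

/* set.m.m dst,src: copy one matrix register to another.  */
static void set_m_m(int regs[NUM_M], int dst, int src)
{
    regs[dst] = regs[src];
}

/* The four copies that precede block2.m in the listing above.  */
static void copies_before_block2(int regs[NUM_M])
{
    set_m_m(regs, M2, M0);   /* fm_block2_concat/1 */
    set_m_m(regs, M3, M1);
    set_m_m(regs, M1, M3);   /* movv32si_regreg */
    set_m_m(regs, M0, M2);
}
```

After the four copies, m0 and m1 hold exactly what they held on entry,
so block2.m sees the same input it would have seen without them.  Since
m2 and m3 are not used again in this function, and the trailing
"set.m.m $m0,$m0" copies a register to itself, the ideal output here
would be just the block2.m and the return.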

Here are the relevant md file entries, starting with the define_expand
that is used to generate the initial RTL for the __BLOCK2_M builtin:

	(define_expand "fm_block2"
	  [(set (match_dup:V32SI 2)
	        (vec_concat:V32SI (match_operand:V16SI 0 "move_operand" "")
	                          (match_operand:V16SI 1 "move_operand" "")))
	   (set (match_dup:V32SI 2)
	        (unspec:V32SI [(match_dup:V32SI 2)] 281 ))
	   (set (match_dup:V16SI 0)
	        (vec_select:V16SI (match_dup:V32SI 2) (parallel [(const_int 0)])))
	   (set (match_dup:V16SI 1)
	        (vec_select:V16SI (match_dup:V32SI 2) (parallel [(const_int 1)])))]
	  "TARGET_FM"
	  "{ operands[2] = gen_reg_rtx (V32SImode); }")

	(define_insn "fm_block2_concat"
	  [(set (match_operand:V32SI 0 "register_operand" "=&w,&w")
	        (vec_concat:V32SI (match_operand:V16SI 1 "move_operand" "v,m")
	                          (match_operand:V16SI 2 "move_operand" "v,m")))]
	  "TARGET_FM"
	  "@
	   set.m.m\\t%H0,%1\;set.m.m\\t%I0,%2
	   load.m\\t%H0,%1\;load.m\\t%I0,%2"
	  [(set_attr "type" "fm")
	   (set_attr "length" "8,8")])
	
	(define_insn "movv32si_regreg"
	  [(set (match_operand:V32SI 0 "register_operand" "=w")
	        (match_operand:V32SI 1 "register_operand" "w"))]
	  "TARGET_FM"
	  "set.m.m\\t%L0,%L1\;set.m.m\\t%M0,%M1"
	  [(set_attr "type" "fm")
	   (set_attr "length" "8")])
	
	(define_insn "fm_block2_internal"
	  [(set (match_operand:V32SI 0 "register_operand" "=w" )
	        (unspec:V32SI [(match_operand:V32SI 1 "register_operand" "0")] 281))]
	  "TARGET_FM"
	  "block2.m\\t%0"
	  [(set_attr "type" "fm")])
	
	(define_insn "fm_block2_split0"
	  [(set (match_operand:V16SI 0 "nonimmediate_operand" "=v,m")
	        (vec_select:V16SI (match_operand:V32SI 1 "register_operand" "w,w") (parallel [(const_int 0)])))]
	  "TARGET_FM"
	  "@
	   set.m.m\\t%0,%H1
	   store.m\\t%H1,%0"
	  [(set_attr "type" "fm")])
	
	(define_insn "fm_block2_split1"
	  [(set (match_operand:V16SI 0 "nonimmediate_operand" "=v,m")
	        (vec_select:V16SI (match_operand:V32SI 1 "register_operand" "w,w") (parallel [(const_int 1)])))]
	  "TARGET_FM"
	  "@
	   set.m.m\\t%0,%I1
	   store.m\\t%I1,%0"
	  [(set_attr "type" "fm")])
	
Any suggestions on how to improve this implementation would be
appreciated.  If using subreg can eliminate some of the explicit
packing/unpacking of V16SI and V32SI types that would be great.

-Fred


* Re: md description for intruction that modifies multiple operands
@ 2003-05-30 15:14 Richard Kenner
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Kenner @ 2003-05-30 15:14 UTC (permalink / raw)
  To: fnf; +Cc: gcc

    I thought each unspec had to have a unique number.

Not necessarily.  They have to be used in a unique *way*, though.  In other
words, there's no requirement for an UNSPEC that's only used in a PLUS to
have a distinct number from one that's only used in a MINUS.

However, it's much clearer from a documentation point of view if they are
unique, of course, and since you're not going to run out of possible values,
there's no reason *not* to make them unique.
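
As an illustration of the "keep them unique and named" advice, here is
a small C sketch.  The enumerator names are hypothetical; 281 and 463
are the numbers that appear in the patterns quoted in this thread.  In
an .md file the same idea is normally expressed with define_constants
rather than in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbolic names for the raw unspec numbers.  */
enum fm_unspec {
    UNSPEC_FM_BLOCK2 = 281,
    UNSPEC_FM_BLOCK4 = 463
};

/* True if no value occurs twice in codes[0..n-1].  */
static bool all_unique(const int *codes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (codes[i] == codes[j])
                return false;
    return true;
}
```

Giving each unspec a name documents its purpose at every use, and a
check like all_unique() over the table of values catches accidental
reuse.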


Thread overview: 17+ messages
2003-05-25  2:34 ` md description for intruction that modifies multiple operands Fred Fish
2003-05-25  7:16   ` David Edelsohn
2003-05-25 15:52     ` Fred Fish
2003-05-26 23:54   ` Fred Fish
2003-05-29 16:28     ` Eric Christopher
2003-05-29 17:27       ` Richard Earnshaw
2003-05-29 17:30         ` Fred Fish
2003-05-29 17:37           ` Richard Earnshaw
2003-05-29 17:41             ` Richard Earnshaw
2003-05-29 17:49               ` Fred Fish
2003-05-29 17:53             ` Fred Fish
2003-05-29 18:03               ` Fred Fish
2003-05-29 18:41                 ` Eric Christopher
2003-05-30 12:12                 ` Richard Earnshaw
2003-05-30 17:33                   ` Fred Fish
2003-05-30 10:43               ` Richard Earnshaw
2003-05-30 15:14 Richard Kenner
