[Bug target/55295] New: [SH] Add support for fipr instruction

public inbox for gcc-bugs@sourceware.org

* [Bug target/55295] New: [SH] Add support for fipr instruction
@ 2012-11-12 22:29 olegendo at gcc dot gnu.org
  2012-11-12 22:39 ` [Bug target/55295] " olegendo at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-11-12 22:29 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

             Bug #: 55295
           Summary: [SH] Add support for fipr instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: olegendo@gcc.gnu.org
            Target: sh4*-*-*


Created attachment 28671
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28671
Example combine patterns

On SH4* targets there is a currently unused instruction 'fipr' which can be
used to calculate the dot product of two V4SF vectors:

fipr  FVm, FVn
FR(n+3) = FR(m+0)*FR(n+0) + FR(m+1)*FR(n+1) + FR(m+2)*FR(n+2) + FR(m+3)*FR(n+3)
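
For reference, the semantics above can be written as a minimal scalar model in
C++.  This is only a sketch: the names V4 and fipr_model are made up for
illustration and are not part of GCC or any SH header, and the plain
left-to-right float sum is an assumption that does not reproduce the
hardware's internal rounding.

```cpp
// Hypothetical scalar model of "fipr FVm, FVn" (illustrative names only).
struct V4 { float e[4]; };

// Returns the new contents of FVn: elements 0..2 are unchanged and
// element 3 receives the 4-element dot product of FVm and FVn.
constexpr V4 fipr_model (V4 fvm, V4 fvn)
{
  fvn.e[3] = fvm.e[0] * fvn.e[0] + fvm.e[1] * fvn.e[1]
           + fvm.e[2] * fvn.e[2] + fvm.e[3] * fvn.e[3];
  return fvn;
}
```

Note that the result overwrites the last element of the second operand, so a
caller that still needs FR(n+3) has to copy it elsewhere first.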

Some (C++) code that could utilize this:

typedef float v4sf __attribute__ ((vector_size (16)));

float test00 (const v4sf& a, const v4sf& b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

float test02 (float a0, float a1, float a2, float a3,
         float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

float test03 (const float* a, const float* b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

Dot products of vectors with 3 elements could also be handled by the fipr insn
by setting the irrelevant element to 0.0 in one of the vector operands.  For 2
element vectors an fmul,fmac sequence seems to be adequate (which already
works).
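
The 3-element and 2-element cases can be sketched as follows.  The helper
names are hypothetical, and dot4 merely stands in for what a single fipr would
compute.  Padding only one operand with 0.0 is enough as long as the matching
element of the other operand is finite (0.0 * Inf would give NaN).

```cpp
// Stand-in for the 4-element dot product that one fipr would compute.
constexpr float dot4 (const float (&a)[4], const float (&b)[4])
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// 3-element dot product: the unused 4th lane of one operand is set to 0.0,
// so a finite value in the 4th lane of the other operand does not contribute.
constexpr float dot3 (float ax, float ay, float az,
                      float bx, float by, float bz, float b_unused = 1.0f)
{
  const float a[4] = { ax, ay, az, 0.0f };
  const float b[4] = { bx, by, bz, b_unused };
  return dot4 (a, b);
}

// 2-element dot product: one multiply followed by one multiply-accumulate,
// which is the shape of the fmul,fmac sequence.
constexpr float dot2 (float ax, float ay, float bx, float by)
{
  return ax * bx + ay * by;
}
```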

I've tried adding some combine patterns to handle the V2SF case (see
attachment), but the results are not so convincing.  For example, the case

float test02 (float a0, float a1, float a2, float a3,
         float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

compiled with -O2 -m4-single -mb results in:

        fmov.s  fr12,@-r15      ! 42    movsf_ie/7    [length = 2]
        fmov.s  fr13,@-r15      ! 43    movsf_ie/7    [length = 2]
        fmov.s  fr14,@-r15      ! 44    movsf_ie/7    [length = 2]
        fmov.s  fr15,@-r15      ! 45    movsf_ie/7    [length = 2]
        fmov    fr9,fr12        ! 31    movsf_ie/1    [length = 2]
        fmov    fr8,fr13        ! 32    movsf_ie/1    [length = 2]
        fmov    fr11,fr14       ! 33    movsf_ie/1    [length = 2]
        fmov    fr10,fr15       ! 34    movsf_ie/1    [length = 2]
        fmov    fr5,fr0         ! 27    movsf_ie/1    [length = 2]
        fmov    fr4,fr1         ! 28    movsf_ie/1    [length = 2]
        fmov    fr7,fr2         ! 29    movsf_ie/1    [length = 2]
        fmov    fr6,fr3         ! 30    movsf_ie/1    [length = 2]
        fipr    fv12,fv0        ! 35    fipr_compact    [length = 2]
        fmov.s  @r15+,fr15      ! 50    movsf_ie/6    [length = 2]
        fmov.s  @r15+,fr14      ! 51    movsf_ie/6    [length = 2]
        fmov    fr3,fr0         ! 36    movsf_ie/1    [length = 2]
        fmov.s  @r15+,fr13      ! 52    movsf_ie/6    [length = 2]
        rts                     ! 54    *return_i    [length = 2]
        fmov.s  @r15+,fr12      ! 53    movsf_ie/6    [length = 2]

whereas it should really be just:

        fipr    fv4,fv8
        rts
        fmov    fr11,fr0



Also, in the case of

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

only one fipr insn is generated, due to various other optimization effects.

It seems there is no standard name pattern for doing FP vector dot products
yet.  I guess it would be better to also have some tree-optimization support
for this.


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2012-11-12 22:39 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-11-12 22:39:27 UTC ---
I forgot to mention that at least there should be a target specific built-in
function to generate the fipr insn.  There is already a SHmedia built-in for
that, so adding one for SH4* shouldn't be a big deal.  However, ideally the
compiler would discover fipr opportunities by itself (when compiling with
-ffast-math).


* [Bug target/55295] [SH] Add support for fipr instruction
From: turkeyman at gmail dot com @ 2013-03-04 16:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

Manu Evans <turkeyman at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |turkeyman at gmail dot com

--- Comment #2 from Manu Evans <turkeyman at gmail dot com> 2013-03-04 16:22:29 UTC ---
+1

I'm seeing the same pattern.
In fact, I'm noticing a lot of my maths code seems to be performing a lot of
redundant moves.

Are there actually any builtins/intrinsics available for the SH4?
How do I access the awesome vector operations without breaking out the inline
asm?

It would be nice to have some intrinsics that understand vectors as sequences
of 4 float regs, and automate a sequential (vector) load.

Also, the ftrv opcode doesn't seem to be accessible either.

* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2013-03-04 21:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-03-04 21:50:58 UTC ---
(In reply to comment #2)
> +1
> 
> I'm seeing the same pattern.
> In fact, I'm noticing a lot of my maths code seems to be performing a lot of
> redundant moves.

Some examples would be great regarding this matter, although I can already
imagine what the code looks like.  One of the problems is the auto-inc-dec pass
(see PR 50749).  A long time ago the rule of thumb for SH4 programmers was
"read float values with post-inc addressing in your C code, and write float
values with pre-dec addressing".  This does not work anymore, since all memory
accesses are turned into array-like, index-based addresses internally in the
compiler.  Then the auto-inc-dec RTL pass is supposed to find post-inc and
pre-dec addressing mode opportunities, but it fails to do so in most cases.
I have started writing a replacement RTL pass that would try to optimize
addressing mode selections.  I hope to get it in for GCC 4.9.
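
For illustration, the old rule of thumb corresponds to source of roughly the
following shape.  This is a sketch with made-up function names.  As described
above, current GCC rewrites such accesses internally, so writing the code this
way does not guarantee post-inc/pre-dec addressing in the output.

```cpp
// Reads n floats through a post-incremented pointer (*src++, the shape of
// "fmov.s @Rm+,FRn") and stores them through a pre-decremented pointer
// (*--dst_end, the shape of "fmov.s FRm,@-Rn").  The output is therefore
// written back to front.
constexpr void scale_reversed (const float* src, float* dst_end, int n, float k)
{
  while (n-- > 0)
    *--dst_end = *src++ * k;
}

// Small usage example: scales {1, 2, 3, 4} by 2 and returns out[0], which
// holds the last input element because of the reversed store order.
constexpr float demo_first_output ()
{
  float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  float out[4] = { };
  scale_reversed (in, out + 4, 4, 2.0f);
  return out[0];
}
```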

Anyway, if you have some example code that you can share, it would be really
appreciated and helpful during development for testing purposes.

> Are there actually any builtins/intrinsics available for the SH4?
> How do I access the awesome vector operations without breaking out the inline
> asm?

There aren't that many HW vector ops on SH4, just fipr and ftrv.  At the
moment, there are no builtins for those, so you'd have to use inline asm
intrinsics.  Like I mentioned in comment #1, I'd rather make the compiler
figure out opportunities from portable generic code.  Although for ftrv the
patterns might be a bit .... complicated, also because the compiler then has to
manage the 2nd FPU regs bank...

> It would be nice to have some intrinsics that understand vectors as sequences
> of 4 float regs, and automate a sequential (vector) load.

That would be the job of the address-mode-selection RTL pass.  It would also
improve overall code quality on SH.  The fastest way to load a 4-float vector is
to use 2x fmov.d.  The compiler could also do that automatically, but this
requires FPSCR switching, which unfortunately also needs some rework (e.g. see
PR 53513, PR 6526).

And on top of that, we also have PR 13423.  It seems that the proper fix for
this is a new reworked (vector) ABI for SH.

> 
> Also, the ftrv opcode doesn't seem to be accessible either.

True.  I really hope that I'll find enough time to brush up SH FPU code
generation for GCC 4.9.  Until then, I'd suggest using inline-asm style
intrinsics.

* [Bug target/55295] [SH] Add support for fipr instruction
From: turkeyman at gmail dot com @ 2013-03-05  1:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #4 from Manu Evans <turkeyman at gmail dot com> 2013-03-05 01:55:08 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > +1
> > 
> > I'm seeing the same pattern.
> > In fact, I'm noticing a lot of my maths code seems to be performing a lot of
> > redundant moves.
> 
> Some examples would be great regarding this matter, although I can already
> imagine what the code looks like.  One of the problems is the auto-inc-dec pass
> (see PR 50749).  A long time ago the rule of thumb for SH4 programmers was
> "read float values with post-inc addressing in your C code, and write float
> values with pre-dec addressing".  This does not work anymore, since all memory
> accesses are turned into array like index based addresses internally in the
> compiler.  Then the auto-inc-dec RTL pass is supposed to find post-inc and
> pre-dec addressing mode opportunities, but it fails to do so in most cases.
> I have started writing a replacement RTL pass that would try to optimize
> addressing mode selections.  I hope to get it in for GCC 4.9.
> 
> Anyway, if you have some example code that you can share, it would be really
> appreciated and helpful during development for testing purposes.
> 
> > Are there actually any builtins/intrinsics available for the SH4?
> > How do I access the awesome vector operations without breaking out the inline
> > asm?
> 
> There aren't that many HW vector ops on SH4, just fipr and ftrv.  At the
> moment, there are no builtins for those, so you'd have to use inline asm
> intrinsics.  Like I mentioned in comment #1, I'd rather make the compiler
> figure out opportunities from portable generic code.  Although for ftrv the
> patterns might be a bit .... complicated, also because the compiler then has to
> manage the 2nd FPU regs bank...
>
> > It would be nice to have some intrinsics that understand vectors as sequences
> > of 4 float regs, and automate a sequential (vector) load.
> 
> That would be the job of the address-mode-selection RTL pass.  It would also
> improve overall code quality on SH.  The fastest way to load 4 float vectors is
> to use 2x fmov.d.  The compiler could also do that automatically, but this
> requires FPSCR switching, which unfortunately also needs some rework (e.g. see
> PR 53513, PR 6526).
> 
> And on top of that, we also have PR 13423.  It seems that the proper fix for
> this is a new reworked (vector) ABI for SH.

Well I hope you find the time for all this, the (small) sh4 community will love
you! :)

Why is a new ABI important?


> > Also, the ftrv opcode doesn't seem to be accessible either.
> 
> True.  I really hope that I'll find enough time to brush up SH FPU code
> generation for GCC 4.9.  Until then, I'd suggest to use inline-asm style
> intrinsics.

4.9? That sounds like it could be years off... :(

I'm not sure what you mean by 'inline-asm style intrinsics'?
Last time I used inline-asm blocks in GCC it totally broke the optimisation. It
wouldn't reorder across inline-asm blocks, and it couldn't eliminate any
redundant load/stores appearing within the block in the event the value was
already resident.

Can you give me a small demonstration of what you mean?
I found whenever I touch inline-asm, the block just grows and grows in scope
upwards until my whole tight routine is written in asm... but that was some
years back, GCC3 era.


I'll report examples here as I find compelling situations.

But on a tangent, can you explain this behaviour? It's really ruining my code:

float testfunc(float v, float v2)
{
    return v*v2 + v;
}

Compiled with: -O3 -mfused-madd

testfunc:
.LFB1:
    .cfi_startproc
    mov.l    .L3,r1      ;
    lds.l    @r1+,fpscr  ; <- why does it mess with fpscr?
    add    #-4,r1
    fmov    fr5,fr0
    add    #4,r1       ; <- +4 after -4... redundant?
    fmac    fr0,fr4,fr0
    rts    
    lds.l    @r1+,fpscr
.L4:
    .align 2
.L3:
    .long    __fpscr_values
    .cfi_endproc

There's a lot of rubbish in there... I expect:

testfunc:
.LFB1:
    .cfi_startproc
    fmov    fr5,fr0
    fmac    fr0,fr4,fr0
    rts    
    .cfi_endproc


I'm also noticing that -ffast-math is inhibiting fmac emission in some cases:

Compiled with: -O3 -mfused-madd -ffast-math

testfunc:
.LFB1:
    .cfi_startproc
    mov.l    .L3,r1
    lds.l    @r1+,fpscr
    fldi1    fr0         ; what is a 1.0 doing here?
    add    #-4,r1
    add    #4,r1
    fadd    fr4,fr0     ; v+1 ??
    fmul    fr5,fr0     ; (v+1)*v2 ?? That's not what the code does...
    rts    
    lds.l    @r1+,fpscr

What's going on there? That doesn't even look correct...

Cheers!


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2013-03-05 12:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-03-05 12:28:22 UTC ---
(In reply to comment #4)
> 
> Why is a new ABI important?
> 

Because currently, there is no way to pass something like

struct { float x, y, z, w; };

as function arguments in registers, although the default SH ABI could allow
passing up to 3 of such vectors.  The same applies to

typedef float v4sf __attribute__ ((vector_size (16)));

or 

std::array<float, 4>

However, code that does that will be incompatible with existing calling
conventions etc, thus a new (additional and optional) ABI.

> 4.9? That sounds like it could be years off... :(

4.8 is about to be released soon.  4.9 should follow at around the same time
next year.  Of course you can still grab the current development version and
use it anytime.

> 
> I'm not sure what you mean by 'inline-asm style intrinsics'?

Something like:

static inline void* get_gbr (void) throw ()
{
  void* retval;
  __asm__ volatile ("stc gbr, %0" : "=r" (retval) : );
  return retval;
}
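
In the same style, a fipr wrapper could be sketched as below.  Everything here
is an assumption for illustration: the function name dot4_fipr, the opt-in
macro USE_SH4_FIPR_ASM and the choice of fv0/fv4 are made up, the asm path has
not been tested, and it assumes FPSCR is already in single-precision mode
(e.g. -m4-single).  Without the macro, a plain C++ fallback is used.

```cpp
#if defined (USE_SH4_FIPR_ASM)  // hypothetical opt-in macro, not set by GCC

// Untested sketch: pins the operands to fr0..fr7 with GCC local register
// variables and issues one fipr.  The result is left in fr3 (last lane of fv0).
static inline float dot4_fipr (float ax, float ay, float az, float aw,
                               float bx, float by, float bz, float bw)
{
  register float a0 __asm__ ("fr0") = ax;
  register float a1 __asm__ ("fr1") = ay;
  register float a2 __asm__ ("fr2") = az;
  register float a3 __asm__ ("fr3") = aw;
  register float b0 __asm__ ("fr4") = bx;
  register float b1 __asm__ ("fr5") = by;
  register float b2 __asm__ ("fr6") = bz;
  register float b3 __asm__ ("fr7") = bw;

  // Not volatile: all inputs and outputs are declared as operands.
  __asm__ ("fipr fv4,fv0"
           : "+f" (a3)
           : "f" (a0), "f" (a1), "f" (a2),
             "f" (b0), "f" (b1), "f" (b2), "f" (b3));
  return a3;
}

#else

// Portable fallback with the same interface.
constexpr float dot4_fipr (float ax, float ay, float az, float aw,
                           float bx, float by, float bz, float bw)
{
  return ax * bx + ay * by + az * bz + aw * bw;
}

#endif
```

Because the asm statement is not volatile and lists all of its operands, the
compiler can still schedule or drop it like any other computation, although it
cannot look inside it.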

> Last time I used inline-asm blocks in GCC it totally broke the optimisation. It
> wouldn't reorder across inline-asm blocks, and it couldn't eliminate any
> redundant load/stores appearing within the block in the event the value was
> already resident.
> 
> Can you give me a small demonstration of what you mean?
> I found whenever I touch inline-asm, the block just grows and grows in scope
> upwards until my whole tight routine is written in asm... but that was some
> years back, GCC3 era.
> 

Yes, there are some limits to what the compiler can do with an asm block.  It
won't analyze the contents of the asm block, only the placeholders.  Thus it
usually can't eliminate redundant loads/stores.

> 
> I'll report examples here as I find compelling situations.
> 
> But on a tangent, can you explain this behaviour? It's really ruining my code:
> 
> float testfunc(float v, float v2)
> {
>     return v*v2 + v;
> }
> 
> Compiled with: -O3 -mfused-madd
> 
> testfunc:
> .LFB1:
>     .cfi_startproc
>     mov.l    .L3,r1      ;
>     lds.l    @r1+,fpscr  ; <- why does it mess with fpscr?
>     add    #-4,r1
>     fmov    fr5,fr0
>     add    #4,r1       ; <- +4 after -4... redundant?
>     fmac    fr0,fr4,fr0
>     rts    
>     lds.l    @r1+,fpscr
> .L4:
>     .align 2
> .L3:
>     .long    __fpscr_values
>     .cfi_endproc
> 
> There's a lot of rubbish in there... I expect:
> 
> testfunc:
> .LFB1:
>     .cfi_startproc
>     fmov    fr5,fr0
>     fmac    fr0,fr4,fr0
>     rts    
>     .cfi_endproc
> 

The fpscr value is changed because, in the compiler's default configuration,
FPSCR is assumed to be set up for double-precision operation at function
entry/return, so it has to be switched around single-precision insns.  You can
change this by using e.g. -m4-single, which will assume that the FPSCR setting
is configured for single-precision at function entry/return.

The +4 -4 thing is a known problem and stems from the fact that the FPSCR
load/store insns are available only as post-inc/pre-dec.

> 
> I'm also noticing that -ffast-math is inhibiting fmac emission in some cases:
> 
> Compiled with: -O3 -mfused-madd -ffast-math
> 
> testfunc:
> .LFB1:
>     .cfi_startproc
>     mov.l    .L3,r1
>     lds.l    @r1+,fpscr
>     fldi1    fr0         ; what is a 1.0 doing here?
>     add    #-4,r1
>     add    #4,r1
>     fadd    fr4,fr0     ; v+1 ??
>     fmul    fr5,fr0     ; (v+1)*v2 ?? That's not what the code does...
>     rts    
>     lds.l    @r1+,fpscr
> 
> What's going on there? That doesn't even look correct...

The transformation is legitimate, although unfortunate, since using fmac would be
better in this case.

The original expression 'v*v2 + v' is converted to '(1 + v2)*v' and that's what
the code does.  Probably you compiled for little endian and got confused by the
floating point register ordering for arguments.  It goes like ...
fr5 = arg 0
fr4 = arg 1
fr7 = arg 2
fr6 = arg 3
...

This is another reason for adding a new ABI, BTW.
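
The rewrite and the generated code can be checked against each other with a
small model.  This is an illustrative sketch with made-up function names.  The
two forms agree exactly for small integer-valued inputs, but in general they
may round differently, which is why such a rewrite needs relaxed FP math such
as -ffast-math.

```cpp
// The expression as written in the source.
constexpr float original_form (float v, float v2)
{
  return v * v2 + v;
}

// The expression after the -ffast-math rewrite.
constexpr float reassociated_form (float v, float v2)
{
  return (1.0f + v2) * v;
}

// Step-by-step model of the quoted asm, using the register assignment
// listed above: fr5 holds arg 0 (v), fr4 holds arg 1 (v2).
constexpr float asm_model (float fr5, float fr4)
{
  float fr0 = 1.0f;   // fldi1  fr0
  fr0 += fr4;         // fadd   fr4,fr0   -> 1 + v2
  fr0 *= fr5;         // fmul   fr5,fr0   -> (1 + v2) * v
  return fr0;
}
```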

* [Bug target/55295] [SH] Add support for fipr instruction
From: turkeyman at gmail dot com @ 2013-03-05 12:53 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #6 from Manu Evans <turkeyman at gmail dot com> 2013-03-05 12:53:26 UTC ---
Awesome, thanks for the info and help!

Strangely, -m4-single won't work with my toolchain; it says 'not compatible with
this configuration' >_<

Looking forward to all these fixes! :)


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2013-03-06  1:05 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #7 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-03-06 01:05:14 UTC ---
(In reply to comment #5)
> > 
> > I'm also noticing that -ffast-math is inhibiting fmac emission in some cases:
> > 
> > Compiled with: -O3 -mfused-madd -ffast-math
> > 
> > testfunc:
> > .LFB1:
> >     .cfi_startproc
> >     mov.l    .L3,r1
> >     lds.l    @r1+,fpscr
> >     fldi1    fr0         ; what is a 1.0 doing here?
> >     add    #-4,r1
> >     add    #4,r1
> >     fadd    fr4,fr0     ; v+1 ??
> >     fmul    fr5,fr0     ; (v+1)*v2 ?? That's not what the code does...
> >     rts    
> >     lds.l    @r1+,fpscr
> > 
> > What's going on there? That doesn't even look correct...
> 
> The transformation is legitimate, although unlucky, since using fmac would be
> better in this case.
> 

I've opened a new PR 56547 for this issue.


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2013-03-13 18:21 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #8 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-03-13 18:21:37 UTC ---
(In reply to comment #5)
> 
> This is another reason for adding a new ABI, BTW.

Just for the record, I've opened a new PR 56592 for this.


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2014-12-07 23:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #28671|0                           |1
        is obsolete|                            |

--- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 34213
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34213&action=edit
Combine patterns for matching fipr

An updated patch for trunk.  As for the redundant fp moves and/or ferries
through fpul, those seem to be caused by the lack of various vec_* patterns. 
See also PR 13423.


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2014-12-09 22:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #10 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #9)
> Created attachment 34213 [details]
> Combine patterns for matching fipr
> 
> An updated patch for trunk.  As for the redundant fp moves and/or ferries
> through fpul, those seem to be caused by the lack of various vec_* patterns.
> See also PR 13423.

An alternative pattern for the core fipr insn could be:

(define_insn "fipr_compact"
  [(set (match_operand:V4SF 0 "fp_arith_reg_operand" "=f")
    (vec_concat:V4SF
      (vec_concat:V2SF
        (vec_select:SF (match_operand:V4SF 1 "fp_arith_reg_operand" "%0")
               (parallel [(const_int 0)]))
        (vec_select:SF (match_dup 1) (parallel [(const_int 1)])))
      (vec_concat:V2SF
        (vec_select:SF (match_dup 1) (parallel [(const_int 2)]))
        (plus:SF
          (plus:SF (vec_select:SF (mult:V4SF (match_dup 1)
                             (match_operand:V4SF 2
                           "fp_arith_reg_operand" "f"))
                      (parallel [(const_int 0)]))
               (vec_select:SF (mult:V4SF (match_dup 1) (match_dup 2))
                      (parallel [(const_int 1)])))
          (plus:SF (vec_select:SF (mult:V4SF (match_dup 1) (match_dup 2))
                      (parallel [(const_int 2)]))
               (vec_select:SF (mult:V4SF (match_dup 1) (match_dup 2))
                      (parallel [(const_int 3)])))))))
   (clobber (reg:SI FPSCR_STAT_REG))
   (use (reg:SI FPSCR_MODES_REG))]
  "TARGET_SH4"
  "fipr    %2,%0"
  [(set_attr "type" "fp")
   (set_attr "fp_mode" "single")])

However, I'm not sure whether register allocation understands this properly.
Matching the fipr insn during combine has other issues, such as v4sf register
construction from individual sf values.  Before investigating the issue at
combine level, playing along with the vectorizer seems more promising.  For
that, vector load/store patterns need to be added first (PR 13423).


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2015-03-01 19:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #11 from Oleg Endo <olegendo at gcc dot gnu.org> ---
A note on the side...
As mentioned above, fipr can also be used to do a 3D dot product.  However,
GCC's vector extensions do not allow specifying vectors of length 3.  To
support that I guess the easiest way is to do it with a bunch of combine
patterns which canonicalize/split into the vector version.  Last time I
tried, register allocation was pretty bad for that, as mentioned in comment #10.
Probably it will require some specific pre-allocation before RA.


* [Bug target/55295] [SH] Add support for fipr instruction
From: turkeyman at gmail dot com @ 2015-03-02  0:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #12 from Manu Evans <turkeyman at gmail dot com> ---
Hey, I'm still following this with great interest.

Is it possible to make an intrinsic for this instruction so it can be issued at
will?

What I'm still more interested in at this point, would be some support for
passing vectors in registers, making it possible to eliminate so much of that
fmov noise.


* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2015-03-02  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #13 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Manu Evans from comment #12)
> Hey, I'm still following this with great interest.
> 
> Is it possible to make an intrinsic for this instruction so it can be issued
> at will?

Yes, that's what I wanted to do.  Automatic detection of the fipr insn would be
restricted to relaxed FP math (e.g. -ffast-math), because it has reduced FP
precision.  So it makes sense to add a __builtin_sh_fipr.


> What I'm still more interested in at this point, would be some support for
> passing vectors in registers, making it possible to eliminate so much of
> that fmov noise.

See also PR 56592, PR 13423, PR 64305.

Unfortunately those are not so trivial to solve and I have little time at the
moment.


* [Bug target/55295] [SH] Add support for fipr instruction
From: kazade at gmail dot com @ 2023-03-21  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

Luke Benstead <kazade at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kazade at gmail dot com

--- Comment #14 from Luke Benstead <kazade at gmail dot com> ---
Was there a particular reason why this patch wasn't merged? It would be really
cool to see GCC generate fipr like it does fsrra etc. 

Is there anything I can do to help?

* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2023-03-21  9:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #15 from Oleg Endo <olegendo at gcc dot gnu.org> ---
It's been too long since I've looked into it.  Maybe some middle-end parts got
more suitable over time, but it was difficult to make it generate the fipr
instruction automatically due to the reasons stated above.

* [Bug target/55295] [SH] Add support for fipr instruction
From: kazade at gmail dot com @ 2023-03-21 11:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #16 from Luke Benstead <kazade at gmail dot com> ---
OK so perhaps adding __builtin_sh_fipr is a good first step?

* [Bug target/55295] [SH] Add support for fipr instruction
From: olegendo at gcc dot gnu.org @ 2023-03-21 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #17 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Luke Benstead from comment #16)
> OK so perhaps adding __builtin_sh_fipr is a good first step?

Yeah, you can try and see if it produces any useful results for you.
