* [Bug target/55295] New: [SH] Add support for fipr instruction
@ 2012-11-12 22:29 olegendo at gcc dot gnu.org
From: olegendo at gcc dot gnu.org @ 2012-11-12 22:29 UTC
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

             Bug #: 55295
           Summary: [SH] Add support for fipr instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: olegendo@gcc.gnu.org
            Target: sh4*-*-*


Created attachment 28671
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28671
Example combine patterns

SH4* targets have an instruction, 'fipr', that GCC currently never emits.  It
calculates the dot product of two V4SF vectors:

fipr  FVm, FVn
FR(n+3) = FR(m+0)*FR(n+0) + FR(m+1)*FR(n+1) + FR(m+2)*FR(n+2) + FR(m+3)*FR(n+3)
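
To make the semantics above concrete, here is a minimal software model of
the instruction in C++.  The names FpBank and fipr_model are made up for this
illustration and are not part of GCC or any SH toolchain.  It uses plain
float arithmetic, so the rounding of the real hardware instruction may differ
from what this sketch computes.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of one SH4 FP register bank, FR0..FR15.
using FpBank = std::array<float, 16>;

// Models "fipr FVm, FVn" as described above.  m and n are the indices of
// the first register of each 4-element vector (0, 4, 8 or 12).  Only
// FR(n+3) is written; all other registers keep their values.
inline void fipr_model (FpBank& fr, int m, int n)
{
  assert (m >= 0 && m < 16 && m % 4 == 0);
  assert (n >= 0 && n < 16 && n % 4 == 0);

  fr[n + 3] = fr[m + 0] * fr[n + 0] + fr[m + 1] * fr[n + 1]
            + fr[m + 2] * fr[n + 2] + fr[m + 3] * fr[n + 3];
}
```

For instance, with FR4..FR7 = 1, 2, 3, 4 and FR8..FR11 = 5, 6, 7, 8, calling
fipr_model (fr, 4, 8) leaves 70 in FR11 and does not touch FR8..FR10.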

Some (C++) code that could utilize this:

typedef float v4sf __attribute__ ((vector_size (16)));

float test00 (const v4sf& a, const v4sf& b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

float test02 (float a0, float a1, float a2, float a3,
         float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

float test03 (const float* a, const float* b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

Dot products of vectors with 3 elements could also be handled by the fipr insn
by setting the unused fourth element to 0.0 in one of the vector operands.
For 2-element vectors an fmul, fmac sequence seems adequate, and that already
works.
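
At the source level, the 3-element idea could look roughly like the sketch
below.  The helper names dot4, dot3_via_dot4 and dot2 are hypothetical and
used only for illustration.  The sketch assumes the fourth element of the
non-zeroed operand is finite, since 0.0 * Inf and 0.0 * NaN would not give
0.0.

```cpp
typedef float v4sf __attribute__ ((vector_size (16)));

// The 4-element dot product, same expression as in test00.
static inline float dot4 (const v4sf& a, const v4sf& b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// 3-element dot product expressed through the 4-element one.  Only one
// operand needs its fourth element zeroed; whatever b[3] holds is then
// multiplied by 0.0 (assumption: b[3] is finite).
static inline float dot3_via_dot4 (v4sf a, const v4sf& b)
{
  a[3] = 0.0f;
  return dot4 (a, b);
}

// 2-element case: one multiply followed by one multiply-accumulate,
// i.e. the shape that maps to an fmul, fmac sequence.
static inline float dot2 (float a0, float a1, float b0, float b1)
{
  return a0 * b0 + a1 * b1;
}
```

Whether the compiler would actually turn dot3_via_dot4 into a single fipr
depends on the patterns discussed below; the code only shows the intended
arithmetic.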

I've tried adding some combine patterns to handle the V4SF case (see
attachment), but the results are not convincing.  For example, the case

float test02 (float a0, float a1, float a2, float a3,
         float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

compiled with -O2 -m4-single -mb results in:

        fmov.s  fr12,@-r15      ! 42    movsf_ie/7    [length = 2]
        fmov.s  fr13,@-r15      ! 43    movsf_ie/7    [length = 2]
        fmov.s  fr14,@-r15      ! 44    movsf_ie/7    [length = 2]
        fmov.s  fr15,@-r15      ! 45    movsf_ie/7    [length = 2]
        fmov    fr9,fr12        ! 31    movsf_ie/1    [length = 2]
        fmov    fr8,fr13        ! 32    movsf_ie/1    [length = 2]
        fmov    fr11,fr14       ! 33    movsf_ie/1    [length = 2]
        fmov    fr10,fr15       ! 34    movsf_ie/1    [length = 2]
        fmov    fr5,fr0         ! 27    movsf_ie/1    [length = 2]
        fmov    fr4,fr1         ! 28    movsf_ie/1    [length = 2]
        fmov    fr7,fr2         ! 29    movsf_ie/1    [length = 2]
        fmov    fr6,fr3         ! 30    movsf_ie/1    [length = 2]
        fipr    fv12,fv0        ! 35    fipr_compact    [length = 2]
        fmov.s  @r15+,fr15      ! 50    movsf_ie/6    [length = 2]
        fmov.s  @r15+,fr14      ! 51    movsf_ie/6    [length = 2]
        fmov    fr3,fr0         ! 36    movsf_ie/1    [length = 2]
        fmov.s  @r15+,fr13      ! 52    movsf_ie/6    [length = 2]
        rts                     ! 54    *return_i    [length = 2]
        fmov.s  @r15+,fr12      ! 53    movsf_ie/6    [length = 2]

whereas the expected code is just:

        fipr    fv4,fv8
        rts
        fmov    fr11,fr0



Also, in the case of

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

only one fipr insn is generated instead of two, due to various other
optimization effects.

It seems there is no standard name pattern for doing FP vector dot products
yet.  I guess it would be better to also have some tree-optimization support
for this.



Thread overview: 18+ messages
2012-11-12 22:29 [Bug target/55295] New: [SH] Add support for fipr instruction olegendo at gcc dot gnu.org
2012-11-12 22:39 ` [Bug target/55295] " olegendo at gcc dot gnu.org
2013-03-04 16:24 ` turkeyman at gmail dot com
2013-03-04 21:51 ` olegendo at gcc dot gnu.org
2013-03-05  1:55 ` turkeyman at gmail dot com
2013-03-05 12:28 ` olegendo at gcc dot gnu.org
2013-03-05 12:53 ` turkeyman at gmail dot com
2013-03-06  1:05 ` olegendo at gcc dot gnu.org
2013-03-13 18:21 ` olegendo at gcc dot gnu.org
2014-12-07 23:49 ` olegendo at gcc dot gnu.org
2014-12-09 22:37 ` olegendo at gcc dot gnu.org
2015-03-01 19:06 ` olegendo at gcc dot gnu.org
2015-03-02  0:17 ` turkeyman at gmail dot com
2015-03-02  9:00 ` olegendo at gcc dot gnu.org
2023-03-21  9:38 ` kazade at gmail dot com
2023-03-21  9:43 ` olegendo at gcc dot gnu.org
2023-03-21 11:46 ` kazade at gmail dot com
2023-03-21 11:52 ` olegendo at gcc dot gnu.org
