[Bug target/53949] [SH] Add support for mac.w / mac.l instructions

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "olegendo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/53949] [SH] Add support for mac.w / mac.l instructions
Date: Sun, 15 Jul 2012 12:11:00 -0000	[thread overview]
Message-ID: <bug-53949-4-VgvJCb3eaP@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-53949-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53949

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-07-15 12:11:20 UTC ---
Created attachment 27799
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27799
Proof of concept patch

This is a proof of concept patch just to probe around.
The idea is to allow the RA to allocate macl and mach registers in DImode, and
have mac insns that use the macl/mach regs as a pair in DImode.

With the patch applied, the following function ...

int64_t test01 (const int16_t* a, const int16_t* b)
{
  int64_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += (int64_t)(*a++) * (int64_t)(*b++);
  return sum;
}

compiled with -m4 -O2 results in ...

__Z6test01PKsS0_:
.LFB0:
        .cfi_startproc
        mov    #16,r1          ! 88    movsi_ie/3    [length = 2]
        clrmac                  ! 39    clrmac/1    [length = 2]
        .align 2
.L3:
        dt    r1              ! 89    dect    [length = 2]
        bf/s    .L3             ! 90    branch_false    [length = 2]
        mac.w    @r4+,@r5+       ! 61    *macw    [length = 2]
        sts    macl,r0         ! 82    movsi_ie/8    [length = 2]
        rts                     ! 99    *return_i    [length = 2]
        sts    mach,r1         ! 83    movsi_ie/8    [length = 2]

... which is not that bad already.


Some notes I took while playing around with this:


- When compiling for big endian the RA mistakes mach and macl when
  storing mach:macl to a DImode reg:reg pair.
  This could probably fixed by providing appropriate move insns patterns.


- Move insns/splits for DImode mach:macl <-> memory have to be added.
  I've seen an ICE when compiling with -O1:
  error: unrecognizable insn:
  (insn 122 14 15 2 (set (mem/c:DI (plus:SI (reg/f:SI 15 r15)
                (const_int 8 [0x8])) [0 %sfp+-8 S8 A32])
        (reg:DI 148 macl)) sh_mac.cpp:38 -1
     (nil))


- In some cases the mach:macl reg pair gets swapped to a general reg pair
  without any obvious need.  Example function:

  int64_t test04 (const int16_t* a, const int16_t* b,
                  const int16_t* c, const int16_t* d)
  {
    int64_t sum0 = 0;
    int64_t sum1 = 0;
    for (int i = 0; i < 16; ++i)
      sum0 += (int64_t)(*a++) * (int64_t)(*b++);

    for (int i = 0; i < 16; ++i)
      sum1 += (int64_t)(*c++) * (int64_t)(*d++);

    return sum0 - sum1;
  }

  The IRA pass first allocates sum0 and sum1 to mach:macl, but then reload
  seems to think that they are conflicting and moves sum0 to a general regs
  pair.  This results in ...

        mov     #0,r2
        mov     #16,r1
        mov     r2,r3
  .L16:
        lds     r2,macl
        lds     r3,mach
        dt      r1
        mac.w   @r4+,@r5+
        sts     macl,r2
        bf/s    .L16
        sts     mach,r3

        mov     #16,r1
        clrmac
        .align 2
  .L18:
        dt      r1
        bf/s    .L18
        mac.w   @r6+,@r7+


  which would be better as:
        mov     #16,r1
        clrmac
  .L16:
        dt      r1
        bf/s    .L16
        mac.w   @r4+,@r5+

        sts     macl,r2
        sts     mach,r3
        clrmac
        mov     #16,r1
  .L18:
        dt      r1
        bf/s    .L18
        mac.w   @r6+,@r7+


- Loops with multiple running sums like
  for (int i = 0; i < 16; ++i)
  {
    sum0 += (int64_t)(*a++) * (int64_t)(*b++);
    sum1 += (int64_t)(*c++) * (int64_t)(*d++);
  }

  result in macl:mach swapping to general reg pairs between subsequent
  mac.w instructions.  Ideally such loops should be split into multiple
  loops like in the previous example.


- When loop unrolling is turned on the auto-inc addresses refs are
  converted to displacement addresses.  Because the auto-inc-dec pass
  currently fails to detect a lot of auto-inc-dec possibilities the 
  mac.w pattern will not match.
  The same goes for manually unrolled code like 

  sum += (int64_t)(*a++) * (int64_t)(*b++);
  sum += (int64_t)(*a++) * (int64_t)(*b++);


- Running sum variables should be turned into DImode variables if possible:
  int32_t test00 (const int16_t* a, const int16_t* b)
  {
    int32_t sum = 0;
    for (int i = 0; i < 16; ++i)
      sum += (*a++) * (*b++);
    return sum;
  }


- The existing multiplication patterns could be adopted to utilize macl:mach
  reg pair allocation, especially 32x32 -> 64 bit multiplications.


- Normal multiplications that do not need a full MAC operation but use
  memory operands can be done with a clrmac-mac sequence.


Probably there are more subtle issues.  Also, I have not tried expanding
the standard name 'maddmn4' pattern,  maybe it would make some of the
problems mentioned above automagically disappear.

next prev parent reply	other threads:[~2012-07-15 12:11 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-13  9:01 [Bug target/53949] New: " olegendo at gcc dot gnu.org
2012-07-13 10:34 ` [Bug target/53949] " olegendo at gcc dot gnu.org
2012-07-13 11:01 ` chrbr at gcc dot gnu.org
2012-07-13 11:24 ` chrbr at gcc dot gnu.org
2012-07-15 12:11 ` olegendo at gcc dot gnu.org [this message]
2012-07-17 19:13 ` olegendo at gcc dot gnu.org
2012-07-17 23:05 ` kkojima at gcc dot gnu.org
2012-07-22 16:47 ` olegendo at gcc dot gnu.org
2012-10-11 20:43 ` olegendo at gcc dot gnu.org
2012-11-07 21:31 ` olegendo at gcc dot gnu.org
2013-05-04 13:39 ` olegendo at gcc dot gnu.org
2013-12-17 12:25 ` olegendo at gcc dot gnu.org
2013-12-17 12:37 ` olegendo at gcc dot gnu.org
2015-01-30 19:37 ` olegendo at gcc dot gnu.org
2015-02-01  0:37 ` olegendo at gcc dot gnu.org
2015-04-08 20:08 ` olegendo at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-53949-4-VgvJCb3eaP@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).