[Bug target/64299] New: [SH] improve FPSCR.PR mode switching by reordering insns

From: "olegendo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/64299] New: [SH] improve FPSCR.PR mode switching by reordering insns
Date: Sat, 13 Dec 2014 17:50:00 -0000
Message-ID: <bug-64299-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64299

            Bug ID: 64299
           Summary: [SH] improve FPSCR.PR mode switching by reordering
                    insns
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

Compiling the following function with -O2 results in 4 FPSCR.PR mode switches:

double test_0 (const float* a, const float* b, const double* c, float x)
{
  float aa = a[0] * b[0];  // single
  double cc = c[0] + c[1];  // double

  aa += b[1] * b[2];  // single
  cc += c[2] + c[3];  // double
  aa += b[3] * b[4];  // single

  return aa / cc; // double
}

Since the single and double precision calculations are independent of each
other and the FPSCR flags are not read between the operations, the fp insns can
be reordered to reduce the number of mode switches.  The resulting code should
be the same as when doing the reordering manually:

double test_1 (const float* a, const float* b, const double* c, float x)
{
  float aa = a[0] * b[0] + b[1] * b[2] + b[3] * b[4];  // single
  double cc = (c[0] + c[1]) + (c[2] + c[3]);  // double, same grouping as test_0

  return aa / cc; // double
}

which results in only 2 FPSCR.PR mode switches.


Moreover, the following example

double test_2 (const float* x, const double* y, unsigned int c)
{
  float var0 = 0;
  double var1 = 0;

  while (c--)
  {
    float xx = x[0] * x[1] + x[2] + 123.0f;
    x += 3;

    double yy = y[0] + y[1];
    y += 2;

    var0 += xx;
    var1 += yy;
  }

  return var0 + var1;
}

is a good candidate for loop distribution.  Since var0 and var1 are
independent, the loop can be replaced with a single precision loop followed by
a double precision loop, eliminating the FPSCR.PR mode switches inside the loop.
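
For illustration, a hand-distributed version of test_2 might look roughly like
the sketch below.  The name test_2_distributed is hypothetical and only shows
the shape of code that loop distribution could aim for; it is not the output of
any existing pass.  Each accumulator is still updated in the same order as in
test_2, so the result should be unchanged, and presumably only one mode switch
between the two loops would remain.

```c
/* Hypothetical hand-distributed variant of test_2 (name is not from the
   report).  The per-variable operation order matches the original loop.  */
double test_2_distributed (const float* x, const double* y, unsigned int c)
{
  float var0 = 0;
  double var1 = 0;

  /* Single precision loop: only float operations inside.  */
  for (unsigned int i = 0; i < c; i++)
  {
    float xx = x[0] * x[1] + x[2] + 123.0f;
    x += 3;
    var0 += xx;
  }

  /* Double precision loop: only double operations inside.  */
  for (unsigned int i = 0; i < c; i++)
  {
    double yy = y[0] + y[1];
    y += 2;
    var1 += yy;
  }

  return var0 + var1;
}
```

The trade-off is that the loop counter is iterated twice, which is likely
cheaper than switching FPSCR.PR on every iteration, but that would need to be
measured.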
