public inbox for gcc-patches@gcc.gnu.org
From: Alan Modra <amodra@gmail.com>
To: David Edelsohn <dje.gcc@gmail.com>,
	Michael Meissner <meissner@linux.vnet.ibm.com>
Cc: Steven Bosscher <stevenb.gcc@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	Vladimir Makarov <vmakarov@redhat.com>
Subject: Re: LRA vs reload on powerpc: 2 extra FAILs that are actually improvements?
Date: Sun, 01 Dec 2013 05:55:00 -0000
Message-ID: <20131201055522.GP9211@bubble.grove.modra.org>
In-Reply-To: <CAGWvnykhYc3hNMaZ9upde11ZXdoFfn3-AjA+NfF-SOoiThBP8A@mail.gmail.com>

> On Sat, Nov 2, 2013 at 6:48 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> > The failure of pr53199.c is because of different instruction selection
> > for bswap. Test case is reduced to just one function:
[snip]
> > Is this an improvement or a regression? If it's an improvement then
> > these two test cases should be adjusted :-)

As David said, going through memory is bad: we get a load-hit-store
flush.  Definitely a regression on power7.  Does anyone know why the
bswapdi2_64bit r,r alternative is disparaged?  It seems to have been
that way since the original mainline commit.

int main (void)
{
  int i;
  long ret = 0;
  long tmp1, tmp2, tmp3;

  for (i = 0; i < 1000000000; i++)
#if MEM == 1
    /* From pr53199.c reg_reverse, -mlra -mcpu=power6 -mtune=power7.  */
    __asm__ __volatile__ ("\
	addi %1,1,-16\n\
	srdi %3,%0,32\n\
	li %2,4\n\
	stwbrx %0,0,%1\n\
	stwbrx %3,%2,%1\n\
	ld %0,-16(1)" : "+r" (ret), "=&b" (tmp1), "=&r" (tmp2), "=&r" (tmp3));
#elif MEM == 2
    /* From pr53199.c reg_reverse, -mlra -mcpu=power6.  */
    __asm__ __volatile__ ("\
	addi %1,1,-16\n\
	srdi %3,%0,32\n\
	addi %2,%1,4\n\
	stwbrx %0,0,%1\n\
	stwbrx %3,0,%2\n\
	ld %0,-16(1)" : "+r" (ret), "=&b" (tmp1), "=&b" (tmp2), "=&r" (tmp3));
#elif MEM == 3
    /* From pr53199.c reg_reverse, -mlra -mcpu=power7.  */
    __asm__ __volatile__ ("\
	std %0,-16(1)\n\
	addi %1,1,-16\n\
	ldbrx %0,0,%1\n" : "+r" (ret), "=&b" (tmp1));
#else
    /* Register-only version: no round trip through memory.  */
    __asm__ __volatile__ ("\
	srdi %1,%0,32\n\
	rlwinm %2,%0,8,0xffffffff\n\
	rlwinm %3,%1,8,0xffffffff\n\
	rlwimi %2,%0,24,0,7\n\
	rlwimi %2,%0,24,16,23\n\
	rlwimi %3,%1,24,0,7\n\
	rlwimi %3,%1,24,16,23\n\
	sldi %2,%2,32\n\
	or %2,%2,%3\n\
	mr %0,%2" : "+r" (ret), "=&r" (tmp1), "=&r" (tmp2), "=&r" (tmp3));
#endif
  return ret;
}

/*
amodra@bns:~> gcc -O2 bswap_mem.c 
amodra@bns:~> time ./a.out 

real	0m3.096s
user	0m3.089s
sys	0m0.001s
amodra@bns:~> time ./a.out 

real	0m3.096s
user	0m3.094s
sys	0m0.002s
amodra@bns:~> gcc -O2 -DMEM=1 bswap_mem.c 
amodra@bns:~> time ./a.out 

real	0m12.661s
user	0m12.657s
sys	0m0.003s
amodra@bns:~> time ./a.out 

real	0m12.660s
user	0m12.657s
sys	0m0.003s
amodra@bns:~> gcc -O2 -DMEM=2 bswap_mem.c 
amodra@bns:~> time ./a.out 

real	0m12.660s
user	0m12.657s
sys	0m0.003s
amodra@bns:~> time ./a.out 

real	0m12.660s
user	0m12.657s
sys	0m0.004s
amodra@bns:~> gcc -O2 -DMEM=3 bswap_mem.c 
amodra@bns:~> time ./a.out 

real	0m10.279s
user	0m10.276s
sys	0m0.003s
amodra@bns:~> time ./a.out 

real	0m10.279s
user	0m10.276s
sys	0m0.003s

I also looked at the register version and the -DMEM=1 case with power7
simulators, and found that the register version had a delay of 12 cycles
from completion of the first instruction to completion of the last.
The -DMEM=1 case had a corresponding delay of 49 cycles, which matches
the loop timing above quite well.
*/

-- 
Alan Modra
Australia Development Lab, IBM

Thread overview: 3+ messages
2013-11-02 22:49 Steven Bosscher
2013-11-04 14:16 ` David Edelsohn
2013-12-01  5:55   ` Alan Modra [this message]