public inbox for gcc-bugs@sourceware.org
* [Bug target/58879] New: PPC: Missed opportunity to use lwbrx
@ 2013-10-25 17:21 marcus at mc dot pp.se
From: marcus at mc dot pp.se @ 2013-10-25 17:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58879

            Bug ID: 58879
           Summary: PPC: Missed opportunity to use lwbrx
           Product: gcc
           Version: 4.7.3
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marcus at mc dot pp.se

Hi.

Please consider the following function, compiled on PPC (32 bit):

#include <stdint.h>

uint32_t swap32(uint32_t *in)
{
#if 1
  uint8_t a[] = {
    ((*in) & (uint32_t)0xff000000UL)>>24,
    ((*in) & (uint32_t)0x00ff0000UL)>>16,
    ((*in) & (uint32_t)0x0000ff00UL)>>8,
    (*in) & (uint32_t)0x000000ffUL,
  };
#else
  const uint8_t *a = (uint8_t *)in;
#endif
  uint32_t r =
    (a[0]) |
    (a[1] << 8) |
    (a[2] << 16) |
    (a[3] << 24);

  return r;
}

With the code in the #if branch, this results in a single lwbrx instruction. 
However, with the code in the #else branch it does not (getting lbz + slwi + or
instead).
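
For reference, here is a minimal sketch of what the #else variant computes, written as a standalone helper. The name load_le32 is made up for illustration and is not part of the original test case. The casts are an addition as well: without them each byte is promoted to int before the shift, so a top byte of 0x80 or above would be shifted into the sign bit.

```cpp
#include <cstdint>

// Hypothetical helper (name not from the report): assemble a 32-bit value
// from four bytes stored least-significant byte first. The result does not
// depend on the byte order of the host.
inline std::uint32_t load_le32(const std::uint8_t *p)
{
    // Cast before shifting so the arithmetic is done in uint32_t rather
    // than in the promoted (signed) int type.
    return  static_cast<std::uint32_t>(p[0])        |
           (static_cast<std::uint32_t>(p[1]) << 8)  |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

On a big-endian target this is a byte-reversed word load, which is the operation lwbrx performs; on a little-endian target it is a plain word load.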

Why the uint8_t pointer?  Well, my real code is a C++ template containing the
following:

  uint8_t data[nBytes];
  T getValue() const {
    T v = 0;
    int i;
    for (i=0; i<nBytes; i++)
      v |= data[i]<<(i*8);
    return v;
  }

If nBytes happens to be 4 in a particular instantiation of the template, then
this collapses beautifully into a single movl instruction on AMD64.  So I think
I'm not being totally unreasonable in hoping for a lwbrx on PPC (or lwz, if
-mlittle is in effect), provided strict alignment is not required of course.
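
To make the fragment above easier to try out, here is a self-contained sketch of such a template. Only data, nBytes, T and getValue() come from the report; the struct name PackedValue and the template parameter list are assumptions. It also assumes T is an unsigned integer type at least nBytes wide, and it casts each byte to T before shifting so that instantiations wider than int do not shift a promoted int by 32 bits or more.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper; the struct name and parameter list are illustrative.
// Assumes T is an unsigned integer type with sizeof(T) >= nBytes.
template <typename T, std::size_t nBytes>
struct PackedValue {
    std::uint8_t data[nBytes];   // value stored least-significant byte first

    T getValue() const {
        T v = 0;
        for (std::size_t i = 0; i < nBytes; i++) {
            // Widen the byte to T first so the shift is not limited to int.
            v |= static_cast<T>(static_cast<T>(data[i]) << (i * 8));
        }
        return v;
    }
};
```

With T = uint32_t and nBytes = 4, getValue() performs the same byte reassembly as the #else branch of swap32.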

I don't know how difficult it would be to make this work, but given that byte
array reassembly -> word load already works on AMD64, and reverse order
reassembly already can give a lwbrx at least _sometimes_ on PPC, it seems like
it would be feasible at least.  And it would be a neat trick to get efficient
code from portable source, without a lot of #ifdefs and __builtin_whatevers. 
:-)
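
For comparison, the non-portable route alluded to above might look roughly like the sketch below. The helper name load_le32_builtin is made up; __builtin_bswap32 and the __BYTE_ORDER__ / __ORDER_BIG_ENDIAN__ macros are GCC extensions, so this assumes GCC or a compatible compiler.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (name not from the report): read four bytes stored
// least-significant byte first, using a native load plus a GCC builtin.
inline std::uint32_t load_le32_builtin(const std::uint8_t *p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);   // native-order load, no alignment assumption
    // Assumption: the compiler predefines __BYTE_ORDER__ (GCC-style macros).
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);       // reverse the bytes on big-endian hosts
#endif
    return v;
}
```

This returns the same value as the shift-and-or loop, but at the cost of the conditional compilation that the portable version avoids.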

Thanks for listening

  // Marcus

