public inbox for gcc-bugs@sourceware.org
* [Bug c/65413] New: inefficient code returning aggregates on powepc64le
@ 2015-03-12 22:36 msebor at gcc dot gnu.org
  2015-03-12 23:49 ` [Bug target/65413] " msebor at gcc dot gnu.org
  2015-03-14 20:19 ` [Bug target/65413] inefficient code returning aggregates on powerpc64le segher at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: msebor at gcc dot gnu.org @ 2015-03-12 22:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65413

            Bug ID: 65413
           Summary: inefficient code returning aggregates on powepc64le
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: msebor at gcc dot gnu.org

When returning aggregates that don't fit into a single register on powerpc64le,
gcc emits the assembly below, which ultimately leaves the r3 and r4 registers
holding the same values they started out with.  All the shift and rotate
instructions are unnecessary (in fact, foo could be as simple as a single blr).


$ cat ~/tmp/x.c && gcc -O2 -S -Wall -o/dev/tty ~/tmp/x.c
typedef struct { int a[3]; } A;

A __attribute__ ((noinline)) foo (A a) {
    return a;
}

A bar (A a) {
    return foo (a);
}
    .file    "x.c"
    .machine power8
    .abiversion 2
    .section    ".toc","aw"
    .section    ".text"
    .align 2
    .p2align 4,,15
    .globl foo
    .type    foo, @function
foo:
    mr 9,3
    li 3,0
    rldicl 10,9,0,32
    srdi 9,9,32
    rldimi 3,10,0,32
    rldicl 4,4,0,32
    rldimi 3,9,32,0
    blr
    .long 0
    .byte 0,0,0,0,0,0,0,0
    .size    foo,.-foo
    .align 2
    .p2align 4,,15
    .globl bar
    .type    bar, @function
bar:
0:    addis 2,12,.TOC.-0b@ha
    addi 2,2,.TOC.-0b@l
    .localentry    bar,.-bar
    mflr 0
    std 0,16(1)
    stdu 1,-64(1)
    bl foo
    addi 1,1,64
    ld 0,16(1)
    mr 9,3
    li 3,0
    rldicl 10,9,0,32
    srdi 9,9,32
    rldimi 3,10,0,32
    rldicl 4,4,0,32
    mtlr 0
    rldimi 3,9,32,0
    blr
    .long 0
    .byte 0,0,0,1,128,0,0,0
    .size    bar,.-bar
    .ident    "GCC: (GNU) 5.0.0 20150303 (experimental)"
    .section    .note.GNU-stack,"",@progbits
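
To make the redundancy concrete, the body of foo can be replayed in plain C. This is only a sketch: the helpers rotl64, mask64, rldicl, rldimi and model_foo are hypothetical names introduced here, and they model the rotate-and-mask instructions as described in the Power ISA (IBM bit numbering, bit 0 is the most significant bit), not anything taken from GCC itself.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (0 <= n < 64). */
static uint64_t rotl64(uint64_t x, unsigned n) {
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Ones from IBM bit mb through IBM bit me (bit 0 is the MSB). */
static uint64_t mask64(unsigned mb, unsigned me) {
    assert(mb <= me && me <= 63);
    uint64_t from_mb = ~UINT64_C(0) >> mb;         /* IBM bits mb..63 */
    uint64_t to_me = ~UINT64_C(0) << (63 - me);    /* IBM bits 0..me  */
    return from_mb & to_me;
}

/* rldicl RA,RS,SH,MB: rotate left by SH, then clear the bits left of MB. */
static uint64_t rldicl(uint64_t rs, unsigned sh, unsigned mb) {
    return rotl64(rs, sh) & mask64(mb, 63);
}

/* rldimi RA,RS,SH,MB: rotate left by SH, then insert bits MB..63-SH into RA. */
static uint64_t rldimi(uint64_t ra, uint64_t rs, unsigned sh, unsigned mb) {
    uint64_t m = mask64(mb, 63 - sh);
    return (rotl64(rs, sh) & m) | (ra & ~m);
}

/* Replay the instructions of foo on the incoming r3/r4 values. */
static void model_foo(uint64_t *r3, uint64_t *r4) {
    uint64_t r9 = *r3;                 /* mr 9,3           */
    uint64_t r10;
    *r3 = 0;                           /* li 3,0           */
    r10 = rldicl(r9, 0, 32);           /* rldicl 10,9,0,32 */
    r9 >>= 32;                         /* srdi 9,9,32      */
    *r3 = rldimi(*r3, r10, 0, 32);     /* rldimi 3,10,0,32 */
    *r4 = rldicl(*r4, 0, 32);          /* rldicl 4,4,0,32  */
    *r3 = rldimi(*r3, r9, 32, 0);      /* rldimi 3,9,32,0  */
}
```

Under this model r3 comes out bit-for-bit identical to its input, and r4 keeps its low 32 bits (the third int) while the unused upper half is cleared.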



* [Bug target/65413] inefficient code returning aggregates on powepc64le
From: msebor at gcc dot gnu.org @ 2015-03-12 23:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65413

--- Comment #1 from Martin Sebor <msebor at gcc dot gnu.org> ---
Actually, similarly inefficient code is generated even for aggregates that do
fit into a register.  The trigger appears to be that the aggregate does not
take up a whole multiple of the register size.  For example, returning a
"struct { char a[7]; }" results in the code below.  Another example is
"struct { short a[3]; }".

foo:
    mr 10,3
    rlwinm 8,3,0,0xff
    li 9,0
    rldicl 7,10,48,56
    rldimi 9,8,0,56
    rldicl 8,10,56,56
    rldimi 9,8,8,48
    rldicl 10,10,40,56
    srdi 8,3,32
    rldimi 9,7,16,40
    rldimi 9,10,24,32
    rlwinm 10,8,0,0xff
    rldimi 9,10,32,24
    rldicl 8,8,56,56
    rldicl 3,3,16,56
    rldimi 9,8,40,16
    rldimi 9,3,48,8
    mr 3,9
    blr
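
The suspected trigger can be written down as a size check, and the byte shuffling in the listing as a loop. This is a sketch under assumptions: an 8-byte general-purpose register, 4-byte int and 2-byte short, and the hypothetical helper names partially_fills_register and reassemble_bytes, none of which come from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int a[3]; } A;      /* 12 bytes where int is 4 bytes  */
typedef struct { char a[7]; } B;     /* 7 bytes                        */
typedef struct { short a[3]; } C;    /* 6 bytes where short is 2 bytes */

enum { REG_BYTES = 8 };  /* assumed size of a 64-bit general-purpose register */

/* Nonzero when an object of this size leaves part of its last register unused. */
static int partially_fills_register(size_t size) {
    return size % REG_BYTES != 0;
}

/* Rebuild the low nbytes bytes of reg one byte at a time, starting from zero. */
static uint64_t reassemble_bytes(uint64_t reg, unsigned nbytes) {
    uint64_t out = 0;
    assert(nbytes <= REG_BYTES);
    for (unsigned i = 0; i < nbytes; i++) {
        uint64_t byte = (reg >> (8 * i)) & 0xff;  /* extract byte i            */
        out |= byte << (8 * i);                   /* insert it where it was    */
    }
    return out;
}
```

All three structs from this report satisfy partially_fills_register, while a 16-byte aggregate would not. With nbytes set to 7, reassemble_bytes returns the input with only its top byte cleared, which is the net effect of the extract-and-insert pairs in the listing.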



* [Bug target/65413] inefficient code returning aggregates on powerpc64le
From: segher at gcc dot gnu.org @ 2015-03-14 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65413

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-03-14
                 CC|                            |segher at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |segher at gcc dot gnu.org
   Target Milestone|---                         |6.0
            Summary|inefficient code returning  |inefficient code returning
                   |aggregates on powepc64le    |aggregates on powerpc64le
     Ever confirmed|0                           |1

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This is first expanded to a copy of the arg to a struct on stack; then
CSE etc. keep everything in regs, and DSE removes the stores completely.

We are left with copying 0 to a reg, and then copying parts of the arg
to a zero_extract of that, which combine will not handle (see
combinable_i3pat, the "inner_dest != dest" branch).

This will get better when we no longer write rl*imi as a zero_extract;
hopefully that will even solve it completely.  We'll see.  Mine.
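
As a rough C-level analogy of the expansion described above (not GCC's actual RTL), the argument registers are spilled to a struct on the stack and the return registers are then rebuilt from zero out of that struct's bytes. The function name round_trip and the two-element arrays standing in for r3 and r4 are assumptions made for illustration; the sketch also assumes a 4-byte int.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int a[3]; } A;

/* regs_in[0] and regs_in[1] stand in for r3 and r4 on entry;
   regs_out[0] and regs_out[1] stand in for r3 and r4 on return. */
static void round_trip(const uint64_t regs_in[2], uint64_t regs_out[2]) {
    A on_stack;
    assert(sizeof on_stack <= 2 * sizeof regs_in[0]);
    memcpy(&on_stack, regs_in, sizeof on_stack);    /* copy the arg to a stack struct   */
    regs_out[0] = 0;                                /* start from zeroed registers ...  */
    regs_out[1] = 0;
    memcpy(regs_out, &on_stack, sizeof on_stack);   /* ... then copy the pieces back in */
}
```

Every byte that belongs to the struct ends up where it started, so the whole round trip is equivalent to leaving the registers alone. Once the stores are gone, what remains is the zero-then-insert sequence that combine does not simplify.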

