[Bug tree-optimization/57830] New: fold_builtin_memory_op expands memcpy without regard to -Os

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/57830] New: fold_builtin_memory_op expands memcpy without regard to -Os
@ 2013-07-05 12:33 amylaar at gcc dot gnu.org
  2013-07-05 15:58 ` [Bug target/57830] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: amylaar at gcc dot gnu.org @ 2013-07-05 12:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57830

            Bug ID: 57830
           Summary: fold_builtin_memory_op expands memcpy without regard
                    to -Os
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amylaar at gcc dot gnu.org

I see that the memcpy call at the end of gcc.dg/strlenopt-10.c:fn2.c
is expanded for the avr target (which has "#define BIGGEST_ALIGNMENT 8",
 i.e. the "dest_align < TYPE_ALIGN (desttype)" test at builtins.c:8923
 succeeds) irrespective of -Os or the size of the copied object.
So this generates 20 loads, 20 stores, ancillary address arithmetic,
and sky-high register pressure with 18 call-saved registers saved in
the prologue and restored in the epilogue.
Just leaving the call to memcpy alone would generate shorter code.
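
For readers without the testsuite at hand, the pattern being described can be sketched roughly as follows. This is not the actual fn2 from gcc.dg/strlenopt-10.c; the function name record_lengths, the macro N_LENGTHS and the use of uint16_t (standing in for a 2-byte size_t, as on avr) are assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

#define N_LENGTHS 10  /* 10 elements x 2 bytes = 20 bytes copied */

/* Hypothetical stand-in for the reported pattern, not the real fn2. */
void record_lengths(const char *const strs[N_LENGTHS], uint16_t *out)
{
    uint16_t l[N_LENGTHS];  /* local 20-byte aggregate */

    for (int i = 0; i < N_LENGTHS; i++)
        l[i] = (uint16_t)strlen(strs[i]);

    /* The trailing copy that the report argues is cheaper as a plain
       library call when optimizing for size. */
    memcpy(out, l, sizeof l);
}
```

Whether a given compiler and target expand this particular memcpy inline depends on the version and flags, so the sketch only shows the shape of the source, not a guaranteed reproducer.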


* [Bug target/57830] fold_builtin_memory_op expands memcpy without regard to -Os
  2013-07-05 12:33 [Bug tree-optimization/57830] New: fold_builtin_memory_op expands memcpy without regard to -Os amylaar at gcc dot gnu.org
@ 2013-07-05 15:58 ` pinskia at gcc dot gnu.org
  2013-07-05 16:27 ` amylaar at gcc dot gnu.org
  2013-07-05 17:47 ` jakub at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2013-07-05 15:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57830

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Expand should have converted the assignment back to memcpy due to tuning.  If
expand is not converting it back to memcpy/memmove, then it is a bug either
in the AVR target or in expand.



* [Bug target/57830] fold_builtin_memory_op expands memcpy without regard to -Os
  2013-07-05 12:33 [Bug tree-optimization/57830] New: fold_builtin_memory_op expands memcpy without regard to -Os amylaar at gcc dot gnu.org
  2013-07-05 15:58 ` [Bug target/57830] " pinskia at gcc dot gnu.org
@ 2013-07-05 16:27 ` amylaar at gcc dot gnu.org
  2013-07-05 17:47 ` jakub at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: amylaar at gcc dot gnu.org @ 2013-07-05 16:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57830

--- Comment #2 from Jorn Wolfgang Rennecke <amylaar at gcc dot gnu.org> ---
Created attachment 30464
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30464&action=edit
strlenopt-10.c optimized dump file from -Os compilation

This is expanded not into a single assignment, but into multiple assignments:
  MEM[(char * {ref-all})lp_24(D)] = _3;
  MEM[(char * {ref-all})lp_24(D) + 2B] = _12;
  MEM[(char * {ref-all})lp_24(D) + 4B] = _14;
  MEM[(char * {ref-all})lp_24(D) + 6B] = _15;
  MEM[(char * {ref-all})lp_24(D) + 8B] = _17;
  MEM[(char * {ref-all})lp_24(D) + 10B] = _18;
  MEM[(char * {ref-all})lp_24(D) + 12B] = _19;
  MEM[(char * {ref-all})lp_24(D) + 14B] = _21;
  MEM[(char * {ref-all})lp_24(D) + 16B] = _22;
  MEM[(char * {ref-all})lp_24(D) + 18B] = _23;

So I can't see how expand could convert that back.
OTOH expand is able to expand memcpy to multiple assignments, under control
of the target, so if in doubt, it's better to leave it as memcpy till expand.
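
As a source-level analogy for the dump above (a sketch only; the dump is GIMPLE, and the names copy_block, copy_scalars and N_WORDS are made up here), the two forms below write the same 20 bytes. The second mirrors the ten 2-byte stores at byte offsets 0 through 18, and once the copy is in that shape there is no single block assignment left to turn back into a call.

```c
#include <stdint.h>
#include <string.h>

enum { N_WORDS = 10 };  /* ten 2-byte values, 20 bytes in total */

/* Form 1: one block copy of the whole 20-byte object. */
void copy_block(void *dst, const uint16_t src[N_WORDS])
{
    memcpy(dst, src, N_WORDS * sizeof(uint16_t));
}

/* Form 2: ten separate 2-byte stores at byte offsets 0, 2, ..., 18.
   Each small memcpy stands in for one scalar store in the dump. */
void copy_scalars(void *dst, const uint16_t src[N_WORDS])
{
    unsigned char *p = dst;

    memcpy(p + 0,  &src[0], 2);
    memcpy(p + 2,  &src[1], 2);
    memcpy(p + 4,  &src[2], 2);
    memcpy(p + 6,  &src[3], 2);
    memcpy(p + 8,  &src[4], 2);
    memcpy(p + 10, &src[5], 2);
    memcpy(p + 12, &src[6], 2);
    memcpy(p + 14, &src[7], 2);
    memcpy(p + 16, &src[8], 2);
    memcpy(p + 18, &src[9], 2);
}
```

On a target with few free registers, the second form needs every value live in a register before its store, which is where the register pressure mentioned in the report comes from.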



* [Bug target/57830] fold_builtin_memory_op expands memcpy without regard to -Os
  2013-07-05 12:33 [Bug tree-optimization/57830] New: fold_builtin_memory_op expands memcpy without regard to -Os amylaar at gcc dot gnu.org
  2013-07-05 15:58 ` [Bug target/57830] " pinskia at gcc dot gnu.org
  2013-07-05 16:27 ` amylaar at gcc dot gnu.org
@ 2013-07-05 17:47 ` jakub at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-05 17:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57830

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The folding only folds the memcpy into
MEM[(char * {ref-all})lp] = MEM[(char * {ref-all})&l];
which is certainly desirable, as it improves optimizations, and at any point
can be expanded back to memcpy if that is desirable.
It is early SRA that turns that
  l[1] = _12;
  _14 = strlen (q_8);
  l[2] = _14;
  _16 = strlen (r_10);
...
  MEM[(char * {ref-all})lp_32(D)] = MEM[(char * {ref-all})&l];
into the scalar stores and in the end actually increases the register pressure
when you don't have enough call-clobbered registers.  Note the testcase is
highly artificial, and predicting whether the SRA is a win or not is very
hard.
If you look at how it is optimized with the strlen pass actually run (-Os
-foptimize-strlen, for -O2 it is enabled by default) you'll see that it is
actually a win: instead of storing constants into memory and then memcpying
them afterwards, you store constants into memory at the end (with 4
exceptions I think).
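
A rough source-level picture of that win, assuming for illustration that the lengths are known at compile time (the functions lengths_via_temp and lengths_direct and the string literals are hypothetical, and the real testcase is more involved):

```c
#include <stdint.h>
#include <string.h>

/* Without the strlen pass: values are written to a local array,
   which is then block-copied to the destination. */
void lengths_via_temp(uint16_t *out)
{
    uint16_t l[4];

    l[0] = (uint16_t)strlen("a");
    l[1] = (uint16_t)strlen("bc");
    l[2] = (uint16_t)strlen("def");
    l[3] = (uint16_t)strlen("ghij");
    memcpy(out, l, sizeof l);
}

/* With the lengths resolved to constants and the aggregate scalarized:
   the constants are stored straight to the destination, and both the
   temporary and the copy disappear. */
void lengths_direct(uint16_t *out)
{
    out[0] = 1;
    out[1] = 2;
    out[2] = 3;
    out[3] = 4;
}
```

Storing a constant does not require keeping a computed value live across calls, which is why scalarization can pay off here even though it hurts in the dump from comment #2.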


