public inbox for gcc-bugs@sourceware.org
* [Bug other/58863] New: for loop not aligned at -O2 or -O3
@ 2013-10-24 17:14 ali.baharev at gmail dot com
From: ali.baharev at gmail dot com @ 2013-10-24 17:14 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

            Bug ID: 58863
           Summary: for loop not aligned at -O2 or -O3
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ali.baharev at gmail dot com

The for loop in work() is the hotspot:

const int LOOP_BOUND = 200000000;

__attribute__((noinline))
static int add(const int& x, const int& y) {
    return x + y;
}

__attribute__((noinline))
static int work(int xval, int yval) {
    int sum(0);
    for (int i=0; i<LOOP_BOUND; ++i) {
        int x(xval+sum);
        int y(yval+sum);
        int z = add(x, y);
        sum += z;
    }
    return sum;
}

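int main(int , char* argv[]) {
    // Requires two command-line arguments; the first character of each is
    // used as input, so the values are not known at compile time.
    int result = work(*argv[1], *argv[2]);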
    return result;
}


Running

g++ -O2 main.cpp && objdump -d a.out | c++filt

gives

  400598:       41 8d 34 1c             lea    (%r12,%rbx,1),%esi
  [...]
  4005ab:       75 eb                   jne    400598 <work(int, int)+0x18>

According to the documentation:

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-falign-loops   Enabled at levels -O2, -O3. 

Judging from the assembly code in other cases, gcc appears to align things to
the next 16-byte boundary by default on this machine.

If I pass -falign-loops=16 it becomes:

  4005a0:       41 8d 34 1c             lea    (%r12,%rbx,1),%esi
  [...]
  4005b3:       75 eb                   jne    4005a0 <work(int, int)+0x20>

I would expect it to look like this with plain -O2 as well; at least, that is
what the documentation suggests to me.



* [Bug other/58863] for loop not aligned at -O2 or -O3
From: pinskia at gcc dot gnu.org @ 2013-10-24 17:23 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We have:
    .p2align 4,,10
    .p2align 3

so we skip at most 10 bytes to reach a 16-byte boundary; if more padding would
be needed, the loop is still aligned to an 8-byte boundary.



* [Bug other/58863] for loop not aligned at -O2 or -O3
From: ali.baharev at gmail dot com @ 2013-10-24 17:31 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

--- Comment #2 from Ali Baharev <ali.baharev at gmail dot com> ---
Please check with objdump. It's not what I get in the executable.



* [Bug other/58863] for loop not aligned at -O2 or -O3
From: pinskia at gcc dot gnu.org @ 2013-10-24 17:33 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Ali Baharev from comment #2)
> Please check with objdump. It's not what I get in the executable.

Yes it is.  Read my comment again.  We align loops to 8 bytes by default but try
to align them to 16 bytes if that takes no more than 10 bytes of padding.



* [Bug other/58863] for loop not aligned at -O2 or -O3
From: ali.baharev at gmail dot com @ 2013-10-24 17:37 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

--- Comment #4 from Ali Baharev <ali.baharev at gmail dot com> ---
My mistake, sorry. 

So you are saying that the default alignment for loops is 8 bytes?

The funny thing is, this code runs 15% faster if any of the following are
passed:

 -Os
 -O2 -fno-align-loops -fno-align-functions
 -O2 -fno-omit-frame-pointer

At least on my machine and in this case, 16-byte alignment (or alignment to any
multiple of 16 bytes) is better. -march=native has no effect on the performance.



* [Bug other/58863] for loop not aligned at -O2 or -O3
From: ali.baharev at gmail dot com @ 2013-10-24 17:39 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

--- Comment #5 from Ali Baharev <ali.baharev at gmail dot com> ---
OK, so 8-byte alignment is the default for loops. If you think it is not a bug,
then let's close this. Sorry for the false alarm.



* [Bug other/58863] for loop not aligned at -O2 or -O3
From: rguenth at gcc dot gnu.org @ 2013-10-25 10:19 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
It works as designed.


