public inbox for gcc-bugs@sourceware.org
* [Bug c/48006] New: Inefficient optimization depends on builtin integer type of same size.
@ 2011-03-06 14:40 carlo at gcc dot gnu.org
  2011-03-07 10:23 ` [Bug c/48006] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: carlo at gcc dot gnu.org @ 2011-03-06 14:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48006

           Summary: Inefficient optimization depends on builtin integer
                    type of same size.
           Product: gcc
           Version: 4.4.5
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: carlo@gcc.gnu.org


Created attachment 23561
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23561
The test file used.

Working on M4RI, I found that changing a typedef from unsigned long long to
unsigned long caused one of the benchmarks to become 50% slower. This is
peculiar since I'm on a 64-bit box where both types are 8 bytes in size.

After investigation I ended up with the following function, which accounts
for at least a 25% slowdown, so it is a good case for investigating this
(compiler) bug (assuming you're willing to call suboptimally compiled code a bug).

===============Start of File=====================
#define RADIX 64
typedef unsigned long word;
typedef unsigned long size_t;

typedef struct _mm_block {
  size_t size;
  void *data;
} mmb_t;

typedef struct {
  mmb_t *blocks;
  size_t nrows;
  size_t ncols;
  size_t width;
  size_t offset;
  word** rows;
} mzd_t;

typedef unsigned char BIT;
#define ONE ((word)1)
#define GET_BIT(w, spot) (((w) >> (RADIX - 1 - (spot))) & ONE)

static inline BIT mzd_read_bit(const mzd_t *M, const size_t row, const size_t col) {
  return GET_BIT(M->rows[row][(col+M->offset)/RADIX], (col+M->offset) % RADIX);
}

void foo(mzd_t* DST, mzd_t const* A, int i, int eol)
{
#ifdef OLDCODE
    unsigned long long* temp = (unsigned long long*)DST->rows[i];
    for (int j = 0; j < eol; j += RADIX, ++temp)
      for (int k = RADIX - 1; k >= 0; --k)
        *temp |= ((unsigned long long)mzd_read_bit(A, j+k, i+A->offset))<<(RADIX-1-k);
#else
    word* temp = DST->rows[i];
    for (int j = 0; j < eol; j += RADIX, ++temp)
      for (int k = RADIX - 1; k >= 0; --k)
        *temp |= ((word)mzd_read_bit(A, j+k, i+A->offset))<<(RADIX-1-k);
#endif
}
===================END OF FILE====================================

Compile this on an x86_64 machine with:

gcc -std=gnu99 -O2 -c transposebody.c -fPIC -DPIC -o transposebody.o -DOLDCODE -save-temps

Compiling once with and once without -DOLDCODE shows a remarkable difference in
the resulting assembly code: the build without OLDCODE uses more registers and
many more instructions.

Note that the only difference is that with OLDCODE defined we cast the unsigned
char returned from mzd_read_bit to an unsigned long long instead of to an
unsigned long, and the type of temp is unsigned long long* instead of unsigned
long*.

$ uname -a
Linux hikaru 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64
GNU/Linux



* [Bug c/48006] Inefficient optimization depends on builtin integer type of same size.
  2011-03-06 14:40 [Bug c/48006] New: Inefficient optimization depends on builtin integer type of same size carlo at gcc dot gnu.org
@ 2011-03-07 10:23 ` rguenth at gcc dot gnu.org
  2011-04-12 15:48 ` carlo at gcc dot gnu.org
  2011-04-12 15:50 ` carlo at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-03-07 10:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48006

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-07 10:23:11 UTC ---
I think what you are seeing is type-based alias analysis at work.  When using
unsigned long long, the accesses to *rows cannot alias the nrows, ncols, ...
members (they use size_t, which is unsigned long).  But as soon as you change
the typedef to unsigned long (which is the same type as size_t), the compiler
has to assume that all stores to *rows may alias the size_t members, and so
their loads are not hoisted out of the loop.

You should probably try restrict-qualifying some of the pointers
to tell the compiler that *rows does not point to nrows.
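
The effect described above can be reproduced in a much smaller sketch. The
struct hdr_t and both function names below are hypothetical and not taken from
the attachment; the only difference between the two functions is the type
stored through dst. With strict aliasing enabled (GCC turns on
-fstrict-aliasing at -O2), the compiler may load hdr->offset once in the
unsigned long long variant, whereas in the unsigned long variant it has to
allow for a store to dst[j] changing hdr->offset.

```c
/* Hypothetical reduction: 'offset' plays the role of the size_t members. */
typedef struct {
  unsigned long offset;
} hdr_t;

/* Stores have type unsigned long, the same type as hdr->offset, so each
   store may alias the member and the load cannot be assumed invariant. */
void or_offset_ul(unsigned long *dst, const hdr_t *hdr, int n)
{
  for (int j = 0; j < n; ++j)
    dst[j] |= hdr->offset;
}

/* Stores have type unsigned long long, a distinct type even when it has
   the same size, so type-based alias analysis may treat hdr->offset as
   loop-invariant. */
void or_offset_ull(unsigned long long *dst, const hdr_t *hdr, int n)
{
  for (int j = 0; j < n; ++j)
    dst[j] |= hdr->offset;
}
```

Both functions compute the same values; comparing their assembly output at -O2
is one way to check whether the load is hoisted on a given compiler version.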



* [Bug c/48006] Inefficient optimization depends on builtin integer type of same size.
  2011-03-06 14:40 [Bug c/48006] New: Inefficient optimization depends on builtin integer type of same size carlo at gcc dot gnu.org
  2011-03-07 10:23 ` [Bug c/48006] " rguenth at gcc dot gnu.org
@ 2011-04-12 15:48 ` carlo at gcc dot gnu.org
  2011-04-12 15:50 ` carlo at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: carlo at gcc dot gnu.org @ 2011-04-12 15:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48006

Carlo Wood <carlo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |
   Target Milestone|---                         |4.6.1

--- Comment #2 from Carlo Wood <carlo at gcc dot gnu.org> 2011-04-12 15:48:07 UTC ---
I couldn't get it to work with restrict, but after changing all size_t into
int, the difference indeed disappears, so I guess you're right and this is
caused by strict aliasing.
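
A workaround consistent with this diagnosis, though not one tried in this
report, is to copy the values derived from A->offset into locals before the
loop, so that the stores through temp can no longer affect them. The sketch
below is a hypothetical reduction: mat_t keeps only the fields foo reads,
index_t stands in for the size_t typedef of the test file, and foo_hoisted is
an invented name. It assumes an LP64 target (64-bit unsigned long) and that the
rows of DST never overlap the header of A.

```c
#define RADIX 64
typedef unsigned long word;
typedef unsigned long index_t;  /* stand-in for the test file's size_t */

/* Hypothetical reduction of mzd_t to the fields used by foo. */
typedef struct {
  index_t offset;
  word **rows;
} mat_t;

void foo_hoisted(mat_t *DST, mat_t const *A, int i, int eol)
{
  /* mzd_read_bit(A, j+k, i+A->offset) reads column (i + offset) + offset.
     Computing the word index and shift once assumes the stores below
     never modify A->offset. */
  const index_t col = (index_t)i + 2 * A->offset;
  const index_t block = col / RADIX;
  const unsigned shift = RADIX - 1 - (unsigned)(col % RADIX);

  word *temp = DST->rows[i];
  for (int j = 0; j < eol; j += RADIX, ++temp)
    for (int k = RADIX - 1; k >= 0; --k) {
      /* Same bit extraction as GET_BIT in the test file. */
      const word bit = (A->rows[j + k][block] >> shift) & (word)1;
      *temp |= bit << (RADIX - 1 - k);
    }
}
```

If DST and A could share storage in the way the compiler has to allow for, this
version would behave differently from the original, so it is only a fit where
that overlap is known not to happen.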



* [Bug c/48006] Inefficient optimization depends on builtin integer type of same size.
  2011-03-06 14:40 [Bug c/48006] New: Inefficient optimization depends on builtin integer type of same size carlo at gcc dot gnu.org
  2011-03-07 10:23 ` [Bug c/48006] " rguenth at gcc dot gnu.org
  2011-04-12 15:48 ` carlo at gcc dot gnu.org
@ 2011-04-12 15:50 ` carlo at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: carlo at gcc dot gnu.org @ 2011-04-12 15:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48006

Carlo Wood <carlo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #3 from Carlo Wood <carlo at gcc dot gnu.org> 2011-04-12 15:50:31 UTC ---
My last remark accidentally reopened it. Closing again.


