[Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic
@ 2011-05-17 17:32 piotr.wyderski at gmail dot com
  2011-05-17 17:40 ` [Bug rtl-optimization/49028] " piotr.wyderski at gmail dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: piotr.wyderski at gmail dot com @ 2011-05-17 17:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49028

           Summary: Missed optimization of pointer arithmetic
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: piotr.wyderski@gmail.com

The following examples come from x64, but I believe the problem
shows on other architectures too. 

I have implemented an object recycler based on a circular buffer of pointers
with a single cursor. N is the capacity of the buffer; powers of two are highly
preferred, so let we assume N = 16.

template <unsigned int N> struct R {

    void*  m_Data[N];
    void** m_Cursor;

    void xxx_release(void* p) __attribute__((__noinline__));
};

template <unsigned int N> void R<N>::xxx_release(void* p) {

    m_Cursor = m_Data + ((m_Cursor + 1 - m_Data) % N);
    *m_Cursor = p;
}

int main(int argc, char *argv[]) {      

    R<16> rrr;
    rrr.xxx_release(0);
    return 0;
}

This generates (-O3 -msse -msse2 -msse4.2 -mfpmath=sse -march=core2
-mtune=core2):

000000000041a910 <_ZN1RILj16EE11xxx_releaseEPv>:
  41a910:    48 8b 97 80 00 00 00     mov    0x80(%rdi),%rdx
  41a917:    48 83 c2 08              add    $0x8,%rdx
  41a91b:    48 29 fa                 sub    %rdi,%rdx
  41a91e:    48 89 d0                 mov    %rdx,%rax
  41a921:    48 c1 fa 3f              sar    $0x3f,%rdx
  41a925:    48 c1 ea 3c              shr    $0x3c,%rdx
  41a929:    48 c1 f8 03              sar    $0x3,%rax
  41a92d:    48 01 d0                 add    %rdx,%rax
  41a930:    83 e0 0f                 and    $0xf,%eax
  41a933:    48 29 d0                 sub    %rdx,%rax
  41a936:    48 8d 14 c7              lea    (%rdi,%rax,8),%rdx
  41a93a:    48 89 34 c7              mov    %rsi,(%rdi,%rax,8)
  41a93e:    48 89 97 80 00 00 00     mov    %rdx,0x80(%rdi)
  41a945:    c3                       retq   

The sequence is far from being optimal. The compiler should not move pointer
arithmetic to the type-independent integer domain (i.e. were (p + 1 - p) == 1),
but notice that the actual increment, whis is M = sizeof(void*), is a power of
2, in this case 8, so the final modulo result in integer domain for (p + 1) % N
will be the same as (((char*) p) + M) % (N * M). To cut a long story short:
instead of the above it should just generate:

000000000041a910 <_ZN1RILj16EE11xxx_releaseEPv>:
    mov    0x80(%rdi),%rdx
    add    $0x08, %rdx
    sub    %rdi, %rdx
    and    $0x7f, %rdx
    add    %rdi, %rdx
    mov    %rsi, (%rdx)
    mov    %rdx, 0x80(%rdi)
    retq

I'm not sure I understand what is GCC trying to achieve with the shifts.
Secondly, my example above uses only simple addressing modes, but GCC
uses the most complex mode twice: in lea and in the subsequent mov.
Complex lea has 3 cycles of latency on SandyBridge, according to the
Intel Optimization Manual, p. 3.5.1.3, so it should best be avoided.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug rtl-optimization/49028] Missed optimization of pointer arithmetic
  2011-05-17 17:32 [Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic piotr.wyderski at gmail dot com
@ 2011-05-17 17:40 ` piotr.wyderski at gmail dot com
  2011-05-18 10:18 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: piotr.wyderski at gmail dot com @ 2011-05-17 17:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49028

--- Comment #1 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-17 17:24:03 UTC ---
If I change the function to:

template <unsigned int N> void R<N>::xxx_release(void* p) {

    char* q = reinterpret_cast<char*>(m_Cursor);
       char* b = reinterpret_cast<char*>(m_Data);

       q = ((q + sizeof(void*)) - b) % (N * sizeof(void*)) + b;
    m_Cursor = reinterpret_cast<void**>(q);
    *m_Cursor = p;
}

Then the generated code is:

000000000041a910 <_ZN1RILj16EE11xxx_releaseEPv>:
  41a910:    48 8b 87 80 00 00 00     mov    0x80(%rdi),%rax
  41a917:    48 83 c0 08              add    $0x8,%rax
  41a91b:    48 29 f8                 sub    %rdi,%rax
  41a91e:    83 e0 7f                 and    $0x7f,%eax
  41a921:    48 01 f8                 add    %rdi,%rax
  41a924:    48 89 87 80 00 00 00     mov    %rax,0x80(%rdi)
  41a92b:    48 89 30                 mov    %rsi,(%rax)
  41a92e:    c3                       retq   

which is astonishingly close to my hand-made assembly...


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug rtl-optimization/49028] Missed optimization of pointer arithmetic
  2011-05-17 17:32 [Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic piotr.wyderski at gmail dot com
  2011-05-17 17:40 ` [Bug rtl-optimization/49028] " piotr.wyderski at gmail dot com
@ 2011-05-18 10:18 ` rguenth at gcc dot gnu.org
  2011-05-18 13:15 ` piotr.wyderski at gmail dot com
  2021-08-14 20:53 ` [Bug tree-optimization/49028] " pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-05-18 10:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49028

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.05.18 09:40:28
     Ever Confirmed|0                           |1

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-18 09:40:28 UTC ---
I think we are confused by the division by the element size which is of course
what the language semantic requires.  Thus, we start with

(void) ((this)->m_Cursor = (void * *) &(this)->m_Data + (long unsigned int)
((long unsigned int) ((((long int) ((this)->m_Cursor + 8) - (long int)
&(this)->m_Data) /[ex] 8) % 16) * 8))

and because of the modulus we can't fold it.  In the IL we end up with

<bb 2>:
  D.2101_3 = this_1(D)->m_Cursor;
  D.2102_4 = D.2101_3 + 8;
  D.2103_5 = (long int) D.2102_4;
  D.2100_6 = &this_1(D)->m_Data;
  D.2104_7 = (long int) D.2100_6;
  D.2105_8 = D.2103_5 - D.2104_7;
  D.2106_9 = D.2105_8 /[ex] 8;
  D.2107_10 = D.2106_9 % 16;
  D.2108_11 = (long unsigned int) D.2107_10;
  D.2110_13 = &this_1(D)->m_Data[D.2108_11]{lb: 0 sz: 8};
  this_1(D)->m_Cursor = D.2110_13;
  MEM[(void * *)this_1(D)].m_Data[D.2108_11]{lb: 0 sz: 8} = p_15(D);
  return;

where it requires quite some elaboration to combine the D.2108_11 index
computation with the address generation via the array-reference.

Maybe we can do some generic clever tricks to (A /[ex] CST1) % CST2?
We'd like to re-associate it somehow.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug rtl-optimization/49028] Missed optimization of pointer arithmetic
  2011-05-17 17:32 [Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic piotr.wyderski at gmail dot com
  2011-05-17 17:40 ` [Bug rtl-optimization/49028] " piotr.wyderski at gmail dot com
  2011-05-18 10:18 ` rguenth at gcc dot gnu.org
@ 2011-05-18 13:15 ` piotr.wyderski at gmail dot com
  2021-08-14 20:53 ` [Bug tree-optimization/49028] " pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: piotr.wyderski at gmail dot com @ 2011-05-18 13:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49028

--- Comment #3 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-18 12:02:58 UTC ---
(In reply to comment #2)

> Maybe we can do some generic clever tricks to (A /[ex] CST1) % CST2?
> We'd like to re-associate it somehow.

Wouldn't it be possible to do add the following subtree replacement rule?

           *            %
          / \          / \
         %   B        A   *
        / \      =>      / \
       / \ C            B   C
      A   B

It works in contexts where it can be proven that for_all a.(a % b) == 0,
which by definition includes b-aligned pointers.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/49028] Missed optimization of pointer arithmetic
  2011-05-17 17:32 [Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic piotr.wyderski at gmail dot com
                   ` (2 preceding siblings ...)
  2011-05-18 13:15 ` piotr.wyderski at gmail dot com
@ 2021-08-14 20:53 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-14 20:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49028

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
          Component|rtl-optimization            |tree-optimization
   Last reconfirmed|2011-05-18 09:40:28         |2021-8-14

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---

  _3 = _2 + 8;
  _4 = _3 - _1;
  _5 = _4 /[ex] 8;
  _6 = _5 % 16;
  _7 = (long unsigned int) _6;
  _8 = _7 * 8;
  _9 = _1 + _8;

I Noticed only MSVC actually does this optimization.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-14 20:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-17 17:32 [Bug rtl-optimization/49028] New: Missed optimization of pointer arithmetic piotr.wyderski at gmail dot com
2011-05-17 17:40 ` [Bug rtl-optimization/49028] " piotr.wyderski at gmail dot com
2011-05-18 10:18 ` rguenth at gcc dot gnu.org
2011-05-18 13:15 ` piotr.wyderski at gmail dot com
2021-08-14 20:53 ` [Bug tree-optimization/49028] " pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).