public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/51254] New: Missed Optimization: IVOPTS don't handle unaligned memory access.
@ 2011-11-21  4:46 duyuehai at gmail dot com
  2011-11-21  5:06 ` [Bug tree-optimization/51254] " duyuehai at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: duyuehai at gmail dot com @ 2011-11-21  4:46 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51254

             Bug #: 51254
           Summary: Missed Optimization: IVOPTS don't handle unaligned
                    memory access.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: duyuehai@gmail.com


IVOPTS does not handle unaligned memory accesses because of
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17949. Without this
optimization, however, we may generate sub-optimal code.

Here is a case from EEMBC autcor00:

fxpAutoCorrelation (
    e_s16 *InputData,
    e_s16 *AutoCorrData,
    e_s16 DataSize,
    e_s16 NumberOfLags,
    e_s16 Scale
)
{
    n_int i;
    n_int lag;
    n_int LastIndex;
    e_s32 Accumulator;


    for (lag = 0; lag < NumberOfLags; lag++) {
        Accumulator = 0;
        LastIndex = DataSize - lag;
        for (i = 0; i < LastIndex; i++) {
            Accumulator += ((e_s32) InputData[i] * (e_s32) InputData[i+lag]) >> Scale;
        }


        AutoCorrData[lag] = (e_s16) (Accumulator >> 16) ;
    }
}
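
For anyone who wants to reproduce this outside the EEMBC sources, here is a minimal self-contained sketch of the same function. The three typedefs and the void return type are assumptions (the snippet above shows neither); the loop body is unchanged.

```c
#include <stdint.h>

/* Assumed stand-ins for the EEMBC typedefs; the report does not show them. */
typedef int16_t e_s16;
typedef int32_t e_s32;
typedef int     n_int;

/* Return type void is an assumption; the quoted snippet omits it. */
void fxpAutoCorrelation(e_s16 *InputData, e_s16 *AutoCorrData,
                        e_s16 DataSize, e_s16 NumberOfLags, e_s16 Scale)
{
    n_int i;
    n_int lag;
    n_int LastIndex;
    e_s32 Accumulator;

    for (lag = 0; lag < NumberOfLags; lag++) {
        Accumulator = 0;
        LastIndex = DataSize - lag;
        /* InputData[i] and InputData[i+lag] are the two address streams
           that show up as separate loads in the vectorized loop. */
        for (i = 0; i < LastIndex; i++) {
            Accumulator += ((e_s32) InputData[i] * (e_s32) InputData[i + lag]) >> Scale;
        }
        AutoCorrData[lag] = (e_s16) (Accumulator >> 16);
    }
}
```

Since lag varies at run time, the second stream InputData[i+lag] cannot be assumed to be vector-aligned, which is presumably why its load goes through the unaligned path.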

Compile it with an ARM cross-compiler using these flags:
-O3 -mfpu=neon -mfloat-abi=softfp

The key vectorized loop is:

.L8:
    add    r7, ip, sl
    vldmia    ip, {d18-d19}
    vld1.16    {q8}, [r7]
    vmull.s16 q12, d18, d16
    vshl.s32    q12, q12, q11
    vmull.s16 q8, d19, d17
    add    r4, r4, #1
    vadd.i32    q10, q12, q10
    vshl.s32    q8, q8, q11
    cmp    r4, r8
    vadd.i32    q10, q8, q10
    add    ip, ip, #16
    bcc    .L8

  There are three ADD insns in it, used to calculate addresses and the loop
counter. Only the ADD for the loop counter is needed; the other two can be
folded into post-increment addressing.

  The root cause is that IVOPTS does not handle unaligned memory accesses.
If we remove those checks in find_interesting_uses_address, the result
is:

.L8:
    vldmia    r6!, {d18-d19}
    vld1.16    {q8}, [r7]!
    vmull.s16 q12, d18, d16
    vshl.s32    q12, q12, q11
    vmull.s16 q8, d19, d17
    add    r4, r4, #1
    vadd.i32    q10, q12, q10
    vshl.s32    q8, q8, q11
    cmp    r4, sl
    vadd.i32    q10, q8, q10
    bcc    .L8

  This should be the result we want.
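
The difference between the two loops can be pictured at the source level. This is only an illustrative sketch, not what IVOPTS does internally: the function names are hypothetical, and a compiler is free to generate identical code for both forms. The first form recomputes each address from an index; the second keeps two pointer induction variables that are each bumped once per iteration, which is the shape that maps onto post-increment addressing.

```c
#include <stdint.h>

/* Indexed form: both addresses are derived from i on every iteration. */
int32_t acc_indexed(const int16_t *data, int n, int lag, int scale)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t) data[i] * (int32_t) data[i + lag]) >> scale;
    return acc;
}

/* Pointer form: p and q are separate address induction variables,
   each advanced by one element as part of the access itself. */
int32_t acc_postinc(const int16_t *data, int n, int lag, int scale)
{
    const int16_t *p = data;        /* walks data[i]       */
    const int16_t *q = data + lag;  /* walks data[i + lag] */
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t) *p++ * (int32_t) *q++) >> scale;
    return acc;
}
```

Both functions compute the same sum; only the way the addresses are formed differs.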

See http://gcc.gnu.org/ml/gcc/2011-11/msg00311.html for more details.



Thread overview: 7+ messages
2011-11-21  4:46 [Bug tree-optimization/51254] New: Missed Optimization: IVOPTS don't handle unaligned memory access duyuehai at gmail dot com
2011-11-21  5:06 ` [Bug tree-optimization/51254] " duyuehai at gmail dot com
2011-12-13  4:16 ` duyuehai at gmail dot com
2011-12-13  7:19 ` duyuehai at gmail dot com
2011-12-13 10:49 ` rguenther at suse dot de
2011-12-14  3:14 ` duyuehai at gmail dot com
2011-12-14  9:43 ` rguenther at suse dot de
