public inbox for gcc-help@gcc.gnu.org
From: Andrew Haley <aph@redhat.com>
To: Cody Rigney <codyrigney92@gmail.com>, gcc-help@gcc.gnu.org
Subject: Re: Compiler optimizing variables in inline assembly
Date: Thu, 20 Feb 2014 09:14:00 -0000
Message-ID: <5305C77D.3090807@redhat.com>
In-Reply-To: <CA+1=iYaWg6OyzNjM9K2Qb1fn40ei0Ls+3AhVyXcg-h2Pm3xQaw@mail.gmail.com>

Hi,

On 02/19/2014 07:04 PM, Cody Rigney wrote:
> I'm trying to add NEON optimizations to OpenCV's LK optical flow.  See
> link below.
> https://github.com/Itseez/opencv/blob/2.4/modules/video/src/lkpyramid.cpp
> 
> The gcc version could vary since this is an open source project, but
> the one I'm currently using is 4.8.1. The target architecture is ARMv7
> w/ NEON. The processor I'm testing on is an ARM
> Cortex-A15(big.LITTLE).
> 
> The problem is, in release mode (where optimizations are set) it does
> not work properly. However, in debug mode, it works fine. I tracked
> down a specific variable(FLT_SCALE) that was being optimized out and
> made it volatile and that part worked fine after that. However, I'm
> still having incorrect behavior from some other optimization.

Forget about using volatile here.  That's just wrong.

You have to mark your inputs, outputs, and clobbers correctly.

Look at this asm:

               __asm__ volatile (
                                  "vld1.16 {q0}, [%0]\n\t" //trow0[x + cn]
                                  "vld1.16 {q1}, [%1]\n\t" //trow0[x - cn]
                                  "vsub.i16 q5, q0, q1\n\t" //this is t0
                                  "vld1.16 {q2}, [%2]\n\t" //trow1[x + cn]
                                  "vld1.16 {q3}, [%3]\n\t" //trow1[x - cn]
                                  "vadd.i16 q6, q2, q3\n\t" //this needs mult by 3
                                  "vld1.16 {q4}, [%4]\n\t" //trow1[x]
                                  "vmul.i16 q7, q6, q8\n\t" //this needs to add to trow1[x]*10
                                  "vmul.i16 q10, q4, q9\n\t" //this is trow1[x]*10
                                  "vadd.i16 q11, q7, q10\n\t" //this is t1
                                  "vswp d22, d11\n\t"
                                  "vst2.16 {q5}, [%5]\n\t" //interleave
                                  "vst2.16 {q11}, [%6]\n\t" //interleave
                                  :
                                  : "r" (trow0 + x + cn),  //0
                                    "r" (trow0 + x - cn),  //1
                                    "r" (trow1 + x + cn),  //2
                                    "r" (trow1 + x - cn),  //3
                                    "r" (trow1 + x),       //4
                                    "r" (drow + (x*2)),     //5
                                    "r" (drow + (x*2)+8)   //6
                                  :
                                  );

It has no outputs.  How is this possible?  It does a lot of work, so it
must have some outputs.  The two vst2.16 stores write through %5 and %6,
so I think they are memory outputs.

Go through all the asm blocks, and mark the inputs, outputs, and clobbers.
Then it should work.

Remember one basic thing: you must tell GCC about everything that an
asm does.  If it affects memory, you must tell GCC.  If it reads memory,
you must tell GCC.  Do not lie to the compiler: it will bite you.
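
For illustration, here is a minimal sketch of what a fully annotated block
might look like.  It is not the LK code: sub8 is a hypothetical helper that
subtracts eight 16-bit values, and the plain C fallback is only there so the
sketch builds on targets without NEON.  The memory that is read appears as
"m" inputs, the memory that is written appears as an "=m" output, and the
NEON registers the asm overwrites are listed as clobbers.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original code): dst[i] = a[i] - b[i]
 * for eight int16_t elements. */
void sub8(int16_t *dst, const int16_t *a, const int16_t *b)
{
#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    __asm__ volatile (
        "vld1.16 {q0}, [%[a]]\n\t"     /* load a[0..7] */
        "vld1.16 {q1}, [%[b]]\n\t"     /* load b[0..7] */
        "vsub.i16 q0, q0, q1\n\t"      /* q0 = a - b */
        "vst1.16 {q0}, [%[dst]]\n\t"   /* store dst[0..7] */
        /* Output: the asm writes these 8 elements of memory. */
        : "=m" (*(int16_t (*)[8]) dst)
        /* Inputs: the addresses, plus the memory the asm reads. */
        : [dst] "r" (dst), [a] "r" (a), [b] "r" (b),
          "m" (*(const int16_t (*)[8]) a),
          "m" (*(const int16_t (*)[8]) b)
        /* Clobbers: registers the asm overwrites. */
        : "q0", "q1");
#else
    /* Portable fallback so the sketch runs on non-NEON targets. */
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(a[i] - b[i]);
#endif
}
```

If the exact size of the memory touched is awkward to express, a "memory"
clobber is a blunter alternative: it tells GCC that the asm may read or
write any memory, at the cost of some optimization around the block.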

Andrew.
