From: Cody Rigney
To: Andrew Haley
Cc: gcc-help@gcc.gnu.org
Date: Thu, 20 Feb 2014 19:30:00 -0000
Subject: Re: Compiler optimizing variables in inline assembly
In-Reply-To: <5305C77D.3090807@redhat.com>

That makes sense. In this case, the input parameters are actually
memory addresses. So how would I do an output or clobber that would
tell the compiler that the memory at those addresses will change?

Thanks for your time,
Cody

On Thu, Feb 20, 2014 at 4:14 AM, Andrew Haley wrote:
> Hi,
>
> On 02/19/2014 07:04 PM, Cody Rigney wrote:
>> I'm trying to add NEON optimizations to OpenCV's LK optical flow. See
>> link below.
>> https://github.com/Itseez/opencv/blob/2.4/modules/video/src/lkpyramid.cpp
>>
>> The gcc version could vary since this is an open source project, but
>> the one I'm currently using is 4.8.1. The target architecture is ARMv7
>> w/ NEON. The processor I'm testing on is an ARM
>> Cortex-A15 (big.LITTLE).
>>
>> The problem is, in release mode (where optimizations are set) it does
>> not work properly. However, in debug mode, it works fine. I tracked
>> down a specific variable (FLT_SCALE) that was being optimized out and
>> made it volatile and that part worked fine after that. However, I'm
>> still having incorrect behavior from some other optimization.
>
> Forget about using volatile here. That's just wrong.
>
> You have to mark your inputs, outputs, and clobbers correctly.
>
> Look at this asm:
>
> __asm__ volatile (
>     "vld1.16 {q0}, [%0]\n\t"      //trow0[x + cn]
>     "vld1.16 {q1}, [%1]\n\t"      //trow0[x - cn]
>     "vsub.i16 q5, q0, q1\n\t"     //this is t0
>     "vld1.16 {q2}, [%2]\n\t"      //trow1[x + cn]
>     "vld1.16 {q3}, [%3]\n\t"      //trow1[x - cn]
>     "vadd.i16 q6, q2, q3\n\t"     //this needs mult by 3
>     "vld1.16 {q4}, [%4]\n\t"      //trow1[x]
>     "vmul.i16 q7, q6, q8\n\t"     //this needs to add to trow1[x]*10
>     "vmul.i16 q10, q4, q9\n\t"    //this is trow1[x]*10
>     "vadd.i16 q11, q7, q10\n\t"   //this is t1
>     "vswp d22, d11\n\t"
>     "vst2.16 {q5}, [%5]\n\t"      //interleave
>     "vst2.16 {q11}, [%6]\n\t"     //interleave
>     :
>     : "r" (trow0 + x + cn),       //0
>       "r" (trow0 + x - cn),       //1
>       "r" (trow1 + x + cn),       //2
>       "r" (trow1 + x - cn),       //3
>       "r" (trow1 + x),            //4
>       "r" (drow + (x*2)),         //5
>       "r" (drow + (x*2)+8)        //6
>     :
> );
>
> It has no outputs. How is this possible? It does a lot of work. It must
> have some outputs. I think there should be some outputs for this asm. I
> think they are memory outputs.
>
> Go through all the asm blocks, and mark the inputs, outputs, and clobbers.
> Then it should work.
>
> Remember one basic thing: you must tell GCC about everything that an
> asm does.
> If it affects memory, you must tell GCC. If it reads memory,
> you must tell GCC. Do not lie to the compiler: it will bite you.
>
> Andrew.
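
A minimal sketch of the kind of annotation Andrew describes, offered as an illustration and not as the actual OpenCV fix. The function name sub8_s16 and its three-pointer signature are hypothetical. The NEON path assumes an ARMv7 target built with NEON enabled; any other target takes a plain C fallback so the snippet still compiles. The asm stores through dst and loads through a and b, so it lists a "memory" clobber (GCC must then assume that memory was both read and written) and names the q registers it overwrites.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] - b[i] for 8 int16_t lanes. */
static void sub8_s16(int16_t *dst, const int16_t *a, const int16_t *b)
{
    assert(dst && a && b);
#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    __asm__ volatile (
        "vld1.16  {q0}, [%1]\n\t"   /* load a[0..7]          */
        "vld1.16  {q1}, [%2]\n\t"   /* load b[0..7]          */
        "vsub.i16 q0, q0, q1\n\t"   /* q0 = a - b            */
        "vst1.16  {q0}, [%0]\n\t"   /* store result to dst   */
        :                           /* no register outputs   */
        : "r" (dst), "r" (a), "r" (b)
        : "q0", "q1",               /* registers overwritten */
          "memory"                  /* memory is read/written */
    );
#else
    /* Portable fallback with the same result, used off ARMv7/NEON. */
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(a[i] - b[i]);
#endif
}
```

By the same reasoning, the quoted block would presumably need a "memory" clobber plus the q registers it writes. GCC's Extended Asm documentation also describes "m" operands as a more precise alternative that names only the memory actually touched; the blanket "memory" clobber is the simpler and more conservative choice.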