Subject: Re: x86 gcc lacks simple optimization
From: Konstantin Vladimirov
To: Richard Biener
Cc: GCC Development, GCC-help
Date: Fri, 06 Dec 2013 13:52:00 -0000

Hi, Richard,

I tried adding an LSHIFT_EXPR case to tree-scalar-evolution.c, and now
it yields code like this (x86 again):

.L5:
        movzbl  4(%esi,%eax,4), %edx
        movb    %dl, 4(%ebx,%eax,4)
        addl    $1, %eax
        cmpl    %ecx, %eax
        jne     .L5

So the excessive lea is gone. That is great, thank you so much.

But I wonder what else I can do to move the add up and simplify the
memory accesses (I am guessing this is some arithmetic re-association,
but I am still not sure where to look). For the architecture I am
working on, this is important.

What would you advise?

---
With best regards, Konstantin

On Fri, Dec 6, 2013 at 2:25 PM, Richard Biener wrote:
> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov wrote:
>> Hi,
>>
>> nothing changes if everything is unsigned and we are guaranteed to not
>> raise UB on overflow:
>>
>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
>> {
>>   unsigned i;
>>
>>   for (i = 1; i != w; ++i)
>>     {
>>       unsigned x = i << 2;
>>       v[x + 4] = t[x + 4];
>>     }
>>
>>   return 0;
>> }
>>
>> yields:
>>
>> .L5:
>>         leal    0(,%eax,4), %edx
>>         addl    $1, %eax
>>         movzbl  4(%edi,%edx), %ecx
>>         cmpl    %ebx, %eax
>>         movb    %cl, 4(%esi,%edx)
>>         jne     .L5
>>
>> What is SCEV infrastructure (guessing scalar evolutions?) and what
>> files/passes to look in?
>
> tree-scalar-evolution.c, look at where it handles MULT_EXPR but
> lacks LSHIFT_EXPR support.
>
> Richard.
>
>> ---
>> With best regards, Konstantin
>>
>> On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener wrote:
>>> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov wrote:
>>>> Hi,
>>>>
>>>> Consider code:
>>>>
>>>> int foo(char *t, char *v, int w)
>>>> {
>>>>   int i;
>>>>
>>>>   for (i = 1; i != w; ++i)
>>>>     {
>>>>       int x = i << 2;
>>>>       v[x + 4] = t[x + 4];
>>>>     }
>>>>
>>>>   return 0;
>>>> }
>>>>
>>>> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
>>>>
>>>> gcc -O2 -m32 -S test.c
>>>>
>>>> You will see loop, formed like:
>>>>
>>>> .L5:
>>>>         leal    0(,%eax,4), %edx
>>>>         addl    $1, %eax
>>>>         movzbl  4(%edi,%edx), %ecx
>>>>         cmpl    %ebx, %eax
>>>>         movb    %cl, 4(%esi,%edx)
>>>>         jne     .L5
>>>>
>>>> But it can be easily simplified to something like this:
>>>>
>>>> .L5:
>>>>         addl    $1, %eax
>>>>         movzbl  (%esi,%eax,4), %edx
>>>>         cmpl    %ecx, %eax
>>>>         movb    %dl, (%ebx,%eax,4)
>>>>         jne     .L5
>>>>
>>>> (i.e. left shift may be moved to address).
>>>>
>>>> First question to gcc-help maillist. May be there are some options,
>>>> that I've missed, and there IS a way to explain gcc my intention to do
>>>> this?
>>>>
>>>> And second question to gcc developers mail list. I am working on
>>>> private backend and want to add this optimization to my backend. What
>>>> do you advise me to do -- custom gimple pass, or rtl pass, or modify
>>>> some existent pass, etc?
>>>
>>> This looks like a deficiency in induction variable optimization. Note
>>> that i << 2 may overflow and this overflow does not invoke undefined
>>> behavior but is in the implementation defined behavior category.
>>>
>>> The issue in this case is likely that the SCEV infrastructure does not handle
>>> left-shifts.
>>>
>>> Richard.
>>>
>>>> ---
>>>> With best regards, Konstantin
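
For readers following the thread, here is a minimal source-level sketch of the two ideas discussed above. It is not GCC's implementation and says nothing about how tree-scalar-evolution.c is actually structured. It only illustrates, in plain C, why a left shift by 2 can be analysed like the MULT_EXPR case (for unsigned i, i << 2 and i * 4 agree modulo 2^N), and what the loop looks like once the "+ 4" and the scaling are folded into pointer induction variables by hand. The function names copy_shift, copy_mul, copy_ptr and variants_agree are hypothetical and do not appear in the thread; the pointer variant is assumed equivalent only while x + 4 does not wrap, which holds for the small w used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The unsigned loop from the thread: the index is built with a shift. */
unsigned copy_shift(unsigned char *t, unsigned char *v, unsigned w)
{
    unsigned i;

    for (i = 1; i != w; ++i) {
        unsigned x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Same loop with the shift written as a multiply by a power of two.
   For unsigned operands, i << 2 == i * 4 (both reduced modulo 2^N). */
unsigned copy_mul(unsigned char *t, unsigned char *v, unsigned w)
{
    unsigned i;

    for (i = 1; i != w; ++i) {
        unsigned x = i * 4u;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hand strength-reduced variant: the "+ 4" and the first index (i = 1,
   so x + 4 = 8) are folded into the start pointers, which then advance
   by 4 bytes per iteration. Assumes x + 4 never wraps and that the
   buffers hold at least 8 bytes. */
unsigned copy_ptr(unsigned char *t, unsigned char *v, unsigned w)
{
    unsigned i;
    unsigned char *src = t + 8;
    unsigned char *dst = v + 8;

    for (i = 1; i != w; ++i) {
        *dst = *src;
        src += 4;
        dst += 4;
    }
    return 0;
}

/* Returns 1 if all three variants write identical bytes for this w.
   Precondition: 1 <= w <= 64 (w == 0 would make the loops run until
   i wraps around, far past the end of the buffers). */
int variants_agree(unsigned w)
{
    enum { N = 4 * 64 + 8 };
    unsigned char src[N], a[N], b[N], c[N];
    size_t k;

    assert(w >= 1 && w <= 64);

    for (k = 0; k < N; ++k)
        src[k] = (unsigned char)(k * 7u + 1u);
    memset(a, 0, N);
    memset(b, 0, N);
    memset(c, 0, N);

    copy_shift(src, a, w);
    copy_mul(src, b, w);
    copy_ptr(src, c, w);

    return memcmp(a, b, N) == 0 && memcmp(a, c, N) == 0;
}
```

Calling variants_agree(w) for a few values of w between 1 and 64 is a quick sanity check that the three forms copy the same bytes. Comparing the output of gcc -O2 -m32 -S for each function shows whether a given compiler version already reaches the simpler addressing form on its own.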