From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26597 invoked by alias); 3 Oct 2013 10:47:24 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 26542 invoked by uid 48); 3 Oct 2013 10:47:21 -0000
From: "olegendo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
Date: Thu, 03 Oct 2013 10:47:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 4.7.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: olegendo at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-10/txt/msg00154.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #16 from Oleg Endo ---
(In reply to bin.cheng from comment #15)
> There must be another scenario for the example, and in this case example:
>
> int test_0 (char* p, int c)
> {
>   int r = 0;
>   r += *p++;
>   r += *p++;
>   r += *p++;
>   return r;
> }
>
> should be translated into sth like:
>   //...
>   ldrb    [rx]
>   ldrb    [rx+1]
>   ldrb    [rx+2]
>   add     rx, rx, #3
>   //...

As mentioned above, on SH this is the case (displacement addressing mode is
selected).  However, the order of the memory accesses is not the same as in
the original source code (which is OK for non-volatile mems).

> This way all loads are independent and can be issued on super scalar
> machine.
> Actually for targets like arm which supports post-increment
> constant (other than size of memory access), it can be further changed into:
>   //...
>   ldrb    [rx], #3
>   ldrb    [rx-2]
>   ldrb    [rx-1]
>   //...

Whether this transformation is beneficial or not depends on the target
architecture of course.  E.g. SH2A and SH4* are 2-way superscalar, but they
have only one memory load/store unit.  Thus the loads would not be done in
parallel anyway and the latency of the post-incremented address register can
be neglected.

There is a similar case on SH regarding floating point loads.  SH ISAs (other
than SH2A) don't have a displacement addressing mode for floating point
loads/stores.  When loading adjacent memory locations it's best to use
post-inc addressing modes, and when storing adjacent memory locations it's
best to use pre-dec stores.  I.e. things like

float* x = ...;
*x++ = a;
*x++ = b;
*x++ = c;

should become:
   add      #8,r1
   fmov.s   fr0,@r1   // store c
   fmov.s   fr1,@-r1  // store b
   fmov.s   fr2,@-r1  // store a

> For now auto-increment pass can't do this optimization. I once have a patch
> for this but benchmark shows the case is not common.

To be honest, I think such optimizations as mentioned above are out of scope
of the auto-increment pass.  Of course we can try to paper over its
shortcomings and get some improvements here and there, but as soon as ivopts
or another tree pass is changed and outputs different sequences it will break
again.  Thus I suggested replacing the auto-inc-dec pass with a more generic
addressing mode selection (AMS) RTL pass (PR 56590).  Unfortunately, I don't
have anything useful yet.  But I could share some notes and thoughts
regarding AMS optimization, if you're interested.
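
For illustration only, the store reordering above can be modelled in plain C.
This is a rough sketch, not compiler output: the function names
(store_post_inc, store_pre_dec, stores_match, check_store_orders) are made up
for this example, a 4-byte float is assumed so that "add #8,r1" corresponds to
"x += 2", and the final value of the pointer after the sequence is ignored.

```c
#include <assert.h>
#include <string.h>

/* Stores in source order: *x++ = a; *x++ = b; *x++ = c; */
static void store_post_inc(float *x, float a, float b, float c)
{
    *x++ = a;
    *x++ = b;
    *x++ = c;
}

/* Hypothetical C model of the pre-decrement sequence.  Each comment names
   the SH instruction the statement stands for (assuming 4-byte float). */
static void store_pre_dec(float *x, float a, float b, float c)
{
    x += 2;     /* add    #8,r1    */
    *x = c;     /* fmov.s fr0,@r1  */
    *--x = b;   /* fmov.s fr1,@-r1 */
    *--x = a;   /* fmov.s fr2,@-r1 */
}

/* Returns 1 if both store orders leave identical bytes in memory. */
static int stores_match(float a, float b, float c)
{
    float fwd[3] = { 0 }, rev[3] = { 0 };

    store_post_inc(fwd, a, b, c);
    store_pre_dec(rev, a, b, c);
    return memcmp(fwd, rev, sizeof fwd) == 0;
}

/* Small self-check over one sample input. */
static void check_store_orders(void)
{
    assert(stores_match(1.0f, 2.0f, 3.0f));
}
```

Calling check_store_orders() from a test driver exercises both variants.  The
model only shows that reversing the store order is harmless for non-volatile
memory; it says nothing about which sequence is faster on a given SH core.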