From: Richard Earnshaw <rearnsha@arm.com>
To: Revital1 Eres <ERES@il.ibm.com>
Cc: Roman Zhuykov <zhroma@ispras.ru>,
dm@ispras.ru, gcc@gcc.gnu.org, cltang@codesourcery.com,
yao@codesourcery.com, Ayal Zaks <ZAKS@il.ibm.com>
Subject: Re: [ARM] Implementing doloop pattern
Date: Wed, 05 Jan 2011 15:35:00 -0000
Message-ID: <1294241704.7406.12.camel@e102346-lin.cambridge.arm.com>
In-Reply-To: <OFFD96764C.38FEC784-ONC2257809.005BD5C2-C2257809.005D0F95@il.ibm.com>
On Thu, 2010-12-30 at 18:56 +0200, Revital1 Eres wrote:
> Hello,
>
> The attached patch is my latest attempt to model doloop for arm.
> I followed Chung-Lin Tang's suggestion and used subs+jump, similar to
> your patch.
> On Cortex-A8 I see a gain of 29% on the autocor benchmark (telecom suite)
> with SMS using the following flags: -fmodulo-sched-allow-regmoves
> -funsafe-loop-optimizations -fmodulo-sched -fno-auto-inc-dec
> -fdump-rtl-sms -mthumb -mcpu=cortex-a8 -O3 (compared to using only
> -mthumb -mcpu=cortex-a8 -O3).
>
> I have not fully tested the patch and it's not in the proper format of
> submission yet.
>
> Thanks,
> Revital
>
> (See attached file: patch_arm_doloop.txt)
>
>
>
> From: Roman Zhuykov <zhroma@ispras.ru>
> To: gcc@gcc.gnu.org
> Cc: dm@ispras.ru
> Date: 30/12/2010 04:04 PM
> Subject: [ARM] Implementing doloop pattern
> Sent by: gcc-owner@gcc.gnu.org
>
>
>
> Hello!
>
> The main idea of the work described below was to estimate the speedup we
> can gain from SMS on ARM. SMS depends on the doloop_end pattern, and
> there is no appropriate instruction on ARM. We decided to create a "fake"
> doloop_end pattern on ARM using a pair of "subs" and "bne" assembler
> instructions. In the implementation we used ideas from the machine
> description files of other architectures, e.g. spu, which expands the
> doloop_end pattern only when SMS is enabled. The patch is attached.
>
> This patch allows any register to be used for the doloop pattern.
> It was tested on a trunk snapshot from 30 Aug 2010. It works fine on
> several small examples, but gives an ICE on sqlite-amalgamation-3.6.1
> source:
> sqlite3.c: In function 'sqlite3WhereBegin':
> sqlite3.c:76683:1: internal compiler error: in patch_jump_insn, at
> cfgrtl.c:1020
>
> The ICE happens in the ira pass, when cleanup_cfg is called at the end
> of ira.
>
> The "bad" instruction looks like
> (jump_insn 3601 628 4065 76 (parallel [
>             (set (pc)
>                 (if_then_else (ne (mem/c:SI (plus:SI (reg/f:SI 13 sp)
>                                 (const_int 36 [0x24])) [105 %sfp+-916 S4 A32])
>                         (const_int 1 [0x1]))
>                     (label_ref 3600)
>                     (pc)))
>             (set (mem/c:SI (plus:SI (reg/f:SI 13 sp)
>                         (const_int 36 [0x24])) [105 %sfp+-916 S4 A32])
>                 (plus:SI (mem/c:SI (plus:SI (reg/f:SI 13 sp)
>                             (const_int 36 [0x24])) [105 %sfp+-916 S4 A32])
>                     (const_int -1 [0xffffffffffffffff])))
>         ]) sqlite3.c:75235 328 {doloop_end_internal}
>      (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
>         (nil))
>  -> 3600)
>
> So, the problem seems to be with ira. Memory is used instead of a
> register to store the doloop counter. We tried to fix this by explicitly
> specifying a hard register (r5) for the doloop pattern. The fixed version
> seems to work, but this doesn't look like a proper fix. On the trunk
> snapshot from 17 Dec 2010 the ICE described above has disappeared, but
> it's probably just a coincidence, and it will show up anyway on some
> other test case.
>
> The r5 fix shows the following results (comparing "-O2 -fno-auto-inc-dec
> -fmodulo-sched" vs "-O2 -fno-auto-inc-dec").
> Aburto benchmarks: heapsort and matmult - 3% speedup. nsieve - 7% slowdown.
> Other aburto tests, sqlite tests and the libevas rasterization library
> (expedite testsuite) show results around zero.
>
> A motivating example shows about 23% speedup:
>
> char scal (int n, char *a, char *b)
> {
>   int i;
>   char s = 0;
>   for (i = 0; i < n; i++)
>     s += a[i] * b[i];
>   return s;
> }
>
> We have analyzed the SMS results and can conclude that if SMS has
> successfully built a schedule for the loop, we usually gain a speedup,
> and when SMS fails, we often see some slowdown, which arises from the
> do-loop conversion.
>
> The questions are:
> How to properly fix the ICE described?
> Do you think this approach (after the fixes) can make its way into trunk?
>
> Happy holidays!
> --
> Roman Zhuykov
>
> [attachment "sms-doloop-any-reg.diff" deleted by Revital1 Eres/Haifa/IBM]

@@ -162,6 +175,7 @@ doloop_condition_get (rtx doloop_pat)
     return 0;
   if ((XEXP (condition, 0) == reg)
+      || (REGNO (XEXP (condition, 0)) == CC_REGNUM)
       || (GET_CODE (XEXP (condition, 0)) == PLUS
           && XEXP (XEXP (condition, 0), 0) == reg))

You can't depend on CC_REGNUM in generic code.  That's part of the
private machine description for ARM.  Other cores have different ways of
representing condition codes.

R.
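
For readers following the thread, here is a rough C-level sketch of the
loop shape that a doloop_end pattern built from "subs" + "bne" expresses,
using the scal example quoted earlier in this message. This is only an
illustration under assumptions: the name scal_countdown, the const
qualifiers and the explicit n <= 0 guard are not from the thread or the
patch, and the code says nothing about what GCC actually emits.

```c
/* Original counting-up form, as quoted in the thread
   (const qualifiers added here for illustration only).  */
static char scal (int n, const char *a, const char *b)
{
  int i;
  char s = 0;
  for (i = 0; i < n; i++)
    s += a[i] * b[i];
  return s;
}

/* Hypothetical count-down form: the loop counter is only decremented
   and compared against zero, which is the shape a decrement-and-branch
   pair can implement.  */
static char scal_countdown (int n, const char *a, const char *b)
{
  char s = 0;
  unsigned int count;

  if (n <= 0)              /* a do-while body runs at least once, so guard */
    return s;

  count = (unsigned int) n;
  do
    {
      s += *a++ * *b++;
    }
  while (--count != 0);    /* conceptually: subs count, count, #1 ; bne loop */

  return s;
}
```

With a = {1, 2, 3} and b = {4, 5, 6}, both functions return 32 for n = 3,
and both return 0 when n is zero or negative.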
Thread overview: 8+ messages
2010-12-30 14:04 Roman Zhuykov
2010-12-30 16:02 ` Ulrich Weigand
2010-12-30 16:56 ` Revital1 Eres
2011-01-05 15:35 ` Richard Earnshaw [this message]
2011-01-06 7:59 ` Revital1 Eres
2011-01-06 9:11 ` Andreas Schwab
2011-01-13 11:11 ` Ramana Radhakrishnan
2011-01-13 13:51 ` Nathan Froyd