From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 6459 invoked by alias); 14 Jun 2010 12:40:26 -0000
Received: (qmail 5529 invoked by uid 48); 14 Jun 2010 12:39:54 -0000
Date: Mon, 14 Jun 2010 12:40:00 -0000
Message-ID: <20100614123953.5528.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "jamborm at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-06/txt/msg01526.txt.bz2

------- Comment #15 from jamborm at gcc dot gnu dot org 2010-06-14 12:39 -------
(In reply to comment #14)
> SSE performance is fine again, thanks a lot!
>
> One more question, if that's OK...
> Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's
> about half a second for ARRSZ=1024, but almost 10 seconds for ARRSZ=20 on my
> machine, which is extremely strange because the operation count is the same in
> both cases. I suspect that something weird is happening with respect to the
> cache and prefetching. Should I open another PR for this?
>

The generated assembly does not differ between the two cases, except that the
offsets are much smaller for ARRSZ=20, of course. This means that the lpic and
pre1 arrays are much closer to each other, which may be something the processor
does not like. I find this surprising, but unless you can think of a specific
missed optimization opportunity (I can't), I don't think this is PR material.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423