From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 19615 invoked by alias); 29 Apr 2009 09:32:40 -0000
Received: (qmail 19411 invoked by uid 48); 29 Apr 2009 09:32:07 -0000
Date: Wed, 29 Apr 2009 09:32:00 -0000
Message-ID: <20090429093207.19410.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "jakub at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-04/txt/msg02853.txt.bz2

------- Comment #13 from jakub at gcc dot gnu dot org  2009-04-29 09:32 -------
You are benchmarking something completely unrelated.
What really matters is how well code that has 4 branches/calls in one
16-byte block is able to predict all those branches. And Core2,
similarly to various AMD CPUs, is not able to predict them well.

In the #c6 testcase it considers whether the je, call, jne and ret can
be in one 16-byte block or not. They can't: je is 2 bytes, call 5
bytes, leal 4 bytes (but gcc uses min_insn_size, which is 2 in this
case), testl 2, jne 2, addq 4 (but again, min_insn_size is 2 in this
case).

min_insn_size seems to be very conservative. I guess teaching it about
a bunch of prefixes couldn't hurt. For non-jump/call insns it ATM
estimates just the displacement size and doesn't consider any prefixes
(even those that really can't change after machine reorg), etc.


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942
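
The byte counts in comment #13 can be added up to show why gcc's estimate and the real encoding disagree. The following is only a rough sketch of that arithmetic, not gcc's actual code: the struct, the table and the function names are hypothetical, and the sizes are the ones quoted in the comment. The ret itself is left out because the comment gives no size for it.

```c
#include <stddef.h>

/* One instruction of the #c6 testcase, with the sizes quoted in comment #13.
   actual_size:    encoded length in bytes as stated in the comment.
   estimated_size: what the comment says gcc's min_insn_size assumes.
   All names here are made up for illustration. */
struct insn_size {
    const char *mnemonic;
    int actual_size;
    int estimated_size;
};

const struct insn_size c6_insns[] = {
    { "je",    2, 2 },
    { "call",  5, 5 },
    { "leal",  4, 2 },   /* min_insn_size says 2 */
    { "testl", 2, 2 },
    { "jne",   2, 2 },
    { "addq",  4, 2 },   /* min_insn_size says 2 */
};

enum { C6_INSN_COUNT = sizeof c6_insns / sizeof c6_insns[0] };
enum { BLOCK_BYTES = 16 };

/* Sum of the real encoded sizes. */
int sum_actual(const struct insn_size *insns, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += insns[i].actual_size;
    return total;
}

/* Sum of the conservative lower-bound estimates. */
int sum_estimated(const struct insn_size *insns, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += insns[i].estimated_size;
    return total;
}

/* Nonzero if nbytes still leaves at least one byte of a 16-byte block
   for the ret that follows the listed instructions. */
int could_share_block(int nbytes)
{
    return nbytes < BLOCK_BYTES;
}
```

With these numbers sum_actual(c6_insns, C6_INSN_COUNT) is 19, which already exceeds one 16-byte block, while sum_estimated(c6_insns, C6_INSN_COUNT) is 15, so going by the estimate alone the four branches might still share a block.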