From mboxrd@z Thu Jan 1 00:00:00 1970
From: wilson@gcc.gnu.org
To: gbv@ctv.es, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, wilson@gcc.gnu.org
Subject: Re: c/3917: IA-64 assembler output shows erroneous cycle counting
Date: Mon, 17 Sep 2001 22:24:00 -0000
Message-id: <20010918052359.2903.qmail@sourceware.cygnus.com>
X-SW-Source: 2001-09/msg00377.html
List-Id:

Synopsis: IA-64 assembler output shows erroneous cycle counting

Responsible-Changed-From-To: unassigned->wilson
Responsible-Changed-By: wilson
Responsible-Changed-When: Mon Sep 17 22:23:58 2001
Responsible-Changed-Why:
    IA-64 maintainer
State-Changed-From-To: open->analyzed
State-Changed-By: wilson
State-Changed-When: Mon Sep 17 22:23:58 2001
State-Changed-Why:
    The cycle counts indicate what the scheduler thinks the hardware
    will do. They will always be wrong to some extent, because perfect
    emulation of the hardware pipeline is difficult. Also, current gcc
    infrastructure does not have any easy way to describe pipelines as
    complicated as the Itanium. Major discrepancies should be fixed
    though.

    I need a testcase to make sure that we are talking about the same
    thing. I have provided one of my own.

    double sub2 (double, double, double, double);

    double
    sub (double w, double x, double y, double z,
         double a, double b, double c, double d)
    {
      return sub2 (a + b, c + d, a + b, c + d);
    }

    With this testcase, I see that the 4 add instructions get scheduled
    in 4 different cycles for no apparent reason.

    You are correct that there is a bug in itanium_split_issue. It
    should allow 2 FP instructions per cycle. When I tested this, I ran
    into a number of bugs, and those needed further bug fixes.

    There was a problem where scheduling M M F I0 instructions caused
    selection of MLX MFI bundles, requiring emitting a nop.x
    instruction, which we did not have support for. Bundles MFI MFI
    would have been better.
    This is a problem with insn_matches_slot not knowing that an
    instruction filling LX slots uses FI issue slots; thus it thought
    that with MLX MFI the I would issue to the I0 unit, which is not
    true.

    With this fixed, I ran into another problem where scheduling an L
    instruction (mov.l) in one cycle and then an I instruction requiring
    unit I0 in the next cycle gave an abort. This is because the
    scheduler doesn't know that L instructions take two slots, so it
    thought that the MLX bundle was not full yet, and tried to schedule
    in the I0 instruction without rotating out the MLX bundle, which is
    impossible. To fix this, I hacked in code to make L instructions
    take two slots, which forces the bundle rotation. This works, but
    did not seem to be the right solution.

    With this patch, the scheduler now puts the first two add
    instructions in the same cycle, but not the last two. This is an
    improvement, but we can do better.

    The remaining problem, which I have not fixed yet, is more involved.
    It has to do with how the scheduler tries to schedule into two
    bundles at a time, matching the hardware issue rate. When a bundle
    is partially filled, we need to decide whether to pad it with nops,
    or to try to continue filling it with instructions for the next
    cycle. The current code here is suboptimal. It always tries to fill
    with instructions from the next cycle. We would get better code in
    many cases if we padded with nops. This will have to be done
    carefully to avoid slowing down the core with too many nops.

    I am planning to continue working on this patch. The current
    version of the patch is included below. I get a slight performance
    increase on specint95; I have not yet had time to test it on specfp,
    which would be more interesting.

    Your patch, by the way, is backwards. You should always do
    "diff oldfile newfile".

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=3917&database=gcc
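
To illustrate the L-instruction slot accounting described above, here is a minimal toy sketch in C. It is not the GCC code and not the patch: the names insn_type, bundle, slots_needed, place and free_slots_after_m_l are hypothetical and exist only for this illustration. The only facts it relies on are that a bundle has three slots and that an L instruction occupies two of them.

```c
/* Toy model only; none of these names come from GCC's ia64 backend. */
enum insn_type { TYPE_M, TYPE_I, TYPE_F, TYPE_B, TYPE_L };

#define SLOTS_PER_BUNDLE 3

struct bundle { int used; };  /* slots already filled in the current bundle */

/* An L instruction fills both the L and X slots of an MLX bundle.
   l_takes_two == 0 models the old accounting that counted it as one slot. */
int slots_needed(enum insn_type t, int l_takes_two)
{
    return (t == TYPE_L && l_takes_two) ? 2 : 1;
}

/* Place one instruction. Returns 1 if it went into the current bundle,
   0 if the bundle had to be rotated out and a fresh one started. */
int place(struct bundle *b, enum insn_type t, int l_takes_two)
{
    int n = slots_needed(t, l_takes_two);

    if (b->used + n > SLOTS_PER_BUNDLE) {
        b->used = n;          /* rotate: the insn opens a new bundle */
        return 0;
    }
    b->used += n;
    return 1;
}

/* Free slots the model believes remain after placing M then L. */
int free_slots_after_m_l(int l_takes_two)
{
    struct bundle b = { 0 };

    place(&b, TYPE_M, l_takes_two);
    place(&b, TYPE_L, l_takes_two);
    return SLOTS_PER_BUNDLE - b.used;
}
```

With one-slot accounting the model believes a slot is still free after M and L, which corresponds to the scheduler trying to add the I0 instruction to a bundle that is really full. With two-slot accounting the bundle is seen as full, so the next instruction forces a rotation.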