From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30546 invoked by alias); 8 Jul 2015 17:59:10 -0000 Mailing-List: contact jit-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Subscribe: Sender: jit-owner@gcc.gnu.org Received: (qmail 30504 invoked by uid 89); 8 Jul 2015 17:59:06 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.98.7 on sourceware.org X-Virus-Found: No X-HELO: mx1.redhat.com Message-ID: <1436377875.24803.100.camel@surprise> Subject: Re: Filed PR jit/66812 for the code generation issue From: David Malcolm To: Dibyendu Majumdar Cc: jit@gcc.gnu.org Date: Thu, 01 Jan 2015 00:00:00 -0000 In-Reply-To: <1436377619.24803.97.camel@surprise> References: <1436365266.24803.65.camel@surprise> <1436367926.24803.71.camel@surprise> <1436369443.24803.75.camel@surprise> <1436377619.24803.97.camel@surprise> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-SW-Source: 2015-q3/txt/msg00035.txt.bz2 On Wed, 2015-07-08 at 13:46 -0400, David Malcolm wrote: > On Wed, 2015-07-08 at 11:30 -0400, David Malcolm wrote: > > On Wed, 2015-07-08 at 11:05 -0400, David Malcolm wrote: > > > On Wed, 2015-07-08 at 10:21 -0400, David Malcolm wrote: > > > > On Sat, 2015-07-04 at 16:58 +0100, Dibyendu Majumdar wrote: > > > > > On 4 July 2015 at 14:20, Dibyendu Majumdar wrote: > > > > > > On 4 July 2015 at 13:11, Dibyendu Majumdar wrote: > > > > > >> Looks like in the failure case the code is being incorrectly > > > > > >> optimized. I wonder if this is a manifestation of the get_address bug, > > > > > >> perhaps the real fix will be better than the patch I am using. I will > > > > > >> use the latest gcc 5 branch and see if that helps. > > > > > >> > > > > > > > > > > > > Hi Dave, > > > > > > > > > > > > I am now using the latest gcc-5-branch from gcc github mirror. > > > > > > Unfortunately the issue still persists. > > > > > > > > > > > > If set optimization level to 0 or 1, then it works ok, but at levels 2 > > > > > > or 3 the break occurs. > > > > > > > > > > > > > > > > Adding the -fno-strict-aliasing appears to resolve the issue with -O2 > > > > > and -O3 but with this enabled the benchmarks are degraded. > > > > > > > > I've filed the bad code generation issue as: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66812 > > > > > > > > and I'm investigating it. > > > > > > > > > Notes on investigation so far. > > > > > > With the fixes here > > > https://gcc.gnu.org/ml/jit/2015-q3/msg00025.html > > > and here: > > > https://gcc.gnu.org/ml/jit/2015-q3/msg00028.html > > > the reproducer from > > > https://gcc.gnu.org/ml/jit/2015-q3/msg00012.html > > > now correctly fails with: > > > error: gcc_jit_context_new_field: unknown size for field > > > "errorJmp" (type: struct ravi_lua_longjmp) > > > > > > Upon applying the equivalent of the fix from: > > > https://github.com/dibyendumajumdar/ravi/commit/d65d2e68fbdcf211ed36deea05727f996ede8296 > > > to the generator you provided, gcc_jit_context_compile completes. > > > > > > I don't know if it's generating bad code (given that I don't have Ravi > > > itself running, just the reproducer). However, I tried comparing the > > > bug_rdump.txt and bug_rdump_ok.txt reproducers, and tried adding > > > -fno-strict-aliasing to the former. > > > > > > I turned on GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING and started comparing > > > the dumpfiles. > > > > > > The diffs stay easily comparable until fake.c.018t.ssa, when numbering > > > differences between temporaries mean that a simple "diff" becomes > > > unreadable. > > > > > > So I wrote a tool to try to alleviate this: > > > https://github.com/davidmalcolm/gcc-dump-diff > > > The tool tries to renumber temporaries in dumpfiles so that they are > > > more consistent between runs, and then tries to do a textual diff of two > > > dumpfiles. > > > > > > (note that it might renumber labels also) > > > > Yes, labels *were* being renumbered. I've fixed that now. > > > > > With this, the dumpfiles become comparable again. I see that when > > > rerunning the bug_rdump.txt reproducer, the first big difference between > > > the with and without -fno-strict-aliasing case happens at 035t.fre1. > > > > > > Note in the following how, without -fno-strict-aliasing, various > > > statements and blocks are eliminated at 035t.fre1, which treats them as > > > dead code. In particular, it seems to have optimized away > > > OP_TEST_do_jmp_5_0 and OP_TEST_do_skip_5_0 (if I'm reading things > > > right). > > > > It's OP_TEST_do_jmp_5_14, jmp_9_2 and OP_TEST_do_skip_5_15 that are > > being optimized away by pass fre1 when -fno-strict-aliasing isn't > > supplied. > > > > I'm attaching a fixed diff, showing the correct label names. > > > > > > > My hunch at this time is that this optimization is being too > > > aggressive... for some reason. I'll continue poking at this. > > Dibyendu: what Lua code generated the reproducer? What is the code > meant to be doing? > > I used gcc_jit_function_dump_to_dot to dump the CFG in GraphViz format; > you can see the result here: > https://dmalcolm.fedorapeople.org/gcc/2015-07-08/rdump.png > and with the printfs here: > https://dmalcolm.fedorapeople.org/gcc/2015-07-08/rdump_ok.png > > I see that both paths out of the "entry" block go through empty blocks > and then into "jmp_5_1". > > A similar thing happens later with "jmp_9_2": both paths from the > conditional lead through empty blocks to "jmp_12_3". > > Those pairs of empty blocks look odd. Is the code correct? > > Looking at the body of "jmp_5_1", and annotating, I see: > > jmp_5_1: > (&L->ci->u.l.base[(int)1])->value_.b = (int)0; > (&L->ci->u.l.base[(int)1])->tt_ = (int)1; > > comparison_0_11 = (&L->ci->u.l.base[(int)1])->tt_ == (int)0; > /* this must be true because of the 2nd assignment above */ > > comparison_0_12 = (&L->ci->u.l.base[(int)1])->tt_ == (int)1; > /* similarly this must be false */ > > comparison_0_13 = (&L->ci->u.l.base[(int)1])->value_.b == (int)0; > /* this must be true because of the 1st assignment above */ > > isfalse_0_10 = comparison_0_11 || comparison_0_12 && comparison_0_13; > /* hence we have: true || false && true > and hence: true */ > > if (!(!(isfalse_0_10))) goto OP_TEST_do_jmp_5_14; else goto OP_TEST_do_skip_5_15; > /* hence this always takes the 1st path; > the 2nd path is indeed dead code */ > > So it does in fact seem reasonable for the optimizer to optimize away > OP_TEST_do_skip_5_15, and I think that once it does that, it merges > OP_TEST_do_jmp_5_14 and jmp_9_2 into jmp_5_1, and can then do similar > optimizations to the statements that were in jmp_9_2. > > So it seems that things I reported pass "fre1" as doing are reasonable. > > It seems that the optimizer is only able to assume the above values when > strict aliasing is enabled, but it seems to be a reasonable > optimization. (I suspect that for some reason the presence of the > printfs also is stopping this optimization; perhaps JIT doesn't know as > much as the C frontend about the lack of side-effects of printf?) > > Is the code being supplied correct? It's not clear to me what it's > meant to be doing, but that CFG looks curious to me. Maybe the input is > incorrect, but it only turns into a problem when optimized? > > FWIW, if this is a CFG issue, note that blocks are cheap in libgccjit; I > find it easiest to simply create a gcc_jit_block * at the entrypoint of > every opcode even if there's no fancy control flow going on; blocks get > immediately consolidated internally, so any redundancy there is very > brief. Do you have a way to print opcodes to a buffer? I think we'd find it much easier to track this issue down if Ravi used gcc_jit_block_add_comment to annotate each opcode: https://gcc.gnu.org/onlinedocs/jit/topics/functions.html#gcc_jit_block_add_comment so we can see the Ravi IR embedded in the libgccjit IR, as it were.