From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jit-return-735-listarch-jit=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30546 invoked by alias); 8 Jul 2015 17:59:10 -0000
Mailing-List: contact jit-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Post: <mailto:jit@gcc.gnu.org>
List-Help: <mailto:jit-help@gcc.gnu.org>
List-Subscribe: <mailto:jit-subscribe@gcc.gnu.org>
Sender: jit-owner@gcc.gnu.org
Received: (qmail 30504 invoked by uid 89); 8 Jul 2015 17:59:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Checked: by ClamAV 0.98.7 on sourceware.org
X-Virus-Found: No
X-HELO: mx1.redhat.com
Message-ID: <1436377875.24803.100.camel@surprise>
Subject: Re: Filed PR jit/66812 for the code generation issue
From: David Malcolm <dmalcolm@redhat.com>
To: Dibyendu Majumdar <mobile@majumdar.org.uk>
Cc: jit@gcc.gnu.org
Date: Thu, 01 Jan 2015 00:00:00 -0000
In-Reply-To: <1436377619.24803.97.camel@surprise>
References: 
	<CACXZuxc3z92zKFWhNBU4a0LDxhyNRiZcbMf83HVYkOn-7CScDQ@mail.gmail.com>
	 <CACXZuxfKRkEzSjzPCvpSvL0EcryEofyL-xjgJLxByNQDNpBjTg@mail.gmail.com>
	 <CACXZuxez_fzRYnmZ-traUo2uOr5fU4kC3VrEbWYC8UWDnvEa_A@mail.gmail.com>
	 <CACXZuxfb+OOW6Mc4GPBfUaE4VSAhOmy=f1KCXxLmXq=Mw7DdyQ@mail.gmail.com>
	 <1436365266.24803.65.camel@surprise> <1436367926.24803.71.camel@surprise>
	 <1436369443.24803.75.camel@surprise> <1436377619.24803.97.camel@surprise>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
X-SW-Source: 2015-q3/txt/msg00035.txt.bz2

On Wed, 2015-07-08 at 13:46 -0400, David Malcolm wrote:
> On Wed, 2015-07-08 at 11:30 -0400, David Malcolm wrote:
> > On Wed, 2015-07-08 at 11:05 -0400, David Malcolm wrote:
> > > On Wed, 2015-07-08 at 10:21 -0400, David Malcolm wrote:
> > > > On Sat, 2015-07-04 at 16:58 +0100, Dibyendu Majumdar wrote:
> > > > > On 4 July 2015 at 14:20, Dibyendu Majumdar <mobile@majumdar.org.uk> wrote:
> > > > > > On 4 July 2015 at 13:11, Dibyendu Majumdar <mobile@majumdar.org.uk> wrote:
> > > > > >> Looks like in the failure case the code is being incorrectly
> > > > > >> optimized. I wonder if this is a manifestation of the get_address bug,
> > > > > >> perhaps the real fix will be better than the patch I am using. I will
> > > > > >> use the latest gcc 5 branch and see if that helps.
> > > > > >>
> > > > > >
> > > > > > Hi Dave,
> > > > > >
> > > > > > I am now using the latest gcc-5-branch from gcc github mirror.
> > > > > > Unfortunately the issue still persists.
> > > > > >
> > > > > > > If I set the optimization level to 0 or 1, then it works OK, but at
> > > > > > > levels 2 or 3 the break occurs.
> > > > > >
> > > > > 
> > > > > > Adding -fno-strict-aliasing appears to resolve the issue with -O2
> > > > > > and -O3, but with this enabled the benchmarks are degraded.
> > > > 
> > > > I've filed the bad code generation issue as:
> > > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66812
> > > > 
> > > > and I'm investigating it.
> > > 
> > > 
> > > Notes on investigation so far.
> > > 
> > > With the fixes here
> > >   https://gcc.gnu.org/ml/jit/2015-q3/msg00025.html
> > > and here:
> > >   https://gcc.gnu.org/ml/jit/2015-q3/msg00028.html
> > > the reproducer from
> > >  https://gcc.gnu.org/ml/jit/2015-q3/msg00012.html
> > > now correctly fails with:
> > >  error: gcc_jit_context_new_field: unknown size for field
> > > "errorJmp" (type: struct ravi_lua_longjmp)
> > > 
> > > Upon applying the equivalent of the fix from:
> > > https://github.com/dibyendumajumdar/ravi/commit/d65d2e68fbdcf211ed36deea05727f996ede8296
> > > to the generator you provided, gcc_jit_context_compile completes.
> > > 
> > > I don't know if it's generating bad code (given that I don't have Ravi
> > > itself running, just the reproducer).  However, I tried comparing the
> > > bug_rdump.txt and bug_rdump_ok.txt reproducers, and tried adding
> > > -fno-strict-aliasing to the former.
> > > 
> > > I turned on GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING and started comparing
> > > the dumpfiles.
> > > 
> > > The diffs stay easily comparable until fake.c.018t.ssa, when numbering
> > > differences between temporaries mean that a simple "diff" becomes
> > > unreadable.
> > > 
> > > So I wrote a tool to try to alleviate this:
> > >   https://github.com/davidmalcolm/gcc-dump-diff
> > > The tool tries to renumber temporaries in dumpfiles so that they are
> > > more consistent between runs, and then tries to do a textual diff of two
> > > dumpfiles.
> > > 
> > > (note that it might renumber labels also)
> > 
> > Yes, labels *were* being renumbered.  I've fixed that now.
> > 
> > > With this, the dumpfiles become comparable again.  I see that when
> > > rerunning the bug_rdump.txt reproducer, the first big difference between
> > > the with and without -fno-strict-aliasing case happens at 035t.fre1.
> > > 
> > > Note in the following how, without -fno-strict-aliasing, various
> > > statements and blocks are eliminated at 035t.fre1, which treats them as
> > > dead code.  In particular, it seems to have optimized away
> > > OP_TEST_do_jmp_5_0 and OP_TEST_do_skip_5_0 (if I'm reading things
> > > right).
> > 
> > It's OP_TEST_do_jmp_5_14, jmp_9_2 and OP_TEST_do_skip_5_15 that are
> > being optimized away by pass fre1 when -fno-strict-aliasing isn't
> > supplied.
> > 
> > I'm attaching a fixed diff, showing the correct label names.
> > 
> > 
> > > My hunch at this time is that this optimization is being too
> > > aggressive... for some reason.  I'll continue poking at this.
> 
> Dibyendu: what Lua code generated the reproducer?  What is the code
> meant to be doing?
> 
> I used gcc_jit_function_dump_to_dot to dump the CFG in GraphViz format;
> you can see the result here:
>  https://dmalcolm.fedorapeople.org/gcc/2015-07-08/rdump.png
> and with the printfs here:
>  https://dmalcolm.fedorapeople.org/gcc/2015-07-08/rdump_ok.png
> 
> I see that both paths out of the "entry" block go through empty blocks
> and then into "jmp_5_1".
> 
> A similar thing happens later with "jmp_9_2": both paths from the
> conditional lead through empty blocks to "jmp_12_3".
> 
> Those pairs of empty blocks look odd.  Is the code correct?
> 
> Looking at the body of "jmp_5_1", and annotating, I see:
> 
> jmp_5_1:
>   (&L->ci->u.l.base[(int)1])->value_.b = (int)0;
>   (&L->ci->u.l.base[(int)1])->tt_ = (int)1;
> 
>   comparison_0_11 = (&L->ci->u.l.base[(int)1])->tt_ == (int)0;
>      /* this must be false because of the 2nd assignment above */
> 
>   comparison_0_12 = (&L->ci->u.l.base[(int)1])->tt_ == (int)1;
>      /* similarly this must be true */
> 
>   comparison_0_13 = (&L->ci->u.l.base[(int)1])->value_.b == (int)0;
>      /* this must be true because of the 1st assignment above */
> 
>   isfalse_0_10 = comparison_0_11 || comparison_0_12 && comparison_0_13;
>      /* hence we have:   false || true && true
>         and hence:       true  */
> 
>   if (!(!(isfalse_0_10))) goto OP_TEST_do_jmp_5_14; else goto OP_TEST_do_skip_5_15;
>       /* hence this always takes the 1st path;
>          the 2nd path is indeed dead code */
> 
> So it does in fact seem reasonable for the optimizer to optimize away
> OP_TEST_do_skip_5_15, and I think that once it does that, it merges
> OP_TEST_do_jmp_5_14 and jmp_9_2 into jmp_5_1, and can then do similar
> optimizations to the statements that were in jmp_9_2.
> 
> So it seems that what I reported pass "fre1" as doing is reasonable.
> 
> It seems that the optimizer is only able to assume the above values when
> strict aliasing is enabled, but it seems to be a reasonable
> optimization.  (I suspect that for some reason the presence of the
> printfs also is stopping this optimization; perhaps JIT doesn't know as
> much as the C frontend about the lack of side-effects of printf?)
> 
> Is the code being supplied correct?  It's not clear to me what it's
> meant to be doing, but that CFG looks curious to me.  Maybe the input is
> incorrect, but it only turns into a problem when optimized?
> 
> FWIW, if this is a CFG issue, note that blocks are cheap in libgccjit; I
> find it easiest to simply create a gcc_jit_block * at the entrypoint of
> every opcode even if there's no fancy control flow going on; blocks get
> immediately consolidated internally, so any redundancy there is very
> brief.
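
To double-check the folding in "jmp_5_1", here is a minimal sketch in
plain C that replays the two stores and three comparisons.  The struct
and function names are hypothetical stand-ins, not Ravi's or Lua's real
types; the only thing taken from the dump is the sequence of statements.
After the stores, the tt_ == 0 test is false and the tt_ == 1 test is
true, and since && binds tighter than ||, the combined value is true
whatever the fields held on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields written in jmp_5_1;
   the real Lua TValue layout is more involved.  */
struct fake_tvalue
{
  int b;   /* stands in for value_.b */
  int tt_; /* stands in for the type tag */
};

/* Replays the statements of jmp_5_1 and returns isfalse_0_10.  */
static bool
jmp_5_1_isfalse (struct fake_tvalue *v)
{
  assert (v != NULL);

  v->b = 0;   /* 1st assignment in the dump */
  v->tt_ = 1; /* 2nd assignment in the dump */

  bool comparison_0_11 = (v->tt_ == 0); /* false: tt_ was just set to 1 */
  bool comparison_0_12 = (v->tt_ == 1); /* true: tt_ was just set to 1 */
  bool comparison_0_13 = (v->b == 0);   /* true: b was just set to 0 */

  /* Explicit parentheses spell out the && before || precedence.  */
  return comparison_0_11 || (comparison_0_12 && comparison_0_13);
}
```

Calling jmp_5_1_isfalse on any struct fake_tvalue returns true, which is
why the branch to OP_TEST_do_skip_5_15 can never be taken once the
optimizer is allowed to assume the stored values are the ones read back.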

Do you have a way to print opcodes to a buffer?  I think we'd find it
much easier to track this issue down if Ravi used
gcc_jit_block_add_comment to annotate each opcode:
https://gcc.gnu.org/onlinedocs/jit/topics/functions.html#gcc_jit_block_add_comment

so we can see the Ravi IR embedded in the libgccjit IR, as it were.
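
Something along these lines is what I have in mind; treat it as a sketch
under assumptions rather than a drop-in patch.  "struct fake_opcode",
"format_opcode", "annotate_opcode" and the HAVE_LIBGCCJIT macro are all
hypothetical names made up for illustration (I don't know how Ravi
represents decoded opcodes); the only real API used is
gcc_jit_block_add_comment, which takes the block, an optional location
(NULL is accepted) and the comment text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical decoded opcode; Ravi's real representation will differ.  */
struct fake_opcode
{
  int pc;           /* index of the instruction in the function */
  const char *name; /* e.g. "OP_TEST" */
  int a, b, c;      /* operands */
};

/* Format OP into BUF (at most SIZE bytes, always NUL-terminated when
   SIZE > 0).  Returns the length snprintf wanted to write.  */
static int
format_opcode (char *buf, size_t size, const struct fake_opcode *op)
{
  assert (buf != NULL && op != NULL);
  return snprintf (buf, size, "[%d] %s A=%d B=%d C=%d",
                   op->pc, op->name, op->a, op->b, op->c);
}

#ifdef HAVE_LIBGCCJIT /* hypothetical macro; define it when linking -lgccjit */
#include <libgccjit.h>

/* Emit the formatted opcode as a comment at the current end of BLOCK,
   so it shows up in the dumps next to the statements generated for it.  */
static void
annotate_opcode (gcc_jit_block *block, const struct fake_opcode *op)
{
  char buf[128];
  format_opcode (buf, sizeof buf, op);
  gcc_jit_block_add_comment (block, NULL, buf);
}
#endif
```

Calling annotate_opcode once at the start of the code emitted for each
opcode should be enough for the comments to appear in the dumped IR.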