From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jit-return-994-listarch-jit=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 56961 invoked by alias); 16 Aug 2017 14:02:51 -0000
Mailing-List: contact jit-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Post: <mailto:jit@gcc.gnu.org>
List-Help: <mailto:jit-help@gcc.gnu.org>
List-Subscribe: <mailto:jit-subscribe@gcc.gnu.org>
Sender: jit-owner@gcc.gnu.org
Received: (qmail 36683 invoked by uid 89); 16 Aug 2017 14:02:22 -0000
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 16 Aug 2017 14:02:04 +0000
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id 2DB9564111;	Wed, 16 Aug 2017 14:01:59 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 2DB9564111
Authentication-Results: ext-mx10.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx10.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=dmalcolm@redhat.com
Received: from ovpn-117-90.phx2.redhat.com (ovpn-117-90.phx2.redhat.com [10.3.117.90])	by smtp.corp.redhat.com (Postfix) with ESMTP id A708C5D6A4;	Wed, 16 Aug 2017 14:01:57 +0000 (UTC)
Message-ID: <1502892117.3741.15.camel@redhat.com>
Subject: Re: [committed] jit: add gcc_jit_type_get_vector
From: David Malcolm <dmalcolm@redhat.com>
To: Michael Cree <mcree@orcon.net.nz>
Cc: jit@gcc.gnu.org
Date: Sun, 01 Jan 2017 00:00:00 -0000
In-Reply-To: <20170816095854.dp3qe5dmsuqnblsg@tower>
References: <20170809084227.s23odfpcdyjvrtin@tower>	 <1502326873-58234-1-git-send-email-dmalcolm@redhat.com>	 <20170816095854.dp3qe5dmsuqnblsg@tower>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2017-q3/txt/msg00009.txt.bz2

On Wed, 2017-08-16 at 21:58 +1200, Michael Cree wrote:
> On Wed, Aug 09, 2017 at 09:01:13PM -0400, David Malcolm wrote:
> > On Wed, 2017-08-09 at 20:42 +1200, Michael Cree wrote:
> > > On Mon, Aug 07, 2017 at 10:28:57AM -0400, David Malcolm wrote:
> > > > What would the ideal API look like?
> > > >
> > > > Maybe something like:
> > > >
> > > >   extern gcc_jit_type *
> > > >   gcc_jit_type_get_vector (gcc_jit_type *type, unsigned nunits);
> > > >
> > > > with various requirements (type must be integral/floating point;
> > > > nunits must be a power of two).
> > > 
> > > I suspect that would do the job nicely.
> > 
> > I implemented the above (although I switched the 2nd arg to be
> > "size_t num_units").
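
For reference, the power-of-two requirement on num_units comes down to
the usual bit trick.  This is just an illustrative sketch (the helper
name is made up, and it is not the validation code inside libgccjit):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of libgccjit): returns true if
   num_units is a non-zero power of two, as required for the number
   of elements in a vector type.  */
bool
num_units_is_power_of_two (size_t num_units)
{
  /* A power of two has exactly one bit set, so clearing the lowest
     set bit (n & (n - 1)) must leave zero.  Zero is rejected.  */
  return num_units != 0 && (num_units & (num_units - 1)) == 0;
}
```
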
> 
> Thanks!  I haven't been able to try the vector type yet; the current
> gcc trunk, which I just pulled, failed to build.
> 
> But I have started work using gcc 6.4 and 7.1 libgccjit (without
> the vector type) and have a problem noted below.  But first:
> 
> > It looks like you may not need to explicitly use builtins to
> > access machine-specific SIMD intrinsics; for example, on x86_64
> > I tried multiplying two of these float vectors together with
> > GCC_JIT_BINARY_OP_MULT, which led to this gimple:
> >
> > jit_v4f_mult (const vector(4) <float:32> * a, const vector(4) <float:32> * b, vector(4) <float:32> * c)
> > {
> >   initial:
> >   _1 = *a;
> >   _2 = *b;
> >   _3 = _1 * _2;
> >   *c = _3;
> >   return;
> > }
> > 
> > on this x86_64 box it compiled to:
> > 
> > 	movaps	(%rdi), %xmm0
> > 	mulps	(%rsi), %xmm0
> > 	movaps	%xmm0, (%rdx)
> > 	ret
> > 
> > (i.e. using the "mulps" SIMD instruction).
> 
> Yep, compiling with the optimisation level set to -O3 will enable the
> vectorisation optimisations.  I normally compile with -O2; historically
> I have found -O3 as likely to impair performance as to improve it (so I
> tend not to use -O3), but maybe that has changed in recent decades ;-)
> 
> The vectorisation optimisations are not clever enough to optimise more
> complicated image processing filters well, so accessing the builtins
> will be necessary.
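
For comparison, explicit whole-vector arithmetic like the jit_v4f_mult
example above can be written in plain C with GCC's vector_size
extension (also accepted by Clang), which is roughly the kind of type
gcc_jit_type_get_vector is meant to give you.  A sketch; the v4f and
v4f_mult names are illustrative only:

```c
/* Four floats packed into one vector, using GCC's vector_size
   extension.  The typedef name is illustrative.  */
typedef float v4f __attribute__ ((vector_size (4 * sizeof (float))));

/* C analogue of jit_v4f_mult: one element-wise multiply of two
   whole vectors, with no target-specific builtins.  */
void
v4f_mult (const v4f *a, const v4f *b, v4f *c)
{
  *c = *a * *b;
}
```

Whether that then maps onto a single SIMD instruction (like the mulps
above) depends on the target and the options in use.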
> 
> But I have hit a problem which I suspect is a bug in the gcc optimiser.
>
> In the vein of your example above, but working on uint8_t pixel data
> and adding saturation, the jit compiler segfaults in the optimiser.  I
> provide below the gimple produced by the function that causes the
> problem (I presume that is more useful than the code calling the
> gcc_jit routines),

There's actually a handy entry point for generating minimal reproducers
for such crashes:
  gcc_jit_context_dump_reproducer_to_file

https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#gcc_jit_context_dump_reproducer_to_file

Can you add a call to it in your code (after the context is fully
populated) and see whether the resulting .c file reproduces the crash
when compiled and run?  If so, can you post the .c file here please (or
attach it to Bugzilla), and hopefully I can then reproduce it at my end.

> and a backtrace from the jit compiler.  This example is from Debian gcc
> 6.3.0-18 (but it also happens with gcc 7.1; unfortunately my build of
> gcc from the trunk failed).  Should I file a bug report, and if so,
> against what component?

Yes please.  There's a "jit" component.

> For the example below I have set the optimisation level to -O3 (to get
> vectorisation) and passed -mavx2 as a compiler argument.  (BTW, the
> same segfault also occurs when compiling for Arm and Arm64.  If I set
> the optimisation level to -O2, the example compiles and runs
> correctly.)
> 
> The offending function I implement in the JIT is essentially:
> 
> void ip_jit_im_add_clip_UBYTE (struct ip_image * dest, struct ip_image * src)
> {
>     int rowlen = dest->size.x;
>     int numrows = dest->size.y;
>     for (int j=0; j<numrows; j++) {
>         uint8_t *sptr, *dptr;
>         dptr = (uint8_t *)dest->imrow[j];
>         sptr = (uint8_t *)src->imrow[j];
>         for (int i=0; i<rowlen; i++) {
>             int ival = (int)*dptr + (int)*sptr;
>             if (ival > UINT8_MAX)
>                 ival = UINT8_MAX;
>             *dptr = (uint8_t)ival;
>             sptr++; dptr++;
>         }
>     }
> }
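
A plain-C version of this loop, built with the system gcc at the same
-O3 -mavx2, might help show whether the crash is specific to libgccjit,
and it also gives known-good output to compare the JIT result against
at -O2.  This is a sketch only: the struct layout is reconstructed from
the fields the gimple touches (size.x, size.y and a void ** imrow), and
struct ip_coord and im_add_clip_ubyte_ref are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical minimal layout, inferred only from the fields used in
   the gimple; the real struct ip_image presumably has more.  */
struct ip_coord { int x, y; };
struct ip_image {
    struct ip_coord size;   /* size.x = row length, size.y = rows */
    void **imrow;           /* one pointer per row of pixel data */
};

/* Reference version: dest = min(dest + src, UINT8_MAX) per pixel.  */
void im_add_clip_ubyte_ref(struct ip_image *dest, const struct ip_image *src)
{
    int rowlen = dest->size.x;
    int numrows = dest->size.y;
    for (int j = 0; j < numrows; j++) {
        uint8_t *dptr = (uint8_t *)dest->imrow[j];
        const uint8_t *sptr = (const uint8_t *)src->imrow[j];
        for (int i = 0; i < rowlen; i++) {
            int ival = (int)dptr[i] + (int)sptr[i];
            if (ival > UINT8_MAX)
                ival = UINT8_MAX;   /* saturate instead of wrapping */
            dptr[i] = (uint8_t)ival;
        }
    }
}
```
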
> 
> 
> The gimple produced is:
> 
> ip_jit_im_add_clip_UBYTE (struct ip_image * dest, struct ip_image * src)
> {
>   void * * D.370;
>   sizetype D.371;
>   sizetype D.372;
>   void * * D.373;
>   void * * D.374;
>   void * * D.375;
>   unsigned char D.376;
>   signed int D.377;
>   unsigned char D.378;
>   signed int D.379;
>   unsigned char D.380;
>   sizetype D.381;
>   signed int ival;
>   signed int i;
>   unsigned char * sptr;
>   unsigned char * dptr;
>   signed int j;
>   signed int numrows;
>   signed int rowlen;
> 
>   F1:
>   rowlen = dest->size.x;
>   numrows = dest->size.y;
>   j = 0;
>   goto C1;
>   C1:
>   if (j < numrows) goto L1; else goto A1;
>   L1:
>   D.370 = dest->imrow;
>   D.371 = (sizetype) j;
>   D.372 = D.371 * 8;
>   D.373 = D.370 + D.372;
>   dptr = *D.373;
>   D.374 = src->imrow;
>   D.371 = (sizetype) j;
>   D.372 = D.371 * 8;
>   D.375 = D.374 + D.372;
>   sptr = *D.375;
>   i = 0;
>   goto C2;
>   C2:
>   if (i < rowlen) goto L2; else goto A2;
>   L2:
>   D.376 = *dptr;
>   D.377 = (signed int) D.376;
>   D.378 = *sptr;
>   D.379 = (signed int) D.378;
>   ival = D.377 + D.379;
>   if (ival > 255) goto p_C1_true; else goto p_C1_end;
>   A2:
>   j = j + 1;
>   goto C1;
>   A1:
>   return;
>   p_C1_true:
>   ival = 255;
>   goto p_C1_end;
>   p_C1_end:
>   D.380 = (unsigned char) ival;
>   *dptr = D.380;
>   D.381 = 1;
>   dptr = dptr + D.381;
>   D.381 = 1;
>   sptr = sptr + D.381;
>   i = i + 1;
>   goto C2;
> }
> 
> 
> And the optimiser segfaults while compiling the above with:
> 
> Program received signal SIGSEGV, Segmentation fault.
> optab_for_tree_code (code=code@entry=VEC_UNPACK_LO_EXPR, type=type@entry=0x0, 
>     subtype=subtype@entry=optab_default) at ../../src/gcc/optabs-tree.c:190
> 190	../../src/gcc/optabs-tree.c: No such file or directory.
> (gdb) bt
> #0  optab_for_tree_code (code=code@entry=VEC_UNPACK_LO_EXPR, type=type@entry=0x0, 
>     subtype=subtype@entry=optab_default) at ../../src/gcc/optabs-tree.c:190
> #1  0x00007ffff6148593 in supportable_widening_operation (code=code@entry=NOP_EXPR, 
>     stmt=stmt@entry=0x7ffff3d170f0, vectype_out=vectype_out@entry=0x7ffff3d32f18, 
>     vectype_in=0x7ffff3cf3f18, code1=code1@entry=0x7fffffffd804, 
>     code2=code2@entry=0x7fffffffd808, multi_step_cvt=0x7fffffffd814, 
>     interm_types=0x7fffffffd850) at ../../src/gcc/tree-vect-stmts.c:9037
> #2  0x00007ffff614c2e5 in vectorizable_conversion (stmt=stmt@entry=0x7ffff3d170f0, 
>     gsi=gsi@entry=0x0, vec_stmt=vec_stmt@entry=0x0, slp_node=slp_node@entry=0x0)
>     at ../../src/gcc/tree-vect-stmts.c:3803
> #3  0x00007ffff6159d25 in vect_analyze_stmt (stmt=stmt@entry=0x7ffff3d170f0, 
>     need_to_vectorize=need_to_vectorize@entry=0x7fffffffd978, node=node@entry=0x0)
>     at ../../src/gcc/tree-vect-stmts.c:8135
> #4  0x00007ffff616830b in vect_analyze_loop_operations (loop_vinfo=0x555555e80660, 
>     loop_vinfo=0x555555e80660) at ../../src/gcc/tree-vect-loop.c:1727
> #5  vect_analyze_loop_2 (fatal=<synthetic pointer>: <optimized out>, loop_vinfo=0x555555e80660)
>     at ../../src/gcc/tree-vect-loop.c:2015
> #6  vect_analyze_loop (loop=loop@entry=0x7ffff3d02ee0) at ../../src/gcc/tree-vect-loop.c:2268
> #7  0x00007ffff617a37f in vectorize_loops () at ../../src/gcc/tree-vectorizer.c:532
> #8  0x00007ffff5eec80a in execute_one_pass (pass=pass@entry=0x555555c335c0)
>     at ../../src/gcc/passes.c:2336
> #9  0x00007ffff5eecdd8 in execute_pass_list_1 (pass=0x555555c335c0)
>     at ../../src/gcc/passes.c:2420
> #10 0x00007ffff5eecdea in execute_pass_list_1 (pass=0x555555c32e30)
>     at ../../src/gcc/passes.c:2421
> #11 0x00007ffff5eecdea in execute_pass_list_1 (pass=0x555555c31c90)
>     at ../../src/gcc/passes.c:2421
> #12 0x00007ffff5eece3d in execute_pass_list (fn=<optimized out>, pass=<optimized out>)
>     at ../../src/gcc/passes.c:2431
> #13 0x00007ffff5c7c4b3 in cgraph_node::expand (this=0x7ffff3d132e0)
>     at ../../src/gcc/cgraphunit.c:1990
> #14 0x00007ffff5c7db6f in expand_all_functions () at ../../src/gcc/cgraphunit.c:2126
> #15 symbol_table::compile (this=0x7ffff3cd30a8) at ../../src/gcc/cgraphunit.c:2482
> #16 0x00007ffff5c7f53a in symbol_table::finalize_compilation_unit (this=0x7ffff3cd30a8)
>     at ../../src/gcc/cgraphunit.c:2572
> #17 0x00007ffff5fa227a in compile_file () at ../../src/gcc/toplev.c:488
> #18 0x00007ffff5bdb207 in do_compile () at ../../src/gcc/toplev.c:2011
> #19 toplev::main (this=this@entry=0x7fffffffdd4e, argc=<optimized out>, argv=<optimized out>)
>     at ../../src/gcc/toplev.c:2119
> #20 0x00007ffff5bfd066 in gcc::jit::playback::context::compile (this=this@entry=0x7fffffffdda0)
>     at ../../src/gcc/jit/jit-playback.c:1789
> #21 0x00007ffff5bf3bf9 in gcc::jit::recording::context::compile (this=this@entry=0x555555bb8990)
>     at ../../src/gcc/jit/jit-recording.c:1241
> #22 0x00007ffff5be9649 in gcc_jit_context_compile (ctxt=0x555555bb8990)
>     at ../../src/gcc/jit/libgccjit.c:2677
> #23 0x00005555555703c7 in ip_init_jit () at jit.c:615
> #24 0x0000555555568c6f in im_add (dest=0x5555559b3530, src=0x5555558b0eb0, flag=0)
>     at arith.c:750
> #25 0x000055555556364e in run_libip_operator (flag=0, s=0x5555558b0eb0, d=0x5555559b3530, op=0)
>     at arith-test.c:228
> #26 im_op_ii_check (op=0, type=3, size=..., flag=<optimized out>, source=<optimized out>)
>     at arith-test.c:334
> #27 0x000055555556428f in run_im_ii_tests (operator=0, size=..., chk_flag=114)
>     at arith-test.c:488
> #28 0x000055555555ef34 in main (argc=<optimized out>, argv=<optimized out>) at arith-test.c:601
> 
> Cheers
> Michael.
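
One more idea while reducing this: frame #1 shows the vectorizer
analysing a widening conversion (supportable_widening_operation with
NOP_EXPR), presumably the (signed int) casts of the unsigned char loads
in the gimple.  A saturating add can also be written so that only an
8-bit result is kept, detecting wrap-around instead of clamping a wider
sum; it may be worth checking whether an equivalent JIT formulation
(doing the addition directly on unsigned char) still hits the crash.
Untested against this bug, and sat_add_u8 is a made-up name:

```c
#include <stdint.h>

/* Hypothetical helper: saturating 8-bit add via wrap detection.  */
static inline uint8_t
sat_add_u8 (uint8_t a, uint8_t b)
{
  /* In C, a + b is computed in int; the cast keeps the low 8 bits,
     i.e. the sum wrapped modulo 256.  */
  uint8_t r = (uint8_t) (a + b);
  /* If it wrapped, r is now smaller than a: OR in all ones (0xFF),
     otherwise OR in zero.  */
  r |= (uint8_t) -(r < a);
  return r;
}
```
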