Date: Sun, 01 Jan 2017 00:00:00 -0000
From: Michael Cree
To: David Malcolm
Cc: jit@gcc.gnu.org
Subject: Re: [committed] jit: add gcc_jit_type_get_vector

On Wed, Aug 09, 2017 at 09:01:13PM -0400, David Malcolm wrote:
> On Wed, 2017-08-09 at 20:42 +1200,
> Michael Cree wrote:
> > On Mon, Aug 07, 2017 at 10:28:57AM -0400, David Malcolm wrote:
> > > What would the ideal API
> > > look like?
> > >
> > > Maybe something like:
> > >
> > >   extern gcc_jit_type *
> > >   gcc_jit_type_get_vector (gcc_jit_type *type, unsigned nunits);
> > >
> > > with various requirements (type must be integral/floating point;
> > > nunits must be a power of two).
> >
> > I suspect that would do the job nicely.
>
> I implemented the above (although I switched the 2nd arg to be
> "size_t num_units").

Thanks! I haven't been able to try the vector type yet; the current gcc
trunk, which I just pulled, failed to build. I have, however, started work
using the gcc 6.4 and 7.1 libgccjit (without the vector type) and have run
into a problem, described below. But first:

> It looks like you may not need to explicitly use builtins to
> access machine specific simd intrinsics; for example, on x86_64
> when I tried multiplying two of these together for float, with
> GCC_JIT_BINARY_OP_MULT, which led to this gimple:
>
> jit_v4f_mult (const vector(4) * a, const vector(4) * b, vector(4) * c)
> {
>   initial:
>   _1 = *a;
>   _2 = *b;
>   _3 = _1 * _2;
>   *c = _3;
>   return;
> }
>
> on this x86_64 box it compiled to:
>
>         movaps  (%rdi), %xmm0
>         mulps   (%rsi), %xmm0
>         movaps  %xmm0, (%rdx)
>         ret
>
> (i.e. using the "mulps" SIMD instruction).

Yep, compiling with the optimisation level set to -O3 enables the
vectorisation optimisations. I normally compile with -O2; historically I
have found -O3 as likely to impair performance as to improve it (so I tend
not to use -O3), but maybe that has changed in recent decades ;-)

The vectorisation optimisations are not clever enough to optimise more
complicated image-processing filters well, so access to the builtins will
still be necessary.

But I have hit a problem which I suspect is a bug in the gcc optimiser.
In the vein of your example above, but working on uint8_t pixel data and
adding saturation, I have a function that makes the jit compiler segfault
in the optimiser.
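Saturating arithmetic on uint8_t pixels is one place where target builtins (such as x86's `paddusb`) are often reached for. As an illustrative aside, and not something libgccjit itself provides, the sketch below expresses a lane-wise unsigned saturating add with GCC's generic vector extension in plain C. It relies on the fact that when `a + b` wraps modulo 256, the wrapped sum is smaller than `a`. The names `v16u8`, `sat_add_u8x16` and `add_clip_row` are hypothetical, and whether a given GCC version and target turns this into a single saturating instruction is not something this sketch establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: 16 x uint8_t in one 128-bit vector
   (GCC/Clang generic vector extension). */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Lane-wise saturating add. If a + b wraps past 255 in a lane, the wrapped
   sum is smaller than a, so the comparison yields all-ones (0xff) in that
   lane; OR-ing it in forces the lane to 255. */
static inline v16u8 sat_add_u8x16(v16u8 a, v16u8 b)
{
    v16u8 sum = a + b;                 /* wraps modulo 256 per lane */
    v16u8 overflow = (v16u8)(sum < a); /* 0xff where wrapped, 0 otherwise */
    return sum | overflow;
}

/* Saturating add of src into dest over n bytes: 16 bytes per step via the
   vector helper, then a scalar loop for any remaining tail bytes. */
static void add_clip_row(uint8_t *dest, const uint8_t *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        v16u8 d, s;
        memcpy(&d, dest + i, sizeof d); /* memcpy avoids alignment assumptions */
        memcpy(&s, src + i, sizeof s);
        d = sat_add_u8x16(d, s);
        memcpy(dest + i, &d, sizeof d);
    }
    for (; i < n; i++) {
        int v = dest[i] + src[i];      /* promoted to int, cannot wrap */
        dest[i] = (uint8_t)(v > UINT8_MAX ? UINT8_MAX : v);
    }
}
```

The scalar tail uses the same clipping logic as the JIT function described below, so rows whose length is not a multiple of 16 are still handled.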
I provide below the gimple produced for the function that causes the
problem (I presume that is more useful than the code calling the gcc_jit
routines), and a backtrace from the jit compiler. This example is from
Debian gcc 6.3.0-18, but it also happens with gcc 7.1 (unfortunately my
build of gcc from trunk failed). Should I file a bug report, and if so,
against which component?

For the example below I set the optimisation level to -O3 (to get
vectorisation) and passed -mavx2 as a compiler option. (BTW, the same
segfault also occurs when compiling for Arm and Arm64. With the
optimisation level set to -O2, the example compiles and runs correctly.)

The offending function, as I implement it in the JIT, is essentially:

void
ip_jit_im_add_clip_UBYTE(struct ip_image *dest, struct ip_image *src)
{
    int rowlen = dest->size.x;
    int numrows = dest->size.y;

    for (int j=0; j<numrows; j++) {
        uint8_t *dptr, *sptr;

        dptr = (uint8_t *)dest->imrow[j];
        sptr = (uint8_t *)src->imrow[j];
        for (int i=0; i<rowlen; i++) {
            int ival = *dptr + *sptr;
            if (ival > UINT8_MAX)    /* clip to the uint8_t range */
                ival = UINT8_MAX;
            *dptr = (uint8_t)ival;
            sptr++;
            dptr++;
        }
    }
}

The gimple produced is:

ip_jit_im_add_clip_UBYTE (struct ip_image * dest, struct ip_image * src)
{
  void * * D.370;
  sizetype D.371;
  sizetype D.372;
  void * * D.373;
  void * * D.374;
  void * * D.375;
  unsigned char D.376;
  signed int D.377;
  unsigned char D.378;
  signed int D.379;
  unsigned char D.380;
  sizetype D.381;
  signed int ival;
  signed int i;
  unsigned char * sptr;
  unsigned char * dptr;
  signed int j;
  signed int numrows;
  signed int rowlen;

  F1:
  rowlen = dest->size.x;
  numrows = dest->size.y;
  j = 0;
  goto C1;
  C1:
  if (j < numrows) goto L1; else goto A1;
  L1:
  D.370 = dest->imrow;
  D.371 = (sizetype) j;
  D.372 = D.371 * 8;
  D.373 = D.370 + D.372;
  dptr = *D.373;
  D.374 = src->imrow;
  D.371 = (sizetype) j;
  D.372 = D.371 * 8;
  D.375 = D.374 + D.372;
  sptr = *D.375;
  i = 0;
  goto C2;
  C2:
  if (i < rowlen) goto L2; else goto A2;
  L2:
  D.376 = *dptr;
  D.377 = (signed int) D.376;
  D.378 = *sptr;
  D.379 = (signed int) D.378;
  ival = D.377 + D.379;
  if (ival > 255) goto p_C1_true; else goto p_C1_end;
  A2:
  j = j + 1;
  goto C1;
  A1:
  return;
  p_C1_true:
  ival = 255;
  goto p_C1_end;
  p_C1_end:
  D.380 = (unsigned char) ival;
  *dptr = D.380;
  D.381 = 1;
  dptr = dptr + D.381;
  D.381 = 1;
  sptr = sptr + D.381;
  i = i + 1;
  goto C2;
}

And the optimiser segfaults while compiling the above with:

Program received signal SIGSEGV, Segmentation fault.
optab_for_tree_code (code=code@entry=VEC_UNPACK_LO_EXPR, type=type@entry=0x0,
    subtype=subtype@entry=optab_default) at ../../src/gcc/optabs-tree.c:190
190     ../../src/gcc/optabs-tree.c: No such file or directory.
(gdb) bt
#0  optab_for_tree_code (code=code@entry=VEC_UNPACK_LO_EXPR, type=type@entry=0x0,
    subtype=subtype@entry=optab_default) at ../../src/gcc/optabs-tree.c:190
#1  0x00007ffff6148593 in supportable_widening_operation (code=code@entry=NOP_EXPR,
    stmt=stmt@entry=0x7ffff3d170f0, vectype_out=vectype_out@entry=0x7ffff3d32f18,
    vectype_in=0x7ffff3cf3f18, code1=code1@entry=0x7fffffffd804,
    code2=code2@entry=0x7fffffffd808, multi_step_cvt=0x7fffffffd814,
    interm_types=0x7fffffffd850) at ../../src/gcc/tree-vect-stmts.c:9037
#2  0x00007ffff614c2e5 in vectorizable_conversion (stmt=stmt@entry=0x7ffff3d170f0,
    gsi=gsi@entry=0x0, vec_stmt=vec_stmt@entry=0x0, slp_node=slp_node@entry=0x0)
    at ../../src/gcc/tree-vect-stmts.c:3803
#3  0x00007ffff6159d25 in vect_analyze_stmt (stmt=stmt@entry=0x7ffff3d170f0,
    need_to_vectorize=need_to_vectorize@entry=0x7fffffffd978, node=node@entry=0x0)
    at ../../src/gcc/tree-vect-stmts.c:8135
#4  0x00007ffff616830b in vect_analyze_loop_operations (loop_vinfo=0x555555e80660,
    loop_vinfo=0x555555e80660) at ../../src/gcc/tree-vect-loop.c:1727
#5  vect_analyze_loop_2 (fatal=: , loop_vinfo=0x555555e80660)
    at ../../src/gcc/tree-vect-loop.c:2015
#6  vect_analyze_loop (loop=loop@entry=0x7ffff3d02ee0)
    at ../../src/gcc/tree-vect-loop.c:2268
#7  0x00007ffff617a37f in vectorize_loops () at ../../src/gcc/tree-vectorizer.c:532
#8  0x00007ffff5eec80a in execute_one_pass (pass=pass@entry=0x555555c335c0)
    at ../../src/gcc/passes.c:2336
#9  0x00007ffff5eecdd8 in execute_pass_list_1
    (pass=0x555555c335c0) at ../../src/gcc/passes.c:2420
#10 0x00007ffff5eecdea in execute_pass_list_1 (pass=0x555555c32e30)
    at ../../src/gcc/passes.c:2421
#11 0x00007ffff5eecdea in execute_pass_list_1 (pass=0x555555c31c90)
    at ../../src/gcc/passes.c:2421
#12 0x00007ffff5eece3d in execute_pass_list (fn=, pass=)
    at ../../src/gcc/passes.c:2431
#13 0x00007ffff5c7c4b3 in cgraph_node::expand (this=0x7ffff3d132e0)
    at ../../src/gcc/cgraphunit.c:1990
#14 0x00007ffff5c7db6f in expand_all_functions () at ../../src/gcc/cgraphunit.c:2126
#15 symbol_table::compile (this=0x7ffff3cd30a8) at ../../src/gcc/cgraphunit.c:2482
#16 0x00007ffff5c7f53a in symbol_table::finalize_compilation_unit (this=0x7ffff3cd30a8)
    at ../../src/gcc/cgraphunit.c:2572
#17 0x00007ffff5fa227a in compile_file () at ../../src/gcc/toplev.c:488
#18 0x00007ffff5bdb207 in do_compile () at ../../src/gcc/toplev.c:2011
#19 toplev::main (this=this@entry=0x7fffffffdd4e, argc=, argv=)
    at ../../src/gcc/toplev.c:2119
#20 0x00007ffff5bfd066 in gcc::jit::playback::context::compile (this=this@entry=0x7fffffffdda0)
    at ../../src/gcc/jit/jit-playback.c:1789
#21 0x00007ffff5bf3bf9 in gcc::jit::recording::context::compile (this=this@entry=0x555555bb8990)
    at ../../src/gcc/jit/jit-recording.c:1241
#22 0x00007ffff5be9649 in gcc_jit_context_compile (ctxt=0x555555bb8990)
    at ../../src/gcc/jit/libgccjit.c:2677
#23 0x00005555555703c7 in ip_init_jit () at jit.c:615
#24 0x0000555555568c6f in im_add (dest=0x5555559b3530, src=0x5555558b0eb0, flag=0)
    at arith.c:750
#25 0x000055555556364e in run_libip_operator (flag=0, s=0x5555558b0eb0, d=0x5555559b3530, op=0)
    at arith-test.c:228
#26 im_op_ii_check (op=0, type=3, size=..., flag=, source=)
    at arith-test.c:334
#27 0x000055555556428f in run_im_ii_tests (operator=0, size=..., chk_flag=114)
    at arith-test.c:488
#28 0x000055555555ef34 in main (argc=, argv=) at arith-test.c:601

Cheers
Michael.
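To check whether the crash is specific to libgccjit or also shows up in the ordinary C front end, a standalone C translation of the same loop could be compiled with the same options (for example `gcc -O3 -mavx2 -c`). This is a sketch rather than libip code: the `struct ip_image` layout and the helper `struct ip_coord` are assumptions inferred from the gimple above, where `size.x` and `size.y` are `int` and `imrow` is a `void **` array of row pointers, and the name `ip_c_im_add_clip_UBYTE` is hypothetical. The function is left non-static so the compiler keeps and optimises it even without a caller in the same file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout inferred from the gimple dump; the real libip
   struct ip_image may contain more fields. */
struct ip_coord {
    int x, y;
};

struct ip_image {
    struct ip_coord size;  /* size.x = row length, size.y = number of rows */
    void **imrow;          /* one pointer per image row */
};

/* Plain-C version of the JIT-built function: dest = min(dest + src, 255),
   applied row by row. Non-static so it is compiled even with no caller. */
void ip_c_im_add_clip_UBYTE(struct ip_image *dest, struct ip_image *src)
{
    assert(dest->size.x == src->size.x && dest->size.y == src->size.y);
    int rowlen = dest->size.x;
    int numrows = dest->size.y;

    for (int j = 0; j < numrows; j++) {
        uint8_t *dptr = (uint8_t *)dest->imrow[j];
        const uint8_t *sptr = (const uint8_t *)src->imrow[j];
        for (int i = 0; i < rowlen; i++) {
            int ival = *dptr + *sptr;  /* promoted to int, cannot wrap */
            if (ival > UINT8_MAX)
                ival = UINT8_MAX;
            *dptr = (uint8_t)ival;
            sptr++;
            dptr++;
        }
    }
}
```

If the static compiler handles this translation at -O3 -mavx2 without crashing, that would point towards something in the trees libgccjit builds rather than the vectoriser alone, though the result by itself would not settle it.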