From: Michael Cree <mcree@orcon.net.nz>
To: David Malcolm <dmalcolm@redhat.com>
Cc: jit@gcc.gnu.org
Subject: Re: [committed] jit: add gcc_jit_type_get_vector
Date: Sun, 01 Jan 2017 00:00:00 -0000
Message-ID: <20170816095854.dp3qe5dmsuqnblsg@tower>
In-Reply-To: <1502326873-58234-1-git-send-email-dmalcolm@redhat.com>
On Wed, Aug 09, 2017 at 09:01:13PM -0400, David Malcolm wrote:
> On Wed, 2017-08-09 at 20:42 +1200, Michael Cree wrote:
> > On Mon, Aug 07, 2017 at 10:28:57AM -0400, David Malcolm wrote:
> > > What would the ideal API
> > > look like?
> > >
> > > Maybe something like:
> > >
> > > extern gcc_jit_type *
> > > gcc_jit_type_get_vector (gcc_jit_type *type, unsigned nunits);
> > >
> > > with various requirements (type must be integral/floating point;
> > > nunits
> > > must be a power of two).
> >
> > I suspect that would do the job nicely.
>
> I implemented the above (although I switched the 2nd arg to be
> "size_t num_units").
Thanks! I haven't been able to try the vector type yet; the current
gcc trunk, which I just pulled, failed to build.
But I have started work using the gcc 6.4 and 7.1 libgccjit (without
the vector type) and have hit a problem, noted below. But first:
> It looks like you may not need to explicitly use builtins to
> access machine specific simd intrinsics; for example, on x86_64
> when I tried multiplying two of these together for float, with
> GCC_JIT_BINARY_OP_MULT, which led to this gimple:
>
> jit_v4f_mult (const vector(4) <float:32> * a, const vector(4) <float:32> * b, vector(4) <float:32> * c)
> {
> initial:
>   _1 = *a;
>   _2 = *b;
>   _3 = _1 * _2;
>   *c = _3;
>   return;
> }
>
> on this x86_64 box it compiled to:
>
> movaps (%rdi), %xmm0
> mulps (%rsi), %xmm0
> movaps %xmm0, (%rdx)
> ret
>
> (i.e. using the "mulps" SIMD instruction).
Yep, compiling with optimisation set to -O3 will enable the
vectorisation optimisations. I normally compile with -O2; historically
I have found -O3 as likely to impair performance as to improve
performance (so I tend not to use -O3) but maybe that has changed in
recent decades ;-)
The vectorisation optimisations are not clever enough to optimise
more complicated image processing filters well, so accessing the
builtins will be necessary.
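For comparison, the float multiply in the quoted gimple can be written
in plain C with the GCC generic vector extension. This is only a sketch
outside libgccjit: the type name v4f and the function name v4f_mult are
chosen here for illustration, and the instructions the compiler emits
will depend on the target and flags.

```c
/* Illustration only: a four-float vector type built with the GCC/Clang
   vector_size attribute.  The name v4f is chosen for this sketch.  */
typedef float v4f __attribute__ ((vector_size (16)));

/* Element-wise multiply, mirroring the shape of the quoted
   jit_v4f_mult gimple: load *a and *b, multiply, store to *c.  */
void
v4f_mult (const v4f *a, const v4f *b, v4f *c)
{
  *c = *a * *b;
}
```

Because the operands already have a vector type, the multiply here does
not rely on the loop vectoriser at all.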
But I have hit a problem which I suspect is a bug in the gcc optimiser.
In the vein of your example above, but working on uint8_t pixel data
and adding saturation, the jit compiler segfaults in the optimiser. I
provide below the gimple produced by the function that causes the
problem (I presume that is more useful than the code calling the
gcc_jit routines), and a backtrace from the jit compiler. This example
is from Debian gcc 6.3.0-18 (but it also happens with gcc 7.1;
unfortunately my build of gcc from the trunk failed). Should I file a
bug report, and if so, against what component?
For the below I have set optimisation level to -O3 (to get
vectorisation) and specified -mavx2 as a compiler arg. (BTW, the same
segfault also occurs when compiling for Arm and Arm64. Also if I set
optimisation level to -O2 the example compiles and runs correctly.)
The offending function I implement in the JIT is essentially:
ip_jit_im_add_clip_UBYTE (struct ip_image * dest, struct ip_image * src)
{
    int rowlen = dest->size.x;
    int numrows = dest->size.y;
    for (int j=0; j<numrows; j++) {
        uint8_t *sptr, *dptr;
        dptr = (uint8_t *)dest->imrow[j];
        sptr = (uint8_t *)src->imrow[j];
        for (int i=0; i<rowlen; i++) {
            int ival = (int)*dptr + (int)*sptr;
            if (ival > UINT8_MAX)
                ival = UINT8_MAX;
            *dptr = (uint8_t)ival;
            sptr++; dptr++;
        }
    }
}
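A standalone version of the same loop, for anyone who wants to compile
it with the ordinary C front end, might look like the sketch below. The
struct definitions are hypothetical minimal stand-ins, not the real
struct ip_image: only the members the loop touches are declared, with
types inferred from the gimple (int sizes, imrow as void **). The names
ip_size and im_add_clip_ubyte are likewise made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real image types: only the fields
   used by the loop are present.  */
struct ip_size { int x; int y; };
struct ip_image
{
  struct ip_size size;
  void **imrow;   /* one pointer per image row */
};

/* Add src into dest pixel by pixel, clipping each sum at UINT8_MAX.  */
void
im_add_clip_ubyte (struct ip_image *dest, struct ip_image *src)
{
  int rowlen = dest->size.x;
  int numrows = dest->size.y;

  for (int j = 0; j < numrows; j++)
    {
      uint8_t *dptr = (uint8_t *) dest->imrow[j];
      uint8_t *sptr = (uint8_t *) src->imrow[j];

      for (int i = 0; i < rowlen; i++)
        {
          /* Widen to int so the sum cannot wrap before the clip.  */
          int ival = (int) *dptr + (int) *sptr;
          if (ival > UINT8_MAX)
            ival = UINT8_MAX;
          *dptr = (uint8_t) ival;
          sptr++;
          dptr++;
        }
    }
}
```

Building this with something like "gcc -O3 -mavx2 -c" gives a point of
comparison for what the C front end does with the same loop; whether it
exercises the same vectoriser path as the jit is not something this
sketch establishes.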
The gimple produced is:
ip_jit_im_add_clip_UBYTE (struct ip_image * dest, struct ip_image * src)
{
  void * * D.370;
  sizetype D.371;
  sizetype D.372;
  void * * D.373;
  void * * D.374;
  void * * D.375;
  unsigned char D.376;
  signed int D.377;
  unsigned char D.378;
  signed int D.379;
  unsigned char D.380;
  sizetype D.381;
  signed int ival;
  signed int i;
  unsigned char * sptr;
  unsigned char * dptr;
  signed int j;
  signed int numrows;
  signed int rowlen;

F1:
  rowlen = dest->size.x;
  numrows = dest->size.y;
  j = 0;
  goto C1;
C1:
  if (j < numrows) goto L1; else goto A1;
L1:
  D.370 = dest->imrow;
  D.371 = (sizetype) j;
  D.372 = D.371 * 8;
  D.373 = D.370 + D.372;
  dptr = *D.373;
  D.374 = src->imrow;
  D.371 = (sizetype) j;
  D.372 = D.371 * 8;
  D.375 = D.374 + D.372;
  sptr = *D.375;
  i = 0;
  goto C2;
C2:
  if (i < rowlen) goto L2; else goto A2;
L2:
  D.376 = *dptr;
  D.377 = (signed int) D.376;
  D.378 = *sptr;
  D.379 = (signed int) D.378;
  ival = D.377 + D.379;
  if (ival > 255) goto p_C1_true; else goto p_C1_end;
A2:
  j = j + 1;
  goto C1;
A1:
  return;
p_C1_true:
  ival = 255;
  goto p_C1_end;
p_C1_end:
  D.380 = (unsigned char) ival;
  *dptr = D.380;
  D.381 = 1;
  dptr = dptr + D.381;
  D.381 = 1;
  sptr = sptr + D.381;
  i = i + 1;
  goto C2;
}
And the optimiser segfaults while compiling the above with:
Program received signal SIGSEGV, Segmentation fault.
optab_for_tree_code (code=code@entry=VEC_UNPACK_LO_EXPR, type=type@entry=0x0,
subtype=subtype@entry=optab_default) at ../../src/gcc/optabs-tree.c:190
190 ../../src/gcc/optabs-tree.c: No such file or directory.
(gdb) bt
#0 optab_for_tree_code (code=code@entry=VEC_UNPACK_LO_EXPR, type=type@entry=0x0,
subtype=subtype@entry=optab_default) at ../../src/gcc/optabs-tree.c:190
#1 0x00007ffff6148593 in supportable_widening_operation (code=code@entry=NOP_EXPR,
stmt=stmt@entry=0x7ffff3d170f0, vectype_out=vectype_out@entry=0x7ffff3d32f18,
vectype_in=0x7ffff3cf3f18, code1=code1@entry=0x7fffffffd804,
code2=code2@entry=0x7fffffffd808, multi_step_cvt=0x7fffffffd814,
interm_types=0x7fffffffd850) at ../../src/gcc/tree-vect-stmts.c:9037
#2 0x00007ffff614c2e5 in vectorizable_conversion (stmt=stmt@entry=0x7ffff3d170f0,
gsi=gsi@entry=0x0, vec_stmt=vec_stmt@entry=0x0, slp_node=slp_node@entry=0x0)
at ../../src/gcc/tree-vect-stmts.c:3803
#3 0x00007ffff6159d25 in vect_analyze_stmt (stmt=stmt@entry=0x7ffff3d170f0,
need_to_vectorize=need_to_vectorize@entry=0x7fffffffd978, node=node@entry=0x0)
at ../../src/gcc/tree-vect-stmts.c:8135
#4 0x00007ffff616830b in vect_analyze_loop_operations (loop_vinfo=0x555555e80660,
loop_vinfo=0x555555e80660) at ../../src/gcc/tree-vect-loop.c:1727
#5 vect_analyze_loop_2 (fatal=<synthetic pointer>: <optimized out>, loop_vinfo=0x555555e80660)
at ../../src/gcc/tree-vect-loop.c:2015
#6 vect_analyze_loop (loop=loop@entry=0x7ffff3d02ee0) at ../../src/gcc/tree-vect-loop.c:2268
#7 0x00007ffff617a37f in vectorize_loops () at ../../src/gcc/tree-vectorizer.c:532
#8 0x00007ffff5eec80a in execute_one_pass (pass=pass@entry=0x555555c335c0)
at ../../src/gcc/passes.c:2336
#9 0x00007ffff5eecdd8 in execute_pass_list_1 (pass=0x555555c335c0)
at ../../src/gcc/passes.c:2420
#10 0x00007ffff5eecdea in execute_pass_list_1 (pass=0x555555c32e30)
at ../../src/gcc/passes.c:2421
#11 0x00007ffff5eecdea in execute_pass_list_1 (pass=0x555555c31c90)
at ../../src/gcc/passes.c:2421
#12 0x00007ffff5eece3d in execute_pass_list (fn=<optimized out>, pass=<optimized out>)
at ../../src/gcc/passes.c:2431
#13 0x00007ffff5c7c4b3 in cgraph_node::expand (this=0x7ffff3d132e0)
at ../../src/gcc/cgraphunit.c:1990
#14 0x00007ffff5c7db6f in expand_all_functions () at ../../src/gcc/cgraphunit.c:2126
#15 symbol_table::compile (this=0x7ffff3cd30a8) at ../../src/gcc/cgraphunit.c:2482
#16 0x00007ffff5c7f53a in symbol_table::finalize_compilation_unit (this=0x7ffff3cd30a8)
at ../../src/gcc/cgraphunit.c:2572
#17 0x00007ffff5fa227a in compile_file () at ../../src/gcc/toplev.c:488
#18 0x00007ffff5bdb207 in do_compile () at ../../src/gcc/toplev.c:2011
#19 toplev::main (this=this@entry=0x7fffffffdd4e, argc=<optimized out>, argv=<optimized out>)
at ../../src/gcc/toplev.c:2119
#20 0x00007ffff5bfd066 in gcc::jit::playback::context::compile (this=this@entry=0x7fffffffdda0)
at ../../src/gcc/jit/jit-playback.c:1789
#21 0x00007ffff5bf3bf9 in gcc::jit::recording::context::compile (this=this@entry=0x555555bb8990)
at ../../src/gcc/jit/jit-recording.c:1241
#22 0x00007ffff5be9649 in gcc_jit_context_compile (ctxt=0x555555bb8990)
at ../../src/gcc/jit/libgccjit.c:2677
#23 0x00005555555703c7 in ip_init_jit () at jit.c:615
#24 0x0000555555568c6f in im_add (dest=0x5555559b3530, src=0x5555558b0eb0, flag=0)
at arith.c:750
#25 0x000055555556364e in run_libip_operator (flag=0, s=0x5555558b0eb0, d=0x5555559b3530, op=0)
at arith-test.c:228
#26 im_op_ii_check (op=0, type=3, size=..., flag=<optimized out>, source=<optimized out>)
at arith-test.c:334
#27 0x000055555556428f in run_im_ii_tests (operator=0, size=..., chk_flag=114)
at arith-test.c:488
#28 0x000055555555ef34 in main (argc=<optimized out>, argv=<optimized out>) at arith-test.c:601
Cheers
Michael.